problem report data

As part of our analysis of the language, content, and structure of problem report summaries, we gathered a corpus of data from various open-source projects on or around January 18th, 2006. You can download it for your own analyses:

the problem report corpus and analysis tool

We've also provided a simple querying tool with the data, written in Java, that we used to analyze various parts of the data. It's not very well documented, but most of its features should be self-explanatory.

These are the projects we studied:

The data is structured as comma-separated values, where each value is wrapped in double quotes. The first line is a comma-separated list of the column names.
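If you would rather read the files from your own code than through the bundled tool, the format described above can be parsed with a few lines of Java. This is only a minimal sketch: the class name QuotedCsvLine is hypothetical and not part of the download, and it assumes that a literal double quote inside a value is written as two double quotes, which we have not confirmed for this corpus. It also assumes each record fits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the distributed analysis tool.
final class QuotedCsvLine {

    // Splits one line of the form "a","b","c" into its unquoted values.
    // Assumption: a literal quote inside a value is doubled ("").
    static List<String> parse(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote of this value
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a value
            } else if (c == ',') {
                values.add(current.toString());
                current.setLength(0);        // start the next value
            } else {
                current.append(c);           // tolerate unquoted characters
            }
        }
        values.add(current.toString());      // last value on the line
        return values;
    }
}
```

Calling QuotedCsvLine.parse on the first line of a file yields the column names, and calling it on each later line yields that report's values in the same order.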

For example, here are the first two lines of the Linux kernel file:

The columns are:

To run the analysis tool, run the jar file included in the download as follows:

java -jar Analyze file1.csv file2.csv ...

If you have any questions, e-mail Andy.


Copyright © 1996-2020 - Carnegie Mellon University - All Rights Reserved.