The Maisqual project delivered several data sets for researchers in software engineering. The intent is to foster academic software research by providing peer-reviewed, simple-to-use CSV files for a few well-known software projects. The metrics extracted for these data sets are listed in Maisqual Metrics, and the rules checked are described in Maisqual Rules.
Data sets come in three flavours:
- Evolution data sets show a weekly snapshot of the project repository over a long period of time – up to 12 years, in the case of Ant. The time interval between extracts is constant: the source has been extracted every Monday over the defined period of time. Characteristics include common and differential code metrics, communication measures, and configuration management measures.
- Release data sets show analyses of the source releases of the software product, as published by the project. They differ from the evolution data sets in their nature (a source release does not necessarily have the same structure and contents as the project repository) and in their time structure (releases follow no regular schedule: a new release may come only a few days after the previous one, e.g. in the case of blocking issues, or months later).
- Version data sets present information on a single release or extract of a product. They include only static information: all time-dependent attributes are discarded, but they still provide valuable insights into the intrinsic measures and detected practices.
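As an illustration of the constant time interval in the evolution data sets, the following minimal Python sketch checks that a series of snapshot dates falls on Mondays exactly one week apart. The CSV excerpt, its column names (`date`, `sloc`) and its values are hypothetical and are not taken from the actual Maisqual files; the real column layout is the one given in Maisqual Metrics.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical excerpt of an evolution data set. The column names
# ("date", "sloc") and the values are illustrative assumptions only.
SAMPLE_CSV = """date,sloc
2012-01-02,1000
2012-01-09,1040
2012-01-16,1035
"""


def load_snapshots(text):
    """Parse CSV text into a list of (date, row) pairs, in file order."""
    reader = csv.DictReader(io.StringIO(text))
    return [(date.fromisoformat(row["date"]), row) for row in reader]


def is_weekly_monday_series(dates):
    """Return True if every date is a Monday and consecutive dates are 7 days apart."""
    all_mondays = all(d.weekday() == 0 for d in dates)  # Monday is weekday 0
    constant_step = all(
        later - earlier == timedelta(days=7)
        for earlier, later in zip(dates, dates[1:])
    )
    return all_mondays and constant_step


snapshots = load_snapshots(SAMPLE_CSV)
dates = [d for d, _ in snapshots]
print(is_weekly_monday_series(dates))
```

A check like this would not be expected to pass on a release data set, where the interval between two releases can range from a few days to several months.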
The full list of selected projects is provided in the Maisqual Projects page.
Other similar software data sets include:
- COMETS data sets: http://java.llp.dcc.ufmg.br/sqj2013
- Helix data set: http://www.ict.swin.edu.au/research/projects/helix/download.html
- Promise data set: http://promisedata.googlecode.com/
- Marco d'Ambros data set for bugs: http://bug.inf.usi.ch/
There is also a list of open research data sets at http://www.cc.uah.es/drg/c/RHH_RAISE12_Repos.html
- Couto, C., Maffort, C., Garcia, R., & Valente, M. T. (2013). COMETS: A Dataset for Empirical Research on Software Evolution using Source Code Metrics and Time Series Analysis. ACM SIGSOFT Software Engineering Notes, 38(1), 1–3.
- D'Ambros, M., Lanza, M., & Robbes, R. (2010). An Extensive Comparison of Bug Prediction Approaches. In Proceedings of MSR 2010 (7th IEEE Working Conference on Mining Software Repositories).