The Ant data set is an extract of common software metrics gathered every Monday from the Subversion repository of the Apache Ant software. It covers about 12 years of the project's life, from its beginning (2000-01-14) to 2012-07-30.

The data set is available as the following downloads:

  • A single compressed file [ metrics_ant_v1.0.tar.xz ] (size: 24M) with all data sets (application, file, function).
  • The application level data set [ metrics_ant_app_v1.0.gz ] (size: 24K) which features 16 variables and 652 records. Each line represents a version of Ant.
  • The file level data set [ metrics_ant_files_v1.0.gz ] (size: 20M) which features 14 variables and 680 835 records. Each line represents a Java file.
  • The function level data set [ metrics_ant_functions_v1.0.gz ] (size: 103M) which features 13 variables and 7 113 059 records. Each line represents a Java function.

Once decompressed, the data sets can be imported into R. For example, to load the application-level data set and list its variables:

> project_app <- read.csv("metrics_ant_app_v1.0.csv", sep="!")
> names(project_app)
 [1] "Application"       "Version"           "BLAN"             
 [4] "CFT"               "CLOC"              "CLAS"             
 [7] "COMR"              "ELOC"              "FILE"             
[10] "FUNC"              "LC"                "SCM_FIXES"        
[16] "SLOC"              "STAT"              "VG"               
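
The command above expects a decompressed .csv file, while the downloads are distributed compressed. As a minimal sketch, the helper below reads a file without decompressing it by hand. It assumes that the .gz downloads are plain gzip-compressed text files with a header line and "!" as the field separator, as in the command above. The name read_metrics is hypothetical and is not part of the data set distribution.

```r
# Hypothetical helper (not shipped with the data set): read one metrics file,
# whether it is still gzip-compressed or has already been decompressed.
# Assumption: "!"-separated text with a header line.
read_metrics <- function(path) {
  # gzfile() also reads uncompressed files, so one code path covers both cases.
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  read.csv(con, sep = "!", stringsAsFactors = FALSE)
}

# Usage, assuming the file has been downloaded to the working directory:
# project_app <- read_metrics("metrics_ant_app_v1.0.gz")
# dim(project_app)    # number of records and number of variables
# names(project_app)  # variable names
```

The file-level and function-level data sets are much larger, so loading them this way may need a fair amount of memory and time.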

The data set has been submitted to the data track of the Mining Software Repositories 2013 conference in San Francisco, and will soon be made available as an R package.

