From Maisqual Private Wiki
This is the monthly checkpoint summarising the work accomplished recently. Please feel free to comment.
- 04/05/2011: Ant plugin update
- 05/05/2011: CruiseControl setup
- 06/05/2011: CruiseControl setup + meeting with Christophe about the thesis
- 11/05/2011: Wiki update + organisation
- 12/05/2011: Wiki update + organisation + SQuORE CC XSL
- 13/05/2011: Wiki update + organisation
- 18/05/2011: Wiki update: added metrics and paper categories, presentation work
- 19/05/2011: Readings + svn_tools
- 20/05/2011: Readings + svn_tools
- 24/05/2011: School registration + readings
- 25/05/2011: Discussed tools for Maisqual + school registration
- 26/05/2011: Installed SQuORE for metrics + gathered measures + wiki update
- 27/05/2011: Requirements for SVN Data Providers + wiki update
- 31/05/2011: Requirements for the API + document "Introduction to Maisqual"
The public wiki has the following categories defined:
A private wiki has also been set up for all non-public matters: organisation, ongoing research work, milestones. It is available, with authentication, at maisqual.squoring.com/privatewiki.
We have found many papers about "Data Mining Software Engineering Data".
The following articles have been read and summarised:
- maisqual:Data_Mining_for_Software_Engineering, Tao Xie, Suresh Thummalapenta
- maisqual:A_statistical_Examination_of_the_Evolution_and_Properties_of_Libre_Software, Israel Herraiz
- maisqual:Software Intelligence: The Future of Mining Software Engineering Data, Ahmed E. Hassan, Tao Xie.
Summaries of other papers will follow soon.
 Tools and Projects
We have started to set up a process for gathering metrics from two kinds of sources: SVN repositories and source code (analysed with SQuORE). The projects selected so far are:
- SQuORE trunk HEAD
- The Linux kernel
- Apache Ant
 Research, data mining, first steps
The readings and discussions we had during this month have led us to the following:
- We now know better what data to investigate for data mining, and what types of algorithms can be used for that. This has been summarised in the Data_To_Mine page.
- The metrics we will start with are the most basic ones: SLOC (source lines of code) for size, and McCabe's cyclomatic complexity and Halstead's metrics for complexity. Many studies suggest that these metrics are often the best (and simplest) bet.
- Some tests have been made to retrieve information and metrics from remote (SVN) repositories. The results have been recorded on the private wiki at CustomTools:SVN_Analysis.
- Some developments are required from SQuORING, namely:
- A new Data Provider working on remote repositories instead of local directories. For now, the same metrics as those computed by SQuORE will be enough. Specifications have been written for that Data Provider.
- A new API to export data. Data are currently stored in a database but are not easy to extract into a parseable format. Specifications have been written for that API.
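As an illustration of the starting metrics, here is a minimal sketch (not SQuORE's implementation) of how SLOC and McCabe's cyclomatic complexity can be computed. To stay self-contained it analyses Python source with Python's standard `ast` module, rather than the C and Java code of the selected projects. Assumptions: SLOC is taken as non-blank, non-comment lines, and cyclomatic complexity is approximated as 1 plus the number of decision points (`if`/`elif`, loops, `except` clauses, conditional expressions, comprehension loops, and each extra operand of `and`/`or`); real tools differ on the exact counting rules. The names `sloc`, `cyclomatic_complexity` and `function_metrics` are illustrative.

```python
import ast

# Node types treated as decision points. This is one common approximation
# of McCabe's cyclomatic complexity; tools differ on the exact list.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.IfExp, ast.comprehension)


def sloc(source: str) -> int:
    """Count lines that are neither blank nor pure '#' comments."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))


def cyclomatic_complexity(func: ast.AST) -> int:
    """1 + number of decision points inside the given function node.

    Nested functions are walked too, so in this simplified version their
    branches count towards the enclosing function.
    """
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' short-circuits twice: one path per extra operand
            complexity += len(node.values) - 1
    return complexity


def function_metrics(source: str) -> dict:
    """Map each top-level function name to its cyclomatic complexity."""
    tree = ast.parse(source)
    return {node.name: cyclomatic_complexity(node)
            for node in tree.body if isinstance(node, ast.FunctionDef)}


SAMPLE = '''
def classify(x):
    # sign and magnitude of x
    if x > 0 and x < 100:
        return "small positive"
    elif x > 0:
        return "large positive"
    return "non-positive"
'''

print(sloc(SAMPLE))              # → 6
print(function_metrics(SAMPLE))  # → {'classify': 4}
```

Here the three decision points are the `if`, the short-circuit `and`, and the `elif`, giving a complexity of 4.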
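Retrieving history from a remote SVN repository does not require a checkout: `svn log --xml --verbose URL` prints each revision with its author, date, message and changed paths. The sketch below is a hypothetical helper, not the CustomTools:SVN_Analysis code. It assumes the `svn` command-line client is installed, and it only computes the number of commits, commits per author, and changed paths per action (A, M, D, R). The function names are illustrative.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter


def fetch_svn_log(url: str, limit: int = 100) -> str:
    """Return the XML log of a remote repository (requires the svn client)."""
    result = subprocess.run(
        ["svn", "log", "--xml", "--verbose", "--limit", str(limit), url],
        capture_output=True, text=True, check=True)
    return result.stdout


def summarise_log(xml_text: str) -> dict:
    """Count commits, commits per author and changed paths per action."""
    root = ET.fromstring(xml_text)
    entries = root.findall("logentry")
    authors = Counter(e.findtext("author", default="(none)") for e in entries)
    actions = Counter(p.get("action") for e in entries for p in e.iter("path"))
    return {"commits": len(entries),
            "authors": dict(authors),
            "actions": dict(actions)}


# Made-up sample in the format produced by 'svn log --xml --verbose'.
SAMPLE_LOG = """<log>
<logentry revision="2">
<author>alice</author>
<date>2011-05-20T09:00:00.000000Z</date>
<paths>
<path kind="file" action="M">/trunk/build.xml</path>
<path kind="file" action="A">/trunk/src/Main.java</path>
</paths>
<msg>Add main class</msg>
</logentry>
<logentry revision="1">
<author>bob</author>
<date>2011-05-19T17:30:00.000000Z</date>
<paths>
<path kind="file" action="A">/trunk/build.xml</path>
</paths>
<msg>Initial import</msg>
</logentry>
</log>"""

print(summarise_log(SAMPLE_LOG))
```

Against a real repository this would be called as `summarise_log(fetch_svn_log("https://svn.example.org/repos/project/trunk", limit=500))`, where the URL is a placeholder. The XML output is used rather than the plain-text log because it is easier to parse reliably and to extend with dates or messages.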
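Pending the export API, the kind of parseable output it could produce can be sketched with Python's standard `sqlite3` and `csv` modules. Everything about the storage is assumed here: the `measures` table, its columns, the metric names and the values are hypothetical and do not reflect SQuORE's actual database schema.

```python
import csv
import io
import sqlite3


def export_measures_csv(conn: sqlite3.Connection) -> str:
    """Dump every (artefact, metric, value) row of the hypothetical
    'measures' table as CSV text, one measure per row."""
    rows = conn.execute(
        "SELECT artefact, metric, value FROM measures ORDER BY artefact, metric")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["artefact", "metric", "value"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical schema and toy values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measures (artefact TEXT, metric TEXT, value REAL)")
conn.executemany("INSERT INTO measures VALUES (?, ?, ?)", [
    ("src/main.c", "SLOC", 120),
    ("src/main.c", "CC", 14),
    ("src/util.c", "SLOC", 45),
])
print(export_measures_csv(conn))
```

A long, one-measure-per-row layout keeps the export independent of which metrics exist, and such a file loads directly into R with `read.csv`.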
 Next Steps
 Better defining the Thesis goals
The main goal of this project is to improve the quality of both processes and products.
- For processes, this is achieved through:
- Investigation of best practices, e.g. "Peer review improves reliability, and we can prove it".
- Good advice, e.g. "The shortest way from this state of quality to the next state I want is to refactor this module" or "Set up a continuous integration framework for better build stability".
- Help with estimating work, e.g. this bug may take approximately 2 man-days to resolve, the next iteration's backlog should be completed in 2 weeks.
- Help with management decisions, e.g. whom this bug should be assigned to, the target release won't be reached because of the number of bugs, etc.
- For products, this is achieved through:
- Good advice, e.g. add comments to this module for readability,
- Proposing patterns for specific purposes: e.g. refactor this module with the Singleton design pattern,
- Bug-finding techniques: these mainly address the reliability characteristic of quality, and they can be helped a lot by data mining.
For both of these, the measured result should be product quality: a good process leads to good products.
Metrics are probably the way to go for quality assessment, which means we have to know "Which metrics are relevant for which quality characteristic (and we can prove it)".
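One common way to start testing whether a metric is relevant to a quality characteristic is a rank correlation between the metric and a quality indicator, for instance cyclomatic complexity versus number of defects per module. The sketch below computes Spearman's rank correlation (the Pearson correlation of the ranks, with tied values given their average rank) in plain Python. The sample values are made up for illustration, and a strong correlation alone does not prove a causal link.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if a series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(ranks(x), ranks(y))


# Made-up per-module values, for illustration only.
complexity = [2, 5, 9, 14, 30]
defects = [0, 1, 1, 3, 7]
print(round(spearman(complexity, defects), 3))  # → 0.975
```

Rank correlation is used here rather than plain Pearson correlation because software metrics are often heavily skewed, and ranks are less sensitive to a few extreme modules.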
 What's next
Here are the items to be addressed in the near future:
- Follow the data mining course of Philippe Preux.
- Read more articles and papers, write summaries.
- Collect data, and try to visualise them in R.