Timeline:2013.02

From Maisqual Private Wiki


This is the checkpoint for the meeting of 14.02 in Lille with Philippe.




State of the art

Data mining

We identified data mining techniques that could be applied to software engineering data. The following methods are listed in the data_mining[1] document, with their current status:

  • Outlier detection:
    • Boxplot (works quite well)
    • LOF (not working yet)
  • Ecological inference
  • Regression analysis
    • Pairwise linear regression for all metrics
    • Correlation matrix
  • Principal component analysis
  • Clustering
    • Hierarchical
    • K-means
    • DBSCAN (not working yet)
  • Supervised classification
    • LDA (not written yet)
    • tree (not written yet)
  • Time series
    • Full/Partial autocorrelation
    • Templates for specific types of projects, and different metrics.
  • Survival analysis
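
As an illustration of the boxplot method listed above, here is a minimal sketch in Python (the analysis documents themselves are written in R). It flags values lying outside Tukey's fences, i.e. more than 1.5 times the interquartile range beyond the first or third quartile, which is the usual boxplot whisker rule. The function name, the sample data and the quantile convention are assumptions made for the example and are not taken from the Maisqual documents.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the boxplot whiskers (Tukey's fences)."""
    # Quartiles with the 'inclusive' method (linear interpolation between
    # data points); other quantile conventions give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical lines-of-code counts for ten files (illustrative data only).
sloc = [120, 135, 98, 110, 143, 127, 101, 2450, 115, 131]
print(boxplot_outliers(sloc))
```

Being univariate, this rule only looks at one metric at a time; a density-based method such as LOF would be needed to catch artefacts that are unusual only in the combination of several metrics.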

Some promising methods have not yet been investigated, and should be added to the analysis documents for rough testing:

  • Rules inference (à la Weka JRip), but this applies mainly to nominal variables, which we do not have for now -- only numeric metrics have been gathered. Rules inference may be used to identify relationships between variables, e.g. for the process-related metrics. This may be especially useful to link quality attributes to practices.
  • Supervised classification holds great promise as well: it may help identify error-prone components and, more generally, help validate or propose estimation models (reliability, effort or size models).

Metrics

A document has been started to list all metrics we may come across: Media:metrics.pdf.

Software Quality

A document has been started as a survey on software quality: Media:quality_models.pdf. It lists definitions of quality, quality models and standards.


Analysis documents

Analyse project version

This document tries to apply various data mining and statistics methods to software engineering data. More focused work will probably start from this and expand on a specific subject. It is available on the Maisqual Jenkins at [2].

It was written using Sweave, but has recently been moved to knitr[3]. Among other things, knitr allows caching of R chunks, which is a really nice feature when a single run takes more than five minutes (or less if you are debugging).

The document now contains two sections, one for files and one for functions. The input consists of two distinct csv files (files and functions), each describing a single version of a program. For now the following sections have been written:

  • Files analysis:
    • Exploratory analysis: means, mins, maxes, variance. NAs and constant values are removed.
    • Distribution of variables: plots and Q-Q plots against the normal distribution.
    • Outlier detection: boxplots, univariate and multivariate LOF.
    • Regression analysis: 1st order linear regression is applied to all metrics (pairwise), and 1st, 2nd and 3rd order regression models are applied to some variables. The correlation matrix is computed as well.
    • PCA
    • Unsupervised classification: clustering of artefacts or metrics with hierarchical clustering, k-means and dbscan.
  • Functions analysis:
    • Exploratory analysis: means, mins, maxes, variance. NAs and constant values are removed.
    • Regression analysis: 1st order linear regression is applied to all metrics (pairwise), and 1st, 2nd and 3rd order regression models are applied to some variables.
    • PCA
    • Unsupervised classification: clustering of artefacts or metrics with hierarchical clustering, k-means and dbscan.

Many advances have been made on the files analysis part, which will be "backported" to functions analysis.
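
The exploratory clean-up and the correlation matrix described in the sections above can be sketched as follows, in Python rather than the R used by the actual document. The sketch assumes that "removed" means dropping a whole metric as soon as it contains a missing value or is constant; the metric names and values are hypothetical and only serve the example.

```python
import math


def drop_unusable(columns):
    """Drop metrics that contain missing values (None) or are constant."""
    return {
        name: vals
        for name, vals in columns.items()
        if None not in vals and len(set(vals)) > 1
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations as a dict keyed by (metric, metric)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical file-level metrics (names and values are illustrative only).
metrics = {
    "sloc": [120, 340, 80, 560, 210],      # source lines of code
    "vg": [14, 41, 9, 66, 25],             # cyclomatic complexity
    "comr": [0.2, None, 0.3, 0.1, 0.25],   # contains a missing value
    "lang": [1, 1, 1, 1, 1],               # constant
}
clean = drop_unusable(metrics)
for (a, b), r in correlation_matrix(clean).items():
    print(f"{a:>5} ~ {b:<5} r = {r:.3f}")
```

Constant metrics have to go before the correlation step in any case, since their standard deviation is zero and the coefficient would be undefined.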


Analyse project evolution

This document takes as input a csv file with many versions of a program. Some of the techniques investigated in the analyse project version document are applied to the whole dataset, and techniques more specific to project evolution are added: time series analysis, clustering, and outlier detection. It is available at [4].

There are actually 3 documents [5] for application level, file level and function level analyses:

  • The application level document is the most advanced for now, and is the one being set up on the Maisqual CI engine.
  • The file level and function level documents are under development and will benefit from the application level techniques, although a great deal of work remains to be done on them.
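
To illustrate the time series analysis applied to project evolution, here is a minimal Python sketch of the sample autocorrelation of one metric across successive versions (the documents themselves are written in R). The function name and the data are hypothetical; the estimator divides the lagged products by the total sum of squares, which is one common convention among several.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    var = sum(c * c for c in centred)
    return [
        sum(centred[t] * centred[t + lag] for t in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


# Hypothetical metric measured over twelve successive versions of a project
# (illustrative data that repeats every four versions).
metric_per_version = [10, 14, 10, 6, 10, 14, 10, 6, 10, 14, 10, 6]
acf = autocorrelation(metric_per_version, max_lag=4)
print([round(r, 2) for r in acf])
```

A peak at a given lag suggests that the metric repeats with that period, for instance along a release cycle; the series must not be constant, otherwise the denominator is zero.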


Jenkins

A working continuous build has been set up on the maisqual server. Different versions of Ant, GCC, Subversion and Eclipse Papyrus have been extracted and analysed with the analyse project version document, and the generated pdfs are publicly available.

The evolution analysis document is also being prepared on the same continuous integration engine, but does not work yet. Its runs take days, so debugging and finalising it will still take some time.


Actions

For now

During the meeting, the following things need to be done:

  • Gather articles and papers that are unavailable.
  • Talk about OSEO, its 2-year financing and the objectives we stated at that time.
  • Talk about the objectives of this work. The objectives written in the OSEO document are no longer relevant and we should now clearly state new objectives, considering the advances we have made.
  • List and answer the todos in the Rnw files.

==> add todo: what are the limits of, or the assumptions made by, the techniques used? e.g. distribution, etc.

For the next weeks

  • Complete data_mining document, backport what has been done in dataset_ant, analyse_project_version, and analyse_project_evolution. Get to know how we may use it.
  • Complete metrics document, backport what has been done in dataset_ant. Get to know how we may use it.
  • Complete software_quality document, backport what has been done in dataset_ant, analyse_project_version, and analyse_project_evolution. Get to know how we may use it.


References

  1. Check Media:quality_models.pdf.
  2. Analyse project version: http://ns228394.ovh.net:8080/job/Maisqual_Version/ws/analyse_project_version
  3. http://yihui.name/knitr
  4. Analyse project evolution: http://ns228394.ovh.net:8080/job/Maisqual_Evolution/ws/analyse_project_evolution
  5. This was once a single document, which has been split to make developing and debugging it more efficient.