Data to Mine: Process

From Maisqual Private Wiki


Take a look at the Base Measures category.


Change Management

Change Management is the task of tracking and controlling change requests in the software.

Change requests can be:

  • Bug reports reflect the faults found in the software.
  • Enhancements are new feature requests.
  • Questions may be asked and followed for knowledge exchange.


  • Change requests are the main source of information on the number of faults in the software.


  • If a piece of software is never used, few bug reports may appear, which does not mean that the software is bug-free.
  • If a team provides an easy-to-use means of reporting bugs, reports may be numerous; this is good for feedback and general quality, and does not mean that the software is especially buggy.

CCM Base Measures have been put in the BUG Base Measures page.

Data to Mine

The data of interest for our purpose are:

  • Creator of the CR
  • Component / Subsystem
  • Summary
  • Product version
  • Fixed in version
  • Priority
  • Severity
  • Create time (Tcr)
  • Time of passage to Under Analysis (Tua)
  • Time of passage to Working (Tw)
  • Time of passage to Closed (Tcl)

These would allow us to compute:

  • Time to resolve (time of passage to closed minus creation time = Tcl - Tcr)
  • Time to assign (time of passage to analysis minus creation time = Tua - Tcr)
  • Time to analyse (time of passage to working minus time of passage to analysis = Tw - Tua)
  • Time to work (time of passage to closed minus time of passage to working = Tcl - Tw)
  • Number of bugs opened for each version
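
The computations listed above can be sketched in a few lines of Python. This is only an illustration: the `ChangeRequest` record, its field names and the sample dates are hypothetical, and a real extractor would fill them from whatever the bug tracker exports.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangeRequest:
    """Hypothetical record holding the mined fields used below."""
    product_version: str
    t_cr: datetime  # create time (Tcr)
    t_ua: datetime  # time of passage to Under Analysis (Tua)
    t_w: datetime   # time of passage to Working (Tw)
    t_cl: datetime  # time of passage to Closed (Tcl)


def durations(cr):
    """Return the four derived durations as timedelta objects."""
    return {
        "time_to_resolve": cr.t_cl - cr.t_cr,
        "time_to_assign": cr.t_ua - cr.t_cr,
        "time_to_analyse": cr.t_w - cr.t_ua,
        "time_to_work": cr.t_cl - cr.t_w,
    }


def opened_per_version(crs):
    """Count the change requests opened against each product version."""
    return Counter(cr.product_version for cr in crs)


# Made-up sample data, for illustration only.
example = ChangeRequest(
    product_version="1.0",
    t_cr=datetime(2010, 3, 1, 9, 0),
    t_ua=datetime(2010, 3, 2, 9, 0),
    t_w=datetime(2010, 3, 4, 9, 0),
    t_cl=datetime(2010, 3, 10, 9, 0),
)
example_durations = durations(example)
```

By construction, time to assign plus time to analyse plus time to work equals time to resolve, which gives a cheap consistency check on mined timestamps.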

We assume here that the classic workflow for change requests is as follows:

  1. Opened: the CR has been identified.
  2. Under Analysis: someone investigates where it comes from.
  3. Working: someone is working on a fix.
  4. Closed: the CR is resolved.

Note there is no Validate step here, since it is of no use for our purpose.

These steps can be recognised in all tools; if a setup provides a more complete lifecycle, we should map each of its steps to one of these minimal steps.
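
Such a mapping could look like the sketch below. The tool-specific status names are hypothetical (loosely modelled on what common bug trackers use) and would have to be adapted to each setup. The helper keeps the first time a CR enters each minimal step, which is what Tua, Tw and Tcl require.

```python
from datetime import datetime

# Hypothetical tool-specific statuses mapped to the four minimal steps.
STATUS_MAP = {
    "NEW": "Opened",
    "UNCONFIRMED": "Opened",
    "CONFIRMED": "Under Analysis",
    "IN_PROGRESS": "Working",
    "RESOLVED": "Closed",
    "VERIFIED": "Closed",  # no Validate step: folded into Closed
    "CLOSED": "Closed",
}


def to_minimal_step(status):
    """Map a tool-specific status to one of the minimal steps."""
    try:
        return STATUS_MAP[status.upper()]
    except KeyError:
        raise ValueError("no mapping defined for status %r" % status)


def first_passage_times(history):
    """history: iterable of (timestamp, status) pairs, in any order.

    Returns {minimal step: earliest timestamp at which it was entered}.
    """
    times = {}
    for timestamp, status in sorted(history, key=lambda item: item[0]):
        times.setdefault(to_minimal_step(status), timestamp)
    return times


# Made-up status history, for illustration only.
example_history = [
    (datetime(2010, 3, 4), "IN_PROGRESS"),
    (datetime(2010, 3, 1), "NEW"),
    (datetime(2010, 3, 2), "CONFIRMED"),
    (datetime(2010, 3, 10), "RESOLVED"),
    (datetime(2010, 3, 12), "VERIFIED"),
]
example_times = first_passage_times(example_history)
```

Raising an error on an unmapped status, rather than guessing, makes gaps in the mapping visible as soon as a new tool is plugged in.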

Bug Tracking tools

Data Analysis

Algorithms exist to:

  • Find duplicates, and help people reduce the number of non-relevant bug reports.
  • Get statistics on the lifecycle: time to analyse, time to close, etc.
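
As a rough illustration of duplicate detection, the sketch below compares CR summaries pairwise with the standard library's `difflib.SequenceMatcher`. This is a naive stand-in, not one of the published algorithms: the 0.8 threshold is an arbitrary assumption, the sample summaries are made up, and the pairwise comparison is quadratic in the number of reports.

```python
from difflib import SequenceMatcher
from itertools import combinations


def duplicate_candidates(summaries, threshold=0.8):
    """Return index pairs (i, j) whose summaries look alike.

    Similarity is the SequenceMatcher ratio on lower-cased text;
    the default threshold is an arbitrary starting point.
    """
    normalised = [s.strip().lower() for s in summaries]
    pairs = []
    for i, j in combinations(range(len(normalised)), 2):
        ratio = SequenceMatcher(None, normalised[i], normalised[j]).ratio()
        if ratio >= threshold:
            pairs.append((i, j))
    return pairs


# Made-up summaries, for illustration only.
example_summaries = [
    "Crash when saving file",
    "Crash when saving a file",
    "Add dark theme",
]
example_pairs = duplicate_candidates(example_summaries)
```

The pairs returned are only candidates; a person still has to confirm that two reports describe the same fault.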

Tools that can be used to analyze Bug reports are:

Configuration Management

Software Configuration Management (SCM) is the task of tracking and controlling artefact changes in the software, and of being able to reproduce any past configuration of the software at any time.

Base measures related to CCM have been put in CCM Base Measures.

Data to Mine

The following information is of interest to us:

  • Commits, with:
    • Revision number,
    • Date of commit,
    • User who made it,
    • Comment, if available,
    • Files impacted by this commit, if available.

With Subversion, the files impacted by a commit can be retrieved with the following command (the output shown comes from a French-locale client):

boris@borispc ~ $ svn log -r 2100 -v svn+ssh://
r2100 | pskali | 2010-03-31 17:33:49 +0200 (mer. 31 mars 2010) | 2 lignes
Chemins modifiés :
   M /trunk/SQuOREServer/view/conf/tools/SQuORE/Analyzer/

FI_FUNC was wrong
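
The fields listed above can be pulled out of such an entry with a small parser. The sketch below is an assumption about how one might do it, not an existing tool: it handles one entry at a time, expects the dashed separator lines that `svn log` prints between entries to have been removed, and avoids matching the localised "Changed paths" label so that it works with both the French output above and an English one.

```python
import re
from datetime import datetime

# Header line: r<revision> | <author> | <date with UTC offset> (...) | <n> lines
HEADER = re.compile(
    r"^r(\d+) \| (.+?) \| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
)
# Path line: indentation, one action letter (A, M, D or R), then the path.
PATH = re.compile(r"^\s+([AMDR])\s+(\S.*)$")

SAMPLE = """\
r2100 | pskali | 2010-03-31 17:33:49 +0200 (mer. 31 mars 2010) | 2 lignes
Chemins modifiés :
   M /trunk/SQuOREServer/view/conf/tools/SQuORE/Analyzer/

FI_FUNC was wrong
"""


def parse_entry(text):
    """Parse a single verbose svn log entry into a dictionary."""
    lines = text.strip("\n").splitlines()
    header = HEADER.match(lines[0])
    if header is None:
        raise ValueError("not an svn log header: %r" % lines[0])
    entry = {
        "revision": int(header.group(1)),
        "author": header.group(2),
        "date": datetime.strptime(header.group(3), "%Y-%m-%d %H:%M:%S %z"),
        "paths": [],
    }
    index = 1
    # Everything up to the first blank line is the changed-paths block;
    # the localised label line simply fails to match PATH and is skipped.
    while index < len(lines) and lines[index].strip():
        path = PATH.match(lines[index])
        if path:
            entry["paths"].append((path.group(1), path.group(2)))
        index += 1
    # The remainder is the commit comment.
    entry["comment"] = "\n".join(lines[index:]).strip()
    return entry


entry = parse_entry(SAMPLE)
```

For larger extractions, `svn log --xml -v` produces locale-independent XML output, which is usually a more robust input than the text format parsed here.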


Configuration Management tools

Data Analysis

Build and Release Management

Release Management refers to the management of the release cycle within a software project. A release is when the software engineers provide a uniquely identified set of files for others to use.

Base measures for these two areas have been put in BUILD Base Measures and REL Base Measures.


The following tools help manage the release cycle:

  • Maven
  • Ant (to a lesser extent)

Continuous Integration tools also provide release management information:

  • CruiseControl
  • Jenkins
  • Hudson
  • TeamCity
  • Bamboo


Communication

Communication history gives insight into what happened at a given time in the history of the project, including design decisions and coding enquiries.

Communication includes:

  • Mailing Lists,
  • Forums,
  • News Groups

Base measures for these areas have been put in MAIL Base Measures and WEBSITE Base Measures.

Data to Mine

The information we might gather from communication channels is the following:

  • The nature of the different communication channels: how many mailing lists there are, whether there are forums, newsgroups, etc.
  • The project activity during the project's lifetime (or pure volume of communications between developers).
  • Usage of the software (or pure volume of communications between users: support, etc.).
  • Developers' concerns (architecture change, refactoring, etc.).
  • The frequency of some subjects.
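
Two of these items, the volume of communication over time and the frequency of subjects, can be estimated from message headers alone. The sketch below is a minimal illustration under stated assumptions: messages are available as standard `email` message objects (for instance read from an mbox archive with Python's `mailbox.mbox`), and the sample messages are made up.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime

# Leading "Re:" / "Fwd:" prefixes, possibly repeated.
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def monthly_volume(messages):
    """Count messages per month ("YYYY-MM"), skipping unusable dates."""
    counts = Counter()
    for msg in messages:
        date_header = msg["Date"]
        if date_header is None:
            continue
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            continue  # malformed Date header
        counts[sent.strftime("%Y-%m")] += 1
    return counts


def subject_frequency(messages):
    """Count messages per thread subject, ignoring reply prefixes and case."""
    counts = Counter()
    for msg in messages:
        subject = _REPLY_PREFIX.sub("", msg["Subject"] or "").strip().lower()
        if subject:
            counts[subject] += 1
    return counts


# Made-up messages, for illustration only.
example_messages = [
    message_from_string(
        "Date: Wed, 31 Mar 2010 17:33:49 +0200\nSubject: Refactoring plan\n\nbody\n"
    ),
    message_from_string(
        "Date: Thu, 01 Apr 2010 09:00:00 +0200\nSubject: Re: Re: Refactoring plan\n\nbody\n"
    ),
    message_from_string(
        "Date: Fri, 02 Apr 2010 10:00:00 +0200\nSubject: Release date?\n\nbody\n"
    ),
]
example_volume = monthly_volume(example_messages)
example_subjects = subject_frequency(example_messages)
```

Running the same counts separately on developer and user lists would give the two volumes mentioned above (project activity and usage of the software).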

Mailing list tools

  • Mailman

Data Analysis
