Talk:Data To Mine

From Maisqual Private Wiki


Intent

We have established the following list of types of data that can be used for data mining in Software Engineering.

Data mining techniques applied to software repositories serve two purposes:

  • Assessing product and process quality through metrics. This is mainly achieved on:
    • The source code, for product quality.
    • The tools, for process quality.
  • Identifying practices. This is mainly achieved through:
    • Mining patterns in the project's history (defects, commits, mailing list records, etc.).
    • Mining patterns in the source code (when tests were set up, refactoring, etc.).
    • Surveys, which often bring information that would be difficult, if not impossible, to mine from the repositories.
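
As an illustration of mining patterns in a project's history, here is a minimal Python sketch that counts commits per author and per month. It assumes the history has already been exported as one `hash|author|date` line per commit, for example with `git log --pretty=format:'%H|%an|%ad' --date=short`. The function name `summarize_commits` and the sample data are hypothetical and are not part of any Maisqual tooling.

```python
from collections import Counter


def summarize_commits(log_text):
    """Count commits per author and per month from pipe-separated log lines."""
    per_author = Counter()
    per_month = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is expected to look like: <hash>|<author>|<YYYY-MM-DD>
        parts = line.split("|")
        if len(parts) != 3:
            continue  # skip malformed lines rather than guess their meaning
        _commit_hash, author, date = parts
        per_author[author] += 1
        per_month[date[:7]] += 1  # keep only the YYYY-MM prefix
    return per_author, per_month


# Hypothetical sample data, invented for illustration only.
SAMPLE_LOG = """\
a1f3|alice|2012-03-02
b7c9|bob|2012-03-15
c2d4|alice|2012-04-01
"""

authors, months = summarize_commits(SAMPLE_LOG)
```

Author names that themselves contain a `|` would be skipped as malformed lines; a real extraction would probably need a safer separator.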

Product information

Product information includes:

  • Source code
  • Dynamic execution traces
  • Tests
  • Documentation

All these have been gathered in Data to Mine: Product.

Process information

Process information includes:

  • Configuration Management
  • Change Management
  • Release Management
  • Mailing Lists, Forums & Communication

All these have been gathered in Data to Mine: Process.

See also the Project Release Survey for a checklist template to apply to every project release.

User satisfaction

User satisfaction, which is one of the main quality criteria, is purposely kept apart from the product and process information, since it may come from both.

Community web sites holding surveys:

Other means:

  • Popularity contests
  • Number of (pertinent) results in search engines