Software Intelligence: The Future of Mining Software Engineering Data

Ahmed E. Hassan, School of Computing, Queen's University, Kingston, ON, Canada

Tao Xie, Department of Computer Science, North Carolina State University, Raleigh, NC, USA.



  1. Abstract
  2. Introduction
  3. State of Practice
  4. State of Research
  5. Enabling Software Intelligence
    1. SI Throughout the Lifecycle of a project
    2. SI Using non-historical Repositories
    3. SI Use of Effective Mining Techniques
    4. SI Adoption in Practice
  6. Discussion and Conclusion


  author = {Ahmed E. Hassan and Tao Xie},
  title = {Software Intelligence: The Future of Mining Software Engineering Data},
  booktitle = {Proc. FSE/SDP Workshop on the Future of Software Engineering Research (FoSER 2010)},
  month = {November},
  year = {2010}, 
  location = {Santa Fe, NM},
  pages = {161--166},
  url = {},

Download File

File:Software intelligence future of mining se data.pdf


This paper proposes applying Software Intelligence (SI) throughout the software lifecycle, in the same way that Business Intelligence (BI) supports business decision-making: decisions are based on facts and a clear view of the software development process.

SI aims to replace feelings and intuition in decision-making with a fact-based view and understanding of the process.

State of Practice

Many software practitioners rely on their experience, intuition and gut feeling in making everyday and important decisions.

Documentation is often inaccurate or unavailable; e.g. knowledge may be recorded in wikis while decisions are made from spreadsheets and slides.

State of Research

Examples of software repositories:

  • Historical repositories (e.g. SCM, bug tracking and communication archives) show the evolution and progress of a project.
  • Run-time repositories (e.g. deployment logs) show the execution and usage of a software system.
  • Code repositories (e.g. Google Code) give access to source code.

Each kind of repository gives a different insight into the software system. For example, the dependency between code that writes data to a file and code that reads data from that file is visible only in execution logs.
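The write/read example can be sketched in Python. This is a minimal illustration, not the paper's method: the log format, the component names and the `file_dependencies` helper are all hypothetical. It assumes each log entry records which component performed which operation on which path, and reports a dependency whenever a component reads a file last written by a different component.

```python
# Hypothetical execution log: (component, operation, path), in time order.
LOG = [
    ("exporter", "write", "/tmp/report.csv"),
    ("mailer", "read", "/tmp/report.csv"),
    ("indexer", "read", "/tmp/index.db"),
    ("exporter", "write", "/tmp/audit.log"),
]

def file_dependencies(log):
    """Return (reader, writer, path) triples inferred from the log."""
    last_writer = {}  # path -> component that most recently wrote it
    deps = set()
    for component, op, path in log:
        if op == "write":
            last_writer[path] = component
        elif op == "read":
            writer = last_writer.get(path)
            # A read of a file written earlier by another component
            # reveals a dependency that the source code does not show.
            if writer is not None and writer != component:
                deps.add((component, writer, path))
    return deps

deps = file_dependencies(LOG)
```

Here `deps` contains a single triple linking `mailer` to `exporter` through `/tmp/report.csv`; the other reads and writes have no matching counterpart in the log.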

Software repositories are used mostly for record-keeping and rarely for decision-making. They could, for example, provide the past time-to-resolve figures of a module to estimate how long new bugs in that module will take to resolve. Switching from passive record-keeping to active data mining helps, for example, to propagate complex changes or to warn about fragile or risky code, changes or bugs.
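As a rough illustration of the time-to-resolve idea, the sketch below estimates the resolution time of a new bug from the median of past resolution times in the same module. The bug records, the date format and the `median_resolution_days` helper are assumptions made for the example; a real bug tracker export would need its own parsing, and the median is only one of many possible estimators.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical bug records: (module, date opened, date resolved).
BUGS = [
    ("parser", "2010-01-04", "2010-01-06"),
    ("parser", "2010-02-01", "2010-02-05"),
    ("parser", "2010-03-10", "2010-03-13"),
    ("ui", "2010-01-15", "2010-01-16"),
    ("ui", "2010-02-20", "2010-02-27"),
]

def days_between(opened, resolved):
    """Number of whole days between two ISO dates."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(resolved, fmt) - datetime.strptime(opened, fmt)).days

def median_resolution_days(bugs):
    """Map each module to the median resolution time of its past bugs."""
    per_module = defaultdict(list)
    for module, opened, resolved in bugs:
        per_module[module].append(days_between(opened, resolved))
    return {module: median(days) for module, days in per_module.items()}

estimates = median_resolution_days(BUGS)
```

With this data, a new bug filed against `parser` would be estimated at 3 days and one against `ui` at 4 days.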

Enabling Software Intelligence

SI throughout the lifecycle of a project

An analysis of Mining Software Repositories (MSR) conference publications shows that 80% of published papers focus on source code and bug-related repositories. Documentation repositories (e.g. requirements) are rarely used.

This focus benefits developers, but managers, support teams, testers and deployers also need SI. SI is more than just help with coding.

SI Using Non-historical Repositories

MSR strongly relies on historical repositories. "In our view, MSR and Mining Software Engineering Data are synonyms: MSR is about mining any type of software engineering data."

Future work should widen the scope of Mining Software Engineering Data / Mining Software Repositories, e.g. to developers' interactions with the IDE. Privacy should also be examined thoroughly, since more and more relatively private data is becoming available for analysis and mining.

Repositories should be designed to make data collection easy.

SI should leverage all types of repositories, not just historical ones.

SI use of Effective Mining Techniques

MSR exploits basic off-the-shelf data mining algorithms, e.g. association rules and frequent itemsets.
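To make these two techniques concrete, the sketch below mines frequent pairs of co-changed files and computes the confidence of a simple association rule. It is a simplified illustration under stated assumptions: the commit data is invented, the helpers `frequent_pairs` and `rule_confidence` are hypothetical names, and only itemsets of size two are considered, whereas full algorithms such as Apriori handle itemsets of any size.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit history: each commit is the set of files changed together.
COMMITS = [
    {"writer.c", "reader.c"},
    {"writer.c", "reader.c", "util.c"},
    {"writer.c", "reader.c"},
    {"util.c"},
    {"writer.c", "util.c"},
]

def frequent_pairs(commits, min_support):
    """Return file pairs that were changed together in at least min_support commits."""
    counts = Counter()
    for files in commits:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def rule_confidence(commits, antecedent, consequent):
    """Confidence of the rule 'antecedent changed => consequent changed'."""
    with_antecedent = [c for c in commits if antecedent in c]
    if not with_antecedent:
        return 0.0
    return sum(consequent in c for c in with_antecedent) / len(with_antecedent)

pairs = frequent_pairs(COMMITS, min_support=3)
confidence = rule_confidence(COMMITS, "reader.c", "writer.c")
```

On this data the only pair with support of at least 3 is `reader.c` and `writer.c`. The rule "reader.c changed, so writer.c changed" has confidence 1.0, while the reverse rule has confidence 0.75, which is the kind of asymmetry a change-propagation tool could use when suggesting files to review.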

As future directions, one may consider:

  1. Empirically investigate problems
  2. Identify mining requirements for these problems
  3. Adopt or adapt advanced mining algorithms, or develop new mining algorithms

The Software Intelligence and Data Mining fields should work more closely together.

SI Adoption in practice

Some tools (e.g. Coverity or Pattern Insight) already build on ideas from mining software repositories. Access to SI is low-cost: an organisation that has a repository can mine it with minimal effort.

Adoption of SI would be made easier by:

  • Making SI help available at both low levels (help to change this code) and high levels (help to make this choice), and providing the means to act without management approval.
  • Providing intuitive, easy-to-read results that come with explanations. Visualisation will help adoption.

SI should help explain but will never replace practitioners.

Discussion and Conclusion

Software Intelligence should learn from Business Intelligence:

  • BI is already adopted in many large organisations.
  • BI is polished and well advanced (visualisation, presentation).

SI can help enable automated empirical software engineering, for example by improving tests through machine learning.

SI can help both industry and research by providing fact-based arguments and a clear view for all decisions, low-level and high-level alike.
