Recommendation Systems for Software Engineering

Martin P. Robillard, McGill University.

Robert J. Walker, University of Calgary.

Thomas Zimmermann, Microsoft Research.



Recommendation systems specific to software engineering are emerging to assist developers in a wide range of activities. This overview of available systems describes what they are, what they can do now, and what they might do in the future.


 author = {Martin Robillard and Robert Walker and Thomas Zimmermann},
 title = {Recommendation Systems for Software Engineering},
 journal = {IEEE Software},
 volume = {27},
 issn = {0740-7459},
 year = {2010},
 pages = {80-86},
 doi = {},
 publisher = {IEEE Computer Society},
 address = {Los Alamitos, CA, USA},


  1. RSSEs: what they are
  2. What RSSEs Do for Developers
    1. Guiding Software Changes with eRose
    2. Finding Relevant Examples with Strathcona
    3. Guiding Software Navigation with Suade
  3. RSSE Design Dimensions
    1. Nature of the Context
    2. Recommendation Engine
    3. Output Modes
    4. Cross-Dimensional Features
  4. RSSE Limitations and Potential


Key factors giving rise to practical RSSEs (Recommendation Systems for Software Engineering) include:

  • large stores of publicly available source code that can be analyzed to produce recommendations,
  • mature techniques for mining software repositories, and
  • mainstream adoption of common software development tools and interfaces (e.g., Bugzilla, Eclipse).

RSSEs: what they are

Here is a general definition[1]:

[Recommendation] systems are software applications that aim to support users in their decision-making while interacting with large information spaces. They recommend items of interest to users based on preferences they have expressed, either explicitly or implicitly. The ever-expanding volume and increasing complexity of information [...] has therefore made such systems essential tools for users in a variety of information seeking [...] activities. [Recommendation] systems help overcome the information overload problem by exposing users to the most interesting items, and by offering novelty, surprise, and relevance.

A general challenge for recommendation systems is how to establish context, which could include all relevant information about the user, his or her working environment, and the project or task status at the time of the recommendation. For example, context can include:

  • the user's characteristics, such as job description, expertise level, prior work, and social network;
  • the kind of task being conducted, such as adding new features, debugging, or optimizing;
  • the task's specific characteristics, such as edited code, viewed code, or code dependencies; and
  • the user's past actions or those of the user's peers, such as artefacts viewed or explicitly recommended.
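The context elements listed above could be gathered into a single record that is handed to a recommender. The following is only a minimal sketch: the class and field names (`UserProfile`, `RecommendationContext`, `TaskKind`) are hypothetical and do not come from the article or from any existing RSSE.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskKind(Enum):
    """Kinds of task named in the list above (hypothetical enumeration)."""
    ADD_FEATURE = "adding new features"
    DEBUG = "debugging"
    OPTIMIZE = "optimizing"


@dataclass
class UserProfile:
    """The user's characteristics."""
    job_description: str
    expertise_level: str
    prior_work: List[str] = field(default_factory=list)
    social_network: List[str] = field(default_factory=list)


@dataclass
class RecommendationContext:
    """Everything the recommender knows at the time of the recommendation."""
    user: UserProfile
    task_kind: TaskKind
    # Task-specific characteristics
    edited_code: List[str] = field(default_factory=list)
    viewed_code: List[str] = field(default_factory=list)
    code_dependencies: List[str] = field(default_factory=list)
    # Past actions of the user and of the user's peers
    past_actions: List[str] = field(default_factory=list)
    peer_actions: List[str] = field(default_factory=list)


# Illustrative values only.
context = RecommendationContext(
    user=UserProfile(job_description="developer", expertise_level="intermediate"),
    task_kind=TaskKind.DEBUG,
    edited_code=["Parser.java"],
    viewed_code=["Lexer.java", "Parser.java"],
)
```

Fields left empty here (such as `peer_actions`) reflect that not every part of the context is available for every recommendation.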

The qualities sought in a recommender system are novelty, surprise, and relevance.

The authors propose the following definition for RSSEs:

An RSSE is a software application that provides information items estimated to be valuable for a software engineering task in a given context.

What RSSEs Do for Developers

Most current RSSEs support developers while programming. Examples include:

  • CodeBroker (surfacing reuse opportunities),
  • Expertise Browser (locating expert consultants),
  • Strathcona (what example to use)[2][3],
  • ParseWeb (what call sequences to make),
  • Suade (where to look in the code)[4],
  • eRose (what to change next)[5][6],
  • SemDiff (recommending replacement methods for adapting code to a new library version),
  • Dhruv (finding code and people related to a bug fix).

Most RSSEs involve three main functionalities:

  • a data-collection mechanism to collect data in a data model,
  • a recommendation engine to analyse the data model and generate recommendations,
  • a user interface to trigger the recommendation cycle and present its results.
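The three functionalities above can be sketched as three small functions. This is an illustrative toy, not the implementation of any RSSE named on this page: it assumes the data model is a simple count of files changed together in past commits (a much-simplified take on the "what to change next" idea), and the function names `collect`, `recommend`, and `present` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def collect(commits):
    """Data collection: build a co-change model from past commits.

    `commits` is an iterable of sets of file names changed together.
    """
    model = defaultdict(Counter)
    for changed_files in commits:
        for a, b in permutations(sorted(changed_files), 2):
            model[a][b] += 1
    return model


def recommend(model, edited_file, top_n=3):
    """Recommendation engine: rank files most often changed with `edited_file`."""
    counts = model.get(edited_file, Counter())
    # Highest co-change count first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


def present(edited_file, recommendations):
    """User interface: format the results for display."""
    if not recommendations:
        return f"No recommendations for {edited_file}."
    return f"You changed {edited_file}; consider also: " + ", ".join(recommendations)


# Made-up change history for illustration.
history = [
    {"Parser.java", "Lexer.java"},
    {"Parser.java", "Lexer.java", "Ast.java"},
    {"Parser.java", "Ast.java"},
    {"README.txt"},
]
model = collect(history)
print(present("Parser.java", recommend(model, "Parser.java")))
# prints: You changed Parser.java; consider also: Ast.java, Lexer.java
```

Keeping the three parts separate means the same model could feed a different engine, or the same engine a different user interface.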

RSSE Design Dimensions

We consider the following three design dimensions: nature of the context, recommendation engine, and output mode.

Nature of the context

The context is the RSSE's input, and it may be gathered implicitly or explicitly. Implicit gathering is transparent to the developer and allows more data to be collected. Explicit input is needed when the data is too difficult to gather automatically (e.g., interests or feelings). Many cases require a combination of both.

Recommendation engine

The engine can draw on data such as source code, change history, bug reports, mailing lists, interaction history, and peers' actions. Most RSSEs rank their recommendations according to an estimate of what the developer will find useful.

Output modes

Output modes can be pull (the developer calls the engine explicitly, which can be as simple as a single click in the UI) or push (recommendations are delivered continuously, which may be obtrusive if not well designed).
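The difference between the two modes can be illustrated with a small sketch. The `Recommender` class, its method names, and `toy_engine` are hypothetical, chosen only for this example: pull is an explicit request, while push notifies subscribed listeners whenever the context changes.

```python
class Recommender:
    """Wraps an engine (a callable: context -> list of recommendations)."""

    def __init__(self, engine):
        self._engine = engine
        self._listeners = []

    # Pull mode: the developer explicitly asks for recommendations.
    def request(self, context):
        return self._engine(context)

    # Push mode: listeners are notified whenever the context changes.
    def subscribe(self, listener):
        self._listeners.append(listener)

    def on_context_changed(self, context):
        recommendations = self._engine(context)
        if recommendations:  # stay quiet when there is nothing to report
            for listener in self._listeners:
                listener(recommendations)


def toy_engine(context):
    """Stand-in engine with a hard-coded lookup table (illustration only)."""
    known = {"Parser.java": ["Lexer.java"]}
    return known.get(context, [])


recommender = Recommender(toy_engine)

# Pull: one explicit call, one answer.
pulled = recommender.request("Parser.java")

# Push: results arrive as the developer moves between files.
pushed = []
recommender.subscribe(pushed.append)
recommender.on_context_changed("Parser.java")
recommender.on_context_changed("README.txt")  # no recommendations, no notification
```

Suppressing empty notifications in `on_context_changed` is one simple way to keep push delivery from interrupting the developer needlessly.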

Cross-dimensional features

The user should be able to flag recommendations as good or bad, so that this feedback flows back into the system. The ranking mechanism can be:

  • locally adjustable (the developer adjusts the inferred context manually),
  • individually adaptive (the algorithm is refined for individuals according to their implicit or explicit feedback), or
  • globally adaptive (feedback from one user affects another user).
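The distinction between individually and globally adaptive ranking can be sketched as follows. This is a toy under stated assumptions: feedback is modelled as a simple additive weight per item, and the class `AdaptiveRanker` and its parameters are hypothetical. Local adjustment of the inferred context is not modelled here.

```python
class AdaptiveRanker:
    """Re-ranks items using good/bad feedback stored as additive weights."""

    def __init__(self, shared_weights=None):
        # A private dict makes the ranker individually adaptive; passing the
        # same dict to several rankers makes their feedback globally adaptive.
        self.weights = shared_weights if shared_weights is not None else {}

    def feedback(self, item, good, step=1.0):
        """Record that a recommendation was flagged as good or bad."""
        delta = step if good else -step
        self.weights[item] = self.weights.get(item, 0.0) + delta

    def rank(self, base_scores):
        """Order items by base score plus learned weight (ties alphabetical)."""
        def score(item):
            return base_scores[item] + self.weights.get(item, 0.0)
        return sorted(base_scores, key=lambda item: (-score(item), item))


# Made-up base scores produced by some recommendation engine.
base = {"Lexer.java": 2.0, "Ast.java": 2.0, "Token.java": 1.0}

# Individually adaptive: Alice's feedback changes only Alice's ranking.
alice, bob = AdaptiveRanker(), AdaptiveRanker()
alice.feedback("Token.java", good=True, step=2.0)
alice_ranking = alice.rank(base)  # Token.java moves to the top for Alice
bob_ranking = bob.rank(base)      # Bob's ranking is unchanged

# Globally adaptive: Carol's feedback also affects Dave.
shared = {}
carol, dave = AdaptiveRanker(shared), AdaptiveRanker(shared)
carol.feedback("Ast.java", good=False)
dave_ranking = dave.rank(base)    # Ast.java drops for Dave as well
```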

RSSE Limitations and Potential

  • Limitations: the cold-start problem. An RSSE that relies on mined data cannot make good recommendations until enough data has accumulated, for example early in a project or for a new user.
  • Proactive discovery is a direction for future RSSEs. Rather than waiting for developers to realise they need a certain kind of information, the system delivers it automatically.
  • Giving a rationale to explain recommendations is valuable because it builds confidence, but it should not overload the user with too much information.


  1. ACM International Conference on Recommender Systems, RecSys '09.
  2. R. Holmes, R.J. Walker, and G.C. Murphy, "Approximate Structural Context Matching: an Approach for Recommending Relevant Examples", IEEE Trans. Software Eng., vol. 32, no. 12, 2006, pp. 952-970.
  3. A prototype and more details are available at .
  5. T. Zimmermann et al., "Mining Version Histories to Guide Software Changes", IEEE Trans. Software Eng., vol. 31, no. 6, 2005, pp. 429-445.
  6. A prototype implementation of eRose is available at .
