Potentials and Challenges of Recommendation Systems for Software Development

Hans-Jörg Happel, FZI Forschungszentrum Informatik, Karlsruhe, Germany.

Walid Maalej, Technische Universität München, Munich, Germany.



By surveying recommendation systems in software development, we found that existing approaches have been focusing on "you might like what similar developers like" scenarios. However, structured artifacts and semantically well-defined development activities bear large potential for further recommendation scenarios. We introduce a novel "landscape" of software development recommendation systems and outline several scenarios for knowledge sharing and collaboration. Basic challenges are improving context-awareness and particularly addressing information providers.


  1. Abstract
  2. Introduction
  3. State of the Art
    1. Surveyed Systems
    2. Summary and Areas of Improvement
  4. Recommendation Landscape
    1. When to Recommend
    2. What to Recommend
  5. Paths for Realization
  6. Conclusions


 author = {Happel, Hans-J\"{o}rg and Maalej, Walid},
 title = {Potentials and challenges of recommendation systems for software development},
 booktitle = {Proceedings of the 2008 international workshop on Recommendation systems for software engineering},
 series = {RSSE '08},
 year = {2008},
 isbn = {978-1-60558-228-3},
 location = {Atlanta, Georgia},
 pages = {11--15},
 numpages = {5},
 url = {http://doi.acm.org/10.1145/1454247.1454251},
 publisher = {ACM},
 address = {New York, NY, USA},



Current recommendation systems address information seekers by trying to answer their questions: Which interface should I use? Whom should I notify about my change? What should I do next? This article argues that proactive recommendations should support both roles: information seekers and information providers.

State of the art

Surveyed systems


CodeBroker[1] recommends methods that are suitable in a development environment. CodeBroker uses a user model.


Dhruv[2] assists in software maintenance by recommending information during bug inspection. It is typically embedded in a web-based bug tracking system and provides information in a sidebar. First, meta-data is extracted from source code, mailing lists and similar bug reports; then algorithms are employed to infer relationships among the meta-data. After analysis, the various meta-data entities form an interconnected, semantically described graph structure; recommendations are drawn by relational similarity.

Dhruv does not maintain a user profile.


Hipikat[3] assists working developers by recommending source code, email discussions or bug reports related to an artefact. Relations are inferred by five different, manually implemented heuristics.

Hipikat does not maintain a user profile.


Mylyn[4] provides a narrowed view of the artefacts needed for development, thus enabling the developer to focus on relevant information only. Mylyn works on the task level, and maintains a "degree of interest" value for each task (which is modeled after the interactions between the user and the file), along with a "degree of separation" value (which represents relations among the source code files).
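
The idea of a "degree of interest" can be illustrated with a small sketch. This is not Mylyn's actual model: the class name `InterestModel`, the `gain` and `decay` parameters and their values are assumptions made for illustration. Each interaction raises the interest of the touched element and slightly lowers the interest of all others, so that only recently used elements remain in the narrowed view.

```python
from collections import defaultdict


class InterestModel:
    """Toy degree-of-interest model (hypothetical, not Mylyn's implementation)."""

    def __init__(self, gain=1.0, decay=0.1):
        self.gain = gain    # interest added per interaction (assumed value)
        self.decay = decay  # interest lost per interaction elsewhere (assumed value)
        self.interest = defaultdict(float)

    def record(self, element):
        # Every interaction ages all known elements a little...
        for known in self.interest:
            self.interest[known] -= self.decay
        # ...and boosts the element that was just touched.
        self.interest[element] += self.gain

    def focused_view(self, threshold=0.0):
        # Keep only elements above the threshold, most interesting first.
        visible = [(e, v) for e, v in self.interest.items() if v > threshold]
        return [e for e, _ in sorted(visible, key=lambda p: p[1], reverse=True)]


model = InterestModel(gain=1.0, decay=0.3)
for touched in ["A.java", "B.java", "B.java", "B.java", "B.java"]:
    model.record(touched)
print(model.focused_view())
```

In this run `A.java` is touched once and then ignored, so its interest decays below the threshold and it drops out of the view, while `B.java` stays.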


Rascal[5] is similar to CodeBroker, and tries to predict the next method the developer would use by analysing the current class and comparing it to similar classes.
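
Predicting methods from similar classes can be sketched as a simple collaborative filtering step. The following is an assumption-laden illustration, not Rascal's published algorithm: the function names, the cosine similarity measure and the usage-count dictionaries are all hypothetical. Each class is described by the methods it calls; methods used by similar classes but not yet by the current class are suggested.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse usage-count vectors (dicts)."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend_methods(current, corpus, top_n=3):
    """Suggest methods that classes similar to `current` use but `current` does not."""
    scores = {}
    for usage in corpus.values():
        sim = cosine(current, usage)
        for method, count in usage.items():
            if method not in current:
                # Weight each candidate by how similar its class is to ours.
                scores[method] = scores.get(method, 0.0) + sim * count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [m for m, s in ranked[:top_n] if s > 0]


# Hypothetical usage data: method name -> number of calls in the class.
current_class = {"File.open": 1, "File.read": 1}
known_classes = {
    "Reader": {"File.open": 1, "File.read": 1, "File.close": 1},
    "Painter": {"Canvas.draw": 1, "Canvas.flush": 1},
}
print(recommend_methods(current_class, known_classes))
```

Only `Reader` shares methods with the current class, so only its unused method is suggested; methods from the unrelated `Painter` class score zero and are filtered out.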


Strathcona[6] recommends source code examples relevant to the current development task. Four different heuristics are used (all implemented as SQL queries); their results are then merged and presented to the user.
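
One plausible way to merge the results of several heuristics is simple voting. The summary above does not say how the merge is performed, so the sketch below is an assumption: the function name `merge_heuristic_results` and the rule "examples returned by more heuristics rank higher" are illustrative only.

```python
from collections import Counter


def merge_heuristic_results(results_per_heuristic, top_n=5):
    """Rank examples by the number of heuristics that returned them."""
    votes = Counter()
    for results in results_per_heuristic:
        # Each heuristic votes at most once per example.
        votes.update(set(results))
    # Most votes first; ties are broken alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [example for example, _ in ranked[:top_n]]


# Hypothetical result lists from four heuristics.
merged = merge_heuristic_results(
    [["ex1", "ex2"], ["ex2", "ex3"], ["ex2", "ex1"], []]
)
print(merged)
```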

Summary and Areas of Improvement

A number of limitations could be addressed by future systems:

  • Existing systems are limited to recommending either methods to use next or artefacts related to the current situation.
  • Existing systems are based upon a centralised, static corpus. The aspect of information provision is not addressed.
  • The description of the user's situation or "context" is limited to single properties such as the current class a user is working in.
  • There is no pro-active triggering of information push: recommendations are either triggered automatically in a continuous way, or have to be requested by users.
  • Architectures of the surveyed systems are inflexible and do not allow for extensions.


Most systems use a client/server architecture, which limits the scope and amount of available information. P2P-based approaches make more information available without introducing performance problems.

Knowledge representation

Most systems work with traditional knowledge representations and hard-coded heuristics. Using semantic web technologies would allow systems to draw on more information, in a more transparent way (i.e. by stating why an item is recommended).


Truly pro-active assistance should identify certain problem situations (e.g. run time errors, unexpected program behaviour) based on a richer user context, which would allow more focused recommendations.
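
A minimal sketch of such context-based triggering is given below. It is an illustration under stated assumptions, not a design from the article: the class `ProblemDetector`, the event kinds (`"edit"`, `"runtime_error"`), the window size and the threshold are all hypothetical. A recommendation is triggered only when the same run time error recurs within the recent event history, rather than continuously or on request.

```python
from collections import Counter, deque


class ProblemDetector:
    """Trigger a recommendation when the same run time error keeps recurring."""

    def __init__(self, window=10, threshold=3):
        self.recent = deque(maxlen=window)  # sliding window of recent events
        self.threshold = threshold          # repeats needed to trigger (assumed)

    def observe(self, event_kind, detail):
        self.recent.append((event_kind, detail))
        if event_kind != "runtime_error":
            return None
        # Count how often each error occurred within the window.
        errors = Counter(d for kind, d in self.recent if kind == "runtime_error")
        if errors[detail] >= self.threshold:
            return f"Repeated error '{detail}': look up related artefacts or colleagues"
        return None


detector = ProblemDetector(window=5, threshold=2)
events = [
    ("edit", "Foo.java"),
    ("runtime_error", "NullPointerException"),
    ("edit", "Foo.java"),
    ("runtime_error", "NullPointerException"),
]
hints = [detector.observe(kind, detail) for kind, detail in events]
print(hints[-1])
```

The first three events produce no hint; the second occurrence of the same error within the window triggers one.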

Automatic Experience Capture

The presented systems focus on recommending methods to use or related artefacts, but capturing e.g. problem-solving patterns, which are usually not explicitly documented, would be of great help.

Recommendation landscape

Two major dimensions are considered:

  • the addressed stage of the knowledge sharing process (when to recommend), and
  • the recommended information (what to recommend).

When to recommend

Some information depends on a local context and/or needs to be shared immediately, e.g. how to solve a common problem, or which decision has been taken. For that reason, information sharing is as important as information retrieval: the right information should be published at the right time.

What to recommend

  • Code: less time spent coding, fewer errors thanks to good patterns.
  • Artefacts: use related artefacts to help understand a task: tests, specifications, documents, etc.
  • Quality Measures: detect and highlight areas which are error-prone, recommend/identify best/worst patterns, help other users that face the same problem.
  • Tools: use the right tool for the right thing. Recommendation systems may use activity logs to deduce questions developers often ask, and coach them automatically on tools or features that will make them more efficient.
  • People: to identify expertise, to find out whose advice is required, and to make information available to all.
  • Awareness: in distributed development, awareness information is of crucial importance. Technical and ethical (privacy) reasons make this difficult, but recommendation systems can assist developers in selectively pushing information when required (e.g. please document what you have just done).
  • Status and Priorities: suggesting priority changes, showing personal performance overviews, as well as showing (anonymously) how others perform on similar tasks. This could help project and risk management.

Paths for Realisation

Basic building blocks for improvements proposed in this article are:

Improving context-awareness

A framework is needed to gather all low-level data that may be of interest: log-like information, error messages or developers' searches. TeamWeaver[2] is such an attempt by the authors of the article.

Addressing information providers (who could need this information?).

Privacy issues should be addressed. The authors propose to use a private information need model, which could be anonymously exchanged and used to connect information seekers and providers.

The combination of context-awareness and assistance with information provision addresses several important issues of current approaches. It allows access to content and artefacts in the private space of developers, without threatening their privacy.


Future research areas:

  • Recommendations to capture experiences and share information.
  • Semantic analysis and description of working context.
  • Automatic context-aware triggering of recommendations.


  1. Yunwen Ye and Gerhard Fischer. Reuse-conducive development environments. Automated Software Engineering, 12(2):199-235, 2005.
  2. Anupriya Ankolekar, Katia Sycara, James Herbsleb, Robert Kraut, and Chris Welty. Supporting online problem-solving communities with the semantic web. In WWW '06, New York, NY, USA, 2006. ACM.
  3. Davor Cubranic, Gail C. Murphy, Janice Singer, and Kellogg S. Booth. Hipikat: A project memory for software development. IEEE Trans. Softw. Eng., 31(6):446-465, 2005.
  4. Mik Kersten and Gail C. Murphy. Using task context to improve programmer productivity. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering, pages 1-11, New York, NY, USA, 2006. ACM.
  5. Frank McCarey, Mel Ó Cinnéide, and Nicholas Kushmerick. Rascal: A recommender agent for agile reuse. Artif. Intell. Rev., 24(3-4):253-276, 2005.
  6. Reid Holmes, Robert J. Walker, and Gail C. Murphy. Approximate structural context matching: An approach to recommend relevant examples. IEEE Transactions on Software Engineering, 32(12):952-970, 2006.