Maisqual Metrics

From Maisqual Wiki

Revision as of 08:30, 6 March 2014 by Bbaldassari (Talk | contribs)

This page lists the metrics retrieved for the different analyses performed on projects.

Availability of metrics

The set of available metrics depends on the type of artefact (e.g. application, file, function) and data set (weekly, releases, version), and on the characteristics of the project (e.g. object-oriented).

Common OO Diff. Total (SCM) Time (SCM) Time (COM)
Java Evolution X X X X X X
Java Releases X X X X
Java Versions X X X
C Evolution X X X X X
C Releases X X X
C Versions X

Some metrics are only available in specific contexts, such as the number of classes for object-oriented programming. Metrics available for each type of data set are described in table 6.1. Object-oriented metrics are CLAS (number of classes defined in the artefact) and DITM (depth of inheritance tree). Diff metrics are LADD, LMOD and LREM (number of lines added, modified and removed since the last analysis). Time metrics for SCM are SCM_*_1W, SCM_*_1M and SCM_*_3M. Total metrics for SCM are SCM_COMMITS_TOTAL, SCM_COMMITTERS_TOTAL, SCM_COMMITS_FILES_TOTAL and SCM_FIXES_TOTAL. Time metrics for Communication are COM_*_1W, COM_*_1M and COM_*_3M.

Metrics defined at a lower level (e.g. function) can be aggregated to upper levels in a smart manner: for example, the cyclomatic number at the file level is the sum of the cyclomatic numbers of its functions. The upper-level metric should be interpreted with this in mind, since aggregation may introduce a bias (also known as the ecological fallacy[1]). When needed, the method used to aggregate information at upper levels is described hereafter.
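The aggregation described above can be sketched as follows. This is only an illustration: the record layout, the file paths and the function names are hypothetical and do not come from the Maisqual data sets.

```python
from collections import defaultdict

# Hypothetical function-level records: (file path, function name, cyclomatic number).
FUNCTIONS = [
    ("src/parser.c", "parse_header", 4),
    ("src/parser.c", "parse_body", 9),
    ("src/util.c", "trim", 2),
]


def aggregate_by_file(functions):
    """Sum function-level cyclomatic numbers into one value per file."""
    totals = defaultdict(int)
    for path, _name, cyclomatic in functions:
        totals[path] += cyclomatic
    return dict(totals)


file_totals = aggregate_by_file(FUNCTIONS)
print(file_totals)
```

A file-level total of 13 may come from one complex function or from many simple ones, which is why the aggregated value has to be read with care.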

All data sets are structured in three files, corresponding to the different artefact types that were investigated: application, file and function.



Source code metrics

Artefact counting metrics

  • The number of files (FILE) counts the number of source files in the project, i.e. files whose extension corresponds to the defined language (.java for Java, .c and .h for C).
  • The number of functions (FUNC) sums up the number of methods or functions recursively defined in the artefact.

Line counting metrics

Line counting metrics offer several ways to measure the size of the code from different perspectives. They include STAT, SLOC, ELOC, CLOC, MLOC, and BRAC.

  • The number of statements (STAT) counts the total number of instructions. Examples of instructions include control-flow tokens, as well as else clauses, cases, and assignments.
  • Source lines of code (SLOC) is the number of non-blank and non-comment lines in code.
  • Effective lines of code (ELOC) is SLOC minus the lines that contain only braces.
  • Comment lines of code (CLOC) counts the number of lines that include a comment in the artefact. If a line includes both code and comment, it will be counted in SLOC, CLOC and MLOC metrics.
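The line counting rules above can be approximated with a short sketch. It assumes C-like source where only `//` comments occur (no block comments, no comment markers inside string literals), and it takes MLOC to count lines holding both code and a comment, as the CLOC description suggests. Both points are assumptions, and the actual analyser may behave differently.

```python
def count_lines(source):
    """Return SLOC, ELOC, CLOC and MLOC counts for a C-like source string.

    Simplification (assumption): only `//` comments are recognised.
    """
    counts = {"SLOC": 0, "ELOC": 0, "CLOC": 0, "MLOC": 0}
    for raw in source.splitlines():
        code, marker, _comment = raw.partition("//")
        code = code.strip()
        has_code = bool(code)
        has_comment = bool(marker)
        if has_comment:
            counts["CLOC"] += 1
        if has_code:
            counts["SLOC"] += 1
            # ELOC leaves out lines made only of braces.
            brace_only = set(code.replace(" ", "")) <= {"{", "}"}
            if not brace_only:
                counts["ELOC"] += 1
        if has_code and has_comment:
            # Assumed meaning of MLOC: mixed code-and-comment lines.
            counts["MLOC"] += 1
    return counts


SAMPLE = """\
// add two integers
int add(int a, int b)
{
    return a + b;  // sum
}

"""

print(count_lines(SAMPLE))
```

In the sample, the `return` line is counted in SLOC, CLOC and MLOC at the same time, while the two brace-only lines count towards SLOC but not ELOC.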


Configuration Management metrics

Application level

We retrieve the following metrics on application artefacts:

  • SCM_COMMITS: number of commits.
  • SCM_COMMITS_FILES: number of files associated with commits.
  • SCM_COMMITTERS: number of distinct committers.
  • SCM_FIXES: number of fix-related commits, i.e. commits that include either the fix, issue, problem or error keywords in their message.

Metrics are retrieved for the overall time, the last week, last month, and last three months.

Variable names are:

  • SCM_COMMITS_1W SCM_COMMITS_1M SCM_COMMITS_3M SCM_COMMITS_TOTAL
  • SCM_COMMITS_FILES_1W SCM_COMMITS_FILES_1M SCM_COMMITS_FILES_3M SCM_COMMITS_FILES_TOTAL
  • SCM_COMMITTERS_1W SCM_COMMITTERS_1M SCM_COMMITTERS_3M SCM_COMMITTERS_TOTAL
  • SCM_FIXES_1W SCM_FIXES_1M SCM_FIXES_3M SCM_FIXES_TOTAL
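As a rough illustration of how these variables could be derived from a commit log, the sketch below computes them for each period. Several points are assumptions rather than the documented behaviour: the `Commit` record is hypothetical, one week, one month and three months are taken as 7, 30 and 90 days, keywords are matched case-insensitively at the start of a word, and a file is counted once for every commit that touches it.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: keywords match case-insensitively at the start of a word.
FIX_PATTERN = re.compile(r"\b(?:fix|issue|problem|error)", re.IGNORECASE)

# Assumption: fixed-length windows; None means the overall time.
WINDOWS = {
    "1W": timedelta(days=7),
    "1M": timedelta(days=30),
    "3M": timedelta(days=90),
    "TOTAL": None,
}


@dataclass
class Commit:  # hypothetical record, not the Maisqual data format
    author: str
    date: datetime
    files: list
    message: str


def scm_metrics(commits, now):
    """Compute the application-level SCM variables for every window."""
    metrics = {}
    for suffix, span in WINDOWS.items():
        selected = [c for c in commits
                    if span is None or now - span <= c.date <= now]
        metrics[f"SCM_COMMITS_{suffix}"] = len(selected)
        metrics[f"SCM_COMMITS_FILES_{suffix}"] = sum(len(c.files) for c in selected)
        metrics[f"SCM_COMMITTERS_{suffix}"] = len({c.author for c in selected})
        metrics[f"SCM_FIXES_{suffix}"] = sum(
            1 for c in selected if FIX_PATTERN.search(c.message))
    return metrics


NOW = datetime(2014, 3, 6)
COMMITS = [
    Commit("alice", datetime(2014, 3, 4), ["a.c", "b.c"], "Fix null pointer in parser"),
    Commit("bob", datetime(2014, 2, 20), ["a.c"], "Add prefix option"),
    Commit("alice", datetime(2013, 12, 20), ["c.c"], "Resolve issue with build"),
    Commit("carol", datetime(2013, 6, 1), ["d.c", "e.c", "f.c"], "Initial import"),
]

for name, value in scm_metrics(COMMITS, NOW).items():
    print(name, value)
```

The word-start match keeps "prefix" from being counted as a fix while still catching "fixes" or "fixed"; a plain substring search would be a simpler but noisier alternative.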

File level

We retrieve the following metrics on file artefacts:

  • SCM_COMMITS: number of commits for the artefact.
  • SCM_COMMITTERS: number of distinct committers for the artefact.
  • SCM_FIXES: number of fix-related commits for the artefact, i.e. commits that include either the fix, issue, problem or error keywords in their message.

Metrics are retrieved for the overall time, the last week, last month, and last three months.

Variable names are:

  • SCM_COMMITS_1W SCM_COMMITS_1M SCM_COMMITS_3M SCM_COMMITS_TOTAL
  • SCM_COMMITTERS_1W SCM_COMMITTERS_1M SCM_COMMITTERS_3M SCM_COMMITTERS_TOTAL
  • SCM_FIXES_1W SCM_FIXES_1M SCM_FIXES_3M SCM_FIXES_TOTAL


Communication metrics

Communication metrics show an unusual part of the project: people’s activity and interactions during the elaboration of the product. Most software projects have two communication media: one targeted at the internal development of the product, for developers who actively contribute to the project by committing in the source repository, testing the product, or finding bugs (a.k.a. developers mailing list); and one targeted at end-users for general help and good use of the product (a.k.a. user mailing list).

The type of media varies across forges and projects: most of the time mailing lists are used, with a web interface like MHonArc or mod_mbox. In some cases, projects may also use forums (especially for user-oriented communication) or NNTP news servers, as the Eclipse foundation projects do. The variety of media and tools makes it difficult to be exhaustive; however, data providers can be written to map these to the common mbox format. We wrote connectors for mboxes, MHonArc, GMane and FUDForum (used by Eclipse).

We retrieve the following metrics on application artefacts:

  • The number of posts (COM_DEV_VOL, COM_USR_VOL) is the total number of mails posted on the mailing list during the considered period of time. All posts are counted, regardless of their depth (i.e. new posts or answers).
  • The number of distinct authors (COM_DEV_AUTH, COM_USR_AUTH) is the number of people having posted at least once on the mailing list during the considered period of time. Authors are counted once even if they posted multiple times, based on their email address.
  • The number of threads (COM_DEV_SUBJ, COM_USR_SUBJ) is the number of different subjects (i.e. a question and its responses) that have been posted on the mailing list during the considered period of time. Subjects that are replies to other subjects are not counted, even if the subject text is different.
  • The number of answers (COM_DEV_RESP_VOL, COM_USR_RESP_VOL) is the total number of replies to requests on the mailing list during the considered period of time. A message is considered as an answer if it is using the Reply-to header field. The number of answers is often combined with the number of threads to compute the useful response ratio metric.
  • The median time to first reply (COM_DEV_RESP_TIME_MED, COM_USR_RESP_TIME_MED) is the median number of seconds between a question (first post of a thread) and the first response (second post of a thread) on the mailing list during the considered period of time.
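The definitions above can be illustrated with a small sketch working on already-parsed posts. The `Post` record and the `com_metrics` function are hypothetical. The sketch also assumes that replies are recognised through an In-Reply-To style link to the parent message, which may differ from the header the actual connectors inspect, and that the caller has already filtered posts to the considered period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Post:  # hypothetical record for one parsed message
    msg_id: str
    author: str                        # email address of the sender
    date: datetime
    in_reply_to: Optional[str] = None  # parent message id, None for a new thread


def thread_root(post, by_id):
    """Follow reply links up to the first known post of the thread."""
    seen = set()
    while post.in_reply_to in by_id and post.msg_id not in seen:
        seen.add(post.msg_id)
        post = by_id[post.in_reply_to]
    return post


def com_metrics(posts, prefix="COM_USR"):
    """Compute the communication metrics for one list and one period."""
    by_id = {p.msg_id: p for p in posts}
    roots = [p for p in posts if p.in_reply_to is None]
    replies = [p for p in posts if p.in_reply_to is not None]
    first_reply = {}  # thread id -> seconds until the earliest reply
    for reply in replies:
        root = thread_root(reply, by_id)
        if root.in_reply_to is not None:
            continue  # the thread starter is not among the given posts
        delay = (reply.date - root.date).total_seconds()
        if root.msg_id not in first_reply or delay < first_reply[root.msg_id]:
            first_reply[root.msg_id] = delay
    return {
        f"{prefix}_VOL": len(posts),
        f"{prefix}_AUTH": len({p.author.lower() for p in posts}),
        f"{prefix}_SUBJ": len(roots),
        f"{prefix}_RESP_VOL": len(replies),
        f"{prefix}_RESP_TIME_MED": median(first_reply.values()) if first_reply else None,
    }


T0 = datetime(2014, 3, 1, 12, 0, 0)
POSTS = [
    Post("m1", "ann@example.org", T0),
    Post("m2", "bob@example.org", T0 + timedelta(hours=1), "m1"),
    Post("m3", "Ann@example.org", T0 + timedelta(hours=2), "m2"),
    Post("m4", "cid@example.org", T0 + timedelta(hours=3)),
    Post("m5", "bob@example.org", T0 + timedelta(hours=3, minutes=30), "m4"),
]

print(com_metrics(POSTS))
```

Threads whose first post falls outside the given set are skipped for the response time, since the delay to the question cannot be measured without it.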

As for configuration management metrics, we compute time-based measures for the last week, last month, and last three months. Communication metrics are only available at the application level. Variable names are:

  • COM_DEV_AUTH_1M, COM_DEV_AUTH_3M, COM_DEV_AUTH_1W
  • COM_DEV_RESP_TIME_MED_1M, COM_DEV_RESP_TIME_MED_3M, COM_DEV_RESP_TIME_MED_1W
  • COM_DEV_RESP_VOL_1M, COM_DEV_RESP_VOL_3M, COM_DEV_RESP_VOL_1W
  • COM_DEV_SUBJ_1M, COM_DEV_SUBJ_3M, COM_DEV_SUBJ_1W
  • COM_DEV_VOL_1M, COM_DEV_VOL_3M, COM_DEV_VOL_1W
  • COM_USR_AUTH_1M, COM_USR_AUTH_3M, COM_USR_AUTH_1W
  • COM_USR_RESP_TIME_MED_1M, COM_USR_RESP_TIME_MED_3M, COM_USR_RESP_TIME_MED_1W
  • COM_USR_RESP_VOL_1M, COM_USR_RESP_VOL_3M, COM_USR_RESP_VOL_1W
  • COM_USR_SUBJ_1M, COM_USR_SUBJ_3M, COM_USR_SUBJ_1W
  • COM_USR_VOL_1M, COM_USR_VOL_3M, COM_USR_VOL_1W


References

  1. Posnett, D., Filkov, V., & Devanbu, P. (2011). Ecological inference in empirical software engineering. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering (pp. 362–371). IEEE Computer Society. doi:10.1109/ASE.2011.6100074