Topcased-MM R Analysis

From Maisqual Private Wiki

This article describes the steps followed for the analysis of Topcased-MM, as of version 2.0.0.


Get data

We extracted file metrics from the SQuORE database and saved them in a CSV file. The default extract method yields many metrics, only a few of which are likely to interest us. The data has 4786 rows, and the header (first line of the file) is as follows:

#Application,Version,File,R_NOGOTO,R_NOCONT,R_SGLBRK,R_RETURN,R_COMPOUND,R_COMPOUNDIF,
R_COMPOUNDELSE,R_ELSEFINAL,R_NOLABEL,R_BRKFINAL,R_DEFAULT,R_ONECASE,R_NOFALLTHROUGH,
R_BWGOTO,R_NOASGCOND,R_NOASGINBOOL,R_NORECURSIVITY,R_NOCLONE_FUNCTION,
[SNIP]

Read and prepare data in R

We run R in the directory containing the CSV file and ask it to read the file as a data frame.

top_files <- read.csv("topcased-mm_all_metrics_files.csv")
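The first header field starts with a `#`, which is why the summary output lists a column called X.Application. A minimal sketch of that renaming, using a made-up one-row CSV string (the values and the `csv_text` name are illustrative assumptions, not the real extract):

```r
# Toy CSV text; column names mirror the real header, the values are made up.
csv_text <- "#Application,Version,File,SLOC\nTopcased-mm,v2_0_0,a/B.java,120\n"

# check.names = TRUE (the default) rewrites "#Application" into a valid R name.
with_check <- read.csv(text = csv_text)
names(with_check)[1]

# check.names = FALSE keeps the header verbatim; the column must then be
# accessed with backticks or [["#Application"]].
verbatim <- read.csv(text = csv_text, check.names = FALSE)
names(verbatim)[1]
```

Note that read.csv does not treat `#` as a comment character by default, so the header line is read rather than skipped.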

To get an idea of how this data is organised, let's have a look at its summary:

> summary(top_files)
    X.Application    Version    
Topcased-mm:4786   v1_0_0:   6  
                   v2_0_0:4767  
                   v4_0_0:   7  
                   v4_2_0:   6  
                                
                                
                                
                                                                           File     
plugins/org.topcased.rcp/src/org/topcased/rcp/Activator.java                 :   4  
plugins/org.topcased.rcp/src/org/topcased/rcp/CheckFeature.java              :   4  
plugins/org.topcased.rcp/src/org/topcased/rcp/IRCPPreferenceConstants.java   :   4  
plugins/org.topcased.rcp/src/org/topcased/rcp/RCPPreferenceInitializer.java  :   4  
plugins/org.topcased.toolkit/src/org/topcased/toolkit/Activator.java         :   4  
plugins/org.topcased.toolkit/src/org/topcased/toolkit/CheckConfiguration.java:   4  
(Other)                                                                      :4762  

R_NOGOTO       R_NOCONT       R_SGLBRK       R_RETURN       R_COMPOUND    
Mode:logical   Mode:logical   Mode:logical   Mode:logical   Mode:logical  
NA's:4786      NA's:4786      NA's:4786      NA's:4786      NA's:4786     
                                                                          
                                                                          
R_COMPOUNDIF   R_COMPOUNDELSE R_ELSEFINAL    R_NOLABEL      R_BRKFINAL    
Mode:logical   Mode:logical   Mode:logical   Mode:logical   Mode:logical  
NA's:4786      NA's:4786      NA's:4786      NA's:4786      NA's:4786     

[SNIP]                                                                          
                                                                          
R_NOCLONE_FILE R_NOCLONE_FOLDER R_NOBREAK      R_IFELSE          RULE_ANA    
Mode:logical   Mode:logical     Mode:logical   Mode:logical   Min.   :0.000  
NA's:4786      NA's:4786        NA's:4786      NA's:4786      1st Qu.:8.000  
                                                              Median :8.000  
                                                              Mean   :6.571  
                                                              3rd Qu.:8.000  
                                                              Max.   :8.000  
                                                                             
   NCC_ANA           RKO_ANA         ROKR_ANA       RULE_CHAN     
Min.   :   0.00   Min.   :0.000   Min.   : 37.5   Min.   : 2.000  
1st Qu.:   0.00   1st Qu.:0.000   1st Qu.: 87.5   1st Qu.:11.000  
Median :   0.00   Median :0.000   Median :100.0   Median :11.000  
Mean   :   7.32   Mean   :0.616   Mean   : 92.3   Mean   : 9.392  
3rd Qu.:   3.00   3rd Qu.:1.000   3rd Qu.:100.0   3rd Qu.:11.000  
Max.   :1152.00   Max.   :5.000   Max.   :100.0   Max.   :11.000  
                                                                  
   NCC_CHAN          RKO_CHAN        ROKR_CHAN        RULE_STAB    
Min.   :  0.000   Min.   :0.0000   Min.   : 36.36   Min.   :0.000  
1st Qu.:  0.000   1st Qu.:0.0000   1st Qu.: 90.91   1st Qu.:9.000  
Median :  0.000   Median :0.0000   Median :100.00   Median :9.000  
Mean   :  3.508   Mean   :0.6628   Mean   : 93.97   Mean   :7.392  
3rd Qu.:  1.000   3rd Qu.:1.0000   3rd Qu.:100.00   3rd Qu.:9.000  
Max.   :961.000   Max.   :7.0000   Max.   :100.00   Max.   :9.000  
                                                                   

[SNIP]
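The summary shows that almost all rows belong to v2_0_0. If one wanted to restrict the analysis to that version (an assumption here; the steps on this page keep all rows), a sketch on a made-up data frame could look like this:

```r
# Toy stand-in for top_files; values are made up for illustration.
toy <- data.frame(
  Version = factor(c("v1_0_0", "v2_0_0", "v2_0_0", "v4_0_0")),
  SLOC    = c(10, 20, 30, 40)
)

# Count rows per version, as summary() does for factor columns.
table(toy$Version)

# Keep only the v2_0_0 rows and drop the now-unused factor levels.
v2 <- droplevels(toy[toy$Version == "v2_0_0", ])
nrow(v2)
levels(v2$Version)
```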

Many columns appear to contain only NAs. We first remove them; the filter below keeps only the columns that contain no NA at all:

files_small <- top_files[,colSums(is.na(top_files)) == 0]
> names(files_small)
 [1] "X.Application"   "Version"         "File"            "RULE_ANA"       
 [5] "NCC_ANA"         "RKO_ANA"         "ROKR_ANA"        "RULE_CHAN"      
 [9] "NCC_CHAN"        "RKO_CHAN"        "ROKR_CHAN"       "RULE_STAB"      
[13] "NCC_STAB"        "RKO_STAB"        "ROKR_STAB"       "RULE_TEST"      
[17] "NCC_TEST"        "RKO_TEST"        "ROKR_TEST"       "RULE"           
[21] "RULE_REQ"        "NCC"             "NCC_REQ"         "RKO"            
[25] "RKO_REQ"         "ROKR"            "ROKR_REQ"        "STAT"           
[29] "LC"              "CLOC"            "BLAN"            "SLOC"           
[33] "BRAC"            "HLOC"            "MLOC"            "ELOC"           
[37] "COMR"            "TOPT"            "TOPD"            "DOPT"           
[41] "DOPD"            "VG"              "MAXVG"           "AVGVG"          
[45] "CLOR"            "CFT"             "CFTC"            "FUNC"           
[49] "G_FUNC"          "F_FUNC"          "E_FUNC"          "D_FUNC"         
[53] "C_FUNC"          "B_FUNC"          "A_FUNC"          "FUMAI_DEBT"     
[57] "FUMAI_IDX"       "FUANA_DEBT"      "FUANA_IDX"       "FUCHAN_DEBT"    
[61] "FUCHAN_IDX"      "FUSTAB_DEBT"     "FUSTAB_IDX"      "FUTEST_DEBT"    
[65] "FUTEST_IDX"      "CLAS"            "G_CLAS"          "F_CLAS"         
[69] "E_CLAS"          "D_CLAS"          "C_CLAS"          "B_CLAS"         
[73] "A_CLAS"          "CLMAI_DEBT"      "CLMAI_IDX"       "CLANA_DEBT"     
[77] "CLANA_IDX"       "CLCHAN_DEBT"     "CLCHAN_IDX"      "CLSTAB_DEBT"    
[81] "CLSTAB_IDX"      "CLTEST_DEBT"     "CLTEST_IDX"      "MAINTAINABILITY"
[85] "ANALYSABILITY"   "CHANGEABILITY"   "STABILITY"       "TESTABILITY"    
[89] "TECH_DEBT_DST"   "TECH_DEBT_IDX"   "TECH_DEBT"       "FUNC_TDEBT"     
[93] "CLAS_TDEBT"      "SI"              "TXREM"           "TXADD"          
[97] "TXMOD"           "MPI"             "TECH_DEBT_TREND" "SIZE_TREND"     
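The filter above drops any column containing at least one NA, which is slightly stricter than dropping columns that are entirely NA. A small sketch of the difference, on a made-up data frame (the column names are borrowed from the extract, the values are not):

```r
# Toy data frame: one all-NA column and one partially-NA column.
toy <- data.frame(
  SLOC     = c(10, 20, 30),
  R_NOGOTO = c(NA, NA, NA),
  VG       = c(1, NA, 3)
)

na_counts <- colSums(is.na(toy))

# Filter used on this page: keep columns with no NA at all.
no_na <- toy[, na_counts == 0, drop = FALSE]

# Alternative: drop only the columns that are entirely NA.
not_all_na <- toy[, na_counts < nrow(toy), drop = FALSE]

names(no_na)
names(not_all_na)
```

On the Topcased-MM extract the two filters may well give the same result; that depends on whether any metric column is only partially NA.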


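We then drop the extreme values (NCC of 1000 or more, DOPD of 2000 or more), keep a handful of metrics, run a principal component analysis and plot the results:

# The row filter needs the vectorised `&`, and columns must be referenced through the data frame.
files_small <- files_small[(files_small$NCC < 1000) & (files_small$DOPD < 2000), c("SLOC", "NCC", "DOPT", "DOPD", "VG", "CLOR", "CFT")]
files_small_acp <- princomp(files_small, cor=TRUE)
plot(files_small_acp)
plot(files_small, main="Metrics comparison", col=rgb(0,100,0,50,maxColorValue=255), pch=16)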
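To read the principal component analysis beyond the scree plot, one can look at the variance explained and the loadings. The sketch below uses a synthetic data frame (random values, one artificial outlier) as a stand-in for files_small, so the numbers it produces say nothing about Topcased-MM:

```r
set.seed(42)

# Synthetic stand-in for files_small; values are random, not Topcased data.
toy <- data.frame(
  SLOC = rpois(50, 100),
  NCC  = rpois(50, 5),
  VG   = rpois(50, 10)
)
toy$NCC[1] <- 1500  # one artificial outlier

# `&` compares element by element and yields one logical value per row.
keep <- (toy$NCC < 1000) & (toy$SLOC < 2000)
filtered <- toy[keep, c("SLOC", "NCC", "VG")]

# cor = TRUE standardises the metrics, so scale differences do not dominate.
pca <- princomp(filtered, cor = TRUE)
summary(pca)    # standard deviation and proportion of variance per component
loadings(pca)   # contribution of each metric to each component
```

With cor = TRUE the component variances sum to the number of metrics, which makes the proportions in summary() easy to compare.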


summary(files_small)
names(files_small)