A Complexity Measure

From Maisqual Wiki

Jump to: navigation, search

A complexity measure

Thomas J. McCabe


Contents


Contents

  1. Abstract
  2. Introduction
  3. A Complexity Measure
  4. Working Experience with the complexity measure
  5. Decomposition
  6. Simplification
  7. Nonstructured programming
  8. A testing metholodogy
  9. Appendix


Bibtex

@inproceedings{McCabe:1976:CM:800253.807712,
 author = {McCabe, Thomas J.},
 title = {A complexity measure},
 booktitle = {Proceedings of the 2nd international conference on Software engineering},
 series = {ICSE '76},
 year = {1976},
 location = {San Francisco, California, United States},
 pages = {407--},
 url = {http://portal.acm.org/citation.cfm?id=800253.807712},
 acmid = {807712},
 publisher = {IEEE Computer Society Press},
 address = {Los Alamitos, CA, USA},
 keywords = {Basis, Complexity measure, Control flow, Decomposition, Graph theory, Independence, Linear, Modularization, Programming, Reduction, Software, Testing},
} 


Download File

The file can be downloaded from here.


Notes

Abstract

This paper describes a graph-theoretic complexity measure and illustrates the correlation between the intuitive complexity and the graph-theoretic complexity.

Introduction

How to modularize a software system so the resulting modules are both testable and maintainable? Methods exist based on code size, but this is not enough: small-sized programs can have millions of paths.

A Complexity Measure

The complexity measure approach is to measure and control the number of paths through a program.

Definition 1 
The cyclomatic number V(G) of a graph G with n vertices, e edges, and p connected components is:

v(G) = en + p

Theorem 1 
In a strongly connected[1] graph G, the cyclomatic number is equal to the maximum number of linearly independent circuits.

The overall strategy will be to measure the complexity of a program by computing the number of linearly independent paths v(G), control the "size" of programs by setting an upper limit to v(G) (instead of using just physical size), an duse the cyclomatic complexity as the basis for a testing methodology.

Cyclomatic complexity is defined as:

v(G) = en + 2p

The role of the p variable is to be explained in section IV: decomposition.

Properties of cyclomatic complexity are:

  • v(G) >= 1.
  • v(G) is the maximum number of linearly independent paths in G; it is the size of a basis set..
  • Inserting a deleting functional statements to G doesn't affect v(G).
  • G has only one path if and only if V(G) = 1.
  • Inserting a new edge in G increases v(G) by unit.
  • v(G) depends only on the decision structure of G.

Working Experience with the complexity measure

The flow between the blocks is represented as an n by n matrix (where n is the number of blocks), having a 1 in the i-j th position if block i can branch to block j in a single step.

The reachability matrix is defined by having a 1 in the i-j th position if block i can branch to block j in any number of steps.

The control graphs often denote the "style" of programming by showing similar patterns.

Results were applied in an operational environment, to limit the software modules by cyclomatic complexity instead of physical size. The upper bound that has been used for cyclomatic complexity is 10, which is a "reasonable, but not magical upper limit". When they reached this limit, developers were required either to modularize or rewrite the code. The only situation in which this limit has seemed unreasonable was for large switch/case statements, which were allowed.

In most cases, the high cyclomatic complexity was related to hard to maintain, troublesome and/or unreliable software pieces.

Decomposition

Remember in Definition 1 that p was the number of connected components[2]. Now if we have a main program and two subroutines A and B, such as the total graph is M ∪ A ∪ B. Then

v(M \cup A \cup B) = e - n + 2p

With p = 3.

This method with p ≠ 1 can be used to calculate the complexity of a collection of programs, particularly a hierarchical nest of subroutines.

In general, the complexity of a collection C of control graphs with k connected components is equal to the summation of their complexities; let ei and ni be the number of edges and nodes in the ith connected component, then:

v(C) = e - n + 2p = \sum_1^k e_i -\sum_1^k n_i + 2k

v(C) = \sum_1^k (e_i -n_i +2) = \sum_1^k v(C_i)

Simplification

The v(G) can be tedious to compute; some simplifications are shown below for a single component graph.

Mills[3] proves the following: let be θ the number of functions, π the number of predicates, and e the number of edges. Then:

e = 1 + θ + 3π.

Since for every predicate there is exactly one collecting node and there are unique entry and exit nodes it follows that:

n = θ + 2π + 2.

For a single component graph (p = 1) we get:

v(G) = (1 + θ + 3π) − (θ + 2π + 2) + 2 = π + 1

Which means that the cyclomatic complexity is equal to the number of predicates plus one.

In practice compound predicates such as if (C1 and C2) then contribute two to complexity since it could be rewritten if (C1) then if (C2) then. For this reason it may be more convenient to count conditions rather than predicates when calculating complexity. Similarly, for switch case constructs with N cases use N - 1 for the number of conditions.

Nonstructured programming

Structured programming lacks some clarity for its definition; more specifically, a clear definition of the constructs that structured programming excludes would also sensitize programmers to their use while programs are being created, which would have a desirable effect.

Furthermore, there are some cases (see Knuth[4]) where an unstructured construct works best.

What would help is first the definition of the unstructured components and second a measure of the structureness of a program as well as the complexity of a program.

(a), (b), (c), and (d) are given control structures, such as respectively: (a) is a branching loop; (b) is a branching into a loop; (c) is a branching into a decision; (d) is a branching out of a decision.

The following results are stated without proof:

Result 1
A necessary and sufficient condition that a program[5] is nonstructured is that it contains as a subgraph either (a), (b) or (c). (d) is slighted because any 3 of the 4 graphs will generate all the unstructured programs.

A structured program can be written by not "branching out of or into a loop, or out of or into a decision".

Result 2
A nonsructured program cannot be just a little nonstructured. That is any nonstructured program must contain at least 2 of the 4 graphs (a-d).
Result 3
A necessary and sufficient condition for a program to be nonstructured is that it contains at least one of (a,b), (a,d), (b,c), (c,d).
Result 4
The cyclomatic complexity in a nonstructured program is at least 3.
Result 5
A structured program can be written by not branching out of loops or into decisions.
Result 6
A structured program can be written by not branching into loops or out of decisions.
Result 7
A structured program is reducible[6] to a program of unit complexity.

The following definition of essential complexity (ev) is used to reflect the lack of structure. Let n be the number of proper subgraphs with unique entry and exit nodes.

Definition -- Essential Complexity
ev = vn
Result 8
The essential complexity of a structured program is one.

A testing metholodogy

This section defines the relationship between testing and cyclomatic complexity.

Let ac be the number of tested paths, and v the cyclomatic complexity. If ac is less than v, one of the three conditions must be true:

  • more testing needs to be done,
  • the program flow can be reduced in complexity by (v - ac)
  • portions of the program can be reduced to in line code.

If we compare the flow graph (and associated complexity) and the paths tested, several additional paths may be discovered; note that v is only the minimal number of independent paths that should be tested.


References

  1. Definition of a strongly connected graph: there is a path joining any pair of arbitrary distinct vertices.
  2. A graph is connected if for every pair of vertices there is a chain going from on to the other.
  3. H.D. Mills, "Mathematical foundations for structured programming", Federal System Division, IBM Corp., Gaithersburg, MD, FSC 72-6012, 1972
  4. D.E. Knuth, "Structured programming with GOTO statements", Computing Surveys, vol. 6, pp. 261-301, Dec. 1974.
  5. Assuming the program does not contain unconditional GOTO's.
  6. Reduction is the process of removing subgraphs (subroutines) with unique entry and exit nodes.
Personal tools