McCabe Complexity
   HOME

TheInfoList



OR:

Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
. It was developed by Thomas J. McCabe, Sr. in 1976. Cyclomatic complexity is computed using the
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that ...
of the program: the nodes of the
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
correspond to indivisible groups of commands of a program, and a
directed Director may refer to: Literature * ''Director'' (magazine), a British magazine * ''The Director'' (novel), a 1971 novel by Henry Denker * ''The Director'' (play), a 2000 play by Nancy Hasty Music * Director (band), an Irish rock band * ''D ...
edge connects two nodes if the second command might be executed immediately after the first command. Cyclomatic complexity may also be applied to individual functions,
modules Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a s ...
,
methods Method ( grc, μέθοδος, methodos) literally means a pursuit of knowledge, investigation, mode of prosecuting such inquiry, or system. In recent centuries it more often means a prescribed process for completing a task. It may refer to: *Scien ...
or classes within a program. One testing strategy, called basis path testing by McCabe who first proposed it, is to test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.


Description


Definition

The cyclomatic complexity of a section of
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
is the number of
linearly independent In the theory of vector spaces, a set of vectors is said to be if there is a nontrivial linear combination of the vectors that equals the zero vector. If no such linear combination exists, then the vectors are said to be . These concepts are ...
paths within it—a set of paths being linearly dependent if there is a subset of one or more paths where the
symmetric difference In mathematics, the symmetric difference of two sets, also known as the disjunctive union, is the set of elements which are in either of the sets, but not in their intersection. For example, the symmetric difference of the sets \ and \ is \. Th ...
of their edge sets is empty. For instance, if the source code contained no control flow statements (conditionals or decision points), the complexity would be 1, since there would be only a single path through the code. If the code had one single-condition IF statement, there would be two paths through the code: one where the IF statement evaluates to TRUE and another one where it evaluates to FALSE, so the complexity would be 2. Two nested single-condition IFs, or one IF with two conditions, would produce a complexity of 3. Mathematically, the cyclomatic complexity of a structured program is defined with reference to the
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that ...
of the program, a
directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
containing the
basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually deco ...
s of the program, with an edge between two basic blocks if control may pass from the first to the second. The complexity is then defined as :M = E - N + 2P, where : = the number of edges of the graph. : = the number of nodes of the graph. : = the number of connected components. An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this case, the graph is
strongly connected In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex is reachable from every other vertex. The strongly connected components of an arbitrary directed graph form a partition into subgraphs that ...
, and the cyclomatic complexity of the program is equal to the
cyclomatic number In graph theory, a branch of mathematics, the circuit rank, cyclomatic number, cycle rank, or nullity of an undirected graph is the minimum number of edges that must be removed from the graph to break all its cycles, making it into a tree or fo ...
of its graph (also known as the first Betti number), which is defined as :M = E - N + P. This may be seen as calculating the number of
linearly independent cycle In graph theory, a branch of mathematics, a cycle basis of an undirected graph is a set of simple cycles that forms a basis of the cycle space of the graph. That is, it is a minimal set of cycles that allows every even-degree subgraph to be expr ...
s that exist in the graph, i.e. those cycles that do not contain other cycles within themselves. Note that because each exit point loops back to the entry point, there is at least one such cycle for each exit point. For a single program (or subroutine or method), is always equal to 1. So a simpler formula for a single subroutine is :M = E - N + 2. Cyclomatic complexity may, however, be applied to several such programs or subprograms at the same time (e.g., to all of the methods in a class), and in these cases will be equal to the number of programs in question, as each subprogram will appear as a disconnected subset of the graph. McCabe showed that the cyclomatic complexity of any structured program with only one entry point and one exit point is equal to the number of decision points (i.e., "if" statements or conditional loops) contained in that program plus one. However, this is true only for decision points counted at the lowest, machine-level instructions. Decisions involving compound predicates like those found in high-level languages like IF cond1 AND cond2 THEN ... should be counted in terms of predicate variables involved, i.e. in this example one should count two decision points, because at machine level it is equivalent to IF cond1 THEN IF cond2 THEN .... Cyclomatic complexity may be extended to a program with multiple exit points; in this case it is equal to :\pi - s + 2, where \pi is the number of decision points in the program, and is the number of exit points.


Explanation in terms of algebraic topology

An ''even subgraph'' of a graph (also known as an Eulerian subgraph) is one where every
vertex Vertex, vertices or vertexes may refer to: Science and technology Mathematics and computer science *Vertex (geometry), a point where two or more curves, lines, or edges meet * Vertex (computer graphics), a data structure that describes the positio ...
is incident with an even number of edges; such subgraphs are unions of cycles and isolated vertices. In the following, even subgraphs will be identified with their edge sets, which is equivalent to only considering those even subgraphs which contain all vertices of the full graph. The set of all even subgraphs of a graph is closed under
symmetric difference In mathematics, the symmetric difference of two sets, also known as the disjunctive union, is the set of elements which are in either of the sets, but not in their intersection. For example, the symmetric difference of the sets \ and \ is \. Th ...
, and may thus be viewed as a vector space over
GF(2) (also denoted \mathbb F_2, or \mathbb Z/2\mathbb Z) is the finite field of two elements (GF is the initialism of ''Galois field'', another name for finite fields). Notations and \mathbb Z_2 may be encountered although they can be confused with ...
; this vector space is called the ''cycle space'' of the graph. The
cyclomatic number In graph theory, a branch of mathematics, the circuit rank, cyclomatic number, cycle rank, or nullity of an undirected graph is the minimum number of edges that must be removed from the graph to break all its cycles, making it into a tree or fo ...
of the graph is defined as the dimension of this space. Since GF(2) has two elements and the cycle space is necessarily finite, the cyclomatic number is also equal to the 2-logarithm of the number of elements in the cycle space. A basis for the cycle space is easily constructed by first fixing a
spanning forest In the mathematical field of graph theory, a spanning tree ''T'' of an undirected graph ''G'' is a subgraph that is a tree which includes all of the vertices of ''G''. In general, a graph may have several spanning trees, but a graph that is not ...
of the graph, and then considering the cycles formed by one edge not in the forest and the path in the forest connecting the endpoints of that edge; these cycles constitute a basis for the cycle space. Hence, the cyclomatic number also equals the number of edges not in a maximal spanning forest of a graph. Since the number of edges in a maximal spanning forest of a graph is equal to the number of vertices minus the number of components, the formula E-N+P above for the cyclomatic number follows. For the more topologically inclined, cyclomatic complexity can alternatively be defined as a relative
Betti number In algebraic topology, the Betti numbers are used to distinguish topological spaces based on the connectivity of ''n''-dimensional simplicial complexes. For the most reasonable finite-dimensional spaces (such as compact manifolds, finite simplici ...
, the size of a
relative homology In algebraic topology, a branch of mathematics, the (singular) homology of a topological space relative to a subspace is a construction in singular homology, for pairs of spaces. The relative homology is useful and important in several ways. Intui ...
group: :M := b_1(G,t) := \operatornameH_1(G,t), which is read as "the rank of the first homology group of the graph ''G'', relative to the terminal nodes ''t''". This is a technical way of saying "the number of linearly independent paths through the flow graph from an entry to an exit", where: * "linearly independent" corresponds to homology, and means one does not double-count backtracking; * "paths" corresponds to ''first'' homology: a path is a 1-dimensional object; * "relative" means the path must begin and end at an entry or exit point. This corresponds to the intuitive notion of cyclomatic complexity, and can be calculated as above. Alternatively, one can compute this via absolute Betti number (absolute homology – not relative) by identifying (gluing together) all the terminal nodes on a given component (or equivalently, draw paths connecting the exits to the entrance), in which case (calling the new, augmented graph \tilde G, which is ), one obtains :M = b_1(\tilde G) = \operatornameH_1(\tilde G). It can also be computed via
homotopy In topology, a branch of mathematics, two continuous functions from one topological space to another are called homotopic (from grc, ὁμός "same, similar" and "place") if one can be "continuously deformed" into the other, such a defor ...
. If one considers a (connected) control-flow graph as a 1-dimensional
CW complex A CW complex (also called cellular complex or cell complex) is a kind of a topological space that is particularly important in algebraic topology. It was introduced by J. H. C. Whitehead (open access) to meet the needs of homotopy theory. This cl ...
called X, then the fundamental group of X will be \pi_1(X) \cong \Z^. The value of n+1 is the cyclomatic complexity. The fundamental group counts how many loops there are through the graph, up to homotopy, and hence aligns with what we would intuitively expect. This corresponds to the characterization of cyclomatic complexity as "number of loops plus number of components".


Interpretation

In his presentation 'Software Quality Metrics to Identify Risk' for the Department of Homeland Security, Tom McCabe introduces the following categorisation to interpret cyclomatic complexity: * 1 - 10 Simple procedure, little risk * 11 - 20 More complex, moderate risk * 21 - 50 Complex, high risk * > 50 Untestable code, very high risk


Applications


Limiting complexity during development

One of McCabe's original applications was to limit the complexity of routines during program development; he recommended that programmers should count the complexity of the modules they are developing, and split them into smaller modules whenever the cyclomatic complexity of the module exceeded 10. This practice was adopted by the NIST Structured Testing methodology, with an observation that since McCabe's original publication, the figure of 10 had received substantial corroborating evidence, but that in some circumstances it may be appropriate to relax the restriction and permit modules with a complexity as high as 15. As the methodology acknowledged that there were occasional reasons for going beyond the agreed-upon limit, it phrased its recommendation as "For each module, either limit cyclomatic complexity to he agreed-upon limitor provide a written explanation of why the limit was exceeded."


Measuring the "structuredness" of a program

Section VI of McCabe's 1976 paper is concerned with determining what the control-flow graphs (CFGs) of non- structured programs look like in terms of their subgraphs, which McCabe identifies. (For details on that part see
structured program theorem The structured program theorem, also called the Böhm–Jacopini theorem, is a result in programming language theory. It states that a class of control-flow graphs (historically called flowcharts in this context) can compute any computable function ...
.) McCabe concludes that section by proposing a numerical measure of how close to the structured programming ideal a given program is, i.e. its "structuredness" using McCabe's neologism. McCabe called the measure he devised for this purpose
essential complexity Essential complexity is a numerical measure defined by Thomas J. McCabe, Sr., in his highly cited, 1976 paper better known for introducing cyclomatic complexity. McCabe defined essential complexity as the cyclomatic complexity of the reduced CFG ( ...
. In order to calculate this measure, the original CFG is iteratively reduced by identifying subgraphs that have a single-entry and a single-exit point, which are then replaced by a single node. This reduction corresponds to what a human would do if they extracted a subroutine from the larger piece of code. (Nowadays such a process would fall under the umbrella term of
refactoring In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the '' factoring''—without changing its external behavior. Refactoring is intended to improve the design, structu ...
.) McCabe's reduction method was later called ''condensation'' in some textbooks, because it was seen as a generalization of the condensation to components used in graph theory. If a program is structured, then McCabe's reduction/condensation process reduces it to a single CFG node. In contrast, if the program is not structured, the iterative process will identify the irreducible part. The essential complexity measure defined by McCabe is simply the cyclomatic complexity of this irreducible graph, so it will be precisely 1 for all structured programs, but greater than one for non-structured programs.


Implications for software testing

Another application of cyclomatic complexity is in determining the number of test cases that are necessary to achieve thorough test coverage of a particular module. It is useful because of two properties of the cyclomatic complexity, , for a specific module: * is an upper bound for the number of test cases that are necessary to achieve a complete branch coverage. * is a lower bound for the number of paths through the control-flow graph (CFG). Assuming each test case takes one path, the number of cases needed to achieve path coverage is equal to the number of paths that can actually be taken. But some paths may be impossible, so although the number of paths through the CFG is clearly an upper bound on the number of test cases needed for path coverage, this latter number (of ''possible'' paths) is sometimes less than . All three of the above numbers may be equal: branch coverage \leq cyclomatic complexity \leq number of paths. For example, consider a program that consists of two sequential if-then-else statements. if (c1()) f1(); else f2(); if (c2()) f3(); else f4(); In this example, two test cases are sufficient to achieve a complete branch coverage, while four are necessary for complete path coverage. The cyclomatic complexity of the program is 3 (as the strongly connected graph for the program contains 9 edges, 7 nodes and 1 connected component) (). In general, in order to fully test a module, all execution paths through the module should be exercised. This implies a module with a high complexity number requires more testing effort than a module with a lower value since the higher complexity number indicates more pathways through the code. This also implies that a module with higher complexity is more difficult for a programmer to understand since the programmer must understand the different pathways and the results of those pathways. Unfortunately, it is not always practical to test all possible paths through a program. Considering the example above, each time an additional if-then-else statement is added, the number of possible paths grows by a factor of 2. As the program grows in this fashion, it quickly reaches the point where testing all of the paths becomes impractical. One common testing strategy, espoused for example by the NIST Structured Testing methodology, is to use the cyclomatic complexity of a module to determine the number of white-box tests that are required to obtain sufficient coverage of the module. In almost all cases, according to such a methodology, a module should have at least as many tests as its cyclomatic complexity; in most cases, this number of tests is adequate to exercise all the relevant paths of the function. As an example of a function that requires more than simply branch coverage to test accurately, consider again the above function, but assume that to avoid a bug occurring, any code that calls either f1() or f3() must also call the other. Assuming that the results of c1() and c2() are independent, that means that the function as presented above contains a bug. Branch coverage would allow us to test the method with just two tests, and one possible set of tests would be to test the following cases: * c1() returns true and c2() returns true * c1() returns false and c2() returns false Neither of these cases exposes the bug. If, however, we use cyclomatic complexity to indicate the number of tests we require, the number increases to 3. We must therefore test one of the following paths: * c1() returns true and c2() returns false * c1() returns false and c2() returns true Either of these tests will expose the bug.


Correlation to number of defects

A number of studies have investigated the correlation between McCabe's cyclomatic complexity number with the frequency of defects occurring in a function or method. Some studies find a positive correlation between cyclomatic complexity and defects: functions and methods that have the highest complexity tend to also contain the most defects. However, the correlation between cyclomatic complexity and program size (typically measured in
lines of code Source lines of code (SLOC), also known as lines of code (LOC), is a software metric used to measure the size of a computer program by counting the number of lines in the text of the program's source code. SLOC is typically used to predict the am ...
) has been demonstrated many times.
Les Hatton Les Hatton (born 5 February 1948) is a British-born computer scientist and mathematician most notable for his work on failures and vulnerabilities in software controlled systems. He was educated at King's College, Cambridge 1967–1970 and the U ...
has claimed that complexity has the same predictive ability as lines of code. Studies that controlled for program size (i.e., comparing modules that have different complexities but similar size) are generally less conclusive, with many finding no significant correlation, while others do find correlation. Some researchers who have studied the area question the validity of the methods used by the studies finding no correlation. Although this relation is probably true, it isn't easily utilizable. Since program size is not a controllable feature of commercial software, the usefulness of McCabes's number has been called to question. The essence of this observation is that larger programs tend to be more complex and to have more defects. Reducing the cyclomatic complexity of code is
not proven Not proven (, ) is a verdict available to a court of law in Scotland. Under Scots law, a criminal trial may end in one of three verdicts, one of conviction ("guilty") and two of acquittal ("not proven" and "not guilty").The Scottish criminal ju ...
to reduce the number of errors or bugs in that code. International safety standards like
ISO 26262 ISO 26262, titled "Road vehicles – Functional safety", is an international standard for functional safety of electrical and/or electronic systems that are installed in serial production road vehicles (excluding mopeds), defined by the Interna ...
, however, mandate coding guidelines that enforce low code complexity.


Artificial intelligence

Cyclomatic complexity may also be used for the evaluation of the semantic complexity of artificial intelligence programs.


Ultrametric topology

Cyclomatic complexity has proven useful in geographical and landscape-ecological analysis, after it was shown that it can be implemented on graphs of
ultrametric In mathematics, an ultrametric space is a metric space in which the triangle inequality is strengthened to d(x,z)\leq\max\left\. Sometimes the associated metric is also called a non-Archimedean metric or super-metric. Although some of the theorems ...
distances.


See also

* Programming complexity * Complexity trap *
Computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
*
Computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as anal ...
*
Control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''im ...
*
Decision-to-decision path A decision-to-decision path, or DD-path, is a path of execution (usually through a flow graph representing a program, such as a flow chart) between two decisions. More recent versions of the concept also include the decisions themselves in their own ...
*
Design predicates Design predicates are a method invented by Thomas McCabe, to quantify the complexity of the integration of two units of software. Each of the four types of design predicates have an associated integration complexity rating. For pieces of code that ...
*
Essential complexity (numerical measure of "structuredness") Essential complexity is a numerical measure defined by Thomas J. McCabe, Sr., in his highly cited, 1976 paper better known for introducing cyclomatic complexity. McCabe defined essential complexity as the cyclomatic complexity of the reduced CFG ( ...
* Halstead complexity measures *
Software engineering Software engineering is a systematic engineering approach to software development. A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term '' ...
*
Software testing Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to apprecia ...
*
Static program analysis In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term ...
*
Maintainability In engineering, maintainability is the ease with which a product can be maintained to: * correct defects or their cause, * Repair or replace faulty or worn-out components without having to replace still working parts, * prevent unexpected working ...


Notes


References

{{reflist, 40em


External links


Generating cyclomatic complexity metrics with Polyspace

The role of empiricism in improving the reliability of future software

McCabe's Cyclomatic Complexity and Why We Don't Use It
Software metrics