HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a control-flow graph (CFG) is a representation, using
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
notation, of all paths that might be traversed through a
program Program, programme, programmer, or programming may refer to: Business and management * Program management, the process of managing several related projects * Time management * Program, a part of planning Arts and entertainment Audio * Progra ...
during its
execution Capital punishment, also known as the death penalty, is the State (polity), state-sanctioned practice of deliberately killing a person as a punishment for an actual or supposed crime, usually following an authorized, rule-governed process to ...
. The control-flow graph was discovered by
Frances E. Allen Frances Elizabeth Allen (August 4, 1932August 4, 2020) was an American computer scientist and pioneer in the field of optimizing compilers. Allen was the first woman to become an IBM Fellow, and in 2006 became the first woman to win the Turing ...
, who noted that Reese T. Prosser used boolean connectivity matrices for flow analysis before. The CFG is essential to many
compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
s and static-analysis tools.


Definition

In a control-flow graph each
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
in the
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
represents a
basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually deco ...
, i.e. a straight-line piece of code without any jumps or jump targets; jump targets start a block, and jumps end a block. Directed
edge Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed by ...
s are used to represent jumps in the
control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''imper ...
. There are, in most presentations, two specially designated blocks: the ''entry block'', through which control enters into the flow graph, and the ''exit block'', through which all control flow leaves. Because of its construction procedure, in a CFG, every edge A→B has the property that: :
outdegree In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
(A) > 1 or indegree(B) > 1 (or both). The CFG can thus be obtained, at least conceptually, by starting from the program's (full) flow graph—i.e. the graph in which every node represents an individual instruction—and performing an
edge contraction In graph theory, an edge contraction is an operation that removes an edge from a graph while simultaneously merging the two vertices that it previously joined. Edge contraction is a fundamental operation in the theory of graph minors. Vertex ide ...
for every edge that falsifies the predicate above, i.e. contracting every edge whose source has a single exit and whose destination has a single entry. This contraction-based algorithm is of no practical importance, except as a visualization aid for understanding the CFG construction, because the CFG can be more efficiently constructed directly from the program by scanning it for basic blocks.


Example

Consider the following fragment of code:
0: (A) t0 = read_num
1: (A) if t0 mod 2 

0 2: (B) print t0 + " is even." 3: (B) goto 5 4: (C) print t0 + " is odd." 5: (D) end program
In the above, we have 4 basic blocks: A from 0 to 1, B from 2 to 3, C at 4 and D at 5. In particular, in this case, A is the "entry block", D the "exit block" and lines 4 and 5 are jump targets. A graph for this fragment has edges from A to B, A to C, B to D and C to D.


Reachability

Reachability In graph theory, reachability refers to the ability to get from one Vertex (graph theory), vertex to another within a graph. A vertex s can reach a vertex t (and t is reachable from s) if there exists a sequence of Glossary of graph theory#Basics, ...
is a graph property useful in optimization. If a subgraph is not connected from the subgraph containing the entry block, that subgraph is unreachable during any execution, and so is
unreachable code In computer programming, unreachable code is part of the source code of a program which can never be executed because there exists no control flow path to the code from the rest of the program. Unreachable code is sometimes also called ''dead code' ...
; under normal conditions it can be safely removed. If the exit block is unreachable from the entry block, an
infinite loop In computer programming, an infinite loop (or endless loop) is a sequence of instructions that, as written, will continue endlessly, unless an external intervention occurs ("pull the plug"). It may be intentional. Overview This differs from: * ...
may exist. Not all infinite loops are detectable, see
Halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a g ...
. A halting order may also exist there. Unreachable code and infinite loops are possible even if the programmer does not explicitly code them: optimizations like constant propagation and constant folding followed by jump threading can collapse multiple basic blocks into one, cause edges to be removed from a CFG, etc., thus possibly disconnecting parts of the graph.


Domination relationship

A block M '' dominates'' a block N if every path from the entry that reaches block N has to pass through block M. The entry block dominates all blocks. In the reverse direction, block M ''postdominates'' block N if every path from N to the exit has to pass through block M. The exit block postdominates all blocks. It is said that a block M ''immediately dominates'' block N if M dominates N, and there is no intervening block P such that M dominates P and P dominates N. In other words, M is the last dominator on all paths from entry to N. Each block has a unique immediate dominator. Similarly, there is a notion of ''immediate postdominator'', analogous to ''immediate dominator''. The ''dominator tree'' is an ancillary data structure depicting the dominator relationships. There is an arc from Block M to Block N if M is an immediate dominator of N. This graph is a tree, since each block has a unique immediate dominator. This tree is rooted at the entry block. The dominator tree can be calculated efficiently using
Lengauer–Tarjan's algorithm In computer science, a node of a control-flow graph dominates a node if every path from the ''entry node'' to must go through . Notationally, this is written as (or sometimes ). By definition, every node dominates itself. There are a numb ...
. A ''postdominator tree'' is analogous to the ''dominator tree''. This tree is rooted at the exit block.


Special edges

A ''back edge'' is an edge that points to a block that has already been met during a depth-first ( DFS) traversal of the graph. Back edges are typical of loops. A ''critical edge'' is an edge which is neither the only edge leaving its source block, nor the only edge entering its destination block. These edges must be ''split'': a new block must be created in the middle of the edge, in order to insert computations on the edge without affecting any other edges. An ''abnormal edge'' is an edge whose destination is unknown.
Exception handling In computing and computer programming, exception handling is the process of responding to the occurrence of ''exceptions'' – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, an ...
constructs can produce them. These edges tend to inhibit optimization. An ''impossible edge'' (also known as a ''fake edge'') is an edge which has been added to the graph solely to preserve the property that the exit block postdominates all blocks. It cannot ever be traversed.


Loop management

A ''loop header'' (sometimes called the ''entry point'' of the loop) is a dominator that is the target of a loop-forming back edge. The loop header dominates all blocks in the loop body. A block may be a loop header for more than one loop. A loop may have multiple entry points, in which case it has no "loop header". Suppose block M is a dominator with several incoming edges, some of them being back edges (so M is a loop header). It is advantageous to several optimization passes to break M up into two blocks Mpre and Mloop. The contents of M and back edges are moved to Mloop, the rest of the edges are moved to point into Mpre, and a new edge from Mpre to Mloop is inserted (so that Mpre is the immediate dominator of Mloop). In the beginning, Mpre would be empty, but passes like
loop-invariant code motion In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion ( ...
could populate it. Mpre is called the ''loop pre-header'', and Mloop would be the loop header.


Reducibility

A reducible CFG is one with edges that can be partitioned into two disjoint sets: forward edges, and back edges, such that: * Forward edges form a
directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ve ...
with all nodes reachable from the entry node. * For all back edges (A, B), node B dominates node A.
Structured programming Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of the structured control flow constructs of selection ( if/then/else) and repetition ( ...
languages are often designed such that all CFGs they produce are reducible, and common structured programming statements such as IF, FOR, WHILE, BREAK, and CONTINUE produce reducible graphs. To produce irreducible graphs, statements such as
GOTO GoTo (goto, GOTO, GO TO or other case combinations, depending on the programming language) is a statement found in many computer programming languages. It performs a one-way transfer of control to another line of code; in contrast a function ca ...
are needed. Irreducible graphs may also be produced by some compiler optimizations.


Loop connectedness

The loop connectedness of a CFG is defined with respect to a given depth-first search tree (DFST) of the CFG. This DFST should be rooted at the start node and cover every node of the CFG. Edges in the CFG which run from a node to one of its DFST ancestors (including itself) are called back edges. The loop connectedness is the largest number of back edges found in any cycle-free path of the CFG. In a reducible CFG, the loop connectedness is independent of the DFST chosen. Loop connectedness has been used to reason about the time complexity of
data-flow analysis In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
.


Inter-procedural Control Flow Graph

While Control Flow Graphs represent the control flow of a single procedure, Inter-procedural Control Flow Graphs represent the control flow of whole programs.


See also

*
Abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
* Flowchart *
Control-flow diagram A control-flow diagram (CFD) is a diagram to describe the control flow of a business process, process or review. Control-flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines. They are one of the cla ...
*
Control-flow analysis In computer science, control-flow analysis (CFA) is a static-code-analysis technique for determining the control flow of a program. The control flow is expressed as a control-flow graph (CFG). For both functional programming languages and object- ...
*
Data-flow analysis In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
*
Interval (graph theory) This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges. Symbols A B ...
* Program dependence graph *
Cyclomatic complexity Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976. ...
* Static single assignment *
Compiler construction In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
*
Intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...


References


External links

{{Commons category
The Machine-SUIF Control Flow Graph Library
*Paper

by Zdeněk Dvořák ''et al.'' ;Examples

Compiler construction * Application-specific graphs Modeling languages