Definite assignment analysis
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, definite assignment analysis is a
data-flow analysis In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
used by
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
s to conservatively ensure that a variable or location is always assigned before it is used.


Motivation

In C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
programs, a source of particularly difficult-to-diagnose errors is the nondeterministic behavior that results from reading
uninitialized variable In computing, an uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have ''some'' value, but not a predictable one. As such, it is a programming error and a common source of b ...
s; this behavior can vary between platforms, builds, and even from run to run. There are two common ways to solve this problem. One is to ensure that all locations are written before they are read.
Rice's theorem In computability theory, Rice's theorem states that all non-trivial semantic properties of programs are undecidable. A semantic property is one about the program's behavior (for instance, does the program terminate for all inputs), unlike a synta ...
establishes that this problem cannot be solved in general for all programs; however, it is possible to create a conservative (imprecise) analysis that will accept only programs that satisfy this constraint, while rejecting some correct programs, and definite assignment analysis is such an analysis. The
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
and C# programming language specifications require that the compiler report a compile-time error if the analysis fails. Both languages require a specific form of the analysis that is spelled out in meticulous detail. In Java, this analysis was formalized by Stärk et al., and some correct programs are rejected and must be altered to introduce explicit unnecessary assignments. In C#, this analysis was formalized by Fruja, and is precise as well as sound, in the sense that all variables assigned along all control flow paths will be considered definitely assigned. The Cyclone language also requires programs to pass a definite assignment analysis, but only on variables with pointer types, to ease porting of C programs. The second way to solve the problem is to automatically initialize all locations to some fixed, predictable value at the point at which they are defined, but this introduces new assignments that may impede performance. In this case, definite assignment analysis enables a compiler optimization where redundant assignments — assignments followed only by other assignments with no possible intervening reads — can be eliminated. In this case, no programs are rejected, but programs for which the analysis fails to recognize definite assignment may contain redundant initialization. The Common Language Infrastructure relies on this approach.


Terminology

A variable or location can be said to be in one of three states at any given point in the program: * ''Definitely assigned'': The variable is known with certainty to be assigned. * ''Definitely unassigned'': The variable is known with certainty to be unassigned. * ''Unknown'': The variable may be assigned or unassigned; the analysis is not precise enough to determine which.


The analysis

The following is based on Fruja's formalization of the C# intraprocedural (single method) definite assignment analysis, which is responsible for ensuring that all local variables are assigned before they are used. It simultaneously does definite assignment analysis and constant propagation of boolean values. We define five static functions: We supply data-flow equations that define the values of these functions on various expressions and statements, in terms of the values of the functions on their syntactic subexpressions. Assume for the moment that there are no ''goto'', ''break'', ''continue'', ''return'', or
exception handling In computing and computer programming, exception handling is the process of responding to the occurrence of ''exceptions'' – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, an ...
statements. Following are a few examples of these equations: * Any expression or statement ''e'' that does not affect the set of variables definitely assigned: ''after''(''e'') = ''before''(''e'') * Let ''e'' be the assignment expression ''loc'' = ''v''. Then ''before''(''v'') = ''before''(''e''), and ''after''(''e'') = ''after''(''v'') U . * Let ''e'' be the expression true. Then ''true''(''e'') = ''before''(''e'') and ''false''(''e'') = ''vars''(''e''). In other words, if ''e'' evaluates to false, all variables are ( vacuously) definitely assigned, because ''e'' does not evaluate to false. * Since method arguments are evaluated left to right, before(''arg''''i'' + 1) = after(''arg''''i''). After a method completes, ''out'' parameters are definitely assigned. * Let ''s'' be the conditional statement if (''e'') ''s''1 else ''s''2. Then ''before''(''e'') = ''before''(''s''), ''before''(s1) = ''true''(''e''), ''before''(''s''2) = ''false''(''e''), and after(''s'') = after(''s''1) intersect after(''s''2). * Let ''s'' be the while loop statement while (''e'') ''s''1. Then before(''e'') = before(''s''), before(''s''1) = true(''e''), and after(''s'') = false(''e''). * And so on. At the beginning of the method, no local variables are definitely assigned. The verifier repeatedly iterates over the
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurr ...
and uses the data-flow equations to migrate information between the sets until a fixed point can be reached. Then, the verifier examines the ''before'' set of every expression that uses a local variable to ensure that it contains that variable. The algorithm is complicated by the introduction of control-flow jumps like ''goto'', ''break'', ''continue'', ''return'', and exception handling. Any statement that can be the target of one of these jumps must intersect its ''before'' set with the set of definitely assigned variables at the jump source. When these are introduced, the resulting data flow may have multiple fixed points, as in this example: int i = 1; L: goto L; Since the label L can be reached from two locations, the control-flow equation for goto dictates that ''before''(2) = ''after''(1) intersect ''before''(3). But ''before''(3) = ''before''(2), so ''before''(2) = ''after''(1) intersect ''before''(2). This has two fixed-points for ''before''(2), and the empty set. However, it can be shown that because of the monotonic form of the data-flow equations, there is a unique maximal fixed point (fixed point of largest size) that provides the most possible information about the definitely assigned variables. Such a maximal (or maximum) fixed point may be computed by standard techniques; see
data-flow analysis In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
. An additional issue is that a control-flow jump may render certain control flows infeasible; for example, in this code fragment the variable ''i'' is definitely assigned before it is used: int i; if (j < 0) return; else i = j; print(i); The data-flow equation for ''if'' says that ''after''(2) = after(return) intersect after(''i'' = ''j''). To make this work out correctly, we define ''after''(''e'') = ''vars''(''e'') for all control-flow jumps; this is vacuously valid in the same sense that the equation ''false''(true) = ''vars''(''e'') is valid, because it is not possible for control to reach a point immediately after a control-flow jump.


References

{{DEFAULTSORT:Definite Assignment Analysis Data-flow analysis