Abstract Interpretation
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on
monotonic function In mathematics, a monotonic function (or monotone function) is a function between ordered sets that preserves or reverses the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order ...
s over
ordered set In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary r ...
s, especially
lattice Lattice may refer to: Arts and design * Latticework, an ornamental criss-crossed framework, an arrangement of crossing laths or other thin strips of material * Lattice (music), an organized grid model of pitch ratios * Lattice (pastry), an orna ...
s. It can be viewed as a partial
execution Capital punishment, also known as the death penalty, is the state-sanctioned practice of deliberately killing a person as a punishment for an actual or supposed crime, usually following an authorized, rule-governed process to conclude that ...
of a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
which gains information about its semantics (e.g.,
control-flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''imp ...
, data-flow) without performing all the calculations. Its main concrete application is formal
static analysis Static analysis, static projection, or static scoring is a simplified analysis wherein the effect of an immediate change to a system is calculated without regard to the longer-term response of the system to that change. If the short-term effect i ...
, the automatic extraction of information about the possible executions of computer programs; such analyses have two main usages: * inside
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
s, to analyse programs to decide whether certain
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
s or
transformation Transformation may refer to: Science and mathematics In biology and medicine * Metamorphosis, the biological process of changing physical form after birth or hatching * Malignant transformation, the process of cells becoming cancerous * Tran ...
s are applicable; * for debugging or even the certification of programs against classes of bugs. Abstract interpretation was formalized by the French computer scientist working couple
Patrick Cousot Patrick Cousot (born 3 December 1948) is a French computer scientist, currently Silver Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University, USA. Before he was Professor at the École Normale Supéri ...
and
Radhia Cousot Radhia Cousot (6 August 1947 – 1 May 2014) was a Tunisian French computer scientist known for inventing abstract interpretation. Studies Radhia Cousot was born on 6 August 1947, in Sakiet Sidi Youssef in Tunisia, where she survived the :fr:B ...
in the late 1970s.


Intuition

This section illustrates abstract interpretation by means of real-world, non-computing examples. Consider the people in a conference room. Assume a unique identifier for each person in the room, like a social security number in the United States. To prove that someone is not present, all one needs to do is see if their social security number is not on the list. Since two different people cannot have the same number, it is possible to prove or disprove the presence of a participant simply by looking up his or her number. However it is possible that only the names of attendees were registered. If the name of a person is not found in the list, we may safely conclude that that person was not present; but if it is, we cannot conclude definitely without further inquiries, due to the possibility of
homonym In linguistics, homonyms are words which are homographs (words that share the same spelling, regardless of pronunciation), or homophones ( equivocal words, that share the same pronunciation, regardless of spelling), or both. Using this definiti ...
s (for example, two people named John Smith). Note that this imprecise information will still be adequate for most purposes, because homonyms are rare in practice. However, in all rigor, we cannot say for sure that somebody was present in the room; all we can say is that he or she was ''possibly'' here. If the person we are looking up is a criminal, we will issue an ''alarm''; but there is of course the possibility of issuing a ''false alarm''. Similar phenomena will occur in the analysis of programs. If we are only interested in some specific information, say, "was there a person of age ''n'' in the room?", keeping a list of all names and dates of births is unnecessary. We may safely and without loss of precision restrict ourselves to keeping a list of the participants' ages. If this is already too much to handle, we might keep only the age of the youngest, ''m'' and oldest person, ''M''. If the question is about an age strictly lower than ''m'' or strictly higher than ''M'', then we may safely respond that no such participant was present. Otherwise, we may only be able to say that we do not know. In the case of computing, concrete, precise information is in general not computable within finite time and memory (see
Rice's theorem In computability theory, Rice's theorem states that all non-trivial semantic properties of programs are undecidable. A semantic property is one about the program's behavior (for instance, does the program terminate for all inputs), unlike a synta ...
and the
halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a ...
). Abstraction is used to allow for generalized answers to questions (for example, answering "maybe" to a yes/no question, meaning "yes or no", when we (an algorithm of abstract interpretation) cannot compute the precise answer with certainty); this simplifies the problems, making them amenable to automatic solutions. One crucial requirement is to add enough vagueness so as to make problems manageable while still retaining enough precision for answering the important questions (such as "might the program crash?").


Abstract interpretation of computer programs

Given a programming or specification language, abstract interpretation consists of giving several semantics linked by relations of abstraction. A semantics is a mathematical characterization of a possible behavior of the program. The most precise semantics, describing very closely the actual execution of the program, are called the ''concrete semantics''. For instance, the concrete semantics of an
imperative programming In computer science, imperative programming is a programming paradigm of software that uses statements that change a program's state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program ...
language may associate to each program the set of execution traces it may produce – an execution trace being a sequence of possible consecutive states of the execution of the program; a state typically consists of the value of the program counter and the memory locations (globals, stack and heap). More abstract semantics are then derived; for instance, one may consider only the set of reachable states in the executions (which amounts to considering the last states in finite traces). The goal of static analysis is to derive a computable semantic interpretation at some point. For instance, one may choose to represent the state of a program manipulating integer variables by forgetting the actual values of the variables and only keeping their signs (+, − or 0). For some elementary operations, such as multiplication, such an abstraction does not lose any precision: to get the sign of a product, it is sufficient to know the sign of the operands. For some other operations, the abstraction may lose precision: for instance, it is impossible to know the sign of a sum whose operands are respectively positive and negative. Sometimes a loss of precision is necessary to make the semantics decidable (see
Rice's theorem In computability theory, Rice's theorem states that all non-trivial semantic properties of programs are undecidable. A semantic property is one about the program's behavior (for instance, does the program terminate for all inputs), unlike a synta ...
and the
halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a ...
). In general, there is a compromise to be made between the precision of the analysis and its decidability (
computability Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science. The computability of a problem is clo ...
), or tractability (
computational cost In computational complexity theory, a computational resource is a resource used by some computational models in the solution of computational problems. The simplest computational resources are computation time, the number of steps necessary to s ...
). In practice the abstractions that are defined are tailored to both the program properties one desires to analyze, and to the set of target programs. The first large scale automated analysis of computer programs with abstract interpretation was motivated by the accident that resulted in the destruction of the first flight of the Ariane 5 rocket in 1996.


Formalization

Let ''L'' be an ordered set, called a ''concrete set'' and let ''L''′ be another ordered set, called an ''abstract set''. These two sets are related to each other by defining
total function In mathematics, a partial function from a set to a set is a function from a subset of (possibly itself) to . The subset , that is, the domain of viewed as a function, is called the domain of definition of . If equals , that is, if is de ...
s that map elements from one to the other. A function α is called an ''abstraction function'' if it maps an element ''x'' in the concrete set ''L'' to an element α(''x'') in the abstract set ''L''′. That is, element α(''x'') in ''L''′ is the ''abstraction'' of ''x'' in ''L''. A function γ is called a ''concretization function'' if it maps an element ''x''′ in the abstract set ''L''′ to an element γ(''x''′) in the concrete set ''L''. That is, element γ(''x''′) in ''L'' is a ''concretization'' of ''x''′ in ''L''′. Let ''L''1, ''L''2, ''L''′1 and ''L''′2 be ordered sets. The concrete semantics ''f'' is a monotonic function from ''L''1 to ''L''2. A function ''f''′ from ''L''′1 to ''L''′2 is said to be a ''valid abstraction'' of ''f'' if for all ''x''′ in ''L''′1, (''f'' ∘ γ)(''x''′) ≤ (γ ∘ ''f''′)(''x''′). Program semantics are generally described using fixed points in the presence of loops or recursive procedures. Let us suppose that ''L'' is a complete lattice and let ''f'' be a monotonic function from ''L'' into ''L''. Then, any ''x''′ such that ''f''(''x''′) ≤ ''x''′ is an abstraction of the least fixed-point of ''f'', which exists, according to the Knaster–Tarski theorem. The difficulty is now to obtain such an ''x''′. If ''L''′ is of finite height, or at least verifies the
ascending chain condition In mathematics, the ascending chain condition (ACC) and descending chain condition (DCC) are finiteness properties satisfied by some algebraic structures, most importantly ideals in certain commutative rings.Jacobson (2009), p. 142 and 147 These con ...
(all ascending sequences are ultimately stationary), then such an ''x''′ may be obtained as the stationary limit of the ascending sequence ''x''′''n'' defined by induction as follows: ''x''′0=⊥ (the least element of ''L''′) and ''x''′''n''+1=''f''′(''x''′''n''). In other cases, it is still possible to obtain such an ''x''′ through a widening operator ∇: for all ''x'' and ''y'', ''x'' ∇ ''y'' should be greater or equal than both ''x'' and ''y'', and for any sequence ''y''′''n'', the sequence defined by ''x''′0=⊥ and ''x''′''n''+1=''x''′''n'' ∇ ''y''′''n'' is ultimately stationary. We can then take ''y''′''n''=''f''′(''x''′''n''). In some cases, it is possible to define abstractions using Galois connections (α, γ) where α is from ''L'' to ''L''′ and γ is from ''L''′ to ''L''. This supposes the existence of best abstractions, which is not necessarily the case. For instance, if we abstract sets of couples (''x'', ''y'') of
real number In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every ...
s by enclosing convex
polyhedra In geometry, a polyhedron (plural polyhedra or polyhedrons; ) is a three-dimensional shape with flat polygonal faces, straight edges and sharp corners or vertices. A convex polyhedron is the convex hull of finitely many points, not all on ...
, there is no optimal abstraction to the disc defined by ''x''2+''y''2 ≤ 1.


Examples of abstract domains


Numerical abstract domains

One can assign to each variable ''x'' available at a given program point an interval 'L''''x'', ''H''''x'' A state assigning the value ''v''(''x'') to variable ''x'' will be a concretization of these intervals if for all ''x'', ''v''(''x'') is in 'L''''x'', ''H''''x'' From the intervals 'L''''x'', ''H''''x''and 'L''''y'', ''H''''y''for variables ''x'' and ''y'', one can easily obtain intervals for ''x''+''y'' ( 'L''''x''+''L''''y'', ''H''''x''+''H''''y'' and for ''x''−''y'' ( 'L''''x''−''H''''y'', ''H''''x''−''L''''y''; note that these are ''exact'' abstractions, since the set of possible outcomes for, say, ''x''+''y'', is precisely the interval ( 'L''''x''+''L''''y'', ''H''''x''+''H''''y''. More complex formulas can be derived for multiplication, division, etc., yielding so-called
interval arithmetic Interval arithmetic (also known as interval mathematics, interval analysis, or interval computation) is a mathematical technique used to put bounds on rounding errors and measurement errors in mathematical computation. Numerical methods usin ...
s. Let us now consider the following very simple program:
y = x;
z = x - y;
With reasonable arithmetic types, the result for should be zero. But if we do interval arithmetic starting from in
, 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
one gets in ˆ’1, +1 While each of the operations taken individually was exactly abstracted, their composition isn't. The problem is evident: we did not keep track of the equality relationship between and ; actually, this domain of intervals does not take into account any relationships between variables, and is thus a ''non-relational domain''. Non-relational domains tend to be fast and simple to implement, but imprecise. Some examples of ''relational'' numerical abstract domains are: * congruence relations on integers * convex
polyhedra In geometry, a polyhedron (plural polyhedra or polyhedrons; ) is a three-dimensional shape with flat polygonal faces, straight edges and sharp corners or vertices. A convex polyhedron is the convex hull of finitely many points, not all on ...
(cf. left picture) – with some high computational costs * difference-bound matrices * "octagons" * linear equalities and combinations thereof (such as the reduced product, cf. right picture). When one chooses an abstract domain, one typically has to strike a balance between keeping fine-grained relationships, and high computational costs.


Machine word abstract domains

While high-level languages such as
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
or
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
use unbounded integers by default, lower-level programming languages such as C or assembly language typically operate on finitely-sized machine words, which are more suitably modeled using the integers modulo 2^n (where ''n'' is the bit width of a machine word). There are several abstract domains suitable for various analyses of such variables. The ''bitfield domain'' treats each bit in a machine word separately, i.e., a word of width ''n'' is treated as an array of ''n'' abstract values. The abstract values are taken from the set \, and the abstraction and concretization functions are given by: :\gamma(0) = \ :\gamma(1) = \ :\gamma(\bot) = \ :\alpha(\) = 0 :\alpha(\) = 1 :\alpha(\) = \bot :\alpha(\) = \bot Bitwise operations on these abstract values are identical with the corresponding logical operations in some three-valued logics:


See also

* Model checking * Symbolic simulation * Symbolic execution *
List of tools for static code analysis This is a list of notable tools for static program analysis (program analysis is a synonym for code analysis). Static code analysis tools Languages Ada * * * * * * * * * * * C, C++ * * * * * * * * * * * * ...
— contains both abstract-interpretation based (sound) and ad hoc (unsound) tools *
Static program analysis In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term ...
— overview of analysis methods, including, but not restricted to, abstract interpretation *
Interpreter (computing) In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpr ...


References

{{Reflist, 40em


External links


A web-page on Abstract Interpretation maintained by Patrick Cousot

Roberto Bagnara's paper showing how it is possible to implement an abstract-interpretation based static analyzer for a C-like programming language

The Static Analysis Symposiaproceedings
appearing in the Springer
LNCS ''Lecture Notes in Computer Science'' is a series of computer science books published by Springer Science+Business Media since 1973. Overview The series contains proceedings, post-proceedings, monographs, and Festschrifts. In addition, tutorials, ...
series * Conference on Verification, Model-Checking, and Abstract Interpretation (VMCAI), affiliated at the POPL conference
proceedings
appearing in the Springer
LNCS ''Lecture Notes in Computer Science'' is a series of computer science books published by Springer Science+Business Media since 1973. Overview The series contains proceedings, post-proceedings, monographs, and Festschrifts. In addition, tutorials, ...
series ; Lecture notes
Abstract Interpretation
Patrick Cousot. MIT.


Møller and Schwarzbach's lecture notes on Static Program Analysis

Agostino Cortesi's lecture notes on Program Analysis and Verification

Slides by Grégoire Sutre going through every step of Abstract Interpretation with many examples - also introducing Galois connections
Abstract interpretation