In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term is usually applied to analysis performed by an automated tool, with human analysis typically being called "program understanding", program comprehension, or code review. In the last of these, software inspection and software walkthroughs are also used. In most cases the analysis is performed on some version of a program's source code, and, in other cases, on some form of its object code.
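As a small illustration of analysing source code without running it, the following sketch uses Python's standard `ast` module to parse a program and report calls to `eval` or `exec`. The sample program, the helper name `find_calls`, and the choice of flagged names are assumptions made for this example; they do not correspond to any particular tool.

```python
import ast

# Hypothetical program under analysis; it is parsed, never executed.
SOURCE = '''
import os

def run(cmd):
    os.system(cmd)
    return eval(cmd)
'''

def find_calls(source, names):
    """Return (line, name) for every call to a bare function name in `names`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # Only plain names such as eval(...) are matched, not attribute
        # calls such as os.system(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in names:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)

findings = find_calls(SOURCE, {"eval", "exec"})
for line, name in findings:
    print(f"line {line}: call to {name}()")
```

Because the checker only inspects the syntax tree, it reports the `eval` call even though `run` is never invoked. A real analyzer would also need to handle aliases and indirect calls, which this sketch ignores.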

Rationale

The sophistication of the analysis performed by tools varies from those that only consider the behaviour of individual statements and declarations, to those that include the complete source code of a program in their analysis. The uses of the information obtained from the analysis vary from highlighting possible coding errors (e.g., the lint tool) to formal methods that mathematically prove properties about a given program (e.g., its behaviour matches that of its specification).

Software metrics and reverse engineering can be described as forms of static analysis. Deriving software metrics and static analysis are increasingly deployed together, especially in creation of embedded systems, by defining so-called ''software quality objectives''.

A growing commercial use of static analysis is in the verification of properties of software used in safety-critical computer systems and locating potentially vulnerable code. For example, the following industries have identified the use of static code analysis as a means of improving the quality of increasingly sophisticated and complex software:

# Medical software: The US Food and Drug Administration (FDA) has identified the use of static analysis for medical devices.
# Nuclear software: In the UK the Office for Nuclear Regulation (ONR) recommends the use of static analysis on reactor protection systems.
# Aviation software (in combination with dynamic analysis).
# Automotive and machines: Functional safety features form an integral part of each automotive product development phase (ISO 26262, section 8).

A study in 2012 by VDC Research reported that 28.7% of the embedded software engineers surveyed were using static analysis tools and 39.7% expected to use them within two years. A study from 2010 found that 60% of the interviewed developers in European research projects used at least the static analyzers built into their IDE. However, only about 10% employed an additional (and perhaps more advanced) analysis tool.

In the application security industry the name static application security testing (SAST) is also used. SAST is an important part of Security Development Lifecycles (SDLs) such as the SDL defined by Microsoft and a common practice in software companies.
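The software metrics mentioned earlier in this section are typically derived from the source text alone. As a rough sketch, the code below approximates a cyclomatic-complexity-style count for each function by adding one for every decision point found in its syntax tree. The set of node types counted, the function name `approx_complexity`, and the sample program are assumptions of this example, not a standard definition.

```python
import ast

# Node types counted as decision points (an illustrative choice).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def approx_complexity(source):
    """Map each function name to 1 + the number of decision points inside it."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            count = 1
            for child in ast.walk(node):
                if isinstance(child, DECISION_NODES):
                    count += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two short-circuit branches
                    count += len(child.values) - 1
            result[node.name] = count
    return result

# Hypothetical code under measurement.
SAMPLE = '''
def classify(n):
    if n < 0 or n > 100:
        return "out of range"
    for d in (2, 3):
        if n % d == 0:
            return "divisible"
    return "other"
'''
metrics = approx_complexity(SAMPLE)
print(metrics)
```

For the sample, the count is 5: the base path, two `if` statements, one `for` loop, and one `or`. A quality objective could then be expressed as a threshold on such a number, although the threshold itself is a project decision.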

Tool types

The OMG (Object Management Group) published a study regarding the types of software analysis required for software quality measurement and assessment. This document on "How to Deliver Resilient, Secure, Efficient, and Easily Changed IT Systems in Line with CISQ Recommendations" describes three levels of software analysis.

; Unit Level: Analysis that takes place within a specific program or subroutine, without connecting to the context of that program.
; Technology Level: Analysis that takes into account interactions between unit programs to get a more holistic and semantic view of the overall program in order to find issues and avoid obvious false positives. For instance, it is possible to statically analyze the Android technology stack to find permission errors.
; System Level: Analysis that takes into account the interactions between unit programs, but without being limited to one specific technology or programming language.

A further level of software analysis can be defined.

; Mission/Business Level: Analysis that takes into account the business/mission layer terms, rules and processes that are implemented within the software system for its operation as part of enterprise or program/mission layer activities. These elements are implemented without being limited to one specific technology or programming language and in many cases are distributed across multiple languages, but are statically extracted and analyzed for system understanding for mission assurance.

Formal methods

Formal methods is the term applied to the analysis of software (and computer hardware) whose results are obtained purely through the use of rigorous mathematical methods. The mathematical techniques used include denotational semantics, axiomatic semantics, operational semantics, and abstract interpretation.

By a straightforward reduction to the halting problem, it is possible to prove that (for any Turing complete language) finding all possible run-time errors in an arbitrary program (or more generally any kind of violation of a specification on the final result of a program) is undecidable: there is no mechanical method that can always answer truthfully whether an arbitrary program may or may not exhibit runtime errors. This result dates from the works of Church, Gödel and Turing in the 1930s (see: Halting problem and Rice's theorem). As with many undecidable questions, one can still attempt to give useful approximate solutions.

Some of the implementation techniques of formal static analysis include:

* Abstract interpretation, to model the effect that every statement has on the state of an abstract machine (i.e., it 'executes' the software based on the mathematical properties of each statement and declaration). This abstract machine over-approximates the behaviours of the system: the abstract system is thus made simpler to analyze, at the expense of ''incompleteness'' (not every property true of the original system is true of the abstract system). If properly done, though, abstract interpretation is ''sound'' (every property true of the abstract system can be mapped to a true property of the original system).
* Data-flow analysis, a lattice-based technique for gathering information about the possible set of values.
* Hoare logic, a formal system with a set of logical rules for reasoning rigorously about the correctness of computer programs. There is tool support for some programming languages, e.g., the SPARK programming language (a subset of Ada), the Java Modeling Language (JML) using ESC/Java and ESC/Java2, and the Frama-C WP (weakest precondition) plugin for the C language extended with ACSL (ANSI/ISO C Specification Language).
* Model checking, which considers systems that have finite state or may be reduced to finite state by abstraction.
* Symbolic execution, as used to derive mathematical expressions representing the value of mutated variables at particular points in the code.
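The trade-off between soundness and incompleteness described for abstract interpretation can be seen in a very small example. The sketch below evaluates arithmetic expressions over a sign domain (negative, zero, positive, or unknown) instead of over concrete numbers. The domain, the helper names, and the restriction to `+`, `-`, `*` and unary minus are assumptions chosen to keep the example short.

```python
import ast

# Abstract values of the sign domain; TOP means "sign unknown".
NEG, ZERO, POS, TOP = "neg", "zero", "pos", "top"

def sign_of_const(value):
    return ZERO if value == 0 else (POS if value > 0 else NEG)

def negate(a):
    return {NEG: POS, POS: NEG}.get(a, a)

def abs_add(a, b):
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP  # e.g. pos + neg may have any sign

def abs_mul(a, b):
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG

def abstract_eval(expr, env):
    """Evaluate an arithmetic expression over signs instead of numbers."""
    def go(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return sign_of_const(node.value)
        if isinstance(node, ast.Name):
            return env.get(node.id, TOP)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return negate(go(node.operand))
        if isinstance(node, ast.BinOp):
            left, right = go(node.left), go(node.right)
            if isinstance(node.op, ast.Add):
                return abs_add(left, right)
            if isinstance(node.op, ast.Sub):
                return abs_add(left, negate(right))
            if isinstance(node.op, ast.Mult):
                return abs_mul(left, right)
        return TOP  # anything not modelled is over-approximated
    return go(ast.parse(expr, mode="eval").body)

env = {"x": POS, "y": NEG}           # assumed facts about the inputs
print(abstract_eval("x * x + 1", env))
print(abstract_eval("x - y", env))
print(abstract_eval("x + y", env))
```

With `x` positive and `y` negative, the analysis establishes that `x * x + 1` and `x - y` are positive for every possible input, without running the program on any of them. For `x + y` it can only answer "unknown", even though a particular pair of inputs does have a definite sign: the result is still sound, but precision has been lost.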

Data-driven static analysis

Data-driven static analysis uses large amounts of code to infer coding rules. For instance, one can use all Java open-source packages on GitHub to learn a good analysis strategy. The rule inference can use machine learning techniques. For instance, it has been shown that code deviating markedly from the usual way of using an object-oriented API is likely to contain a bug. It is also possible to learn from a large amount of past fixes and warnings.
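The idea of inferring rules from existing code can be sketched with simple frequency counting rather than machine learning. In the example below, each entry of a toy corpus is the list of API calls made by one function. A rule "code that calls `a` also calls `b`" is kept when it holds often enough, and new code is checked against the learned rules. The corpus, the API names, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import permutations

def learn_rules(corpus, min_support=3, min_confidence=0.8):
    """Infer rules (a, b) meaning 'code that calls a also calls b'."""
    single = Counter()
    pair = Counter()
    for calls in corpus:
        unique = set(calls)
        single.update(unique)
        # Count every ordered pair of distinct calls seen together.
        pair.update(permutations(sorted(unique), 2))
    return {
        (a, b)
        for (a, b), n in pair.items()
        if single[a] >= min_support and n / single[a] >= min_confidence
    }

def check(calls, rules):
    """Report learned rules that a new call set violates."""
    present = set(calls)
    return sorted((a, b) for a, b in rules if a in present and b not in present)

# Toy corpus standing in for a large body of open-source code.
CORPUS = [
    ["lock", "read", "unlock"],
    ["lock", "write", "unlock"],
    ["lock", "read", "write", "unlock"],
    ["lock", "unlock"],
    ["read"],
]
rules = learn_rules(CORPUS)
warnings = check(["lock", "write"], rules)
print(warnings)
```

Here the corpus always pairs `lock` with `unlock`, so a function that calls `lock` without `unlock` is flagged as a deviation. Nobody wrote that rule by hand, which is the point of the approach; it also means that unusual but correct code can be reported.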

Remediation

Static analyzers produce warnings. For certain types of warnings, it is possible to design and implement automated remediation techniques. For example, Logozzo and Ball have proposed automated remediations for C# ''cccheck'' and Etemadi and colleagues use program transformation to automatically fix SonarQube's warnings.
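A minimal sketch of remediation by program transformation is shown below. It is not the technique of either cited work: it simply rewrites one kind of finding, comparisons with `None` using `==` or `!=`, into the identity tests `is` and `is not`, using Python's `ast.NodeTransformer`. The class and function names are assumptions of this example, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

class NoneComparisonFixer(ast.NodeTransformer):
    """Rewrite `x == None` / `x != None` into `x is None` / `x is not None`."""

    def __init__(self):
        self.fixed_lines = []

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, (op, right) in enumerate(zip(node.ops, node.comparators)):
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_none and isinstance(op, ast.Eq):
                node.ops[i] = ast.Is()
                self.fixed_lines.append(node.lineno)
            elif is_none and isinstance(op, ast.NotEq):
                node.ops[i] = ast.IsNot()
                self.fixed_lines.append(node.lineno)
        return node

def remediate(source):
    """Return the rewritten source and the lines where a fix was applied."""
    tree = ast.parse(source)
    fixer = NoneComparisonFixer()
    tree = fixer.visit(tree)
    return ast.unparse(tree), fixer.fixed_lines

# `retry` is a hypothetical function; the snippet is only parsed, not run.
fixed, lines = remediate("if result == None:\n    retry()\n")
print(fixed)
```

The rewrite changes behaviour for objects that override equality, so a practical tool would apply it only where that is known to be safe, or present it as a suggestion. Regenerating code from the syntax tree also discards comments and original formatting, which is one reason production tools tend to patch the source text instead.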
