program analysis In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program op ...

, shape analysis is a

static code analysis In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term i ...

technique that discovers and verifies properties of linked, dynamically allocated data structures in (usually imperative) computer programs. It is typically used at compile time to find software bugs or to verify high-level correctness properties of programs. In

Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...

programs, it can be used to ensure that a sort method correctly sorts a list. For C programs, it might look for places where a block of memory is not properly freed.

Applications

Shape analysis has been applied to a variety of problems: * Memory safety: finding

memory leak In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that Computer memory, memory which is no longer needed is not released. A memory leak may also happe ...

s, dereferences of

dangling pointer Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are ...

s, and discovering cases where a block of memory is freed more than once. * Finding array out-of-bounds errors * Checking type-state properties (for example, ensuring that a file is open() before it is read()) * Ensuring that a method to reverse a

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...

does not introduce cycles into the list * Verifying that a sort method returns a result that is in sorted order

Example

Shape analysis is a form of

pointer analysis In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex anal ...

, although it is more precise than typical pointer analysis. Pointer analysis attempts to determine the set of objects to which a pointer can point (called the points-to set of the pointer). Unfortunately, these analysis are necessarily approximate (since a perfectly precise static analysis could solve the

halting problem In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a g ...

). Shape analysis can determine smaller (more precise) points-to sets. Consider the following simple C++ program. Item *items 0 for (int i = 0; i < 10; ++i) process_items(items); // line for (int i = 0; i < 10; ++i) This program builds an array of objects, processes them in some arbitrary way, and then deletes them. Assuming that the process_items function is free of errors, it is clear that the program is safe: it never references freed memory, and it deletes all the objects that it has constructed. Unfortunately, most pointer analyses have difficulty analyzing this program precisely. In order to determine points-to sets, a pointer analysis must be able to ''name'' a program's objects. In general, programs can allocate an unbounded number of objects; but in order to terminate, a pointer analysis can only use a finite set of names. A typical approximation is to give all the objects allocated on a given line of the program the same name. In the example above, all the objects constructed at line would have the same name. Therefore, when the delete statement is analyzed for the first time, the analysis determines that one of the objects named is being deleted. The second time the statement is analyzed (since it is in a loop) the analysis warns of a possible error: since it is unable to distinguish the objects in the array, it may be that the second delete is deleting the same object as the first delete. This warning is spurious, and the goal of shape analysis is to avoid such warnings.

Summarization and materialization

Shape analysis overcomes the problems of pointer analysis by using a more flexible naming system for objects. Rather than giving an object the same name throughout a program, objects can change names depending on the program's actions. Sometimes, several distinct objects with different names may be ''summarized,'' or merged, so that they have the same name. Then, when a summarized object is about to be used by the program, it can be ''materialized''—that is, the summarized object is split into two objects with distinct names, one representing a single object and the other representing the remaining summarized objects. The basic heuristic of shape analysis is that objects that are being used by the program are represented using unique materialized objects, while objects not in use are summarized. The array of objects in the example above is summarized in separate ways at lines and At line the array has been only partly constructed. The array elements 0..i-1 contain constructed objects. The array element i is about to be constructed, and the following elements are uninitialized. Shape analysis can approximate this situation using a summary for the first set of elements, a materialized memory location for element i, and a summary for the remaining uninitialized locations, as follows: After the loop terminates, at line there is no need to keep anything materialized. The shape analysis determines at this point that all the array elements have been initialized: At line however, the array element i is in use again. Therefore, the analysis splits the array into three segments as in line This time, though, the first segment before i has been deleted, and the remaining elements are still valid (assuming the delete statement hasn't executed yet). Notice that in this case, the analysis recognizes that the pointer at index i has not been deleted yet. Therefore, it doesn't warn of a double deletion.

References

Bibliography

* * * {{Compiler optimizations Static program analysis Articles with example C++ code

Applications

Example

Summarization and materialization

See also

References

Bibliography