Soot (software)
   HOME

TheInfoList



OR:

In
static program analysis In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term ...
, Soot is a bytecode manipulation and optimization framework consisting of
intermediate language An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A " ...
s for
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
. It has been developed by the Sable Research Group at
McGill University McGill University (french: link=no, Université McGill) is an English-language public research university located in Montreal, Quebec Montreal ( ; officially Montréal, ) is the second-most populous city in Canada and most populous ...
. Soot is currently maintained by the Secure Software Engineering Group at
Paderborn University Paderborn University (german: Universität Paderborn) is one of the fourteen public research universities in the state of North Rhine-Westphalia in Germany. It was founded in 1972 and 20,308 students were enrolled at the university in the winter ...
. Soot provides four
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
s for use through its
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
for other analysis programs to access and build upon: * Baf: a near
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
representation. * Jimple: a simplified version of Java source code that has a maximum of three components per statement. * Shimple: an SSA variation of Jimple (similar to
GIMPLE The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
). * Grimp: an aggregated version of Jimple suitable for
decompilation A decompiler is a computer program that translates an executable file to a high-level source file which can be recompiled successfully. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level la ...
and code inspection. The current Soot software release also contains detailed program analyses that can be used out-of-the-box, such as context-sensitive flow-insensitive points-to analysis,
call graph A call graph (also known as a call multigraph) is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge ''(f, g)'' indicates that procedure ''f'' cal ...
analysis and domination analysis (answering the question "must event ''a'' follow event ''b''?"). It also has a decompiler called dava. Soot is
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...
available under the
GNU Lesser General Public License The GNU Lesser General Public License (LGPL) is a free-software license published by the Free Software Foundation (FSF). The license allows developers and companies to use and integrate a software component released under the LGPL into their own ...
(LGPL). In 2010, two research papers on Soot ( and ) were selected as IBM '' CASCON First Decade High Impact Papers'' among 12 other papers from the 425 entries.


Jimple

Jimple is an
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
of a
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
program designed to be easier to optimize than
Java bytecode In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming langua ...
. It is typed, has a concrete syntax and is based on
three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is ty ...
. Jimple includes only 15 different operations, thus simplifying flow analysis. By contrast, java bytecode includes over 200 different operations. Unlike java bytecode, in Jimple local and stack variables are typed and Jimple is inherently type safe. Converting to Jimple, or "Jimplifying" (after "simplifying"), is conversion of bytecode to three-address code. The idea behind the conversion, first investigated by Clark Verbrugge, is to associate a variable to each position in the stack. Hence stack operations become assignments involving the stack variables.


Example

Consider the following bytecode, which is from the
iload 1  // load variable x1, and push it on the stack
iload 2  // load variable x2, and push it on the stack
iadd     // pop two values, and push their sum on the stack
istore 1 // pop a value from the stack, and store it in variable x1
The above translates to the following three-address code:
stack1 = x1 // iload 1
stack2 = x2 // iload 2
stack1 = stack1 + stack2 // iadd
x1 = stack1 // istore 1
In general the resulting code does not have
static single assignment form In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an intermediate representation (IR) that requires each variable to be assigned exactly once and defined before it is used. Existing var ...
.


SootUp

Soot is now succeeded by the SootUp framework developed by the Secure Software Engineering Group at
Paderborn University Paderborn University (german: Universität Paderborn) is one of the fourteen public research universities in the state of North Rhine-Westphalia in Germany. It was founded in 1972 and 20,308 students were enrolled at the university in the winter ...
. SootUp is a complete reimplementation of Soot with a novel design, that focuses more on static program analysis, rather than bytecode optimization.


References


Further reading

* Republished in * Republished in *


External links

* {{official website, https://soot-oss.github.io/soot/
Scientific publications citing Soot
(on
Google Scholar Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. Released in beta in November 2004, the Google Scholar index includes ...
) Free software programmed in Java (programming language) Free computer programming tools Static program analysis tools McGill University