HOME

TheInfoList



OR:

An abstract syntax tree (AST) is a data structure used in
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
to represent the structure of a program or code snippet. It is a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
representation of the abstract syntactic structure of text (often
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
) written in a
formal language In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet". The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree. The syntax is "abstract" in the sense that it does not represent every detail appearing in the real syntax, but rather just the structural or content-related details. For instance, grouping parentheses are implicit in the tree structure, so these do not have to be represented as separate nodes. Likewise, a syntactic construct like an if-condition-then statement may be denoted by means of a single node with three branches. This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated
parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
s. Parse trees are typically built by a
parser Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term '' ...
during the source code translation and
compiling In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
process. Once built, additional information is added to the AST by means of subsequent processing, e.g., contextual analysis. Abstract syntax trees are also used in
program analysis In computer science, program analysis is the process of analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization an ...
and
program transformation A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular Formal semantics of p ...
systems.


Application in compilers

Abstract syntax trees are
data structures In computer science, a data structure is a data organization and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functi ...
widely used in
compilers In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
to represent the structure of program code. An AST is usually the result of the syntax analysis phase of a compiler. It often serves as an intermediate representation of the program through several stages that the compiler requires, and has a strong impact on the final output of the compiler.


Motivation

An AST has several properties that aid the further steps of the compilation process: * An AST can be edited and enhanced with information such as properties and annotations for every element it contains. Such editing and annotation is impossible with the source code of a program, since it would imply changing it. * Compared to the
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
, an AST does not include inessential punctuation and delimiters (braces, semicolons, parentheses, etc.). * An AST usually contains extra information about the program, due to the consecutive stages of analysis by the compiler. For example, it may store the position of each element in the source code, allowing the compiler to print useful error messages. Languages are often ambiguous by nature. In order to avoid this ambiguity, programming languages are often specified as a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
(CFG). However, there are often aspects of programming languages that a CFG can't express, but are part of the language and are documented in its specification. These are details that require a context to determine their validity and behaviour. For example, if a language allows new types to be declared, a CFG cannot predict the names of such types nor the way in which they should be used. Even if a language has a predefined set of types, enforcing proper usage usually requires some context. Another example is
duck typing In computer programming, duck typing is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose. With nominative ...
, where the type of an element can change depending on context.
Operator overloading In computer programming, operator overloading, sometimes termed ''operator ad hoc polymorphism'', is a specific case of polymorphism, where different operators have different implementations depending on their arguments. Operator overloading ...
is yet another case where correct usage and final function are context-dependent.


Design

The design of an AST is often closely linked with the design of a compiler and its expected features. Core requirements include the following: * Variable types must be preserved, as well as the location of each declaration in source code. * The order of executable statements must be explicitly represented and well defined. * Left and right components of binary operations must be stored and correctly identified. * Identifiers and their assigned values must be stored for assignment statements. These requirements can be used to design the data structure for the AST. Some operations will always require two elements, such as the two terms for addition. However, some language constructs require an arbitrarily large number of children, such as argument lists passed to programs from the
command shell An operating system shell is a computer program that provides relatively broad and direct access to the system on which it runs. The term ''shell'' refers to how it is a relatively thin layer around an operating system. A shell is generally a ...
. As a result, an AST used to represent code written in such a language has to also be flexible enough to allow for quick addition of an unknown quantity of children. To support compiler verification it should be possible to unparse an AST into source code form. The source code produced should be sufficiently similar to the original in appearance and identical in execution, upon recompilation. The AST is used intensively during semantic analysis, where the compiler checks for correct usage of the elements of the program and the language. The compiler also generates symbol tables based on the AST during semantic analysis. A complete traversal of the tree allows verification of the correctness of the program. After verifying correctness, the AST serves as the base for code generation. The AST is often used to generate an intermediate representation (IR), sometimes called an
intermediate language An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
, for the code generation.


Other usages


AST differencing

AST differencing, or for short tree differencing, consists of computing the list of differences between two ASTs. This list of differences is typically called an edit script. The edit script directly refers to the AST of the code. For instance, an edit action may result in the addition of a new AST node representing a function.


Clone detection

An AST is a powerful abstraction to perform code clone detection.


See also

* Abstract semantic graph (ASG), also called ''term graph'' *
Composite pattern In software engineering, the composite pattern is a partitioning design pattern (computer science), design pattern. The composite pattern describes a group of objects that are treated the same way as a single instance of the same type of object. Th ...
* Control-flow graph *
Directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ...
(DAG) *
Document Object Model The Document Object Model (DOM) is a cros s-platform and language-independent API that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with ...
(DOM) * Expression tree *
Extended Backus–Naur form Extension, extend or extended may refer to: Mathematics Logic or set theory * Axiom of extensionality * Extensible cardinal * Extension (model theory) * Extension (proof theory) * Extension (predicate logic), the set of tuples of values ...
*
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
, a family of languages written in trees, with macros to manipulate code trees *
Parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
, also known as ''concrete syntax tree'' * Semantic resolution tree (SRT) * Shunting-yard algorithm * Symbol table * TreeDL * Abstract Syntax Tree Interpreters


References


Further reading

* (overview of AST implementation in various language families) * * *


External links


AST View
an
Eclipse An eclipse is an astronomical event which occurs when an astronomical object or spacecraft is temporarily obscured, by passing into the shadow of another body or by having another body pass between it and the viewer. This alignment of three ...
plugin to visualize a
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
abstract syntax tree * *
eli project
Abstract Syntax Tree Unparsing * ( OMG standard).
JavaParser
The JavaParser library provides you with an Abstract Syntax Tree of your Java code. The AST structure then allows you to work with your Java code in an easy programmatic way.
Spoon
A library to analyze, transform, rewrite, and transpile Java source code. It parses source files to build a well-designed AST with powerful analysis and transformation API.
AST Explorer
A website to help visualize ASTs in several popular languages such as Go, Python, Java, and JavaScript. {{DEFAULTSORT:Abstract Syntax Tree Trees (data structures) Formal languages