In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a symbol table is a
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
used by a language
translator
Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''trans ...
such as a
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
or
interpreter
Interpreting is translation from a spoken or signed language into another language, usually in real time to facilitate live communication. It is distinguished from the translation of a written text, which can be more deliberative and make use o ...
, where each
identifier
An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique ''class'' of objects, where the "object" or class may be an idea, person, physical countable object (or class thereof), or physical mass ...
,
symbol
A symbol is a mark, Sign (semiotics), sign, or word that indicates, signifies, or is understood as representing an idea, physical object, object, or wikt:relationship, relationship. Symbols allow people to go beyond what is known or seen by cr ...
,
constant,
procedure and
function in a program's
source code
In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer.
Since a computer, at base, only ...
is associated with information relating to its
declaration or appearance in the source. In other words, the entries of a symbol table store the information related to the entry's corresponding symbol.
Background
A symbol table may only exist in memory during the translation process, or it may be embedded in the output of the translation, such as in an
ABI object file
An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is kno ...
for later use. For example, it might be used during an interactive
debugging session, or as a resource for formatting a diagnostic report during or after
execution
Capital punishment, also known as the death penalty and formerly called judicial homicide, is the state-sanctioned killing of a person as punishment for actual or supposed misconduct. The sentence ordering that an offender be punished in ...
of a program.
Description
The minimum information contained in a symbol table used by a translator and
intermediate representation
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR) includes the symbol's name and its location or address. For a compiler targeting a platform with a concept of
relocatability, it will also contain relocatability attributes (absolute, relocatable, etc.) and needed relocation information for relocatable symbols. Symbol tables for
high-level programming language
A high-level programming language is a programming language with strong Abstraction (computer science), abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be ea ...
s may store the symbol's type: string, integer, floating-point, etc., its size, and its dimensions and its bounds. Not all of this information is included in the output file, but may be provided for use in
debugging
In engineering, debugging is the process of finding the Root cause analysis, root cause, workarounds, and possible fixes for bug (engineering), bugs.
For software, debugging tactics can involve interactive debugging, control flow analysis, Logf ...
. In many cases, the symbol's
cross-reference
The term cross-reference (abbreviation: xref) can refer to either:
* An instance within a document which refers to related information elsewhere in the same document. In both printed and online dictionaries cross-references are important because ...
information is stored with or linked to the symbol table. Most compilers print some or all of this information in symbol table and cross-reference listings at the end of translation.
Implementation
Numerous
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
s are available for implementing tables. Trees, linear lists and
self-organizing lists can all be used to implement a symbol table. The symbol table is accessed by most phases of a compiler, beginning with
lexical analysis
Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful ''lexical tokens'' belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives ...
, and continuing through optimization.
A compiler may use one large symbol table for all symbols or use separated, or hierarchical symbol tables for different
scopes. For example, in a scoped language such as
Algol
ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...
or
PL/I
PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has b ...
a symbol "p" can be declared separately in several procedures, perhaps with different attributes. The scope of each declaration is the section of the program in which references to "p" resolve to that declaration. Each declaration represents a unique identifier "p". The symbol table must have some means of differentiating references to the different "p"s.
A common data structure used to implement symbol tables is the
hash table
In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
. The time for searching in hash tables is independent of the number of elements stored in the table, so it is efficient for a large number of elements. It also simplifies the classification of literals in tabular format by including the classification in calculation of the hash key.
As the lexical analyser spends a great proportion of its time looking up the symbol table, this activity has a crucial effect on the overall speed of the compiler. A symbol table must be organised in such a way that entries can be found as quickly as possible. Hash tables are usually used to organise a symbol table, where the keyword or identifier is 'hashed' to produce an array subscript.
Collisions are inevitable in a hash table, and a common way of handling them is to store the synonym in the next available free space in the table.
Applications
An
object file
An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is kno ...
will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a
linker
Linker or linkers may refer to:
Computing
* Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
will identify and resolve these symbol references. Usually all undefined external symbols will be searched for in one or more
object libraries. If a module is found that defines that symbol it is linked together with the first object file, and any undefined external identifiers are added to the list of identifiers to be looked up. This process continues until all external references have been resolved. It is an error if one or more remains unresolved at the end of the process.
While
reverse engineering
Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accompl ...
an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been
stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.
Example
Consider the following program written in
C:
// Declare an external function
extern double bar(double x);
// Define a public function
double foo(int count)
A C compiler that parses this code will contain at least the following symbol table entries:
In addition, the symbol table may also contain entries generated by the compiler for intermediate expression values (e.g., the expression that casts the
i
loop variable into a
double
, and the return value of the call to function
bar()
), statement labels, and so forth.
Example: SysV ABI
An example of a symbol table can be found in the
SysV
Unix System V (pronounced: "System Five") is one of the first commercial versions of the Unix operating system. It was originally developed by AT&T and first released in 1983. Four major versions of System V were released, numbered 1, 2, 3, an ...
Application Binary Interface
An application binary interface (ABI) is an interface exposed by software that is defined for in-process machine code access. Often, the exposing software is a library, and the consumer is a program.
An ABI is at a relatively low-level of a ...
(ABI) specification, which mandates how
symbols
A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise different concep ...
are to be laid out in a binary file, so that different compilers, linkers and loaders can all consistently find and work with the symbols in a compiled object.
The SysV ABI is implemented in the
GNU binutils' nm utility. This format uses a sorted
memory address
In computing, a memory address is a reference to a specific memory location in memory used by both software and hardware. These addresses are fixed-length sequences of digits, typically displayed and handled as unsigned integers. This numeric ...
field, a "symbol type" field, and a symbol identifier (called "Name").
The symbol types in the SysV ABI (and nm's output) indicate the nature of each entry in the symbol table. Each symbol type is represented by a single character. For example, symbol table entries representing initialized data are denoted by the character "d" and symbol table entries for functions have the symbol type "t" (because executable code is located in the ''text'' section of an object file). Additionally, the capitalization of the symbol type indicates the type of linkage: lower-case letters indicate the symbol is local and upper-case indicates external (global) linkage.
Example: the Python symbol table
The
Python programming language includes extensive support for creating and manipulating symbol tables. Properties that can be queried include whether a given symbol is a
free variable
In mathematics, and in other disciplines involving formal languages, including mathematical logic and computer science, a variable may be said to be either free or bound. Some older books use the terms real variable and apparent variable for f ...
or a
bound variable
In mathematics, and in other disciplines involving formal languages, including mathematical logic and computer science, a variable may be said to be either free or bound. Some older books use the terms real variable and apparent variable for f ...
, whether it is
block scope
In computer programming, the scope of a name binding (an association of a name to an entity, such as a variable) is the part of a program where the name binding is valid; that is, where the name can be used to refer to the entity. In other parts ...
or
global scope, whether it is imported, and what
namespace
In computing, a namespace is a set of signs (''names'') that are used to identify and refer to objects of various kinds. A namespace ensures that all of a given set of objects have unique names so that they can be easily identified.
Namespaces ...
it belongs to.
Example: Dynamic symbol tables
Some programming languages allow the symbol table to be manipulated at run-time, so that symbols can be added at any time.
Racket is an example of such a language.
Both the
LISP
Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation.
Originally specified in the late 1950s, ...
and the
Scheme programming languages allow arbitrary, generic properties to be associated with each symbol.
[Symbols ]
Guile Documentation
/ref>
The Prolog
Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics.
Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
programming language is essentially a symbol-table manipulation language; symbols are called ''atoms'', and the relationships between symbols can be reasoned over. Similarly, OpenCog provides a dynamic symbol table, called the ''atomspace'', which is used for knowledge representation
Knowledge representation (KR) aims to model information in a structured manner to formally represent it as knowledge in knowledge-based systems whereas knowledge representation and reasoning (KRR, KR&R, or KR²) also aims to understand, reason, and ...
.
See also
*Debug symbol
A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file, such as a shared library or an executable. This information allows a symbolic debugger to gain access to information from the ...
* .debug_info
References
Bibliography
*
{{Authority control
Compiler structures