In
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, code generation is part of the process chain of a
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
and converts
intermediate representation of
source code
In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
into a form (e.g.,
machine code) that can be readily executed by the target system.
Sophisticated compilers typically perform
multiple passes over various intermediate forms. This multi-stage process is used because many
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
s for
code optimization are easier to apply one at a time, or because the input to one optimization relies on the completed processing performed by another optimization. This organization also facilitates the creation of a single compiler that can target multiple architectures, as only the last of the code generation stages (the ''backend'') needs to change from target to target. (For more information on compiler design, see
Compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
.)
The input to the code generator typically consists of a
parse tree or an
abstract syntax tree.
The tree is converted into a linear sequence of instructions, usually in an
intermediate language such as
three-address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a
peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)
Major tasks
In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way.
Tasks which are typically part of a sophisticated compiler's "code generation" phase include:
*
Instruction selection __NOTOC__
In computer science, ''instruction selection'' is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruct ...
: which instructions to use.
*
Instruction scheduling: in which order to put those instructions. Scheduling is a speed optimization that can have a critical effect on
pipelined machines.
*
Register allocation: the allocation of
variables to
processor registers
*
Debug data generation if required so the code can be
debugged.
Instruction selection is typically carried out by doing a
recursive postorder traversal
In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a tree data structure, exactly once ...
on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree
W := ADD(X,MUL(Y,Z))
might be transformed into a linear sequence of instructions by recursively generating the sequences for
t1 := X
and
t2 := MUL(Y,Z)
, and then emitting the instruction
ADD W, t1, t2
.
In a compiler that uses an intermediate language, there may be two instruction selection stages—one to convert the parse tree into intermediate code, and a second phase much later to convert the intermediate code into instructions from the
instruction set of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding
opcodes. However, if the compiler is actually a
language translator (for example, one that converts
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
to
C++), then the second code-generation phase may involve ''building'' a tree from the linear intermediate code.
Runtime code generation
When code generation occurs at
runtime, as in
just-in-time compilation (JIT), it is important that the entire process be
efficient with respect to space and time. For example, when
regular expressions are interpreted and used to generate code at runtime, a non-deterministic
finite state machine is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory space than the latter. Despite its generally generating less efficient code, JIT code generation can take advantage of
profiling information that is available only at runtime.
Related concepts
The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core
transformational operations of
formal language theory. Consequently, some techniques that were originally developed for use in compilers have come to be employed in other ways as well. For example,
YACC (Yet Another
Compiler-Compiler) takes input in
Backus–Naur form and converts it to a parser in
C. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed.
Code Generation: The Real Lesson of Rails
Artima.com (2006-03-16). Retrieved on 2013-08-10.
Many integrated development environment
An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of at least a source code editor, build automation tools ...
s (IDEs) support some form of automatic source-code generation, often using algorithms in common with compiler code generators, although commonly less complicated. (See also: Program transformation, Data transformation.)
Reflection
In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g., data type
In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
s) to produce code. In other words, the former ''adds'' information while the latter ''loses'' some of the information. One consequence of this information loss is that reflection becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution.
See also
* Automatic programming
* Comparison of code generation tools
List of tools
Technical features
References
{{DEFAULTSORT:Code Generation Tools, Comparison Of
Programming tools
Code generation tools
Source code generation ...
* Source-to-source compilation: automatic translation of a computer program from one programming language to another
References
{{Authority control
Machine code
Compiler construction