Code Generation (compiler)
   HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, code generation is part of the process chain of a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
and converts
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
of
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
into a form (e.g.,
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
) that can be readily executed by the target system. Sophisticated compilers typically perform multiple passes over various intermediate forms. This multi-stage process is used because many
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s for
code optimization In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be o ...
are easier to apply one at a time, or because the input to one optimization relies on the completed processing performed by another optimization. This organization also facilitates the creation of a single compiler that can target multiple architectures, as only the last of the code generation stages (the ''backend'') needs to change from target to target. (For more information on compiler design, see
Compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
.) The input to the code generator typically consists of a
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
or an
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
. The tree is converted into a linear sequence of instructions, usually in an
intermediate language An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A " ...
such as
three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is ty ...
. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a
peephole optimization Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window. Peephole optimization involves changing the small set of instructions to an equiva ...
pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)


Major tasks

In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way. Tasks which are typically part of a sophisticated compiler's "code generation" phase include: *
Instruction selection __NOTOC__ In computer science, ''instruction selection'' is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction ...
: which instructions to use. *
Instruction scheduling In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing ...
: in which order to put those instructions. Scheduling is a speed optimization that can have a critical effect on
pipeline Pipeline may refer to: Electronics, computers and computing * Pipeline (computing), a chain of data-processing stages or a CPU optimization found on ** Instruction pipelining, a technique for implementing instruction-level parallelism within a s ...
d machines. *
Register allocation In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers. Register allocation can happen over a basic block (''local register allocatio ...
: the allocation of variables to
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. ...
s * Debug data generation if required so the code can be
debugged In computer programming and software development, debugging is the process of finding and resolving ''Software bug, bugs'' (defects or problems that prevent correct operation) within computer programs, software, or Software system, systems. Deb ...
. Instruction selection is typically carried out by doing a
recursive Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
postorder traversal In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a tree data structure, exactly once. S ...
on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree W := ADD(X,MUL(Y,Z)) might be transformed into a linear sequence of instructions by recursively generating the sequences for t1 := X and t2 := MUL(Y,Z), and then emitting the instruction ADD W, t1, t2. In a compiler that uses an intermediate language, there may be two instruction selection stages—one to convert the parse tree into intermediate code, and a second phase much later to convert the intermediate code into instructions from the
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding
opcode In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the operat ...
s. However, if the compiler is actually a language translator (for example, one that converts
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
to
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
), then the second code-generation phase may involve ''building'' a tree from the linear intermediate code.


Runtime code generation

When code generation occurs at runtime, as in
just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may cons ...
(JIT), it is important that the entire process be efficient with respect to space and time. For example, when
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
s are interpreted and used to generate code at runtime, a non-deterministic
finite state machine A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory space than the latter. Despite its generally generating less efficient code, JIT code generation can take advantage of profiling information that is available only at runtime.


Related concepts

The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core transformational operations of
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
. Consequently, some techniques that were originally developed for use in compilers have come to be employed in other ways as well. For example,
YACC Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a com ...
(Yet Another
Compiler-Compiler In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
) takes input in
Backus–Naur form In computer science, Backus–Naur form () or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats ...
and converts it to a parser in C. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed.Code Generation: The Real Lesson of Rails
Artima.com (2006-03-16). Retrieved on 2013-08-10. Many
integrated development environment An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of at least a source code editor, build automation tools a ...
s (IDEs) support some form of automatic source-code generation, often using algorithms in common with compiler code generators, although commonly less complicated. (See also:
Program transformation A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and ...
,
Data transformation In computing, data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integrationCIO.com. Agile Comes to Data Integration. Retrieved from: http ...
.)


Reflection

In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g.,
data type In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
s) to produce code. In other words, the former ''adds'' information while the latter ''loses'' some of the information. One consequence of this information loss is that
reflection Reflection or reflexion may refer to: Science and technology * Reflection (physics), a common wave phenomenon ** Specular reflection, reflection from a smooth surface *** Mirror image, a reflection in a mirror or in water ** Signal reflection, in ...
becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution.


See also

*
Automatic programming In computer science, the term automatic programming identifies a type of computer programming in which some mechanism generates a computer program to allow human programmers to write the code at a higher abstraction level. There has been little ...
*
Comparison of code generation tools List of tools Technical features References {{DEFAULTSORT:Code Generation Tools, Comparison Of Programming tools Code generation tools Source code generation ...
*
Source-to-source compilation A source-to-source translator, source-to-source compiler (S2S compiler), transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces an equivalent sou ...
: automatic translation of a computer program from one programming language to another


References

{{Authority control Machine code Compiler construction