A compiler is computer software that transforms computer code written
in one programming language (the source language) into another
programming language (the target language).
Compilers are a type of
translator that supports digital devices, primarily computers. The name
compiler is primarily used for programs that translate source code
from a high-level programming language to a lower level language
(e.g., assembly language, object code, or machine code) to create an executable program.
However, there are many different types of compilers. If the compiled
program can run on a computer whose CPU or operating system is
different from the one on which the compiler runs, the compiler is a
cross-compiler. A bootstrap compiler is written in the language that
it intends to compile. A program that translates from a low-level
language to a higher level one is a decompiler. A program that
translates between high-level languages is usually called a
source-to-source compiler or transpiler. A language rewriter is
usually a program that translates the form of expressions without a
change of language. The term compiler-compiler refers to tools used to
create parsers that perform syntax analysis.
A compiler is likely to perform many or all of the following
operations: preprocessing, lexical analysis, parsing, semantic
analysis (syntax-directed translation), conversion of input programs
to an intermediate representation, code optimization and code generation.
Compilers implement these operations in phases that
promote efficient design and correct transformations of source input
to target output. Program faults caused by incorrect compiler behavior
can be very difficult to track down and work around; therefore,
compiler implementers invest significant effort to ensure compiler correctness.
Compilers are not the only translators used to transform source
programs. An interpreter is computer software that transforms and then
executes the indicated operations. The translation process influences
the design of computer languages which leads to a preference of
compilation or interpretation. In practice, an interpreter can be
implemented for compiled languages and compilers can be implemented
for interpreted languages.
History
Main article: History of compiler construction
A diagram of the operation of a typical multi-language, multi-target compiler.
Theoretical computing concepts developed by scientists,
mathematicians, and engineers formed the basis of modern digital
computing development during World War II. Primitive binary languages
evolved because digital devices only understand ones and zeros and the
circuit patterns in the underlying machine architecture. In the late
forties, assembly languages were created to offer a more workable
abstraction of the computer architectures. Limited memory capacity of
early computers led to substantial technical challenges when the first
compilers were designed. Therefore, the compilation process needed to
be divided into several small programs. The front end programs produce
the analysis products used by the back end programs to generate target
code. As computer technology provided more resources, compiler designs
could align better with the compilation process.
The human mind can design better solutions as the language moves from
the machine to a higher level. So the development of high-level
languages follows naturally from the capabilities offered by the
digital computers. High-level languages are formal languages that are
strictly defined by their syntax and semantics which form the
high-level language architecture. Elements of these formal languages include:
Alphabet, any finite set of symbols;
String, a finite sequence of symbols;
Language, any set of strings on an alphabet.
The sentences in a language may be defined by a set of rules called a grammar.
Backus-Naur form (BNF) describes the syntax of "sentences" of a
language and was used for the syntax of Algol 60 by John Backus.
The ideas derive from the context-free grammar concepts by Noam
Chomsky, a linguist. "BNF and its extensions have become standard
tools for describing the syntax of programming notations, and in many
cases parts of compilers are generated automatically from a BNF description."
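For example, a small BNF-style grammar for arithmetic expressions (a conventional textbook illustration, not taken from the Algol 60 report) can be written as:

    <expression> ::= <term> | <expression> "+" <term>
    <term>       ::= <factor> | <term> "*" <factor>
    <factor>     ::= <number> | "(" <expression> ")"

Each rule names a syntactic category on the left and lists the forms it may take on the right; placing "*" one rule deeper than "+" gives multiplication the higher precedence.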
In the 1940s,
Konrad Zuse designed an algorithmic programming language
Plankalkül ("Plan Calculus"). While no actual implementation
occurred until the 1970s, it presented concepts later seen in APL
designed by Ken Iverson in the late 1950s. APL is a language for mathematical computations.
High-level language design during the formative years of digital
computing provided useful programming tools for a variety of applications:
FORTRAN (Formula Translation) for engineering and science applications
is considered to be the first high-level language.
COBOL (Common Business-Oriented Language) evolved from A-0 and
FLOW-MATIC to become the dominant high-level language for business applications.
LISP (List Processor) for symbolic computation.
Compiler technology evolved from the need for a strictly defined
transformation of the high-level source program into a low-level
target program for the digital computer. The compiler could be viewed
as a front end to deal with analysis of the source code and a back end
to synthesize the analysis into the target code. Optimization between
the front end and back end could produce more efficient target code.
Some early milestones in the development of compiler technology:
1952 – An Autocode compiler developed by Alick Glennie for the Manchester Mark I computer at the University of Manchester is considered by some to be the first compiled programming language.
1952 – Grace Hopper's team at
Remington Rand wrote the compiler for
the A-0 programming language (and coined the term compiler to describe
it), although the A-0 compiler functioned more as a loader or
linker than the modern notion of a full compiler.
1954-1957 – A team led by
John Backus at
IBM developed FORTRAN which
is usually considered the first high-level language. In 1957, they
completed a FORTRAN compiler that is generally credited as having
introduced the first unambiguously complete compiler.
1959 – The Conference on Data Systems Languages (CODASYL) initiated
development of COBOL. The
COBOL design drew on A-0 and FLOW-MATIC. By
the early 1960s
COBOL was compiled on multiple architectures.
1958-1962 – John McCarthy at
MIT designed LISP. The symbol
processing capabilities provided useful features for artificial
intelligence research. In 1962,
the LISP 1.5 release noted some tools: an
interpreter written by Stephen Russell and Daniel J. Edwards, a
compiler and assembler written by Tim Hart and Mike Levin.
Early operating systems and software were written in assembly
language. In the 1960s and early 1970s, the use of high-level languages
for system programming was still controversial due to resource
limitations. However, several research and industry efforts began the
shift toward high-level systems programming languages, for example,
BCPL, BLISS, B, and C.
BCPL (Basic Combined Programming Language) designed in 1966 by Martin
Richards at the University of Cambridge was originally developed as a
compiler writing tool. Several compilers have been implemented; Richards' book provides insights into the language and its compiler.
BCPL was not only an influential systems programming language that is
still used in research but also provided a basis for the design of
B and C languages.
BLISS (Basic Language for Implementation of System Software) was
developed for a Digital Equipment Corporation (DEC) PDP-10 computer by
W.A. Wulf's Carnegie Mellon University (CMU) research team. The CMU
team went on to develop the BLISS-11 compiler one year later in 1970.
Multics (Multiplexed Information and Computing Service), a
time-sharing operating system project, involved MIT, Bell Labs,
General Electric (later Honeywell) and was led by Fernando Corbató from MIT. Multics was written in the PL/I language developed by IBM and the IBM User Group. IBM's goal was to satisfy business,
scientific, and systems programming requirements. There were other
languages that could have been considered but
PL/I offered the most
complete solution even though it had not been implemented. For the
first few years of the Multics project, a subset of the language could be compiled to assembly language with the Early PL/I (EPL) compiler by Doug McIlroy and Bob Morris from Bell Labs. EPL
supported the project until a boot-strapping compiler for the full
PL/I could be developed.
Bell Labs left the
Multics project in 1969: "Over time, hope was
replaced by frustration as the group effort initially failed to
produce an economically useful system." Continued participation
would drive up project support costs. So researchers turned to other
development efforts. A system programming language B based on BCPL
concepts was written by
Dennis Ritchie and Ken Thompson. Ritchie
created a boot-strapping compiler for B and wrote Unics (Uniplexed
Information and Computing Service) operating system for a PDP-7 in B.
Unics eventually became spelled Unix.
Bell Labs started development and expansion of C based on B and BCPL.
A BCPL compiler had been transported to Bell Labs, and BCPL was a preferred language there. Initially, a front-end
program to Bell Labs' B compiler was used while a C compiler was
developed. In 1971, a new PDP-11 provided the resources to define
extensions to B and rewrite the compiler. By 1973 the design of C
language was essentially complete and the
Unix kernel for a PDP-11 was
rewritten in C. Steve Johnson started development of the Portable C Compiler (PCC) to support retargeting of C compilers to new machines.
Object-oriented programming (OOP) offered some interesting possibilities for application development and maintenance. OOP concepts go further back but were part of LISP and Simula language science. At Bell Labs, the developers of C++ became interested in OOP. C++ was first used in 1980 for systems programming. The initial design leveraged C language systems programming capabilities with Simula concepts. Object-oriented facilities were added in 1983. The Cfront program implemented a C++ front-end for the C84 language compiler. In subsequent years several C++ compilers were developed as C++ popularity grew.
In many application domains, the idea of using a higher-level language
quickly caught on. Because of the expanding functionality supported by
newer programming languages and the increasing complexity of computer
architectures, compilers became more complex.
DARPA (Defense Advanced Research Projects Agency) sponsored a compiler
project with Wulf's CMU research team in 1970. The Production Quality Compiler-Compiler (PQCC) design would produce a Production Quality Compiler (PQC) from formal definitions of the source language and the target. PQCC tried to extend the term compiler-compiler beyond the traditional meaning as a parser generator (e.g., Yacc) without much success. PQCC might more properly be referred to as a compiler generator.
PQCC research into the code-generation process sought to build a truly
automatic compiler-writing system. The effort discovered and designed
the phase structure of the PQC. The BLISS-11 compiler provided the
initial structure. The phases included analyses (front end),
intermediate translation to virtual machine (middle end), and
translation to the target (back end). TCOL was developed for the PQCC
research to handle language specific constructs in the intermediate
representation. Variations of TCOL supported various languages.
The PQCC project investigated techniques of automated compiler construction. The design concepts proved useful in optimizing compilers and compilers for the object-oriented programming language Ada.
The Ada Stoneman Document formalized the Ada program support environment
(APSE) along with the kernel (KAPSE) and minimal (MAPSE). An Ada
interpreter NYU/ED supported development and standardization efforts
with American National Standards Institute (ANSI) and the
International Standards Organization (ISO). Initial Ada compiler
development by the U.S. Military Services included the compilers in a
complete integrated design environment along the lines of the Stoneman
Document. The Army and Navy worked on the Ada Language System (ALS)
project targeted to the DEC/VAX architecture while the Air Force started on the Ada Integrated Environment (AIE) targeted to the IBM 370 series.
While the projects did not provide the desired results, they did
contribute to the overall effort on Ada development.
Other Ada compiler efforts got under way in Britain at University of
York and in Germany at University of Karlsruhe. In the U. S., Verdix
(later acquired by Rational) delivered the Verdix Ada Development
System (VADS) to the Army. VADS provided a set of development tools
including a compiler. Unix/VADS could be hosted on a variety of Unix
platforms such as DEC Ultrix and the Sun 3/60 Solaris targeted to
Motorola 68020 in an Army CECOM evaluation. There were soon many
Ada compilers available that passed the Ada Validation tests. The Free
Software Foundation GNU project developed the GNU Compiler Collection (GCC), which provides a core capability to support multiple languages
and targets. The Ada version GNAT is one of the most widely used Ada compilers. GNAT is free but there is also commercial support; for example, AdaCore was founded in 1994 to provide commercial software solutions for Ada.
GNAT Pro includes the GNU GCC based
GNAT with a
tool suite to provide an integrated development environment.
High-level languages continued to drive compiler research and
development. Focus areas included optimization and automatic code
generation. Trends in programming languages and development
environments influenced compiler technology. More compilers became
included in language distributions (PERL, Java Development Kit) and as
a component of an IDE (VADS, Eclipse, Ada Pro). The interrelationship
and interdependence of technologies grew. The advent of web services
promoted growth of web languages and scripting languages. Scripts
trace back to the early days of Command Line Interfaces (CLI) where
the user could enter commands to be executed by the system. User Shell
concepts developed with languages to write shell programs. Early
Windows designs offered a simple batch programming capability. The
conventional transformation of these languages used an interpreter.
While not widely used, Bash and Batch compilers have been written.
More recently sophisticated interpreted languages became part of the developer's toolkit. Modern scripting languages include PHP, Python,
Ruby and Lua. (Lua is widely used in game development.) All of these
have interpreter and compiler support.
"When the field of compiling began in the late 50s, its focus was
limited to the translation of high-level language programs into
machine code ... The compiler field is increasingly intertwined with
other disciplines including computer architecture, programming
languages, formal methods, software engineering, and computer
security." The "
Compiler Research: The Next 50 Years" article
noted the importance of object-oriented languages and Java. Security
and parallel computing were cited among the future research targets.
Compiler construction
A compiler implements a formal transformation from a high-level source
program to a low-level target program.
Compiler design can define an
end to end solution or tackle a defined subset that interfaces with
other compilation tools e.g. preprocessors, assemblers, linkers.
Design requirements include rigorously defined interfaces both
internally between compiler components and externally between supporting toolsets.
In the early days, the approach taken to compiler design was directly
affected by the complexity of the computer language to be processed,
the experience of the person(s) designing it, and the resources
available. Resource limitations led to the need to pass through the
source code more than once.
A compiler for a relatively simple language written by one person
might be a single, monolithic piece of software. However, as the
source language grows in complexity the design may be split into a
number of interdependent phases. Separate phases provide design
improvements that focus development on the functions in the compilation process.
One-pass versus multi-pass compilers
Classifying compilers by number of passes has its background in the
hardware resource limitations of computers. Compiling involves
performing lots of work and early computers did not have enough memory
to contain one program that did all of this work. So compilers were
split up into smaller programs which each made a pass over the source
(or some representation of it) performing some of the required
analysis and translations.
The ability to compile in a single pass has classically been seen as a
benefit because it simplifies the job of writing a compiler and
one-pass compilers generally perform compilations faster than
multi-pass compilers. Thus, partly driven by the resource limitations
of early systems, many early languages were specifically designed so
that they could be compiled in a single pass (e.g., Pascal).
In some cases the design of a language feature may require a compiler
to perform more than one pass over the source. For instance, consider
a declaration appearing on line 20 of the source which affects the
translation of a statement appearing on line 10. In this case, the
first pass needs to gather information about declarations appearing
after statements that they affect, with the actual translation
happening during a subsequent pass.
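The following sketch illustrates the idea in Python (a hypothetical two-pass translator for an invented toy language, not any real compiler): the first pass only records declarations, so the second pass can translate a use that appears before its declaration.

    # Pass 1: gather declaration information from the whole source.
    source = [
        "print x",          # the use appears before the declaration
        "var x : float",
    ]
    types = {}
    for line in source:
        if line.startswith("var "):
            name, type_name = line[4:].split(" : ")
            types[name] = type_name

    # Pass 2: translate statements, now that every declaration is known.
    for line in source:
        if line.startswith("print "):
            name = line[6:]
            print("PRINT_" + types[name].upper() + " " + name)  # PRINT_FLOAT x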
The disadvantage of compiling in a single pass is that it is not
possible to perform many of the sophisticated optimizations needed to
generate high quality code. It can be difficult to count exactly how
many passes an optimizing compiler makes. For instance, different
phases of optimization may analyse one expression many times but only
analyse another expression once.
Splitting a compiler up into small programs is a technique used by
researchers interested in producing provably correct compilers.
Proving the correctness of a set of small programs often requires less
effort than proving the correctness of a larger, single, equivalent program.
Three-stage compiler structure
Regardless of the exact number of phases in the compiler design, the
phases can be assigned to one of three stages. The stages include a
front end, a middle end, and a back end.
The front end verifies syntax and semantics according to a specific
source language. For statically typed languages it performs type
checking by collecting type information. If the input program is
syntactically incorrect or has a type error, it generates errors and
warnings, highlighting them on the source code.
Aspects of the front end include lexical analysis, syntax analysis,
and semantic analysis. The front end transforms the input program into
an intermediate representation (IR) for further processing by the
middle end. This IR is usually a lower-level representation of the
program with respect to the source code.
The middle end performs optimizations on the IR that are independent
of the CPU architecture being targeted. This source code/machine code
independence is intended to enable generic optimizations to be shared
between versions of the compiler supporting different languages and
target processors. Examples of middle end optimizations are removal of
useless (dead code elimination) or unreachable code (reachability
analysis), discovery and propagation of constant values (constant
propagation), relocation of computation to a less frequently executed
place (e.g., out of a loop), or specialization of computation based on
the context. The middle end eventually produces the "optimized" IR that is used by the back end.
The back end takes the optimized IR from the middle end. It may
perform more analysis, transformations and optimizations that are
specific for the target CPU architecture. The back end generates the
target-dependent assembly code, performing register allocation in the
process. The back end performs instruction scheduling, which re-orders
instructions to keep parallel execution units busy by filling delay
slots. Although most algorithms for optimization are NP-hard,
heuristic techniques are well-developed and currently implemented in
production-quality compilers. Typically the output of a back end is
machine code specialized for a particular processor and operating system.
This front/middle/back-end approach makes it possible to combine front
ends for different languages with back ends for different CPUs while
sharing the optimizations of the middle end. Practical examples of
this approach are the GNU Compiler Collection, LLVM, and the Amsterdam Compiler Kit, which have multiple front-ends, shared optimizations and multiple back-ends.
Front end
Lexer and parser example for C. Starting from the sequence of
characters "if(net>0.0)total+=net*(1.0+tax/100.0);", the scanner
composes a sequence of tokens, and categorizes each of them, for
example as identifier, reserved word, number literal, or operator. The
latter sequence is transformed by the parser into a syntax tree, which
is then treated by the remaining compiler phases. The scanner and
parser handle the regular and properly context-free parts of the
grammar for C, respectively.
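A minimal scanner along these lines can be sketched in Python (an illustration of the idea only; real C lexers handle many more token kinds and error cases). Because the lexeme syntax is regular, a handful of regular expressions suffices:

    import re

    # Token categories and the regular expressions that recognize them.
    TOKEN_SPEC = [
        ("NUMBER", r"\d+\.\d+|\d+"),
        ("KEYWORD", r"\bif\b"),
        ("IDENTIFIER", r"[A-Za-z_]\w*"),
        ("OPERATOR", r"\+=|[><*+/=]"),
        ("SEPARATOR", r"[();]"),
    ]
    MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

    def lex(text):
        # Each match is categorized by the name of the sub-pattern it hit.
        return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]

    for token in lex("if(net>0.0)total+=net*(1.0+tax/100.0);"):
        print(token)
    # ('KEYWORD', 'if'), ('SEPARATOR', '('), ('IDENTIFIER', 'net'), ...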
The front end analyzes the source code to build an internal
representation of the program, called the intermediate representation
(IR). It also manages the symbol table, a data structure mapping each
symbol in the source code to associated information such as location,
type and scope.
While the frontend can be a single monolithic function or program, as
in a scannerless parser, it is more commonly implemented and analyzed
as several phases, which may execute sequentially or concurrently.
This method is favored due to its modularity and separation of
concerns. Most commonly today, the frontend is broken into three
phases: lexical analysis (also known as lexing or scanning), syntax analysis (also known as parsing), and semantic analysis. Lexing and
parsing comprise the syntactic analysis (word syntax and phrase
syntax, respectively), and in simple cases these modules (the lexer
and parser) can be automatically generated from a grammar for the
language, though in more complex cases these require manual
modification. The lexical grammar and phrase grammar are usually
context-free grammars, which simplifies analysis significantly, with
context-sensitivity handled at the semantic analysis phase. The
semantic analysis phase is generally more complex and written by hand,
but can be partially or fully automated using attribute grammars.
These phases themselves can be further broken down: lexing as scanning
and evaluating, and parsing as building a concrete syntax tree (CST,
parse tree) and then transforming it into an abstract syntax tree
(AST, syntax tree). In some cases additional phases are used, notably
line reconstruction and preprocessing, but these are rare.
The main phases of the front end include the following:
Line reconstruction converts the input character sequence to a
canonical form ready for the parser. Languages which strop their
keywords or allow arbitrary spaces within identifiers require this
phase. The top-down, recursive-descent, table-driven parsers used in
the 1960s typically read the source one character at a time and did
not require a separate tokenizing phase. Atlas Autocode and Imp (and some implementations of ALGOL and Coral 66) are examples of stropped languages whose compilers would have a line reconstruction phase.
Preprocessing supports macro substitution and conditional compilation.
Typically the preprocessing phase occurs before syntactic or semantic
analysis; e.g. in the case of C, the preprocessor manipulates lexical
tokens rather than syntactic forms. However, some languages such as
Scheme support macro substitutions based on syntactic forms.
Lexical analysis (also known as lexing or tokenization) breaks the
source code text into a sequence of small pieces called lexical
tokens. This phase can be divided into two stages: the scanning,
which segments the input text into syntactic units called lexemes and
assigns them a category; and the evaluating, which converts lexemes
into a processed value. A token is a pair consisting of a token name
and an optional token value. Common token categories may include
identifiers, keywords, separators, operators, literals and comments,
although the set of token categories varies in different programming
languages. The lexeme syntax is typically a regular language, so a
finite state automaton constructed from a regular expression can be
used to recognize it. The software doing lexical analysis is called a
lexical analyzer. This may not be a separate step—it can be combined
with the parsing step in scannerless parsing, in which case parsing is
done at the character level, not the token level.
Syntax analysis (also known as parsing) involves parsing the token
sequence to identify the syntactic structure of the program. This
phase typically builds a parse tree, which replaces the linear
sequence of tokens with a tree structure built according to the rules
of a formal grammar which define the language's syntax. The parse tree
is often analyzed, augmented, and transformed by later phases in the compiler.
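As a sketch of this phase, a recursive-descent parser for arithmetic expressions can be written in Python (a hypothetical toy grammar, not the grammar of any production language); each grammar rule becomes one function, and the result is a nested-tuple syntax tree:

    # expr -> term ("+" term)* ; term -> factor ("*" factor)* ;
    # factor -> NUMBER | "(" expr ")"
    def parse_expression(tokens, i=0):
        node, i = parse_term(tokens, i)
        while i < len(tokens) and tokens[i] == "+":
            right, i = parse_term(tokens, i + 1)
            node = ("+", node, right)
        return node, i

    def parse_term(tokens, i):
        node, i = parse_factor(tokens, i)
        while i < len(tokens) and tokens[i] == "*":
            right, i = parse_factor(tokens, i + 1)
            node = ("*", node, right)
        return node, i

    def parse_factor(tokens, i):
        if tokens[i] == "(":
            node, i = parse_expression(tokens, i + 1)
            return node, i + 1            # skip the closing ")"
        return tokens[i], i + 1           # a number or identifier leaf

    tree, _ = parse_expression(["1", "+", "2", "*", "3"])
    print(tree)  # ('+', '1', ('*', '2', '3')) -- "*" binds tighter than "+"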
Semantic analysis adds semantic information to the parse tree and
builds the symbol table. This phase performs semantic checks such as
type checking (checking for type errors), or object binding
(associating variable and function references with their definitions),
or definite assignment (requiring all local variables to be
initialized before use), rejecting incorrect programs or issuing
warnings. Semantic analysis usually requires a complete parse tree,
meaning that this phase logically follows the parsing phase, and
logically precedes the code generation phase, though it is often
possible to fold multiple phases into one pass over the code in a compiler implementation.
Middle end
The middle end performs optimizations on the intermediate
representation in order to improve the performance and the quality of
the produced machine code. The middle end contains those
optimizations that are independent of the CPU architecture being targeted.
The main phases of the middle end include the following:
Analysis: This is the gathering of program information from the
intermediate representation derived from the input; data-flow analysis
is used to build use-define chains, together with dependence analysis,
alias analysis, pointer analysis, escape analysis, etc. Accurate
analysis is the basis for any compiler optimization. The control flow
graph of every compiled function and the call graph of the program
are usually also built during the analysis phase.
Optimization: the intermediate language representation is transformed
into functionally equivalent but faster (or smaller) forms. Popular
optimizations are inline expansion, dead code elimination, constant
propagation, loop transformation and even automatic parallelization.
Compiler analysis is the prerequisite for any compiler optimization, and the two work tightly together. For example, dependence analysis is
and they tightly work together. For example, dependence analysis is
crucial for loop transformation.
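A compact Python sketch of two of these optimizations, operating over a hypothetical three-address IR (the tuple format is invented for illustration):

    # Toy IR: (destination, operation, arguments).
    ir = [
        ("a", "const", 4),
        ("b", "const", 5),
        ("c", "add", ("a", "b")),   # c = a + b, foldable to the constant 9
        ("d", "add", ("c", "c")),   # d is computed but never used
        ("ret", "use", ("c",)),
    ]

    # Constant propagation: fold operations whose inputs are all known.
    consts, folded = {}, []
    for dest, op, args in ir:
        if op == "const":
            consts[dest] = args
        elif op == "add" and all(a in consts for a in args):
            consts[dest] = consts[args[0]] + consts[args[1]]
            op, args = "const", consts[dest]
        folded.append((dest, op, args))

    # Dead-code elimination: drop instructions whose results are unused.
    used = {a for _, op, args in folded if op == "use" for a in args}
    print([ins for ins in folded if ins[1] == "use" or ins[0] in used])
    # [('c', 'const', 9), ('ret', 'use', ('c',))] -- a, b, d are eliminated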
The scope of compiler analysis and optimizations varies greatly, from as
small as a basic block to the procedure/function level, or even over
the whole program (interprocedural optimization).
A compiler can potentially do a
better job using a broader view. But that broad view is not free:
large scope analysis and optimizations are very costly in terms of
compilation time and memory space; this is especially true for
interprocedural analysis and optimizations.
Interprocedural analysis and optimizations are common in modern
commercial compilers from HP, IBM, SGI, Intel, Microsoft, and Sun
Microsystems. The open source GCC was criticized for a long time for
lacking powerful interprocedural optimizations, but it is changing in
this respect. Another open source compiler with full analysis and
optimization infrastructure is Open64, which is used by many
organizations for research and commercial purposes.
Due to the extra time and space needed for compiler analysis and
optimizations, some compilers skip them by default. Users have to use
compilation options to explicitly tell the compiler which
optimizations should be enabled.
Back end
The back end is responsible for the CPU architecture-specific optimizations and for code generation.
The main phases of the back end include the following:
Machine dependent optimizations: optimizations that depend on the
details of the CPU architecture that the compiler targets. A
prominent example is peephole optimization, which rewrites short
sequences of assembler instructions into more efficient instructions.
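A peephole pass can be sketched in Python over a hypothetical instruction list (the mnemonics are invented for illustration): it slides a two-instruction window over the code and deletes a load that immediately re-reads a just-stored location.

    def peephole(instructions):
        out, i = [], 0
        while i < len(instructions):
            # A store followed by a load of the same location into the same
            # register is redundant: the value is already in the register.
            if (i + 1 < len(instructions)
                    and instructions[i][0] == "store"
                    and instructions[i + 1] == ("load",) + instructions[i][1:]):
                out.append(instructions[i])   # keep the store, drop the load
                i += 2
            else:
                out.append(instructions[i])
                i += 1
        return out

    code = [("store", "r1", "x"), ("load", "r1", "x"), ("add", "r1", "r2")]
    print(peephole(code))  # the redundant ('load', 'r1', 'x') is removed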
Code generation: the transformed intermediate language is translated
into the output language, usually the native machine language of the
system. This involves resource and storage decisions, such as deciding
which variables to fit into registers and memory and the selection and
scheduling of appropriate machine instructions along with their
associated addressing modes (see also Sethi-Ullman algorithm). Debug
data may also need to be generated to facilitate debugging.
Compiler correctness
Compiler correctness is the branch of software engineering that deals
with trying to show that a compiler behaves according to its language
specification.
Techniques include developing the compiler using formal methods and
using rigorous testing (often called compiler validation) on an existing compiler.
Compiled versus interpreted languages
Higher-level programming languages usually appear with a type of
translation in mind: either designed as compiled language or
interpreted language. However, in practice there is rarely anything
about a language that requires it to be exclusively compiled or
exclusively interpreted, although it is possible to design languages
that rely on re-interpretation at run time. The categorization usually
reflects the most popular or widespread implementations of a language
— for instance,
BASIC is sometimes called an interpreted language,
and C a compiled one, despite the existence of
BASIC compilers and C interpreters.
Interpretation does not replace compilation completely. It only hides
it from the user and makes it gradual. Even though an interpreter can
itself be interpreted, a directly executed program is needed somewhere
at the bottom of the stack (see machine language).
Further, compilers can contain interpreters for optimization reasons.
For example, where an expression can be executed during compilation
and the results inserted into the output program, this prevents it from having to be recalculated each time the program runs, which can
greatly speed up the final program. Modern trends toward just-in-time
compilation and bytecode interpretation at times blur the traditional
categorizations of compilers and interpreters even further.
Some language specifications spell out that implementations must
include a compilation facility; for example, Common Lisp. However,
there is nothing inherent in the definition of
Common Lisp that stops
it from being interpreted. Other languages have features that are very
easy to implement in an interpreter, but make writing a compiler much
harder; for example, APL, SNOBOL4, and many scripting languages allow
programs to construct arbitrary source code at runtime with regular
string operations, and then execute that code by passing it to a
special evaluation function. To implement these features in a compiled
language, programs must usually be shipped with a runtime library that
includes a version of the compiler itself.
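Python illustrates the pattern (a minimal sketch; the same idea applies to APL and SNOBOL4): source text is assembled with ordinary string operations and handed to a built-in evaluation function, which is why a compiled implementation must ship a compiler or interpreter in its runtime to support it.

    # Build source code as an ordinary string at run time...
    exponent = 3
    source = "lambda x: x ** " + str(exponent)

    # ...then execute it by passing it to an evaluation function.
    cube = eval(source)
    print(cube(2))  # prints 8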
Types
One classification of compilers is by the platform on which their
generated code executes. This is known as the target platform.
A native or hosted compiler is one whose output is intended to
directly run on the same type of computer and operating system that
the compiler itself runs on. The output of a cross compiler is
designed to run on a different platform. Cross compilers are often
used when developing software for embedded systems that are not
intended to support a software development environment.
The output of a compiler that produces code for a virtual machine (VM)
may or may not be executed on the same platform as the compiler that
produced it. For this reason such compilers are not usually classified
as native or cross compilers.
The lower level language that is the target of a compiler may itself
be a high-level programming language. C, often viewed as some sort of
portable assembler, can also be the target language of a compiler.
E.g.: Cfront, the original compiler for C++, used C as its target language.
The C created by such a compiler is usually not intended to be read
and maintained by humans. So indent style and pretty-printed C intermediate
code are irrelevant. Some features of C turn it into a good target
language. E.g.: C code with #line directives can be generated to
support debugging of the original source.
While a common compiler type outputs machine code, there are many other types:
A source-to-source compiler is a type of compiler that takes a
high-level language as its input and outputs a high-level language.
For example, an automatic parallelizing compiler will frequently take
in a high-level language program as an input and then transform the
code and annotate it with parallel code annotations (e.g. OpenMP) or
language constructs (e.g. Fortran's DOALL statements).
Bytecode compilers compile to the assembly language of a theoretical machine, like some Prolog implementations. This Prolog machine is also known as the Warren Abstract Machine (or WAM). Bytecode compilers for Java and Python are also examples of this category.
A just-in-time compiler (JIT compiler) is the last part of a multi-pass
compiler chain in which some compilation stages are deferred to
run-time. Examples are implemented in Smalltalk, Java and Microsoft
Common Intermediate Language (CIL) systems.
Applications are first compiled using a bytecode compiler and
delivered in a machine-independent intermediate representation. This
bytecode is then compiled using a JIT compiler to native machine code
just when the execution of the program is required.
Hardware compilers (also known as synthesis tools) are compilers whose
output is a description of the hardware configuration instead of a
sequence of instructions.
The output of these compilers targets computer hardware at a very low
level, for example a field-programmable gate array (FPGA) or
structured application-specific integrated circuit
(ASIC). Such compilers are said to be
hardware compilers, because the source code they compile effectively
controls the final configuration of the hardware and how it operates.
The output of the compilation is only an interconnection of
transistors or lookup tables.
An example of a hardware compiler is XST, the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity,
Synopsys and other hardware vendors.
An assembler is a program that compiles assembly language, a type of low-level language, into machine code; the inverse program is known as a disassembler.
A program that translates from a low-level language to a higher level
one is a decompiler.
A program that translates between high-level languages is usually
called a language translator, source-to-source compiler, language
converter, or language rewriter. The last term is
usually applied to translations that do not involve a change of language.
A program that translates into an object code format that is not
supported on the compilation machine is called a cross compiler and is
commonly used to prepare code for embedded applications.
A program that rewrites object code back into the same type of object
code while applying optimisations and transformations is a binary recompiler.
Compilers in education
Compiler construction and compiler optimization are taught at
universities and schools as part of a computer science
curriculum. Such courses are usually
supplemented with the implementation of a compiler for an educational
programming language. A well-documented example is Niklaus Wirth's
PL/0 compiler, which Wirth used to teach compiler construction in the
1970s. In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:
program development by stepwise refinement (also the topic of a 1971 paper by Wirth);
a recursive descent parser;
an extended Backus–Naur form (EBNF) to specify the syntax of a language;
a code generator producing portable P-code;
tombstone diagrams in the formal description of the bootstrapping problem.
Conferences and organizations
High-level programming languages mature over time and lead to a need for standardization. The American National Standards Institute (ANSI)
and the International Organization for Standardization (ISO) manage
standards for various programming languages such as FORTRAN, COBOL, C,
C++ and so on.
Universities in conjunction with industry and government provide
active research and development for programming languages and the
associated language tools: compilers, integrated development
environments, formal validation suites.
Professional organizations have representation from across the
research, education, industry, and government. These include the
Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM).
A number of conferences in the field of programming languages present
advances in compiler construction as one of their main topics.
SIGPLAN supports a number of conferences, including:
Programming Language Design and Implementation (PLDI)
Principles of Programming Languages (POPL)
Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)
International Conference on Functional Programming (ICFP)
The European Joint Conferences on Theory and Practice of Software
(ETAPS) sponsors the International Conference on Compiler
Construction, with papers from both the academic and industrial
Asian Symposium on Programming Languages and Systems (APLAS) is
organized by the Asian Association for Foundation of Software (AAFS).
See also
Compile and go loader
List of compilers
List of important publications in computer science § Compilers
References
^ PC Mag Staff (28 February 2017). "Encyclopedia: Definition of
Compiler". PCMag.com. Retrieved 28 February 2017.
^ Sun, Chengnian; Le, Vu; Zhang, Qirun; Su, Zhendong (2016). "Toward Understanding Compiler Bugs in GCC and LLVM". ACM.
^ lecture notes Compilers: Principles, Techniques, and Tools Jing-Shin
Chang Department of Computer Science & Information Engineering
National Chi-Nan University
^ Naur, P. et al. Report on ALGOL 60. Communications of the ACM 3 (May 1960).
^ Syntactic Structures ISBN 3-11-017279-8
^ Science of Programming, Appendix 1, ISBN 1461259835
^ A Programming Language K. E. Iverson ISBN 0-471430-14-5
^ John Backus. "The history of FORTRAN I, II and III" (PDF).
^ Porter Adams, Vicki (5 October 1981). "Captain Grace M. Hopper: the
Mother of COBOL". InfoWorld. 3 (20): 33. ISSN 0199-6649.
^ McCarthy, J.; Brayton, R.; Edwards, D.; Fox, P.; Hodes, L.; Luckham, D.; Maling, K.; Park, D.; Russell, S. (March 1960). "LISP I Programmer's Manual" (PDF). Boston, Massachusetts: Artificial Intelligence Group, M.I.T. Computation Center and Research Laboratory.
^ Compilers Principles, Techniques, & Tools 2nd edition by Aho, Lam, Sethi, Ullman ISBN 0-321-48681-1
^ Hopper, Grace Murray (1952). "The Education of a Computer".
Proceedings of the 1952 ACM National Meeting (Pittsburgh).
^ Ridgway, Richard K. (1952). "Compiling routines". Proceedings of the
1952 ACM National Meeting (Toronto).
^ "Recursive Functions of Symbolic Expressions and Their Computation
by Machine", Communications of the ACM, April 1960
^ Lisp 1.5 Programmers Manual, The MIT Press.
^ "BCPL: A tool for compiler writing and system programming" M.
Richards, University Mathematical Laboratory Cambridge, England 1969
^ BCPL: The Language and Its Compiler, M Richards, Cambridge
University Press (first published 31 December 1981)
^ BCPL Cintsys and Cintpos User Guide, M. Richards, 2017
^ Corbató/Vyssotsky "Introduction and Overview of the MULTICS System"
^ Report II of the SHARE Advanced Language Development Committee, 25 June 1964.
^ Multicians.org "The Choice of PL/I" article, Editor Tom Van Vleck
^ "PL/I As a Tool for System Programming", F.J. Corbató, Datamation May 6, 1969 issue
^ "The Multics PL/1 Compiler", R. A. Freiburghouse, GE, Fall Joint Computer Conference 1969
^ Datamation column, 1969
^ Dennis M. Ritchie, "The Development of the C Language", ACM Second
History of Programming Languages Conference, April, 1993
^ S.C. Johnson, "a Portable C Compiler: Theory and Practice", 5th ACM
POPL Symposium, January 1978
^ A. Snyder, A Portable
Compiler for the Language C, MIT, 1974.
^ K. Nygaard, University of Oslo, Norway, "Basic Concepts in Object Oriented Programming", SIGPLAN Notices V21, 1986
^ B. Stroustrup: "What is Object-Oriented Programming?" Proceedings
14th ASU Conference, 1986.
^ Bjarne Stroustrup, "An Overview of the
C++ Programming Language",
Handbook of Object Technology (Editor: Saba Zamir).
^ Leverett, Cattell, Hobbs, Newcomer, Reiner, Schatz, Wulf: "An Overview of the Production Quality Compiler-Compiler Project"
^ W. Wulf, K. Nori, "Delayed binding in
PQCC generated compilers", CMU
Research Showcase Report, CMU-CS-82-138, 1982
^ Joseph M. Newcomer, David Alex Lamb, Bruce W. Leverett, Michael
Tighe, William A. Wulf - Carnegie-Mellon University and David Levine,
Andrew H. Reinerit - Intermetrics: "TCOL Ada: Revised Report on An
Intermediate Representation for the DOD Standard Programming Language"
^ William A. Whitaker, "Ada - the project: the DoD High Order Language Working Group", SIGPLAN Notices (Volume 28, No. 3, March 1991)
^ CECOM Center for Software Engineering Advanced Software Technology,
"Final Report - Evaluation of the ACEC Benchmark Suite for Real-Time
Applications", AD-A231 968, 1990
^ P. Biggar, E. de Vries, D. Gregg, "A Practical Solution for Scripting Language Compilers", submission to Science of Computer Programming.
^ M. Hall, D. Padua, K. Pingali, "Compiler Research: The Next 50 Years", ACM Communications 2009 Vol 54 #2
^ Cooper and Torczon 2012, p. 8
^ Lattner, Chris (2017). "LLVM". In Brown, Amy & Wilson, Greg. The
Architecture of Open Source Applications. Archived from the original
on 2 December 2016. Retrieved 28 February 2017.
^ Aho, Lam, Sethi, Ullman 2007, pp. 5–6, 109–189
^ Aho, Lam, Sethi, Ullman 2007, p. 111
^ Aho, Lam, Sethi, Ullman 2007, pp. 8, 191–300
^ Aho, Lam, Sethi, Ullman 2007, pp. 10, 583–703
^ Cooper and Torczon (2012), p. 540
^ Chlipala, Adam. "Syntactic Proofs of Compositional Compiler
Correctness" (manuscript draft, publication date unknown). Archived
(PDF) from the original on 29 August 2017. Retrieved 28 February 2017
– via Adam.Chlipala.net.
^ Aycock, John (2003). "A Brief History of Just-in-Time". ACM Comput.
Surv. 35 (2; June): 93–113. doi:10.1145/857076.857077. Retrieved 28
February 2017. (Subscription required.)
^ Swartz, Jordan S.; Betz, Vaugh; Rose, Jonathan. "A Fast
Routability-Driven Router for FPGAs" (manuscript draft, publication
date unknown). Toronto, CA: Univ. of Toronto, Dept. of Electrical and
Computer Engineering. Archived (PDF) from the original on 9 August
2017. Retrieved 28 February 2017.
^ Xilinx Staff (2009). "XST Synthesis Overview". Xilinx, Inc. Archived
from the original on 2 November 2016. Retrieved 28 February
2017.
^ Altera Staff (2017). "Spectra-Q™ Engine". Altera.com. Archived
from the original on 10 October 2016. Retrieved 28 February
2017.
^ "Language Translator Tutorial" (PDF). Washington University.
^ Chakraborty, P.; Saxena, P. C.; Katti, C. P.; Pahwa, G.; Taneja, S.
(2011). "A New Practicum in
Compiler Construction". Computer
Applications in Engineering Education. 22 (3; 25 July). Archived from
the original on 16 November 2016. Retrieved 28 February 2017.
(Subscription required.)
^ ETAPS Staff (28 February 2017). "Conferences". ETAPS.org. Archived
from the original on 1 March 2017. Retrieved 28 February 2017.
^ LLVM community. "The
LLVM Target-Independent Code Generator". LLVM
Documentation. Retrieved 17 June 2016.
Further reading
Compiler textbook references – a collection of references to mainstream compiler construction textbooks
Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. (1986). Compilers:
Principles, Techniques, and Tools (1st ed.). Addison-Wesley.
Allen, Frances E. (September 1981). "A History of Language Processor
Technology in IBM" (PDF).
IBM Journal of Research and Development.
IBM. 25 (5). doi:10.1147/rd.255.0535. (Subscription required.)
Allen, Randy; Kennedy, Ken (2001). Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers.
Appel, Andrew Wilson (2002). Modern
Compiler Implementation in Java
(2nd ed.). Cambridge University Press. ISBN 0-521-82060-X.
Appel, Andrew Wilson (1998). Modern
Compiler Implementation in ML.
Cambridge University Press. ISBN 0-521-58274-1.
Bornat, Richard (1979). Understanding and Writing Compilers: A Do It
Yourself Guide (PDF). Macmillan Publishing.
Cooper, Keith Daniel; Torczon, Linda (2012). Engineering a compiler
(2nd ed.). Amsterdam: Elsevier/Morgan Kaufmann. p. 8.
ISBN 9780120884780. OCLC 714113472.
McKeeman, William Marshall; Horning, James J.; Wortman, David B. (1970). A Compiler Generator. Englewood Cliffs, NJ: Prentice-Hall.
Muchnick, Steven (1997). Advanced
Compiler Design and Implementation.
Morgan Kaufmann Publishers. ISBN 1-55860-320-4.
Scott, Michael Lee (2005). Programming Language Pragmatics (2nd ed.).
Morgan Kaufmann. ISBN 0-12-633951-1.
Srikant, Y. N.; Shankar, Priti (2003). The
Compiler Design Handbook:
Optimizations and Machine Code Generation. CRC Press.
Terry, Patrick D. (1997). Compilers and Compiler Generators: An Introduction with C++. International Thomson Computer Press.
Wirth, Niklaus (1996).
Compiler Construction (PDF). Addison-Wesley.
External links
Compilers at Curlie (based on DMOZ)
An Incremental Approach to Compiler Construction – a PDF tutorial
Compiler Design by Torben Ægidius Mogensen
Short animation on
YouTube explaining the key conceptual difference
between compilers and interpreters
Syntax Analysis & LL1
Parsing on YouTube
Let's Build a Compiler, by Jack Crenshaw
Forum about compiler development