HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, an optimizing compiler is a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a
program Program, programme, programmer, or programming may refer to: Business and management * Program management, the process of managing several related projects * Time management * Program, a part of planning Arts and entertainment Audio * Progra ...
's execution time,
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
footprint, storage size, and
power Power most often refers to: * Power (physics), meaning "rate of doing work" ** Engine power, the power put out by an engine ** Electric power * Power (social and political), the ability to influence people or events ** Abusive power Power may a ...
consumption (the last three being popular for
portable computer A portable computer is a computer designed to be easily moved from one place to another and included a display and keyboard together, with a single plug, much like later desktop computers called '' all-in-ones'' (AIO), that integrate the s ...
s). Compiler optimization is generally implemented using a sequence of ''optimizing transformations'', algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster. It has been shown that some code optimization problems are
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
, or even undecidable. In practice, factors such as the programmer's willingness to wait for the compiler to complete its task place upper limits on the optimizations that a compiler might provide. Optimization is generally a very CPU- and memory-intensive process. In the past, computer memory limitations were also a major factor in limiting which optimizations could be performed. Because of these factors, optimization rarely produces "optimal" output in any sense, and in fact, an "optimization" may impede performance in some cases. Rather, they are heuristic methods for improving resource usage in typical programs.


Types of optimization

Techniques used in optimization can be broken up among various ''scopes'' which can affect anything from a single statement to the entire program. Generally speaking, locally scoped techniques are easier to implement than global ones but result in smaller gains. Some examples of scopes include: ;
Peephole optimization Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window. Peephole optimization involves changing the small set of instructions to an equiva ...
s: These are usually performed late in the compilation process after
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
has been generated. This form of optimization examines a few adjacent instructions (like "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by 2 might be more efficiently executed by left-shifting the value or by adding the value to itself (this example is also an instance of
strength reduction In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts "strong" multiplications inside a loo ...
). ;Local optimizations: These only consider information local to a
basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually deco ...
. Since basic blocks have no control flow, these optimizations need very little analysis, saving time and reducing storage requirements, but this also means that no information is preserved across jumps. ;Global optimizations: These are also called "intraprocedural methods" and act on whole functions. This gives them more information to work with, but often makes expensive computations necessary. Worst case assumptions have to be made when function calls occur or global variables are accessed because little information about them is available. ;
Loop optimization In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabi ...
s: These act on the statements which make up a loop, such as a ''for'' loop, for example
loop-invariant code motion In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion ( ...
. Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.Aho, Sethi, and Ullman, ''Compilers'', p. 596. ;Prescient store optimizations: These allow store operations to occur earlier than would otherwise be permitted in the context of threads and locks. The process needs some way of knowing ahead of time what value will be stored by the assignment that it should have followed. The purpose of this relaxation is to allow compiler optimization to perform certain kinds of code rearrangement that preserve the semantics of properly synchronized programs. ;Interprocedural, whole-program or
link-time optimization Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimiz ...
: These analyze all of a program's source code. The greater quantity of information extracted means that optimizations can be more effective compared to when they only have access to local information, i.e. within a single function. This kind of optimization can also allow new techniques to be performed. For instance, function
inlining In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without cha ...
, where a call to a function is replaced by a copy of the function body. ;Machine code optimization and
object code optimizer An object code optimizer, sometimes also known as a post pass optimizer or, for small sections of code, peephole optimizer, takes the output from a source language compile step - the object code or binary file - and tries to replace identifiabl ...
: These analyze the executable task image of the program after all of an executable machine code has been linked. Some of the techniques that can be applied in a more limited scope, such as macro compression which saves space by collapsing common sequences of instructions, are more effective when the entire executable task image is available for analysis. * In addition to scoped optimizations, there are two further general categories of optimization: ;
Programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
-independent vs. language-dependent: Most high-level languages share common programming constructs and abstractions: decision (if, switch, case), looping (for, while, repeat.. until, do.. while), and encapsulation (structures, objects). Thus similar optimization techniques can be used across languages. However, certain language features make some kinds of optimizations difficult. For instance, the existence of pointers in C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
makes it difficult to optimize array accesses (see
alias analysis Alias may refer to: * Pseudonym * Pen name * Nickname Arts and entertainment Film and television * ''Alias'' (2013 film), a 2013 Canadian documentary film * ''Alias'' (TV series), an American action thriller series 2001–2006 * ''Alias the J ...
). However, languages such as
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
(that also supports pointers) nevertheless have available sophisticated optimizing compilers to achieve better performance in various other ways. Conversely, some language features make certain optimizations easier. For example, in some languages functions are not permitted to have
side effects In medicine, a side effect is an effect, whether therapeutic or adverse, that is secondary to the one intended; although the term is predominantly employed to describe adverse effects, it can also apply to beneficial, but unintended, consequence ...
. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can immediately infer that the function's result need be computed only once. In languages where functions are allowed to have side effects, another strategy is possible. The optimizer can determine which function has no side effects and restrict such optimizations to side effect free functions. This optimization is only possible when the optimizer has access to the called function. ;Machine-independent vs. machine-dependent: Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform. Examples are instructions which do several things at once, such as decrement register and branch if not zero. The following is an instance of a local machine dependent optimization. To set a
register Register or registration may refer to: Arts entertainment, and media Music * Register (music), the relative "height" or range of a note, melody, part, instrument, etc. * ''Register'', a 2017 album by Travis Miller * Registration (organ), th ...
to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other
microprocessor A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circ ...
s such as the
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
family, it turns out that the XOR variant is shorter and probably faster, as there will be no need to decode an immediate operand, nor use the internal "immediate operand register". A potential problem with this is that XOR may introduce a data dependency on the previous value of the register, causing a
pipeline Pipeline may refer to: Electronics, computers and computing * Pipeline (computing), a chain of data-processing stages or a CPU optimization found on ** Instruction pipelining, a technique for implementing instruction-level parallelism within a s ...
stall. However, processors often have XOR of a register with itself as a special case that does not cause stalls.


Factors affecting optimization

;The machine itself: Many of the choices about which optimizations can and should be done depend on the characteristics of the target machine. It is sometimes possible to parameterize some of these machine dependent factors, so that a single piece of compiler code can be used to optimize different machines just by altering the machine description parameters. GCC from
GNU Project The GNU Project () is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and computing devices by collaborat ...
and
Clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection ...
from the LLVM Compiler Infrastructure are examples of optimizing compilers that follow this approach. ;The architecture of the target CPU: Number of CPU registers: To a certain extent, the more registers, the easier it is to optimize for performance.
Local variable In computer science, a local variable is a variable that is given ''local scope''. A local variable reference in the function or block in which it is declared overrides the same variable name in the larger scope. In programming languages with o ...
s can be allocated in the registers and not on the stack. Temporary/intermediate results can be left in registers without writing to and reading back from memory. * RISC vs CISC: CISC instruction sets often have variable instruction lengths, often have a larger number of possible instructions that can be used, and each instruction could take differing amounts of time. RISC instruction sets attempt to limit the variability in each of these: instruction sets are usually constant length, with few exceptions, there are usually fewer combinations of registers and memory operations, and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs among the various instructions and choose the best instruction sequence (see
instruction selection __NOTOC__ In computer science, ''instruction selection'' is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction ...
). * Pipelines: A pipeline is essentially a CPU broken up into an
assembly line An assembly line is a manufacturing process (often called a ''progressive assembly'') in which parts (usually interchangeable parts) are added as the semi-finished assembly moves from workstation to workstation where the parts are added in se ...
. It allows use of parts of the CPU for different instructions by breaking up the execution of instructions into various stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction could be in the register store stage, while another could be in the register fetch stage. Pipeline conflicts occur when an instruction in one stage of the pipeline depends on the result of another instruction ahead of it in the pipeline but not yet completed. Pipeline conflicts can lead to
pipeline stall In the design of pipelined computer processors, a pipeline stall is a delay in execution of an instruction in order to resolve a hazard. Details In a standard five-stage pipeline, during the decoding stage, the control unit will determine whe ...
s: where the CPU wastes cycles waiting for a conflict to resolve. :Compilers can ''schedule'', or reorder, instructions so that pipeline stalls occur less frequently. * Number of functional units: Some CPUs have several ALUs and FPUs. This allows them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions), and which functional unit can execute which instruction. They also have issues similar to pipeline conflicts. : Here again, instructions have to be scheduled so that the various functional units are fully fed with instructions to execute. ;The architecture of the machine: *
Cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
size (256 kiB–12 MiB) and type (direct mapped, 2-/4-/8-/16-way associative, fully associative): Techniques such as
inline expansion In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without cha ...
and
loop unrolling Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation ...
may increase the size of the generated code and reduce code locality. The program may slow down drastically if a highly utilized section of code (like inner loops in various algorithms) suddenly cannot fit in the cache. Also, caches which are not fully associative have higher chances of cache collisions even in an unfilled cache. * Cache/Memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications. ;Intended use of the generated code: :; Debugging: While writing an application, a programmer will recompile and test often, and so compilation must be fast. This is one reason most optimizations are deliberately avoided during the test/debugging phase. Also, program code is usually "stepped through" (see
Program animation Program animation or stepping refers to the debugging method of executing code one instruction or line at a time. The programmer may examine the state of the program, machine, and related data before and after execution of a particular line of c ...
) using a
symbolic debugger A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program). The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its executi ...
, and optimizing transformations, particularly those that reorder code, can make it difficult to relate the output code with the line numbers in the original source code. This can confuse both the debugging tools and the programmers using them. :;General-purpose use: Prepackaged software is very often expected to be executed on a variety of machines and CPUs that may share the same instruction set, but have different timing, cache or memory characteristics. As a result, the code may not be tuned to any particular CPU, or may be tuned to work best on the most popular CPU and yet still work acceptably well on other CPUs. :;Special-purpose use: If the software is compiled to be used on one or a few very similar machines, with known characteristics, then the compiler can heavily tune the generated code to those specific machines, provided such options are available. Important special cases include code designed for
parallel Parallel is a geometric term of location which may refer to: Computing * Parallel algorithm * Parallel computing * Parallel metaheuristic * Parallel (software), a UNIX utility for running programs in parallel * Parallel Sysplex, a cluster of ...
and
vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data calle ...
s, for which special parallelizing compilers are employed. :;
Embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
s: These are a common case of special-purpose use. Embedded software can be tightly tuned to an exact CPU and memory size. Also, system cost or reliability may be more important than the code's speed. For example, compilers for embedded software usually offer options that reduce code size at the expense of speed, because memory is the main cost of an embedded computer. The code's timing may need to be predictable, rather than as fast as possible, so code caching might be disabled, along with compiler optimizations that require it.


Common themes

To a large extent, compiler optimization techniques have the following themes, which sometimes conflict. ;Optimize the common case: The common case may have unique properties that allow a ''
fast path Fast path is a term used in computer science to describe a path with shorter instruction path length through a program compared to the normal path. For a fast path to be effective it must handle the most commonly occurring tasks more efficiently th ...
'' at the expense of a ''slow path''. If the fast path is taken most often, the result is better overall performance. ;Avoid redundancy: Reuse results that are already computed and store them for later use, instead of recomputing them. ;Less code: Remove unnecessary computations and intermediate values. Less work for the CPU, cache, and memory usually results in faster execution. Alternatively, in
embedded systems An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
, less code brings a lower product cost. ;Fewer jumps by using ''straight line code'', also called ''
branch-free code A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. ''Branch'' (or ''branching'', ''branc ...
'': Less complicated code. Jumps (conditional or
unconditional branch A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. ''Branch'' (or ''branching'', ''branc ...
es) interfere with the prefetching of instructions, thus slowing down code. Using inlining or loop unrolling can reduce branching, at the cost of increasing
binary file A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document fil ...
size by the length of the repeated code. This tends to merge several
basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually deco ...
s into one. ;Locality: Code and data that are accessed closely together in time should be placed close together in memory to increase spatial
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
. ;Exploit the memory hierarchy: Accesses to memory are increasingly more expensive for each level of the memory hierarchy, so place the most commonly used items in registers first, then caches, then main memory, before going to disk. ;Parallelize: Reorder operations to allow multiple computations to happen in parallel, either at the instruction, memory, or thread level. ;More precise information is better: The more precise the information the compiler has, the better it can employ any or all of these optimization techniques. ;Runtime metrics can help: Information gathered during a test run can be used in
profile-guided optimization Profile-guided optimization (PGO, sometimes pronounced as ''pogo''), also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to imp ...
. Information gathered at runtime, ideally with minimal overhead, can be used by a
JIT Jit (also known as jiti, jit-jive and the Harare beat) is a style of popular Zimbabwean dance music. It features a swift rhythm played on drums and accompanied by a guitar. Jit evolved out many diverse influences, including domestic chimurenga, ...
compiler to dynamically improve optimization. ;Strength reduction: Replace complex or difficult or expensive operations with simpler ones. For example, replacing division by a constant with multiplication by its reciprocal, or using
induction variable analysis In computer science, an induction variable is a variable that gets increased or decreased by a fixed amount on every iteration of a loop or is a linear function of another induction variable. For example, in the following loop, i and j are induct ...
to replace multiplication by a loop index with addition.


Specific techniques


Loop optimizations

Some optimization techniques primarily designed to operate on loops include: ;
Induction variable analysis In computer science, an induction variable is a variable that gets increased or decreased by a fixed amount on every iteration of a loop or is a linear function of another induction variable. For example, in the following loop, i and j are induct ...
: Roughly, if a variable in a loop is a simple linear function of the index variable, such as j := 4*i + 1, it can be updated appropriately each time the loop variable is changed. This is a
strength reduction In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts "strong" multiplications inside a loo ...
, and also may allow the index variable's definitions to become dead code. This information is also useful for bounds-checking elimination and
dependence analysis In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement ''S2'' depends on ''S1'' if ''S1'' must be executed before ''S2''. Broadly, there are two classes of depen ...
, among other things. ;
Loop fission In computer science, loop fission (or loop distribution) is a compiler optimization in which a loop is broken into multiple loops over the same index range with each taking only a part of the original loop's body. The goal is to break down a large l ...
or loop distribution: Loop fission attempts to break a loop into multiple loops over the same index range but each taking only a part of the loop's body. This can improve
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
, both of the data being accessed in the loop and the code in the loop's body. ;
Loop fusion In computer science, loop fission (or loop distribution) is a compiler optimization in which a loop is broken into multiple loops over the same index range with each taking only a part of the original loop's body. The goal is to break down a large ...
or loop combining or loop ramming or loop jamming : Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times regardless of whether that number is known at compile time, their bodies can be combined as long as they make no reference to each other's data. ;
Loop inversion In computer science, loop inversion is a compiler optimization and loop transformation in which a while loop is replaced by an if block containing a do..while loop. When used correctly, it may improve performance due to instruction pipelining. ...
: This technique changes a standard ''while'' loop into a ''do/while'' (also known as ''repeat/until'') loop wrapped in an ''if'' conditional, reducing the number of jumps by two, for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code), but is more efficient because jumps usually cause a
pipeline stall In the design of pipelined computer processors, a pipeline stall is a delay in execution of an instruction in order to resolve a hazard. Details In a standard five-stage pipeline, during the decoding stage, the control unit will determine whe ...
. Additionally, if the initial condition is known at compile-time and is known to be
side-effect In medicine, a side effect is an effect, whether therapeutic or adverse, that is secondary to the one intended; although the term is predominantly employed to describe adverse effects, it can also apply to beneficial, but unintended, consequence ...
-free, the ''if'' guard can be skipped. ;
Loop interchange In compiler theory, loop interchange is the process of exchanging the order of two iteration variables used by a nested loop. The variable used in the inner loop switches to the outer loop, and vice versa. It is often done to ensure that the elemen ...
: These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout. ;
Loop-invariant code motion In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion ( ...
: If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with
loop inversion In computer science, loop inversion is a compiler optimization and loop transformation in which a while loop is replaced by an if block containing a do..while loop. When used correctly, it may improve performance due to instruction pipelining. ...
, because not all code is safe to be hoisted outside the loop. ;
Loop nest optimization In computer science and particularly in compiler design, loop nest optimization (LNO) is an optimization technique that applies a set of loop transformations for the purpose of locality optimization or parallelization or another loop overhead re ...
: Some pervasive algorithms such as matrix multiplication have very poor cache behavior and excessive memory accesses. Loop nest optimization increases the number of cache hits by performing the operation over small blocks and by using a loop interchange. ;
Loop reversal Loop or LOOP may refer to: Brands and enterprises * Loop (mobile), a Bulgarian virtual network operator and co-founder of Loop Live * Loop, clothing, a company founded by Carlos Vasquez in the 1990s and worn by Digable Planets * Loop Mobile, an ...
: Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations. Furthermore, on some architectures, loop reversal contributes to smaller code, as when the loop index is being decremented, the condition that needs to be met in order for the running program to exit the loop is a comparison with zero. This is often a special, parameter-less instruction, unlike a comparison with a number, which needs the number to compare to. Therefore, the amount of bytes needed to store the parameter is saved by using the loop reversal. Additionally, if the comparison number exceeds the size of word of the platform, in standard loop order, multiple instructions would need to be executed in order to evaluate the comparison, which is not the case with loop reversal. ;
Loop unrolling Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation ...
: Unrolling duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which hurt performance by impairing the instruction pipeline. A "fewer jumps" optimization. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time. ;
Loop splitting Loop splitting is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. Loop ...
: Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A useful special case is '' loop peeling'', which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop. ;
Loop unswitching Loop unswitching is a compiler optimization. It moves a conditional inside a loop outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional. This can improve the paralleliza ...
: Unswitching moves a conditional from inside a loop to outside the loop by duplicating the loop's body inside each of the if and else clauses of the conditional. ;
Software pipelining In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the ...
: The loop is restructured in such a way that work done in an iteration is split into several parts and done over several iterations. In a tight loop, this technique hides the latency between loading and using values. ;
Automatic parallelization Automatic may refer to: Music Bands * Automatic (band), Australian rock band * Automatic (American band), American rock band * The Automatic, a Welsh alternative rock band Albums * ''Automatic'' (Jack Bruce album), a 1983 electronic rock ...
: A loop is converted into multi-threaded or vectorized (or even both) code in order to utilize multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine, including multi-core machines.


Data-flow optimizations

Data-flow optimizations, based on
data-flow analysis In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
, primarily depend on how certain properties of data are propagated by control edges in the
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that ...
. Some of these include: ;
Common subexpression elimination In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyzes whether it is worthwhile replacing them with a si ...
: In the expression (a + b) - (a + b)/4, "common subexpression" refers to the duplicated (a + b). Compilers implementing this technique realize that (a + b) will not change, and so only calculate its value once. ; Constant folding and propagation: replacing expressions consisting of constants (e.g., 3 + 5) with their final value (8) at compile time, rather than doing the calculation in run-time. Used in most modern languages. ; Induction variable recognition and elimination: see discussion above about ''induction variable analysis''. ; Alias classification and pointer analysis: in the presence of pointers, it is difficult to make any optimizations at all, since potentially any variable can have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored. ; Dead-store elimination: removal of assignments to variables that are not subsequently read, either because the lifetime of the variable ends or because of a subsequent assignment that will overwrite the first value.


SSA-based optimizations

These optimizations are intended to be done after transforming the program into a special form called Static Single Assignment, in which every variable is assigned in only one place. Although some function without SSA, they are most effective with SSA. Many optimizations listed in other sections also benefit with no special changes, such as register allocation. ;
Global value numbering Value numbering is a technique of determining when two computations in a program are equivalent and eliminating one of them with a semantics-preserving optimization. Global value numbering Global value numbering (GVN) is a compiler optimization bas ...
: GVN eliminates redundancy by constructing a value graph of the program, and then determining which values are computed by equivalent expressions. GVN is able to identify some redundancy that
common subexpression elimination In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyzes whether it is worthwhile replacing them with a si ...
cannot, and vice versa. ;
Sparse conditional constant propagation In computer science, sparse conditional constant propagation (SCCP) is an optimization frequently applied in compilers after conversion to static single assignment form (SSA). It simultaneously removes some kinds of dead code and propagates const ...
: Combines constant propagation, constant folding, and
dead-code elimination In compiler theory, dead-code elimination (also known as DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefit ...
, and improves upon what is possible by running them separately. This optimization symbolically executes the program, simultaneously propagating constant values and eliminating portions of the
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that ...
that this makes unreachable.


Code generator optimizations

;
Register allocation In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers. Register allocation can happen over a basic block (''local register allocatio ...
: The most frequently used variables should be kept in processor registers for fastest access. To find which variables to put in registers, an interference-graph is created. Each variable is a vertex and when two variables are used at the same time (have an intersecting liverange) they have an edge between them. This graph is colored using for example
Chaitin's algorithm Chaitin's algorithm is a bottom-up, graph coloring register allocation algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems o ...
using the same number of colors as there are registers. If the coloring fails one variable is "spilled" to memory and the coloring is retried. ;
Instruction selection __NOTOC__ In computer science, ''instruction selection'' is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction ...
: Most architectures, particularly CISC architectures and those with many
addressing mode Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions i ...
s, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to do a good job overall of choosing which instructions to implement which operators in the low-level
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
with. For example, on many processors in the
68000 family The Motorola 68000 series (also known as 680x0, m68000, m68k, or 68k) is a family of 32-bit complex instruction set computer (CISC) microprocessors. During the 1980s and early 1990s, they were popular in personal computers and workstations and ...
and on the x86 architecture, complex addressing modes can be used in statements like "lea 25(a1,d5*4), a0", allowing a single instruction to perform a significant amount of arithmetic with less storage. ;
Instruction scheduling In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing ...
: Instruction scheduling is an important optimization for modern pipelined processors, which avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics. ;
Rematerialization In computer science, rematerialization or remat is a compiler optimization which saves time by recomputing a value instead of loading it from memory. It is typically tightly integrated with register allocation, where it is used as an alternative ...
: Rematerialization recalculates a value instead of loading it from memory, preventing a memory access. This is performed in tandem with register allocation to avoid spills. ;Code factoring: If several sequences of code are identical, or can be parameterized or reordered to be identical, they can be replaced with calls to a shared subroutine. This can often share code for subroutine set-up and sometimes tail-recursion.Cx51 Compiler Manual, version 09.2001, p155, Keil Software Inc. ;
Trampolines A trampoline is a device consisting of a piece of taut, strong fabric stretched between a steel frame using many coiled springs. Not all trampolines have springs, as the Springfree Trampoline uses glass-reinforced plastic rods. People bounce on ...
: Many CPUs have smaller subroutine call instructions to access low memory. A compiler can save space by using these small calls in the main body of code. Jump instructions in low memory can access the routines at any address. This multiplies space savings from code factoring. ;Reordering computations: Based on
integer linear programming An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective ...
, restructuring compilers enhance data locality and expose more parallelism by reordering computations. Space-optimizing compilers may reorder code to lengthen sequences that can be factored into subroutines.


Functional language optimizations

Although many of these also apply to non-functional languages, they either originate in or are particularly critical in
functional language In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that ...
s such as Lisp and ML. ;
Tail-call optimization In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recu ...
: A function call consumes stack space and involves some overhead related to parameter passing and flushing the instruction cache. Tail-recursive algorithms can be converted to
iteration Iteration is the repetition of a process in order to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration. ...
through a process called tail-recursion elimination or tail-call optimization. ;
Deforestation Deforestation or forest clearance is the removal of a forest or stand of trees from land that is then converted to non-forest use. Deforestation can involve conversion of forest land to farms, ranches, or urban use. The most concentrated ...
( data structure fusion): In languages where it is common for a sequence of transformations to be applied to a list, deforestation attempts to remove the construction of intermediate data structures. ; Partial evaluation


Other optimizations

; Bounds-checking elimination: Many languages, such as
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
, enforce
bounds checking In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as ...
of all array accesses. This is a severe performance
bottleneck Bottleneck literally refers to the narrowed portion (neck) of a bottle near its opening, which limit the rate of outflow, and may describe any object of a similar shape. The literal neck of a bottle was originally used to play what is now known as ...
on certain applications such as scientific code. Bounds-checking elimination allows the compiler to safely remove bounds checking in many situations where it can determine that the index must fall within valid bounds; for example, if it is a simple loop variable. ;Branch-offset optimization (machine dependent): Choose the shortest branch displacement that reaches the target. ;Code-block reordering: Code-block reordering alters the order of the basic blocks in a program in order to reduce conditional branches and improve locality of reference. ;
Dead-code elimination In compiler theory, dead-code elimination (also known as DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefit ...
: Removes instructions that will not affect the behaviour of the program, for example definitions which have no uses, called dead code. This reduces code size and eliminates unnecessary computation. ;Factoring out of invariants (
loop invariant In computer science, a loop invariant is a property of a program loop that is true before (and after) each iteration. It is a logical assertion, sometimes checked within the code by an assertion call. Knowing its invariant(s) is essential in und ...
s): If an expression is carried out both when a condition is met and is not met, it can be written just once outside of the conditional statement. Similarly, if certain types of expressions (e.g., the assignment of a constant into a variable) appear inside a loop, they can be moved out of it because their effect will be the same no matter if they're executed many times or just once. This is also known as total redundancy elimination. A similar but more powerful optimization is
partial-redundancy elimination In compiler theory, partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. PRE is a form of common subexpression elimination. An ex ...
(PRE). ;
Inline expansion In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without cha ...
or macro expansion: When some code invokes a procedure, it is possible to directly insert the body of the procedure inside the calling code rather than transferring control to it. This saves the overhead related to procedure calls, as well as providing an opportunity for many different parameter-specific optimizations, but comes at the cost of space; the procedure body is duplicated each time the procedure is called inline. Generally, inlining is useful in performance-critical code that makes a large number of calls to small procedures. A "fewer jumps" optimization. The
statements Statement or statements may refer to: Common uses *Statement (computer science), the smallest standalone element of an imperative programming language *Statement (logic), declarative sentence that is either true or false *Statement, a declarative ...
of
imperative programming In computer science, imperative programming is a programming paradigm of software that uses statements that change a program's state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program ...
languages are also an example of such an optimization. Although statements could be implemented with function calls they are almost always implemented with code inlining. ; Jump threading: In this optimization, consecutive conditional jumps predicated entirely or partially on the same condition are merged. : E.g., to , :and to . ;Macro compression: A space optimization that recognizes common sequences of code, creates subprograms ("code macros") that contain the common code, and replaces the occurrences of the common code sequences with calls to the corresponding subprograms. This is most effectively done as a
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
optimization, when all the code is present. The technique was first used to conserve space in an interpretive
byte stream A bitstream (or bit stream), also known as binary sequence, is a sequence of bits. A bytestream is a sequence of bytes. Typically, each byte is an 8-bit quantity, and so the term octet stream is sometimes used interchangeably. An octet may ...
used in an implementation of Macro Spitbol on
microcomputers A microcomputer is a small, relatively inexpensive computer having a central processing unit (CPU) made out of a microprocessor. The computer also includes memory and input/output (I/O) circuitry together mounted on a printed circuit board (PC ...
. The problem of determining an optimal set of macros that minimizes the space required by a given code segment is known to be
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
, but efficient heuristics attain near-optimal results. ;Reduction of
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
collisions: (e.g., by disrupting alignment within a page) ; Stack-height reduction: Rearrange expression tree to minimize resources needed for expression evaluation. ;Test reordering: If we have two tests that are the condition for something, we can first deal with the simpler tests (e.g., comparing a variable to something) and only then with the complex tests (e.g., those that require a function call). This technique complements
lazy evaluation In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed ( non-strict evaluation) and which also avoids repeated evaluations (sharing). The ...
, but can be used only when the tests are not dependent on one another. Short-circuiting semantics can make this difficult.


Interprocedural optimizations

Interprocedural optimization Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used Function (computer science), functions of small or medium length. IPO differs ...
works on the entire program, across procedure and file boundaries. It works tightly with intraprocedural counterparts, carried out with the cooperation of a local part and global part. Typical interprocedural optimizations are: procedure
inlining In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without cha ...
, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis, array access analysis, and the construction of a
call graph A call graph (also known as a call multigraph) is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge ''(f, g)'' indicates that procedure ''f'' cal ...
. Interprocedural optimization is common in modern commercial compilers from
SGI SGI may refer to: Companies *Saskatchewan Government Insurance *Scientific Games International, a gambling company *Silicon Graphics, Inc., a former manufacturer of high-performance computing products *Silicon Graphics International, formerly Rac ...
,
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
,
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washin ...
, and Sun Microsystems. For a long time, the open source GCC was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving. Another open source compiler with full analysis and optimization infrastructure is
Open64 Open64 is a free, open-source, optimizing compiler for the Itanium and x86-64 microprocessor architectures. It derives from the SGI compilers for the MIPS R10000 processor, called ''MIPSPro''. It was initially released in 2000 as GNU GPL s ...
. Due to the extra time and space required by interprocedural analysis, most compilers do not perform it by default. Users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.


Practical considerations

There can be a wide range of optimizations that a compiler can perform, ranging from the simple and straightforward that take little compilation time to the elaborate and complex that involve considerable amounts of compilation time. Accordingly, compilers often provide options to their control command or procedure to allow the compiler user to choose how much optimization to request; for instance, the IBM FORTRAN H compiler allowed the user to specify no optimization, optimization at the registers level only, or full optimization.Aho, Sethi, and Ullman, ''Compilers'', p. 737. By the 2000s, it was common for compilers, such as
Clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection ...
, to have a number of compiler command options that could affect a variety of optimization choices, starting with the familiar -O2 switch. An approach to isolating optimization is the use of so-called
post-pass optimizer An object code optimizer, sometimes also known as a post pass optimizer or, for small sections of code, peephole optimizer, takes the output from a source language compile step - the object code or binary file - and tries to replace identifiabl ...
s (some commercial versions of which date back to mainframe software of the late 1970s). These tools take the executable output by an optimizing compiler and optimize it even further. Post-pass optimizers usually work on the assembly language or
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
level (in contrast with compilers that optimize intermediate representations of programs). One such example is the
Portable C Compiler The Portable C Compiler (also known as pcc or sometimes pccm - portable C compiler machine) is an early compiler for the C programming language written by Stephen C. Johnson of Bell Labs in the mid-1970s, based in part on ideas proposed by Alan ...
(pcc) of the 1980s, which had an optional pass that would perform post-optimizations on the generated assembly code. Another consideration is that optimization algorithms are complicated and, especially when being used to compile large, complex programming languages, can contain bugs that introduce errors in the generated code or cause internal errors during compilation. Compiler errors of any kind can be disconcerting to the user, but especially so in this case, since it may not be clear that the optimization logic is at fault. In the case of internal errors, the problem can be partially ameliorated by a "fail-safe" programming technique in which the optimization logic in the compiler is coded such that a failure is trapped, a warning message issued, and the rest of the compilation proceeds to successful completion.


History

Early compilers of the 1960s were often primarily concerned with simply compiling code correctly or efficiently, such that compile times were a major concern. One notable early optimizing compiler was the IBM FORTRAN H compiler of the late 1960s. Another of the earliest and important optimizing compilers, that pioneered several advanced techniques, was that for
BLISS BLISS is a system programming language developed at Carnegie Mellon University (CMU) by W. A. Wulf, D. B. Russell, and A. N. Habermann around 1970. It was perhaps the best known system language until C debuted a few years later. Since then, C b ...
(1970), which was described in ''
The Design of an Optimizing Compiler ''The Design of an Optimizing Compiler'' (Elsevier Science Ltd, 1980, ), by William Wulf, Richard K. Johnson, Charles B. Weinstock, Steven O. Hobbs, and Charles M. Geschke, was published in 1975 by Elsevier. It describes the BLISS optimizing comp ...
'' (1975).Aho, Sethi, and Ullman, ''Compilers'', pp. 740, 779. By the late 1980s, optimizing compilers were sufficiently effective that programming in assembly language declined. This co-evolved with the development of RISC chips and advanced processor features such as instruction scheduling and speculative execution, which were designed to be targeted by optimizing compilers rather than by human-written assembly code.


List of static code analyses

*
Alias analysis Alias may refer to: * Pseudonym * Pen name * Nickname Arts and entertainment Film and television * ''Alias'' (2013 film), a 2013 Canadian documentary film * ''Alias'' (TV series), an American action thriller series 2001–2006 * ''Alias the J ...
*
Pointer analysis In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex anal ...
* Shape analysis *
Escape analysis In compiler optimization, escape analysis is a method for determining the dynamic scope of pointers where in the program a pointer can be accessed. It is related to pointer analysis and shape analysis. When a variable (or an object) is allocate ...
*
Array-access analysis In computer science, array-access analysis is a compiler analysis approach used to decide the read and write access patterns to elements or portions of arrays. The major data type manipulated in scientific programs is the array. The define/use anal ...
*
Dependence analysis In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement ''S2'' depends on ''S1'' if ''S1'' must be executed before ''S2''. Broadly, there are two classes of depen ...
* Control-flow analysis *
Data-flow analysis In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
** Use-define chain analysis **
Live-variable analysis In compilers, live variable analysis (or simply liveness analysis) is a classic data-flow analysis to calculate the variables that are ''live'' at each point in the program. A variable is ''live'' at some point if it holds a value that may be neede ...
**
Available expression In the field of compiler optimizations, available expressions is an analysis algorithm that determines for each point in the program the set of expressions that need not be recomputed. Those expressions are said to be ''available'' at such a point. ...
analysis


See also

*
Algorithmic efficiency In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algo ...
*
Compile-time function execution In computing, compile-time function execution (or compile time function evaluation, or general constant expressions) is the ability of a compiler, that would normally compile a function to machine code and execute it at run time, to execute the fu ...
*
Profile-guided optimization Profile-guided optimization (PGO, sometimes pronounced as ''pogo''), also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to imp ...
* Full-employment theorem *
Just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may co ...
(JIT) *
Program optimization In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be o ...


References


External links


Citations from CiteSeer

Optimization manuals
by Agner Fog documentation about x86 processor architecture and low-level code optimization {{DEFAULTSORT:Optimizing Compiler Compiler construction Programming language implementation Compiler theory