__NOTOC__ In

computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, ''instruction selection'' is the stage of a

compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...

backend that transforms its middle-level

intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...

(IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction scheduling and

register allocation In compiler optimization, register allocation is the process of assigning local automatic variables and Expression (computer science), expression results to a limited number of processor registers. Register allocation can happen over a basic bloc ...

; hence its output IR has an infinite set of pseudo-registers (often known as ''temporaries'') and may still be – and typically is – subject to peephole optimization. Otherwise, it closely resembles the target

machine code In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binaryOn nonb ...

bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...

, or

assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...

. For example, for the following sequence of middle-level IR code

t1 = a
t2 = b
t3 = t1 + t2
a = t3
b = t1

a good instruction sequence for the

x86 architecture x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. T ...

is MOV EAX, a XCHG EAX, b ADD a, EAX For a comprehensive survey on instruction selection, see.

Macro expansion

The simplest approach to instruction selection is known as ''macro expansion'' or ''interpretative code generation''. A macro-expanding instruction selector operates by matching ''templates'' over the middle-level IR. Upon a match the corresponding ''macro'' is executed, using the matched portion of the IR as input, which emits the appropriate target instructions. Macro expansion can be done either directly on the textual representation of the middle-level IR, or the IR can first be transformed into a graphical representation which is then traversed depth-first. In the latter, a template matches one or more adjacent nodes in the graph. Unless the target machine is very simple, macro expansion in isolation typically generates inefficient code. To mitigate this limitation, compilers that apply this approach typically combine it with peephole optimization to replace combinations of simple instructions with more complex equivalents that increase performance and reduce code size. This is known as the ''Davidson-Fraser approach'' and is currently applied in GCC.

Graph covering

Another approach is to first transform the middle-level IR into a graph and then cover the graph using ''patterns''. A pattern is a template that matches a portion of the graph and can be implemented with a single instruction provided by the target machine. The goal is to cover the graph such that the total cost of the selected patterns is minimized, where the cost typically represents the number of cycles it takes to execute the instruction. For tree-shaped graphs, the least-cost cover can be found in linear time using dynamic programming, but for DAGs and full-fledged graphs the problem becomes NP-complete and thus is most often solved using either greedy algorithms or methods from combinatorial optimization.

References

External links

Alternative ways of supporting different generations of computer
{{Compiler optimizations Compiler optimizations