In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers. Register allocation can happen over a basic block (''local register allocation''), over a whole function/procedure (''global register allocation''), or across function boundaries traversed via call-graph (''interprocedural register allocation''). When done per function/procedure, the calling convention may require insertion of save/restore around each call-site.

Context

Principle

Table: Number of scalar registers in the most common architectures

Architecture     32 bits   64 bits
ARM              15        31
Intel x86        8         16
MIPS             32        32
POWER/PowerPC    32        32
RISC-V           16/32     32
SPARC            31        31

In many programming languages, the programmer may use any number of variables. The computer can quickly read and write registers in the CPU, so the computer program runs faster when more variables can be in the CPU's registers. Also, code accessing registers is sometimes more compact, so the code is smaller and can be fetched faster if it uses registers rather than memory. However, the number of registers is limited. Therefore, when the compiler is translating code to machine language, it must decide how to allocate variables to the limited number of registers in the CPU. Not all variables are in use (or "live") at the same time, so, over the lifetime of a program, a given register may be used to hold different variables. However, two variables in use at the ''same'' time cannot be assigned to the same register without corrupting one of the variables. If there are not enough registers to hold all the variables, some variables may be moved to and from RAM. This process is called "spilling" the registers. Over the lifetime of a program, a variable can be both spilled and stored in registers: this variable is then considered as "split". Accessing RAM is significantly slower than accessing registers, so a compiled program that spills runs slower. Therefore, an optimizing compiler aims to assign as many variables to registers as possible. High "register pressure" means that more spills and reloads are needed; register pressure is defined by Braun et al. as "the number of simultaneously live variables at an instruction".

In addition, some computer designs cache frequently-accessed registers. So, programs can be further optimized by assigning the same register to the source and destination of a move instruction whenever possible. This is especially important if the compiler is using an intermediate representation such as static single-assignment form (SSA). In particular, when SSA is not fully optimized it can artificially generate additional move instructions.
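As a rough illustration of the register pressure definition above, the following Python sketch runs a backward liveness pass over a single basic block and takes the size of the largest live set. The instruction encoding (a pair of defined and used variable sets) and the function names are hypothetical, chosen only for this example; real compilers compute liveness over the whole control-flow graph.

```python
from typing import List, Set, Tuple

# Hypothetical straight-line IR: each instruction is (defined_vars, used_vars).
Instr = Tuple[Set[str], Set[str]]


def live_before(block: List[Instr], live_out: Set[str]) -> List[Set[str]]:
    """For each instruction, return the set of variables live just before it."""
    live = set(live_out)
    result: List[Set[str]] = []
    for defs, uses in reversed(block):
        # Backward transfer function: live_in = uses | (live_after - defs)
        live = uses | (live - defs)
        result.append(set(live))
    result.reverse()
    return result


def register_pressure(block: List[Instr], live_out: Set[str]) -> int:
    """Largest number of simultaneously live variables at any point in the block."""
    sizes = [len(s) for s in live_before(block, live_out)]
    return max(sizes + [len(live_out)])


# a = 1; b = 2; c = a + b; d = a * c   (only d is used after the block)
example = [
    ({"a"}, set()),
    ({"b"}, set()),
    ({"c"}, {"a", "b"}),
    ({"d"}, {"a", "c"}),
]
print(register_pressure(example, {"d"}))  # 2
```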

Components of register allocation

Register allocation therefore consists of choosing where to store the variables at runtime, i.e. inside or outside registers. If a variable is to be stored in registers, the allocator needs to determine in which register(s) it will be stored. Finally, another challenge is to determine how long a variable should stay at the same location.

A register allocator, regardless of the chosen allocation strategy, can rely on a set of core actions to address these challenges. These actions can be gathered into several categories:
* Move insertion: increasing the number of move instructions between registers, i.e. making a variable live in different registers during its lifetime instead of one. This occurs in the split live range approach.
* Spilling: storing a variable in memory instead of in registers.
* Assignment: assigning a register to a variable.
* Coalescing: limiting the number of moves between registers, thus limiting the total number of instructions; for instance, by identifying a variable live across different methods and storing it in one register during its whole lifetime.

Many register allocation approaches optimize for one or more specific categories of actions.

Figure: Registers of the i386 CPU.

Common problems raised in register allocation

Register allocation raises several problems that can be tackled (or avoided) by different register allocation approaches. Three of the most common problems are:
* Aliasing: In some architectures, assigning a value to one register can affect the value of another; this is called aliasing. For example, the x86 architecture has four general-purpose 32-bit registers that can also be used as 16-bit or 8-bit registers. In this case, assigning a 32-bit value to the eax register will affect the value of the al register.
* Pre-coloring: This problem arises when some variables are forced to be assigned to particular registers. For example, on PowerPC, parameters are commonly passed in R3-R10 and the return value is passed in R3.
* NP-completeness: Chaitin et al. showed that register allocation is an NP-complete problem. They reduce the graph coloring problem to the register allocation problem by showing that, for an arbitrary graph, a program can be constructed such that the register allocation for the program (with symbolic registers representing nodes and machine registers representing available colors) would be a coloring of the original graph. As graph coloring is NP-hard and register allocation is in NP, this proves the NP-completeness of the problem.
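The aliasing problem can be made concrete with a small model in which each register name denotes a bit range of a physical register. This is an illustrative Python sketch, not an encoding taken from any particular compiler; the physical-register label "A" and the helper names are hypothetical, and only the 32-bit EAX family is modeled. An allocator would need to treat any two names for which aliases() returns True as interfering.

```python
from typing import Dict, Tuple

# Toy model of x86 sub-register aliasing: name -> (physical register, low bit, width).
# Only the EAX family is modeled; EBX, ECX and EDX would follow the same pattern.
SUBREGISTERS: Dict[str, Tuple[str, int, int]] = {
    "eax": ("A", 0, 32),
    "ax": ("A", 0, 16),
    "ah": ("A", 8, 8),
    "al": ("A", 0, 8),
}


def aliases(r1: str, r2: str) -> bool:
    """True if writing r1 can change the value read through r2."""
    phys1, lo1, width1 = SUBREGISTERS[r1]
    phys2, lo2, width2 = SUBREGISTERS[r2]
    # Same physical register and overlapping bit ranges [lo, lo + width).
    return phys1 == phys2 and lo1 < lo2 + width2 and lo2 < lo1 + width1


def write(eax: int, reg: str, value: int) -> int:
    """Return the new 32-bit contents of EAX after writing `value` to `reg`."""
    _, lo, width = SUBREGISTERS[reg]
    mask = ((1 << width) - 1) << lo
    return (eax & ~mask & 0xFFFFFFFF) | ((value << lo) & mask)


def read(eax: int, reg: str) -> int:
    """Return the value observed through `reg` given the contents of EAX."""
    _, lo, width = SUBREGISTERS[reg]
    return (eax >> lo) & ((1 << width) - 1)


v = write(0, "eax", 0x12345678)
print(hex(read(v, "al")))  # 0x78: writing eax changed al
```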

Register allocation techniques

Register allocation can happen over a basic block of code: it is then said to be "local", and was first mentioned by Horwitz et al. As basic blocks do not contain branches, the allocation process is thought to be fast, because managing control-flow graph merge points in register allocation is a time-consuming operation. However, this approach is thought not to produce code as well optimized as the "global" approach, which operates over the whole compilation unit (a method or procedure, for instance).

Graph-coloring allocation

Graph-coloring allocation is the predominant approach to solving register allocation. It was first proposed by Chaitin et al. In this approach, nodes in the graph represent live ranges (variables, temporaries, virtual/symbolic registers) that are candidates for register allocation. Edges connect live ranges that interfere, i.e., live ranges that are simultaneously live at one or more program points. Register allocation then reduces to the graph coloring problem, in which colors (registers) are assigned to the nodes such that two nodes connected by an edge do not receive the same color.

Using liveness analysis, an interference graph can be built. The interference graph, which is an undirected graph where the nodes are the program's variables, is used to model which variables cannot be allocated to the same register.
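The sketch below builds an interference graph for one basic block by adding an edge from each defined variable to every variable live immediately after the definition, a common construction. It assumes a hypothetical (defs, uses) instruction encoding in which each instruction defines at most one variable, and it omits refinements such as Chaitin's special treatment of copy instructions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical straight-line IR: each instruction is (defined_vars, used_vars),
# and defines at most one variable.
Instr = Tuple[Set[str], Set[str]]
Graph = Dict[str, Set[str]]


def add_edge(graph: Graph, u: str, v: str) -> None:
    if u != v:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)


def build_interference_graph(block: List[Instr], live_out: Set[str]) -> Graph:
    graph: Graph = {v: set() for v in live_out}
    live = set(live_out)  # variables live after the current instruction
    for defs, uses in reversed(block):
        for v in defs | uses:
            graph.setdefault(v, set())
        # A variable defined here interferes with everything still live after it.
        for d in defs:
            for v in live:
                add_edge(graph, d, v)
        live = uses | (live - defs)
    # Variables live on entry were defined elsewhere; they are all live together here.
    for u, v in combinations(sorted(live), 2):
        add_edge(graph, u, v)
    return graph


# a = 1; b = 2; c = a + b; d = a * c   (only d is used after the block)
example = [
    ({"a"}, set()),
    ({"b"}, set()),
    ({"c"}, {"a", "b"}),
    ({"d"}, {"a", "c"}),
]
graph = build_interference_graph(example, {"d"})
# b and c never interfere, so they may share a register.
```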

Principle

The main phases in a Chaitin-style graph-coloring register allocator are:
1. Renumber: discover live range information in the source program.
2. Build: build the interference graph.
3. Coalesce: merge the live ranges of non-interfering variables related by copy instructions.
4. Spill cost: compute the spill cost of each variable. This assesses the impact of mapping a variable to memory on the speed of the final program.
5. Simplify: construct an ordering of the nodes in the interference graph.
6. Spill code: insert spill instructions, i.e. loads and stores to move values between registers and memory.
7. Select: assign a register to each variable.
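The Simplify and Select phases can be sketched as follows. This is a simplified Python illustration, assuming an interference graph given as a symmetric adjacency dictionary, k available registers (k ≥ 1) and a spill cost for every variable. It picks spill candidates by lowest cost divided by degree, one heuristic in the spirit of Chaitin's, and it simply returns the spilled set instead of inserting spill code and rebuilding the graph as a full allocator would.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def simplify_and_select(graph: Graph, k: int,
                        spill_cost: Dict[str, float]) -> Tuple[Dict[str, int], Set[str]]:
    """Color `graph` with colors 0..k-1; return (coloring, spilled nodes). Assumes k >= 1."""
    remaining = {v: set(ns) for v, ns in graph.items()}  # working copy
    stack: List[str] = []
    spilled: Set[str] = set()

    def remove(v: str) -> None:
        for n in remaining.pop(v):
            remaining[n].discard(v)

    # Simplify: repeatedly remove a node of degree < k; it is guaranteed a color later.
    while remaining:
        low = [v for v, ns in remaining.items() if len(ns) < k]
        if low:
            stack.append(low[0])
            remove(low[0])
        else:
            # Blocked: choose a spill candidate with the lowest cost per remaining edge.
            victim = min(remaining, key=lambda n: spill_cost[n] / len(remaining[n]))
            spilled.add(victim)
            remove(victim)

    # Select: pop nodes in reverse order and give each the lowest free color.
    colors: Dict[str, int] = {}
    while stack:
        v = stack.pop()
        taken = {colors[n] for n in graph[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in taken)
    return colors, spilled


graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cost = {"a": 10.0, "b": 1.0, "c": 2.0, "d": 1.0}
colors, spilled = simplify_and_select(graph, 2, cost)
print(colors, spilled)  # {'c': 0, 'a': 1, 'd': 0} {'b'}
```

Because every stacked node had fewer than k remaining neighbours when it was removed, Select always finds a free color for it.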

Drawbacks and further improvements

The graph-coloring allocation has three major drawbacks. First, it relies on graph coloring, which is an NP-complete problem, to decide which variables are spilled: finding a coloring of a graph with the minimum number of colors is NP-hard. Second, unless live-range splitting is used, evicted variables are spilled everywhere: store instructions are inserted as early as possible, i.e., just after variable definitions, and load instructions as late as possible, i.e., just before variable uses. Third, a variable that is not spilled is kept in the same register throughout its whole lifetime.

Moreover, a single register name may appear in multiple register classes, where a class is a set of register names that are interchangeable in a particular role. Then, multiple register names may be aliases for a single hardware register. Finally, graph coloring is an aggressive technique for allocating registers, but is computationally expensive due to its use of the interference graph, which can have a worst-case size that is quadratic in the number of live ranges. The traditional formulation of graph-coloring register allocation implicitly assumes a single bank of non-overlapping general-purpose registers and does not handle irregular architectural features like overlapping register pairs, special-purpose registers and multiple register banks.

One later improvement of the Chaitin-style graph-coloring approach was found by Briggs et al.: it is called conservative coalescing. This improvement adds a criterion to decide when two live ranges can be merged: in addition to the non-interference requirement, two variables can only be coalesced if merging them will not cause further spilling. Briggs et al. introduced a second improvement to Chaitin's work, biased coloring. Biased coloring tries to assign the same color to live ranges that are copy-related.
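Briggs et al.'s conservative criterion is commonly stated as: merge two non-interfering, copy-related nodes only if the combined node would have fewer than k neighbours of significant degree (degree ≥ k), since such a node can still be removed during Simplify. A minimal Python sketch of that test, assuming an adjacency-dictionary interference graph and k registers, is:

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def briggs_can_coalesce(graph: Graph, a: str, b: str, k: int) -> bool:
    """Briggs-style conservative test for merging copy-related nodes a and b."""
    if b in graph[a]:
        return False  # interfering live ranges can never share a register
    merged_neighbours = (graph[a] | graph[b]) - {a, b}

    def degree_after_merge(n: str) -> int:
        # A node adjacent to both a and b loses one edge when they become one node.
        return len(graph[n]) - (1 if a in graph[n] and b in graph[n] else 0)

    # Coalesce only if the merged node has fewer than k significant-degree neighbours.
    significant = [n for n in merged_neighbours if degree_after_merge(n) >= k]
    return len(significant) < k


safe = {"a": {"x"}, "b": {"y"}, "x": {"a"}, "y": {"b"}}
risky = {"a": {"x"}, "b": {"y"}, "x": {"a", "z"}, "y": {"b", "z"}, "z": {"x", "y"}}
print(briggs_can_coalesce(safe, "a", "b", 2), briggs_can_coalesce(risky, "a", "b", 2))  # True False
```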

Linear scan

Linear scan is another global register allocation approach. It was first proposed by Poletto et al. in 1999. In this approach, the code is not turned into a graph. Instead, all the variables are linearly scanned to determine their live range, represented as an interval. Once the live ranges of all variables have been figured out, the intervals are traversed chronologically. Although this traversal could help identify variables whose live ranges interfere, no interference graph is built and the variables are allocated in a greedy way.

The motivation for this approach is speed: not in terms of execution time of the generated code, but in terms of time spent in code generation. Typically, the standard graph-coloring approaches produce quality code, but have a significant overhead, as the graph-coloring algorithm used has a quadratic cost. Owing to this feature, linear scan is the approach currently used in several JIT compilers, like the HotSpot client compiler, V8 and Jikes RVM. The HotSpot server compiler uses graph coloring for its superior code.

Pseudocode

This describes the algorithm as first proposed by Poletto et al., where:
* R is the number of available registers.
* active is the list, sorted in order of increasing end point, of live intervals overlapping the current point and placed in registers.

LinearScanRegisterAllocation
    active ← {}
    for each live interval i, in order of increasing start point do
        ExpireOldIntervals(i)
        if length(active) = R then
            SpillAtInterval(i)
        else
            register[i] ← a register removed from pool of free registers
            add i to active, sorted by increasing end point

ExpireOldIntervals(i)
    for each interval j in active, in order of increasing end point do
        if endpoint[j] ≥ startpoint[i] then
            return
        remove j from active
        add register[j] to pool of free registers

SpillAtInterval(i)
    spill ← last interval in active
    if endpoint[spill] > endpoint[i] then
        register[i] ← register[spill]
        location[spill] ← new stack location
        remove spill from active
        add i to active, sorted by increasing end point
    else
        location[i] ← new stack location
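The pseudocode above translates fairly directly into Python. The sketch below assumes intervals with integer start and end points and numbers registers and stack slots 0, 1, 2, ...; the Interval class and its field names are illustrative, not taken from Poletto et al. For brevity, active is kept sorted by re-sorting a list, where a production allocator would likely use a more efficient ordered structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    name: str
    start: int
    end: int
    register: Optional[int] = None  # assigned register, if any
    location: Optional[int] = None  # stack slot, if spilled


def linear_scan(intervals: List[Interval], num_registers: int) -> List[Interval]:
    free = list(range(num_registers))  # pool of free registers
    active: List[Interval] = []        # sorted by increasing end point
    slots = 0                          # next unused stack location

    for i in sorted(intervals, key=lambda iv: iv.start):
        # ExpireOldIntervals: free registers of intervals that ended before i starts.
        while active and active[0].end < i.start:
            free.append(active.pop(0).register)
        if len(active) == num_registers:
            # SpillAtInterval: spill whichever of i and the last active interval ends later.
            spill = active[-1] if active else None
            if spill is not None and spill.end > i.end:
                i.register, spill.register = spill.register, None
                spill.location, slots = slots, slots + 1
                active.pop()
                active.append(i)
            else:
                i.location, slots = slots, slots + 1
        else:
            i.register = free.pop(0)
            active.append(i)
        active.sort(key=lambda iv: iv.end)
    return intervals


ivs = [Interval("a", 0, 10), Interval("b", 1, 3), Interval("c", 2, 8), Interval("d", 4, 6)]
for iv in linear_scan(ivs, 2):
    print(iv.name, iv.register, iv.location)
# a None 0
# b 1 None
# c 0 None
# d 1 None
```

With two registers, the long interval a is spilled when c arrives, because a ends later than c, which is exactly the choice SpillAtInterval makes.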

Drawbacks and further improvements

However, linear scan presents two major drawbacks. First, due to its greedy nature, it does not take lifetime holes into account, i.e. "ranges where the value of the variable is not needed". Second, a spilled variable stays spilled for its entire lifetime.

Figure: Shorter live ranges with the SSA approach.

Many other research works followed up on Poletto's linear scan algorithm. Traub et al., for instance, proposed an algorithm called second-chance binpacking aiming at generating code of better quality. In this approach, spilled variables get the opportunity to be stored later in a register by using a different heuristic from the one used in the standard linear scan algorithm. Instead of using live intervals, the algorithm relies on live ranges, meaning that if a range needs to be spilled, it is not necessary to spill all the other ranges corresponding to this variable.

Linear scan allocation was also adapted to take advantage of the SSA form: the properties of this intermediate representation simplify the allocation algorithm and allow lifetime holes to be computed directly. First, the time spent in data-flow graph analysis, aimed at building the lifetime intervals, is reduced, namely because variables are unique. Second, SSA form produces shorter live intervals, because each new assignment corresponds to a new live interval. To avoid modeling intervals and liveness holes, Rogers showed a simplification called future-active sets that successfully removed intervals for 80% of instructions.
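The cost of ignoring lifetime holes can be seen with a small example. In this Python sketch, a variable's liveness is a list of half-open ranges of program points (an assumed representation); collapsing the ranges into a single interval, as the basic linear scan does, reports a conflict that a range-based check, in the spirit of the live ranges used by second-chance binpacking, does not.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) program points


def overlap(r1: Range, r2: Range) -> bool:
    return r1[0] < r2[1] and r2[0] < r1[1]


def as_interval(ranges: List[Range]) -> Range:
    """Collapse live ranges into one interval, as the basic linear scan does."""
    return (min(s for s, _ in ranges), max(e for _, e in ranges))


def interfere_by_interval(a: List[Range], b: List[Range]) -> bool:
    return overlap(as_interval(a), as_interval(b))


def interfere_by_ranges(a: List[Range], b: List[Range]) -> bool:
    return any(overlap(x, y) for x in a for y in b)


x = [(0, 3), (7, 9)]  # x is not needed between points 3 and 7: a lifetime hole
y = [(4, 6)]          # y fits entirely inside that hole
print(interfere_by_interval(x, y), interfere_by_ranges(x, y))  # True False
```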

Rematerialization

The problem of optimal register allocation is NP-complete. As a consequence, compilers employ heuristic techniques to approximate its solution.

Chaitin et al. discuss several ideas for improving the quality of spill code. They point out that certain values can be recomputed in a single instruction and that the required operand will always be available for the computation. They call these exceptional values "never-killed" and note that such values should be recalculated instead of being spilled and reloaded. They further note that an uncoalesced copy of a never-killed value can be eliminated by recomputing it directly into the desired register. These techniques are termed rematerialization. In practice, opportunities for rematerialization include:
* immediate loads of integer constants and, on some machines, floating-point constants,
* computing a constant offset from the frame pointer or the static data area, and
* loading non-local frame pointers from a display.

Briggs et al. extend Chaitin's work to take advantage of rematerialization opportunities for complex, multi-valued live ranges. They found that each value needs to be tagged with enough information to allow the allocator to handle it correctly. Briggs's approach is the following: first, split each live range into its component values, then propagate rematerialization tags to each value, and form new live ranges from connected values having identical tags.
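A minimal sketch of the spill-or-rematerialize decision follows. It assumes a hypothetical three-address notation in which each spilled value records the operation that defined it; the opcode names li (load immediate) and addfp (constant offset from the frame pointer) and the emitted text are invented for illustration, standing in for the "never-killed" values described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Definition:
    op: str                   # hypothetical opcode that produced the value
    operands: Tuple[int, ...]


# Values whose operands are always available ("never-killed") in this toy IR.
NEVER_KILLED_OPS = {"li", "addfp"}


def spill_code(var: str, d: Definition, slot: int) -> List[str]:
    """Code emitted right after the definition of a spilled value."""
    if d.op in NEVER_KILLED_OPS:
        return []  # nothing to store: the value can be recomputed at each use
    return [f"store {var}, [sp+{slot}]"]


def reload_code(var: str, d: Definition, slot: int) -> List[str]:
    """Code emitted right before a use of a spilled value."""
    if d.op in NEVER_KILLED_OPS:
        operands = ", ".join(str(x) for x in d.operands)
        return [f"{var} = {d.op} {operands}"]  # rematerialize instead of loading
    return [f"{var} = load [sp+{slot}]"]


print(reload_code("t1", Definition("li", (42,)), 0))   # ['t1 = li 42']
print(reload_code("t2", Definition("add", (1, 2)), 8))  # ['t2 = load [sp+8]']
```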

Coalescing

In the context of register allocation, coalescing is the act of merging variable-to-variable move operations by allocating those two variables to the same location. The coalescing operation takes place after the interference graph is built. Once two nodes have been coalesced, they must get the same color and be allocated to the same register, so the copy operation becomes unnecessary.

Coalescing can have both positive and negative impacts on the colorability of the interference graph. For example, one negative impact is that when two nodes are coalesced, the resulting node has the union of the edges of the nodes being coalesced, and therefore possibly a higher degree. A positive impact is, for example, that when a node interferes with both nodes being coalesced, its degree is reduced by one, which improves the overall colorability of the interference graph. There are several coalescing heuristics available:
* Aggressive coalescing: first introduced in Chaitin's original register allocator. This heuristic aims at coalescing any non-interfering, copy-related nodes. From the perspective of copy elimination, this heuristic has the best results. On the other hand, aggressive coalescing could impact the colorability of the interference graph.
* Conservative coalescing: it mainly uses the same heuristic as aggressive coalescing, but it merges moves if, and only if, doing so does not compromise the colorability of the interference graph.
* Iterated coalescing: it removes one particular move at a time, while keeping the colorability of the graph.
* Optimistic coalescing: it is based on aggressive coalescing, but if the interference graph colorability is compromised, then it gives up as few moves as possible.
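Both effects described above can be observed by merging two nodes of a small interference graph. This Python sketch assumes a symmetric adjacency-dictionary representation and names the merged node by joining the two original names, an arbitrary choice for the example; it performs the merge only and leaves the decision of when to coalesce to one of the heuristics listed above.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def coalesce(graph: Graph, a: str, b: str) -> Graph:
    """Return a new graph in which copy-related nodes a and b are merged."""
    if b in graph[a]:
        raise ValueError(f"{a} and {b} interfere and cannot be coalesced")
    merged = f"{a}+{b}"  # illustrative name for the combined node
    neighbours = (graph[a] | graph[b]) - {a, b}
    # Drop a and b everywhere, then connect every former neighbour to the merged node.
    result = {v: ns - {a, b} for v, ns in graph.items() if v not in (a, b)}
    for n in neighbours:
        result[n].add(merged)
    result[merged] = neighbours
    return result


# a and b are related by a copy "b = a" and do not interfere.
g = {"a": {"x", "y"}, "b": {"y", "z"}, "x": {"a"}, "y": {"a", "b"}, "z": {"b"}}
merged = coalesce(g, "a", "b")
print(len(merged["a+b"]), len(merged["y"]))  # 3 1
```

The merged node has degree 3, higher than either a or b (the negative effect), while y, which interfered with both, drops from degree 2 to 1 (the positive effect).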

Mixed approaches

Hybrid allocation

Some other register allocation approaches do not limit themselves to one technique to optimize register use. Cavazos et al., for instance, proposed a solution where it is possible to use both the linear scan and the graph-coloring algorithms. In this approach, the choice between the two is determined dynamically: first, a machine learning algorithm is used "offline", that is to say not at runtime, to build a heuristic function that determines which allocation algorithm needs to be used. The heuristic function is then used at runtime; in light of the code behavior, the allocator can then choose between the two available algorithms.

Trace register allocation is a recent approach developed by Eisl et al. This technique handles the allocation locally: it relies on dynamic profiling data to determine which branches will be the most frequently used in a given control-flow graph. It then infers a set of "traces" (i.e. code segments) in which the merge point is ignored in favor of the most used branch. Each trace is then independently processed by the allocator. This approach can be considered hybrid because it is possible to use different register allocation algorithms for the different traces.

Split allocation

Split allocation is another register allocation technique that combines different approaches, usually considered as opposite. For instance, the hybrid allocation technique can be considered split because the heuristic-building stage is performed offline, and the heuristic is used online. In the same fashion, B. Diouf et al. proposed an allocation technique relying on both offline and online behaviors, namely static and dynamic compilation. During the offline stage, an optimal spill set is first gathered using Integer Linear Programming. Then, live ranges are annotated using the compressAnnotation algorithm, which relies on the previously identified optimal spill set. Register allocation is performed afterwards during the online stage, based on the data collected in the offline phase. In 2007, Bouchez et al. also suggested splitting register allocation into different stages, with one stage dedicated to spilling and one dedicated to coloring and coalescing.

Comparison between the different techniques

Several metrics have been used to assess the performance of one register allocation technique against another. Register allocation typically has to deal with a trade-off between code quality, i.e. code that executes quickly, and analysis overhead, i.e. the time spent analyzing the source code to generate code with optimized register allocation. From this perspective, execution time of the generated code and time spent in liveness analysis are relevant metrics to compare the different techniques. Once relevant metrics have been chosen, the code on which the metrics will be applied should be available and relevant to the problem, either by reflecting the behavior of real-world applications, or by being relevant to the particular problem the algorithm wants to address. More recent articles about register allocation use, in particular, the DaCapo benchmark suite.

References

Sources


External links

* A Tutorial on Integer Programming
* Conference: Integer Programming and Combinatorial Optimization, IPCO
* The Aussois Combinatorial Optimization Workshop
* Bosscher, Steven; and Novillo, Diego. "GCC gets a new Optimizer Framework". An article about GCC's use of SSA and how it improves over older IRs.
* Extensive catalogue of SSA research papers.
* Zadeck, F. Kenneth. "The Development of Static Single Assignment Form", December 2007 talk on the origins of SSA.
* VV.AA. "SSA-based Compiler Design" (2014).
* Citations from CiteSeer
* Optimization manuals by Agner Fog: documentation about x86 processor architecture and low-level code optimization
