HOME

TheInfoList



OR:

In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.


Representation of computation and transformations

Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization. This presents challenges when reasoning about the correctness and benefits of a loop optimization, specifically the representations of the computation being optimized and the optimization(s) being performed.In the book
Reasoning About Program Transformations
', Jean-Francois Collard discusses in depth the general question of representing executions of programs rather than program text in the context of static optimization.


Optimization via a sequence of loop transformations

Loop optimization can be viewed as the application of a sequence of specific loop transformations (listed below or in ''Compiler transformations for high-performance computing''David F. Bacon, Susan L. Graham, and Oliver J. Sharp. ''Compiler transformations for high-performance computing.'' Report No. UCB/CSD 93/781, Computer Science Division-EECS, University of California, Berkeley, Berkeley, California 94720, November 1993 (available at
CiteSeer CiteSeerX (formerly called CiteSeer) is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. CiteSeer's goal is to improve the dissemination and access of a ...
br>
. Introduces compiler analysis such as data dependence analysis and interprocedural analysis, as well as a very complete list of loop transformations
) to the source code or
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
, with each
transformation Transformation may refer to: Science and mathematics In biology and medicine * Metamorphosis, the biological process of changing physical form after birth or hatching * Malignant transformation, the process of cells becoming cancerous * Trans ...
having an associated test for legality. A transformation (or sequence of transformations) generally must preserve the temporal sequence of all dependencies if it is to preserve the result of the program (i.e., be a legal transformation). Evaluating the benefit of a transformation or sequence of transformations can be quite difficult within this approach, as the application of one beneficial transformation may require the prior use of one or more other transformations that, by themselves, would result in reduced performance. Common loop transformations include: * Fission or distribution – loop fission attempts to break a loop into multiple loops over the same index range, but each new loop takes only part of the original loop's body. This can improve
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference locali ...
, both of the data being accessed in the loop and the code in the loop's body. *
Fusion Fusion, or synthesis, is the process of combining two or more distinct entities into a new whole. Fusion may also refer to: Science and technology Physics *Nuclear fusion, multiple atomic nuclei combining to form one or more different atomic nucl ...
or combining – this combines the bodies of two adjacent loops that would iterate the same number of times (whether or not that number is known at compile time), as long as they make no reference to each other's data. * Interchange or permutation – these optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout. *
Inversion Inversion or inversions may refer to: Arts * , a French gay magazine (1924/1925) * ''Inversion'' (artwork), a 2005 temporary sculpture in Houston, Texas * Inversion (music), a term with various meanings in music theory and musical set theory * ...
– this technique changes a standard ''while'' loop into a ''do/while'' (a.k.a. ''repeat/until'' ) loop wrapped in an ''if'' conditional, reducing the number of jumps by two for cases where the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a
pipeline stall In the design of pipelined computer processors, a pipeline stall is a delay in execution of an instruction in order to resolve a hazard. Details In a standard five-stage pipeline, during the decoding stage, the control unit will determine wh ...
. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the initial ''if''-guard can be skipped. *
Loop-invariant code motion In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion ( ...
– this can vastly improve efficiency by moving a computation from inside the loop to outside of it, computing a value just once before the loop begins, if the resultant quantity of the calculation will be the same for every loop iteration (i.e., a loop-invariant quantity). This is particularly important with address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with inversion, because not all code is safe to be moved outside the loop. *
Parallelization Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different for ...
– this is a special case of automatic parallelization focusing on loops, restructuring them to run efficiently on multiprocessor systems. It can be done automatically by compilers ( automatic parallelization) or manually (inserting parallel directives like
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating sy ...
). * Reversal – a subtle optimization that reverses the order in which values are assigned to the index variable. This can help eliminate dependencies and thus enable other optimizations. Certain architectures utilize looping constructs at
assembly Assembly may refer to: Organisations and meetings * Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions * General assembly, an official meeting of the members of an organization or of their representa ...
level that count in a single direction only (e.g., decrement-jump-if-not-zero JNZ. *
Scheduling A schedule or a timetable, as a basic time-management tool, consists of a list of times at which possible tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order in which such things are i ...
– this divides a loop into multiple parts that may be run concurrently on multiple processors. * Skewing – this technique is applied to a nested loop iterating over a multidimensional array, where each iteration of the
inner loop Inner loop may refer to: *Inner loop in computer programs *Inner Loop (Phoenix), a section of Interstate 10 in downtown Phoenix, Arizona, United States *Inner Loop (Rochester), an expressway around downtown Rochester, New York, United States *Inner ...
depends on previous iterations, and rearranges its array accesses so that the only dependencies are between iterations of the outer loop. *
Software pipelining In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the ...
– a type of
out-of-order execution In computer engineering, out-of-order execution (or more formally dynamic execution) is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a process ...
of loop iterations to hide the latencies of processor function units. *
Splitting Splitting may refer to: * Splitting (psychology) * Lumpers and splitters, in classification or taxonomy * Wood splitting * Tongue splitting * Splitting, railway operation Mathematics * Heegaard splitting * Splitting field * Splitting principle ...
or peeling – this attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different portions of the index range. A special case is ''loop peeling'', which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop. *
Tiling Tiling may refer to: *The physical act of laying tiles *Tessellations Computing *The compiler optimization of loop tiling * Tiled rendering, the process of subdividing an image by regular grid *Tiling window manager People * Heinrich Sylvester T ...
or blocking – reorganizes a loop to iterate over blocks of data sized to fit in the cache. *
Vectorization Vectorization may refer to: Computing * Array programming, a style of computer programming where operations are applied to whole arrays instead of individual elements * Automatic vectorization, a compiler optimization that transforms loops to vect ...
– attempts to run as many of the loop iterations as possible at the same time on a
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
system. * Unrolling – duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which may degrade performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead (except multiple instruction fetches and increased program load time), but requires that the number of iterations be known at compile time (except in the case of
Just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may co ...
). Care must also be taken to ensure that multiple re-calculation of indexed variables is not a greater overhead than advancing pointers within the original loop. * Unswitching – moves a conditional from inside a loop to outside of it by duplicating the loop's body, and placing a version of it inside each of the ''if'' and ''else'' clauses of the conditional. *
Sectioning Involuntary commitment, civil commitment, or involuntary hospitalization/hospitalisation is a legal process through which an individual who is deemed by a qualified agent to have symptoms of severe mental disorder is detained in a psychiatric hos ...
or strip-mining – introduced for
vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
s, loop-sectioning is a loop-transformation technique for enabling
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
(single instruction, multiple data)-encodings of loops and improving memory performance. This involves each vector operation being done for a size less-than or equal-to the maximum vector length on a given vector machine.


The unimodular transformation framework

The unimodular transformation approachSteven S. Muchnick,
Advanced Compiler Design and Implementation
'' 1997 Morgan Kaufmann. Section 20.4.2 discusses loop optimization.
uses a single
unimodular matrix In mathematics, a unimodular matrix ''M'' is a square integer matrix having determinant +1 or −1. Equivalently, it is an integer matrix that is invertible over the integers: there is an integer matrix ''N'' that is its inverse (these are equ ...
to describe the combined result of a sequence of many of the above transformations. Central to this approach is the view of the set of all executions of a statement within ''n'' loops as a set of integer points in an ''n''-dimensional space, with the points being executed in
lexicographical order In mathematics, the lexicographic or lexicographical order (also known as lexical order, or dictionary order) is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a ...
. For example, the executions of a statement nested inside an outer loop with index ''i'' and an inner loop with index ''j'' can be associated with the pairs of integers . The application of a unimodular transformation corresponds to the multiplication of the points within this space by the matrix. For example, the interchange of two loops corresponds to the matrix \begin0&1\\1&0\end. A unimodular transformation is legal if it preserves the temporal sequence of all dependencies; measuring the performance impact of a unimodular transformation is more difficult. Imperfectly nested loops and some transformations (such as tiling) do not fit easily into this framework.


The polyhedral or constraint-based framework

The polyhedral modelR. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002. handles a wider class of programs and transformations than the unimodular framework. The set of executions of a set of statements within a possibly imperfectly nested set of loops is seen as the union of a set of polytopes representing the executions of the statements.
Affine transformation In Euclidean geometry, an affine transformation or affinity (from the Latin, ''affinis'', "connected with") is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles. More generally, ...
s are applied to these polytopes, producing a description of a new execution order. The boundaries of the polytopes, the data dependencies, and the transformations are often described using systems of constraints, and this approach is often referred to as a constraint-based approach to loop optimization. For example, a single statement within an outer loop '' and an inner loop '' is executed once for each pair such that . Once again, a transformation is legal if it preserves the temporal sequence of all dependencies. Estimating the benefits of a transformation, or finding the best transformation for a given code on a given computer, remain the subject of ongoing research as of the time of this writing (2010).


See also

*
Loop nest optimization In computer science and particularly in compiler design, loop nest optimization (LNO) is an optimization technique that applies a set of loop transformations for the purpose of locality optimization or parallelization or another loop overhead redu ...
* Polytope model * Scalable parallelism * Scalable locality


References

{{Compiler optimizations Compiler optimizations