HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, loop fission (or loop distribution) is a
compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
in which a
loop Loop or LOOP may refer to: Brands and enterprises * Loop (mobile), a Bulgarian virtual network operator and co-founder of Loop Live * Loop, clothing, a company founded by Carlos Vasquez in the 1990s and worn by Digable Planets * Loop Mobile, an ...
is broken into multiple loops over the same index range with each taking only a part of the original loop's body. The goal is to break down a large loop body into smaller ones to achieve better utilization of
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
. This optimization is most efficient in
multi-core processor A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...
s that can split a task into multiple tasks for each
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
. Conversely, loop fusion (or loop jamming) is a
compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
and
loop transformation In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capab ...
which replaces multiple
loop Loop or LOOP may refer to: Brands and enterprises * Loop (mobile), a Bulgarian virtual network operator and co-founder of Loop Live * Loop, clothing, a company founded by Carlos Vasquez in the 1990s and worn by Digable Planets * Loop Mobile, an ...
s with a single one. Loop fusion does not always improve run-time speed. On some
architectures Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings o ...
, two loops may actually perform better than one loop because, for example, there is increased
data locality In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
within each loop. One of the main benefits of loop fusion is that it allows temporary allocations to be avoided, which can lead to huge performance gains in numerical computing languages such as
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
when doing elementwise operations on arrays (however, Julia's loop fusion is not technically a compiler optimization, but a syntactic guarantee of the language). Other benefits of loop fusion are that it avoids the overhead of the loop control structures, and also that it allows the loop body to be parallelized by the processor by taking advantage of
instruction-level parallelism Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. Disc ...
. This is possible when there are no data dependencies between the bodies of the two loops (this is in stark contrast to the other main benefit of loop fusion described above, which only presents itself when there ''are'' data dependencies that require an intermediate allocation to store the results). If loop fusion is able to remove redundant allocations, performance increases can be large. Otherwise, there is a more complex trade-off between data locality, instruction-level parallelism, and loop overhead (branching, incrementing, etc.) that may make loop fusion, loop fission, or neither, the preferable optimization.


Fission


Example in C

int i, a 00 b 00 for (i = 0; i < 100; i++) is equivalent to: int i, a 00 b 00 for (i = 0; i < 100; i++) for (i = 0; i < 100; i++)


Fusion


Example in C++ and MATLAB

Consider the following MATLAB code: x = 0:999; % Create an array of numbers from 0 to 999 (range is inclusive) y = sin(x) + 4; % Take the sine of x (element-wise) and add 4 to each element You could achieve the same syntax in
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
by using function and operator overloading: #include #include #include #include class Array ; int main(int argc, char* argv[]) However, the above example unnecessarily allocates a temporary array for the result of sin(x). A more efficient implementation would allocate a single array for y, and compute y in a single loop. To optimize this, a C++ compiler would need to: # Inline the sin and operator+ function calls. # Fuse the loops into a single loop. # Remove the unused stores into the temporary arrays (can use a register or stack variable instead). # Remove the unused allocation and free. All of these steps are individually possible. Even step four is possible despite the fact that functions like malloc and free have global side effects, since some compilers hardcode symbols such as malloc and free so that they can remove unused allocations from the code. However, as of
clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection ...
12.0.0 and gcc 11.1, this loop fusion and redundant allocation removal does not occur - even on the highest optimization level. Some languages specifically targeted towards numerical computing such as Julia might have the concept of loop fusion built into it at a high level, where the compiler will notice adjacent elementwise operations and fuse them into a single loop. Currently, to achieve the same syntax in general purpose languages like C++, the sin and operator+ functions must pessimistically allocate arrays to store their results, since they do not know what context they will be called from. This issue can be avoided in C++ by using a different syntax that does not rely on the compiler to remove unnecessary temporary allocations (e.g., using functions and overloads for in-place operations, such as operator+= or std::transform).


See also

*
Expression templates Expression templates are a C++ template metaprogramming technique that builds structures representing a computation at compile time, where expressions are evaluated only as needed to produce efficient code for the entire computation. Expression te ...
*
Loop transformation In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capab ...


References


Loop fission
{{Compiler optimizations Articles with example C code Compiler optimizations