HOME

TheInfoList



OR:

Zero-overhead looping is a feature of some
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s whose hardware can repeat the body of a
loop Loop or LOOP may refer to: Brands and enterprises * Loop (mobile), a Bulgarian virtual network operator and co-founder of Loop Live * Loop, clothing, a company founded by Carlos Vasquez in the 1990s and worn by Digable Planets * Loop Mobile, an ...
automatically, rather than requiring software instructions which take up
cycle Cycle, cycles, or cyclic may refer to: Anthropology and social sciences * Cyclic history, a theory of history * Cyclical theory, a theory of American political history associated with Arthur Schlesinger, Sr. * Social cycle, various cycles in soc ...
s (and therefore time) to do so. Zero-overhead loops are common in
digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on MOS integrated circuit chips. They are widely used in audio si ...
s and some CISC instruction sets.


Background

In many instruction sets, a loop must be implemented by using instructions to increment or decrement a counter, check whether the end of the loop has been reached, and if not jump to the beginning of the loop so it can be repeated. Although this typically only represents around 3–16 bytes of space for each loop, even that small amount could be significant depending on the size of the
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...
s. More significant is that those instructions each take time to execute, time which is not spent doing useful work. The overhead of such a loop is apparent compared to a completely unrolled loop, in which the body of the loop is duplicated exactly as many times as it will execute. In that case, no space or execution time is wasted on instructions to repeat the body of the loop. However, the duplication caused by loop unrolling can significantly increase code size, and the larger size can even impact execution time due to
cache miss In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewher ...
es. (For this reason, it's common to only partially unroll loops, such as transforming it into a loop which performs the work of four iterations in one step before repeating. This balances the advantages of unrolling with the overhead of repeating the loop.) Moreover, completely unrolling a loop is only possible for a limited number of loops: those whose number of iterations is known at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
. For example, the following C code could be compiled and optimized into the following x86 assembly code:


Implementation

Processors with zero-overhead looping have machine instructions and registers to automatically repeat one or more instructions. Depending on the instructions available, these may only be suitable for count-controlled loops ("for loops") in which the number of iterations can be calculated in advance, or only for condition-controlled loops ("while loops") such as operations on
null-terminated string In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, called NUL in this article). Alternative names are C stri ...
s.


Examples


PIC

In the PIC instruction set, the and instructions implement zero-overhead loops. only repeats a single instruction, while repeats a specified number of following instructions.


Blackfin

Blackfin The Blackfin is a family of 16-/32-bit microprocessors developed, manufactured and marketed by Analog Devices. The processors have built-in, fixed-point digital signal processor (DSP) functionality supplied by 16-bit multiply–accumulates (MA ...
offers two zero-overhead loops. The loops can be nested; if both hardware loops are configured with the same "loop end" address, loop 1 will behave as the inner loop and repeat, and loop 0 will behave as the outer loop and repeat only if loop 1 would not repeat. Loops are controlled using the and registers ( either 0 to 1) to set the top and bottom of the loop — that is, the first and last instructions to be executed, which can be the same for a loop with only one instruction — and for the loop count. The loop repeats if is nonzero at the end of the loop, in which case is decremented. The loop registers can be set manually, but this would typically consume 6 bytes to load the registers, and 8–16 bytes to set up the values to be loaded. More common is to use the loop setup instruction (represented in assembly as either with pseudo-instruction and , or in a single line as ), which optionally initializes and sets and to the desired values. This only requires 4–6 bytes, but can only set and within a limited range relative to where the loop setup instruction is located. P0 = array + 396; R0 = 100; LC0 = R0; LOOP my_loop LC0; // sets LT0 and LB0 LOOP_BEGIN my_loop; // pseudo-instruction; generates a label used to compute LT0 // LC0 cannot be written directly to memory, // so we must use a temporary register. R0 += -1; // equally fast and small would be R0 = LC0 0--= R0; LOOP_END my_loop; // pseudo-instruction; generates a label used to compute LB0


x86

The
x86 assembly language x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for t ...
prefixes implement zero-overhead loops for a few instructions (namely ). Depending on the prefix and the instruction, the instruction will be repeated a number of times with holding the repeat count, or until a match (or non-match) is found with or with . This can be used to implement some types of searches and operations on
null-terminated string In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, called NUL in this article). Alternative names are C stri ...
s.


References

{{Reflist Computer hardware