Peephole optimization is an
optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window.
Peephole optimization involves changing the small set of instructions to an equivalent set that has better performance.
For example:
* instead of pushing register A onto the stack and then immediately popping the value back into register A, a peephole optimization would remove both instructions;
* instead of adding A to A, a peephole optimization might do an arithmetic shift left;
* instead of multiplying a floating point register by 8, a peephole optimization might scale the floating point register's exponent by 3; and
* instead of multiplying an index by 4, adding the result to a base address to get a pointer value, and then dereferencing the pointer, a peephole optimization might use a hardware addressing mode that accomplishes the same result with one instruction.
The term ''peephole optimization'' was introduced by William Marshall McKeeman in 1965.
Replacement rules
Common techniques applied in peephole optimization:
* Null sequences – Delete useless operations.
* Combine operations – Replace several operations with one equivalent.
* Algebraic laws – Use algebraic laws to simplify or reorder instructions.
* Special case instructions – Use instructions designed for special operand cases.
* Address mode operations – Use address modes to simplify code.
There can be other types of peephole optimizations.
Examples
Replacing slow instructions with faster ones
The following
Java bytecode
In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming languages, ...
...
aload 1
aload 1
mul
...
can be replaced by
...
aload 1
dup
mul
...
This kind of optimization, like most peephole optimizations, makes certain assumptions about the efficiency of instructions. For instance, in this case, it is assumed that the
dup
operation (which duplicates and pushes the top of the
stack
Stack may refer to:
Places
* Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group
* Blue Stack Mountains, in Co. Donegal, Ireland
People
* Stack (surname) (including a list of people ...
) is more efficient than the
aload X
operation (which loads a local
variable
Variable may refer to:
* Variable (computer science), a symbolic name associated with a value and whose associated value may be changed
* Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
identified as
X
and pushes it on the stack).
Removing redundant code
Another example is to eliminate redundant load stores.
a = b + c;
d = a + e;
is straightforwardly implemented as
MOV b, R0 ; Copy b to the register
ADD c, R0 ; Add c to the register, the register is now b+c
MOV R0, a ; Copy the register to a
MOV a, R0 ; Copy a to the register
ADD e, R0 ; Add e to the register, the register is now a+e b+c)+eMOV R0, d ; Copy the register to d
but can be optimised to
MOV b, R0 ; Copy b to the register
ADD c, R0 ; Add c to the register, which is now b+c (a)
MOV R0, a ; Copy the register to a
ADD e, R0 ; Add e to the register, which is now b+c+e a)+eMOV R0, d ; Copy the register to d
Removing redundant stack instructions
If the compiler saves registers on the stack before calling a subroutine and restores them when returning, consecutive calls to subroutines may have redundant stack instructions.
Suppose the compiler generates the following
Z80
The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples were ...
instructions for each procedure call:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR
POP HL
POP DE
POP BC
POP AF
If there were two consecutive subroutine calls, they would look like this:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR1
POP HL
POP DE
POP BC
POP AF
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR2
POP HL
POP DE
POP BC
POP AF
The sequence POP regs followed by PUSH for the same registers is generally redundant. In cases where it is redundant, a peephole optimization would remove these instructions. In the example, this would cause another redundant POP/PUSH pair to appear in the peephole, and these would be removed in turn. Assuming that subroutine _ADDR2 does not depend on previous register values, removing all of the
redundant code In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary, such as:
* recomputing a value that has previously been calculated and is still available,
* code that is never executed (known as unr ...
in the example above would eventually leave the following code:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR1
CALL _ADDR2
POP HL
POP DE
POP BC
POP AF
Implementation
Modern compilers often implement peephole optimizations with a
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
.
See also
*
Object code optimizer
An object code optimizer, sometimes also known as a post pass optimizer or, for small sections of code, peephole optimizer, takes the output from a source language compile step - the object code or binary file - and tries to replace identifiable ...
s, discussion in relation to general
algorithmic efficiency
In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algor ...
*
Capex Corporation
Capex Corporation was an American computer software company in existence from 1969 through 1982 and based in Phoenix, Arizona. It made a variety of software products, mostly system utilities for the IBM mainframe platform, and was known for its Op ...
– produced the
COBOL
COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily us ...
optimizer, an early mainframe
object code optimizer
An object code optimizer, sometimes also known as a post pass optimizer or, for small sections of code, peephole optimizer, takes the output from a source language compile step - the object code or binary file - and tries to replace identifiable ...
for
IBM Cobol
*
Superoptimization
Superoptimization is the process where a compiler automatically finds the optimal sequence for a loop-free sequence of instructions. Real-world compilers generally cannot produce genuinely ''optimal'' code, and while most standard compiler optim ...
*
Digital Research XLT86
A source-to-source translator, source-to-source compiler (S2S compiler), transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces an equivalent sou ...
, an optimizing assembly source-to-source compiler
References
External links
*
tp://ftp.cs.princeton.edu/pub/lcc/contrib/copt.shar The copt general-purpose peephole optimizer by Christopher W. FraserThe original paper
{{Compiler optimizations
Compiler optimizations