Interprocedural optimization (IPO) is a collection of
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
techniques used in
computer programming
Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...
to improve performance in programs containing many frequently used
functions of small or medium length. IPO differs from other
compiler optimizations by analyzing the entire program as opposed to a single function or block of code.
IPO seeks to reduce or eliminate duplicate calculations and inefficient use of memory and to simplify iterative sequences such as loops. If a call to another routine occurs within a loop, IPO analysis may determine that it is best to
inline that routine. Additionally, IPO may re-order the routines for better memory layout and
locality.
IPO may also include typical compiler optimizations applied on a whole-program level, for example,
dead code elimination
In compiler theory, dead-code elimination (DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove dead code (code that does not affect the program results). Removing such code has several benefits: i ...
(DCE), which removes code that is never executed. IPO also tries to ensure better use of constants. Modern compilers offer IPO as an option at compile-time. The actual IPO process may occur at any step between the human-readable source code and producing a finished executable binary program.
For languages that compile on a file-by-file basis, effective IPO across
translation units (module files) requires knowledge of the "entry points" of the program so that a whole program optimization (WPO) can be run. In many cases, this is implemented as a link-time optimization (LTO) pass, because the whole program is visible to the linker.
Analysis
The objective of any optimization for speed is to have the program run as swiftly as possible; the problem is that it is not possible for a compiler to correctly analyze a program and determine what it ''will'' do, much less what the programmer ''intended'' for it to do. By contrast, human programmers start at the other end with a purpose and attempt to produce a program that will achieve it, preferably without expending a lot of thought in the process.
For various reasons, including readability, programs are frequently broken up into a number of procedures that handle a few general cases. However, the generality of each procedure may result in wasted effort in specific usages. Interprocedural optimization represents an attempt at reducing this waste.
Suppose there is a procedure that evaluates F(x), and that F is a
pure function
In computer programming, a pure function is a function that has the following properties:
# the function return values are identical for identical arguments (no variation with local static variables, non-local variables, mutable reference ...
, and the code requests the result of F(6) and then later, F(6) again. This second evaluation is almost certainly unnecessary: the result could have instead been saved and referred to later. This simple optimization is foiled the moment that the implementation of F(x) becomes impure; that is, its execution involves references to parameters other than the explicit argument 6 that has been changed between the invocations, or side effects such as printing some message to a log, counting the number of evaluations, accumulating the
CPU time consumed, preparing internal tables so that subsequent invocations for related parameters will be facilitated, and so forth. Losing these side effects via non-evaluation a second time may be acceptable, or they may not.
More generally, aside from optimization, the second reason to use procedures is to avoid duplication of code that would produce the same results, or almost the same results, each time the procedure is performed. A general approach to optimization would therefore be to reverse this: some or all invocations of a certain procedure are replaced by the respective code, with the parameters appropriately substituted. The compiler will then try to optimize the result.
WPO and LTO
Whole program optimization (WPO) is the compiler optimization of a program using information about all the
modules in the program. Normally, optimizations are performed on a
per module, "compiland", basis; but this approach, while easier to write and test and less demanding of resources during the compilation itself, does not allow certainty about the safety of a number of optimizations such as aggressive
inlining and thus cannot perform them even if they would actually turn out to be efficiency gains that do not change the
semantics
Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
of the emitted object code.
Link-time optimization (LTO) is a type of program optimization performed by a compiler to a program at
link time
In computer science, link time refers to the period of time, during the creation of a computer program, in which a linker is being applied to that program. Link time occurs after compile time and before runtime (when a program is executed).
It ...
. Link time optimization is relevant in programming languages that compile programs on a file-by-file basis, and then link those files together (such as
C and
Fortran), rather than all at once (such as
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
's
just-in-time compilation
In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is compilation (of computer code) during execution of a program (at run time) rather than before execution. This may consist of source code transl ...
(JIT)).
Once all files have been compiled separately into
object file
An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is kno ...
s, traditionally, a compiler links (merges) the object files into a single file, the
executable
In computer science, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instruction (computer science), in ...
. However, in LTO as implemented by the
GNU Compiler Collection
The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, Computer architecture, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes ...
(GCC) and
LLVM
LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
, the compiler is able to dump its
intermediate representation
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR), i.e.
GIMPLE
The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
bytecode or LLVM bitcode, respectively, so that all the different compilation units that will go to make up a single executable can be optimized as a single module when the link finally happens. This expands the scope of interprocedural optimizations to encompass the whole program (or, rather, everything that is visible at link time). With link-time optimization, the compiler can apply various forms of interprocedural optimization to the whole program, allowing for deeper analysis, more optimization, and ultimately better program performance.
In practice, LTO does not always optimize the entire program—
library functions, especially
dynamically linked shared objects, are intentionally kept out to avoid excessive duplication and to allow for updating.
Static linking does naturally lend to the concept of LTO, but it only works with library archives that contain IR objects as opposed to machine-code only object files.
Due to performance concerns, not even the entire unit is always directly used—a program could be partitioned in a divide-and-conquer style LTO such as GCC's WHOPR.
And of course, when the program being built is itself a library, the optimization would keep every externally-available (exported) symbol, without trying too hard at removing them as a part of DCE.
A much more limited form of WPO is still possible without LTO, as exemplified by GCC's switch. This mode makes GCC assume that the module being compiled contains the
entry point
In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments.
To start a program's execution, the loader or operating system passes co ...
of the entire program, so that every other function in it is not externally used and can be safely optimized away. Since it only applies to a single module, it cannot truly encompass the whole program. It can be combined with LTO in the one-big-module sense, which is useful when the linker is not communicating back to GCC about what entry points or symbols are being used externally.
History
For
procedural language
Procedural programming is a programming paradigm, classified as imperative programming, that involves implementing the behavior of a computer program as procedures (a.k.a. functions, subroutines) that call each other. The resulting program is a ...
s like
ALGOL
ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...
, interprocedural analysis and optimization appear to have entered commercial practice in the early 1970s. IBM's
PL/I
PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has b ...
Optimizing Compiler performed interprocedural analysis to understand the side effects of both procedure calls and exceptions (cast, in PL/I terms as "on conditions") and in papers by
Fran Allen. Work on compilation of the
APL programming language was necessarily interprocedural.
The techniques of interprocedural analysis and optimization were the subject of academic research in the 1980s and 1990s. They re-emerged into the commercial compiler world in the early 1990s with compilers from both
Convex Computer Corporation (the "Application Compiler" for the Convex C4) and from Ardent (the compiler for the
Ardent Titan). These compilers demonstrated that the technologies could be made sufficiently fast to be acceptable in a commercial compiler; subsequently interprocedural techniques have appeared in a number of commercial and non-commercial systems.
Flags and implementation
Unix-like
The
GNU Compiler Collection
The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, Computer architecture, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes ...
has function inlining at all optimization levels. At this only applies to those only called once (), at this constraint is relaxed (). By default this is a single-file-only behavior, but with link-time optimization it becomes whole program.
Clang
Clang () is a compiler front end for the programming languages C, C++, Objective-C, Objective-C++, and the software frameworks OpenMP, OpenCL, RenderScript, CUDA, SYCL, and HIP. It acts as a drop-in replacement for the GNU Compiler ...
's command-line interface is similar to that of GCC, with the exception that there is no option.
Object files produced by LTO contain a compiler-specific
intermediate representation
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR) that is interpreted at link-time. To make sure this plays well with
static libraries, newer
GNU linkers have a "linker plugin" interface that allows the compiler to convert the object files into a machine code form when needed. This plugin also helps drive the LTO process in general. Alternatively, a "fat LTO" object can be produced to contain both machine code and the IR, but this takes more space.
Since both GCC and LLVM (clang) are able produce an IR from a variety of programming languages, link-time IPO can happen even across language boundaries. This is most commonly demonstrated with C and C++, but LLVM makes it possible for
Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
and all other LLVM-based compilers too.
Non-LTO options
GCC and Clang perform IPO by default at optimization level 2. However, the degree of optimization is limited when LTO is disabled, as IPO can only happen within an object file and non-
static functions can never be eliminated. The latter problem has a non-LTO solution: the switch can be used to assume that only is non-static, i.e. visible from the outside.
Another non-LTO technique is "function sections" ( in GCC and Clang). By placing each function into its own section in the object file, the linker can perform dead code removal without an IR by removing unreferenced sections (using the linker option ). A similar option is available for variables, but it causes much worse code to be produced.
Other
The
Intel C/C++ compilers allow whole-program IPO. The flag to enable interprocedural optimizations for a single file is , the flag to enable interprocedural optimization across all files in the program is .
The
MSVC compiler, integrated into Visual Studio, also supports interprocedural optimization on the whole program.
A compiler-independent interface for enabling whole-program interprocedural optimizations is via the property in
CMake.
See also
*
Profile-guided optimization (PGO)
*
Single compilation unit
References
External links
How to trick C/C++ compilers into generating terrible code?
{{DEFAULTSORT:Interprocedural Optimization
Compiler optimizations