In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compiling, without changing the source code (the text), while macro expansion occurs before compiling, and results in different text that is then processed by the compiler.

Inlining is an important optimization, but has complex effects on performance. As a rule of thumb, some inlining will improve speed at very minor cost of space, but excess inlining will hurt speed, due to inlined code consuming too much of the instruction cache, and also cost significant space. A survey of the modest academic literature on inlining from the 1980s and 1990s is given in Peyton Jones & Marlow 1999.

Overview

Inline expansion is similar to macro expansion in that the compiler places a new copy of the function at each place it is called. Inlined functions run a little faster than normal functions because the function-call overhead is saved; however, there is a memory penalty. If a function is inlined 10 times, there will be 10 copies of the function inserted into the code. Hence inlining is best for small functions that are called often. In C++, member functions defined within the class definition are implicitly inline (no need to use the inline reserved word (keyword)); otherwise, the keyword is needed. The compiler may ignore the programmer's attempt to inline a function, mainly if it is particularly large.

Inline expansion is used to eliminate the time overhead (excess time) when a function is called. It is typically used for functions that execute frequently. It also has a space benefit for very small functions, and is an enabling transformation for other optimizations.

Without inline functions, the compiler decides which functions to inline. The programmer has little or no control over which functions are inlined and which are not. Giving this degree of control to the programmer allows for the use of application-specific knowledge in choosing which functions to inline.

Ordinarily, when a function is invoked, control is transferred to its definition by a branch or call instruction. With inlining, control drops through directly to the code for the function, without a branch or call instruction.

Compilers usually implement statements with inlining. Loop conditions and loop bodies need lazy evaluation. This property is fulfilled when the code to compute loop conditions and loop bodies is inlined. Performance considerations are another reason to inline statements.

In the context of functional programming languages, inline expansion is usually followed by the beta-reduction transformation.

A programmer might inline a function manually through copy-and-paste programming, as a one-time operation on the source code. However, other methods of controlling inlining (see below) are preferable, because they do not precipitate bugs arising when the programmer overlooks a (possibly modified) duplicated version of the original function body, while fixing a bug in the inlined function.

Effect on performance

The direct effect of this optimization is to improve time performance (by eliminating call overhead), at the cost of worsening space usage (due to duplicating the function body). The code expansion due to duplicating the function body dominates, except for simple cases, and thus the direct effect of inline expansion is to improve time at the cost of space.

However, the main benefit of inline expansion is to allow further optimizations and improved scheduling, due to increasing the size of the function body, as better optimization is possible on larger functions. The ultimate impact of inline expansion on speed is complex, due to multiple effects on performance of the memory system (mainly instruction cache), which dominates performance on modern processors: depending on the specific program and cache, inlining particular functions can increase or decrease performance.

The impact of inlining varies by programming language and program, due to different degrees of abstraction. In lower-level imperative languages such as C and Fortran it is typically a 10–20% speed boost, with minor impact on code size, while in more abstract languages it can be significantly more important, due to the number of layers inlining removes, with an extreme example being Self, where one compiler saw improvement factors of 4 to 55 by inlining.

The direct benefits of eliminating a function call are:
* It eliminates instructions needed for a function call, both in the calling function and in the callee: placing arguments on a stack or in registers, the function call itself, the function prologue, then at return the function epilogue, the return statement, and then getting the return value back, and removing arguments from stacks and restoring registers (if needed).
* Due to not needing registers to pass arguments, it reduces register spilling.
* It eliminates having to pass references and then dereference them, when using call by reference (or call by address, or call by sharing).

The main benefit of inlining, however, is the further optimizations it allows. Optimizations that cross function boundaries can be done without requiring interprocedural optimization (IPO): once inlining has been performed, added intraprocedural optimizations ("global optimizations") become possible on the enlarged function body. For example:
* A constant passed as an argument can often be propagated to all instances of the matching parameter, or part of the function may be "hoisted out" of a loop (via loop-invariant code motion).
* Register allocation can be done across the larger function body.
* High-level optimizations, such as escape analysis and tail duplication, can be performed on a larger scope and be more effective, more so if the compiler implementing those optimizations relies on mainly intraprocedural analysis.

These can be done without inlining, but require a significantly more complex compiler and linker (in case caller and callee are in separate compiling units).

Conversely, in some cases a language specification may allow a program to make added assumptions about arguments to procedures that it can no longer make after the procedure is inlined, preventing some optimizations. Smarter compilers (such as Glasgow Haskell Compiler (GHC)) will track this, but naive inlining loses this information.

A further benefit of inlining for the memory system is:
* Eliminating branches and keeping code that is executed close together in memory improves instruction cache performance by improving locality of reference (spatial locality and sequentiality of instructions). This is smaller than optimizations that specifically target sequentiality, but is significant.

The direct cost of inlining is increased code size, due to duplicating the function body at each call site. However, it does not always do so, namely in case of very short functions, where the function body is smaller than the size of a function call (at the caller, including argument and return value handling), such as trivial accessor methods or mutator methods (getters and setters); or for a function that is only used in one place, in which case it is not duplicated. Thus inlining may be minimized or eliminated if optimizing for code size, as is often the case in embedded systems.

Inlining also imposes a cost on performance, due to the code expansion (due to duplication) hurting instruction cache performance. This is most significant if, before expansion, the working set of the program (or a hot section of code) fit in one level of the memory hierarchy (e.g., L1 cache), but after expansion it no longer fits, resulting in frequent cache misses at that level. Due to the significant difference in performance at different levels of the hierarchy, this hurts performance considerably. At the highest level this can result in increased page faults, catastrophic performance degradation due to thrashing, or the program failing to run at all. This last is rare in common desktop and server applications, where code size is small relative to available memory, but can be an issue for resource-constrained environments such as embedded systems. One way to mitigate this problem is to split functions into a smaller hot inline path (fast path), and a larger cold non-inline path (slow path).

Inlining hurting performance is a problem for mainly large functions that are used in many places, but the break-even point beyond which inlining reduces performance is difficult to determine and depends in general on precise load, so it can be subject to manual optimization or profile-guided optimization. This is a similar issue to other code expanding optimizations such as loop unrolling, which also reduces the number of instructions processed, but can decrease performance due to poorer cache performance.

The precise effect of inlining on cache performance is complex. For small cache sizes (much smaller than the working set before expansion), the increased sequentiality dominates, and inlining improves cache performance. For cache sizes close to the working set, where inlining expands the working set so it no longer fits in cache, this dominates and cache performance decreases. For cache sizes larger than the working set, inlining has negligible impact on cache performance. Further, changes in cache design, such as load forwarding, can offset the increase in cache misses.
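The fast path and slow path split mentioned in this section can be sketched in C as follows. This is only an illustration under stated assumptions: the growable byte buffer, and the names struct buf, buf_grow and buf_push, are hypothetical and not taken from any particular library. The small, frequently executed check-and-store is written as an inline function, while the rarely executed reallocation is left as an ordinary function so that its larger body is not meant to be duplicated at every call site.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer, used only for illustration. */
struct buf {
    unsigned char *data;
    size_t len;
    size_t cap;
};

/* Slow path: rarely executed and comparatively large. It is an ordinary
 * (non-inline) function so that a single copy can be shared by all callers.
 * Returns 0 on success, -1 if memory could not be allocated. */
int buf_grow(struct buf *b)
{
    size_t new_cap = b->cap ? b->cap * 2 : 16;
    unsigned char *p = realloc(b->data, new_cap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = new_cap;
    return 0;
}

/* Fast path: a capacity check and a store. It is small enough that
 * expanding it at each call site adds little code. */
static inline int buf_push(struct buf *b, unsigned char c)
{
    if (b->len == b->cap && buf_grow(b) != 0)
        return -1;
    b->data[b->len++] = c;
    return 0;
}
```

A caller would start from a zero-initialized struct buf and call buf_push in a loop. Whether the compiler actually inlines buf_push, or keeps buf_grow out of line, remains its decision; the split only makes the intended hot path small.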

Compiler support

Compilers use a variety of mechanisms to decide which function calls should be inlined; these can include manual hints from programmers for specific functions, together with overall control via command-line options. Inlining is done automatically by many compilers in many languages, based on judgment of whether inlining is beneficial, while in other cases it can be manually specified via compiler directives, typically using a keyword or compiler directive called inline. Usually this only hints that inlining is desired, rather than requiring inlining, with the force of the hint varying by language and compiler.

Compiler developers generally keep the above performance issues in mind, and incorporate heuristics into their compilers that choose which functions to inline so as to improve performance, rather than worsening it, in most cases.

Implementation

Once the compiler has decided to inline a particular function, performing the inlining operation itself is usually simple. Depending on whether a compiler inlines functions across code in different languages, the compiler can inline on either a high-level intermediate representation (like abstract syntax trees) or a low-level intermediate representation. In either case, the compiler simply computes the arguments, stores them in variables corresponding to the function's arguments, and then inserts the body of the function at the call site.

Linkers can also do function inlining. When a linker inlines functions, it may inline functions whose source is not available, such as library functions (see link-time optimization). A runtime system can inline a function also. Runtime inlining can use dynamic profiling information to make better decisions about which functions to inline, as in the Java HotSpot compiler.

Here is a simple example of inline expansion performed "by hand" at the source level in the C language:

    int pred(int x)

Before inlining:

    int func(int y)

After inlining:

    int func(int y)

Note that this is only an example. In an actual C application, it would be preferable to use an inlining language feature such as parameterized macros or inline functions to tell the compiler to transform the code in this way. The Benefits section below lists ways to optimize this code.
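A minimal sketch of such a hand inlining is given below. The function bodies are an assumption: pred is taken to return 0 for an argument of 0 and x - 1 otherwise, which is consistent with the lines marked (1) to (3) that the later optimization steps refer to. The names func_before and func_after are hypothetical and are used only so that both versions can live in one file.

```c
/* Assumed body: 0 maps to 0, anything else to x - 1. */
int pred(int x)
{
    if (x == 0)
        return 0;
    else
        return x - 1;
}

/* Before inlining: three ordinary calls to pred. */
int func_before(int y)
{
    return pred(y) + pred(0) + pred(y + 1);
}

/* After inlining: each call is replaced by a copy of the body of pred,
 * with the argument substituted for the parameter x. */
int func_after(int y)
{
    int tmp;
    if (y == 0)     tmp = 0;  else tmp = y - 1;        /* (1) */
    if (0 == 0)     tmp += 0; else tmp += 0 - 1;       /* (2) */
    if (y + 1 == 0) tmp += 0; else tmp += (y + 1) - 1; /* (3) */
    return tmp;
}
```

Both versions compute the same value for every argument that does not overflow; only the calls have been removed.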

Inlining by assembly macro expansion

Assembler macros provide an alternative approach to inlining whereby a sequence of instructions can normally be generated inline by macro expansion from a single macro source statement (with zero or more parameters). One of the parameters might be an option to alternatively generate a one-time separate subroutine containing the sequence, which is then invoked by an inlined call to the function. Example:

    MOVE FROM=array1,TO=array2,INLINE=NO

Heuristics

A range of different heuristics have been explored for inlining. Usually, an inlining algorithm has a certain code budget (an allowed increase in program size) and aims to inline the most valuable callsites without exceeding that budget. In this sense, many inlining algorithms are modeled after the knapsack problem. To decide which callsites are more valuable, an inlining algorithm must estimate their benefit, i.e. the expected decrease in the execution time. Commonly, inliners use profiling information about the frequency of the execution of different code paths to estimate the benefits.

In addition to profiling information, newer just-in-time compilers apply several more advanced heuristics (Prokopec et al., An Optimization Driven Incremental Inline Substitution Algorithm for Just-In-Time Compilers, CGO'19 publication about the inliner used in the Graal compiler for the JVM), such as:
* Speculating which code paths will result in the best reduction in execution time (by enabling additional compiler optimizations as a result of inlining) and increasing the perceived benefit of such paths.
* Adaptively adjusting the benefit-per-cost threshold for inlining based on the size of the compiling unit and the amount of code already inlined.
* Grouping subroutines into clusters, and inlining entire clusters instead of singular subroutines. Here, the heuristic guesses the clusters by grouping those methods for which inlining just a proper subset of the cluster leads to a worse performance than inlining nothing at all.
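The budget-based view described at the start of this section can be illustrated with a simple greedy approximation of the knapsack problem. The sketch below is not the algorithm of any particular compiler: the struct callsite type, its benefit and cost fields, and the function choose_inlines are all hypothetical, and the benefit and cost figures are assumed to come from some earlier estimation step. Call sites are sorted by benefit per unit of code growth and accepted while they still fit in the budget.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical description of one candidate call site. */
struct callsite {
    double benefit; /* estimated execution time saved by inlining */
    int cost;       /* estimated code-size growth, assumed > 0 */
    int inlined;    /* set to 1 if the call site is selected */
};

/* Order call sites by benefit-per-cost ratio, highest first. */
static int by_ratio_desc(const void *a, const void *b)
{
    const struct callsite *x = a;
    const struct callsite *y = b;
    double rx = x->benefit / x->cost;
    double ry = y->benefit / y->cost;
    return (rx < ry) - (rx > ry);
}

/* Greedy selection: take call sites in ratio order while they fit in
 * the code budget. Returns the amount of budget actually used.
 * The array is reordered as a side effect. */
int choose_inlines(struct callsite *sites, size_t n, int budget)
{
    int used = 0;
    qsort(sites, n, sizeof *sites, by_ratio_desc);
    for (size_t i = 0; i < n; i++) {
        sites[i].inlined = 0;
        if (used + sites[i].cost <= budget) {
            sites[i].inlined = 1;
            used += sites[i].cost;
        }
    }
    return used;
}
```

A greedy pass like this does not always find the best selection for a knapsack instance, and it ignores the fact that inlining one call site can change the benefit and cost of others, which is one reason production inliners use the more adaptive heuristics listed above.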

Benefits

Inline expansion itself is an optimization, since it eliminates overhead from calls, but it is much more important as an enabling transformation. That is, once the compiler expands a function body in the context of its call site (often with arguments that may be fixed constants), it may be able to do a variety of transformations that were not possible before. For example, a conditional branch may turn out to be always true or always false at this particular call site. This in turn may enable dead code elimination, loop-invariant code motion, or induction variable elimination.

In the C example in the Implementation section, optimizing opportunities abound. The compiler may follow this sequence of steps:
* The tmp += 0 statements in the lines marked (2) and (3) do nothing. The compiler can remove them.
* The condition 0 == 0 is always true, so the compiler can replace the line marked (2) with the consequent, tmp += 0 (which does nothing).
* The compiler can rewrite the condition y+1 == 0 to y == -1.
* The compiler can reduce the expression (y + 1) - 1 to y.
* The expressions y and y+1 cannot both equal zero. This lets the compiler eliminate one test.
* In statements such as if (y == 0) return y the value of y is known in the body, and can be inlined.

The new function looks like:

    int func(int y)
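Applying the steps above could lead to a function like the following sketch. It assumes that the inlined helper returned 0 for an argument of 0 and its argument minus one otherwise; that body is an assumption made for illustration, so the exact result should be read as one plausible outcome, not as the output of a specific compiler.

```c
/* Simplified form of func after inlining and the follow-up optimizations.
 * Assumes the inlined helper mapped 0 to 0 and any other x to x - 1. */
int func(int y)
{
    if (y == 0)
        return 0;      /* first and third terms are both 0 */
    if (y == -1)
        return -2;     /* first term is -2, third term is 0 */
    return 2 * y - 1;  /* (y - 1) + 0 + y */
}
```

No call remains, and only two comparisons are left of the three original conditionals.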

Limits

Complete inline expansion is not always possible, due to recursion: recursively inline expanding the calls will not terminate. There are various solutions, such as expanding a bounded amount, or analyzing the call graph and breaking loops at certain nodes (i.e., not expanding some edge in a recursive loop). An identical problem occurs in macro expansion, as recursive expansion does not terminate, and is typically resolved by forbidding recursive macros (as in C and C++).
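Expanding a bounded amount can be illustrated by hand on a recursive factorial function. In the sketch below, which uses the hypothetical names fact and fact_expanded, the recursive call is expanded to a depth of two and the remaining call is left as an ordinary call, so the expansion terminates even though the function is recursive.

```c
/* Original recursive definition. */
unsigned long fact(unsigned long n)
{
    return n <= 1 ? 1 : n * fact(n - 1);
}

/* The same function with the recursive call expanded twice by hand.
 * The third level is left as a real call to fact, which bounds the
 * amount of duplicated code. */
unsigned long fact_expanded(unsigned long n)
{
    if (n <= 1)
        return 1;

    /* First inlined copy: body of fact(n - 1). */
    unsigned long m = n - 1;
    if (m <= 1)
        return n;

    /* Second inlined copy: body of fact(m - 1). */
    unsigned long k = m - 1;
    if (k <= 1)
        return n * m;

    /* Residual call: expansion stops here. */
    return n * m * k * fact(k - 1);
}
```

For small arguments the expanded version returns without making any call at all; for larger ones it makes roughly one call for every three levels of the original recursion if the residual call is itself directed to the expanded version.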

Comparison with macros

Traditionally, in languages such as C, inline expansion was accomplished at the source level using parameterized macros. Use of true inline functions, as are available in C99, provides several benefits over this approach:
* In C, macro invocations do not perform type checking, or even check that arguments are well-formed, whereas function calls usually do.
* In C, a macro cannot use the return keyword with the same meaning as a function would (it would terminate the function in which the macro is expanded, rather than the macro). In other words, a macro cannot return anything which is not the result of the last expression invoked inside it.
* Since C macros use mere textual substitution, this may result in unintended side-effects and inefficiency due to re-evaluation of arguments and order of operations.
* Compiler errors within macros are often difficult to understand, because they refer to the expanded code, rather than the code the programmer typed. Thus, debugging information for inlined code is usually more helpful than that of macro-expanded code.
* Many constructs are awkward or impossible to express using macros, or use a significantly different syntax. Inline functions use the same syntax as regular functions, and can be inlined and un-inlined at will with ease.

Many compilers can also inline expand some recursive functions; recursive macros are typically illegal.

Bjarne Stroustrup, the designer of C++, likes to emphasize that macros should be avoided wherever possible, and advocates extensive use of inline functions.
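Two of the pitfalls listed above, re-evaluation of arguments and order of operations, can be shown with a short C sketch. All names here (SQUARE_MACRO, DOUBLE_MACRO, square, twice, next_value and the counter calls) are hypothetical and chosen only for illustration.

```c
/* Counts how many times next_value has been called. */
static int calls = 0;

static int next_value(void)
{
    return ++calls;
}

/* Macro version: the argument text is pasted twice, so an argument
 * with side effects is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function version: the argument is evaluated exactly once. */
static inline int square(int x)
{
    return x * x;
}

/* Macro without parentheses: the surrounding expression can regroup it,
 * so 3 * DOUBLE_MACRO(2) expands to 3 * 2 + 2. */
#define DOUBLE_MACRO(x) x + x

/* Inline function version: the result is a single value, so
 * 3 * twice(2) multiplies the whole sum. */
static inline int twice(int x)
{
    return x + x;
}
```

With these definitions, SQUARE_MACRO(next_value()) calls next_value twice while square(next_value()) calls it once, and 3 * DOUBLE_MACRO(2) evaluates to 8 while 3 * twice(2) evaluates to 12.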

Selection methods

Many compilers aggressively inline functions wherever it is beneficial to do so. Although it can lead to larger executables, aggressive inlining has nevertheless become more and more desirable as memory capacity has increased faster than CPU speed. Inlining is a critical optimization in languages for functional and object-oriented programming, which rely on it to provide enough context for their typically small functions to make classical optimizations effective.

Language support

Many languages, including Java and functional languages, do not provide language constructs for inline functions, but their compilers or interpreters often perform aggressive inline expansion. Other languages provide constructs for explicit hints, generally as compiler directives (pragmas).

The language Ada has a pragma for inline functions.

Functions in Common Lisp may be defined as inline by the inline declaration as such:

    (declaim (inline dispatch))
    (defun dispatch (x)
      (funcall (get (car x) 'dispatch) x))

The Haskell compiler GHC tries to inline functions or values that are small enough, but inlining may be noted explicitly using a language pragma:

    key_function :: Int -> String -> (Bool, Double)
    {-# INLINE key_function #-}

C and C++

C and C++ have an inline keyword which serves as a hint that inlining may be beneficial; however, in newer versions, its main purpose is instead to alter the visibility and linking behavior of the function (see https://en.cppreference.com/w/cpp/language/inline).

Rust

In Rust, inlining is automatically done by the compiler. Rust provides an #[inline] attribute that suggests to the compiler that a function should be inlined, but does not guarantee it; the compiler may ignore even #[inline(always)]. In debug builds, where optimizations are disabled, the compiler performs little or no inlining.

See also

* Macro (computer science) 



*  Partial evaluation
*  Tail-call elimination
*  Code outlining


External links



* Eliminating Virtual Function Calls in C++ Programs; Gerald Aigner, Urs Hölzle
* Reducing Indirect Function Call Overhead In C++ Programs; Brad Calder, Dirk Grunwald
* John R. Levine
* Whole Program Optimization with Visual C++ .NET; Brandon Bray
