Program Optimization
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.


General

Although the word "optimization" shares the same root as "optimal", it is rare for the process of optimization to produce a truly optimal system. A system can generally be made optimal not in absolute terms, but only with respect to a given quality metric, which may be in contrast with other possible metrics. As a result, the optimized system will typically only be optimal in one application or for one audience. One might reduce the amount of time that a program takes to perform some task at the price of making it consume more memory. In an application where memory space is at a premium, one might deliberately choose a slower
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
in order to use less memory. Often there is no "one size fits all" design which works well in all cases, so
engineer Engineers, as practitioners of engineering, are professionals who invent, design, analyze, build and test machines, complex systems, structures, gadgets and materials to fulfill functional objectives and requirements while considering the l ...
s make
trade-off A trade-off (or tradeoff) is a situational decision that involves diminishing or losing one quality, quantity, or property of a set or design in return for gains in other aspects. In simple terms, a tradeoff is where one thing increases, and anot ...
s to optimize the attributes of greatest interest. Additionally, the effort required to make a piece of software completely optimal incapable of any further improvement is almost always more than is reasonable for the benefits that would be accrued; so the process of optimization may be halted before a completely optimal solution has been reached. Fortunately, it is often the case that the greatest improvements come early in the process. Even for a given quality metric (such as execution speed), most methods of optimization only improve the result; they have no pretense of producing optimal output.
Superoptimization Superoptimization is the process where a compiler automatically finds the optimal sequence for a loop-free sequence of instructions. Real-world compilers generally cannot produce genuinely ''optimal'' code, and while most standard compiler optim ...
is the process of finding truly optimal output.


Levels of optimization

Optimization can occur at a number of levels. Typically the higher levels have greater impact, and are harder to change later on in a project, requiring significant changes or a complete rewrite if they need to be changed. Thus optimization can typically proceed via refinement from higher to lower, with initial gains being larger and achieved with less work, and later gains being smaller and requiring more work. However, in some cases overall performance depends on performance of very low-level portions of a program, and small changes at a late stage or early consideration of low-level details can have outsized impact. Typically some consideration is given to efficiency throughout a project though this varies significantly but major optimization is often considered a refinement to be done late, if ever. On longer-running projects there are typically cycles of optimization, where improving one area reveals limitations in another, and these are typically curtailed when performance is acceptable or gains become too small or costly. As performance is part of the specification of a program a program that is unusably slow is not fit for purpose: a video game with 60 Hz (frames-per-second) is acceptable, but 6 frames-per-second is unacceptably choppy performance is a consideration from the start, to ensure that the system is able to deliver sufficient performance, and early prototypes need to have roughly acceptable performance for there to be confidence that the final system will (with optimization) achieve acceptable performance. This is sometimes omitted in the belief that optimization can always be done later, resulting in prototype systems that are far too slow often by an
order of magnitude An order of magnitude is an approximation of the logarithm of a value relative to some contextually understood reference value, usually 10, interpreted as the base of the logarithm and the representative of values of magnitude one. Logarithmic di ...
or more and systems that ultimately are failures because they architecturally cannot achieve their performance goals, such as the
Intel 432 The iAPX 432 (''Intel Advanced Performance Architecture'') is a discontinued computer architecture introduced in 1981. It was Intel's first 32-bit Microprocessor, processor design. The main processor of the architecture, the ''general data proc ...
(1981); or ones that take years of work to achieve acceptable performance, such as Java (1995), which only achieved acceptable performance with HotSpot (1999). The degree to which performance changes between prototype and production system, and how amenable it is to optimization, can be a significant source of uncertainty and risk.


Design level

At the highest level, the design may be optimized to make best use of the available resources, given goals, constraints, and expected use/load. The architectural design of a system overwhelmingly affects its performance. For example, a system that is network latency-bound (where network latency is the main constraint on overall performance) would be optimized to minimize network trips, ideally making a single request (or no requests, as in a push protocol) rather than multiple roundtrips. Choice of design depends on the goals: when designing a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
, if fast compilation is the key priority, a
one-pass compiler In computer programming, a one-pass compiler is a compiler that passes through the parts of each compilation unit only once, immediately translating each part into its final machine code. This is in contrast to a multi-pass compiler which conver ...
is faster than a
multi-pass compiler A multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times. This is in contrast to a one-pass compiler, which traverses the program only once. Each pass takes the result of the previ ...
(assuming same work), but if speed of output code is the goal, a slower multi-pass compiler fulfills the goal better, even though it takes longer itself. Choice of platform and programming language occur at this level, and changing them frequently requires a complete rewrite, though a modular system may allow rewrite of only some component for example, a Python program may rewrite performance-critical sections in C. In a distributed system, choice of architecture ( client-server,
peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer ...
, etc.) occurs at the design level, and may be difficult to change, particularly if all components cannot be replaced in sync (e.g., old clients).


Algorithms and data structures

Given an overall design, a good choice of efficient algorithms and
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
s, and efficient implementation of these algorithms and data structures comes next. After design, the choice of
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s and data structures affects efficiency more than any other aspect of the program. Generally data structures are more difficult to change than algorithms, as a data structure assumption and its performance assumptions are used throughout the program, though this can be minimized by the use of
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a ''user'', of the data, specifically in terms of possible values, pos ...
s in function definitions, and keeping the concrete data structure definitions restricted to a few places. For algorithms, this primarily consists of ensuring that algorithms are constant O(1), logarithmic O(log ''n''), linear O(''n''), or in some cases log-linear O(''n'' log ''n'') in the input (both in space and time). Algorithms with quadratic complexity O(''n''2) fail to scale, and even linear algorithms cause problems if repeatedly called, and are typically replaced with constant or logarithmic if possible. Beyond asymptotic order of growth, the constant factors matter: an asymptotically slower algorithm may be faster or smaller (because simpler) than an asymptotically faster algorithm when they are both faced with small input, which may be the case that occurs in reality. Often a
hybrid algorithm {{Unreferenced, date=May 2014 A hybrid algorithm is an algorithm that combines two or more other algorithms that solve the same problem, and is mostly used in programming languages like C++, either choosing one (depending on the data), or switching ...
will provide the best performance, due to this tradeoff changing with size. A general technique to improve performance is to avoid work. A good example is the use of a
fast path Fast path is a term used in computer science to describe a path with shorter instruction path length through a program compared to the normal path. For a fast path to be effective it must handle the most commonly occurring tasks more efficiently th ...
for common cases, improving performance by avoiding unnecessary work. For example, using a simple text layout algorithm for Latin text, only switching to a complex layout algorithm for complex scripts, such as
Devanagari Devanagari ( ; , , Sanskrit pronunciation: ), also called Nagari (),Kathleen Kuiper (2010), The Culture of India, New York: The Rosen Publishing Group, , page 83 is a left-to-right abugida (a type of segmental Writing systems#Segmental syste ...
. Another important technique is caching, particularly memoization, which avoids redundant computations. Because of the importance of caching, there are often many levels of caching in a system, which can cause problems from memory use, and correctness issues from stale caches.


Source code level

Beyond general algorithms and their implementation on an abstract machine, concrete source code level choices can make a significant difference. For example, on early C compilers, while(1) was slower than for(;;) for an unconditional loop, because while(1) evaluated 1 and then had a conditional jump which tested if it was true, while for (;;) had an unconditional jump . Some optimizations (such as this one) can nowadays be performed by
optimizing compiler In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power cons ...
s. This depends on the source language, the target machine language, and the compiler, and can be both difficult to understand or predict and changes over time; this is a key place where understanding of compilers and machine code can improve performance.
Loop-invariant code motion In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion ( ...
and
return value optimization In C++ computer programming, copy elision refers to a compiler optimization technique that eliminates unnecessary copying of objects. The C++ language standard generally allows implementations to perform any optimization, provided the resulting pr ...
are examples of optimizations that reduce the need for auxiliary variables and can even result in faster performance by avoiding round-about optimizations.


Build level

Between the source and compile level, directives and build flags can be used to tune performance options in the source code and compiler respectively, such as using
preprocessor In computer science, a preprocessor (or precompiler) is a program that processes its input data to produce output that is used as input in another program. The output is said to be a preprocessed form of the input data, which is often used by so ...
defines to disable unneeded software features, optimizing for specific processor models or hardware capabilities, or predicting branching, for instance. Source-based software distribution systems such as
BSD The Berkeley Software Distribution or Berkeley Standard Distribution (BSD) is a discontinued operating system based on Research Unix, developed and distributed by the Computer Systems Research Group (CSRG) at the University of California, Berk ...
's
Ports A port is a maritime facility comprising one or more wharves or loading areas, where ships load and discharge cargo and passengers. Although usually situated on a sea coast or estuary, ports can also be found far inland, such as H ...
and Gentoo's Portage can take advantage of this form of optimization.


Compile level

Use of an
optimizing compiler In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power cons ...
tends to ensure that the
executable program In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file ...
is optimized at least as much as the compiler can predict.


Assembly level

At the lowest level, writing code using an
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
, designed for a particular hardware platform can produce the most efficient and compact code if the programmer takes advantage of the full repertoire of
machine instruction In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
s. Many
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
s used on
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' as ...
s have been traditionally written in assembler code for this reason. Programs (other than very small programs) are seldom written from start to finish in assembly due to the time and cost involved. Most are compiled down from a high level language to assembly and hand optimized from there. When efficiency and size are less important large parts may be written in a high-level language. With more modern
optimizing compiler In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power cons ...
s and the greater complexity of recent CPUs, it is harder to write more efficient code than what the compiler generates, and few projects need this "ultimate" optimization step. Much of the code written today is intended to run on as many machines as possible. As a consequence, programmers and compilers don't always take advantage of the more efficient instructions provided by newer CPUs or quirks of older models. Additionally, assembly code tuned for a particular processor without using such instructions might still be suboptimal on a different processor, expecting a different tuning of the code. Typically today rather than writing in assembly language, programmers will use a
disassembler A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
to analyze the output of a compiler and change the high-level source code so that it can be compiled more efficiently, or understand why it is inefficient.


Run time

Just-in-time compilers can produce customized machine code based on run-time data, at the cost of compilation overhead. This technique dates to the earliest
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
engines, and has become widespread with Java HotSpot and V8 for JavaScript. In some cases
adaptive optimization Adaptive optimization is a technique in computer science that performs dynamic recompilation of portions of a program based on the current execution profile. With a simple implementation, an adaptive optimizer may simply make a trade-off between ...
may be able to perform
run time Run(s) or RUN may refer to: Places * Run (island), one of the Banda Islands in Indonesia * Run (stream), a stream in the Dutch province of North Brabant People * Run (rapper), Joseph Simmons, now known as "Reverend Run", from the hip-hop group ...
optimization exceeding the capability of static compilers by dynamically adjusting parameters according to the actual input or other factors.
Profile-guided optimization Profile-guided optimization (PGO, sometimes pronounced as ''pogo''), also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to imp ...
is an ahead-of-time (AOT) compilation optimization technique based on run time profiles, and is similar to a static "average case" analog of the dynamic technique of adaptive optimization.
Self-modifying code In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, ...
can alter itself in response to run time conditions in order to optimize code; this was more common in assembly language programs. Some
CPU design Processor design is a subfield of computer engineering and electronics engineering (fabrication) that deals with creating a processor, a key component of computer hardware. The design process involves choosing an instruction set and a certain ...
s can perform some optimizations at run time. Some examples include out-of-order execution,
speculative execution Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing ...
,
instruction pipeline In computer engineering, instruction pipelining or ILP is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing inco ...
s, and
branch predictor In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
s. Compilers can help the program take advantage of these CPU features, for example through
instruction scheduling In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing ...
.


Platform dependent and independent optimizations

Code optimization can be also broadly categorized as
platform Platform may refer to: Technology * Computing platform, a framework on which applications may be run * Platform game, a genre of video games * Car platform, a set of components shared by several vehicle models * Weapons platform, a system or ...
-dependent and platform-independent techniques. While the latter ones are effective on most or all platforms, platform-dependent techniques use specific properties of one platform, or rely on parameters depending on the single platform or even on the single processor. Writing or producing different versions of the same code for different processors might therefore be needed. For instance, in the case of compile-level optimization, platform-independent techniques are generic techniques (such as
loop unrolling Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation ...
, reduction in function calls, memory efficient routines, reduction in conditions, etc.), that impact most CPU architectures in a similar way. A great example of platform-independent optimization has been shown with inner for loop, where it was observed that a loop with an inner for loop performs more computations per unit time than a loop without it or one with an inner while loop. Generally, these serve to reduce the total
instruction path length In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performa ...
required to complete the program and/or reduce total memory usage during the process. On the other hand, platform-dependent techniques involve instruction scheduling,
instruction-level parallelism Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. Disc ...
, data-level parallelism, cache optimization techniques (i.e., parameters that differ among various platforms) and the optimal instruction scheduling might be different even on different processors of the same architecture.


Strength reduction

Computational tasks can be performed in several different ways with varying efficiency. A more efficient version with equivalent functionality is known as a
strength reduction In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts "strong" multiplications inside a loo ...
. For example, consider the following C code snippet whose intention is to obtain the sum of all integers from 1 to : int i, sum = 0; for (i = 1; i <= N; ++i) printf("sum: %d\n", sum); This code can (assuming no
arithmetic overflow Arithmetic () is an elementary part of mathematics that consists of the study of the properties of the traditional operations on numbers—addition, subtraction, multiplication, division, exponentiation, and extraction of roots. In the 19th ce ...
) be rewritten using a mathematical formula like: int sum = N * (1 + N) / 2; printf("sum: %d\n", sum); The optimization, sometimes performed automatically by an optimizing compiler, is to select a method (
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
) that is more computationally efficient, while retaining the same functionality. See
algorithmic efficiency In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algo ...
for a discussion of some of these techniques. However, a significant improvement in performance can often be achieved by removing extraneous functionality. Optimization is not always an obvious or intuitive process. In the example above, the "optimized" version might actually be slower than the original version if were sufficiently small and the particular hardware happens to be much faster at performing addition and
loop Loop or LOOP may refer to: Brands and enterprises * Loop (mobile), a Bulgarian virtual network operator and co-founder of Loop Live * Loop, clothing, a company founded by Carlos Vasquez in the 1990s and worn by Digable Planets * Loop Mobile, an ...
ing operations than multiplication and division.


Trade-offs

In some cases, however, optimization relies on using more elaborate algorithms, making use of "special cases" and special "tricks" and performing complex trade-offs. A "fully optimized" program might be more difficult to comprehend and hence may contain more faults than unoptimized versions. Beyond eliminating obvious antipatterns, some code level optimizations decrease maintainability. Optimization will generally focus on improving just one or two aspects of performance: execution time, memory usage, disk space, bandwidth, power consumption or some other resource. This will usually require a trade-off where one factor is optimized at the expense of others. For example, increasing the size of
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
improves run time performance, but also increases the memory consumption. Other common trade-offs include code clarity and conciseness. There are instances where the programmer performing the optimization must decide to make the software better for some operations but at the cost of making other operations less efficient. These trade-offs may sometimes be of a non-technical nature such as when a competitor has published a
benchmark Benchmark may refer to: Business and economics * Benchmarking, evaluating performance within organizations * Benchmark price * Benchmark (crude oil), oil-specific practices Science and technology * Benchmark (surveying), a point of known elevati ...
result that must be beaten in order to improve commercial success but comes perhaps with the burden of making normal usage of the software less efficient. Such changes are sometimes jokingly referred to as ''pessimizations''.


Bottlenecks

Optimization may include finding a
bottleneck Bottleneck literally refers to the narrowed portion (neck) of a bottle near its opening, which limit the rate of outflow, and may describe any object of a similar shape. The literal neck of a bottle was originally used to play what is now known as ...
in a system a component that is the limiting factor on performance. In terms of code, this will often be a hot spot a critical part of the code that is the primary consumer of the needed resource though it can be another factor, such as I/O latency or network bandwidth. In computer science, resource consumption often follows a form of power law distribution, and the Pareto principle can be applied to resource optimization by observing that 80% of the resources are typically used by 20% of the operations. In software engineering, it is often a better approximation that 90% of the execution time of a computer program is spent executing 10% of the code (known as the 90/10 law in this context). More complex algorithms and data structures perform well with many items, while simple algorithms are more suitable for small amounts of data — the setup, initialization time, and constant factors of the more complex algorithm can outweigh the benefit, and thus a
hybrid algorithm {{Unreferenced, date=May 2014 A hybrid algorithm is an algorithm that combines two or more other algorithms that solve the same problem, and is mostly used in programming languages like C++, either choosing one (depending on the data), or switching ...
or
adaptive algorithm An adaptive algorithm is an algorithm that changes its behavior at the time it is run, based on information available and on ''a priori'' defined reward mechanism (or criterion). Such information could be the story of recently received data, inform ...
may be faster than any single algorithm. A performance profiler can be used to narrow down decisions about which functionality fits which conditions. In some cases, adding more
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
can help to make a program run faster. For example, a filtering program will commonly read each line and filter and output that line immediately. This only uses enough memory for one line, but performance is typically poor, due to the latency of each disk read. Caching the result is similarly effective, though also requiring larger memory use.


When to optimize

Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the
development stage A software release life cycle is the sum of the stages of development and maturity for a piece of computer software ranging from its initial development to its eventual release, and including updated versions of the released version to help impro ...
.
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
made the following two statements on optimization:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%"
:(He also attributed the quote to
Tony Hoare Sir Charles Antony Richard Hoare (Tony Hoare or C. A. R. Hoare) (born 11 January 1934) is a British computer scientist who has made foundational contributions to programming languages, algorithms, operating systems, formal verification, and ...
several years later, although this might have been an error as Hoare disclaims having coined the phrase.)
"In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"
"Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing. When deciding whether to optimize a specific part of the program,
Amdahl's Law In computer architecture, Amdahl's law (or Amdahl's argument) is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It states tha ...
should always be considered: the impact on the overall program depends very much on how much time is actually spent in that specific part, which is not always clear from looking at the code without a performance analysis. A better approach is therefore to design first, code from the design and then
profile Profile or profiles may refer to: Art, entertainment and media Music * ''Profile'' (Jan Akkerman album), 1973 * ''Profile'' (Githead album), 2005 * ''Profile'' (Pat Donohue album), 2005 * ''Profile'' (Duke Pearson album), 1959 * '' ''Profi ...
/
benchmark Benchmark may refer to: Business and economics * Benchmarking, evaluating performance within organizations * Benchmark price * Benchmark (crude oil), oil-specific practices Science and technology * Benchmark (surveying), a point of known elevati ...
the resulting code to see which parts should be optimized. A simple and elegant design is often easier to optimize at this stage, and profiling may reveal unexpected performance problems that would not have been addressed by premature optimization. In practice, it is often necessary to keep performance goals in mind when first designing software, but the programmer balances the goals of design and optimization. Modern compilers and operating systems are so efficient that the intended performance increases often fail to materialize. As an example, caching data at the application level that is again cached at the operating system level does not yield improvements in execution. Even so, it is a rare case when the programmer will remove failed optimizations from production code. It is also true that advances in hardware will more often than not obviate any potential improvements, yet the obscuring code will persist into the future long after its purpose has been negated.


Macros

Optimization during code development using macros takes on different forms in different languages. In some procedural languages, such as C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, macros are implemented using token substitution. Nowadays,
inline function In the C and C++ programming languages, an inline function is one qualified with the keyword inline; this serves two purposes: # It serves as a compiler directive that suggests (but does not require) that the compiler substitute the body of ...
s can be used as a
type safe In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
alternative in many cases. In both cases, the inlined function body can then undergo further compile-time optimizations by the compiler, including constant folding, which may move some computations to compile time. In many
functional programming In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that ...
languages macros are implemented using parse-time substitution of parse trees/abstract syntax trees, which it is claimed makes them safer to use. Since in many cases interpretation is used, that is one way to ensure that such computations are only performed at parse-time, and sometimes the only way. Lisp originated this style of macro, and such macros are often called "Lisp-like macros." A similar effect can be achieved by using
template metaprogramming Template metaprogramming (TMP) is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these t ...
in
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
. In both cases, work is moved to compile-time. The difference between C macros on one side, and Lisp-like macros and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
template metaprogramming Template metaprogramming (TMP) is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these t ...
on the other side, is that the latter tools allow performing arbitrary computations at compile-time/parse-time, while expansion of C macros does not perform any computation, and relies on the optimizer ability to perform it. Additionally, C macros do not directly support
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathemati ...
or
iteration Iteration is the repetition of a process in order to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration. ...
, so are not
Turing complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical co ...
. As with any optimization, however, it is often difficult to predict where such tools will have the most impact before a project is complete.


Automated and manual optimization

''See also :Compiler optimizations'' Optimization can be automated by compilers or performed by programmers. Gains are usually limited for local optimization, and larger for global optimizations. Usually, the most powerful optimization is to find a superior
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
. Optimizing a whole system is usually undertaken by programmers because it is too complex for automated optimizers. In this situation, programmers or system administrators explicitly change code so that the overall system performs better. Although it can produce better efficiency, it is far more expensive than automated optimizations. Since many parameters influence the program performance, the program optimization space is large. Meta-heuristics and machine learning are used to address the complexity of program optimization. Use a profiler (or
performance analyzer Performance Analyzer is a commercial utility software for software performance analysis for x86 or SPARC machines. It has both a graphical user interface and a command line interface. It is available for both Linux and Solaris operating systems. ...
) to find the sections of the program that are taking the most resources the ''bottleneck''. Programmers sometimes believe they have a clear idea of where the bottleneck is, but intuition is frequently wrong. Optimizing an unimportant piece of code will typically do little to help the overall performance. When the bottleneck is localized, optimization usually starts with a rethinking of the algorithm used in the program. More often than not, a particular algorithm can be specifically tailored to a particular problem, yielding better performance than a generic algorithm. For example, the task of sorting a huge list of items is usually done with a
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
routine, which is one of the most efficient generic algorithms. But if some characteristic of the items is exploitable (for example, they are already arranged in some particular order), a different method can be used, or even a custom-made sort routine. After the programmer is reasonably sure that the best algorithm is selected, code optimization can start. Loops can be unrolled (for lower loop overhead, although this can often lead to ''lower'' speed if it overloads the
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whic ...
), data types as small as possible can be used, integer arithmetic can be used instead of floating-point, and so on. (See
algorithmic efficiency In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algo ...
article for these and other techniques.) Performance bottlenecks can be due to language limitations rather than algorithms or data structures used in the program. Sometimes, a critical part of the program can be re-written in a different
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
that gives more direct access to the underlying machine. For example, it is common for very
high-level High-level and low-level, as technical terms, are used to classify, describe and point to specific goals of a systematic operation; and are applied in a wide range of contexts, such as, for instance, in domains as widely varied as computer scienc ...
languages like
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
to have modules written in C for greater speed. Programs already written in C can have modules written in
assembly Assembly may refer to: Organisations and meetings * Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions * General assembly, an official meeting of the members of an organization or of their representa ...
. Programs written in D can use the
inline assembler In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C ...
. Rewriting sections "pays off" in these circumstances because of a general " rule of thumb" known as the 90/10 law, which states that 90% of the time is spent in 10% of the code, and only 10% of the time in the remaining 90% of the code. So, putting intellectual effort into optimizing just a small part of the program can have a huge effect on the overall speed if the correct part(s) can be located. Manual optimization sometimes has the side effect of undermining readability. Thus code optimizations should be carefully documented (preferably using in-line comments), and their effect on future development evaluated. The program that performs an automated optimization is called an optimizer. Most optimizers are embedded in compilers and operate during compilation. Optimizers can often tailor the generated code to specific processors. Today, automated optimizations are almost exclusively limited to compiler optimization. However, because compiler optimizations are usually limited to a fixed set of rather general optimizations, there is considerable demand for optimizers which can accept descriptions of problem and language-specific optimizations, allowing an engineer to specify custom optimizations. Tools that accept descriptions of optimizations are called
program transformation A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and ...
systems and are beginning to be applied to real software systems such as C++. Some high-level languages (
Eiffel Eiffel may refer to: Places * Eiffel Peak, a summit in Alberta, Canada * Champ de Mars – Tour Eiffel station, Paris, France; a transit station Structures * Eiffel Tower, in Paris, France, designed by Gustave Eiffel * Eiffel Bridge, Ungheni, M ...
,
Esterel Esterel is a synchronous programming language for the development of complex reactive systems. The imperative programming style of Esterel allows the simple expression of parallelism and preemption. As a consequence, it is well suited for cont ...
) optimize their programs by using an
intermediate language An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A " ...
.
Grid computing Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from co ...
or
distributed computing A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sci ...
aims to optimize the whole system, by moving tasks from computers with high usage to computers with idle time.


Time taken for optimization

Sometimes, the time taken to undertake optimization therein itself may be an issue. Optimizing existing code usually does not add new features, and worse, it might add new bugs in previously working code (as any change might). Because manually optimized code might sometimes have less "readability" than unoptimized code, optimization might impact maintainability of it as well. Optimization comes at a price and it is important to be sure that the investment is worthwhile. An automatic optimizer (or
optimizing compiler In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power cons ...
, a program that performs code optimization) may itself have to be optimized, either to further improve the efficiency of its target programs or else speed up its own operation. A compilation performed with optimization "turned on" usually takes longer, although this is usually only a problem when programs are quite large. In particular, for
just-in-time compiler In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may cons ...
s the performance of the
run time Run(s) or RUN may refer to: Places * Run (island), one of the Banda Islands in Indonesia * Run (stream), a stream in the Dutch province of North Brabant People * Run (rapper), Joseph Simmons, now known as "Reverend Run", from the hip-hop group ...
compile component, executing together with its target code, is the key to improving overall execution speed.


References


Further reading

* Jon Bentley: ''Writing Efficient Programs'', . *
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
: ''
The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of com ...
''
How To Write Fast Numerical Code: A Small Introduction

"What Every Programmer Should Know About Memory"
by Ulrich Drepper explains the structure of modern memory subsystems and suggests how to utilize them efficiently
"Linux Multicore Performance Analysis and Optimization in a Nutshell"
presentation slides by Philip Mucci

by Paul Hsieh

by Jon Bentley
"Performance Anti-Patterns"
by Bart Smaalders {{DEFAULTSORT:Program Optimization * Programming language topics Articles with example C code *