Binary search vs Linear search example svg

computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, the analysis of algorithms is the process of finding the

computational complexity In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations ...

algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...

s—the amount of time, storage, or other resources needed to execute them. Usually, this involves determining a function that relates the size of an algorithm's input to the number of steps it takes (its

time complexity In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...

) or the number of storage locations it uses (its space complexity). An algorithm is said to be efficient when this function's values are small, or grow slowly compared to a growth in the size of the input. Different inputs of the same size may cause the algorithm to have different behavior, so best, worst and average case descriptions might all be of practical interest. When not otherwise specified, the function describing the performance of an algorithm is usually an

upper bound In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is every element of . Dually, a lower bound or minorant of is defined to be an element of that is less ...

, determined from the worst case inputs to the algorithm. The term "analysis of algorithms" was coined by

Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...

. Algorithm analysis is an important part of a broader

computational complexity theory In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and explores the relationships between these classifications. A computational problem ...

, which provides theoretical estimates for the resources needed by any algorithm which solves a given

computational problem In theoretical computer science, a computational problem is one that asks for a solution in terms of an algorithm. For example, the problem of factoring :"Given a positive integer ''n'', find a nontrivial prime factor of ''n''." is a computati ...

. These estimates provide an insight into reasonable directions of search for efficient algorithms. In theoretical analysis of algorithms it is common to estimate their complexity in the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input.

Big O notation Big ''O'' notation is a mathematical notation that describes the asymptotic analysis, limiting behavior of a function (mathematics), function when the Argument of a function, argument tends towards a particular value or infinity. Big O is a memb ...

, Big-omega notation and Big-theta notation are used to this end. For instance,

binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...

is said to run in a number of steps proportional to the logarithm of the size of the sorted list being searched, or in , colloquially "in logarithmic time". Usually

asymptotic In analytic geometry, an asymptote () of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the ''x'' or ''y'' coordinates Limit of a function#Limits at infinity, tends to infinity. In pro ...

estimates are used because different

implementation Implementation is the realization of an application, execution of a plan, idea, scientific modelling, model, design, specification, Standardization, standard, algorithm, policy, or the Management, administration or management of a process or Goal ...

s of the same algorithm may differ in efficiency. However the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a ''hidden constant''. Exact (not asymptotic) measures of efficiency can sometimes be computed but they usually require certain assumptions concerning the particular implementation of the algorithm, called a

model of computation In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how ...

. A model of computation may be defined in terms of an abstract computer, e.g.

Turing machine A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algori ...

, and/or by postulating that certain operations are executed in unit time. For example, if the sorted list to which we apply binary search has elements, and we can guarantee that each lookup of an element in the list can be done in unit time, then at most time units are needed to return an answer.

Cost models

Time efficiency estimates depend on what we define to be a step. For the analysis to correspond usefully to the actual run-time, the time required to perform a step must be guaranteed to be bounded above by a constant. One must be careful here; for instance, some analyses count an addition of two numbers as one step. This assumption may not be warranted in certain contexts. For example, if the numbers involved in a computation may be arbitrarily large, the time required by a single addition can no longer be assumed to be constant. Two cost models are generally used:, section 1.3 * the uniform cost model, also called unit-cost model (and similar variations), assigns a constant cost to every machine operation, regardless of the size of the numbers involved * the logarithmic cost model, also called logarithmic-cost measurement (and similar variations), assigns a cost to every machine operation proportional to the number of bits involved The latter is more cumbersome to use, so it is only employed when necessary, for example in the analysis of

arbitrary-precision arithmetic In computer science, arbitrary-precision arithmetic, also called bignum arithmetic, multiple-precision arithmetic, or sometimes infinite-precision arithmetic, indicates that calculations are performed on numbers whose digits of precision are po ...

algorithms, like those used in

cryptography Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or ''-logy, -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of Adversary (cryptography), ...

. A key point which is often overlooked is that published lower bounds for problems are often given for a model of computation that is more restricted than the set of operations that you could use in practice and therefore there are algorithms that are faster than what would naively be thought possible.

Run-time analysis

Run-time analysis is a theoretical classification that estimates and anticipates the increase in '' running time'' (or run-time or execution time) of an

as its '' input size'' (usually denoted as ) increases. Run-time efficiency is a topic of great interest in

: A program can take seconds, hours, or even years to finish executing, depending on which algorithm it implements. While software profiling techniques can be used to measure an algorithm's run-time in practice, they cannot provide timing data for all infinitely many possible inputs; the latter can only be achieved by the theoretical methods of run-time analysis.

Shortcomings of empirical metrics

Since algorithms are platform-independent (i.e. a given algorithm can be implemented in an arbitrary

programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...

on an arbitrary

computer A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...

running an arbitrary

operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...

), there are additional significant drawbacks to using an

empirical Empirical evidence is evidence obtained through sense experience or experimental procedure. It is of central importance to the sciences and plays a role in various other fields, like epistemology and law. There is no general agreement on how t ...

approach to gauge the comparative performance of a given set of algorithms. Take as an example a program that looks up a specific entry in a sorted

list A list is a Set (mathematics), set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of t ...

of size ''n''. Suppose this program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm, and on Computer B, a much slower machine, using a

binary search algorithm In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the ...

. Benchmark testing on the two computers running their respective programs might look something like the following: Based on these metrics, it would be easy to jump to the conclusion that ''Computer A'' is running an algorithm that is far superior in efficiency to that of ''Computer B''. However, if the size of the input-list is increased to a sufficient number, that conclusion is dramatically demonstrated to be in error: Computer A, running the linear search program, exhibits a

linear In mathematics, the term ''linear'' is used in two distinct senses for two different properties: * linearity of a '' function'' (or '' mapping''); * linearity of a '' polynomial''. An example of a linear function is the function defined by f(x) ...

growth rate. The program's run-time is directly proportional to its input size. Doubling the input size doubles the run-time, quadrupling the input size quadruples the run-time, and so forth. On the other hand, Computer B, running the binary search program, exhibits a

logarithm In mathematics, the logarithm of a number is the exponent by which another fixed value, the base, must be raised to produce that number. For example, the logarithm of to base is , because is to the rd power: . More generally, if , the ...

ic growth rate. Quadrupling the input size only increases the run-time by a constant amount (in this example, 50,000 ns). Even though Computer A is ostensibly a faster machine, Computer B will inevitably surpass Computer A in run-time because it is running an algorithm with a much slower growth rate.

Orders of growth

Informally, an algorithm can be said to exhibit a growth rate on the order of a

mathematical function In mathematics, a function from a set (mathematics), set to a set assigns to each element of exactly one element of .; the words ''map'', ''mapping'', ''transformation'', ''correspondence'', and ''operator'' are sometimes used synonymously. ...

if beyond a certain input size , the function times a positive constant provides an upper bound or limit for the run-time of that algorithm. In other words, for a given input size greater than some ₀ and a constant , the run-time of that algorithm will never be larger than . This concept is frequently expressed using Big O notation. For example, since the run-time of insertion sort grows quadratically as its input size increases, insertion sort can be said to be of order . Big O notation is a convenient way to express the worst-case scenario for a given algorithm, although it can also be used to express the average-case — for example, the worst-case scenario for

quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...

is , but the average-case run-time is .

Empirical orders of growth

Assuming the run-time follows power rule, , the coefficient can be found by taking empirical measurements of run-time at some problem-size points , and calculating so that . In other words, this measures the slope of the empirical line on the

log–log plot In science and engineering, a log–log graph or log–log plot is a two-dimensional graph of numerical data that uses logarithmic scales on both the horizontal and vertical axes. Exponentiation#Power_functions, Power functions – relationshi ...

of run-time vs. input size, at some size point. If the order of growth indeed follows the power rule (and so the line on the log–log plot is indeed a straight line), the empirical value of will stay constant at different ranges, and if not, it will change (and the line is a curved line)—but still could serve for comparison of any two given algorithms as to their ''empirical local orders of growth'' behaviour. Applied to the above table: It is clearly seen that the first algorithm exhibits a linear order of growth indeed following the power rule. The empirical values for the second one are diminishing rapidly, suggesting it follows another rule of growth and in any case has much lower local orders of growth (and improving further still), empirically, than the first one.

Evaluating run-time complexity

The run-time complexity for the worst-case scenario of a given algorithm can sometimes be evaluated by examining the structure of the algorithm and making some simplifying assumptions. Consider the following pseudocode: 1 ''get a positive integer n from input'' 2 if n > 10 3 print "This might take a while..." 4 for i = 1 to n 5 for j = 1 to i 6 print i * j 7 print "Done!" A given computer will take a discrete amount of time to execute each of the instructions involved with carrying out this algorithm. Say that the actions carried out in step 1 are considered to consume time at most ''T''₁, step 2 uses time at most ''T''₂, and so forth. In the algorithm above, steps 1, 2 and 7 will only be run once. For a worst-case evaluation, it should be assumed that step 3 will be run as well. Thus the total amount of time to run steps 1–3 and step 7 is: :

T_1 + T_2 + T_3 + T_7. \,

The loops in steps 4, 5 and 6 are trickier to evaluate. The outer loop test in step 4 will execute ( ''n'' + 1 ) times, which will consume ''T''₄( ''n'' + 1 ) time. The inner loop, on the other hand, is governed by the value of j, which iterates from 1 to ''i''. On the first pass through the outer loop, j iterates from 1 to 1: The inner loop makes one pass, so running the inner loop body (step 6) consumes ''T''₆ time, and the inner loop test (step 5) consumes 2''T''₅ time. During the next pass through the outer loop, j iterates from 1 to 2: the inner loop makes two passes, so running the inner loop body (step 6) consumes 2''T''₆ time, and the inner loop test (step 5) consumes 3''T''₅ time. Altogether, the total time required to run the inner loop ''body'' can be expressed as an

arithmetic progression An arithmetic progression or arithmetic sequence is a sequence of numbers such that the difference from any succeeding term to its preceding term remains constant throughout the sequence. The constant difference is called common difference of that ...

: :

T_6 + 2T_6 + 3T_6 + \cdots + (n-1) T_6 + n T_6

which can be factored as :

\left 1 + 2 + 3 + \cdots + (n-1) + n \right T_6 = \left \frac (n^2 + n) \right T_6

The total time required to run the inner loop ''test'' can be evaluated similarly: :

\begin
& 2T_5 + 3T_5 + 4T_5 + \cdots + (n-1) T_5 + n T_5 + (n + 1) T_5\\
=  &T_5 + 2T_5 + 3T_5 + 4T_5 + \cdots + (n-1)T_5 + nT_5 + (n+1)T_5 - T_5
\end

which can be factored as :

T_5 \end

Therefore, the total run-time for this algorithm is: :

f(n) = T_1 + T_2 + T_3 + T_7 + (n + 1)T_4 + \left \frac (n^2 + n) \right T_6 + \left \frac (n^2+3n) \right T_5

which reduces to :

f(n) = \left \frac (n^2 + n) \right T_6 + \left \frac (n^2 + 3n) \right T_5 + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7

As a rule-of-thumb, one can assume that the highest-order term in any given function dominates its rate of growth and thus defines its run-time order. In this example, n² is the highest-order term, so one can conclude that . Formally this can be proven as follows: A more elegant approach to analyzing this algorithm would be to declare that 'T''₁..''T''₇are all equal to one unit of time, in a system of units chosen so that one unit is greater than or equal to the actual times for these steps. This would mean that the algorithm's run-time breaks down as follows:This approach, unlike the above approach, neglects the constant time consumed by the loop tests which terminate their respective loops, but it is trivial to prove that such omission does not affect the final result

Growth rate analysis of other resources

The methodology of run-time analysis can also be utilized for predicting other growth rates, such as consumption of memory space. As an example, consider the following pseudocode which manages and reallocates memory usage by a program based on the size of a file which that program manages: while ''file is still open:'' let n = ''size of file'' for ''every 100,000

kilobyte The kilobyte is a multiple of the unit byte for Computer data storage, digital information. The International System of Units (SI) defines the prefix ''kilo-, kilo'' as a multiplication factor of 1000 (103); therefore, one kilobyte is 1000&nbs ...

s of increase in file size'' ''double the amount of memory reserved'' In this instance, as the file size n increases, memory will be consumed at an

exponential growth Exponential growth occurs when a quantity grows as an exponential function of time. The quantity grows at a rate directly proportional to its present size. For example, when it is 3 times as big as it is now, it will be growing 3 times as fast ...

rate, which is order . This is an extremely rapid and most likely unmanageable growth rate for consumption of memory

resources ''Resource'' refers to all the materials available in our environment which are Technology, technologically accessible, Economics, economically feasible and Culture, culturally Sustainability, sustainable and help us to satisfy our needs and want ...

Relevance

Algorithm analysis is important in practice because the accidental or unintentional use of an inefficient algorithm can significantly impact system performance. In time-sensitive applications, an algorithm taking too long to run can render its results outdated or useless. An inefficient algorithm can also end up requiring an uneconomical amount of computing power or storage in order to run, again rendering it practically useless.

Constant factors

Analysis of algorithms typically focuses on the asymptotic performance, particularly at the elementary level, but in practical applications constant factors are important, and real-world data is in practice always limited in size. The limit is typically the size of addressable memory, so on 32-bit machines 2³² = 4 GiB (greater if segmented memory is used) and on 64-bit machines 2⁶⁴ = 16 EiB. Thus given a limited size, an order of growth (time or space) can be replaced by a constant factor, and in this sense all practical algorithms are for a large enough constant, or for small enough data. This interpretation is primarily useful for functions that grow extremely slowly: (binary) iterated logarithm (log^*) is less than 5 for all practical data (2⁶⁵⁵³⁶ bits); (binary) log-log (log log ''n'') is less than 6 for virtually all practical data (2⁶⁴ bits); and binary log (log ''n'') is less than 64 for virtually all practical data (2⁶⁴ bits). An algorithm with non-constant complexity may nonetheless be more efficient than an algorithm with constant complexity on practical data if the overhead of the constant time algorithm results in a larger constant factor, e.g., one may have

K > k \log \log n

so long as

K/k > 6

and

n < 2^ = 2^

. For large data linear or quadratic factors cannot be ignored, but for small data an asymptotically inefficient algorithm may be more efficient. This is particularly used in hybrid algorithms, like Timsort, which use an asymptotically efficient algorithm (here

merge sort In computer science, merge sort (also commonly spelled as mergesort and as ) is an efficient, general-purpose, and comparison sort, comparison-based sorting algorithm. Most implementations of merge sort are Sorting algorithm#Stability, stable, wh ...

, with time complexity

n \log n

), but switch to an asymptotically inefficient algorithm (here insertion sort, with time complexity

n^2

) for small data, as the simpler algorithm is faster on small data.

Notes

References

* * * * * *

External links

* {{DEFAULTSORT:Analysis Of Algorithms Computational complexity theory