Interpolation search is an
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
for
searching
Searching or search may refer to:
Computing technology
* Search algorithm, including keyword search
** :Search algorithms
* Search and optimization for problem solving in artificial intelligence
* Search engine technology, software for findi ...
for a key in an array that has been
ordered by numerical values assigned to the keys (''key values''). It was first described by W. W. Peterson in 1957. Interpolation search resembles the method by which people search a
telephone directory
A telephone directory, commonly called a telephone book, telephone address book, phonebook, or the white and yellow pages, is a listing of telephone subscribers in a geographical area or subscribers to services provided by the organization tha ...
for a name (the key value by which the book's entries are ordered): in each step the algorithm calculates where in the remaining
search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a
linear interpolation
In mathematics, linear interpolation is a method of curve fitting using linear polynomials to construct new data points within the range of a discrete set of known data points.
Linear interpolation between two known points
If the two known poin ...
. The key value actually found at this estimated position is then compared to the key value being sought. If it is not equal, then depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. This method will only work if calculations on the size of differences between key values are sensible.
By comparison,
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
always chooses the middle of the remaining search space, discarding one half or the other, depending on the comparison between the key found at the estimated position and the key sought — it does not require numerical values for the keys, just a total order on them. The remaining search space is reduced to the part before or after the estimated position. The
linear search
In computer science, a linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
A linear search runs in at ...
uses equality only as it compares elements one-by-one from the start, ignoring any sorting.
On average the interpolation search makes about log(log(''n'')) comparisons (if the elements are uniformly distributed), where ''n'' is the number of elements to be searched. In the worst case (for instance where the numerical values of the keys increase exponentially) it can make up to
O(''n'') comparisons.
In interpolation-sequential search, interpolation is used to find an item near the one being searched for, then
linear search
In computer science, a linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
A linear search runs in at ...
is used to find the exact item.
Performance
Using
big-O notation
Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Land ...
, the performance of the interpolation algorithm on a data set of size ''n'' is ''O''(''n''); however under the assumption of a uniform distribution of the data on the linear scale used for interpolation, the performance can be shown to be ''O''(log log ''n''). However, Dynamic Interpolation Search is possible in ''o''(log log ''n'') time using a novel data structure.
[Andersson, Arne, and Christer Mattsson. ‘Dynamic Interpolation Search in ''o''(log log ''n'') Time’. In Automata, Languages and Programming, edited by Andrzej Lingas, Rolf Karlsson, and Svante Carlsson, 700:15–27. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 1993. ]
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. It can be useful for locating a record in a large sorted file on disk, where each probe involves a disk seek and is much slower than the interpolation arithmetic.
Index structures like
B-tree
In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for n ...
s also reduce the number of disk accesses, and are more often used to index on-disk data in part because they can index many types of data and can be updated
online
In computer technology and telecommunications, online indicates a state of connectivity and offline indicates a disconnected state. In modern terminology, this usually refers to an Internet connection, but (especially when expressed "on line" or ...
. Still, interpolation search may be useful when one is forced to search certain sorted but unindexed on-disk datasets.
Adaptation to different datasets
When sort keys for a dataset are uniformly distributed numbers, linear interpolation is straightforward to implement and will find an index very near the sought value.
On the other hand, for a phone book sorted by name, the straightforward approach to interpolation search does not apply. The same high-level principles can still apply, though: one can estimate a name's position in the phone book using the relative frequencies of letters in names and use that as a probe location.
Some interpolation search implementations may not work as expected when a run of equal key values exists. The simplest implementation of interpolation search won't necessarily select the first (or last) element of such a run.
Book-based searching
The conversion of names in a telephone book to some sort of number clearly will not provide numbers having a uniform distribution (except via immense effort such as sorting the names and calling them name #1, name #2, etc.) and further, it is well known that some names are much more common than others (Smith, Jones,) Similarly with dictionaries, where there are many more words starting with some letters than others. Some publishers go to the effort of preparing marginal annotations or even cutting into the side of the pages to show markers for each letter so that at a glance a segmented interpolation can be performed.
Sample implementation
The following
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
code example is a simple implementation. At each stage it computes a probe position then as with the binary search, moves either the upper or lower bound in to define a smaller interval containing the sought value. Unlike the binary search which guarantees a halving of the interval's size with each stage, a misled interpolation may reduce/i-case efficiency of O(''n'').
/*
T must implement the operators -, !=, , >=, <= and <
such that >=, <=, !=, and < define a total order on T and
such that
(tm - tl) * k / (th - tl)
is an int between 0 and k (inclusive) for any tl, tm, th in T with tl <= tm <= th, tl != th.
arr must be sorted according to this ordering.
\returns An index i such that arr key or -1 if there is no i that satisfies this.
*/
template
int interpolation_search(T arr[], int size, T key)
Notice that having probed the list at index ''mid'', for reasons of loop control administration, this code sets either ''high'' or ''low'' to be not ''mid'' but an adjacent index, which location is then probed during the next iteration. Since an adjacent entry's value will not be much different, the interpolation calculation is not much improved by this one step adjustment, at the cost of an additional reference to distant memory such as disk.
Each iteration of the above code requires between five and six comparisons (the extra is due to the repetitions needed to distinguish the three states of < > and = via binary comparisons in the absence of a
three-way comparison
In computer science, a three-way comparison takes two values A and B belonging to a type with a total order and determines whether A < B, A = B, or A > B in a single operation, in accordance with the mathematical law of trichotomy.
Machine- ...
) plus some messy arithmetic, while the
binary search algorithm
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
can be written with one comparison per iteration and uses only trivial integer arithmetic. It would thereby search an array of a million elements with no more than twenty comparisons (involving accesses to slow memory where the array elements are stored); to beat that, the interpolation search, as written above, would be allowed no more than three iterations.
See also
*
Linear search
In computer science, a linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
A linear search runs in at ...
*
Binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
*
Exponential search
In computer science, an exponential search (also called doubling search or galloping search or Struzik search) is an algorithm, created by Jon Bentley and Andrew Chi-Chih Yao in 1976, for searching sorted, unbounded/infinite lists. There are num ...
*
Ternary search A ternary search algorithm is a technique in computer science for finding the minimum or maximum of a unimodal function. A ternary search determines either that the minimum or maximum cannot be in the first third of the domain or that it cannot be ...
*
Hash table
In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
*
Newton's method
In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valu ...
*
Flashsort
Flashsort is a distribution sorting algorithm showing linear computational complexity for uniformly distributed data sets and relatively little additional memory requirement. The original work was published in 1998 by Karl-Dietrich Neubert.
Co ...
, using the distribution of values for
sorting
Sorting refers to ordering data in an increasing or decreasing manner according to some linear relationship among the data items.
# ordering: arranging items in a sequence ordered by some criterion;
# categorizing: grouping items with similar pro ...
rather than searching
References
External links
Interpolation searchInterpolation Search - A Log LogN Search
{{DEFAULTSORT:Interpolation Search
Search algorithms