Quick Select
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, quickselect is a
selection algorithm In computer science, a selection algorithm is an algorithm for finding the ''k''th smallest number in a list or array; such a number is called the ''k''th ''order statistic''. This includes the cases of finding the minimum, maximum, and median el ...
to find the ''k''th smallest element in an unordered list. It is also known as the kth order statistics . It is related to the
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
sorting algorithm. Like quicksort, it was developed by
Tony Hoare Sir Charles Antony Richard Hoare (Tony Hoare or C. A. R. Hoare) (born 11 January 1934) is a British computer scientist who has made foundational contributions to programming languages, algorithms, operating systems, formal verification, and c ...
, and thus is also known as Hoare's selection algorithm. Like quicksort, it is efficient in practice and has good average-case performance, but has poor worst-case performance. Quickselect and its variants are the selection algorithms most often used in efficient real-world implementations. Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, accordingly as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side – the side with the element it is searching for. This reduces the average complexity from O(n\log n) to O(n), with a worst case of O(n^2). As with quicksort, quickselect is generally implemented as an
in-place algorithm In computer science, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure. However, a small amount of extra storage space is allowed for auxiliary variables. The input is usually overwritten by the output ...
, and beyond selecting the th element, it also partially sorts the data. See
selection algorithm In computer science, a selection algorithm is an algorithm for finding the ''k''th smallest number in a list or array; such a number is called the ''k''th ''order statistic''. This includes the cases of finding the minimum, maximum, and median el ...
for further discussion of the connection with sorting.


Algorithm

In quicksort, there is a subprocedure called partition that can, in linear time, group a list (ranging from indices left to right) into two parts: those less than a certain element, and those greater than or equal to the element. Here is pseudocode that performs a partition about the element list ivotIndex/code>: function partition(list, left, right, pivotIndex) is pivotValue := list ivotIndex swap list ivotIndexand list ight ''// Move pivot to end'' storeIndex := left for i from left to right − 1 do if list < pivotValue then swap list toreIndexand list increment storeIndex swap list ightand list toreIndex ''// Move pivot to its final place'' return storeIndex This is known as the Lomuto partition scheme, which is simpler but less efficient than Hoare's original partition scheme. In quicksort, we recursively sort both branches, leading to best-case O(n\log n) time. However, when doing selection, we already know which partition our desired element lies in, since the pivot is in its final sorted position, with all those preceding it in an unsorted order and all those following it in an unsorted order. Therefore, a single recursive call locates the desired element in the correct partition, and we build upon this for quickselect: ''// Returns the k-th smallest element of list within left..right inclusive'' ''// (i.e. left <= k <= right).'' function select(list, left, right, k) is if left = right then ''// If the list contains only one element,'' return list
eft A newt is a salamander in the subfamily Pleurodelinae. The terrestrial juvenile phase is called an eft. Unlike other members of the family Salamandridae, newts are semiaquatic, alternating between aquatic and terrestrial habitats. Not all aqu ...
''// return that element'' pivotIndex := ... ''// select a pivotIndex between left and right,'' ''// e.g.,'' left + floor(rand() % (right − left + 1)) pivotIndex := partition(list, left, right, pivotIndex) ''// The pivot is in its final sorted position'' if k = pivotIndex then return list else if k < pivotIndex then return select(list, left, pivotIndex − 1, k) else return select(list, pivotIndex + 1, right, k) ----Note the resemblance to quicksort: just as the minimum-based selection algorithm is a partial selection sort, this is a partial quicksort, generating and partitioning only O(\log n) of its O(n) partitions. This simple procedure has expected linear performance, and, like quicksort, has quite good performance in practice. It is also an
in-place algorithm In computer science, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure. However, a small amount of extra storage space is allowed for auxiliary variables. The input is usually overwritten by the output ...
, requiring only constant memory overhead if
tail call In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recur ...
optimization is available, or if eliminating the
tail recursion In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recur ...
with a loop: function select(list, left, right, k) is loop if left = right then return list
eft A newt is a salamander in the subfamily Pleurodelinae. The terrestrial juvenile phase is called an eft. Unlike other members of the family Salamandridae, newts are semiaquatic, alternating between aquatic and terrestrial habitats. Not all aqu ...
pivotIndex := ... ''// select pivotIndex between left and right'' pivotIndex := partition(list, left, right, pivotIndex) if k = pivotIndex then return list else if k < pivotIndex then right := pivotIndex − 1 else left := pivotIndex + 1


Time complexity

Like quicksort, quickselect has good average performance, but is sensitive to the pivot that is chosen. If good pivots are chosen, meaning ones that consistently decrease the search set by a given fraction, then the search set decreases in size exponentially and by induction (or summing the
geometric series In mathematics, a geometric series is the sum of an infinite number of terms that have a constant ratio between successive terms. For example, the series :\frac \,+\, \frac \,+\, \frac \,+\, \frac \,+\, \cdots is geometric, because each succ ...
) one sees that performance is linear, as each step is linear and the overall time is a constant times this (depending on how quickly the search set reduces). However, if bad pivots are consistently chosen, such as decreasing by only a single element each time, then worst-case performance is quadratic: O(n^2). This occurs for example in searching for the maximum element of a set, using the first element as the pivot, and having sorted data. The probability of the worst-case occurring decreases exponentially with n, so quickselect has
almost certain In probability theory, an event (probability theory), event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (or Lebesgue measure 1). In other words, the set of possible exceptions may be non-empty ...
O(n) time complexity.


Variants

The easiest solution is to choose a random pivot, which yields
almost certain In probability theory, an event (probability theory), event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (or Lebesgue measure 1). In other words, the set of possible exceptions may be non-empty ...
linear time. Deterministically, one can use median-of-3 pivot strategy (as in the quicksort), which yields linear performance on partially sorted data, as is common in the real world. However, contrived sequences can still cause worst-case complexity;
David Musser David "Dave" Musser is a professor emeritus of computer science at the Rensselaer Polytechnic Institute in Troy, New York, United States. He is known for his work in generic programming, particularly as applied to C++, and his collaboration with Al ...
describes a "median-of-3 killer" sequence that allows an attack against that strategy, which was one motivation for his
introselect In computer science, introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance. Introselect is related t ...
algorithm. One can assure linear performance even in the worst case by using a more sophisticated pivot strategy; this is done in the
median of medians In computer science, the median of medians is an approximate (median) selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, mainly the quickselect, that selects the ''k''th smallest element of an initially u ...
algorithm. However, the overhead of computing the pivot is high, and thus this is generally not used in practice. One can combine basic quickselect with median of medians as fallback to get both fast average case performance and linear worst-case performance; this is done in
introselect In computer science, introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance. Introselect is related t ...
. Finer computations of the average time complexity yield a worst case of n(2+2\log 2+o(1)) \leq 3.4n + o(n) for random pivots (in the case of the median; other ''k'' are faster).Blum-style analysis of Quickselect
David Eppstein David Arthur Eppstein (born 1963) is an American computer scientist and mathematician. He is a Distinguished Professor of computer science at the University of California, Irvine. He is known for his work in computational geometry, graph algorit ...
, October 9, 2007. The constant can be improved to 3/2 by a more complicated pivot strategy, yielding the
Floyd–Rivest algorithm In computer science, the Floyd-Rivest algorithm is a selection algorithm developed by Robert W. Floyd and Ronald L. Rivest that has an optimal expected number of comparisons within lower-order terms. It is functionally equivalent to quickselect, b ...
, which has average complexity of 1.5 n + O(n^) for median, with other ''k'' being faster.


See also

*
Floyd–Rivest algorithm In computer science, the Floyd-Rivest algorithm is a selection algorithm developed by Robert W. Floyd and Ronald L. Rivest that has an optimal expected number of comparisons within lower-order terms. It is functionally equivalent to quickselect, b ...
*
Introselect In computer science, introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance. Introselect is related t ...
*
Median of medians In computer science, the median of medians is an approximate (median) selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, mainly the quickselect, that selects the ''k''th smallest element of an initially u ...


References

{{reflist


External links

*
qselect
, ''Quickselect algorithm in Matlab,'' Manolis Lourakis Selection algorithms