HOME

TheInfoList



OR:

Introsort or introspective sort is a
hybrid Hybrid may refer to: Science * Hybrid (biology), an offspring resulting from cross-breeding ** Hybrid grape, grape varieties produced by cross-breeding two ''Vitis'' species ** Hybridity, the property of a hybrid plant which is a union of two dif ...
sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a List (computing), list into an Total order, order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. ...
that provides both fast average performance and (asymptotically) optimal worst-case performance. It begins with
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
, it switches to
heapsort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
when the recursion depth exceeds a level based on (the
logarithm In mathematics, the logarithm is the inverse function to exponentiation. That means the logarithm of a number  to the base  is the exponent to which must be raised, to produce . For example, since , the ''logarithm base'' 10 o ...
of) the number of elements being sorted and it switches to
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howev ...
when the number of elements is below some threshold. This combines the good parts of the three algorithms, with practical performance comparable to quicksort on typical data sets and worst-case O(''n'' log ''n'') runtime due to the heap sort. Since the three algorithms it uses are
comparison sort A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
s, it is also a comparison sort. Introsort was invented by
David Musser David "Dave" Musser is a professor emeritus of computer science at the Rensselaer Polytechnic Institute in Troy, New York, United States. He is known for his work in generic programming, particularly as applied to C++, and his collaboration with Al ...
in , in which he also introduced
introselect In computer science, introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance. Introselect is related ...
, a hybrid
selection algorithm In computer science, a selection algorithm is an algorithm for finding the ''k''th smallest number in a list or array; such a number is called the ''k''th ''order statistic''. This includes the cases of finding the minimum, maximum, and median e ...
based on
quickselect In computer science, quickselect is a selection algorithm to find the ''k''th smallest element in an unordered list. It is also known as the kth order statistics . It is related to the quicksort sorting algorithm. Like quicksort, it was devel ...
(a variant of quicksort), which falls back to
median of medians In computer science, the median of medians is an approximate (median) selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, mainly the quickselect, that selects the ''k''th smallest element of an initially u ...
and thus provides worst-case linear complexity, which is optimal. Both algorithms were introduced with the purpose of providing generic algorithms for the
C++ Standard Library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard.ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it wa ...
which had both fast average performance and optimal worst-case performance, thus allowing the performance requirements to be tightened. Introsort is in place and not
stable A stable is a building in which livestock, especially horses, are kept. It most commonly means a building that is divided into separate stalls for individual animals and livestock. There are many different types of stables in use today; the ...
.


Pseudocode

If a heapsort implementation and partitioning functions of the type discussed in the
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
article are available, the introsort can be described succinctly as procedure sort(A : array): maxdepth ← ⌊log2(length(A))⌋ × 2 introsort(A, maxdepth) procedure introsort(A, maxdepth): n ← length(A) if n < 16: insertionsort(A) else if maxdepth = 0: heapsort(A) else: p ← partition(A) ''// assume this function does pivot selection, p is the final position of the pivot'' introsort(A :p-1 maxdepth - 1) introsort(A +1:n maxdepth - 1) The factor 2 in the maximum depth is arbitrary; it can be tuned for practical performance. denotes the array slice of items to including both and . The indices are assumed to start with 1 (the first element of the array is ).


Analysis

In quicksort, one of the critical operations is choosing the pivot: the element around which the list is partitioned. The simplest pivot selection algorithm is to take the first or the last element of the list as the pivot, causing poor behavior for the case of sorted or nearly sorted input.
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal (programming language), Pascal, and pioneered several classic topics in software engineering. In 1984, he w ...
's variant uses the middle element to prevent these occurrences, degenerating to O(''n''2) for contrived sequences. The median-of-3 pivot selection algorithm takes the median of the first, middle, and last elements of the list; however, even though this performs well on many real-world inputs, it is still possible to contrive a ''median-of-3 killer'' list that will cause dramatic slowdown of a quicksort based on this pivot selection technique. Musser reported that on a median-of-3 killer sequence of 100,000 elements, introsort's running time was 1/200 that of median-of-3 quicksort. Musser also considered the effect on caches of Sedgewick's delayed small sorting, where small ranges are sorted at the end in a single pass of
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howev ...
. He reported that it could double the number of cache misses, but that its performance with
double-ended queue In computer science, a double-ended queue (abbreviated to deque, pronounced ''deck'', like "cheque") is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). I ...
s was significantly better and should be retained for template libraries, in part because the gain in other cases from doing the sorts immediately was not great.


Implementations

Introsort or some variant is used in a number of
standard library In computer programming, a standard library is the library made available across implementations of a programming language. These libraries are conventionally described in programming language specifications; however, contents of a language's as ...
sort functions, including some C++ sort implementations. The June 2000
SGI SGI may refer to: Companies *Saskatchewan Government Insurance *Scientific Games International, a gambling company *Silicon Graphics, Inc., a former manufacturer of high-performance computing products *Silicon Graphics International, formerly Rac ...
C++
Standard Template Library The Standard Template Library (STL) is a Library (computer science), software library originally designed by Alexander Stepanov for the C++ programming language that influenced many parts of the C++ Standard Library. It provides four components ...
br>stl_algo.h
implementation of
unstable sort In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important ...
uses the Musser introsort approach with the recursion depth to switch to heapsort passed as a parameter, median-of-3 pivot selection and the Knuth final insertion sort pass for partitions smaller than 16. The
GNU Standard C++ library The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
is similar: uses introsort with a maximum depth of 2×log2 ''n'', followed by an
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howev ...
on partitions smaller than 16. LLVM libc++ also uses introsort with a maximum depth of 2×log2 ''n'', however the size limit for
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howev ...
is different for different data types (30 if swaps are trivial, 6 otherwise). Also, arrays with sizes up to 5 are handled separately. Kutenin (2022) provides an overview for some changes made by LLVM, with a focus on the 2022 fix for quadraticness. The Microsoft .NET Framework
Class Library In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subro ...
, starting from version 4.5 (2012), uses introsort instead of simple quicksort. Go uses introsort with small modification: for slices of 12 or less elements it uses
Shellsort Shellsort, also known as Shell sort or Shell's method, is an in-place comparison sort. It can be seen as either a generalization of sorting by exchange ( bubble sort) or sorting by insertion (insertion sort). The method starts by sorting pairs o ...
instead of
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Howev ...
, and more advanced median of three medians of three pivot selection for quicksort.
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, starting from version 14 (2020), uses a hybrid sorting algorithm that uses merge sort for highly structured arrays (arrays that are composed of a small number of sorted subarrays) and introsort otherwise to sort arrays of ints, longs, floats and doubles.


Variants


pdqsort

Pattern-defeating quicksort (pdqsort) is a variant of introsort incorporating the following improvements: * Median-of-three pivoting, * "BlockQuicksort" partitioning technique to mitigates branch misprediction penalities, * Linear time performance for certain input patterns (
adaptive sort A sorting algorithm falls into the adaptive sort family if it takes advantage of existing order in its input. It benefits from the presortedness in the input sequence – or a limited amount of disorder for various definitions of measures of disord ...
), * Use element shuffling on bad cases before trying the slower heapsort. pdqsort is used by
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
, GAP, and the C++ library Boost.


References


General

* * Niklaus Wirth. ''Algorithms and Data Structures''. Prentice-Hall, Inc., 1985. . {{sorting Comparison sorts Articles with example pseudocode