HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, integer sorting is the
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
ic problem of
sorting Sorting refers to ordering data in an increasing or decreasing manner according to some linear relationship among the data items. # ordering: arranging items in a sequence ordered by some criterion; # categorizing: grouping items with similar pro ...
a collection of data values by
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are
floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
numbers,
rational number In mathematics, a rational number is a number that can be expressed as the quotient or fraction of two integers, a numerator and a non-zero denominator . For example, is a rational number, as is every integer (e.g. ). The set of all ration ...
s, or text strings.. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than
comparison sort A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
ing algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are. Integer sorting algorithms including
pigeonhole sort __NOTOC__ Pigeonhole sorting is a sorting algorithm that is suitable for sorting lists of elements where the number ''n'' of elements and the length ''N'' of the range of possible key values are approximately the same. It requires O(''n'' + ''N'' ...
,
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess di ...
, and
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
are widely used and practical. Other integer sorting algorithms with smaller worst-case time bounds are not believed to be practical for computer architectures with 64 or fewer bits per word. Many such algorithms are known, with performance depending on a combination of the number of items to be sorted, number of bits per key, and number of bits per word of the computer performing the sorting algorithm.


General considerations


Models of computation

Time bounds for integer sorting algorithms typically depend on three parameters: the number of data values to be sorted, the magnitude of the largest possible key to be sorted, and the number of bits that can be represented in a single machine word of the computer on which the algorithm is to be performed. Typically, it is assumed that ; that is, that machine words are large enough to represent an index into the sequence of input data, and also large enough to represent a single key. Integer sorting algorithms are usually designed to work in either the
pointer machine In theoretical computer science a pointer machine is an "atomistic" ''abstract computational machine'' model akin to the random-access machine. A pointer algorithm is an algorithm restricted to the pointer machine model. Depending on the type, a ...
or
random access machine In computer science, random-access machine (RAM) is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers. Like the cou ...
models of computing. The main difference between these two models is in how memory may be addressed. The random access machine allows any value that is stored in a register to be used as the address of memory read and write operations, with unit cost per operation. This ability allows certain complex operations on data to be implemented quickly using table lookups. In contrast, in the pointer machine model, read and write operations use addresses stored in pointers, and it is not allowed to perform arithmetic operations on these pointers. In both models, data values may be added, and bitwise Boolean operations and binary shift operations may typically also be performed on them, in unit time per operation. Different integer sorting algorithms make different assumptions, however, about whether integer multiplication is also allowed as a unit-time operation. Other more specialized models of computation such as the
parallel random access machine In computer science, a parallel random-access machine (parallel RAM or PRAM) is a shared memory architecture, shared-memory abstract machine. As its name indicates, the PRAM is intended as the parallel-computing analogy to the random-access machin ...
have also been considered. showed that in some cases the multiplications or table lookups required by some integer sorting algorithms could be replaced by customized operations that would be more easily implemented in hardware but that are not typically available on general-purpose computers. improved on this by showing how to replace these special operations by the
bit field A bit field is a data structure that consists of one or more adjacent bits which have been allocated for specific purposes, so that any single bit or group of bits within the structure can be set or inspected. A bit field is most commonly used to r ...
manipulation instructions already available on
Pentium Pentium is a brand used for a series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium processor from which the brand took its name was first released on March 22, 1993. After that, the Pentium II and Pe ...
processors. In external memory models of computing, no known integer sorting algorithm is faster than comparison sorting. Researchers have shown that, in these models, restricted classes of algorithms that are limited in how they manipulate their keys cannot be faster than comparison sorting, and that an integer sorting algorithm that is faster than comparison sorting would imply the falsity of a standard conjecture in
network coding In computer networking, linear network coding is a program in which intermediate nodes transmit data from source nodes to sink nodes by means of linear combinations. Linear network coding may be used to improve a network's throughput, efficiency, ...
.


Sorting versus integer priority queues

A
priority queue In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure in which each element additionally has a ''priority'' associated with it. In a priority queue, an element with high priority is se ...
is a data structure for maintaining a collection of items with numerical priorities, having operations for finding and removing the item with the minimum priority value. Comparison-based priority queues such as the
binary heap A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort. A bin ...
take logarithmic time per update, but other structures such as the
van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...
or
bucket queue A bucket queue is a data structure that implements the priority queue abstract data type: it maintains a dynamic collection of elements with numerical priorities and allows quick access to the element with minimum (or maximum) priority. In the bu ...
may be faster for inputs whose priorities are small integers. These data structures can be used in the
selection sort In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(''n''2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is not ...
algorithm, which sorts a collection of elements by repeatedly finding and removing the smallest element from the collection, and returning the elements in the order they were found. A priority queue can be used to maintain the collection of elements in this algorithm, and the time for this algorithm on a collection of elements can be bounded by the time to initialize the priority queue and then to perform find and remove operations. For instance, using a
binary heap A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort. A bin ...
as a priority queue in selection sort leads to the
heap sort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
algorithm, a comparison sorting algorithm that takes time. Instead, using selection sort with a bucket queue gives a form of
pigeonhole sort __NOTOC__ Pigeonhole sorting is a sorting algorithm that is suitable for sorting lists of elements where the number ''n'' of elements and the length ''N'' of the range of possible key values are approximately the same. It requires O(''n'' + ''N'' ...
, and using van Emde Boas trees or other integer priority queues leads to other fast integer sorting algorithms. Instead of using an integer priority queue in a sorting algorithm, it is possible to go the other direction, and use integer sorting algorithms as subroutines within an integer priority queue data structure. used this idea to show that, if it is possible to perform integer sorting in time per key, then the same time bound applies to the time per insertion or deletion operation in a priority queue data structure. Thorup's reduction is complicated and assumes the availability of either fast multiplication operations or table lookups, but he also provides an alternative priority queue using only addition and Boolean operations with time per operation, at most multiplying the time by an
iterated logarithm In computer science, the iterated logarithm of n, written  n (usually read "log star"), is the number of times the logarithm function must be iteratively applied before the result is less than or equal to 1. The simplest formal definition i ...
.


Usability

The classical integer sorting algorithms of
pigeonhole sort __NOTOC__ Pigeonhole sorting is a sorting algorithm that is suitable for sorting lists of elements where the number ''n'' of elements and the length ''N'' of the range of possible key values are approximately the same. It requires O(''n'' + ''N'' ...
,
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess di ...
, and
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
are widely used and practical. Much of the subsequent research on integer sorting algorithms has focused less on practicality and more on theoretical improvements in their
worst case analysis Worst case analysis was, from 1978 until 1986, a doctrine under which mandated that an environmental impact statement include such an analysis: It led to a 1989 SCOTUS decision, written by John Paul Stevens and reported in ''Robertson v. Methow ...
, and the algorithms that come from this line of research are not believed to be practical for current
64-bit In computer architecture, 64-bit Integer (computer science), integers, memory addresses, or other Data (computing), data units are those that are 64 bits wide. Also, 64-bit central processing unit, CPUs and arithmetic logic unit, ALUs are those ...
computer architectures, although experiments have shown that some of these methods may be an improvement on radix sorting for data with 128 or more bits per key.. Additionally, for large data sets, the near-random
memory access pattern In computing, a memory access pattern or IO access pattern is the pattern with which a system or program reads and writes memory on secondary storage. These patterns differ in the level of locality of reference and drastically affect cache performa ...
s of many integer sorting algorithms can handicap them compared to comparison sorting algorithms that have been designed with the
memory hierarchy In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlli ...
in mind. Integer sorting provides one of the six
benchmark Benchmark may refer to: Business and economics * Benchmarking, evaluating performance within organizations * Benchmark price * Benchmark (crude oil), oil-specific practices Science and technology * Benchmark (surveying), a point of known elevati ...
s in the
DARPA The Defense Advanced Research Projects Agency (DARPA) is a research and development agency of the United States Department of Defense responsible for the development of emerging technologies for use by the military. Originally known as the Adv ...
High Productivity Computing Systems High Productivity Computing Systems (HPCS) is a DARPA project for developing a new generation of economically viable high productivity computing systems for national security and industry in the 2002–10 timeframe. The HPC Challenge (High-perfo ...
Discrete Mathematics benchmark suite, and one of eleven benchmarks in the NAS Parallel Benchmarks suite.


Practical algorithms

Pigeonhole sort __NOTOC__ Pigeonhole sorting is a sorting algorithm that is suitable for sorting lists of elements where the number ''n'' of elements and the length ''N'' of the range of possible key values are approximately the same. It requires O(''n'' + ''N'' ...
or
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess di ...
can both sort data items having keys in the range from to in time . In
pigeonhole sort __NOTOC__ Pigeonhole sorting is a sorting algorithm that is suitable for sorting lists of elements where the number ''n'' of elements and the length ''N'' of the range of possible key values are approximately the same. It requires O(''n'' + ''N'' ...
(often called bucket sort), pointers to the data items are distributed to a table of buckets, represented as
collection Collection or Collections may refer to: * Cash collection, the function of an accounts receivable department * Collection (church), money donated by the congregation during a church service * Collection agency, agency to collect cash * Collectio ...
data types such as
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...
s, using the keys as indices into the table. Then, all of the buckets are concatenated together to form the output list. Counting sort uses a table of counters in place of a table of buckets, to determine the number of items with each key. Then, a
prefix sum In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers is a second sequence of numbers , the sums of prefixes ( running totals) of the input sequence: : : : :... For instance, the prefix sums ...
computation is used to determine the range of positions in the sorted output at which the values with each key should be placed. Finally, in a second pass over the input, each item is moved to its key's position in the output array. Both algorithms involve only simple loops over the input data (taking time ) and over the set of possible keys (taking time ), giving their overall time bound.
Radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
is a sorting algorithm that works for larger keys than pigeonhole sort or counting sort by performing multiple passes over the data. Each pass sorts the input using only part of the keys, by using a different sorting algorithm (such as pigeonhole sort or counting sort) that is suited only for small keys. To break the keys into parts, the radix sort algorithm computes the
positional notation Positional notation (or place-value notation, or positional numeral system) usually denotes the extension to any base of the Hindu–Arabic numeral system (or decimal system). More generally, a positional system is a numeral system in which the ...
for each key, according to some chosen
radix In a positional numeral system, the radix or base is the number of unique digits, including the digit zero, used to represent numbers. For example, for the decimal/denary system (the most common system in use today) the radix (base number) is t ...
; then, the part of the key used for the th pass of the algorithm is the th digit in the positional notation for the full key, starting from the least significant digit and progressing to the most significant. For this algorithm to work correctly, the sorting algorithm used in each pass over the data must be
stable A stable is a building in which livestock, especially horses, are kept. It most commonly means a building that is divided into separate stalls for individual animals and livestock. There are many different types of stables in use today; the ...
: items with equal digits should not change positions with each other. For greatest efficiency, the radix should be chosen to be near the number of data items, . Additionally, using a
power of two A power of two is a number of the form where is an integer, that is, the result of exponentiation with number two as the base and integer  as the exponent. In a context where only integers are considered, is restricted to non-negative ...
near as the radix allows the keys for each pass to be computed quickly using only fast binary shift and mask operations. With these choices, and with pigeonhole sort or counting sort as the base algorithm, the radix sorting algorithm can sort data items having keys in the range from to in time .


Theoretical algorithms

Many integer sorting algorithms have been developed whose theoretical analysis shows them to behave better than comparison sorting, pigeonhole sorting, or radix sorting for large enough combinations of the parameters defining the number of items to be sorted, range of keys, and machine word size. Which algorithm has the best performance depends on the values of these parameters. However, despite their theoretical advantages, these algorithms are not an improvement for the typical ranges of these parameters that arise in practical sorting problems.


Algorithms for small keys

A
Van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...
may be used as a priority queue to sort a set of keys, each in the range from to , in time . This is a theoretical improvement over radix sorting when is sufficiently large. However, in order to use a Van Emde Boas tree, one either needs a directly addressable memory of words, or one needs to simulate it using a
hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
, reducing the space to linear but making the algorithm be randomized. Another priority queue with similar performance (including the need for randomization in the form of hash tables) is the
Y-fast trie In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time ''O''(log log ''M''), using ''O''(''n'') space, where ''n'' is the number ...
of . A more sophisticated technique with a similar flavor and with better theoretical performance was developed by . They observed that each pass of radix sort can be interpreted as a range reduction technique that, in linear time, reduces the maximum key size by a factor of ; instead, their technique reduces the key size to the square root of its previous value (halving the number of bits needed to represent a key), again in linear time. As in radix sort, they interpret the keys as two-digit base- numbers for a base that is approximately . They then group the items to be sorted into buckets according to their high digits, in linear time, using either a large but uninitialized direct addressed memory or a hash table. Each bucket has a representative, the item in the bucket with the largest key; they then sort the list of items using as keys the high digits for the representatives and the low digits for the non-representatives. By grouping the items from this list into buckets again, each bucket may be placed into sorted order, and by extracting the representatives from the sorted list the buckets may be concatenated together into sorted order. Thus, in linear time, the sorting problem is reduced to another recursive sorting problem in which the keys are much smaller, the square root of their previous magnitude. Repeating this range reduction until the keys are small enough to bucket sort leads to an algorithm with running time . A complicated randomized algorithm of in the
word RAM In theoretical computer science, the word RAM (word random-access machine) model is a model of computation in which a random-access machine does bitwise operations on a word of bits. Michael Fredman and Dan Willard created it in 1990 to simulate p ...
model of computation In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how ...
allows these time bounds to be reduced even farther, to .


Algorithms for large words

An integer sorting algorithm is said to be ''non-conservative'' if it requires a word size that is significantly larger than . As an extreme instance, if , and all keys are distinct, then the set of keys may be sorted in linear time by representing it as a bitvector, with a 1 bit in position when is one of the input keys, and then repeatedly removing the least significant bit. The non-conservative packed sorting algorithm of uses a subroutine, based on
Ken Batcher Ken Batcher, full name Kenneth Edward Batcher (December 1935 – August 2019) was an emeritus professor of Computer Science at Kent State University. He also worked as a computer architect at Goodyear Aerospace in Akron, Ohio for 28 years. Earl ...
's bitonic sorting network, for
merging Merge, merging, or merger may refer to: Concepts * Merge (traffic), the reduction of the number of lanes on a road * Merge (linguistics), a basic syntactic operation in generative syntax in the Minimalist Program * Merger (politics), the comb ...
two sorted sequences of keys that are each short enough to be packed into a single machine word. The input to the packed sorting algorithm, a sequence of items stored one per word, is transformed into a packed form, a sequence of words each holding multiple items in sorted order, by using this subroutine repeatedly to double the number of items packed into each word. Once the sequence is in packed form, Albers and Hagerup use a form of
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same i ...
to sort it; when two sequences are being merged to form a single longer sequence, the same bitonic sorting subroutine can be used to repeatedly extract packed words consisting of the smallest remaining elements of the two sequences. This algorithm gains enough of a speedup from its packed representation to sort its input in linear time whenever it is possible for a single word to contain keys; that is, when for some constant .


Algorithms for few items

Pigeonhole sort, counting sort, radix sort, and Van Emde Boas tree sorting all work best when the key size is small; for large enough keys, they become slower than comparison sorting algorithms. However, when the key size or the word size is very large relative to the number of items (or equivalently when the number of items is small), it may again become possible to sort quickly, using different algorithms that take advantage of the parallelism inherent in the ability to perform arithmetic operations on large words. An early result in this direction was provided by using the cell-probe model of computation (an artificial model in which the complexity of an algorithm is measured only by the number of memory accesses it performs). Building on their work, described two data structures, the Q-heap and the atomic heap, that are implementable on a random access machine. The Q-heap is a bit-parallel version of a binary
trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...
, and allows both priority queue operations and successor and predecessor queries to be performed in constant time for sets of items, where is the size of the precomputed tables needed to implement the data structure. The atomic heap is a
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for n ...
in which each tree node is represented as a Q-heap; it allows constant time priority queue operations (and therefore sorting) for sets of items. provide a randomized algorithm called signature sort that allows for linear time sorting of sets of up to items at a time, for any constant . As in the algorithm of Kirkpatrick and Reisch, they perform range reduction using a representation of the keys as numbers in base  for a careful choice of . Their range reduction algorithm replaces each digit by a signature, which is a hashed value with bits such that different digit values have different signatures. If is sufficiently small, the numbers formed by this replacement process will be significantly smaller than the original keys, allowing the non-conservative packed sorting algorithm of to sort the replaced numbers in linear time. From the sorted list of replaced numbers, it is possible to form a compressed
trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...
of the keys in linear time, and the children of each node in the trie may be sorted recursively using only keys of size , after which a tree traversal produces the sorted order of the items.


Trans-dichotomous algorithms

introduced the
transdichotomous model In computational complexity theory, and more specifically in the analysis of algorithms with integer data, the transdichotomous model is a variation of the random access machine in which the machine word size is assumed to match the problem size. ...
of analysis for integer sorting algorithms, in which nothing is assumed about the range of the integer keys and one must bound the algorithm's performance by a function of the number of data values alone. Alternatively, in this model, the running time for an algorithm on a set of items is assumed to be the
worst case In computer science, best, worst, and average cases of a given algorithm express what the resource usage is ''at least'', ''at most'' and ''on average'', respectively. Usually the resource being considered is running time, i.e. time complexity, b ...
running time for any possible combination of values of and . The first algorithm of this type was Fredman and Willard's fusion tree sorting algorithm, which runs in time ; this is an improvement over comparison sorting for any choice of and . An alternative version of their algorithm that includes the use of random numbers and integer division operations improves this to . Since their work, even better algorithms have been developed. For instance, by repeatedly applying the Kirkpatrick–Reisch range reduction technique until the keys are small enough to apply the Albers–Hagerup packed sorting algorithm, it is possible to sort in time ; however, the range reduction part of this algorithm requires either a large memory (proportional to ) or randomization in the form of hash tables. showed how to sort in randomized time . Their technique involves using ideas related to signature sorting to partition the data into many small sublists, of a size small enough that signature sorting can sort each of them efficiently. It is also possible to use similar ideas to sort integers deterministically in time and linear space. Using only simple arithmetic operations (no multiplications or table lookups) it is possible to sort in randomized expected time or deterministically in time for any constant .


References

;Footnotes ;Secondary sources *. *. *. ;Primary sources * *. *. *. *. *. *. *. *. Cited by as an early source for
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. {{refend Sorting algorithms