In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, counting sort is an
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
for
sorting a collection of objects according to keys that are small positive
integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the languag ...
s; that is, it is an
integer sorting
In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numb ...
algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum key value and the minimum key value, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. It is often used as a subroutine in
radix sort
In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
, another sorting algorithm, which can handle larger keys more efficiently.
[. See also the historical notes on page 181.][.][.]
Counting sort is not a
comparison sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
; it uses key values as indexes into an array and the
lower bound
In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is greater than or equal to every element of .
Dually, a lower bound or minorant of is defined to be an element ...
for comparison sorting will not apply.
Bucket sort
Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the b ...
may be used in lieu of counting sort, and entails a similar time analysis. However, compared to counting sort, bucket sort requires
linked list
In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...
s,
dynamic array
In computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard lib ...
s, or a large amount of pre-allocated memory to hold the sets of items within each bucket, whereas counting sort stores a single number (the count of items) per bucket.
Input and output assumptions
In the most general case, the input to counting sort consists of a
collection
Collection or Collections may refer to:
* Cash collection, the function of an accounts receivable department
* Collection (church), money donated by the congregation during a church service
* Collection agency, agency to collect cash
* Collectio ...
of items, each of which has a non-negative integer key whose maximum value is at most .
In some descriptions of counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself,
but this simplification does not accommodate many applications of counting sort. For instance, when used as a subroutine in
radix sort
In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
, the keys for each call to counting sort are individual digits of larger item keys; it would not suffice to return only a sorted list of the key digits, separated from the items.
In applications such as in radix sort, a bound on the maximum key value will be known in advance, and can be assumed to be part of the input to the algorithm. However, if the value of is not already known then it may be computed, as a first step, by an additional loop over the data to determine the maximum key value.
The output is an
array
An array is a systematic arrangement of similar objects, usually in rows and columns.
Things called an array include:
{{TOC right
Music
* In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
of the elements ordered by their keys. Because of its application to radix sorting, counting sort must be a
stable sort
In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important ...
; that is, if two elements share the same key, their relative order in the output array and their relative order in the input array should match.
Pseudocode
In pseudocode, the algorithm may be expressed as:
function CountingSort(input, ''k'')
count ← array of ''k'' + 1 zeros
output ← array of same length as input
for ''i'' = 0 to length(input) - 1 do
''j'' = key(input
'i''
count
'j''= count
'j''+ 1
for ''i'' = 1 to ''k'' do
count
'i''= count
'i''+ count
'i'' - 1
for ''i'' = length(input) - 1 down to 0 do
''j'' = key(input
'i''
count
'j''= count
'j''- 1
output
ount[''j''_=_input[''i''.html" ;"title="'j''.html" ;"title="ount[''j''">ount[''j'' = input[''i''">'j''.html" ;"title="ount[''j''">ount[''j'' = input[''i''
return output
Here
input
is the input array to be sorted,
key
returns the numeric key of each item in the input array,
count
is an auxiliary array used first to store the numbers of items with each key, and then (after the second loop) to store the positions where items with each key should be placed,
k
is the maximum value of the non-negative key values and
output
is the sorted output array.
In summary, the algorithm loops over the items in the first loop, computing a histogram of the number of times each key occurs within the
input
collection. After that in the second loop, it performs a prefix sum computation on
count
in order to determine, for each key, the position range where the items having that key should be placed; i.e. items of key
should be placed starting in position
count i">math>i/code>. Finally, in the third loop, it loops over the items of input
again, but in reverse order, moving each item into its sorted position in the output
array.
The relative order of items with equal keys is preserved here; i.e., this is a stable sort
In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important ...
.
Complexity analysis
Because the algorithm uses only simple for loops, without recursion or subroutine calls, it is straightforward to analyze. The initialization of the count array, and the second for loop which performs a prefix sum on the count array, each iterate at most times and therefore take time. The other two for loops, and the initialization of the output array, each take time. Therefore, the time for the whole algorithm is the sum of the times for these steps, .
Because it uses arrays of length and , the total space usage of the algorithm is also . For problem instances in which the maximum key value is significantly smaller than the number of items, counting sort can be highly space-efficient, as the only storage it uses other than its input and output arrays is the Count array which uses space .
Variant algorithms
If each item to be sorted is itself an integer, and used as key as well, then the second and third loops of counting sort can be combined; in the second loop, instead of computing the position where items with key i
should be placed in the output, simply append Count /code> copies of the number i
to the output.
This algorithm may also be used to eliminate duplicate keys, by replacing the Count
array with a bit vector
A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level p ...
that stores a one
for a key that is present in the input and a zero
for a key that is not present. If additionally the items are the integer keys themselves, both second and third loops can be omitted entirely and the bit vector will itself serve as output, representing the values as offsets of the non-zero
entries, added to the range's lowest value. Thus the keys are sorted and the duplicates are eliminated in this variant just by being placed into the bit array.
For data in which the maximum key size is significantly smaller than the number of data items, counting sort may be parallelized by splitting the input into subarrays of approximately equal size, processing each subarray in parallel to generate a separate count array for each subarray, and then merging the count arrays. When used as part of a parallel radix sort algorithm, the key size (base of the radix representation) should be chosen to match the size of the split subarrays. The simplicity of the counting sort algorithm and its use of the easily parallelizable prefix sum primitive also make it usable in more fine-grained parallel algorithms.
As described, counting sort is not an in-place algorithm
In computer science, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure. However, a small amount of extra storage space is allowed for auxiliary variables. The input is usually overwritten by the output ...
; even disregarding the count array, it needs separate input and output arrays. It is possible to modify the algorithm so that it places the items into sorted order within the same array that was given to it as the input, using only the count array as auxiliary storage; however, the modified in-place version of counting sort is not stable.
History
Although radix sorting itself dates back far longer,
counting sort, and its application to radix sorting, were both invented by Harold H. Seward in 1954.[. Section 5.2, Sorting by counting, pp. 75–80, and historical notes, p. 170.][.]
References
External links
Counting Sort html5 visualization
*.
{{DEFAULTSORT:Counting Sort
Sorting algorithms
Stable sorts