HOME

TheInfoList



OR:

Bucket sort, or bin sort, is a
sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a List (computing), list into an Total order, order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. ...
that works by distributing the elements of an
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
into a number of
bucket A bucket is typically a watertight, vertical Cylinder (geometry), cylinder or Truncation (geometry), truncated Cone (geometry), cone or square, with an open top and a flat bottom, attached to a semicircular carrying handle (grip), handle called ...
s. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a
distribution sort In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is import ...
, a generalization of
pigeonhole sort __NOTOC__ Pigeonhole sorting is a sorting algorithm that is suitable for sorting lists of elements where the number ''n'' of elements and the length ''N'' of the range of possible key values are approximately the same. It requires O(''n'' + ''N'' ...
that allows multiple keys per bucket, and is a cousin of
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a
comparison sort A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
algorithm. The
computational complexity In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to computation time (generally measured by the number of needed elementary operations) ...
depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed. Bucket sort works as follows: # Set up an array of initially empty "buckets". # Scatter: Go over the original array, putting each object in its bucket. # Sort each non-empty bucket. # Gather: Visit the buckets in order and put all elements back into the original array.


Pseudocode

function bucketSort(array, k) is buckets ← new array of k empty lists M ← 1 + the maximum key value in the array for i = 0 to length(array) do insert ''array ' into ''buckets loor(k_×_array[i/_M).html" ;"title=".html" ;"title="loor(k × array[i">loor(k × array[i/ M)">.html" ;"title="loor(k × array[i">loor(k × array[i/ M)' for i = 0 to k do nextSort(buckets[i]) return the concatenation of buckets[0], ...., buckets[k] ''Let array'' denote the array to be sorted and ''k'' denote the number of buckets to use. One can compute the maximum key value in
linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
by iterating over all the keys once. The
floor function In mathematics and computer science, the floor function is the function that takes as input a real number , and gives as output the greatest integer less than or equal to , denoted or . Similarly, the ceiling function maps to the least int ...
must be used to convert a floating number to an integer ( and possibly casting of datatypes too ). The function ''nextSort'' is a sorting function used to sort each bucket. Conventionally,
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
is used, but other algorithms could be used as well, such as ''
selection sort In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(''n''2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is no ...
'' or ''
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
''. Using ''bucketSort'' itself as ''nextSort'' produces a relative of
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
; in particular, the case ''n = 2'' corresponds to
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
(although potentially with poor pivot choices).


Analysis


Worst-case analysis

When the input contains several keys that are close to each other (clustering), those elements are likely to be placed in the same bucket, which results in some buckets containing more elements than average. The worst-case scenario occurs when all the elements are placed in a single bucket. The overall performance would then be dominated by the algorithm used to sort each bucket, for example O(n^2)
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
or O(n \log(n))
comparison sort A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
algorithms, such as
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
.


Average-case analysis

Consider the case that the input is uniformly distributed. The first step, which is initialize the buckets and find the maximum key value in the array, can be done in O(n) time. If division and multiplication can be done in constant time, then scattering each element to its bucket also costs O(n). Assume insertion sort is used to sort each bucket, then the third step costs O(\textstyle \sum_^k \displaystyle n_i^2), where n_i is the length of the bucket indexed i. Since we are concerning the average time, the expectation E(n_i^2) has to be evaluated instead. Let X_ be the random variable that is 1 if element j is placed in bucket i, and 0 otherwise. We have n_i = \sum_^n X_. Therefore, :\begin E(n_i^2) & = E\left(\sum_^n X_ \sum_^n X_\right) \\ & = E\left(\sum_^n \sum_^n X_X_\right) \\ & = E\left(\sum_^n X_^2\right) + E\left(\sum_\sum_X_X_\right) \end The last line separates the summation into the case j=k and the case j\neq k. Since the chance of an object distributed to bucket i is 1/k, X_ is 1 with probability 1/k and 0 otherwise. :E(X_^2) = 1^2\cdot \left(\frac\right) + 0^2\cdot \left(1-\frac\right) = \frac :E(X_X_) = 1\cdot \left(\frac\right)\left(\frac\right) = \frac With the summation, it would be :E\left(\sum_^n X_^2\right) + E\left(\sum_\sum_X_X_\right) = n\cdot\frac + n(n-1)\cdot\frac = \frac Finally, the complexity would be O\left(\sum_^kE(n_i^2)\right) = O\left(\sum_^k \frac\right) = O\left(\frac+n\right) . The last step of bucket sort, which is concatenating all the sorted objects in each buckets, requires O(k) time. Therefore, the total complexity is O\left(n+\frac+k\right). Note that if k is chosen to be k = \Theta(n), then bucket sort runs in O(n) average time, given a uniformly distributed input.


Optimizations

A common optimization is to put the unsorted elements of the buckets back in the original array ''first'', then run
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
over the complete array; because insertion sort's runtime is based on how far each element is from its final position, the number of comparisons remains relatively small, and the memory hierarchy is better exploited by storing the list contiguously in memory. If the input distribution is known or can be estimated, buckets can often be chosen which contain constant density (rather than merely having constant size). This allows O(n) average time complexity even without uniformly distributed input.


Variants


Generic bucket sort

The most common variant of bucket sort operates on a list of ''n'' numeric inputs between zero and some maximum value ''M'' and divides the value range into ''n'' buckets each of size ''M''/''n''. If each bucket is sorted using
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
, the sort can be shown to run in expected linear time (where the average is taken over all possible inputs). However, the performance of this sort degrades with clustering; if many values occur close together, they will all fall into a single bucket and be sorted slowly. This performance degradation is avoided in the original bucket sort algorithm by assuming that the input is generated by a random process that distributes elements uniformly over the interval ''[0,1)''.


ProxmapSort

Similar to generic bucket sort as described above, ProxmapSort works by dividing an array of keys into subarrays via the use of a "map key" function that preserves a partial ordering on the keys; as each key is added to its subarray, insertion sort is used to keep that subarray sorted, resulting in the entire array being in sorted order when ProxmapSort completes. ProxmapSort differs from bucket sorts in its use of the map key to place the data approximately where it belongs in sorted order, producing a "proxmap" — a proximity mapping — of the keys.


Histogram sort

Another variant of bucket sort known as histogram sort or
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess dis ...
adds an initial pass that counts the number of elements that will fall into each bucket using a count array. Using this information, the array values can be arranged into a sequence of buckets in-place by a sequence of exchanges, leaving no space overhead for bucket storage.


Postman's sort

The Postman's sort is a variant of bucket sort that takes advantage of a hierarchical structure of elements, typically described by a set of attributes. This is the algorithm used by letter-sorting machines in
post office A post office is a public facility and a retailer that provides mail services, such as accepting letters and parcels, providing post office boxes, and selling postage stamps, packaging, and stationery. Post offices may offer additional serv ...
s: mail is sorted first between domestic and international; then by state, province or territory; then by destination post office; then by routes, etc. Since keys are not compared against each other, sorting time is O(''cn''), where ''c'' depends on the size of the key and number of buckets. This is similar to a
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
that works "top down," or "most significant digit first."


Shuffle sort

The shuffle sortA revolutionary new sort from John Cohen Nov 26, 1997
/ref> is a variant of bucket sort that begins by removing the first 1/8 of the ''n'' items to be sorted, sorts them recursively, and puts them in an array. This creates ''n''/8 "buckets" to which the remaining 7/8 of the items are distributed. Each "bucket" is then sorted, and the "buckets" are concatenated into a sorted array.


Comparison with other sorting algorithms

Bucket sort can be seen as a generalization of
counting sort In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess dis ...
; in fact, if each bucket has size 1 then bucket sort degenerates to counting sort. The variable bucket size of bucket sort allows it to use O(''n'') memory instead of O(''M'') memory, where ''M'' is the number of distinct values; in exchange, it gives up counting sort's O(''n'' + ''M'') worst-case behavior. Bucket sort with two buckets is effectively a version of
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
where the pivot value is always selected to be the middle value of the value range. While this choice is effective for uniformly distributed inputs, other means of choosing the pivot in quicksort such as randomly selected pivots make it more resistant to clustering in the input distribution. The ''n''-way
mergesort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
algorithm also begins by distributing the list into ''n'' sublists and sorting each one; however, the sublists created by mergesort have overlapping value ranges and so cannot be recombined by simple concatenation as in bucket sort. Instead, they must be interleaved by a merge algorithm. However, this added expense is counterbalanced by the simpler scatter phase and the ability to ensure that each sublist is the same size, providing a good worst-case time bound. Top-down
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
can be seen as a special case of bucket sort where both the range of values and the number of buckets is constrained to be a power of two. Consequently, each bucket's size is also a power of two, and the procedure can be applied recursively. This approach can accelerate the scatter phase, since we only need to examine a prefix of the bit representation of each element to determine its bucket.


References

* Paul E. Blac
"Postman's Sort"
from
Dictionary of Algorithms and Data Structures The NIST ''Dictionary of Algorithms and Data Structures'' is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data structures. For algorithms and ...
at
NIST The National Institute of Standards and Technology (NIST) is an agency of the United States Department of Commerce whose mission is to promote American innovation and industrial competitiveness. NIST's activities are organized into physical sci ...
. * Robert Rame
'"The Postman's Sort"
''C Users Journal'' Aug. 1992


External links





{{sorting Sorting algorithms Stable sorts Articles with example pseudocode