Flashsort

Flashsort is a distribution sorting algorithm showing linear computational complexity O(n) for uniformly distributed data sets and requiring relatively little additional memory. The original work was published in 1998 by Karl-Dietrich Neubert.


Concept

Flashsort is an efficient in-place implementation of histogram sort, itself a type of bucket sort. It assigns each of the n input elements to one of m buckets, efficiently rearranges the input to place the buckets in the correct order, then sorts each bucket. The original algorithm sorts an input array A as follows:
# Using a first pass over the input or ''a priori'' knowledge, find the minimum and maximum sort keys.
# Linearly divide the range [A_min, A_max] into m buckets.
# Make one pass over the input, counting the number of elements A_i which fall into each bucket. (Neubert calls the buckets "classes" and the assignment of elements to their buckets "classification".)
# Convert the counts of elements in each bucket to a prefix sum, where K_b is the number of elements A_i in bucket b or less. (K_0 = 0 and K_m = n.)
# Rearrange the input so that all elements of each bucket b are stored in positions A_i where K_(b−1) < i ≤ K_b.
# Sort each bucket using insertion sort.
Steps 1–3 and 6 are common to any bucket sort, and can be improved using techniques generic to bucket sorts. In particular, the goal is for the buckets to be of approximately equal size (n/m elements each), with the ideal being division into m quantiles. While the basic algorithm is a linear interpolation sort, if the input distribution is known to be non-uniform, a non-linear division will more closely approximate this ideal. Likewise, the final sort can use any of a number of techniques, including a recursive flashsort. What distinguishes flashsort is step 5: an efficient O(n) in-place algorithm for collecting the elements of each bucket together in the correct relative order using only m words of additional memory.
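The steps above can be sketched in Python as follows. This is an illustrative sketch, not Neubert's original code: it assumes integer keys, picks the bucket count m arbitrarily, and does the step-5 distribution out of place for clarity (the in-place rearrangement is the subject of the next section).

```python
def flashsort(a, m=None):
    """Sketch of the basic algorithm: find min/max keys, count bucket
    sizes, build prefix sums, distribute, then insertion-sort each bucket."""
    n = len(a)
    if n < 2:
        return a
    if m is None:
        m = max(n // 2, 2)                 # number of buckets (tunable)
    lo, hi = min(a), max(a)                # step 1: minimum and maximum keys
    if lo == hi:
        return a                           # all keys equal: already sorted

    def bucket(x):                         # step 2: linear interpolation
        return min((x - lo) * m // (hi - lo), m - 1)

    count = [0] * m                        # step 3: count elements per bucket
    for x in a:
        count[bucket(x)] += 1
    k = [0] * (m + 1)                      # step 4: prefix sums;
    for b in range(m):                     # k[b+1] = elements in buckets <= b
        k[b + 1] = k[b] + count[b]

    out = [None] * n                       # step 5 (out of place here;
    nxt = k[:m]                            # flashsort proper does this in place)
    for x in a:
        b = bucket(x)
        out[nxt[b]] = x                    # next free slot of x's bucket
        nxt[b] += 1
    a[:] = out

    for b in range(m):                     # step 6: insertion sort per bucket
        for i in range(k[b] + 1, k[b + 1]):
            x, j = a[i], i
            while j > k[b] and a[j - 1] > x:
                a[j] = a[j - 1]
                j -= 1
            a[j] = x
    return a
```

With m proportional to n, each bucket is expected to hold O(1) elements on uniform data, so the final insertion sorts run in linear time overall.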


Memory efficient implementation

The Flashsort rearrangement phase operates in cycles. Elements start out "unclassified", then are moved to the correct bucket and considered "classified". The basic procedure is to choose an unclassified element, find its correct bucket, exchange it with an unclassified element there (which must exist, because we counted the size of each bucket ahead of time), mark it as classified, and then repeat with the just-exchanged unclassified element. Eventually, the element is exchanged with itself and the cycle ends.

The details are easy to understand using two (word-sized) variables per bucket. The clever part is the elimination of one of those variables, allowing twice as many buckets to be used and therefore half as much time spent on the final sorting.

To understand the algorithm with two variables per bucket, assume there are two arrays of m additional words: K_b is the (fixed) upper limit of bucket b (and K_0 = 0), while L_b is a (movable) index into bucket b, so K_(b−1) ≤ L_b ≤ K_b. We maintain the loop invariant that each bucket is divided by L_b into an unclassified prefix (A_i for K_(b−1) < i ≤ L_b have yet to be moved to their target buckets) and a classified suffix (A_i for L_b < i ≤ K_b are all in the correct bucket and will not be moved again). Initially L_b = K_b and all elements are unclassified. As sorting proceeds, the L_b are decremented until L_b = K_(b−1) for all b and all elements are classified into the correct bucket.

Each round begins by finding the first incompletely classified bucket (which has K_(b−1) < L_b) and taking the first unclassified element in that bucket, A_i where i = K_(b−1) + 1. (Neubert calls this the "cycle leader".) Copy A_i to a temporary variable t and repeat:
* Compute the bucket b to which t belongs.
* Let j = L_b be the location where t will be stored.
* Exchange t with A_j, i.e. store t in A_j while fetching the previous value thereby displaced.
* Decrement L_b to reflect the fact that A_j is now correctly classified.
* If j ≠ i, restart this loop with the new t.
* If j = i, this round is over; find a new first unclassified element A_i.
* When there are no more unclassified elements, the distribution into buckets is complete.
When implemented with two variables per bucket in this way, the choice of each round's starting point is in fact arbitrary; ''any'' unclassified element may be used as a cycle leader. The only requirement is that the cycle leaders can be found efficiently.

Although the preceding description uses K to find the cycle leaders, it is in fact possible to do without it, allowing the entire m-word K array to be eliminated. (After the distribution is complete, the bucket boundaries can be found in L.) Suppose that we have classified all elements up to position i, and are considering A_(i+1) as a potential new cycle leader. It is easy to compute its target bucket b. By the loop invariant, it is classified if L_b < i+1 ≤ K_b, and unclassified if i+1 is outside that range. The first inequality is easy to test, but the second appears to require the value K_b.

It turns out that the induction hypothesis that all elements up to position i are classified implies that i+1 ≤ K_b, so it is not necessary to test the second inequality. Consider the bucket c which position i+1 falls into. That is, K_(c−1) < i+1 ≤ K_c. By the induction hypothesis, all elements below position i+1, which include all of buckets 1 through c−1, are completely classified; i.e. no elements which belong in those buckets remain in the rest of the array. Therefore, it is not possible that b < c. The only remaining case is b ≥ c, which implies K_b ≥ K_c ≥ i+1, Q.E.D.

Incorporating this, the flashsort distribution algorithm begins with L_b = K_b as described above and i = 1. Then proceed:
* If i > n, the distribution is complete.
* Given A_i, compute the bucket b to which it belongs.
* If i ≤ L_b, then A_i is unclassified. Copy it to a temporary variable t and:
** Let j = L_b be the location where t will be stored.
** Exchange t with A_j, i.e. store t in A_j while fetching the previous value thereby displaced.
** Decrement L_b to reflect the fact that A_j is now correctly classified.
** If j > i, compute the bucket b to which t belongs and restart this (inner) loop with the new t.
* A_i is now correctly classified. Increment i and restart the (outer) loop.
While saving memory, flashsort has the disadvantage that it recomputes the bucket for many already-classified elements. This is already done twice per element (once during the bucket-counting phase and a second time when moving each element), but searching for the first unclassified element requires a third computation for most elements. This could be expensive if buckets are assigned using a more complex formula than simple linear interpolation. A variant reduces the number of computations from almost 3n to at most 2n + m − 1 by taking the ''last'' unclassified element in an unfinished bucket as cycle leader:
* Maintain a variable c identifying the first incompletely-classified bucket. Let c = 1 to begin with; when c > m, the distribution is complete.
* Let i = L_c. If i = 0, increment c and restart this loop. (K_0 = 0, so no element lies below bucket 1.)
* Compute the bucket b to which A_i belongs.
* If b < c, then L_c = K_(c−1) and we are done with bucket c. Increment c and restart this loop.
* If b = c, the classification is trivial. Decrement L_c and restart this loop.
* If b > c, then A_i is unclassified. Perform the same classification (inner) loop as above, then restart this loop.
Most elements have their buckets computed only twice, except for the final element in each bucket, which is used to detect the completion of the following bucket. A small further reduction can be achieved by maintaining a count of unclassified elements and stopping when it reaches zero.
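The memory-efficient distribution pass, which keeps only the movable index array and never stores the fixed bucket limits, can be sketched in Python. This is a sketch under stated assumptions, not Neubert's code: indexing is 0-based (so the 1-based test "i ≤ L_b" becomes `i < L[b]`), and the function name, the caller-supplied `bucket` function, and the return convention are this example's own.

```python
def flash_distribute(a, bucket, m):
    """In-place cycle-leader distribution using a single m-word array L.
    On return, every element sits inside its bucket and L[b] is the
    index of the first element of bucket b."""
    n = len(a)
    L = [0] * m
    for x in a:                       # counting pass
        L[bucket(x)] += 1
    for b in range(1, m):             # prefix sums: L[b] starts at bucket b's limit
        L[b] += L[b - 1]
    i = 0
    while i < n:
        if i < L[bucket(a[i])]:       # a[i] is unclassified: it is a cycle leader
            t = a[i]
            while True:
                b = bucket(t)
                L[b] -= 1             # claim the last free slot of bucket b
                j = L[b]
                t, a[j] = a[j], t     # store t, pick up the displaced element
                if j <= i:            # cycle closed at the leader's position
                    break
        i += 1                        # a[i] is now classified
    return L
```

A caller might pair this with a linear-interpolation bucket function such as `lambda x: min((x - lo) * m // (hi - lo), m - 1)` and then insertion-sort each range `a[L[b]:L[b+1]]`.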


Performance

The only extra memory requirements are the m-word auxiliary vector L for storing bucket bounds and the constant number of other variables used. Further, each element is moved (via a temporary buffer, so two move operations) only once. However, this memory efficiency comes with the disadvantage that the array is accessed randomly, so it cannot take advantage of a data cache smaller than the whole array.

As with all bucket sorts, performance depends critically on the balance of the buckets. In the ideal case of a balanced data set, each bucket will be approximately the same size. If the number m of buckets is linear in the input size n, each bucket has a constant size, so sorting a single bucket with an O(n²) algorithm like insertion sort has complexity O(1). The running time of the final insertion sorts is therefore O(n). Choosing a value for m, the number of buckets, trades off time spent classifying elements (high m) and time spent in the final insertion sort step (low m). For example, if m is chosen proportional to √n, then each bucket holds O(√n) elements and the running time of the final insertion sorts is therefore O(n^1.5).

In the worst-case scenarios where almost all the elements are in a few buckets, the complexity of the algorithm is limited by the performance of the final bucket-sorting method, so it degrades to O(n²). Variations of the algorithm improve worst-case performance by using better-performing sorts such as quicksort or recursive flashsort on buckets which exceed a certain size limit.

For m = 0.1n with uniformly distributed random data, flashsort is faster than heapsort for all n and faster than quicksort for n > 80. It becomes about twice as fast as quicksort at n = 10000. Note that these measurements were taken in the late 1990s, when memory hierarchies were much less dependent on caching.

Due to the ''in situ'' permutation that flashsort performs in its classification process, flashsort is not stable. If stability is required, it is possible to use a second array so elements can be classified sequentially. However, in this case, the algorithm will require O(n) additional memory.
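The stable alternative mentioned above, classifying sequentially into a second array, can be sketched as follows; the function name and calling convention are this example's own, and the extra output array is exactly the O(n) additional memory cost.

```python
def stable_distribute(a, bucket, m):
    """Stable classification into a second array: a left-to-right pass
    writes each element to the next free slot of its bucket, so equal
    keys keep their original relative order."""
    nxt = [0] * (m + 1)
    for x in a:                       # count elements per bucket
        nxt[bucket(x) + 1] += 1
    for b in range(m):                # prefix sums: nxt[b] = first slot of bucket b
        nxt[b + 1] += nxt[b]
    out = [None] * len(a)
    for x in a:
        b = bucket(x)
        out[nxt[b]] = x               # place x and advance its bucket's slot
        nxt[b] += 1
    return out
```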


See also

* Interpolation search, using the distribution of items for searching rather than sorting


References


External links


Implementations of Randomized Sorting on Large Parallel Machines (1992)

Implementation of Parallel Algorithms (1992)

Visualization of Flashsort