HOME

TheInfoList



OR:

Burstsort and its variants are cache-efficient algorithms for sorting
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
. They are variants of the traditional
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
but faster for large
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
s of common strings, first published in 2003, with some optimizing versions published in later years. Burstsort algorithms use a
trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...
to store prefixes of strings, with growable arrays of pointers as end nodes containing sorted, unique, suffixes (referred to as ''buckets''). Some variants copy the string tails into the buckets. As the buckets grow beyond a predetermined threshold, the buckets are "burst" into tries, giving the sort its name. A more recent variant uses a bucket index with smaller sub-buckets to reduce memory usage. Most implementations delegate to multikey quicksort, an extension of three-way radix quicksort, to sort the contents of the buckets. By dividing the input into buckets with common prefixes, the sorting can be done in a cache-efficient manner. Burstsort was introduced as a sort that is similar to MSD radix sort, but is faster due to being aware of caching and related radixes being stored closer to each other due to specifics of trie structure. It exploits specifics of strings that are usually encountered in real world. And although asymptotically it is the same as radix sort, with time complexity of (''w'' – word length and ''n'' – number of strings to be sorted), but due to better memory distribution it tends to be twice as fast on big data sets of strings. It has been billed as the "fastest known algorithm to sort large sets of strings".


References

* A burstsort derivative (C-burstsort), faster than burstsort: * The data type used in burstsort: * *


External links

* A burstsort implementation in Java
burstsort4j
*
Judy array Judy is a short form of the name Judith. Judy may refer to: Places * Judy, Kentucky, village in Montgomery County, United States * Judy Woods, woodlands in Bradford, West Yorkshire, England, United Kingdom Animals * Judy (dog) (1936–1950) ...
s are a type of copy burstsort
C implementation
{{sorting String sorting algorithms