
Shellsort, also known as Shell sort or Shell's method, is an in-place comparison sort. It can be seen as either a generalization of sorting by exchange (bubble sort) or sorting by insertion (insertion sort). The method starts by sorting pairs of elements far apart from each other, then progressively reducing the gap between elements to be compared. By starting with far apart elements, it can move some out-of-place elements into position faster than a simple nearest neighbor exchange. Donald Shell published the first version of this sort in 1959. The running time of Shellsort is heavily dependent on the gap sequence it uses. For many practical variants, determining their time complexity remains an open problem.


Description

Shellsort is an optimization of insertion sort that allows the exchange of items that are far apart. The idea is to arrange the list of elements so that, starting anywhere, taking every ''h''th element produces a sorted list. Such a list is said to be ''h''-sorted. It can also be thought of as ''h'' interleaved lists, each individually sorted. Beginning with large values of ''h'' allows elements to move long distances in the original list, reducing large amounts of disorder quickly and leaving less work for the smaller ''h''-sort steps to do. If a list that is ''h''-sorted is then ''k''-sorted for some smaller integer ''k'', it remains ''h''-sorted. Following this idea for a decreasing sequence of ''h'' values ending in 1 is guaranteed to leave a sorted list in the end.

In simple terms, suppose we have an array of 1024 numbers; our first gap (''h'') could be 512. We then run through the list comparing each element in the first half to the corresponding element in the second half. Our second gap (''k'') is 256, which breaks the array into four sections (starting at 0, 256, 512, 768), and we make sure the first items in each section are sorted relative to each other, then the second items, and so on. In practice the gap sequence could be anything, but the last gap is always 1 to finish the sort (effectively finishing with an ordinary insertion sort).

Consider, for example, a run of Shellsort with gaps 5, 3 and 1 on a 12-element array (''a''1, ..., ''a''12). The first pass, 5-sorting, performs insertion sort on five separate subarrays (''a''1, ''a''6, ''a''11), (''a''2, ''a''7, ''a''12), (''a''3, ''a''8), (''a''4, ''a''9), (''a''5, ''a''10). For instance, if (''a''1, ''a''6, ''a''11) initially holds (62, 17, 25), 5-sorting changes it to (17, 25, 62). The next pass, 3-sorting, performs insertion sort on the three subarrays (''a''1, ''a''4, ''a''7, ''a''10), (''a''2, ''a''5, ''a''8, ''a''11), (''a''3, ''a''6, ''a''9, ''a''12). The last pass, 1-sorting, is an ordinary insertion sort of the entire array (''a''1, ..., ''a''12). As the example illustrates, the subarrays that Shellsort operates on are initially short; later they are longer but almost ordered. In both cases insertion sort works efficiently.

Shellsort is not stable: it may change the relative order of elements with equal values. It is an adaptive sorting algorithm in that it executes faster when the input is partially sorted.
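To make the ''h''-sorted property concrete, here is a minimal C sketch (the function name is_h_sorted is an illustrative choice, not from the article) that checks whether an array is ''h''-sorted, i.e. whether every element is no smaller than the element ''h'' positions before it:

    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true if taking every h-th element, starting anywhere, yields a
     * nondecreasing subsequence, i.e. a[i - h] <= a[i] for every valid i.
     * Requires h >= 1. */
    static bool is_h_sorted(const int *a, size_t n, size_t h)
    {
        for (size_t i = h; i < n; i++) {
            if (a[i - h] > a[i])
                return false;
        }
        return true;
    }

An array is fully sorted exactly when it is 1-sorted.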


Pseudocode

Using Marcin Ciura's gap sequence, with an inner insertion sort:

    # Sort an array a[0...n-1].
    gaps = [701, 301, 132, 57, 23, 10, 4, 1]   # Ciura gap sequence

    # Start with the largest gap and work down to a gap of 1;
    # each pass is an insertion sort that uses the current gap as its step instead of 1.
    foreach (gap in gaps)
    {
        # Do a gapped insertion sort over positions gap .. n-1.
        for (i = gap; i < n; i += 1)
        {
            # Save a[i] and make a hole at position i.
            temp = a[i]
            # Shift earlier gap-sorted elements up until the correct location for a[i] is found.
            for (j = i; j >= gap and a[j - gap] > temp; j -= gap)
            {
                a[j] = a[j - gap]
            }
            # Put temp (the original a[i]) in its correct location.
            a[j] = temp
        }
    }
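For readers who prefer a concrete language, the same algorithm might be written in C as follows (a sketch; the function name shell_sort and the choice of int arrays are illustrative assumptions, not taken from the article):

    #include <stddef.h>

    /* Sorts a[0..n-1] in ascending order using the Ciura gap sequence. */
    static void shell_sort(int *a, size_t n)
    {
        static const size_t gaps[] = {701, 301, 132, 57, 23, 10, 4, 1};
        for (size_t g = 0; g < sizeof gaps / sizeof gaps[0]; g++) {
            size_t gap = gaps[g];
            /* Gapped insertion sort: each pass leaves the array gap-sorted. */
            for (size_t i = gap; i < n; i++) {
                int temp = a[i];
                size_t j = i;
                while (j >= gap && a[j - gap] > temp) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = temp;
            }
        }
    }

The fixed gap list still sorts correctly for any ''n'' because the last gap is 1, but for very large arrays the sequence is usually extended, for example by the recursive rule discussed in the next section.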


Gap sequences

The question of deciding which gap sequence to use is difficult. Every gap sequence that contains 1 yields a correct sort (as this makes the final pass an ordinary insertion sort); however, the properties of the resulting versions of Shellsort may be very different: too few gaps slow down the passes, while too many gaps produce overhead. The table below compares most proposed gap sequences published so far. Some of them have decreasing elements that depend on the size of the sorted array (''N''). Others are increasing infinite sequences, whose elements less than ''N'' should be used in reverse order.

When the binary representation of ''N'' contains many consecutive zeroes, Shellsort using Shell's original gap sequence makes Θ(''N''^2) comparisons in the worst case. For instance, this case occurs for ''N'' equal to a power of two when elements greater and smaller than the median occupy odd and even positions respectively, since they are compared only in the last pass. Although it has higher complexity than the ''O''(''N'' log ''N'') that is optimal for comparison sorts, Pratt's version lends itself to sorting networks and has the same asymptotic gate complexity as Batcher's bitonic sorter.

Gonnet and Baeza-Yates observed that Shellsort makes the fewest comparisons on average when the ratios of successive gaps are roughly equal to 2.2, which is why their sequence with ratio 2.2 and Tokuda's sequence with ratio 2.25 prove efficient. However, it is not known why this is so. Sedgewick recommends using gaps which have low greatest common divisors or are pairwise coprime.
With respect to the average number of comparisons, Ciura's sequence 1, 4, 10, 23, 57, 132, 301, 701 has the best known performance; gaps beyond 701 were not determined, but the sequence can be further extended according to the recursive formula h_k = \lfloor 2.25 h_{k-1} \rfloor. Tokuda's sequence, defined by the simple formula h_k = \lceil h'_k \rceil, where h'_k = 2.25 h'_{k-1} + 1, h'_1 = 1, can be recommended for practical applications. If the maximum input size is small, as may occur if Shellsort is used on small subarrays by another recursive sorting algorithm such as quicksort or merge sort, then it is possible to tabulate an optimal sequence for each input size.
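As an illustration of how such a sequence can be generated in code (a sketch; the helper name tokuda_gaps and its interface are assumptions, not taken from the article), the following C function collects Tokuda's gaps below a given ''n'', largest first:

    #include <math.h>
    #include <stddef.h>

    /* Fills gaps[] (largest first, ending with 1) with Tokuda's increments
     * smaller than n, using h'_1 = 1, h'_k = 2.25*h'_{k-1} + 1, h_k = ceil(h'_k).
     * Returns the number of gaps written (at most max_gaps). */
    static size_t tokuda_gaps(size_t n, size_t gaps[], size_t max_gaps)
    {
        size_t count = 0;
        double h = 1.0;                       /* h'_1 */
        while (count < max_gaps && (size_t)ceil(h) < n) {
            gaps[count++] = (size_t)ceil(h);  /* h_k = ceil(h'_k) */
            h = 2.25 * h + 1.0;               /* h'_{k+1} */
        }
        /* Increasing sequences are used in reverse order, so reverse the buffer. */
        for (size_t i = 0; i < count / 2; i++) {
            size_t t = gaps[i];
            gaps[i] = gaps[count - 1 - i];
            gaps[count - 1 - i] = t;
        }
        return count;
    }

Ciura's measured sequence can be extended in the same way by iterating h_k = \lfloor 2.25 h_{k-1} \rfloor from 701 upward.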


Computational complexity

The following property holds: after ''h''2-sorting of any ''h''1-sorted array, the array remains ''h''1-sorted. Every ''h''1-sorted and ''h''2-sorted array is also (''a''1''h''1 + ''a''2''h''2)-sorted, for any nonnegative integers ''a''1 and ''a''2. The worst-case complexity of Shellsort is therefore connected with the Frobenius problem: for given integers ''h''1, ..., ''hn'' with gcd = 1, the Frobenius number ''g''(''h''1, ..., ''hn'') is the greatest integer that cannot be represented as ''a''1''h''1 + ... + ''anhn'' with nonnegative integers ''a''1, ..., ''an''. Using known formulae for Frobenius numbers, we can determine the worst-case complexity of Shellsort for several classes of gap sequences. Proven results are shown in the above table. Mark Allen Weiss proved that Shellsort runs in ''O''(''N'' log ''N'') time when the input array is in reverse order.

With respect to the average number of operations, none of the proven results concerns a practical gap sequence. For gaps that are powers of two, Espelid computed this average as 0.5349N\sqrt{N} - 0.4387N - 0.097\sqrt{N} + O(1). Knuth determined the average complexity of sorting an ''N''-element array with two gaps (''h'', 1) to be \frac{2N^2}{h} + \sqrt{\pi N^3 h}. It follows that a two-pass Shellsort with ''h'' = Θ(''N''^{1/3}), which balances the two terms, makes on average ''O''(''N''^{5/3}) comparisons/inversions/running time. Yao found the average complexity of a three-pass Shellsort. His result was refined by Janson and Knuth: the average number of comparisons/inversions/running time made during a Shellsort with three gaps (''ch'', ''cg'', 1), where ''h'' and ''g'' are coprime, is \frac{N^2}{4ch} + O(N) in the first pass, \frac{1}{8g}\sqrt{\frac{\pi}{ch}}(h - 1)N^{3/2} + O(hN) in the second pass, and \psi(h, g)N + \frac{1}{8}\sqrt{\frac{\pi}{c}}(c - 1)N^{3/2} + O\left((c - 1)gh^{1/2}N\right) + O\left(c^2g^3h^2\right) in the third pass. ''ψ''(''h'', ''g'') in the last formula is a complicated function asymptotically equal to \sqrt{\frac{\pi h}{128}}\,g + O\left(g^{-1/2}h^{1/2}\right) + O\left(gh^{-1/2}\right). In particular, when ''h'' = Θ(''N''^{7/15}) and ''g'' = Θ(''N''^{1/5}), the average time of sorting is ''O''(''N''^{23/15}).

Based on experiments, it is conjectured that Shellsort with Hibbard's gap sequence runs in ''O''(''N''^{5/4}) average time, and that Gonnet and Baeza-Yates's sequence requires on average 0.41''N'' ln ''N'' (ln ln ''N'' + 1/6) element moves. Approximations of the average number of operations formerly put forward for other sequences fail when sorted arrays contain millions of elements. The graph below shows the average number of element comparisons in various variants of Shellsort, divided by the theoretical lower bound, i.e. log2(''N''!), where the sequence 1, 4, 10, 23, 57, 132, 301, 701 has been extended according to the formula h_k = \lfloor 2.25 h_{k-1} \rfloor.
Applying the theory of Kolmogorov complexity, Jiang, Li, and Vitányi proved the following lower bound for the order of the average number of operations/running time in a ''p''-pass Shellsort: Ω(''pN''^{1+1/''p''}) when ''p'' ≤ log2 ''N'' and Ω(''pN'') when ''p'' > log2 ''N''. Therefore, Shellsort has prospects of running in an average time that asymptotically grows like ''N'' log ''N'' only when using gap sequences whose number of gaps grows in proportion to the logarithm of the array size. It is, however, unknown whether Shellsort can reach this asymptotic order of average-case complexity, which is optimal for comparison sorts.

The lower bound was improved by Vitányi for every number of passes ''p'' to \Omega\left( N \sum_{k=1}^{p} h_{k-1}/h_k \right), where h_0 = N. This result implies, for example, the Jiang-Li-Vitányi lower bound for all ''p''-pass increment sequences and improves that lower bound for particular increment sequences. In fact, all bounds (lower and upper) currently known for the average case are precisely matched by this lower bound. For example, this gives the new result that the Janson-Knuth upper bound is matched by the resulting lower bound for the increment sequence used, showing that three-pass Shellsort for this increment sequence uses \Theta(N^{23/15}) comparisons/inversions/running time. The formula also allows us to search for increment sequences that yield lower bounds which were previously unknown; for example, the increment sequence h_1 = n^{11/16}, h_2 = n^{7/16}, h_3 = n^{3/16}, h_4 = 1 for four passes has a lower bound greater than \Omega(pn^{1+1/p}) = \Omega(n^{5/4}): the lower bound becomes T = \Omega\left(n\cdot (n^{5/16} + n^{4/16} + n^{4/16} + n^{3/16})\right) = \Omega(n^{1+5/16}) = \Omega(n^{21/16}).

The worst-case complexity of any version of Shellsort is of higher order: Plaxton, Poonen, and Suel showed that it grows at least as rapidly as \Omega\left(N \left( \log N / \log \log N \right)^2\right). Robert Cypher proved a stronger lower bound: \Omega\left(N \frac{(\log N)^2}{\log \log N}\right) when h_{s+1} > h_s for all ''s''.
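As a worked check of the Vitányi bound for the three-pass case above (a derivation from the stated formula, not a statement from the original sources; ''c'' is treated as a constant), take h_0 = N, h_1 = \Theta(N^{7/15}), h_2 = \Theta(N^{1/5}) and h_3 = 1:

    N \sum_{k=1}^{3} \frac{h_{k-1}}{h_k}
      = N\left(\Theta(N^{8/15}) + \Theta(N^{4/15}) + \Theta(N^{1/5})\right)
      = \Theta(N^{1 + 8/15}) = \Theta(N^{23/15})

which matches the Janson-Knuth average-case upper bound, as claimed.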


Applications

Shellsort performs more operations and has a higher cache miss ratio than quicksort. However, since it can be implemented using little code and does not use the call stack, some implementations of the qsort function in the C standard library targeted at embedded systems use it instead of quicksort. Shellsort is, for example, used in the uClibc library. For similar reasons, in the past, Shellsort was used in the Linux kernel. Shellsort can also serve as a sub-algorithm of introspective sort, to sort short subarrays and to prevent a slowdown when the recursion depth exceeds a given limit. This principle is employed, for instance, in the bzip2 compressor.


See also

* Comb sort


References


Bibliography

* Robert Sedgewick, "Analysis of Shellsort and Related Algorithms", Fourth European Symposium on Algorithms, Barcelona, September 1996.


External links

* Shellsort with gaps 5, 3, 1 as a Hungarian folk dance