
Library sort, or gapped insertion sort, is a sorting algorithm that uses an insertion sort, but with gaps in the array to accelerate subsequent insertions. The name comes from an analogy:
Suppose a librarian were to store their books alphabetically on a long shelf, starting with the As at the left end, and continuing to the right along the shelf with no spaces between the books until the end of the Zs. If the librarian acquired a new book that belongs to the B section, once they find the correct space in the B section, they will have to move every book over, from the middle of the Bs all the way down to the Zs in order to make room for the new book. This is an insertion sort. However, if they were to leave a space after every letter, as long as there was still space after B, they would only have to move a few books to make room for the new one. This is the basic principle of the Library Sort.
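The analogy can be made concrete with a small sketch. The names `GAP` and `insert_at` and the book labels are hypothetical, chosen only for illustration; the code counts how many books move when a new book is shelved, first on a fully packed shelf and then on a shelf with a gap after each letter.

```python
GAP = None  # marker for an empty shelf slot (an assumption of this sketch)

def insert_at(shelf, pos, item):
    """Insert item at index pos, shifting books right until the first gap.

    Returns the number of existing books that had to move.
    """
    # Find the nearest gap at or to the right of pos.
    gap = pos
    while gap < len(shelf) and shelf[gap] is not GAP:
        gap += 1
    if gap == len(shelf):
        shelf.append(GAP)  # no gap left: extend the shelf by one slot
    # Shift the run shelf[pos:gap] one slot to the right, into the gap.
    shelf[pos + 1:gap + 1] = shelf[pos:gap]
    shelf[pos] = item
    return gap - pos

packed = ["A1", "A2", "B1", "B3", "C1", "C2", "D1"]
gapped = ["A1", "A2", GAP, "B1", "B3", GAP, "C1", "C2", GAP, "D1", GAP]

moves_packed = insert_at(packed, 3, "B2")  # every book from B3 onward moves
moves_gapped = insert_at(gapped, 4, "B2")  # only B3 moves, into the gap after the Bs
print(moves_packed, moves_gapped)  # 4 1
```

On the packed shelf every book from the insertion point to the end moves; on the gapped shelf only the books up to the next gap move.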
The algorithm was proposed by Michael A. Bender, Martín Farach-Colton, and Miguel Mosteiro in 2004 and was published in 2006. Like the insertion sort it is based on, library sort is a comparison sort; however, it was shown to have a high probability of running in O(n log n) time (comparable to quicksort), rather than an insertion sort's O(n²). There is no full implementation given in the paper, nor the exact algorithms of important parts, such as insertion and rebalancing. Further information would be needed to discuss how the efficiency of library sort compares to that of other sorting methods in practice.

Compared to basic insertion sort, the drawback of library sort is that it requires extra space for the gaps. The amount and distribution of that space is implementation dependent. In the paper the size of the needed array is (1 + ε)n, but with no further recommendations on how to choose ε. Moreover, it is neither adaptive nor stable. To guarantee the with-high-probability time bounds, the input must be randomly permuted, which changes the relative order of equal elements and shuffles any presorted input. Also, the algorithm uses binary search to find the insertion point for each element, which does not take advantage of presorted input.

Another drawback is that it cannot be run as an online algorithm, because an input that arrives piece by piece cannot be randomly shuffled in advance. If used without this shuffling, it could easily degenerate into quadratic behaviour.

One weakness of insertion sort is that it may require a high number of swap operations, which is costly if memory writes are expensive. Library sort may improve on that somewhat in the insertion step, as fewer elements need to move to make room, but it adds an extra cost in the rebalancing step. In addition, locality of reference will be poor compared to mergesort, as each insertion from a random data set may access memory that is no longer in cache, especially with large data sets.


Implementation


Algorithm

Suppose we have an array of n elements. We choose the gap factor ε, which gives a final array of size (1 + ε)n. The algorithm works in log n rounds. In each round we insert as many elements as the final array already holds, then re-balance the array. To find the insertion position, we apply binary search in the final array and then shift the following elements until we hit an empty space. Once the round is over, we re-balance the final array by inserting spaces between each pair of elements.

The three important steps of the algorithm are:

1. Binary search: Find the position of insertion by applying binary search within the already inserted elements. If the middle position probed is an empty space, move linearly towards the left or right side of the array until a filled position is found.
2. Insertion: Insert the element in the position found, shifting the following elements by one position until an empty space is hit. This is done in logarithmic time, with high probability.
3. Re-balancing: Insert spaces between each pair of elements in the array. The cost of rebalancing is linear in the number of elements already inserted. As these lengths increase with the powers of 2 for each round, the total cost of rebalancing is also linear.
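The linear total for rebalancing follows from a geometric sum. As a worked sketch, assume (hypothetically) that rebalancing an array holding m elements costs at most c·m operations for some constant c, and that round i starts with 2^i elements, for i = 0, ..., ⌈log2 n⌉. The upper limit slightly overestimates the number of rounds, which only loosens the bound:

```latex
\[
\sum_{i=0}^{\lceil \log_2 n \rceil} c \cdot 2^{i}
  = c \left( 2^{\lceil \log_2 n \rceil + 1} - 1 \right)
  < 4cn
  = O(n)
\]
```

The last inequality uses 2^⌈log2 n⌉ < 2n. Under these assumptions the n insertions, at logarithmic cost each with high probability, dominate the running time and give the O(n log n) total.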


Pseudocode

    procedure rebalance(A, begin, end) is
        r ← end
        w ← end × 2

        while r ≥ begin do
            A[w+1] ← gap
            A[w] ← A[r]
            r ← r − 1
            w ← w − 2

    procedure sort(A) is
        n ← length(A)
        S ← new array of n gaps

        for i ← 1 to floor(log2(n) + 1) do
            for j ← 2^i to 2^(i + 1) do
                ins ← binarysearch(A[j], S, 2^(i − 1))
                insert A[j] at S[ins]

Here, binarysearch(el, A, k) performs binary search in the first k elements of A, skipping over gaps, to find a place where to locate element el. Insertion should favor gaps over filled-in elements.
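Since the paper gives no full implementation, the following is only a minimal runnable sketch of the idea, not the authors' method. The names `library_sort`, `find_slot`, `insert`, `rebalance` and `GAP` are hypothetical. The sketch assumes one gap per element after each rebalance (roughly ε = 1), assumes `None` never occurs as a value to sort, and simply appends a slot when no gap remains to the right of the insertion point.

```python
import random

GAP = None  # marker for an empty slot; assumes None is never a value to sort

def rebalance(slots):
    """Return a new layout in which every element is followed by one gap."""
    spread = []
    for x in slots:
        if x is not GAP:
            spread.extend((x, GAP))
    return spread

def find_slot(slots, x):
    """Binary search that skips gaps.

    Returns an index p such that every filled slot before p holds a value
    <= x and every filled slot from p onward holds a value > x.
    """
    lo, hi = 0, len(slots)
    while lo < hi:
        mid = (lo + hi) // 2
        probe = mid
        while probe < hi and slots[probe] is GAP:
            probe += 1            # mid is a gap: probe linearly to the right
        if probe == hi:
            hi = mid              # only gaps in [mid, hi): search the left part
        elif slots[probe] <= x:
            lo = probe + 1
        else:
            hi = mid
    return lo

def insert(slots, x):
    """Place x at its sorted position, shifting right up to the next gap."""
    pos = find_slot(slots, x)
    gap = pos
    while gap < len(slots) and slots[gap] is not GAP:
        gap += 1
    if gap == len(slots):
        slots.append(GAP)         # simplification: grow instead of shifting left
    slots[pos + 1:gap + 1] = slots[pos:gap]
    slots[pos] = x

def library_sort(items, seed=None):
    """Sort items with gapped insertion; returns a new list."""
    items = list(items)
    random.Random(seed).shuffle(items)  # random permutation, as the analysis requires
    if not items:
        return []
    slots = [items[0]]
    inserted = 1
    while inserted < len(items):
        slots = rebalance(slots)
        batch = items[inserted:2 * inserted]  # as many as are already placed
        for x in batch:
            insert(slots, x)
        inserted += len(batch)
    return [x for x in slots if x is not GAP]

print(library_sort([5, 2, 9, 1, 5, 6], seed=0))  # [1, 2, 5, 5, 6, 9]
```

Because `find_slot` always returns the position directly after the predecessor of the new element, an element lands in a gap whenever one is available there, in line with the remark that insertion should favor gaps. The initial shuffle means the sketch is neither stable nor adaptive.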

