Binary Insertion Sort
   HOME

TheInfoList



OR:

Insertion sort is a simple
sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a List (computing), list into an Total order, order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. ...
that builds the final
sorted array A sorted array is an array data structure in which each element is sorted in numerical, alphabetical, or some other order, and placed at equally spaced addresses in computer memory. It is typically used in computer science to implement static l ...
(or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
,
heapsort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
, or
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same i ...
. However, insertion sort provides several advantages: * Simple implementation: Jon Bentley shows a three-line C/
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
version that is five lines when optimized. * Efficient for (quite) small data sets, much like other quadratic (i.e., O(''n''2)) sorting algorithms * More efficient in practice than most other simple quadratic algorithms such as
selection sort In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(''n''2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is not ...
or
bubble sort Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the input list element by element, comparing the current element with the one after it, swapping their values if needed. These passes ...
*
Adaptive Adaptation, in biology, is the process or trait by which organisms or population better match their environment Adaptation may also refer to: Arts * Adaptation (arts), a transfer of a work of art from one medium to another ** Film adaptation, a ...
, i.e., efficient for data sets that are already substantially sorted: the
time complexity In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
is O(''kn'') when each element in the input is no more than places away from its sorted position *
Stable A stable is a building in which livestock, especially horses, are kept. It most commonly means a building that is divided into separate stalls for individual animals and livestock. There are many different types of stables in use today; the ...
; i.e., does not change the relative order of elements with equal keys *
In-place In computer science, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure. However, a small amount of extra storage space is allowed for auxiliary variables. The input is usually overwritten by the output ...
; i.e., only requires a constant amount O(1) of additional memory space *
Online In computer technology and telecommunications, online indicates a state of connectivity and offline indicates a disconnected state. In modern terminology, this usually refers to an Internet connection, but (especially when expressed "on line" or ...
; i.e., can sort a list as it receives it When people manually sort cards in a
bridge A bridge is a structure built to span a physical obstacle (such as a body of water, valley, road, or rail) without blocking the way underneath. It is constructed for the purpose of providing passage over the obstacle, which is usually somethi ...
hand, most use a method that is similar to insertion sort.


Algorithm

Insertion sort iterates, consuming one input element each repetition, and grows a sorted output list. At each iteration, insertion sort removes one element from the input data, finds the location it belongs within the sorted list, and inserts it there. It repeats until no input elements remain. Sorting is typically done in-place, by iterating up the array, growing the sorted list behind it. At each array-position, it checks the value there against the largest value in the sorted list (which happens to be next to it, in the previous array-position checked). If larger, it leaves the element in place and moves to the next. If smaller, it finds the correct position within the sorted list, shifts all the larger values up to make a space, and inserts into that correct position. The resulting array after ''k'' iterations has the property where the first ''k'' + 1 entries are sorted ("+1" because the first entry is skipped). In each iteration the first remaining entry of the input is removed, and inserted into the result at the correct position, thus extending the result: becomes with each element greater than ''x'' copied to the right as it is compared against ''x''. The most common variant of insertion sort, which operates on arrays, can be described as follows: # Suppose there exists a function called ''Insert'' designed to insert a value into a sorted sequence at the beginning of an array. It operates by beginning at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element. The function has the side effect of overwriting the value stored immediately after the sorted sequence in the array. # To perform an insertion sort, begin at the left-most element of the array and invoke ''Insert'' to insert each element encountered into its correct position. The ordered sequence into which the element is inserted is stored at the beginning of the array in the set of indices already examined. Each insertion overwrites a single value: the value being inserted.
Pseudocode In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
of the complete algorithm follows, where the arrays are zero-based: i ← 1 while i < length(A) j ← i while j > 0 and A -1> A swap A and A -1 j ← j - 1 end while i ← i + 1 end while The outer loop runs over all the elements except the first one, because the single-element prefix A :1/code> is trivially sorted, so the
invariant Invariant and invariance may refer to: Computer science * Invariant (computer science), an expression whose value doesn't change during program execution ** Loop invariant, a property of a program loop that is true before (and after) each iteratio ...
that the first i entries are sorted is true from the start. The inner loop moves element A /code> to its correct place so that after the loop, the first i+1 elements are sorted. Note that the and-operator in the test must use
short-circuit evaluation A short circuit (sometimes abbreviated to short or s/c) is an electrical circuit that allows a current to travel along an unintended path with no or very low electrical impedance. This results in an excessive current flowing through the circuit. ...
, otherwise the test might result in an array bounds error, when j=0 and it tries to evaluate A -1> A /code> (i.e. accessing A 1/code> fails). After expanding the swap operation in-place as x ← A A ← A -1 A -1← x (where x is a temporary variable), a slightly faster version can be produced that moves A /code> to its position in one go and only performs one assignment in the inner loop body: i ← 1 while i < length(A) x ← A j ← i - 1 while j >= 0 and A > x A +1← A j ← j - 1 end while A +1← x i ← i + 1 end while The new inner loop shifts elements to the right to clear a spot for x = A /code>. The algorithm can also be implemented in a recursive way. The recursion just replaces the outer loop, calling itself and storing successively smaller values of ''n'' on the stack until ''n'' equals 0, where the function then returns up the call chain to execute the code after each recursive call starting with ''n'' equal to 1, with ''n'' increasing by 1 as each instance of the function returns to the prior instance. The initial call would be ''insertionSortR(A, length(A)-1)''. function insertionSortR(array A, int n) if n > 0 insertionSortR(A, n-1) x ← A j ← n-1 while j >= 0 and A > x A +1← A j ← j-1 end while A +1← x end if end function It does not make the code any shorter, it also doesn't reduce the execution time, but it increases the additional memory consumption from to (at the deepest level of recursion the stack contains references to the array, each with accompanying value of variable from down to 1).


Best, worst, and average cases

The best case input is an array that is already sorted. In this case insertion sort has a linear running time (i.e., O(''n'')). During each iteration, the first remaining element of the input is only compared with the right-most element of the sorted subsection of the array. The simplest worst case input is an array sorted in reverse order. The set of all worst case inputs consists of all arrays where each element is the smallest or second-smallest of the elements before it. In these cases every iteration of the inner loop will scan and shift the entire sorted subsection of the array before inserting the next element. This gives insertion sort a quadratic running time (i.e., O(''n''2)). The average case is also quadratic, which makes insertion sort impractical for sorting large arrays. However, insertion sort is one of the fastest algorithms for sorting very small arrays, even faster than
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
; indeed, good
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
implementations use insertion sort for arrays smaller than a certain threshold, also when arising as subproblems; the exact threshold must be determined experimentally and depends on the machine, but is commonly around ten. Example: The following table shows the steps for sorting the sequence . In each step, the key under consideration is underlined. The key that was moved (or left in place because it was the biggest yet considered) in the previous step is marked with an asterisk. 3 7 4 9 5 2 6 1 3* 7 4 9 5 2 6 1 3 7* 4 9 5 2 6 1 3 4* 7 9 5 2 6 1 3 4 7 9* 5 2 6 1 3 4 5* 7 9 2 6 1 2* 3 4 5 7 9 6 1 2 3 4 5 6* 7 9 1 1* 2 3 4 5 6 7 9


Relation to other sorting algorithms

Insertion sort is very similar to
selection sort In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(''n''2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is not ...
. As in selection sort, after ''k'' passes through the array, the first ''k'' elements are in sorted order. However, the fundamental difference between the two algorithms is that insertion sort scans backwards from the current key, while selection sort scans forwards. This results in selection sort making the first k elements the ''k'' smallest elements of the unsorted input, while in insertion sort they are simply the first ''k'' elements of the input. The primary advantage of insertion sort over selection sort is that selection sort must always scan all remaining elements to find the absolute smallest element in the unsorted portion of the list, while insertion sort requires only a single comparison when the (''k'' + 1)-st element is greater than the ''k''-th element; when this is frequently true (such as if the input array is already sorted or partially sorted), insertion sort is distinctly more efficient compared to selection sort. On average (assuming the rank of the (''k'' + 1)-st element rank is random), insertion sort will require comparing and shifting half of the previous ''k'' elements, meaning that insertion sort will perform about half as many comparisons as selection sort on average. In the worst case for insertion sort (when the input array is reverse-sorted), insertion sort performs just as many comparisons as selection sort. However, a disadvantage of insertion sort over selection sort is that it requires more writes due to the fact that, on each iteration, inserting the (''k'' + 1)-st element into the sorted portion of the array requires many element swaps to shift all of the following elements, while only a single swap is required for each iteration of selection sort. In general, insertion sort will write to the array O(''n''2) times, whereas selection sort will write only O() times. For this reason selection sort may be preferable in cases where writing to memory is significantly more expensive than reading, such as with
EEPROM EEPROM (also called E2PROM) stands for electrically erasable programmable read-only memory and is a type of non-volatile memory used in computers, usually integrated in microcontrollers such as smart cards and remote keyless systems, or as a ...
or
flash memory Flash memory is an electronic non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for the NOR and NAND logic gates. Both us ...
. While some
divide-and-conquer algorithm In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved dire ...
s such as
quicksort Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. Overall, it is slightly faster than ...
and
mergesort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same i ...
outperform insertion sort for larger arrays, non-recursive sorting algorithms such as insertion sort or selection sort are generally faster for very small arrays (the exact size varies by environment and implementation, but is typically between 7 and 50 elements). Therefore, a useful optimization in the implementation of those algorithms is a hybrid approach, using the simpler algorithm when the array has been divided to a small size.


Variants

D.L. Shell made substantial improvements to the algorithm; the modified version is called
Shell sort Shellsort, also known as Shell sort or Shell's method, is an in-place comparison sort. It can be seen as either a generalization of sorting by exchange ( bubble sort) or sorting by insertion (insertion sort). The method starts by sorting pairs of ...
. The sorting algorithm compares elements separated by a distance that decreases on each pass. Shell sort has distinctly improved running times in practical work, with two simple variants requiring O(''n''3/2) and O(''n''4/3) running time. If the cost of comparisons exceeds the cost of swaps, as is the case for example with string keys stored by reference or with human interaction (such as choosing one of a pair displayed side-by-side), then using ''binary insertion sort'' may yield better performance. Binary insertion sort employs a
binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
to determine the correct location to insert new elements, and therefore performs ⌈log2 ''n''⌉ comparisons in the worst case. When each element in the array is searched for and inserted this is O(''n'' log ''n''). The algorithm as a whole still has a running time of O(''n''2) on average because of the series of swaps required for each insertion. The number of swaps can be reduced by calculating the position of multiple elements before moving them. For example, if the target position of two elements is calculated before they are moved into the proper position, the number of swaps can be reduced by about 25% for random data. In the extreme case, this variant works similar to
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same i ...
. A variant named ''binary merge sort'' uses a ''binary insertion sort'' to sort groups of 32 elements, followed by a final sort using
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same i ...
. It combines the speed of insertion sort on small data sets with the speed of merge sort on large data sets. To avoid having to make a series of swaps for each insertion, the input could be stored in a
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...
, which allows elements to be spliced into or out of the list in constant time when the position in the list is known. However, searching a linked list requires sequentially following the links to the desired position: a linked list does not have random access, so it cannot use a faster method such as binary search. Therefore, the running time required for searching is O(''n''), and the time for sorting is O(''n''2). If a more sophisticated
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
(e.g.,
heap Heap or HEAP may refer to: Computing and mathematics * Heap (data structure), a data structure commonly used to implement a priority queue * Heap (mathematics), a generalization of a group * Heap (programming) (or free store), an area of memory f ...
or
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
) is used, the time required for searching and insertion can be reduced significantly; this is the essence of
heap sort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
and
binary tree sort A tree sort is a sort algorithm that builds a binary search tree from the elements to be sorted, and then traverses the tree ( in-order) so that the elements come out in sorted order. Its typical use is sorting elements online: after each insert ...
. In 2006 Bender,
Martin Farach-Colton Martin Farach-Colton is an American computer scientist, known for his work in streaming algorithms, suffix tree construction, pattern matching in compressed data, cache-oblivious algorithms, and lowest common ancestor data structures. He is a D ...
, and Mosteiro published a new variant of insertion sort called ''
library sort Library sort, or gapped insertion sort is a sorting algorithm that uses an insertion sort, but with gaps in the array to accelerate subsequent insertions. The name comes from an analogy: Suppose a librarian were to store their books alphabetically ...
'' or ''gapped insertion sort'' that leaves a small number of unused spaces (i.e., "gaps") spread throughout the array. The benefit is that insertions need only shift elements over until a gap is reached. The authors show that this sorting algorithm runs with high probability in O(''n'' log ''n'') time. If a
skip list In computer science, a skip list (or skiplist) is a probabilistic data structure that allows \mathcal(\log n) average complexity for search as well as \mathcal(\log n) average complexity for insertion within an ordered sequence of n elements. ...
is used, the insertion time is brought down to O(log ''n''), and swaps are not needed because the skip list is implemented on a linked list structure. The final running time for insertion would be O(''n'' log ''n'').


List insertion sort code in C

If the items are stored in a linked list, then the list can be sorted with O(1) additional space. The algorithm starts with an initially empty (and therefore trivially sorted) list. The input items are taken off the list one at a time, and then inserted in the proper place in the sorted list. When the input list is empty, the sorted list has the desired result. struct LIST * SortList1(struct LIST * pList) The algorithm below uses a trailing pointer. for the insertion into the sorted list. A simpler recursive method rebuilds the list each time (rather than splicing) and can use O(''n'') stack space. struct LIST ; struct LIST * SortList(struct LIST * pList)


References


Further reading

* .


External links

* – graphical demonstration * . * . * – implementations of insertion sort in C and several other programming languages {{DEFAULTSORT:Insertion Sort Sorting algorithms Comparison sorts Stable sorts Articles with example pseudocode Online sorts no:Sorteringsalgoritme#Innstikksortering