Merge-insert Sort
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, merge-insertion sort or the Ford–Johnson algorithm is a
comparison sort A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
ing algorithm published in 1959 by
L. R. Ford Jr. Lester Randolph Ford Jr. (September 23, 1927 – February 26, 2017) was an American mathematician specializing in network flow problems. He was the son of mathematician Lester R. Ford Sr. Ford's paper with D. R. Fulkerson on the maximum flow p ...
and
Selmer M. Johnson Selmer Martin Johnson (21 May 1916 – 26 June 1996) was an American mathematician, a researcher at the RAND Corporation. Biography Johnson was born on May 21, 1916, in Buhl, Minnesota. He earned a B.A. and then an M.A. in mathematics from the U ...
. It uses fewer comparisons in the
worst case In computer science, best, worst, and average cases of a given algorithm express what the resource usage is ''at least'', ''at most'' and ''on average'', respectively. Usually the resource being considered is running time, i.e. time complexity, b ...
than the best previously known algorithms, binary insertion sort and
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
, and for 20 years it was the sorting algorithm with the fewest known comparisons. Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons. The same algorithm may have also been independently discovered by Stanisław Trybuła and Czen Ping.


Algorithm

Merge-insertion sort performs the following steps, on an input X of n elements: #Group the elements of X into \lfloor n/2\rfloor pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements. #Perform \lfloor n/2\rfloor comparisons, one per pair, to determine the larger of the two elements in each pair. #Recursively sort the \lfloor n/2\rfloor larger elements from each pair, creating a sorted sequence S of \lfloor n/2\rfloor of the input elements, in ascending order. #Insert at the start of S the element that was paired with the first and smallest element of S. #Insert the remaining \lceil n/2\rceil-1 elements of X\setminus S into S, one at a time, with a specially chosen insertion ordering described below. Use
binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
in subsequences of S (as described below) to determine the position at which each element should be inserted. The algorithm is designed to take advantage of the fact that the binary searches used to insert elements into S are most efficient (from the point of view of worst case analysis) when the length of the subsequence that is searched is one less than a
power of two A power of two is a number of the form where is an integer, that is, the result of exponentiation with number two as the base and integer  as the exponent. In a context where only integers are considered, is restricted to non-negative ...
. This is because, for those lengths, all outcomes of the search use the same number of comparisons as each other. To choose an insertion ordering that produces these lengths, consider the sorted sequence S after step 4 of the outline above (before inserting the remaining elements), and let x_i denote the ith element of this sorted sequence. Thus, :S=(x_1,x_2,x_3,\dots), where each element x_i with i\ge 3 is paired with an element y_i < x_i that has not yet been inserted. (There are no elements y_1 or y_2 because x_1 and x_2 were paired with each other.) If n is odd, the remaining unpaired element should also be numbered as y_i with i larger than the indexes of the paired elements. Then, the final step of the outline above can be expanded into the following steps: *Partition the uninserted elements y_i into groups with contiguous indexes. There are two elements y_3 and y_4 in the first group, and the sums of sizes of every two adjacent groups form a sequence of powers of two. Thus, the sizes of groups are: 2, 2, 6, 10, 22, 42, ... *Order the uninserted elements by their groups (smaller indexes to larger indexes), but within each group order them from larger indexes to smaller indexes. Thus, the ordering becomes ::y_4,y_3,y_6,y_5,y_,y_,y_,y_9,y_8,y_7,y_,y_\dots *Use this ordering to insert the elements y_i into S. For each element y_i, use a binary search from the start of S up to but not including x_i to determine where to insert y_i.


Analysis

Let C(n) denote the number of comparisons that merge-insertion sort makes, in the worst case, when sorting n elements. This number of comparisons can be broken down as the sum of three terms: *\lfloor n/2\rfloor comparisons among the pairs of items, *C(\lfloor n/2\rfloor) comparisons for the recursive call, and *some number of comparisons for the binary insertions used to insert the remaining elements. In the third term, the worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of S of length at most three. First, y_4 is inserted into the three-element subsequence (x_1,x_2,x_3). Then, y_3 is inserted into some permutation of the three-element subsequence (x_1,x_2,y_4), or in some cases into the two-element subsequence (x_1,x_2). Similarly, the elements y_6 and y_5 of the second group are each inserted into a subsequence of length at most seven, using three comparisons. More generally, the worst-case number of comparisons for the elements in the ith group is i+1, because each is inserted into a subsequence of length at most 2^-1. By summing the number of comparisons used for all the elements and solving the resulting
recurrence relation In mathematics, a recurrence relation is an equation according to which the nth term of a sequence of numbers is equal to some combination of the previous terms. Often, only k previous terms of the sequence appear in the equation, for a parameter ...
, this analysis can be used to compute the values of C(n), giving the formula :C(n)=\sum_^n \left\lceil \log_2 \frac \right\rceil \approx n\log_2 n - 1.415n or, in closed form, :C(n)=n\biggl\lceil\log_2\frac\biggr\rceil-\biggl\lfloor\frac\biggr\rfloor+\biggl\lfloor\frac\biggr\rfloor. For n=1,2,\dots the numbers of comparisons are :0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ...


Relation to other comparison sorts

The algorithm is called merge-insertion sort because the initial comparisons that it performs before its recursive call (pairing up arbitrary items and comparing each pair) are the same as the initial comparisons of
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
, while the comparisons that it performs after the recursive call (using binary search to insert elements one by one into a sorted list) follow the same principle as
insertion sort Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Ho ...
. In this sense, it is a
hybrid algorithm {{Unreferenced, date=May 2014 A hybrid algorithm is an algorithm that combines two or more other algorithms that solve the same problem, and is mostly used in programming languages like C++, either choosing one (depending on the data), or switching ...
that combines both merge sort and insertion sort., p. 184: "Since it involves some aspects of merging and some aspects of insertion, we call it ''merge insertion''." For small inputs (up to n=11) its numbers of comparisons equal the
lower bound In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is greater than or equal to every element of . Dually, a lower bound or minorant of is defined to be an element ...
on comparison sorting of \lceil\log_2 n!\rceil\approx n\log_2 n - 1.443n. However, for larger inputs the number of comparisons made by the merge-insertion algorithm is bigger than this lower bound. Merge-insertion sort also performs fewer comparisons than the
sorting number In mathematics and computer science, the sorting numbers are a sequence of numbers introduced in 1950 by Hugo Steinhaus for the analysis of comparison sort algorithms. These numbers give the worst-case number of comparisons used by both binary ins ...
s, which count the comparisons made by binary insertion sort or merge sort in the worst case. The sorting numbers fluctuate between n\log_2 n - 0.915n and n\log_2 n - n, with the same leading term but a worse constant factor in the lower-order linear term. Merge-insertion sort is the sorting algorithm with the minimum possible comparisons for n items whenever n\le 15 or 20\le n\le 22, and it has the fewest comparisons known for n\le 46. For 20 years, merge-insertion sort was the sorting algorithm with the fewest comparisons known for all input lengths. However, in 1979 Glenn Manacher published another sorting algorithm that used even fewer comparisons, for large enough inputs. It remains unknown exactly how many comparisons are needed for sorting, for all n, but Manacher's algorithm and later record-breaking sorting algorithms have all used modifications of the merge-insertion sort ideas.


References

{{Sorting Comparison sorts 1959 in computing Sorting algorithms