Merge algorithms are a family of
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s that take multiple
sorted lists as input and produce a single list as output, containing all the elements of the inputs lists in sorted order. These algorithms are used as
subroutine
In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.
Functions may ...
s in various
sorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a List (computing), list into an Total order, order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. ...
s, most famously
merge sort
In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
.
Application
The merge algorithm plays a critical role in the
merge sort
In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
algorithm, a
comparison-based sorting algorithm. Conceptually, the merge sort algorithm consists of two steps:
#
Recursively
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
divide the list into sublists of (roughly) equal length, until each sublist contains only one element, or in the case of iterative (bottom up) merge sort, consider a list of ''n'' elements as ''n'' sub-lists of size 1. A list containing a single element is, by definition, sorted.
# Repeatedly merge sublists to create a new sorted sublist until the single list contains all elements. The single list is the sorted list.
The merge algorithm is used repeatedly in the merge sort algorithm.
An example merge sort is given in the illustration. It starts with an unsorted array of 7 integers. The array is divided into 7 partitions; each partition contains 1 element and is sorted. The sorted partitions are then merged to produce larger, sorted, partitions, until 1 partition, the sorted array, is left.
Merging two lists
Merging two sorted lists into one can be done in
linear time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
and linear or constant space (depending on the data access model). The following
pseudocode
In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
demonstrates an algorithm that merges input lists (either
linked list
In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...
s or
arrays
An array is a systematic arrangement of similar objects, usually in rows and columns.
Things called an array include:
{{TOC right
Music
* In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
) and into a new list .
The function yields the first element of a list; "dropping" an element means removing it from its list, typically by incrementing a pointer or index.
algorithm merge(A, B) is
inputs A, B : list
returns list
C := new empty list
while A is not empty and B is not empty do
if head(A) ≤ head(B) then
append head(A) to C
drop the head of A
else
append head(B) to C
drop the head of B
''// By now, either A or B is empty. It remains to empty the other input list.''
while A is not empty do
append head(A) to C
drop the head of A
while B is not empty do
append head(B) to C
drop the head of B
return C
When the inputs are linked lists, this algorithm can be implemented to use only a constant amount of working space; the pointers in the lists' nodes can be reused for bookkeeping and for constructing the final merged list.
In the merge sort algorithm, this
subroutine
In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.
Functions may ...
is typically used to merge two sub-arrays , of a single array . This can be done by copying the sub-arrays into a temporary array, then applying the merge algorithm above. The allocation of a temporary array can be avoided, but at the expense of speed and programming ease. Various in-place merge algorithms have been devised, sometimes sacrificing the linear-time bound to produce an algorithm; see for discussion.
K-way merging
-way merging generalizes binary merging to an arbitrary number of sorted input lists. Applications of -way merging arise in various sorting algorithms, including
patience sorting
In computer science, patience sorting is a sorting algorithm inspired by, and named after, the card game patience. A variant of the algorithm efficiently computes the length of a longest increasing subsequence in a given array.
Overview
The algor ...
and an
external sorting
External sorting is a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in t ...
algorithm that divides its input into blocks that fit in memory, sorts these one by one, then merges these blocks.
Several solutions to this problem exist. A naive solution is to do a loop over the lists to pick off the minimum element each time, and repeat this loop until all lists are empty:
* Input: a list of lists.
* While any of the lists is non-empty:
** Loop over the lists to find the one with the minimum first element.
** Output the minimum element and remove it from its list.
In the worst case, this algorithm performs element comparisons to perform its work if there are a total of elements in the lists.
It can be improved by storing the lists in a
priority queue (
min-heap) keyed by their first element:
* Build a min-heap of the lists, using the first element as the key.
* While any of the lists is non-empty:
** Let .
** Output the first element of list and remove it from its list.
** Re-heapify .
Searching for the next smallest element to be output (find-min) and restoring heap order can now be done in time (more specifically, comparisons), and the full problem can be solved in time (approximately comparisons).
A third algorithm for the problem is a
divide and conquer solution that builds on the binary merge algorithm:
* If , output the single input list.
* If , perform a binary merge.
* Else, recursively merge the first lists and the final lists, then binary merge these.
When the input lists to this algorithm are ordered by length, shortest first, it requires fewer than comparisons, i.e., less than half the number used by the heap-based algorithm; in practice, it may be about as fast or slow as the heap-based algorithm.
Parallel merge
A
parallel
Parallel is a geometric term of location which may refer to:
Computing
* Parallel algorithm
* Parallel computing
* Parallel metaheuristic
* Parallel (software), a UNIX utility for running programs in parallel
* Parallel Sysplex, a cluster of ...
version of the binary merge algorithm can serve as a building block of a
parallel merge sort. The following pseudocode demonstrates this algorithm in a
parallel divide-and-conquer style (adapted from Cormen ''et al.''
). It operates on two sorted arrays and and writes the sorted output to array . The notation denotes the part of from index through , exclusive.
algorithm merge(A
...j B
...ℓ C
...q is
inputs A, B, C : array
i, j, k, ℓ, p, q : indices
let m = j - i,
n = ℓ - k
if m < n then
swap A and B ''// ensure that A is the larger array: i, j still belong to A; k, ℓ to B''
swap m and n
if m ≤ 0 then
return ''// base case, nothing to merge''
let r = ⌊(i + j)/2⌋
let s = binary-search(A
B
...ℓ
let t = p + (r - i) + (s - k)
C
= A
in parallel do
merge(A
...r B
...s C
...t
merge(A
+1...j B
...ℓ C
+1...q
The algorithm operates by splitting either or , whichever is larger, into (nearly) equal halves. It then splits the other array into a part with values smaller than the midpoint of the first, and a part with larger or equal values. (The
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
subroutine returns the index in where would be, if it were in ; that this always a number between and .) Finally, each pair of halves is merged
recursively
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
, and since the recursive calls are independent of each other, they can be done in parallel. Hybrid approach, where serial algorithm is used for recursion base case has been shown to perform well in practice
The
work
Work may refer to:
* Work (human activity), intentional activity people perform to support themselves, others, or the community
** Manual labour, physical work done by humans
** House work, housework, or homemaking
** Working animal, an animal t ...
performed by the algorithm for two arrays holding a total of elements, i.e., the running time of a serial version of it, is . This is optimal since elements need to be copied into . To calculate the
span
Span may refer to:
Science, technology and engineering
* Span (unit), the width of a human hand
* Span (engineering), a section between two intermediate supports
* Wingspan, the distance between the wingtips of a bird or aircraft
* Sorbitan ester ...
of the algorithm, it is necessary to derive a
Recurrence relation
In mathematics, a recurrence relation is an equation according to which the nth term of a sequence of numbers is equal to some combination of the previous terms. Often, only k previous terms of the sequence appear in the equation, for a parameter ...
. Since the two recursive calls of ''merge'' are in parallel, only the costlier of the two calls needs to be considered. In the worst case, the maximum number of elements in one of the recursive calls is at most
since the array with more elements is perfectly split in half. Adding the
cost of the Binary Search, we obtain this recurrence as an upper bound:
The solution is
, meaning that it takes that much time on an ideal machine with an unbounded number of processors.
Note: The routine is not
stable
A stable is a building in which livestock, especially horses, are kept. It most commonly means a building that is divided into separate stalls for individual animals and livestock. There are many different types of stables in use today; the ...
: if equal items are separated by splitting and , they will become interleaved in ; also swapping and will destroy the order, if equal items are spread among both input arrays. As a result, when used for sorting, this algorithm produces a sort that is not stable.
Parallel merge of two lists
There are also algorithms that introduce parallelism within a single instance of merging of two sorted lists. These can be used in field-programmable gate arrays (
FPGA
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturinghence the term '' field-programmable''. The FPGA configuration is generally specified using a hardware de ...
s), specialized sorting circuits, as well as in modern processors with single-instruction multiple-data (
SIMD
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
) instructions.
Existing parallel algorithms are based on modifications of the merge part of either the
bitonic sorter
Bitonic mergesort is a parallel algorithm for sorting. It is also used as a construction method for building a sorting network. The algorithm was devised by Ken Batcher. The resulting sorting networks consist of O(n\log^2(n)) comparators and ha ...
or
odd-even mergesort.
In 2018, Saitoh M. et al. introduced MMS for FPGAs, which focused on removing a multi-cycle feedback datapath that prevented efficient pipelining in hardware. Also in 2018, Papaphilippou P. et al. introduced FLiMS
that improved the hardware utilization and performance by only requiring
pipeline stages of compare-and-swap units to merge with a parallelism of elements per FPGA cycle.
Language support
Some
computer language
A computer language is a formal language used to communicate with a computer. Types of computer languages include:
* Construction language – all forms of communication by which a human can specify an executable problem solution to a compu ...
s provide built-in or library support for merging sorted
collections
Collection or Collections may refer to:
* Cash collection, the function of an accounts receivable department
* Collection (church), money donated by the congregation during a church service
* Collection agency, agency to collect cash
* Collection ...
.
C++
The
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
's
Standard Template Library
The Standard Template Library (STL) is a Library (computer science), software library originally designed by Alexander Stepanov for the C++ programming language that influenced many parts of the C++ Standard Library. It provides four components ...
has the function , which merges two sorted ranges of
iterator
In computer programming, an iterator is an object that enables a programmer to traverse a container, particularly lists. Various types of iterators are often provided via a container's interface. Though the interface and semantics of a given iterat ...
s, and , which merges two consecutive sorted ranges ''in-place''. In addition, the (linked list) class has its own method which merges another list into itself. The type of the elements merged must support the less-than () operator, or it must be provided with a custom comparator.
C++17 allows for differing execution policies, namely sequential, parallel, and parallel-unsequenced.
Python
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
's standard library (since 2.6) also has a function in the module, that takes multiple sorted iterables, and merges them into a single iterator.
See also
*
Merge (revision control)
In version control, merging (also called integration) is a fundamental operation that reconciles multiple changes made to a version-controlled collection of files. Most often, it is necessary when a file is modified on two independent branches an ...
*
Join (relational algebra)
In database theory, relational algebra is a theory that uses algebraic structures with a well-founded semantics for modeling data, and defining queries on it. The theory was introduced by Edgar F. Codd.
The main application of relational algebr ...
*
Join (SQL)
A join clause in SQL – corresponding to a join operation in relational algebra – combines columns from one or more tables into a new table. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, ...
*
Join (Unix)
join is a command in Unix and Unix-like operating systems that merges the lines of two sorted text files based on the presence of a common field. It is similar to the join operator used in relational databases but operating on text files.
Overvi ...
References
Further reading
*
Donald Knuth
Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
. ''
The Art of Computer Programming
''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of compu ...
'', Volume 3: ''Sorting and Searching'', Third Edition. Addison-Wesley, 1997. . Pages 158–160 of section 5.2.4: Sorting by Merging. Section 5.3.2: Minimum-Comparison Merging, pp. 197–207.
External links
High Performance Implementationof Parallel and Serial Merge in
C# with source i
GitHuband in
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
br>
GitHub
{{DEFAULTSORT:Merge Algorithm
Articles with example pseudocode
Sorting algorithms