Overview
The heapsort algorithm can be divided into two parts.

In the first step, a heap is built out of the data. The heap is placed in an array with the layout of a complete binary tree: for a zero-based array, the root is stored at index 0, and if i is the index of the current node, then

    iParent(i)     = floor((i-1) / 2)
    iLeftChild(i)  = 2*i + 1
    iRightChild(i) = 2*i + 2

where the floor function maps a real number to the largest integer not exceeding it.

In the second step, a sorted array is created by repeatedly removing the largest element from the heap (the root of the heap) and inserting it into the array. The heap is updated after each removal to maintain the heap property. Once all objects have been removed from the heap, the result is a sorted array.

Heapsort can be performed in place: the array can be split into two parts, the sorted array and the heap. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.
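For concreteness, the index formulas above can be written directly in a real language. The following is a minimal C sketch (the names mirror the pseudocode in this article; it assumes a zero-based array of indices that fit in size_t):

    #include <stddef.h>

    /* Index arithmetic for a binary heap stored in a zero-based array. */
    static inline size_t iParent(size_t i)     { return (i - 1) / 2; }  /* valid for i > 0 */
    static inline size_t iLeftChild(size_t i)  { return 2 * i + 1; }
    static inline size_t iRightChild(size_t i) { return 2 * i + 2; }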
Algorithm
The heapsort algorithm involves preparing the list by first turning it into a max heap, then repeatedly swapping the root of the heap with the last element of the heap, shrinking the heap by one, and sifting the new root down to restore the heap property, until the heap is empty.
Pseudocode
The following is a simple way to implement the algorithm in pseudocode. Arrays are zero-based, and swap is used to exchange two elements of the array. Movement 'down' means from the root towards the leaves, or from lower indices to higher. Note that during the sort, the largest element is at the root of the heap at a[0], while at the end of the sort, the largest element is in a[end].
procedure heapsort(a, count) is
    input: an unordered array ''a'' of length ''count''

    ''(Build the heap in array a so that largest value is at the root)''
    heapify(a, count)

    ''(The following loop maintains the invariants that a[0:end] is a heap and every element''
    ''beyond end is greater than everything before it (so a[end:count] is in sorted order))''
    end ← count - 1
    while end > 0 do
        ''(a[0] is the root and largest value. The swap moves it in front of the sorted elements.)''
        swap(a[end], a[0])
        ''(the heap size is reduced by one)''
        end ← end - 1
        ''(the swap ruined the heap property, so restore it)''
        siftDown(a, 0, end)
The sorting routine uses two subroutines, heapify and siftDown. The former is the common in-place heap construction routine, while the latter is a common subroutine for implementing heapify.
''(Put elements of 'a' in heap order, in-place)''
procedure heapify(a, count) is
    ''(start is assigned the index in 'a' of the last parent node)''
    ''(the last element in a 0-based array is at index count-1; find the parent of that element)''
    start ← iParent(count-1)

    while start ≥ 0 do
        ''(sift down the node at index 'start' to the proper place such that all nodes below''
        '' the start index are in heap order)''
        siftDown(a, start, count - 1)
        ''(go to the next parent node)''
        start ← start - 1
    ''(after sifting down the root all nodes/elements are in heap order)''
''(Repair the heap whose root element is at index 'start', assuming the heaps rooted at its children are valid)''
procedure siftDown(a, start, end) is
    root ← start

    while iLeftChild(root) ≤ end do    ''(While the root has at least one child)''
        child ← iLeftChild(root)       ''(Left child of root)''
        swap ← root                    ''(Keeps track of child to swap with)''

        if a[swap] < a[child] then
            swap ← child
        ''(If there is a right child and that child is greater)''
        if child+1 ≤ end and a[swap] < a[child+1] then
            swap ← child + 1
        if swap = root then
            ''(The root holds the largest element. Since we assume the heaps rooted at the''
            '' children are valid, this means that we are done.)''
            return
        else
            swap(a[root], a[swap])
            root ← swap    ''(repeat to continue sifting down the child now)''
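Translated almost line for line into C, the three routines above might look as follows. This is a sketch, not a canonical implementation: it assumes an array of int keys, and the local variable sw stands in for the pseudocode's swap variable to avoid colliding with the swap function.

    #include <stddef.h>

    static size_t iParent(size_t i)    { return (i - 1) / 2; }
    static size_t iLeftChild(size_t i) { return 2 * i + 1; }

    static void swap(int a[], size_t i, size_t j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    /* Repair the heap whose root element is at index 'start',
       assuming the heaps rooted at its children are valid. */
    static void siftDown(int a[], size_t start, size_t end) {
        size_t root = start;
        while (iLeftChild(root) <= end) {          /* while the root has at least one child */
            size_t child = iLeftChild(root);
            size_t sw = root;                      /* keeps track of the child to swap with */
            if (a[sw] < a[child])
                sw = child;
            if (child + 1 <= end && a[sw] < a[child + 1])
                sw = child + 1;                    /* right child exists and is greater */
            if (sw == root)
                return;                            /* root already holds the largest element */
            swap(a, root, sw);
            root = sw;                             /* continue sifting down the child */
        }
    }

    /* Put the elements of 'a' in heap order, in place (the siftDown version).
       Expects count >= 2; the caller below guarantees this. */
    static void heapify(int a[], size_t count) {
        /* loop from the last parent node down to the root; the +1 / -1
           dance avoids unsigned underflow when start would go below zero */
        for (size_t start = iParent(count - 1) + 1; start > 0; start--)
            siftDown(a, start - 1, count - 1);
    }

    void heapsort(int a[], size_t count) {
        if (count < 2) return;                     /* nothing to sort */
        heapify(a, count);
        for (size_t end = count - 1; end > 0; ) {
            swap(a, end, 0);                       /* move the max behind the heap */
            end--;                                 /* shrink the heap by one */
            siftDown(a, 0, end);                   /* restore the heap property */
        }
    }

Calling heapsort(data, n) sorts data[0 … n−1] ascending in place.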
The heapify procedure can be thought of as building a heap from the bottom up by successively sifting downward to establish the heap property. An alternative version (shown below) that builds the heap top-down and sifts upward may be simpler to understand. This siftUp version can be visualized as starting with an empty heap and successively inserting elements, whereas the siftDown version given above treats the entire input array as a full but "broken" heap and "repairs" it starting from the last non-trivial sub-heap (that is, the last parent node).

Also, the siftDown version of heapify has O(n) time complexity, while the siftUp version given below has O(n log n) time complexity due to its equivalence with inserting each element, one at a time, into an empty heap.
This may seem counter-intuitive since, at a glance, it is apparent that the former only makes half as many calls to its logarithmic-time sifting function as the latter; i.e., they seem to differ only by a constant factor, which never affects asymptotic analysis.
To grasp the intuition behind this difference in complexity, note that the number of swaps that may occur during any one siftUp call ''increases'' with the depth of the node on which the call is made. The crux is that there are many (exponentially many) more "deep" nodes than there are "shallow" nodes in a heap, so that siftUp may have its full logarithmic running-time on the approximately linear number of calls made on the nodes at or near the "bottom" of the heap. On the other hand, the number of swaps that may occur during any one siftDown call ''decreases'' as the depth of the node on which the call is made increases. Thus, when the siftDown heapify begins and is calling siftDown on the bottom and most numerous node-layers, each sifting call will incur, at most, a number of swaps equal to the "height" (from the bottom of the heap) of the node on which the sifting call is made. In other words, about half the calls to siftDown will have at most only one swap, then about a quarter of the calls will have at most two swaps, etc.

The heapsort algorithm itself has O(n log n) time complexity using either version of heapify.
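The standard counting argument makes the O(n) bound precise: a heap of n elements has at most ⌈n/2^(h+1)⌉ nodes of height h, and one siftDown call on a node of height h performs O(h) work, so the siftDown version of heapify costs on the order of

    Σ (h = 0 … ⌊log₂ n⌋) of ⌈n/2^(h+1)⌉ · O(h)  =  O(n · Σ h/2^h)  =  O(n),

since the series Σ h/2^h converges (its value is 2). The siftUp version enjoys no such telescoping: the roughly n/2 elements in the bottom layer may each be sifted up through Θ(log n) levels, giving Θ(n log n).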
procedure heapify(a, count) is
    ''(end is assigned the index of the first (left) child of the root)''
    end := 1

    while end < count
        ''(sift up the node at index end to the proper place such that all nodes above''
        '' the end index are in heap order)''
        siftUp(a, 0, end)
        end := end + 1
    ''(after sifting up the last node all nodes are in heap order)''
procedure siftUp(a, start, end) is
    input:  ''start represents the limit of how far up the heap to sift.''
            ''end is the node to sift up.''

    child := end
    while child > start
        parent := iParent(child)
        if a[parent] < a[child] then    ''(out of max-heap order)''
            swap(a[parent], a[child])
            child := parent    ''(repeat to continue sifting up the parent now)''
        else
            return
Note that, unlike the siftDown approach, where after each swap only the siftDown subroutine needs to be called to fix the broken heap, the siftUp subroutine alone cannot fix the broken heap. The heap needs to be rebuilt after every swap by calling the heapify procedure, since siftUp assumes that the element getting swapped ends up in its final place, whereas siftDown allows for continued adjustment of items lower in the heap until the invariant is satisfied. The adjusted pseudocode for using the siftUp approach is given below.
procedure heapsort(a, count) is
    input: an unordered array ''a'' of length ''count''

    ''(Build the heap in array a so that largest value is at the root)''
    heapify(a, count)

    ''(The following loop maintains the invariants that a[0:end] is a heap and every element''
    ''beyond end is greater than everything before it (so a[end:count] is in sorted order))''
    end ← count - 1
    while end > 0 do
        ''(a[0] is the root and largest value. The swap moves it in front of the sorted elements.)''
        swap(a[end], a[0])
        ''(rebuild the heap using siftUp after the swap ruins the heap property)''
        heapify(a, end)
        ''(reduce the heap size by one)''
        end ← end - 1
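In C, the siftUp variant might be sketched as follows. The name heapifyTopDown is chosen here only to distinguish it from the O(n) heapify given earlier; the other assumptions match the previous sketch (int keys, zero-based indexing):

    #include <stddef.h>

    static size_t iParent(size_t i) { return (i - 1) / 2; }

    static void swap(int a[], size_t i, size_t j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    /* Sift the node at index 'end' up toward 'start' until its parent is not smaller. */
    static void siftUp(int a[], size_t start, size_t end) {
        size_t child = end;
        while (child > start) {
            size_t parent = iParent(child);
            if (a[parent] < a[child]) {   /* out of max-heap order */
                swap(a, parent, child);
                child = parent;           /* continue sifting up the parent */
            } else {
                return;
            }
        }
    }

    /* Build a max-heap top-down by successive insertion: O(n log n). */
    static void heapifyTopDown(int a[], size_t count) {
        for (size_t end = 1; end < count; end++)
            siftUp(a, 0, end);
    }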
Variations
Floyd's heap construction
The most important variation to the basic algorithm, which is included in all practical implementations, is a heap-construction algorithm by Floyd which runs in O(n) time and uses siftDown rather than siftUp, avoiding the need to implement siftUp at all.
Rather than starting with a trivial heap and repeatedly adding leaves, Floyd's algorithm starts with the leaves, observing that they are trivial but valid heaps by themselves, and then adds parents. Starting with the last parent node, at index iParent(n−1), and working backwards, each internal node is made the root of a valid heap by sifting down. The last step is sifting down the first element, after which the entire array obeys the heap property.
The worst-case number of comparisons during the Floyd's heap-construction phase of heapsort is known to be equal to 2n − 2s₂(n) − e₂(n), where s₂(n) is the number of 1 bits in the binary representation of n and e₂(n) is the number of trailing 0 bits.
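As a worked check of this formula (taking it as stated above): for n = 12, which is 1100 in binary, s₂(12) = 2 and e₂(12) = 2, so the worst case is 2·12 − 2·2 − 2 = 18 comparisons.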
The standard implementation of Floyd's heap-construction algorithm causes a large number of cache misses once the size of the data exceeds that of the CPU cache. Much better performance on large data sets can be obtained by merging in depth-first order, combining subheaps as soon as possible, rather than combining all subheaps on one level before proceeding to the one above.
Bottom-up heapsort
Bottom-up heapsort is a variant which reduces the number of comparisons required by a significant factor. While ordinary heapsort requires 2n log₂ n + O(n) comparisons worst-case and on average, the bottom-up variant requires n log₂ n + O(1) comparisons on average, and 1.5n log₂ n + O(n) in the worst case.
If comparisons are cheap (e.g. integer keys) then the difference is unimportant, as top-down heapsort compares values that have already been loaded from memory. If, however, comparisons require a function call or other complex logic, then bottom-up heapsort is advantageous.
This is accomplished by improving the siftDown procedure. The change improves the linear-time heap-building phase somewhat, but is more significant in the second phase. Like ordinary heapsort, each iteration of the second phase extracts the top of the heap, a[0], and fills the gap it leaves with a[end], then sifts this latter element down the heap. But this element comes from the lowest level of the heap, meaning it is one of the smallest elements in the heap, so the sift-down will likely take many steps to move it back down. In ordinary heapsort, each step of the sift-down requires two comparisons, to find the minimum of three elements: the new node and its two children.

Bottom-up heapsort instead finds the path of largest children to the leaf level of the tree (as if it were inserting −∞) using only one comparison per level. Put another way, it finds a leaf which has the property that it and all of its ancestors are greater than or equal to their siblings. (In the absence of equal keys, this leaf is unique.) Then, from this leaf, it searches ''upward'' (using one comparison per level) for the correct position in that path to insert a[end]. This is the same location as ordinary heapsort finds, and requires the same number of exchanges to perform the insert, but fewer comparisons are required to find that location.
Because it goes all the way to the bottom and then comes back up, it is called heapsort with bounce by some authors.
function leafSearch(a, i, end) is
    j ← i
    while iRightChild(j) ≤ end do
        ''(Determine which of j's two children is the greater)''
        if a[iRightChild(j)] > a[iLeftChild(j)] then
            j ← iRightChild(j)
        else
            j ← iLeftChild(j)
    ''(At the last level, there might be only one child)''
    if iLeftChild(j) ≤ end then
        j ← iLeftChild(j)
    return j
The return value of leafSearch is used in the modified siftDown routine:
procedure siftDown(a, i, end) is
    j ← leafSearch(a, i, end)
    while a[i] > a[j] do
        j ← iParent(j)
    x ← a[j]
    a[j] ← a[i]
    while j > i do
        swap x, a[iParent(j)]
        j ← iParent(j)
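In C, the leaf search and the modified sift-down might be sketched as follows (again assuming int keys; siftDownBottomUp is a name chosen here for clarity, and the final loop realizes the pseudocode's "swap x, a[iParent(j)]" by rotating the displaced values up the path):

    #include <stddef.h>

    static size_t iParent(size_t i)     { return (i - 1) / 2; }
    static size_t iLeftChild(size_t i)  { return 2 * i + 1; }
    static size_t iRightChild(size_t i) { return 2 * i + 2; }

    /* Follow the path of greater children from node i down to a leaf. */
    static size_t leafSearch(const int a[], size_t i, size_t end) {
        size_t j = i;
        while (iRightChild(j) <= end)    /* both children exist */
            j = (a[iRightChild(j)] > a[iLeftChild(j)]) ? iRightChild(j)
                                                       : iLeftChild(j);
        if (iLeftChild(j) <= end)        /* at the last level there may be only one child */
            j = iLeftChild(j);
        return j;
    }

    /* Bottom-up sift-down: descend to the leaf first, then climb back up
       to the insertion point and rotate the values along the path. */
    static void siftDownBottomUp(int a[], size_t i, size_t end) {
        size_t j = leafSearch(a, i, end);
        while (a[i] > a[j])              /* one comparison per level, climbing up */
            j = iParent(j);
        int x = a[j];                    /* value displaced by a[i] */
        a[j] = a[i];
        while (j > i) {                  /* push each ancestor down one step */
            size_t p = iParent(j);
            int t = a[p];
            a[p] = x;
            x = t;
            j = p;
        }
    }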
Bottom-up heapsort was announced as beating quicksort (with median-of-three pivot selection) on arrays of size ≥16000.
A 2008 re-evaluation of this algorithm showed it to be no faster than ordinary heapsort for integer keys, presumably because modern branch prediction nullifies the cost of the predictable comparisons which bottom-up heapsort manages to avoid.
A further refinement does a binary search in the path to the selected leaf, and sorts in a worst case of n log₂ n + O(n log log n) comparisons, approaching the information-theoretic lower bound of n log₂ n − 1.44n comparisons.
A variant which uses two extra bits per internal node (''n''−1 bits total for an ''n''-element heap) to cache information about which child is greater (two bits are required to store three cases: left, right, and unknown) uses less than n log₂ n + 1.1n compares.
Other variations
*Ternary heapsort uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program, but does a constant number of times fewer swap and comparison operations. This is because each sift-down step in a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. Two levels in a ternary heap cover 3² = 9 elements, doing more work with the same number of comparisons as three levels in the binary heap, which only cover 2³ = 8. This is primarily of academic interest, or as a student exercise, as the additional complexity is not worth the minor savings, and bottom-up heapsort beats both.
*Memory-optimized heapsort improves heapsort's locality of reference by increasing the number of children even more. This increases the number of comparisons, but because all children are stored consecutively in memory, reduces the number of cache lines accessed during heap traversal, a net performance improvement.
*Out-of-place heapsort improves on bottom-up heapsort by eliminating the worst case, guaranteeing n log₂ n + O(n) comparisons. When the maximum is taken, rather than fill the vacated space with an unsorted data value, fill it with a sentinel value, which never "bounces" back up. It turns out that this can be used as a primitive in an in-place (and non-recursive) "QuickHeapsort" algorithm. First, you perform a quicksort-like partitioning pass, but reversing the order of the partitioned data in the array. Suppose (without loss of generality) that the smaller partition is the one greater than the pivot, which should go at the end of the array, but our reversed partitioning step places it at the beginning. Form a heap out of the smaller partition and do out-of-place heapsort on it, exchanging the extracted maxima with values from the end of the array. These are less than the pivot, meaning less than any value in the heap, so serve as sentinel values. Once the heapsort is complete (and the pivot moved to just before the now-sorted end of the array), the order of the partitions has been reversed, and the larger partition at the beginning of the array may be sorted in the same way. (Because there is no non-tail recursion, this also eliminates quicksort's O(log n) stack usage.)
*The smoothsort algorithm is a variation of heapsort developed by Edsger W. Dijkstra in 1981. Like heapsort, smoothsort's upper bound is O(n log n). The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.
*Levcopoulos and Petersson describe a variation of heapsort based on a heap of Cartesian trees. First, a Cartesian tree is built from the input in O(n) time, and its root is placed in a 1-element binary heap. Then we repeatedly extract the minimum from the binary heap, output the tree's root element, and add its left and right children (if any), which are themselves Cartesian trees, to the binary heap. As they show, if the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than O(n log n) for inputs that are already nearly sorted.
* Several variants such as weak heapsort require n log₂ n + O(n) comparisons in the worst case, close to the theoretical minimum, using one extra bit of state per node. While this extra bit makes the algorithms not truly in-place, if space for it can be found inside the element, these algorithms are simple and efficient, but still slower than binary heaps if key comparisons are cheap enough (e.g. integer keys) that a constant factor does not matter.
* Katajainen's "ultimate heapsort" requires no extra storage, performs comparisons, and a similar number of element moves. It is, however, even more complex and not justified unless comparisons are very expensive.
Comparison with other sorts
Heapsort primarily competes with quicksort, another very efficient general-purpose, in-place, comparison-based sort algorithm.
Heapsort's primary advantages are its simple, non-recursive code, minimal auxiliary storage requirement, and reliably good performance: its best and worst cases are within a small constant factor of each other, and of the theoretical lower bound on comparison sorts. While it cannot do better than O(n log n) for pre-sorted inputs, it does not suffer from quicksort's O(n²) worst case, either. (The latter can be avoided with careful implementation, but that makes quicksort far more complex, and one of the most popular solutions, introsort, uses heapsort for the purpose.)
Its primary disadvantages are its poor locality of reference and its inherently serial nature; the accesses to the implicit tree are widely scattered and mostly random, and there is no straightforward way to convert it to a parallel algorithm.
This makes it popular in embedded systems, real-time computing, and systems concerned with maliciously chosen inputs, such as the Linux kernel. It is also a good choice for any application which does not expect to be bottlenecked on sorting.
A well-implemented quicksort is usually 2–3 times faster than heapsort. Although quicksort requires fewer comparisons, this is a minor factor. (Results claiming twice as many comparisons are measuring the top-down version of heapify; see the bottom-up heapsort section above.) The main advantage of quicksort is its much better locality of reference: partitioning is a linear scan with good spatial locality, and the recursive subdivision has good temporal locality. With additional effort, quicksort can also be implemented in mostly branch-free code, and multiple CPUs can be used to sort subpartitions in parallel. Thus, quicksort is preferred when the additional performance justifies the implementation effort.
The other major O(n log n) sorting algorithm is merge sort, but that rarely competes directly with heapsort because it is not in-place. Merge sort's requirement for Ω(n) extra space (roughly half the size of the input) is usually prohibitive except in the situations where merge sort has a clear advantage:
* When a stable sort is required
* When taking advantage of (partially) pre-sorted input
* Sorting linked lists (in which case merge sort requires minimal extra space)
* Parallel sorting; merge sort parallelizes even better than quicksort and can easily achieve close to linear speedup
* External sorting; merge sort has excellent locality of reference
Example
Consider an unsorted list that we want to sort from the smallest to the largest. (Note, for the 'Build the heap' step: larger nodes do not stay below smaller node parents; they are swapped with their parents, and then recursively checked to see whether another swap is needed, to keep larger numbers above smaller numbers in the heap's binary tree.)
Build the heap
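As an illustration, assume the input list { 6, 5, 3, 1, 8, 7, 2, 4 } (a sample chosen here; any list is processed the same way). heapify sifts down each parent node, from the last parent back to the root:

    start = 3: sift down a[3] = 1; swap with child a[7] = 4  →  6 5 3 4 8 7 2 1
    start = 2: sift down a[2] = 3; swap with child a[5] = 7  →  6 5 7 4 8 3 2 1
    start = 1: sift down a[1] = 5; swap with child a[4] = 8  →  6 8 7 4 5 3 2 1
    start = 0: sift down a[0] = 6; swap with child a[1] = 8  →  8 6 7 4 5 3 2 1

leaving the max heap 8 6 7 4 5 3 2 1.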
Sorting
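Continuing the assumed example, each iteration swaps the root with a[end], shrinks the heap by one, and sifts the new root down; the sorted suffix is shown after the bar:

    swap 8 and 1; sift 1 down  →  7 6 3 4 5 1 2 | 8
    swap 7 and 2; sift 2 down  →  6 5 3 4 2 1 | 7 8
    swap 6 and 1; sift 1 down  →  5 4 3 1 2 | 6 7 8
    swap 5 and 2; sift 2 down  →  4 2 3 1 | 5 6 7 8
    swap 4 and 1; sift 1 down  →  3 2 1 | 4 5 6 7 8
    swap 3 and 1; sift 1 down  →  2 1 | 3 4 5 6 7 8
    swap 2 and 1               →  1 | 2 3 4 5 6 7 8

leaving the sorted array 1 2 3 4 5 6 7 8.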
External links
* http://www.codecodex.com/wiki/Heapsort – Heapsort implemented in 12 languages