Overview
The outer loop of block sort is identical to a bottom-up merge sort, where each ''level'' of the sort merges pairs of subarrays, and , in sizes of 1, then 2, then 4, 8, 16, and so on, until both subarrays combined are the array itself. Rather than merging and directly as with traditional methods, a block-based merge algorithm divides into discrete blocks of size (resulting in ''number'' of blocks as well), inserts each block into such that the first value of each block is less than or equal (≤) to the value immediately after it, then ''locally merges'' each block with any values between it and the next block. As merges still require a separate buffer large enough to hold the block to be merged, two areas within the array are reserved for this purpose (known as ''internal buffers''). The first two blocks are thus modified to contain the first instance of each value within , with the original contents of those blocks shifted over if necessary. The remaining blocks are then inserted into and merged using one of the two buffers as swap space. This process causes the values in that buffer to be rearranged. Once every and block of every and subarray have been merged for that level of the merge sort, the values in that buffer must be sorted to restore their original order, so an insertion sort must be applied. The values in the buffers are then redistributed to their first sorted position within the array. This process repeats for each level of the outer bottom-up merge sort, at which point the array will have been stably sorted.Algorithm
The following operators are used in the code examples: Additionally, block sort relies on the following operations as part of its overall algorithm: *Outer loop
As previously stated, the outer loop of a block sort is identical to a bottom-up merge sort. However, it benefits from the variant that ensures each and subarray are the same size to within one item: BlockSort(array) power_of_two = FloorPowerOfTwo(array.size) scale = array.size/power_of_two // 1.0 ≤ scale < 2.0 // insertion sort 16–31 items at a time for (merge = 0; merge < power_of_two; merge += 16) start = merge * scale end = start + 16 * scale InsertionSort(array, [start, end)) for (length = 16; length < power_of_two; length += length) for (merge = 0; merge < power_of_two; merge += length * 2) start = merge * scale mid = (merge + length) * scale end = (merge + length * 2) * scale if (array[end − 1] < array[start]) // the two ranges are in reverse order, so a rotation is enough to merge them Rotate(array, mid − start, tart, end)) else if (array[mid − 1> array[mid">id_−_1.html" ;"title="tart, end)) else if (array[mid − 1">tart, end)) else if (array[mid − 1> array[mid Merge(array, A = [start, mid), B = [mid, end)) // else the ranges are already correctly ordered Fixed-point arithmetic">Fixed-point math may also be used, by representing the scale factor as a fractioninteger_part + numerator/denominator
:
power_of_two = FloorPowerOfTwo(array.size)
denominator = power_of_two/16
numerator_step = array.size % denominator
integer_step = floor(array.size/denominator)
// insertion sort 16–31 items at a time
while (integer_step < array.size)
integer_part = numerator = 0
while (integer_part < array.size)
// get the ranges for A and B
start = integer_part
integer_part += integer_step
numerator += numerator_step
if (numerator ≥ denominator)
numerator −= denominator
integer_part++
mid = integer_part
integer_part += integer_step
numerator += numerator_step
if (numerator ≥ denominator)
numerator −= denominator
integer_part++
end = integer_part
if (array[end − 1] < array[start])
Rotate(array, mid − start, [start, end))
else if (array[mid − 1] > array[mid])
Merge(array, A = [start, mid), B = [mid, end))
integer_step += integer_step
numerator_step += numerator_step
if (numerator_step ≥ denominator)
numerator_step −= denominator
integer_step++
Extract buffers
Tag A blocks
Roll and drop
Local merges
Once the A block has been rotated into the B block, the previous A block is then merged with the B values that follow it, using the second buffer as swap space. When the first A block is dropped behind this refers to the unevenly sized A block at the start, when the second A block is dropped behind it means the first A block, and so forth. MergeInternal(array, A, B, buffer) // block swap the values in A with those in 'buffer' BlockSwap(array, A.start, buffer.start, ">A, ) A_count = 0, B_count = 0, insert = 0 while (A_count < , A, and B_count < , B, ) if (array ≤ array[B.start + B_count">uffer.start + A_count≤ array[B.start + B_count Swap(array[A.start + insert">.start_+_B_count.html" ;"title="uffer.start + A_count≤ array[B.start + B_count">uffer.start + A_count≤ array[B.start + B_count Swap(array[A.start + insert array[buffer.start + A_count]) A_count++ else Swap(array[A.start + insert], array[B.start + B_count]) B_count++ insert++ // block swap the remaining part of the buffer with the remaining part of the array BlockSwap(array, buffer.start + A_count, A.start + insert, , A, - A_count) If the second buffer does not exist, a strictly in-place merge operation must be performed, such as a rotation-based version of the Hwang and Lin algorithm, the Dudzinski and Dydek algorithm, or a repeated binary search and rotate. MergeInPlace(array, A, B) while (, A, > 0 and , B, > 0) // find the first place in B where the first item in A needs to be inserted mid = BinaryFirst(array, array .start B) // rotate A into place amount = mid - A.end Rotate(array, amount, [A.start, mid)) // calculate the new A and B ranges B = [mid, B.end) A = [A.start + amount, mid) A.start = BinaryLast(array, array .start A) After dropping the minimum A block and merging the previous A block with the B values that follow it, the new minimum A block must be found within the blocks that are still being rolled through the array. This is handled by running a linear search through those A blocks and comparing the tag values to find the smallest one. minA = blockA.start for (findA = minA + block_size; findA < blockA.end - 1; findA += block_size) if (array[findA + 1] < array[minA + 1]) minA = findA These remaining A blocks then continue rolling through the array and being dropped and inserted where they belong. This process repeats until all of the A blocks have been dropped and rotated into the previous B block. Once the last remaining A block has been dropped behind and inserted into B where it belongs, it should be merged with the remaining B values that follow it. This completes the merge process for that particular pair of A and B subarrays. However, it must then repeat the process for the remaining A and B subarrays for the current level of the merge sort. Note that the internal buffers can be reused for every set of A and B subarrays for this level of the merge sort, and do not need to be re-extracted or modified in any way.Redistribute
After all of the A and B subarrays have been merged, the one or two internal buffers are still left over. The first internal buffer was used for tagging the A blocks, and its contents are still in the same order as before, but the second internal buffer may have had its contents rearranged when it was used as swap space for the merges. This means the contents of the second buffer will need to be sorted using a different algorithm, such as insertion sort. The two buffers must then be redistributed back into the array using the opposite process that was used to create them. After repeating these steps for every level of the bottom-up merge sort, the block sort is completed.Variants
Block sort works by extracting two internal buffers, breaking A and B subarrays into evenly sized blocks, rolling and dropping the A blocks into B (using the first buffer to track the order of the A blocks), locally merging using the second buffer as swap space, sorting the second buffer, and redistributing both buffers. While the steps do not change, these subsystems can vary in their actual implementation. One variant of block sort allows it to use any amount of additional memory provided to it, by using this ''external buffer'' for merging an A subarray or A block with B whenever A fits into it. In this situation it would be identical to a merge sort. Good choices for the buffer size include: Rather than tagging the A blocks using the contents of one of the internal buffers, an indirect ''movement-imitation buffer'' can be used instead. This is an internal buffer defined as ''s1 t s2'', where ''s1'' and ''s2'' are each as large as the number of A and B blocks, and ''t'' contains any values immediately following ''s1'' that are equal to the last value of ''s1'' (thus ensuring that no value in ''s2'' appears in ''s1''). A second internal buffer containing unique values is still used. The first values of ''s1'' and ''s2'' are then swapped with each other to encode information into the buffer about which blocks are A blocks and which are B blocks. When an A block at index ''i'' is swapped with a B block at index ''j'' (where the first evenly sized A block is initially at index 0), s1 and s1 are swapped with s2 and s2 respectively. This ''imitates the movements'' of the A blocks through B. The unique values in the second buffer are used to determine the original order of the A blocks as they are rolled through the B blocks. Once all of the A blocks have been dropped, the movement-imitation buffer is used to decode whether a given block in the array is an A block or a B block, each A block is rotated into B, and the second internal buffer is used as swap space for the local merges. The ''second'' value of each A block doesn't necessarily need to be tagged – the first, last, or any other element could be used instead. However, if the first value is tagged, the values will need to be read from the first internal buffer (where they were swapped) when deciding where to drop the minimum A block. Many sorting algorithms can be used to sort the contents of the second internal buffer, including unstable sorts likeAnalysis
Block sort is a well-defined and testable class of algorithms, with working implementations available as a merge and as a sort. This allows its characteristics to be measured and considered.Complexity
Block sort begins by performing insertion sort on groups of 16–31 items in the array. Insertion sort is an operation, so this leads to anywhere from to , which is once the constant factors are omitted. It must also apply an insertion sort on the second internal buffer after each level of merging is completed. However, as this buffer was limited to in size, the operation also ends up being . Next it must extract two internal buffers for each level of the merge sort. It does so by iterating over the items in the A and B subarrays and incrementing a counter whenever the value changes, and upon finding enough values it rotates them to the start of A or the end of B. In the worst case this will end up searching the entire array before finding non-contiguous unique values, which requires comparisons and rotations for values. This resolves to , or . When none of the A or B subarrays contained unique values to create the internal buffers, a normally suboptimal in-place merge operation is performed where it repeatedly binary searches and rotates A into B. However, the known lack of unique values within any of the subarrays places a hard limit on the number of binary searches and rotations that will be performed during this step, which is again items rotated up to times, or . The size of each block is also adjusted to be smaller in the case where it found unique values but not 2, which further limits the number of unique values contained within any A or B block. Tagging the A blocks is performed times for each A subarray, then the A blocks are rolled through and inserted into the B blocks up to times. The local merges retain the same complexity of a standard merge, albeit with more assignments since the values must be swapped rather than copied. The linear search for finding the new minimum A block iterates over blocks times. And the buffer redistribution process is identical to the buffer extraction but in reverse, and therefore has the same complexity. After omitting all but the highest complexity and considering that there are levels in the outer merge loop, this leads to a final asymptotic complexity of for the worst and average cases. For the best case, where the data is already in order, the merge step performs comparisons for the first level, then , , , etc. This is a well-known mathematical series which resolves to .Memory
As block sort is non-recursive and does not require the use of dynamic allocations, this leads to constantStability
Although items in the array are moved out of order during a block sort, each operation is fully reversible and will have restored the original order of equivalent items by its completion. Stability requires the first instance of each value in an array before sorting to still be the first instance of that value after sorting. Block sort moves these first instances to the start of the array to create the two internal buffers, but when all of the merges are completed for the current level of the block sort, those values are distributed back to the first sorted position within the array. This maintains stability. Before rolling the A blocks through the B blocks, each A block has its second value swapped with a value from the first buffer. At that point the A blocks are moved out of order to roll through the B blocks. However, once it finds where it should insert the smallest A block into the previous B block, that smallest A block is moved back to the start of the A blocks and its second value is restored. By the time all of the A blocks have been inserted, the A blocks will be in order again and the first buffer will contain its original values in the original order. Using the second buffer as swap space when merging an A block with some B values causes the contents of that buffer to be rearranged. However, as the algorithm already ensured the buffer only contains unique values, sorting the contents of the buffer is sufficient to restore their original stable order.Adaptivity
Block sort is anAdvantages
Block sort is a stable sort that does not require additional memory, which is useful in cases where there is not enough free memory to allocate the O(n) buffer. When using the ''external buffer'' variant of block sort, it can scale from using O(n) memory to progressively smaller buffers as needed, and will still work efficiently within those constraints.Disadvantages
Block sort does not exploit sorted ranges of data on as fine a level as some other algorithms, such asReferences
{{sorting Sorting algorithms Comparison sorts Stable sorts Articles with example pseudocode