HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, \boldsymbol+\boldsymbol sorting is the problem of sorting pairs of numbers by their sums. Applications of the problem include transit fare minimisation,
VLSI Very large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining millions or billions of MOS transistors onto a single chip. VLSI began in the 1970s when MOS integrated circuit (Metal Oxide Semiconductor) ...
design, and
sparse polynomial In mathematics, a sparse polynomial (also lacunary polynomial or fewnomial) is a polynomial that has far fewer terms than its degree and number of variables would suggest. Examples include *monomials, polynomials with only one term, * binomials, po ...
multiplication. As with comparison sorting and
integer sorting In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numb ...
more generally, algorithms for this problem can be based only on comparisons of these sums, or on other operations that work only when the inputs are small integers. It is unknown whether this problem has a comparison-based solution whose running time is asymptotically faster than sorting an unstructured list of equally many items. Therefore, research on the problem has focused on two approaches to settle the question of whether such an improvement is possible: the development of algorithms that improve on unstructured sorting in their number of comparisons rather than in their total running time, and
lower bound In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is greater than or equal to every element of . Dually, a lower bound or minorant of is defined to be an eleme ...
s for the number of comparisons based on counting cells in subdivisions of high-dimensional spaces. Both approaches are historically tied together, in that the first algorithms that used few comparisons were based on the weakness of the cell-counting lower bounds.


Problem statement and history

The input to the X+Y sorting problem consists of two finite
collections Collection or Collections may refer to: * Cash collection, the function of an accounts receivable department * Collection (church), money donated by the congregation during a church service * Collection agency, agency to collect cash * Collection ...
of numbers X and Y, of the same length. The problem's output is the collection of all
pairs Concentration, also known as Memory, Shinkei-suijaku (Japanese meaning "nervous breakdown"), Matching Pairs, Match Match, Match Up, Pelmanism, Pexeso or simply Pairs, is a card game in which all of the cards are laid face down on a surface and tw ...
of a number from X and a number from Y, arranged into sorted order by the sum of each pair. As a small example, for the inputs X=\ and Y=\, the output should be the list of pairs(1,0),\, (2,0),\, (1,4),\, (2,4),\, (9,0),\, (1,9),\, (2,9),\, (9,4),\, (9,9)of one element from X and one element from Y, listed in sorted order by their sums of pairs1, 2, 5, 6, 9, 10, 11, 13, 18.One way to solve the problem would be to construct the pairs to be sorted (the Cartesian product of the two collections) and use these pairs as input to a standard comparison sorting algorithm such as
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
or
heapsort In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the ...
. When the inputs have length n, they form n^2 pairs, and the time to sort the pairs in this way is O(n^2\log n). In terms of its big O notation, this method is the fastest known algorithm for X+Y sorting. Whether a faster algorithm exists is an open problem, posed by
Elwyn Berlekamp Elwyn Ralph Berlekamp (September 6, 1940 – April 9, 2019) was a professor of mathematics and computer science at the University of California, Berkeley.Contributors, ''IEEE Transactions on Information Theory'' 42, #3 (May 1996), p. 1048. DO10. ...
prior to 1975. A variant of the problem sorts the
sumset In additive combinatorics, the sumset (also called the Minkowski sum) of two subsets A and B of an abelian group G (written additively) is defined to be the set of all sums of an element from A with an element from B. That is, :A + B = \. The n-f ...
, the set of sums of pairs, with duplicate sums condensed to a single value. For this variant, the size of the sumset may be significantly smaller than n^2, and output-sensitive algorithms for constructing it have been investigated.


Applications

Steven Skiena recounts a practical application in transit
fare A fare is the fee paid by a passenger for use of a public transport system: rail, bus, taxi, etc. In the case of air transport, the term airfare is often used. Fare structure is the system set up to determine how much is to be paid by various pa ...
minimisation, an instance of the shortest path problem: find the cheapest two-hop airplane ticket between two given cities, from an input that describes both the cost of each hop and which pairs of hops may be combined into a single ticket. Skiena's solution consists of sorting pairs of hops by their total cost as an instance of the X+Y sorting problem, and then testing the resulting pairs in this sorted order until finding one that is allowed. To generate the sorted pairs in this order, Skiena uses a priority queue of pairs, initially containing only a single pair, the one consisting of the two cheapest hops. Then, when a pair (x,y) is removed from the queue and found to be disallowed, two more pairs are added, with one of these two pairs combining x with the next hop after y in a sorted list of the hops to the destination, and the other pair combining y with the next hop after x in a sorted list of hops from the start. In this way, each successive pair can be found in logarithmic time, and only the pairs up to the first allowable one need to be sorted. X+Y sorting is the most expensive subroutine in an algorithm for a problem in
VLSI Very large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining millions or billions of MOS transistors onto a single chip. VLSI began in the 1970s when MOS integrated circuit (Metal Oxide Semiconductor) ...
design, in which one must place two subunits of a VLSI circuit side-by-side along a communications channel in order to minimize the width of the channel needed to route pairs of wires from one subunit to the other. As one subunit is continuously shifted relative to the other, the channel width only changes at discrete positions where the ends of two wires line up with each other, and finding the sorted ordering of these positions in order to compute the sequence of changes to the width can be performed by X+Y sorting. If this sorting problem could be sped up, it would also speed up this VLSI design task. Another application involves polynomial multiplication for polynomials of a single variable that may have many fewer terms than their degrees. The product of two polynomials can be expressed as a sum of products of pairs of terms, one from each polynomial, and placing these term-by-term products into degree order amounts to sorting them by the sum of degrees. For example, the instance of X+Y sorting with n=3 given as an example above corresponds to the multiplication of two three-term polynomials to produce a nine-term polynomial: \begin (&x+x^2+x^9)(1+x^4+x^9)\\ &=x+x^2+x^5+x^6+x^9+x^+x^+x^+x^.\\ \end The degrees are always integers, so integer-based algorithms for X+Y sorting may be applied. However, for polynomials whose number of terms is comparable to their degree, FFT-based polynomial multiplication algorithms may be significantly more efficient than term-by-term multiplication.


Number of orderings

A well-known lower bound for unstructured sorting, in the
decision tree model In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of ''queries'' or ''tests'' that are done adaptively, so the outcome of the pre ...
, is based on the factorial number of sorted orders that an unstructured list may have. Because each comparison can at best reduce the number of possible orderings by a factor of two, sorting requires a number of comparisons at least equal to the binary logarithm of the factorial, which is n\log_2 n-O(n). Early work on X+Y sorting followed a similar approach by asking how many different sorted orderings are possible for this problem, and proving that this number is at most O(n^). However, because its binary logarithm is at most 8n\log_2 n+O(1), much smaller than the known time bounds for X+Y sorting, this method can only lead to weak lower bounds on the number of comparisons. The proof of this bound relates X+Y sorting to the complexity of an
arrangement of hyperplanes In geometry and combinatorics, an arrangement of hyperplanes is an arrangement of a finite set ''A'' of hyperplanes in a linear, affine, or projective space ''S''. Questions about a hyperplane arrangement ''A'' generally concern geometrical, ...
in high-dimensional geometry. The two input collections for the X+Y sorting problem comprise 2n numbers, which can alternatively be interpreted as the Cartesian coordinates of a point in the 2n-dimensional space \mathbb^. This space can be subdivided into cells, so that within a single cell all points correspond to inputs that produce the same sorted order. For this subdivision, each boundary between two cells lies within a hyperplane defined by an equality of pairs x_i+y_j=x_k+y_\ell, where (x_i,y_j) and (x_k,y_\ell) are two pairs whose ordering changes from one adjacent cell to the other. These hyperplanes are either generated by two disjoint pairs, or they have the simplified forms x_i=x_k or y_j=y_\ell, so the number of distinct hyperplanes that can be determined in this way is k=2\binom^2+2\binom. The number of cells that this number of hyperplanes can divide a space of dimension 2n into is \binom+\binom+\cdots+\binom=O(n^). Therefore, the set X+Y has O(n^) different possible sorted orderings. A similar style of analysis has been more successful in ruling out fast solutions to certain generalizations of X+Y sorting, by showing that they have too many orderings to sort quickly. In particular, suggest separately sorting X and Y, and then constructing a two-dimensional matrix of the values of X+Y that is sorted both by rows and by columns before using this partially-sorted data to complete the sort of X+Y. This idea of using a row-sorted and column-sorted matrix forms the basis for the method used by Skiena in the transportation application, and it can reduce the number of comparisons by a constant factor relative to naive comparison sorting. However, for matrices whose rows and columns are sorted in this way, the number of possible sorted orderings of the whole matrix is much larger than O(n^), so large that any comparison sorting algorithm that can work for arbitrary n\times n matrices that are sorted by rows and columns still requires \Omega(n^2\log n) comparisons. Therefore, if the X+Y sorting problem is to be solved quickly, the solution must use additional information about the set X+Y beyond this matrix ordering.


Number of comparisons

For the classical comparison sorting problem, the time to sort and the number of comparisons needed to sort are within constant factors of each other. But for X + Y sorting, the number of comparisons is smaller than the best time bound known:
Michael Fredman Michael Lawrence Fredman is an emeritus professor at the Computer Science Department at Rutgers University, United States. He earned his Ph.D. degree from Stanford University in 1972 under the supervision of Donald Knuth. He was a member of the ...
showed in 1976 that X + Y sorting can be done using only O(n^2) comparisons. More generally, he showed that any set of N elements, whose sorted ordering has already been restricted to a family \Gamma of orderings, can be sorted using \log_2, \Gamma, +O(N) comparisons, by a form of binary insertion sort. For the X + Y sorting problem, N=n^2, and , \Gamma, =O(n^), so \log_2, \Gamma, =O(n\log n) and Fredman's bound implies that only O(n^2) comparisons are needed. However, in Fredman's method, the time needed to decide which comparisons to perform may be significantly higher than the bound on the number of comparisons. The first explicit algorithm that achieves both O(n^2) comparisons and O(n^2\log n) total complexity was published sixteen years after Fredman by . The algorithm performs the following steps: #Recursively sort the two sets X+X and Y+Y. #Use the equivalence x_i-x_j\le x_k-x_\ell \Leftrightarrow x_i+x_\ell\le x_j+x_k to infer the sorted orderings of X-X and Y-Y without additional comparisons. #
Merge Merge, merging, or merger may refer to: Concepts * Merge (traffic), the reduction of the number of lanes on a road * Merge (linguistics), a basic syntactic operation in generative syntax in the Minimalist Program * Merger (politics), the comb ...
the two sets X-X and Y-Y into a single sorted order, using a number of comparisons linear in their total size. #Use the merged order and the equivalence x_i+y_j\le x_k+y_\ell\Leftrightarrow x_i-x_k\le y_\ell-y_j to infer the sorted order of X+Y without additional comparisons. The part of the algorithm that recursively sorts X+X (or equivalently Y+Y) does so by the following steps: #Split X into two equal sublists A and B. #Recursively sort A+A and B+B #Infer the ordering on A+B using only the comparisons from a single merge step as above. #Merge the sorted results A+A, B+B, and A+B together. The number of comparisons C(n) needed to perform this recursive algorithm on an input of n items can be analyzed using the
recurrence relation In mathematics, a recurrence relation is an equation according to which the nth term of a sequence of numbers is equal to some combination of the previous terms. Often, only k previous terms of the sequence appear in the equation, for a parameter ...
C(n)\le 2C(n/2)+O(n^2), where the 2C(n/2) term of the recurrence counts the number of comparisons in the recursive calls to the algorithm to sort A+A and B+B, and the O(n^2) term counts the number of comparisons used to merge the results. The master theorem for recurrence relations of this form shows that C(n)=O(n^2). The total time complexity is slower, O(n^2\log n), because of the steps of the algorithm that use already-made comparisons to infer orderings of other sets. These steps can be performed in time O(n^2\log n) by using a standard comparison-sorting algorithm with its comparison steps replaced by the stated inferences. If only comparisons between elements of X+Y are allowed, then there is also a matching lower bound of \Omega(n^2) on the number of comparisons, but with more general comparisons involving linear combinations of constant numbers of elements, only O(n\log^2 n) comparisons are needed.


Non-comparison-based algorithms

Just as
integer sorting In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numb ...
can be faster than comparison sorting for small-enough integers, the same is true for X+Y sorting. In particular, with integer inputs in the range from 0 to some upper limit M, the problem can be solved in O(n+M\log M) operations by means of the fast Fourier transform.


Related problems

Several other problems in computational geometry have equivalent or harder complexity to X+Y sorting, including constructing
Minkowski sum In geometry, the Minkowski sum (also known as dilation) of two sets of position vectors ''A'' and ''B'' in Euclidean space is formed by adding each vector in ''A'' to each vector in ''B'', i.e., the set : A + B = \. Analogously, the Minkowski ...
s of staircase polygons, finding the crossing points of an
arrangement of lines In music, an arrangement is a musical adaptation of an existing composition. Differences from the original composition may include reharmonization, melodic paraphrasing, orchestration, or formal development. Arranging differs from orchestr ...
in sorted order by their x-coordinates, listing pairs of points in sorted order by their distances, and testing whether one
rectilinear polygon A rectilinear polygon is a polygon all of whose sides meet at right angles. Thus the interior angle at each vertex is either 90° or 270°. Rectilinear polygons are a special case of isothetic polygons. In many cases another definition is ...
can be translated to fit within another. The problem of testing whether two of the pairs in the X+Y sorting problem have equal sums can be solved by sorting the pairs and then testing consecutive pairs for equality. In turn, it could be used to solve the
3SUM In computational complexity theory, the 3SUM problem asks if a given set of n real numbers contains three elements that sum to zero. A generalized version, ''k''-SUM, asks the same question on ''k'' numbers. 3SUM can be easily solved in O(n^2) t ...
problem, implying that it is unlikely to have a strongly subquadratic algorithm.


References

{{reflist, refs= {{cite conference , last1 = Arnold , first1 = Andrew , last2 = Roche , first2 = Daniel S. , contribution = Output-sensitive algorithms for sumset and sparse polynomial multiplication , doi = 10.1145/2755996.2756653 , location = New York , mr = 3388279 , pages = 29–36 , publisher = ACM , title = Proceedings of the 2015 ACM International Symposium on Symbolic and Algebraic Computation (ISSAC'15) , year = 2015, arxiv = 1501.05296 {{Introduction to Algorithms, chapter=8.1 Lower bounds for sorting, pages=191–193, edition=3 {{cite journal, last=Dietzfelbinger, first=Martin, doi=10.1016/0304-3975(89)90132-1, issue=2, journal=
Theoretical Computer Science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumscribe the ...
, mr=1019082, pages=137–155, title=Lower bounds for sorting of sums, volume=66, year=1989, doi-access=free
{{cite journal , first=Michael L. , last=Fredman , author-link=Michael Fredman , title=How good is the information theory bound in sorting? , journal=
Theoretical Computer Science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumscribe the ...
, volume=1 , issue=4 , pages=355–361 , doi=10.1016/0304-3975(76)90078-5 , year=1976 , mr=0416100, doi-access=free
{{cite journal , last1=Harper , first1=L. H. , last2=Payne , first2=T. H. , last3=Savage , first3=J. E. , author3-link= John E. Savage , last4=Straus , first4=E., author4-link= Ernst G. Straus , title=Sorting {{math, ''X'' + ''Y'' , journal=
Communications of the ACM ''Communications of the ACM'' is the monthly journal of the Association for Computing Machinery (ACM). It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members. Articles are intended for readers with ...
, volume=18 , issue=6 , year=1975 , pages=347–349 , doi=10.1145/360825.360869, mr=0378473, s2cid=26360885
{{cite conference, contribution-url=http://www.cccg.ca/proceedings/1996/cccg1996_0048.pdf, contribution=Finding an {{math, ''o''(''n''2 log ''n'') algorithm is sometimes hard, first=Antonio, last=Hernández Barrera, pages=289–294, title=Proceedings of the 8th Canadian Conference on Computational Geometry (CCCG'96), year=1996 {{cite journal , last = Klarreich , first = Erica , author-link = Erica Klarreich , date = December 2019 , doi = 10.1145/3371387 , issue = 1 , journal =
Communications of the ACM ''Communications of the ACM'' is the monthly journal of the Association for Computing Machinery (ACM). It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members. Articles are intended for readers with ...
, pages = 11–13 , title = Multiplication hits the speed limit , volume = 63, s2cid = 209450552
{{cite journal , last = Klip , first = Dorothea A. , doi = 10.1137/0208025 , issue = 3 , journal =
SIAM Journal on Computing The ''SIAM Journal on Computing'' is a scientific journal focusing on the mathematical and formal aspects of computer science. It is published by the Society for Industrial and Applied Mathematics (SIAM). Although its official ISO abbreviation is ...
, mr = 539251 , pages = 326–343 , title = New algorithms for polynomial multiplication , volume = 8 , year = 1979
{{cite journal , last1 = Kane , first1 = Daniel M. , author1-link = Daniel Kane (mathematician) , last2 = Lovett , first2 = Shachar , last3 = Moran , first3 = Shay , arxiv = 1705.01720 , doi = 10.1145/3285953 , issue = 3 , journal =
Journal of the ACM The ''Journal of the ACM'' is a peer-reviewed scientific journal covering computer science in general, especially theoretical aspects. It is an official journal of the Association for Computing Machinery. Its current editor-in-chief is Venkatesan ...
, mr = 3941341 , page = A16:1–A16:18 , title = Near-optimal linear decision trees for {{mvar, k-sum and related problems , volume = 66 , year = 2019, s2cid = 145914158
{{cite journal , first=Jean-Luc , last=Lambert , title=Sorting the sums ({{mvar, xi + {{mvar, yj) in {{math, ''O''(''n''2) comparisons , journal=
Theoretical Computer Science computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumscribe the ...
, volume=103 , issue=1 , pages=137–141 , year=1992 , doi=10.1016/0304-3975(92)90089-X, mr=1181041, doi-access=free
{{cite conference , last1 = LaPaugh , first1 = Andrea S. , author1-link = Andrea LaPaugh , last2 = Pinter , first2 = Ron Y. , author2-link = Ron Pinter , contribution = On minimizing channel density by lateral shifting , pages = 123–124 , title = IEEE International Conference on Computer-Aided Design , year = 1983 As cited by {{cite conference , last1 = Johnson , first1 = David S. , author1-link = David S. Johnson , last2 = LaPaugh , first2 = Andrea S. , last3 = Pinter , first3 = Ron Y. , author3-link = Ron Pinter , editor-last = Sleator , editor-first = Daniel Dominic , contribution = Minimizing channel density by lateral shifting of components , contribution-url = https://dl.acm.org/citation.cfm?id=314464.314486 , pages = 122–131 , title = Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms. 23-25 January 1994, Arlington, Virginia, USA , year = 1994 {{cite OEIS, A343245 {{cite book , first=Steven , last=Skiena , author-link=Steven Skiena , title=The Algorithm Design Manual , publisher=Springer , year=2008 , contribution=4.4 War Story: Give me a Ticket on an Airplane, pages=118–120, edition=2nd, doi=10.1007/978-1-84800-070-4_4 {{cite web , first1=Erik , last1=Demaine , author-link1=Erik Demaine , first2=Jeff , last2=Erickson , first3=Joseph , last3=O'Rourke, author3-link=Joseph O'Rourke (professor) , title=Problem 41: Sorting {{math, ''X'' + ''Y'' (Pairwise Sums) , date=20 August 2006 , access-date=23 September 2014 , website=The Open Problems Project , url=http://cs.smith.edu/~orourke/TOPP/P41.html Sorting algorithms Unsolved problems in computer science