HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, the largest differencing method is an algorithm for solving the
partition problem In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset ''S'' of positive integers can be partitioned into two subsets ''S''1 and ''S''2 such that the sum of the numbers ...
and the
multiway number partitioning In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in t ...
. It is also called the Karmarkar–Karp algorithm after its inventors,
Narendra Karmarkar Narendra Krishna Karmarkar (born Circa 1956) is an Indian Mathematician. Karmarkar developed Karmarkar's algorithm. He is listed as an ISI highly cited researcher. He invented one of the first provably polynomial time algorithms for linear p ...
and Richard M. Karp. It is often abbreviated as LDM.


The algorithm

The input to the algorithm is a set ''S'' of numbers, and a parameter ''k''. The required output is a partition of ''S'' into ''k'' subsets, such that the sums in the subsets are as nearly equal as possible. The main steps of the algorithm are: # Order the numbers from large to small. #Replace the largest and second-largest numbers by their difference. #If two or more numbers remain, return to step 1. # Using
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it d ...
, compute the partition.


Two-way partitioning

For ''k''=2, the main step (2) works as follows. * Take the two largest numbers in ''S'', remove them from ''S'', and insert their difference (this represents a decision to put each of these numbers in a different subset). * Proceed in this way until a single number remains. This single number is the difference in sums between the two subsets. For example, if S = , then the resulting difference-sets are after taking out the largest two numbers and inserting the difference 8-7=1 back; Repeat the steps and then we have , then then . Step 3 constructs the subsets in the partition by backtracking. The last step corresponds to ,. Then 2 is replaced by 3 in one set and 1 in the other set: ,, then ,, then , , then , , where the sum-difference is indeed 2. The runtime complexity of this algorithm is dominated by the step 1 (sorting), which takes O(''n'' log ''n''). Note that this partition is not optimal: in the partition , the sum-difference is 0. However, there is evidence that it provides a "good" partition: * If the numbers are uniformly distributed in ,1 then the expected difference between the two sums is n^. This also implies that the expected ratio between the maximum sum and the optimal maximum sum is 1+n^ . * When there are at most 4 items, LDM returns the optimal partition. *LDM always returns a partition in which the largest sum is at most 7/6 times the optimum. This is tight when there are 5 or more items. *On random instances, this approximate algorithm performs much better than
greedy number partitioning In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set ''S'' of numbers, and a parameter ''k''. The required output is a partition of ''S'' into ''k'' su ...
. However, it is still bad for instances where the numbers are exponential in the size of the set.


Multi-way partitioning

For any ''k'' ≥ 2, the algorithm can be generalized in the following way. * Initially, for each number ''i'' in ''S'', construct a ''k''-tuple of subsets, in which one subset is and the other ''k''-1 subsets are empty. * In each iteration, select two ''k''-tuples ''A'' and ''B'' in which the difference between the maximum and minimum sum is largest, and combine them in reverse order of sizes, i.e.: smallest subset in ''A'' with largest subset in ''B'', second-smallest in ''A'' with second-largest in ''B'', etc. * Proceed in this way until a single partition remains. Examples: * If S = and ''k''=2, then the initial partitions are (,), (,), (,), (,), (,). After the first step we get (,), (,), (,), (,). Then (,), (,), (,). Then (,), (,), and finally (,), where the sum-difference is 2; this is the same partition as described above. * If S = and ''k''=3, then the initial partitions are (,,), (,,), (,,), (,,), (,,). After the first step we get (,,), (,,), (,,), (,,). Then (,,), (,,), (,,). Then (,,), (,,), and finally (,,), where the sum-difference is 3. * If S = and ''k''=3, then after 7 iterations we get the partition (,,). This solution is not optimal; a better partitioning is provided by the grouping (,,). There is evidence for the good performance of LDM: * Simulation experiments show that, when the numbers are uniformly random in ,1 LDM always performs better (i.e., produces a partition with a smaller largest sum) than
greedy number partitioning In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set ''S'' of numbers, and a parameter ''k''. The required output is a partition of ''S'' into ''k'' su ...
. It performs better than the
multifit algorithm The multifit algorithm is an algorithm for multiway number partitioning, originally developed for the problem of identical-machines scheduling. It was developed by Coffman, Garey and Johnson. Its novelty comes from the fact that it uses an algorithm ...
when the number of items ''n'' is sufficiently large. When the numbers are uniformly random in 'o'', ''o''+1 from some ''o''>0, the performance of LDM remains stable, while the performance of multifit becomes worse as ''o'' increases. For ''o''>0.2, LDM performs better. *Let ''f*'' be the optimal largest sum. If all numbers are larger than ''f''*/3, then LDM returns the optimal solution. Otherwise, LDM returns a solution in which the difference between largest and smallest sum is at most the largest number which is at most ''f''*/3. *When there are at most ''k''+2 items, LDM is optimal. *When the number of items ''n'' is between ''k''+2 and 2''k'', the largest sum in the LDM partition is at most \frac-\frac times the optimum, *In all cases, the largest sum in the LDM partition is at most \frac-\frac times the optimum, and there are instances in which it is at least \frac-\frac times the optimum. *For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is n^.


Balanced two-way partitioning

Several variants of LDM were developed for the balanced number partitioning problem, in which all subsets must have the same cardinality (up to 1). PDM (Paired Differencing Method) works as follows. # Order the numbers from large to small. #Replace numbers #1 and #2 by their difference; #3 and #4 by their difference; etc. #If two or more numbers remain, return to step 1. # Using
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it d ...
, compute the partition. PDM has average properties worse than LDM. For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is \Theta(1/n). RLDM (Restricted Largest Differencing Method) works as follows. #Order the numbers from large to small. #Replace numbers #1 and #2 by their difference; #3 and #4 by their difference; etc. #Sort the list of ''n''/2 differences from large to small. #Assign each pair in turn to different sets: the largest in the pair to the set with the smallest sum, and the smallest in the pair to the set with the largest sum. For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is O(\log/n^2). BLDM (Balanced Largest Differencing Method) works as follows. #Order the numbers from large to small. #Replace numbers #1 and #2 by their difference; #3 and #4 by their difference; etc. #Run LDM on the set of differences. BLDM has average properties similar to LDM. For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is n^. For multi-way partitioning, when ''c''=ceiling(''n''/''k'') and each of the ''k'' subsets must contain either ceiling(''n''/''k'') or floor(''n''/''k'') items, the approximation ratio of BLDM for the minimum largest sum is exactly 4/3 for ''c''=3, 19/12 for ''c''=4, 103/60 for ''c''=5, 643/360 for ''c''=6, and 4603/2520 for ''c''=7. The ratios were found by solving a mixed integer linear program. In general (for any ''c''), the approximation ratio is at least 2-\sum_^\frac and at most 2-\frac. The MILP results for 3,4,5,6,7 correspond to the lower bound. When the parameter is the number of subsets (''k''), the approximation ratio is exactly 2-\frac.


Min-max subsequence problem

In the ''min-max subsequence problem'', the input is a multiset of ''n'' numbers and an integer parameter ''k'', and the goal is to order the numbers such that the largest sum of each block of adjacent ''k'' numbers is as small as possible. The problem occurs in the design of video servers. This problem can be solved in polytime for ''k''=2, but it is strongly NP-hard for ''k≥''3. A variance of the differencing method can applied to this problem.


An exact algorithm

The complete Karmarkar–Karp algorithm (CKK) finds an optimal solution by constructing a tree of degree k!. * In the case ''k''=2, each level corresponds to a pair of numbers, and the two branches correspond to taking their difference (i.e. putting them in different sets), or taking their sum (i.e. putting them in the same set). * For general ''k'', each level corresponds to a pair of ''k''-tuples, and each of the k! branches corresponds to a different way of combining the subsets in these tuples. For ''k''=2, CKK runs substantially faster than the Complete Greedy Algorithm (CGA) on random instances. This is due to two reasons: when an equal partition does not exist, CKK often allows more trimming than CGA; and when an equal partition does exist, CKK often finds it much faster and thus allows earlier termination. Korf reports that CKK can optimally partition 40 15-digit double-precision numbers in about 3 hours, while CGA requires about 9 hours. In practice, with ''k''=2, problems of arbitrary size can be solved by CKK if the numbers have at most 12
significant digits Significant figures (also known as the significant digits, ''precision'' or ''resolution'') of a number in positional notation are digits in the number that are reliable and necessary to indicate the quantity of something. If a number expre ...
; with ''k''=3, at most 6 significant digits. CKK can also run as an
anytime algorithm In computer science, an anytime algorithm is an algorithm that can return a valid solution to a problem even if it is interrupted before it ends. The algorithm is expected to find better and better solutions the longer it keeps running. Most algo ...
: it finds the KK solution first, and then finds progressively better solutions as time allows (possibly requiring exponential time to reach optimality, for the worst instances). Combining CKK with the balanced-LDM algorithm (BLDM) yields a complete
anytime algorithm In computer science, an anytime algorithm is an algorithm that can return a valid solution to a problem even if it is interrupted before it ends. The algorithm is expected to find better and better solutions the longer it keeps running. Most algo ...
for solving the
balanced partition problem Balanced number partitioning is a variant of multiway number partitioning in which there are constraints on the number of items allocated to each set. The input to the problem is a set of ''n'' items of different sizes, and two integers ''m'', ...
.


Previous mentions

An algorithm equivalent to the Karmarkar-Karp differencing heuristic is mentioned in ancient Jewish legal texts by Nachmanides and
Joseph ibn Habib Joseph ibn Habiba ( he, יוסף חביבא), also known as Joseph Havivah and Nimmukei Yosef, after the title of his book, was a Spanish Talmudist who flourished in the 14th and 15th centuries. He lived in Barcelona. Nimmukei Yosef Like his pred ...
. The algorithm is used to combine different testimonies about the same loan.


Implementations

* Python: Th
prtpy library
contains implementations of th
Karmarkar-Karp algorithm
an
complete Karmarkar-Karp algorithm


References

{{reflist Number partitioning