In
computer science, multiway number partitioning is the problem of partitioning a
multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by
Ronald Graham in 1969 in the context of the identical-machines scheduling problem.
The problem is parametrized by a positive integer ''k'', and called ''k''-way number partitioning.
The input to the problem is a multiset ''S'' of numbers (usually integers), whose sum is ''k*T''.
The associated
decision problem is to decide whether ''S'' can be partitioned into ''k'' subsets such that the sum of each subset is exactly ''T''. There is also an
optimization problem: find a partition of ''S'' into ''k'' subsets, such that the ''k'' sums are "as near as possible". The exact optimization objective can be defined in several ways:
* Minimize the difference between the largest sum and the smallest sum. This objective is common in papers about multiway number partitioning, as well as papers originating from physics applications.
* Minimize the largest sum. This objective is equivalent to one objective for identical-machines scheduling. There are ''k'' identical processors, and each number in ''S'' represents the time required to complete a single-processor job. The goal is to partition the jobs among the processors such that the makespan (the finish time of the last job) is minimized.
* Maximize the smallest sum. This objective corresponds to the application of
fair item allocation, particularly the
maximin share. It also appears in voting manipulation problems,
and in sequencing of maintenance actions for modular gas turbine aircraft engines. Suppose there are ''k'' engines, which must be kept working for as long as possible. An engine needs a certain critical part in order to operate. There is a set ''S'' of parts, each of which has a different lifetime. The goal is to assign the parts to the engines, such that the shortest engine lifetime is as large as possible.
These three objective functions are equivalent when ''k''=2, but they are all different when ''k''≥3.
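As a concrete illustration, the three objectives can be computed for any given partition (a minimal sketch; the function name is ours):

```python
def objectives(partition):
    """Compute the three objective values for a partition,
    given as a list of lists of numbers."""
    sums = [sum(part) for part in partition]
    return {
        "difference": max(sums) - min(sums),  # to be minimized
        "largest": max(sums),                 # to be minimized (makespan)
        "smallest": min(sums),                # to be maximized (maximin share)
    }

# For k=2 the two subset sums add up to a fixed total, so optimizing any
# one of the three objectives also optimizes the other two.
print(objectives([[8, 7], [6, 5, 4]]))  # sums are 15 and 15
```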
All these problems are NP-hard
, but there are various algorithms that solve them efficiently in many cases.
Some closely-related problems are:
* The
partition problem - a special case of multiway number partitioning in which the number of subsets is 2.
* The 3-partition problem - a different and harder problem, in which the number of subsets is not considered a fixed parameter, but is determined by the input (the number of sets is the number of integers divided by 3).
*The
bin packing problem - a dual problem in which the total sum in each subset is bounded, but ''k'' is flexible; the goal is to find a partition with the smallest possible ''k''. The optimization objectives are closely related: the optimal number of ''d''-sized bins is at most ''k'', iff the optimal size of a largest subset in a ''k''-partition is at most ''d''.
*The
uniform-machines scheduling problem - a more general problem in which different processors may have different speeds.
Approximation algorithms
There are various algorithms that obtain a guaranteed approximation of the optimal solution in polynomial time. There are different approximation algorithms for different objectives.
Minimizing the largest sum
The ''approximation ratio'' in this context is the largest sum in the solution returned by the algorithm, divided by the largest sum in the optimal solution (the ratio is larger than 1). Most algorithms below were developed for identical-machines scheduling.
* Greedy number partitioning (also called the Largest Processing Time in the scheduling literature) loops over the numbers, and puts each number in the set whose current sum is smallest. If the numbers are not sorted, then the runtime is O(''n'') and the approximation ratio is at most 2 − 1/''k''. Sorting the numbers increases the runtime to O(''n'' log ''n'') and improves the approximation ratio to 7/6 when ''k''=2, and 4/3 − 1/(3''k'') in general. If the numbers are distributed uniformly in [0,1], then the approximation ratio is at most 1 + O(log log ''n''/''n'') almost surely, and 1 + O(1/''n'') in expectation.
* Largest Differencing Method (also called the Karmarkar-Karp algorithm) sorts the numbers in descending order and repeatedly replaces numbers by their differences. The runtime complexity is O(''n'' log ''n''). In the worst case, its approximation ratio is similar - at most 7/6 for ''k''=2, and at most 4/3 − 1/(3''k'') in general. However, in the average case it performs much better than the greedy algorithm: for ''k''=2, when the numbers are distributed uniformly in [0,1], its approximation ratio is at most 1 + ''n''^(−Θ(log ''n'')) in expectation. It also performs better in simulation experiments.
* The Multifit algorithm uses binary search combined with an algorithm for bin packing. In the worst case, its makespan is at most 8/7 of the optimal makespan for ''k''=2, and at most 13/11 of the optimum in general.
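The three heuristics above can be sketched in a few lines each (minimal sketches; function names are ours, and the MultiFit capacity search is deliberately simplified):

```python
import heapq

def greedy_partition(numbers, k):
    """Greedy/LPT: sort descending, put each number in the subset
    whose current sum is smallest (tracked in a min-heap)."""
    heap = [(0, i) for i in range(k)]       # (current sum, subset index)
    subsets = [[] for _ in range(k)]
    for x in sorted(numbers, reverse=True):
        s, i = heapq.heappop(heap)
        subsets[i].append(x)
        heapq.heappush(heap, (s + x, i))
    return subsets

def ldm_difference(numbers):
    """Largest Differencing Method for k=2: repeatedly replace the two
    largest numbers by their difference (committing them to opposite
    subsets); the last number left is the final difference of sums."""
    heap = [-x for x in numbers]            # max-heap via negation
    heapq.heapify(heap)
    while len(heap) > 1:
        a, b = -heapq.heappop(heap), -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0

def ffd_bins(numbers, capacity):
    """Number of bins used by first-fit-decreasing."""
    bins = []
    for x in sorted(numbers, reverse=True):
        for i, load in enumerate(bins):
            if load + x <= capacity:
                bins[i] = load + x
                break
        else:
            bins.append(x)
    return len(bins)

def multifit(numbers, k, iterations=20):
    """MultiFit idea: binary-search the smallest bin capacity for which
    FFD packs the numbers into at most k bins."""
    lo = max(sum(numbers) / k, max(numbers))      # no makespan can be lower
    hi = max(2 * sum(numbers) / k, max(numbers))  # FFD always succeeds here
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ffd_bins(numbers, mid) <= k:
            hi = mid   # feasible capacity: try smaller
        else:
            lo = mid
    return hi
```

For example, on the input [8, 7, 6, 5, 4] with ''k''=2, the greedy rule produces subsets with sums 17 and 13, the differencing method leaves a final difference of 2, and the capacity search converges to the optimal makespan 15.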
Several
polynomial-time approximation schemes (PTAS) have been developed:
*Graham presented the following algorithm. For any integer ''r''>0, choose the ''r'' largest numbers in ''S'' and partition them optimally. Then allocate the remaining numbers greedily. This algorithm has approximation ratio 1 + (1 − 1/''k'')/(1 + ⌊''r''/''k''⌋), and it runs in time O(''k''^''r'' + ''n'').
*Sahni presented a PTAS that attains (1+ε)OPT in time O(''n''·(''n''²/ε)^(''k''−1)). It is an FPTAS if ''k'' is fixed. For ''k''=2, the run-time improves to O(''n''²/ε). The algorithm uses a technique called ''interval partitioning''.
*Hochbaum and Shmoys presented the following algorithms, which work even when ''k'' is part of the input.
**For any ''r''>0, an algorithm with approximation ratio at most (6/5+2^(−''r'')) in time O(''n''(''r''+log ''n'')).
**For any ''r''>0, an algorithm with approximation ratio at most (7/6+2^(−''r'')) in time O(''n''(''rk''⁴+log ''n'')).
**For any ''ε''>0, an algorithm with approximation ratio at most (1+ε) in time O((''n''/ε)^(1/ε²)). This is a PTAS.
Maximizing the smallest sum
The approximation ratio in this context is the smallest sum in the solution returned by the algorithm, divided by the smallest sum in the optimal solution (the ratio is less than 1).
* For greedy number partitioning, if the numbers are not sorted then the worst-case approximation ratio is 1/''k''. Sorting the numbers increases the approximation ratio to 5/6 when ''k''=2, and (3''k''−1)/(4''k''−2) in general, and these ratios are tight.
*Woeginger presented a PTAS that attains an approximation factor of 1 − ε in time O(''c''(ε)·''n'' log ''k''), where ''c''(ε) is a huge constant that is exponential in the required approximation factor ε. The algorithm uses Lenstra's algorithm for integer linear programming.
*The FPTAS of Sahni
works for this objective too.
Maximizing the sum of products
Jin studies a problem in which the goal is to maximize the sum, over every set ''i'' in 1,...,''k'', of the product of numbers in set ''i''. In a more general variant, each set ''i'' may have a weight ''w_i'', and the goal is to maximize the ''weighted'' sum of products. This problem has an exact solution that runs in time O(''n''²).
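The objective can be illustrated by brute force on small instances (a sketch of the objective only; this is not Jin's O(''n''²) algorithm, and the empty-set convention of an empty product equal to 1 is our assumption):

```python
from math import prod

def max_sum_of_products(numbers, k, weights=None):
    """Brute force: try every assignment of numbers to k sets and
    return the maximum (weighted) sum of per-set products.
    An empty set contributes its weight times 1 (empty product)."""
    weights = weights or [1] * k
    best = float("-inf")
    n = len(numbers)
    for code in range(k ** n):          # each base-k digit picks a set
        sets = [[] for _ in range(k)]
        c = code
        for x in numbers:
            sets[c % k].append(x)
            c //= k
        value = sum(w * prod(s) for w, s in zip(weights, sets))
        best = max(best, value)
    return best
```

For example, with numbers [2, 3, 4] and ''k''=2, putting all three numbers in one set gives 2·3·4 + 1 = 25, which beats every split.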
A PTAS for general objective functions
Let ''C_i'' (for ''i'' between 1 and ''k'') be the sum of subset ''i'' in a given partition. Instead of minimizing the objective function max(''C_i''), one can minimize the objective function max(''f''(''C_i'')), where ''f'' is any fixed function. Similarly, one can minimize the objective function sum(''f''(''C_i'')), or maximize min(''f''(''C_i'')), or maximize sum(''f''(''C_i'')). Alon, Azar, Woeginger and Yadid presented general PTAS-s (generalizing the PTAS-s of Sahni, Hochbaum and Shmoys, and Woeginger) for these four problems. Their algorithm works for any ''f'' which satisfies the following two conditions:
# A strong continuity condition called ''Condition F*'': for every ε>0 there exists δ>0 such that, if |''y''−''x''| < δ''x'', then |''f''(''y'')−''f''(''x'')| < ε''f''(''x'').
# Convexity (for the minimization problems) or concavity (for the maximization problems).
The runtime of their PTAS-s is linear in ''n'' (the number of inputs), but exponential in the approximation precision. The PTAS for minimizing sum(''f''(''C_i'')) is based on some combinatorial observations:
# Let ''L'' := the average sum in a single subset (1/''k'' times the sum of all inputs). If some input ''x'' is at least ''L'', then there is an optimal partition in which one part contains only ''x''. This follows from the convexity of ''f''. Therefore, the input can be pre-processed by assigning each such input to a unique subset. After this preprocessing, one can assume that all inputs are smaller than ''L''.
# There is an optimal partition in which all subset sums are strictly between ''L''/2 and 2''L'' (''L''/2 < ''C_i'' < 2''L'' for all ''i'' in 1,...,''k''). In particular, the partition minimizing the sum of squares ''C_i''², among all optimal partitions, satisfies these inequalities.
The PTAS uses an ''input rounding'' technique. Given the input sequence ''S'' = (''v_1'', ..., ''v_n'') and a positive integer ''d'', the rounded sequence ''S''#(''d'') is defined as follows:
* For any ''v_j'' > ''L''/''d'', the sequence ''S''#(''d'') contains an input ''v_j''# which is ''v_j'' rounded up to the next integer multiple of ''L''/''d''². Note that ''v_j'' ≤ ''v_j''# < ''v_j'' + ''L''/''d''², and ''L''/''d''² < ''v_j''/''d'', so ''v_j''# < (''d''+1)''v_j''/''d''.
* In addition, the sequence ''S''#(''d'') contains some inputs equal to ''L''/''d''. The number of these inputs is determined such that the sum of all these new inputs equals the sum of all inputs in ''S'' that are at most ''L''/''d'', rounded up to the next integer multiple of ''L''/''d'' (for example, if the sum of all "short" inputs in ''S'' is 51.3''L''/''d'', then 52 new ''L''/''d'' inputs are added to ''S''#(''d'')).
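The rounding construction can be sketched directly (a minimal sketch; the function and variable names are ours):

```python
import math

def rounded_instance(values, k, d):
    """Build S#(d): inputs larger than L/d are rounded up to the next
    multiple of L/d**2; the total mass of the short inputs is replaced
    by copies of L/d."""
    L = sum(values) / k       # average subset sum
    unit = L / d ** 2         # rounding granularity
    rounded, short_sum = [], 0.0
    for v in values:
        if v > L / d:
            rounded.append(math.ceil(v / unit) * unit)
        else:
            short_sum += v
    # round the total of the short inputs up to a multiple of L/d
    rounded += [L / d] * math.ceil(short_sum / (L / d))
    return rounded
```

For example, with values [1, 1, 1, 1, 4, 4], ''k''=2 and ''d''=2 we get ''L''=6, so the two inputs of size 4 are rounded up to 4.5 (the next multiple of ''L''/''d''²=1.5), and the four unit inputs are replaced by two inputs of size ''L''/''d''=3.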
In ''S''#(''d''), all inputs are integer multiples of ''L''/''d''². Moreover, the above two observations hold for ''S''#(''d'') too:
# Let ''L''# be the average sum in a single subset (1/''k'' times the sum of all inputs in ''S''#(''d'')). By construction, ''L''# is at least ''L''. Since ''L'' itself is an integer multiple of ''L''/''d''², the rounding-up of inputs smaller than ''L'' cannot make them larger than ''L''. Therefore, all inputs in ''S''#(''d'') are smaller than ''L'', and hence smaller than ''L''#.
# There is an optimal partition of ''S''#(''d'') in which all subset sums are strictly between ''L''#/2 and 2''L''#. Therefore, all subsets contain at most 2''d'' elements (since all inputs in ''S''#(''d'') are at least ''L''/''d'').
Based on these observations, all inputs in ''S''#(''d'') are of the form ''hL''/''d''², where ''h'' is an integer in the range [''d'', ''d''²]. Therefore, the input can be represented as an integer vector ''n'' = (''n_d'', ..., ''n_{d²}''), where ''n_h'' is the number of ''hL''/''d''² inputs in ''S''#(''d''). Moreover, each subset can be represented as an integer vector ''t'' = (''t_d'', ..., ''t_{d²}''), where ''t_h'' is the number of ''hL''/''d''² inputs in the subset. The subset sum is then ''C''(''t'') = Σ_''h'' ''t_h''·''hL''/''d''². Denote by ''T'' the set of vectors ''t'' with ''L''#/2 < ''C''(''t'') < 2''L''#. Since each subset contains at most 2''d'' elements, each coordinate ''t_h'' is at most 2''d''; as there are fewer than ''d''² coordinates, the total number of such vectors is smaller than (2''d''+1)^(''d''²), so |''T''| is a constant that depends only on ''d''.
There are two different ways to find an optimal solution to ''S''#(''d''). One way uses dynamic programming: its run-time is a polynomial whose exponent depends on ''d''. The other way uses Lenstra's algorithm for integer linear programming.
Dynamic programming solution
Define VAL(''m'', ''n'') as the optimal (minimum) value of the objective function sum(''f''(''C_i'')), when the input vector is ''n'' = (''n_d'', ..., ''n_{d²}'') and it has to be partitioned into ''m'' subsets, among all partitions in which all subset sums are strictly between ''L''#/2 and 2''L''#. It can be solved by the following recurrence relation:
* VAL(0, '''0''') = 0 - since the objective sum is empty.
* VAL(1, ''n'') = ''f''(''C''(''n'')) if ''L''#/2 < ''C''(''n'') < 2''L''# - since all inputs must be assigned to a single subset, so its sum is ''C''(''n'') = Σ_''h'' ''n_h''·''hL''/''d''².
* VAL(1, ''n'') = ∞ otherwise - since we do not consider optimal solutions outside this range.
* VAL(''m'', ''n'') = min over all vectors ''t'' in ''T'' with ''t'' ≤ ''n'' of [''f''(''C''(''t'')) + VAL(''m''−1, ''n''−''t'')] for ''m'' ≥ 2 - the first subset is some vector in ''T'', and the remaining inputs are partitioned optimally into ''m''−1 subsets. The answer to the original problem is VAL(''k'', ''n'').
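The recurrence can be sketched directly for small instances, memoizing over the pair (number of subsets, remaining item-count vector); the window bounds are passed in as parameters (a sketch with names of our choosing, not an efficient implementation):

```python
from functools import lru_cache
from itertools import product

def partition_min_sum_f(sizes, counts, k, f, lo, hi):
    """Minimum of sum(f(C_i)) over partitions of the multiset having
    counts[j] copies of sizes[j] into k subsets whose sums are
    strictly between lo and hi; returns infinity if infeasible."""
    INF = float("inf")

    @lru_cache(maxsize=None)
    def val(m, n):
        if m == 0:
            # the objective sum is empty; valid only if nothing remains
            return 0 if not any(n) else INF
        best = INF
        # enumerate every sub-vector t <= n for the next subset
        for t in product(*(range(c + 1) for c in n)):
            s = sum(th * sz for th, sz in zip(t, sizes))
            if lo < s < hi:
                rest = tuple(c - th for c, th in zip(n, t))
                best = min(best, f(s) + val(m - 1, rest))
        return best

    return val(k, tuple(counts))
```

For instance, with sizes (1, 2), counts (2, 2), ''k''=2 and ''f''(''s'')=''s''², the optimum splits the items into two subsets of sum 3, giving objective value 18.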