computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...

, multiway number partitioning is the problem of partitioning a

multiset In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that ...

of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by

Ronald Graham Ronald Lewis Graham (October 31, 1935July 6, 2020) was an American mathematician credited by the American Mathematical Society as "one of the principal architects of the rapid development worldwide of discrete mathematics in recent years". He ...

in 1969 in the context of the

Identical-machines scheduling Identical-machines scheduling is an optimization problem in computer science and operations research. We are given ''n'' jobs ''J''1, ''J''2, ..., ''Jn'' of varying processing times, which need to be scheduled on ''m'' identical machines, such th ...

problem. The problem is parametrized by a positive integer ''k'', and called ''k''-way number partitioning. The input to the problem is a multiset ''S'' of numbers (usually integers), whose sum is ''k*T''. The associated

decision problem In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whethe ...

is to decide whether ''S'' can be partitioned into ''k'' subsets such that the sum of each subset is exactly ''T''. There is also an

optimization problem In mathematics, computer science and economics, an optimization problem is the problem of finding the ''best'' solution from all feasible solutions. Optimization problems can be divided into two categories, depending on whether the variables ...

: find a partition of ''S'' into ''k'' subsets, such that the ''k'' sums are "as near as possible". The exact optimization objective can be defined in several ways: * Minimize the difference between the largest sum and the smallest sum. This objective is common in papers about multiway number partitioning, as well as papers originating from physics applications. * Minimize the largest sum. This objective is equivalent to one objective for

. There are ''k'' identical processors, and each number in ''S'' represents the time required to complete a single-processor job. The goal is to partition the jobs among the processors such that the

makespan In operations research, the makespan of a project is the length of time that elapses from the start of work to the end. This type of multi-mode resource constrained project scheduling problem (MRCPSP) seeks to create the shortest logical project sc ...

(the finish time of the last job) is minimized. * Maximize the smallest sum. This objective corresponds to the application of

fair item allocation Fair item allocation is a kind of a fair division problem in which the items to divide are ''discrete'' rather than continuous. The items have to be divided among several partners who value them differently, and each item has to be given as a whole ...

, particularly the

maximin share Maximin share (MMS) is a criterion of fair item allocation. Given a set of items with different values, the ''1-out-of-n maximin-share'' is the maximum value that can be gained by partitioning the items into ''n'' parts and taking the part with th ...

. It also appears in voting manipulation problems, and in sequencing of maintenance actions for modular gas turbine aircraft engines. Suppose there are some ''k'' engines, which must be kept working for as long as possible. An engine needs a certain critical part in order to operate. There is a set ''S'' of parts, each of which has a different lifetime. The goal is to assign the parts to the engines, such that the shortest engine lifetime is as large as possible. These three objective functions are equivalent when ''k''=2, but they are all different when ''k''≥3. All these problems are

NP-hard In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...

, but there are various algorithms that solve it efficiently in many cases. Some closely-related problems are: * The

partition problem In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset ''S'' of positive integers can be partitioned into two subsets ''S''1 and ''S''2 such that the sum of the numbers ...

- a special case of multiway number partitioning in which the number of subsets is 2. * The

3-partition problem The 3-partition problem is a strongly NP-complete problem in computer science. The problem is to decide whether a given multiset of integers can be partitioned into triplets that all have the same sum. More precisely: * The input to the proble ...

- a different and harder problem, in which the number of subsets is not considered a fixed parameter, but is determined by the input (the number of sets is the number of integers divided by 3). *The

bin packing problem The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has ma ...

- a dual problem in which the total sum in each subset is bounded, but ''k'' is flexible; the goal is to find a partition with the smallest possible ''k''. The optimization objectives are closely related: the optimal number of ''d''-sized bins is at most ''k'', iff the optimal size of a largest subset in a ''k''-partition is at most ''d''. *The

uniform-machines scheduling Uniform machine scheduling (also called uniformly-related machine scheduling or related machine scheduling) is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. We are given ''n'' jobs '' ...

problem - a more general problem in which different processors may have different speeds.

Approximation algorithms

There are various algorithms that obtain a guaranteed approximation of the optimal solution in polynomial time. There are different approximation algorithms for different objectives.

Minimizing the largest sum

The ''approximation ratio'' in this context is the largest sum in the solution returned by the algorithm, divided by the largest sum in the optimal solution (the ratio is larger than 1). Most algorithms below were developed for

identical-machines scheduling Identical-machines scheduling is an optimization problem in computer science and operations research. We are given ''n'' jobs ''J''1, ''J''2, ..., ''Jn'' of varying processing times, which need to be scheduled on ''m'' identical machines, such th ...

. *

Greedy number partitioning In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set ''S'' of numbers, and a parameter ''k''. The required output is a partition of ''S'' into ''k'' ...

(also called the Largest Processing Time in the scheduling literature) loops over the numbers, and puts each number in the set whose current sum is smallest. If the numbers are not sorted, then the runtime is

O(n)

and the approximation ratio is at most

2 - 1/k

. Sorting the numbers increases the runtime to

O(n\log)

and improves the approximation ratio to 7/6 when ''k''=2, and

\frac - \frac = \frac

in general. If the numbers are distributed uniformly in ,1 then the approximation ratio is at most

1 + O(\log/n)

almost surely In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (or Lebesgue measure 1). In other words, the set of possible exceptions may be non-empty, but it has probability 0 ...

, and

1 + O(1/n)

in expectation. * Largest Differencing Method (also called the Karmarkar-Karp algorithm ) sorts the numbers in descending order and repeatedly replaces numbers by their differences. The runtime complexity is

O(n\log)

. In the worst case, its approximation ratio is similar - at most 7/6 for ''k'' =2, and at most

4/3 - 1/3k

in general. However, in the average case it performs much better than the greedy algorithm: for ''k'' =2, when numbers are distributed uniformly in ,1 its approximation ratio is at most

1 + 1/n^

in expectation. It also performs better in simulation experiments. * The

Multifit algorithm The multifit algorithm is an algorithm for multiway number partitioning, originally developed for the problem of identical-machines scheduling. It was developed by Coffman, Garey and Johnson. Its novelty comes from the fact that it uses an algorithm ...

uses binary search combined with an algorithm for

bin packing The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has m ...

. In the worst case, its makespan is at most 8/7 for ''k'' =2, and at most 13/11 in general. Several

polynomial-time approximation scheme In computer science (particularly algorithmics), a polynomial-time approximation scheme (PTAS) is a type of approximation algorithm for optimization problems (most often, NP-hard optimization problems). A PTAS is an algorithm which takes an in ...

s (PTAS) have been developed: *Graham presented the following algorithm. For any integer ''r>0'', choose the ''r'' largest numbers in ''S'' and partition them optimally. Then allocate the remaining numbers arbitrarily. This algorithm has approximation ratio

1 + \frac

and it runs in time

O(2^r n\log)

. *Sahni presented a PTAS that attains (1+ε)OPT in time

O(n\cdot (n^2 / \epsilon)^)

. It is an FPTAS if ''k'' is fixed. For ''k''=2, the run-time improves to

O(n^2 / \epsilon)

. The algorithm uses a technique called ''interval partitioning''. *Hochbaum and Shmoys presented the following algorithms, which work even when ''k'' is part of the input. **For any ''r'' >0, an algorithm with approximation ratio at most (6/5+2^−''r'' ) in time

O(n(r+\log))

. **For any ''r'' >0, an algorithm with approximation ratio at most (7/6+2^−''r'' ) in time

O(n(r k^4+\log))

. **For any ''ε''>0, an algorithm with approximation ratio at most (1+ε) in time

O((n/\varepsilon)^)

. This is a PTAS.

Maximizing the smallest sum

The approximation ratio in this context is the smallest sum in the solution returned by the algorithm, divided by the smallest sum in the optimal solution (the ratio is less than 1). * For

greedy number partitioning In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set ''S'' of numbers, and a parameter ''k''. The required output is a partition of ''S'' into ''k'' ...

, if the numbers are not sorted then the worst-case approximation ratio is 1/''k''. Sorting the numbers increases the approximation ratio to 5/6 when ''k''=2, and

\frac

in general, and it is tight. *Woeginger presented a PTAS that attains an approximation factor of

1-

in time

O(c_n\log)

, where

c_

a huge constant that is exponential in the required approximation factor ε. The algorithm uses

Lenstra's algorithm An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective ...

for

integer linear programming An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective ...

. *The FPTAS of Sahni works for this objective too.

Maximizing the sum of products

Jin studies a problem in which the goal is to maximize the sum, over every set ''i'' in 1,...,''k'', of the product of numbers in set ''i''. In a more general variant, each set ''i'' may have a weight ''w_i'', and the goal is to maximize the ''weighted'' sum of products. This problem has an exact solution that runs in time O(''n''²).

A PTAS for general objective functions

Let ''C_i'' (for ''i'' between 1 and ''k'') be the sum of subset ''i'' in a given partition. Instead of minimizing the objective function max(''C_i''), one can minimize the objective function max(''f''(''C_i'')), where ''f'' is any fixed function. Similarly, one can minimize the objective function sum(''f''(''C_i'')), or maximize min(f(''C_i'')), or maximize sum(''f''(''C_i'')). Alon, Azar, Woeginger and Yadid presented general PTAS-s (generalizing the PTAS-s of Sanhi, Hochbaum and Shmoys, and Woeginger) for these four problems. Their algorithm works for any ''f'' which satisfies the following two conditions: # A strong continuity condition called ''Condition F*'': for every ε>0 there exists δ>0 such that, if , ''y''-''x'', <δ''x'', then , f(''y'')-f(''x''), <ε''f''(''x''). #

Convexity Convex or convexity may refer to: Science and technology * Convex lens, in optics Mathematics * Convex set, containing the whole line segment that joins points ** Convex polygon, a polygon which encloses a convex set of points ** Convex polytope ...

(for the minimization problems) or concavity (for the maximization problems). The runtime of their PTAS-s is linear in ''n'' (the number of inputs), but exponential in the approximation precision. The PTAS for minimizing sum(''f''(''C_i'')) is based on some combinatorial observations: # Let ''L'' := the average sum in a single subset (1/''k'' the sum of all inputs). If some input ''x'' is at least ''L'', then there is an optimal partition in which one part contains only ''x''. This follows from the convexity of ''f''. Therefore, the input can be pre-processes by assigning each such input to a unique subset. After this preprocessing, one can assume that all inputs are smaller than ''L''. #There is an optimal partition in which all subsets sums are strictly betweel ''L''/2 and 2''L'' (''L''/2 < ''C_i'' < 2''L'' for all ''i'' in 1,...,''k''). Particularly, the partition minimizing the sum of squares ''C_i''², among all optimal partitions, satisfies these inequalities. The PTAS uses an ''input rounding'' technique. Given the input sequence ''S ='' (''v''₁,...,''v_n'') and a positive integer ''d'', the rounded sequence ''S^#''(''d'') is defined as follows: * For any ''v_j'' > ''L/d'', the sequence ''S^#''(''d'') contains an input ''v_j^#'' which is ''v_j'' rounded up to the next integer multiple of ''L''/''d''². Note that ''v_j'' ≤ ''v_j''^# < ''v_j'' +''L''/''d''², and ''L/d''² < ''v_j'' /''d'', so ''v_j'' ^# < (''d''+1)''v_j'' /''d.'' *In addition, the sequence ''S^#''(''d'') contains some inputs equal to ''L/d''. The number of these inputs is determined such that the sum of all these new inputs equals the sum of all inputs in ''S^#''(''d'') that are at most ''L/d'', rounded up to the next integer multiple of ''L/d'' (for example, if the sum of all "short" inputs in ''S'' is 51.3''L/d'', then 52 new ''L/d'' inputs are added to ''S^#''(''d'')). In ''S^#''(''d''), all inputs are integer multiples of ''L''/''d''². Moreover, the above two observations hold for ''S^#''(''d'') too: # Let ''L''^# be the average sum in a single subset (1/''k'' the sum of all inputs in ''S^#''(''d'')). By construction, ''L''^# is at least ''L''. Since ''L'' itself is an integer multiple of ''L''/''d''², the rounding-up of inputs smaller than ''L'' cannot make them larger than ''L''. Therefore, all inputs in ''S^#''(''d'') are smaller than ''L'', and hence smaller than L^#. #There is an optimal partition of ''S^#''(''d'') in which all subset sums are strictly between ''L''^#/2 and 2''L^#.'' Therefore, all subsets contain at most 2''d'' elements (since all inputs in ''S^#''(''d'') are at least ''L''/''d''). Based on these observations, all inputs in ''S^#''(''d'') are of the form ''hL''/''d''², where ''h'' is an integer in the range

(d,d+1,\ldots,d^2)

. Therefore, the input can be represented as an integer vector

\mathbf = (n_d, n_, \ldots, n_)

, where

n_h

is the number of ''hL''/''d''² inputs in ''S^#''(''d''). Moreover, each subset can be represented as an integer vector

\mathbf = (t_d, t_, \ldots, t_)

, where

x_h

is the number of ''hL''/''d''² inputs in the subset. The subset sum is then

C(\mathbf) = \sum_^ t_h\cdot (h L/d^2)

. Denote by

T

, the set of vectors

\mathbf

with

L^/2 < C(\mathbf) < 2 L^

. Since the sum of elements in such a vector is at most 2''d'', the total number of these elements is smaller than

^ = d^

, so

, T, \leq d^

. There are two different ways to find an optimal solution to ''S^#''(''d''). One way uses dynamic programming: its run-time is a polynomial whose exponent depends on ''d''. The other way uses

for integer linear programming.

Dynamic programming solution

Define

VAL(k, \mathbf)

as the optimal (minimum) value of the objective function sum(''f''(''C_i'')), when the input vector is

\mathbf = (n_d, n_, \ldots, n_)

and it has to be partitioned into ''k'' subsets, among all partitions in which all subset sums are strictly between ''L''^#/2 and 2''L^#.'' It can be solved by the following

recurrence relation In mathematics, a recurrence relation is an equation according to which the nth term of a sequence of numbers is equal to some combination of the previous terms. Often, only k previous terms of the sequence appear in the equation, for a paramete ...

: *

VAL(0, \mathbf) = 0

- since their objective sum is empty. *

VAL(1, \mathbf) = f(C(\mathbf))

L^/2 < C(\mathbf)) < 2 L^

- since all inputs must be assigned to a single subset, so its sum is

C(\mathbf)

. *

VAL(1, \mathbf) = \infty

otherwise - since we do not consider optimal solutions outside this range. *

VAL(k, \mathbf) = \min_(C(\mathbf)) + VAL(k-1, \mathbf-\mathbf) /math> for all k\geq 2 : we check all options for the ''k''-th subset, and combine it with an optimal partition of the remainder into ''k''-1 subsets.
For each ''k'' and n, the recurrence relation requires to check at most, T, vectors. The total number of vectors n to check is at most n^, where ''n'' is the original number of inputs. Therefore, the run-time of the dynamic programming algorithm is O(k\cdot n^\cdot d^) . It is linear in ''n'' for any fixed ''d''.

Integer linear programming solution

For each vector t in ''T'', introduce a variable ''x''_t denoting the number of subsets with this configuration. Minimizing sum(''f''(''C_i'')) can be attained by the solving the following ILP: * Minimize

\sum_ x_\cdot f(C(\mathbf))

* subject to

\sum_ x_ = k

(the total number of subsets) * and

\sum_ x_ \cdot \mathbf= \mathbf

(the total vector of inputs) * and

x_ \geq 0

. The number of variables is at most

d^

, and the number of equations is

d^+d^2-d+2

- both are constants independent of ''n'', ''k''. Therefore,

can be used. Its run-time is exponential in the dimension (

d^

), but polynomial in the binary representation of the coefficients, which are in O(log(''n'')). Constructing the ILP itself takes time O(''n'').

Converting the solution from the rounded to the original instance

The following lemmas relate the partitions of the rounded instance ''S^#''(''d'') and the original instance ''S''. * For every partition of ''S'' with sums ''C_i'', there is a partition of ''S^#''(''d'') with sums ''C_i^#'', where

C_i - \frac \leq C_i^ \leq  \fracC_i  +  \frac

. * For every partition of ''S^#''(''d'') with sums ''C_i^#'', there is a partition of ''S'' with sums ''C_i'', where

\fracC^_i - 2 \frac \leq C_i \leq  C^_i  +  \frac

, and it can be found in time O(''n''). Given a desired approximation precision ε>0, let δ>0 be the constant corresponding to ε/3, whose existence is guaranteed by Condition F*. Let

d := \lceil 5/\delta \rceil

. It is possible to show that converted partition of ''S'' has a total cost of at most

(1+\frac)\cdot OPT \leq(1+ \epsilon)\cdot OPT

, so the approximation ratio is 1+ε.

Non-existence of PTAS for some objective functions

# In contrast to the above result, if we take f(''x'') = 2^''x'', or f(''x'')=(''x''-1)², then no PTAS for minimizing sum(''f''(''C_i'')) exists unless

P=NP The P versus NP problem is a major unsolved problem in theoretical computer science. In informal terms, it asks whether every problem whose solution can be quickly verified can also be quickly solved. The informal term ''quickly'', used abov ...

. Note that these f(''x'') are convex, but they do not satisfy Condition F* above. The proof is by reduction from

Exact algorithms

There are

exact algorithm In computer science and operations research, exact algorithms are algorithms that always solve an optimization problem to optimality. Unless P = NP, an exact algorithm for an NP-hard optimization problem cannot run in worst-case polynomial time ...

s, that always find the optimal partition. Since the problem is NP-hard, such algorithms might take exponential time in general, but may be practically usable in certain cases. * The

pseudopolynomial time number partitioning In computer science, pseudopolynomial time number partitioning is a pseudopolynomial time algorithm for solving the partition problem. The problem can be solved using dynamic programming when the size of the set and the size of the sum of the int ...

takes memory, where is the largest number in the input. It is practical only when ''k''=2, or when ''k''=3 and the inputs are small integers. * The Complete Greedy Algorithm (CGA) considers all partitions by constructing a ''k''- ary tree. Each level in the tree corresponds to an input number, where the root corresponds to the largest number, the level below to the next-largest number, etc. Each of the ''k'' branches corresponds to a different set in which the current number can be put. Traversing the tree in

depth-first Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alo ...

order requires only O(''n'') space, but might take O(''kⁿ'') time. The runtime can be improved by using a greedy heuristic: in each level, develop first the branch in which the current number is put in the set with the smallest sum. This algorithm finds first the solution found by

, but then proceeds to look for better solutions. * The Complete Karmarkar-Karp algorithm (CKK) considers all partitions by constructing a tree of degree

k!

. Each level corresponds to a pair of ''k''-tuples, and each of the

k!

branches corresponds to a different way of combining these ''k''-tuples. This algorithm finds first the solution found by the

largest differencing method In computer science, the largest differencing method is an algorithm for solving the partition problem and the multiway number partitioning. It is also called the Karmarkar–Karp algorithm after its inventors, Narendra Karmarkar and Richard M. Kar ...

, but then proceeds to find better solutions. For ''k'' =2 and ''k'' =3, CKK runs substantially faster than CGA on random instances. The advantage of CKK over CGA is much larger in the latter case (when an equal partition exists), and can be of several orders of magnitude. In practice, with ''k''=2, problems of arbitrary size can be solved by CKK if the numbers have at most 12

significant digit Significant figures (also known as the significant digits, ''precision'' or ''resolution'') of a number in positional notation are digits in the number that are reliable and necessary to indicate the quantity of something. If a number expres ...

s; with ''k''=3, at most 6 significant digits. CKK can also run as an

anytime algorithm In computer science, an anytime algorithm is an algorithm that can return a valid solution to a problem even if it is interrupted before it ends. The algorithm is expected to find better and better solutions the longer it keeps running. Most algor ...

: it finds the KK solution first, and then finds progressively better solutions as time allows (possibly requiring exponential time to reach optimality, for the worst instances). For ''k'' ≥ 4, CKK becomes much slower, and CGA performs better. *Korf, Schreiber and Moffitt presented hybrid algorithms, combining CKK, CGA and other methods from the

subset sum problem The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sum to precisely T''.'' ...

and the

to achieve an even better performance. Their 2018 journal paper summarizes works from several previous conference papers: **Recursive Number Partitioning (RNP) uses CKK for ''k''=2, but for ''k''>2 it recursively splits ''S'' into subsets and splits ''k'' into halves. **Hybrid recursive number partitioning (HRNP). **Improved bin completion. **Improved search strategies. **Few machines algorithm. **Cached iterative weakening (CIW). **Sequential partitioning.

Reduction to bin packing

The

has many fast solvers. A BP solver can be used to find an optimal number partitioning. The idea is to use

binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...

to find the optimal makespan. To initialize the binary search, we need a lower bound and an upper bound: * Some lower bounds on the makespan are: (sum ''S'')/''k'' - the average value per subset, ''s''₁ - the largest number in ''S'', and ''s''_k + ''s''_k+1 - the size of a bin in the optimal partition of only the largest ''k''+1 numbers. * Some upper bounds can be attained by running heuristic algorithms, such as the greedy algorithm or KK. Given a lower and an upper bound, run the BP solver with bin size middle := (lower+upper)/2. * If the result contains more than ''k'' bins, then the optimal makespan must be larger: set lower to middle and repeat. * If the result contains at most ''k'' bins, then the optimal makespan may be smaller set higher to middle and repeat.

Variants

In the balanced number partitioning problem, there are constraints on the number of items that can be allocated to each subset (these are called ''cardinality constraints''). Another variant is the multidimensional number partitioning.

Applications

One application of the partition problem is for manipulation of

elections An election is a formal group decision-making process by which a population chooses an individual or multiple individuals to hold public office. Elections have been the usual mechanism by which modern representative democracy has opera ...

. Suppose there are three candidates (A, B and C). A single candidate should be elected using the veto voting rule, i.e., each voter vetoes a single candidate and the candidate with the fewest vetoes wins. If a coalition wants to ensure that C is elected, they should partition their vetoes among A and B so as to maximize the smallest number of vetoes each of them gets. If the votes are weighted, then the problem can be reduced to the partition problem, and thus it can be solved efficiently using CKK. For ''k''=2, the same is true for any other voting rule that is based on scoring. However, for ''k''>2 and other voting rules, some other techniques are required.

Implementations

* Python: Th
prtpy package
contains code for various number-partitioning and bin-packing algorithms.

References

{{reflist Number partitioning