Multifit Algorithm
   HOME

TheInfoList



OR:

The multifit algorithm is an algorithm for
multiway number partitioning In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in t ...
, originally developed for the problem of
identical-machines scheduling Identical-machines scheduling is an optimization problem in computer science and operations research. We are given ''n'' jobs ''J''1, ''J''2, ..., ''Jn'' of varying processing times, which need to be scheduled on ''m'' identical machines, such that ...
. It was developed by Coffman, Garey and Johnson. Its novelty comes from the fact that it uses an algorithm for another famous problem - the
bin packing problem The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has ma ...
- as a subroutine.


The algorithm

The input to the algorithm is a set ''S'' of numbers, and a parameter ''n''. The required output is a partition of ''S'' into ''n'' subsets, such that the largest subset sum (also called the makespan) is as small as possible. The algorithm uses as a subroutine, an algorithm called '' first-fit-decreasing bin packing'' (FFD). The FFD algorithm takes as input the same set ''S'' of numbers, and a bin-capacity ''c''. It heuristically packs numbers into bins such that the sum of numbers in each bin is at most ''C'', aiming to use as few bins as possible. Multifit runs FFD multiple times, each time with a different capacity ''C'', until it finds some ''C'' such that FFD with capacity ''C'' packs ''S'' into at most ''n'' bins. To find it, it uses
binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the ...
as follows. # Let ''L'' := max ( sum(''S'') / ''n'', max(''S'') ). Note, with bin-capacity smaller than ''L'', every packing must use more than ''n'' bins. # Let ''U'' := max ( 2 sum(''S'') / ''n'', max(''S'') ). Note, with bin-capacity at least ''U'', FFD uses at most ''n'' bins. ''Proof'': suppose by contradiction that some input ''si'' did not fit into any of the first ''n'' bins. Clearly this is possible only if ''i'' ≥ ''n''+1. If ''si'' ''>'' C/2, then, since the inputs are ordered in descending order, the same inequality holds for all the first ''n''+1 inputs in ''S''. This means that sum(S) > (n+1)''C''/2 > ''n'' ''U''/2, a contradiction to the definition of ''U''. Otherwise, ''si'' ≤ C/2. So the sum of each of the first ''n'' bins is more than C/2. This again implies sum(S) > n ''C''/2 > ''n'' ''U''/2, contradiction. # Iterate ''k'' times (where ''k'' is a precision parameter): #* Let ''C'' := (''L''+''U'')/2. Run FFD on ''S'' with capacity ''C''. #** If FFD needs at most ''n'' bins, then decrease ''U'' by letting ''U'' := ''C''. #** If FFD needs more than ''n'' bins, then increase L by letting ''L'' := ''C.'' # Finally, run FFD with capacity ''U''. It is guaranteed to use at most ''n'' bins. Return the resulting scheduling.


Performance

Multifit is a
constant-factor approximation algorithm In computational complexity theory, the class APX (an abbreviation of "approximable") is the set of NP optimization problems that allow polynomial-time approximation algorithms with approximation ratio bounded by a constant (or constant-factor ap ...
. It always finds a partition in which the makespan is at most a constant factor larger than the optimal makespan. To find this constant, we must first analyze FFD. While the standard analysis of FFD considers approximation w.r.t. ''number of bins'' when the capacity is constant, here we need to analyze approximation w.r.t. ''capacity'' when the number of bins is constant. Formally, for every input size S and integer n, let OPT(S,n) be the smallest capacity such that ''S'' can be packed into ''n'' bins of this capacity. Note that OPT(S,n) is the value of the optimal solution to the original scheduling instance. Let r_n be the smallest real number such that, for every input ''S'', FFD with capacity r_n\cdot OPT(S,n) uses at most ''n'' bins.


Upper bounds

Coffman, Garey and Johnson prove the following upper bounds on r_n: * r_n\leq 8/7 \approx 1.14 for ''n'' = 2; * r_n\leq 15/13 \approx 1.15 for ''n'' = 3; *r_n\leq 20/17 \approx 1.176 for ''n'' = 4,5,6,7; *r_n\leq 122/100 = 1.22 for all ''n'' ≥ 8. During the MultiFit algorithm, the lower bound ''L'' is always a capacity for which it is impossible to pack ''S'' into ''n'' bins. Therefore, L < r_n\cdot OPT(S,n). Initially, the difference U-L is at most sum(''S'') / ''n'', which is at most OPT(S,n). After the MultiFit algorithm runs for ''k'' iterations, the difference shrinks ''k'' times by half, so U-L\leq (1/2)^k\cdot OPT(S,n). Therefore, U\leq (r_n + (1/2)^k)\cdot OPT(S,n). Therefore, the scheduling returned by MultiFit has makespan at most r_n + (1/2)^k times the optimal makespan. When k is sufficiently large, the approximation factor of MultiFit can be made arbitrarily close to r_n , which is at most 1.22. Later papers performed a more detailed analysis of MultiFit, and proved that its approximation ratio is at most 6/5=1.2, and later, at most 13/11≈1.182. The original proof of this missed some cases; presented a complete and simpler proof. The 13/11 cannot be improved: there is an instance with ''n''=13 in which the approximation ratio is exactly 13/11.


Lower bounds

For ''n''=4: the following shows that r_n\geq 20/17, which is tight. The inputs are 9,7,6,5,5, 4,4,4,4,4,4,4,4,4. They can be packed into 4 bins of capacity 17 as follows: * 9, 4, 4 * 7, 6, 4 * 5, 4, 4, 4 * 5, 4, 4, 4 But if we run FFD with bin capacity smaller than 20, then the filled bins are: * 9,7 does not fit* 6,5,5 does not fit* 4,4,4,4 does not fit* 4,4,4,4 * 4 Note that the sum in each of the first 4 bins is 16, so we cannot put another 4 inside it. Therefore, 4 bins are not sufficient. For ''n''=13: the following shows that r_n\geq 13/11, which is tight. The inputs can be packed into 13 bins of capacity 66 as follows: * 40,13,13 * 25,25,16 *25,24,17 But if we run FFD with bin capacity smaller than 66*13/11 = 78, then the filled bins are: * 40,25 *24, 24, 17 *17, 16, 16, 16 *13, 13, 13, 13, 13 *13 Note that the sum in each of the first 13 bins is 65, so we cannot put another 13 inside it. Therefore, 13 bins are not sufficient.


Performance with uniform machines

MultiFit can also be used in the more general setting called
uniform-machines scheduling Uniform machine scheduling (also called uniformly-related machine scheduling or related machine scheduling) is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. We are given ''n'' jobs '' ...
, where machines may have different processing speeds. When there are two uniform machines, the approximation factor is \sqrt/2. When MultiFit is combined with the
LPT algorithm Longest-processing-time-first (LPT) is a greedy algorithm for job scheduling. The input to the algorithm is a set of ''jobs'', each of which has a specific processing-time. There is also a number ''m'' specifying the number of ''machines'' that can ...
, the ratio improves to \sqrt+1/2.


Performance for maximizing the smallest sum

A dual goal to minimizing the largest sum (makespan) is maximizing the smallest sum. Deuermeyer, Friesen and Langston claim that MultiFit does not have a good approximation factor for this problem:
"In the solution of the makespan problem using MULTIFIT, it is easy to construct examples where one processor is never used. Such a solution is tolerable for the makespan problem, but is totally unacceptable for our problem ince the smallest sum is 0/sub>. Modifications of MULTIFIT can be devised which would be more suitable for our problem, but we could find none which produces a better worst-case bound than that of LPT."


Proof idea


Minimal counterexamples

The upper bounds on r_n are proved by contradiction. For any integers ''p'' ≥ ''q,'' if r_n > p/q, then there exists a (''p''/''q'')-counterexample, defined as an instance ''S'' and a number ''n'' of bins such that * ''S'' can be packed into ''n'' bins with capacity ''q''; * FFD does ''not'' manage to pack ''S'' into ''n'' bins with capacity ''p''. If there exists such a counterexample, then there also exists a ''minimal (p/q)-counterexample'', which is a (''p''/''q'')-counterexample with a smallest number of items in ''S'' and a smallest number of bins ''n''. In a minimal ''(p/q)''-counterexample, FFD packs all items in ''S'' except the last (smallest) one into ''n'' bins with capacity ''p''. Given a minimal ''(p/q)''-counterexample, denote by P1,...,P''n'' the (incomplete) FFD packing into these ''n'' bins with capacity ''p'', by P''n+1'' the bin containing the single smallest item, and by Q1,...,Q''n'' the (complete) optimal packing into ''n'' bins with capacity ''q''. The following lemmas can be proved: * No union of ''k'' subsets from is dominated by a union of ''k'' subsets from ("dominated" means that each item in the dominated subset is mapped to a weakly-larger item in the dominating subset). Otherwise we could get a smaller counterexample as follows. Delete all items in the P''i''. Clearly, the incomplete FFD packing now needs ''n''-''k'' bins, and still the smallest item (or an entire bin) remains unpacked. In the optimal packing Q''i'', exchange each item with its dominating item. Now, the ''k'' subsets Q''i'' are larger (probably larger than ''q''), but all other ''n''-''k'' subsets are smaller (in particular, at most ''q''). Therefore, after deleting all items in the ''Pi'', the remaining items can be packed into at most ''n''-''k'' bins of size ''q''. * Each of Q1,...,Qn contains at least 3 items. This is because each Q''i'' with a single item is dominated by the P''j'' that contains that item; for each Q''i'' with two items ''x'' and ''y'', if both ''x'' and ''y'' are in the same P''j'''','' then Q''i'' is dominated by this P''j''; Suppose x≥y, x is in some P''j'''','' and y is in some P''k'' to its right. This means that y did not fit into P''j''. But x+y ≤ q. This means that P''j'' must contain some item z ≥ y. So Q''i'' is dominated by P''j''''.'' Suppose x≥y, x is in some P''j'''','' and y is in some P''k'' to its left. This means that there must be a previous item z ≥ x. So Q''i'' is dominated by P''k''''.'' *Otherwise we had domination and, by the previous lemma, could get a smaller counterexample. * Each of P1,...,Pn contains at least 2 items. This is because, if some P''i'' contains only a single item, this implies that the last (smallest) item does not fit into it. This means that this single item must be alone in an optimal bundle, contradicting the previous lemma. *Let ''s'' be the size of the smallest item. Then s > \frac(p-q). ''Proof'': Since ''s'' does not fit into the first ''n'' bundles, we have sum(P_i) + s > p, so \sum_^n sum(P_i) + n\cdot s > n\cdot p. On the other hand, since all items fit into ''n'' bins of capacity ''q'', we have \sum_^n sum(P_i) + s \leq n\cdot q. Subtracting the inequalities gives s > \frac(p-q). *The size of every item is at most q - 2s. This is because there are at least 3 items in each optimal bin (with capacity ''q''). *The sum of items in every bin P1,...,Pn is larger than p - s; otherwise we could add the smallest item.


5/4 Upper bound

From the above lemmas, it is already possible to prove a loose upper bound r_n\leq 5/4 = 1.25. ''Proof''. Let ''S'', ''n'' be a minimal (5/4)-counterexample. The above lemmas imply that - * s > \frac(5-4) > 1. Since the optimal capacity is 4, no optimal bin can contain 4 or more items. Therefore, each optimal bin must contain at most 3 items, and the number of items is at most 3''n''. * The size of each item is at most 4 - 2s, and the size of each FFD bin is more than 5-s. If some FFD bin contained only two items, its sum would be at most 8 - 4s = 5 + (3-3s) - s < 5 - s; so each FFD bin must contain at least 3 items. But this means that FFD yields exactly ''n'' bins - a contradiction.


Structure of FFD packing

To prove tighter bounds, one needs to take a closer look at the FFD packing of the minimal (''p''/''q'')-counterexample. The items and FFD bins P1,...,P''n'' are termed as follows: * A ''regular item'' is an item added to some bin Pi, before the next bin Pi+1 was opened. Equivalently, a regular item is an item in Pi which is at least as large as every item in every bin Pj for ''j''>''i''. * A ''fallback item'' is an item added to some bin Pi, after the next bin Pi+1 was opened. Equivalently, a fallback item is an item in Pi which is smaller than the largest item in Pi+1. * A ''regular k-bin'' is a bin that contains ''k'' regular items and no fallback items. * A ''fallback k-bin'' is a bin that contains ''k'' regular items and some fallback items. The following lemmas follow immediately from these definitions and the operation of FFD. * If ''k''1<''k''2, then all ''k''1-bins are to the left of all ''k''2-bins. This is because all bins have the same capacity, so if more regular items fit into a bin, these items must be smaller, so they must be allocated later. * If Pi is a ''k''-bin, then the sum of the ''k'' regular items in Pi is larger than \frac\cdot p, since otherwise we could add another item before opening a new bin. * If Pi and Pi+1 are both ''k''-bins, and then the sum of the ''k'' regular items in Pi is at least as large as in Pi+1 (this is because the items are ordered by decreasing size). * All regular ''k''-bins are to the left of all fallback ''k''-bins. This is because all bins have the same capacity, so if more fallback items fit into a bin, these items must be smaller, so they must be allocated later. In a minimal counterexample, there are no regular 1-bins (since each bin contains at least 2 items), so by the above lemmas, the FFD bins P1,...,P''n'' are ordered by type: * Zero or more fallback 1-bins; * Then, zero or more regular 2-bins; * Then, zero or more fallback 2-bins; * Then, zero or more regular 3-bins; * Then, zero or more fallback 3-bins; * and so on.


1.22 upper bound

The upper bound r_n\leq 1.22 is proved by assuming a minimal (122/100)-counterexample. Each item is given a ''weight'' based on its size and its bin in the FFD packing. The weights are determined such that the total weight in each FFD bin is at least ''x'', and the total weight in almost each optimal bin is at most ''x'' (for some predetermined ''x''). This implies that the number of FFD bins is at most the number of optimal bins, which contradicts the assumption that it is a counterexample. By the lemmas above, we know that: * The size of the smallest item satisfies ''s'' > ''p''-''q'' = 22, so s = 22+''D'' for some ''D''>0. *Each optimal bin contains at most 4 items (floor(100/22)), and each FFD bin contains at most 5 items (floor(122/22)). * The size of every item is at most ''q''-2''s'' = 56-2''D''. *The sum in each FFD bin is larger than ''p''-''s'' = 100-''D''. *There are no 1-bins, since in a 1-bin, the size of the regular item must be at least ''p''/2=61, while here the size of every item is less than 56. If ''D''>4, the size of each item is larger than 26, so each optimal bin (with capacity 100) must contain at most 3 items. Each item is smaller than 56-2''D'' and each FFD bin has a sum larger than 100-''D'', so each FFD bin must contain at least 3 items. Therefore, there are at most ''n'' FFD bins - contradiction. So from now on, we assume ''D≤''4. The items are assigned types and weights as follows. * The two items in each regular 2-bin except maybe the last one have a size larger than (100-''D'')/2 each. All such items are called ''type-X2'', and assigned a weight of (100-''D'')/2. The last 2-regular bin is a special case: if both its items have a size larger than (100-''D'')/2, then they are ''type-X2'' too; otherwise, they are called ''type-Z'', and their weight equals their size. * The two regular items in each fallback 2-bin have a total size larger than 2*122/3; they are called ''type-Y2'', and their weight equals their size minus ''D''. * The three items in each regular 3-bin except maybe the last one have a size larger than (100-''D'')/3 each. All such items are called ''type-X3'', and assigned a weight of (100-''D'')/3. The last 3-regular bin is a special case: if all items in it have a size larger than (100-''D'')/3, then they are ''type-X3'' too; otherwise, they are called ''type-Z'' and their weight equals their size. * The three regular items in each fallback 3-bin have a total size larger than 3*122/4; they are called ''type-Y3'', and their weight equals their size minus ''D''. * The four items in each regular 4-bin except maybe the last one have a size larger than (100-''D'')/4 each. All such items are called ''type-X4'', and assigned a weight of (100-''D'')/4. The last 4-regular bin is a special case: if all items in it have a size larger than (100-''D'')/4, then they are ''type-X4'' too; otherwise, they are called ''type-Z'' and their weight equals their size. * The remaining items (including all fallback items in fallback 2-bins and 3-bins, all fallback 4-bins, and all other 5-item bins) are all called ''type-X5'', and their weight equals 22 (if ''D'' ≤ 12/5) or (100-''D'')/4 (otherwise). The threshold 12/5 was computed such that the weight is always at most 22+''D'', so that the weight is always smaller than the size. Note that the weight of each item is at most its size (the weight can be seen as the size "rounded down"). Still, the total weight of items in every FFD bin is at least 100-''D'': * For regular 2-bins, regular 3-bins and regular 4-bins: ** For the non-last ones, this is immediate. ** The last such bins contain only Z-type items, whose weight equals their size, so the total weight of these bins equals their total size, which is more than 100-''D''. * Fallback 2-bins contain two type-Y2 items with total weight larger than 2*122/3-2''D'', plus at least one type-X5 item with weight at least 22 (if ''D'' ≤ 12/5) or (100-''D'')/4 (otherwise). In both cases the total weight is more than 100-''D''. * Fallback 3-bins contain three type-Y3 items with total weight larger than 3*122/4-3''D'', plus at least one type-X5 item with weight at least 22. So the total weight is more than 3*122/4+22-3''D'' = 113.5-3''D'' ≥ 105.5-''D'' > 100-''D'', since ''D≤''4. * 5-item bins contain 5 items with size at least 22+''D'' and weight at least 22, so their total weight is obviously more than 100-''D''. The total weight of items in most optimal bins is at most 100-''D'': * This is clear for any optimal bin containing a type''-Y2'' item or a type''-Y3'' item, since their weight is their size minus ''D'', the weights of other items is at most their size, and the total size of an optimal bin is at most 100. *For optimal bins containing only type''-X2'', type''-X3'', type''-X4'' and type''-X5'' items, it is possible to check all possible configurations (all combinations that fit into an optimal bin of size 100), and verify that the total weight in each configuration is at most 100-''D''. *Optimal bins containing type-Z items might have a total weight larger than 100-''D''. Since the total weight is at most 100, there is an "excess weight" of at most ''D'' for each such bin. However, the number of type-Z items is limited: **If D > 12/5, then there are at most 5 type-Z items (2 in the last regular 2-bin and 3 in the last regular 3-bin; the items in the last regular 4-bin are all type-X4). Therefore, the excess weight is at most 5''D''. Comparing the total weight of FFD vs. optimal bins yields ''s'' < 5''D ≤'' 20 < 22, a contradiction. **Otherwise, there are at most 9 type-Z items (2+3+4). Therefore, the excess weight is at most 9''D''. Comparing the total weight of FFD vs. optimal bins yields ''s'' < 9''D ≤'' 108/5 < 22, a contradiction.


13/11 upper bound

The upper bound r_n\leq 13/11\approx 1.182 is proved by assuming a minimal ((120-3''d'')/100)-counterexample, with some ''d<''20/33, and deriving a contradiction.


Non-monotonicity

MultiFit is not ''monotone'' in the following sense: it is possible that an input ''decreases'' while the max-sum in the partition returned by MultiFit ''increases''. As an example, suppose ''n''=3 and the input numbers are:
44, 24, 24, 22, 21, 17, 8, 8, 6, 6.
FFD packs these inputs into 3 bins of capacity 60 (which is optimal): * 44, 8, 8; * 24, 24, 6, 6; * 22, 21, 17. But if the "17" becomes "16", then FFD with capacity 60 needs 4 bins: * 44, 16; * 24, 24, 8; * 22, 21, 8, 6; * 6. so MultiFit must increase the capacity, for example, to 62: * 44, 16; * 24, 24, 8, 6; * 22, 21, 8, 6.


Generalization: fair allocation of chores

Multifit has been extended to the more general problem of
maximin-share Maximin share (MMS) is a criterion of fair item allocation. Given a set of items with different values, the ''1-out-of-n maximin-share'' is the maximum value that can be gained by partitioning the items into ''n'' parts and taking the part with th ...
allocation of chores. In this problem, S is a set of chores, and there are ''n'' agents who assign potentially ''different'' valuations to the chores. The goal is to give to each agent, a set of chores worth at most ''r'' times the maximum value in an optimal scheduling based on ''i'''s valuations. A naive approach is to let each agent in turn use the MultiFit algorithm to calculate the threshold, and then use the algorithm where each agent uses his own threshold. If this approach worked, we would get an approximation of 13/11. However, this approach fails due to the non-monotonicity of FFD. Here is an example. Suppose there are four agents, and they have valuations of two types: Both types can partition the chores into 4 parts of total value 75. Type A: * 51, 12, 12 * 27.5, 27.5, 10, 10 * 27.5, 27.5, 10, 10 * 25, 10, 10, 10, 10, 10 Type B: * 51, 24 * 27.5, 27.5, 20 * 27.5, 27.5, 20 * 8.33 If all four agents are of the same, then FFD with threshold 75 fills the 4 optimal bins. But suppose there is one agent of type B, and the others are of type A. Then, in the first round, the agent of type B takes the bundle 51, 24 (the other agents cannot take it since for them the values are 51,25 whose sum is more than 75).In the following rounds, the following bundles are filled for the type A agents: * 27.5, 27.5, 12 he sum is 67 - there is no room for another 10* 27.5, 27.5, 12 he sum is 67 - there is no room for another 10* 10, 10, 10, 10, 10, 10, 10 he sum is 70 - there is no room for another 10 so the last two chores remain unallocated. Using a more sophisticated threshold calculation, it is possible to guarantee to each agent at most 11/9≈1.22 of his optimal value if the optimal value is known, and at most 5/4≈1.25 of his optimal value (using a polynomial time algorithm) if the optimal value is not known.


Implementations

* Python: Th
prtpy package
contain
an implementation of multifit


References

{{reflist Number partitioning Optimal scheduling Bin packing