
The knapsack problem is the following problem in
combinatorial optimization
Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or can be reduced to a discrete set. Typical combina ...
:
:''Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.''
It derives its name from the problem faced by someone who is constrained by a fixed-size
knapsack and must fill it with the most valuable items. The problem often arises in
resource allocation
In economics, resource allocation is the assignment of available resources to various uses. In the context of an entire economy, resources can be allocated by various means, such as markets, or planning.
In project management, resource allocatio ...
where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively.
The knapsack problem has been studied for more than a century, with early works dating as far back as 1897.
The
subset sum problem
The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sum to precisely T''.'' ...
is a special case of the decision and 0-1 problems where each kind of item, the weight equals the value:
. In the field of
cryptography
Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or ''-logy, -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of Adversary (cryptography), ...
, the term ''knapsack problem'' is often used to refer specifically to the subset sum problem. The subset sum problem is one of
Karp's 21 NP-complete problems.
Applications
Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selection of
investment
Investment is traditionally defined as the "commitment of resources into something expected to gain value over time". If an investment involves money, then it can be defined as a "commitment of money to receive more money later". From a broade ...
s and
portfolios, selection of assets for
asset-backed securitization, and generating keys for the
Merkle–Hellman and other
knapsack cryptosystems.
One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. For small examples, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values, it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score.
A 1999 study of the Stony Brook University Algorithm Repository showed that, out of 75 algorithmic problems related to the field of combinatorial algorithms and algorithm engineering, the knapsack problem was the 19th most popular and the third most needed after
suffix tree
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particu ...
s and the
bin packing problem
The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has m ...
.
Definition
The most common problem being solved is the 0-1 knapsack problem, which restricts the number ''
'' of copies of each kind of item to zero or one. Given a set of ''
'' items numbered from 1 up to ''
'', each with a weight ''
'' and a value ''
'', along with a maximum weight capacity ''
'',
: maximize
: subject to
and
.
Here ''
'' represents the number of instances of item ''
'' to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity.
The bounded knapsack problem (BKP) removes the restriction that there is only one of each item, but restricts the number
of copies of each kind of item to a maximum non-negative integer value
:
: maximize
: subject to
and
The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item and can be formulated as above except that the only restriction on
is that it is a non-negative integer.
: maximize
: subject to
and
One example of the unbounded knapsack problem is given using the figure shown at the beginning of this article and the text "if any number of each book is available" in the caption of that figure.
Computational complexity
The knapsack problem is interesting from the perspective of computer science for many reasons:
* The
decision problem
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question on a set of input values. An example of a decision problem is deciding whether a given natura ...
form of the knapsack problem (''Can a value of at least ''V'' be achieved without exceeding the weight ''W''?'') is
NP-complete
In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''.
Somewhat more precisely, a problem is NP-complete when:
# It is a decision problem, meaning that for any ...
, thus there is no known algorithm that is both correct and fast (polynomial-time) in all cases.
* There is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger ''V''). This problem is
co-NP-complete
In complexity theory, computational problems that are co-NP-complete are those that are the hardest problems in co-NP, in the sense that any problem in co-NP can be reformulated as a special case of any co-NP-complete problem with only polynomial ...
.
* There is a
pseudo-polynomial time
In computational complexity theory, a numeric algorithm runs in pseudo-polynomial time if its running time is a polynomial in the ''numeric value'' of the input (the largest integer present in the input)—but not necessarily in the ''length'' of ...
algorithm using
dynamic programming.
* There is a
fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine, described below.
* Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.
There is a link between the "decision" and "optimization" problems in that if there exists a polynomial algorithm that solves the "decision" problem, then one can find the maximum value for the optimization problem in polynomial time by applying this algorithm iteratively while increasing the value of k. On the other hand, if an algorithm finds the optimal value of the optimization problem in polynomial time, then the decision problem can be solved in polynomial time by comparing the value of the solution output by this algorithm with the value of k. Thus, both versions of the problem are of similar difficulty.
One theme in research literature is to identify what the "hard" instances of the knapsack problem look like,
[Pisinger, D. 2003]
Where are the hard knapsack problems?
Technical Report 2003/08, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark. or viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests.
The goal in finding these "hard" instances is for their use in
public-key cryptography
Public-key cryptography, or asymmetric cryptography, is the field of cryptographic systems that use pairs of related keys. Each key pair consists of a public key and a corresponding private key. Key pairs are generated with cryptographic alg ...
systems, such as the
Merkle–Hellman knapsack cryptosystem The Merkle–Hellman knapsack cryptosystem was one of the earliest public key cryptosystems. It was published by Ralph Merkle and Martin Hellman in 1978. A polynomial time attack was published by Adi Shamir in 1984. As a result, the cryptosystem is ...
. More generally, better understanding of the structure of the space of instances of an optimization problem helps to advance the study of the particular problem and can improve algorithm selection.
Furthermore, notable is the fact that the hardness of the knapsack problem depends on the form of the input. If the weights and profits are given as integers, it is
weakly NP-complete, while it is
strongly NP-complete In computational complexity, strong NP-completeness is a property of computational problems that is a special case of NP-completeness. A general computational problem may have numerical parameters. For example, the input to the bin packing proble ...
if the weights and profits are given as rational numbers.
However, in the case of rational weights and profits it still admits a
fully polynomial-time approximation scheme.
Unit-cost models
The NP-hardness of the Knapsack problem relates to computational models in which the size of integers matters (such as the
Turing machine
A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algori ...
). In contrast,
decision trees
A decision tree is a decision support system, decision support recursive partitioning structure that uses a Tree (graph theory), tree-like Causal model, model of decisions and their possible consequences, including probability, chance event ou ...
count each decision as a single step. Dobkin and Lipton show an
lower bound on linear decision trees for the knapsack problem, that is, trees where decision nodes test the sign of
affine function
In Euclidean geometry, an affine transformation or affinity (from the Latin, ''wikt:affine, affinis'', "connected with") is a geometric transformation that preserves line (geometry), lines and parallel (geometry), parallelism, but not necessarily ...
s. This was generalized to algebraic decision trees by Steele and Yao.
If the elements in the problem are
real number
In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
s or
rationals, the decision-tree lower bound extends to the
real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). This model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions.
An ''upper bound'' for a decision-tree model was given by Meyer auf der Heide who showed that for every ''n'' there exists an -deep linear decision tree that solves the
subset-sum problem with ''n'' items. Note that this does not imply any upper bound for an algorithm that should solve the problem for ''any given n''.
Solving
Several algorithms are available to solve knapsack problems, based on the dynamic programming approach,
the
branch and bound
Branch and bound (BB, B&B, or BnB) is a method for solving optimization problems by breaking them down into smaller sub-problems and using a bounding function to eliminate sub-problems that cannot contain the optimal solution.
It is an algorithm ...
approach
[S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations,
John Wiley and Sons, 1990] or
hybridizations of both approaches.
[S. Martello, D. Pisinger, P. Toth]
Dynamic programming and strong bounds for the 0-1 knapsack problem
''Manag. Sci.'', 45:414–424, 1999.
Dynamic programming in-advance algorithm
The unbounded knapsack problem (UKP) places no restriction on the number of copies of each kind of item. Besides, here we assume that
:
: subject to
and
Observe that