The set cover problem is a classical question in
combinatorics
Combinatorics is an area of mathematics primarily concerned with counting, both as a means and an end in obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many a ...
,
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
,
operations research
Operations research ( en-GB, operational research) (U.S. Air Force Specialty Code: Operations Analysis), often shortened to the initialism OR, is a discipline that deals with the development and application of analytical methods to improve dec ...
, and
complexity theory. It is one of
Karp's 21 NP-complete problems shown to be
NP-complete
In computational complexity theory, a problem is NP-complete when:
# it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryin ...
in 1972.
Given a
set of elements (called the
universe
The universe is all of space and time and their contents, including planets, stars, galaxies, and all other forms of matter and energy. The Big Bang theory is the prevailing cosmological description of the development of the universe. A ...
) and a collection of sets whose
union equals the universe, the set cover problem is to identify the smallest sub-collection of whose union equals the universe. For example, consider the universe and the collection of sets Clearly the union of is . However, we can cover all of the elements with the following, smaller number of sets:
More formally, given a universe
and a family
of subsets of
, a ''cover'' is a subfamily
of sets whose union is
. In the set covering
decision problem
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whethe ...
, the input is a pair
and an integer
; the question is whether there is a set covering of size
or less. In the set covering
optimization problem
In mathematics, computer science and economics, an optimization problem is the problem of finding the ''best'' solution from all feasible solutions.
Optimization problems can be divided into two categories, depending on whether the variables ...
, the input is a pair
, and the task is to find a set covering that uses the fewest sets.
The decision version of set covering is
NP-complete
In computational complexity theory, a problem is NP-complete when:
# it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryin ...
, and the optimization/search version of set cover is
NP-hard
In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
. It is a problem "whose study has led to the development of fundamental techniques for the entire field" of
approximation algorithms
In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solut ...
. If each set is assigned a
cost
In Production (economics), production, research, retail, and accounting, a cost is the value of money that has been used up to produce something or deliver a service, and hence is not available for use anymore. In business, the cost may be one o ...
, it becomes a ''weighted'' set cover problem.
Integer linear program formulation
The minimum set cover problem can be formulated as the following
integer linear program (ILP).
This ILP belongs to the more general class of ILPs for
covering problems.
The
integrality gap
In mathematics, the relaxation of a (mixed) integer linear program is the problem that arises by removing the integrality constraint of each variable.
For example, in a 0–1 integer program, all constraints are of the form
:x_i\in\.
The relax ...
of this ILP is at most
, so its
relaxation gives a factor-
approximation algorithm
In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solu ...
for the minimum set cover problem (where
is the size of the universe).
In weighted set cover, the sets are assigned weights. Denote the weight of set
by
. Then the integer linear program describing weighted set cover is identical to the one given above, except that the objective function to minimize is
.
Hitting set formulation
Set covering is equivalent to the hitting set problem. That is seen by observing that an instance of set covering can
be viewed as an arbitrary
bipartite graph
In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, that is every edge connects a vertex in U to one in V. Vertex sets U and V ar ...
, with the universe represented by vertices on the left, the sets represented by vertices on the
right, and edges representing the inclusion of elements in sets. The task is then to find a minimum cardinality subset of right-vertices which covers all of the left-vertices, which is precisely the Hitting set problem.
Greedy algorithm
There is a
greedy algorithm
A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locall ...
for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a
bucket queue to prioritize the sets. It achieves an approximation ratio of
, where
is the size of the set to be covered. In other words, it finds a covering that may be
times as large as the minimum one, where
is the
-th
harmonic number:
This greedy algorithm actually achieves an approximation ratio of
where
is the maximum cardinality set of
. For
dense instances, however, there exists a
-approximation algorithm for every
.
There is a standard example on which the greedy algorithm achieves an approximation ratio of
.
The universe consists of
elements. The set system consists of
pairwise disjoint sets
with sizes
respectively, as well as two additional disjoint sets
,
each of which contains half of the elements from each
. On this input, the greedy algorithm takes the sets
, in that order, while the optimal solution consists only of
and
.
An example of such an input for
is pictured on the right.
Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms
(see
Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly
.
Low-frequency systems
If each element occurs in at most sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of using
LP relaxation.
If the constraint
is replaced by
for all in
in the integer linear program shown
above, then it becomes a (non-integer) linear program . The algorithm can be described as follows:
# Find an optimal solution for the program using some polynomial-time method of solving linear programs.
# Pick all sets for which the corresponding variable
has value at least 1/ in the solution .
Inapproximability results
When
refers to the size of the universe, showed that set covering cannot be approximated in polynomial time to within a factor of
, unless NP has
quasi-polynomial time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
algorithms.
Feige (1998) improved this lower bound to
under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. established a lower bound
of
, where
is a certain constant, under the weaker assumption that P
NP.
A similar result with a higher value of
was recently proved by . showed optimal inapproximability by proving that it cannot be approximated to
unless P
NP.
Weighted set cover
Relaxing the integer linear program for weighted set cover stated
above, one may use
randomized rounding
Within computer science and operations research,
many combinatorial optimization problems are computationally intractable to solve exactly (to optimality).
Many such problems do admit fast (polynomial time) approximation algorithms—that is, algor ...
to get an
-factor approximation. Non weighted set cover can be adapted to the weighted case.
Related problems
* Hitting set is an equivalent reformulation of Set Cover.
*
Vertex cover
In graph theory, a vertex cover (sometimes node cover) of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.
In computer science, the problem of finding a minimum vertex cover is a classical optimiza ...
is a special case of Hitting Set.
*
Edge cover
In graph theory, an edge cover of a graph is a set of edges such that every vertex of the graph is incident to at least one edge of the set.
In computer science, the minimum edge cover problem is the problem of finding an edge cover of minimum ...
is a special case of Set Cover.
*
Geometric set cover is a special case of Set Cover when the universe is a set of points in
and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles).
*
Set packing Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set ''S'' and a list of subsets of ''S''. Then, the set packing problem ask ...
*
Maximum coverage problem is to choose at most k sets to cover as many elements as possible.
*
Dominating set
In graph theory, a dominating set for a graph is a subset of its vertices, such that any vertex of is either in , or has a neighbor in . The domination number is the number of vertices in a smallest dominating set for .
The dominating set ...
is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The Dominating set problem was shown to be NP complete through a reduction from Set cover.
*
Exact cover problem
In the mathematical field of combinatorics, given a collection of subsets of a set , an exact cover is a subcollection of such that each element in is contained in ''exactly one'' subset in . In other words, is a partition of consisting of ...
is to choose a set cover with no element included in more than one covering set.
* Red Blue Set Cover.
*
Set-cover abduction
Abductive reasoning (also called abduction,For example: abductive inference, or retroduction) is a form of logical inference formulated and advanced by American philosopher Charles Sanders Peirce beginning in the last third of the 19th centu ...
.
Notes
References
* .
*
* .
*
* .
* .
* .
*
*
*
External links
Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination
{{DEFAULTSORT:Set Cover Problem
Families of sets
NP-complete problems
Linear programming
Approximation algorithms
Covering problems