
The set cover problem is a classical question in
combinatorics
Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many ...
,
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
,
operations research
Operations research () (U.S. Air Force Specialty Code: Operations Analysis), often shortened to the initialism OR, is a branch of applied mathematics that deals with the development and application of analytical methods to improve management and ...
, and
complexity theory.
Given a
set of elements (henceforth referred to as the
universe
The universe is all of space and time and their contents. It comprises all of existence, any fundamental interaction, physical process and physical constant, and therefore all forms of matter and energy, and the structures they form, from s ...
, specifying all possible elements under consideration) and a collection, referred to as , of a given subsets whose
union equals the universe, the set cover problem is to identify a smallest sub-collection of whose union equals the universe.
For example, consider the universe, and the collection of sets In this example, is equal to 4, as there are four subsets that comprise this collection. The union of is equal to . However, we can cover all elements with only two sets: , see picture, but not with only one set. Therefore, the solution to the set cover problem for this and has size 2.
More formally, given a universe
and a family
of subsets of
, a set cover is a subfamily
of sets whose union is
.
* In the set cover
decision problem
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question on a set of input values. An example of a decision problem is deciding whether a given natura ...
, the input is a pair
and an integer
; the question is whether there is a set cover of size
or less.
* In the set cover
optimization problem
In mathematics, engineering, computer science and economics
Economics () is a behavioral science that studies the Production (economics), production, distribution (economics), distribution, and Consumption (economics), consumption of goo ...
, the input is a pair
, and the task is to find a set cover that uses the fewest sets.
The decision version of set covering is
NP-complete
In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''.
Somewhat more precisely, a problem is NP-complete when:
# It is a decision problem, meaning that for any ...
. It is one of
Karp's 21 NP-complete problems shown to be
NP-complete
In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''.
Somewhat more precisely, a problem is NP-complete when:
# It is a decision problem, meaning that for any ...
in 1972. The optimization/search version of set cover is
NP-hard. It is a problem "whose study has led to the development of fundamental techniques for the entire field" of
approximation algorithms
In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned sol ...
.
Variants
In the weighted set cover problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover with a smallest weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.
In the fractional set cover problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in
,1 to each set in
, such that for each element ''x'' in the universe, the sum of fractions of sets that contain ''x'' is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universe and the collection of sets The smallest set cover has a size of 2, e.g. But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken.
Linear program formulation
The set cover problem can be formulated as the following
integer linear program (ILP).
For a more compact representation of the covering constraint, one can define an
incidence matrix ''
'', where each row corresponds to an element and each column corresponds to a set, and ''
'' if element e is in set s, and ''
'' otherwise. Then, the covering constraint can be written as
.
Weighted set cover is described by a program identical to the one given above, except that the objective function to minimize is
, where
is the weight of set
.
Fractional set cover is described by a program identical to the one given above, except that
can be non-integer, so the last constraint is replaced by
.
This linear program belongs to the more general class of LPs for
covering problems, as all the coefficients in the objective function and both sides of the constraints are non-negative. The
integrality gap of the ILP is at most
(where
is the size of the universe). It has been shown that its
relaxation indeed gives a factor-
approximation algorithm
In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned sol ...
for the minimum set cover problem. See
randomized rounding#setcover for a detailed explanation.
Hitting set formulation
The set cover problem is equivalent to the hitting set problem. A subset
of
is called a hitting set when
for all
(i.e.,
intersects or “hits” all subsets in
). The hitting set problem is to find a minimum hitting set
for a given
and
.
To show that the problems are equivalent, for a universe
of size
and collection of sets
of size
, construct
and
. Then a set cover
of
is equivalent to a hitting set
of
where
, and vice versa.
This equivalence can also be visualized by representing the problem as a
bipartite graph of
vertices, with
vertices on the left representing elements of
, and
vertices on the right representing elements of
, and edges representing set membership (i.e., there is an edge between the
-th vertex on the left and the
-th vertex of the right
iff. ). Then a set cover is a subset
of right vertices such that each left vertex is adjacent to at least one member of
, while a hitting set is a subset
of left vertices such that each right vertex is adjacent to at least one member of
. These definitions are exactly the same except that ''left'' and ''right'' are swapped. But there is nothing special about the sides in the bipartite graph; we could have put the elements of
on the right side, and the elements of
on the left side, creating a graph that is a mirror image of the one described above. This shows that set covers in the original graph are equivalent to hitting sets in the mirrored graph, and vice versa.
In the field of
computational geometry, a hitting set for a collection of geometrical objects is also called a stabbing set or piercing set.
Greedy algorithm
There is a
greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a
bucket queue to prioritize the sets. It achieves an approximation ratio of
, where
is the size of the set to be covered. In other words, it finds a covering that may be
times as large as the minimum one, where
is the
-th
harmonic number
In mathematics, the -th harmonic number is the sum of the reciprocals of the first natural numbers:
H_n= 1+\frac+\frac+\cdots+\frac =\sum_^n \frac.
Starting from , the sequence of harmonic numbers begins:
1, \frac, \frac, \frac, \frac, \dot ...
:
This greedy algorithm actually achieves an approximation ratio of
where
is the maximum cardinality set of
. For
dense instances, however, there exists a
-approximation algorithm for every
.
There is a standard example on which the greedy algorithm achieves an approximation ratio of
.
The universe consists of
elements. The set system consists of
pairwise disjoint sets
with sizes
respectively, as well as two additional disjoint sets
,
each of which contains half of the elements from each
. On this input, the greedy algorithm takes the sets
, in that order, while the optimal solution consists only of
and
.
An example of such an input for
is pictured on the right.
Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms
(see
Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly
.
Low-frequency systems
If each element occurs in at most sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of using
LP relaxation.
If the constraint
is replaced by
for all in
in the integer linear program shown
above, then it becomes a (non-integer) linear program . The algorithm can be described as follows:
# Find an optimal solution for the program using some polynomial-time method of solving linear programs.
# Pick all sets for which the corresponding variable
has value at least 1/ in the solution .
Inapproximability results
When
refers to the size of the universe, showed that set covering cannot be approximated in polynomial time to within a factor of
, unless NP has
quasi-polynomial time algorithms.
Feige (1998) improved this lower bound to
under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. established a lower bound
of
, where
is a certain constant, under the weaker assumption that P
NP.
A similar result with a higher value of
was recently proved by . showed optimal inapproximability by proving that it cannot be approximated to
unless P
NP.
In low-frequency systems, proved it is NP-hard to approximate set cover to better than
.
If the
Unique games conjecture
In computational complexity theory, the unique games conjecture (often referred to as UGC) is a conjecture made by Subhash Khot in 2002. The conjecture postulates that the problem of determining the approximate ''value'' of a certain type of g ...
is true, this can be improved to
as proven by .
proves that set cover instances with sets of size at most
cannot be approximated to a factor better than
unless P
NP, thus making the approximation of
of the greedy algorithm essentially tight in this case.
Weighted set cover
Relaxing the integer linear program for weighted set cover stated
above, one may use
randomized rounding
In computer science and operations research, randomized rounding
is a widely used approach for designing and analyzing approximation algorithms.
Many combinatorial optimization problems are computationally intractability (complexity), intrac ...
to get an
-factor approximation. Non weighted set cover can be adapted to the weighted case.
Fractional set cover
Related problems
* Hitting set is an equivalent reformulation of Set Cover.
*
Vertex cover is a special case of Hitting Set.
*
Edge cover is a special case of Set Cover.
*
Geometric set cover is a special case of Set Cover when the universe is a set of points in
and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles).
*
Set packing
*
Maximum coverage problem is to choose at most k sets to cover as many elements as possible.
*
Dominating set
In graph theory, a dominating set for a Graph (discrete mathematics), graph is a subset of its vertices, such that any vertex of is in , or has a neighbor in . The domination number is the number of vertices in a smallest dominating set for ...
is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The Dominating set problem was shown to be NP complete through a reduction from Set cover.
*
Exact cover problem is to choose a set cover with no element included in more than one covering set.
* Red-blue set cover.
*
Set-cover abduction.
*
Monotone dualization is a computational problem equivalent to either listing all minimal hitting sets or listing all minimal set covers of a given set family.
Notes
References
* .
*
* .
*
* .
* .
* .
*
*
*
*
*
*
External links
Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination
{{DEFAULTSORT:Set Cover Problem
Families of sets
NP-complete problems
Linear programming
Approximation algorithms
Covering problems