Hitting Set Problem
   HOME

TheInfoList



OR:

The set cover problem is a classical question in combinatorics,
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
,
operations research Operations research ( en-GB, operational research) (U.S. Air Force Specialty Code: Operations Analysis), often shortened to the initialism OR, is a discipline that deals with the development and application of analytical methods to improve decis ...
, and complexity theory. It is one of
Karp's 21 NP-complete problems In computational complexity theory, Karp's 21 NP-complete problems are a set of computational problems which are NP-complete. In his 1972 paper, "Reducibility Among Combinatorial Problems", Richard Karp used Stephen Cook's 1971 theorem that the b ...
shown to be
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
in 1972. Given a
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of elements (called the
universe The universe is all of space and time and their contents, including planets, stars, galaxies, and all other forms of matter and energy. The Big Bang theory is the prevailing cosmological description of the development of the universe. ...
) and a collection of sets whose
union Union commonly refers to: * Trade union, an organization of workers * Union (set theory), in mathematics, a fundamental operation on sets Union may also refer to: Arts and entertainment Music * Union (band), an American rock group ** ''Un ...
equals the universe, the set cover problem is to identify the smallest sub-collection of whose union equals the universe. For example, consider the universe and the collection of sets Clearly the union of is . However, we can cover all of the elements with the following, smaller number of sets: More formally, given a universe \mathcal and a family \mathcal of subsets of \mathcal, a ''cover'' is a subfamily \mathcal\subseteq\mathcal of sets whose union is \mathcal. In the set covering
decision problem In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm wheth ...
, the input is a pair (\mathcal,\mathcal) and an integer k; the question is whether there is a set covering of size k or less. In the set covering
optimization problem In mathematics, computer science and economics, an optimization problem is the problem of finding the ''best'' solution from all feasible solutions. Optimization problems can be divided into two categories, depending on whether the variables ...
, the input is a pair (\mathcal,\mathcal), and the task is to find a set covering that uses the fewest sets. The decision version of set covering is
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
, and the optimization/search version of set cover is NP-hard. It is a problem "whose study has led to the development of fundamental techniques for the entire field" of
approximation algorithms In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned sol ...
. If each set is assigned a
cost In production, research, retail, and accounting, a cost is the value of money that has been used up to produce something or deliver a service, and hence is not available for use anymore. In business, the cost may be one of acquisition, in whic ...
, it becomes a ''weighted'' set cover problem.


Integer linear program formulation

The minimum set cover problem can be formulated as the following
integer linear program An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objectiv ...
(ILP). This ILP belongs to the more general class of ILPs for covering problems. The
integrality gap In mathematics, the relaxation of a (mixed) integer linear program is the problem that arises by removing the integrality constraint of each variable. For example, in a 0–1 integer program, all constraints are of the form :x_i\in\. The relax ...
of this ILP is at most \scriptstyle \log n, so its relaxation gives a factor-\scriptstyle \log n approximation algorithm for the minimum set cover problem (where \scriptstyle n is the size of the universe). In weighted set cover, the sets are assigned weights. Denote the weight of set s\in \mathcal by w_. Then the integer linear program describing weighted set cover is identical to the one given above, except that the objective function to minimize is \sum_ w_s x_s.


Hitting set formulation

Set covering is equivalent to the hitting set problem. That is seen by observing that an instance of set covering can be viewed as an arbitrary bipartite graph, with the universe represented by vertices on the left, the sets represented by vertices on the right, and edges representing the inclusion of elements in sets. The task is then to find a minimum cardinality subset of right-vertices which covers all of the left-vertices, which is precisely the Hitting set problem.


Greedy algorithm

There is a
greedy algorithm A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...
for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a
bucket queue A bucket queue is a data structure that implements the priority queue abstract data type: it maintains a dynamic collection of elements with numerical priorities and allows quick access to the element with minimum (or maximum) priority. In the bu ...
to prioritize the sets. It achieves an approximation ratio of H(s), where s is the size of the set to be covered. In other words, it finds a covering that may be H(n) times as large as the minimum one, where H(n) is the n-th
harmonic number In mathematics, the -th harmonic number is the sum of the reciprocals of the first natural numbers: H_n= 1+\frac+\frac+\cdots+\frac =\sum_^n \frac. Starting from , the sequence of harmonic numbers begins: 1, \frac, \frac, \frac, \frac, \dot ...
: H(n) = \sum_^ \frac \le \ln +1 This greedy algorithm actually achieves an approximation ratio of H(s^\prime) where s^\prime is the maximum cardinality set of S. For \delta-dense instances, however, there exists a c \ln-approximation algorithm for every c > 0. There is a standard example on which the greedy algorithm achieves an approximation ratio of \log_2(n)/2. The universe consists of n=2^-2 elements. The set system consists of k pairwise disjoint sets S_1,\ldots,S_k with sizes 2,4,8,\ldots,2^k respectively, as well as two additional disjoint sets T_0,T_1, each of which contains half of the elements from each S_i. On this input, the greedy algorithm takes the sets S_k,\ldots,S_1, in that order, while the optimal solution consists only of T_0 and T_1. An example of such an input for k=3 is pictured on the right. Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms (see Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly \ln - \ln + \Theta(1).


Low-frequency systems

If each element occurs in at most sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of using LP relaxation. If the constraint x_S\in\ is replaced by x_S \geq 0 for all in \mathcal in the integer linear program shown above, then it becomes a (non-integer) linear program . The algorithm can be described as follows: # Find an optimal solution for the program using some polynomial-time method of solving linear programs. # Pick all sets for which the corresponding variable has value at least 1/ in the solution .


Inapproximability results

When n refers to the size of the universe, showed that set covering cannot be approximated in polynomial time to within a factor of \tfrac\log_2 \approx 0.72\ln, unless NP has
quasi-polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
algorithms. Feige (1998) improved this lower bound to \bigl(1-o(1)\bigr)\cdot\ln under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. established a lower bound of c\cdot\ln, where c is a certain constant, under the weaker assumption that P\not=NP. A similar result with a higher value of c was recently proved by . showed optimal inapproximability by proving that it cannot be approximated to \bigl(1 - o(1)\bigr) \cdot \ln unless P=NP.


Weighted set cover

Relaxing the integer linear program for weighted set cover stated above, one may use
randomized rounding Within computer science and operations research, many combinatorial optimization problems are computationally intractable to solve exactly (to optimality). Many such problems do admit fast ( polynomial time) approximation algorithms—that is, alg ...
to get an O(\log n)-factor approximation. Non weighted set cover can be adapted to the weighted case.


Related problems

* Hitting set is an equivalent reformulation of Set Cover. *
Vertex cover In graph theory, a vertex cover (sometimes node cover) of a graph is a set of vertices that includes at least one endpoint of every edge of the graph. In computer science, the problem of finding a minimum vertex cover is a classical optimiza ...
is a special case of Hitting Set. *
Edge cover In graph theory, an edge cover of a graph is a set of edges such that every vertex of the graph is incident to at least one edge of the set. In computer science, the minimum edge cover problem is the problem of finding an edge cover of minimum size ...
is a special case of Set Cover. * Geometric set cover is a special case of Set Cover when the universe is a set of points in \mathbb^d and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles). *
Set packing Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set ''S'' and a list of subsets of ''S''. Then, the set packing problem asks ...
* Maximum coverage problem is to choose at most k sets to cover as many elements as possible. *
Dominating set In graph theory, a dominating set for a graph is a subset of its vertices, such that any vertex of is either in , or has a neighbor in . The domination number is the number of vertices in a smallest dominating set for . The dominating set ...
is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The Dominating set problem was shown to be NP complete through a reduction from Set cover. * Exact cover problem is to choose a set cover with no element included in more than one covering set. * Red Blue Set Cover. * Set-cover abduction.


Notes


References

* . * * . * * . * . * . * * *


External links


Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination


{{DEFAULTSORT:Set Cover Problem Families of sets NP-complete problems Linear programming Approximation algorithms Covering problems