computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...

, k-approximation of k-hitting set is an

approximation algorithm In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solu ...

for weighted

hitting set The set cover problem is a classical question in combinatorics, computer science, operations research, and Computational complexity theory, complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972. Given a S ...

. The input is a collection ''S'' of

subset In mathematics, set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset o ...

s of some universe ''T'' and a

mapping Mapping may refer to: * Mapping (cartography), the process of making a map * Mapping (mathematics), a synonym for a mathematical function and its generalizations ** Mapping (logic), a synonym for functional predicate Types of mapping * Animated m ...

''W'' from ''T'' to non-negative numbers called the weights of the elements of ''T''. In k-hitting set the size of the sets in ''S'' cannot be larger than ''k''. That is,

\forall i \in S: , i,  \leq k

. The problem is now to pick some subset ''T''' of ''T'' such that every set in ''S'' contains some element of ''T''', and such that the total weight of all elements in ''T''' is as small as possible.

The algorithm

For each set

j

in ''S'' is maintained a ''price'',

p_j

, which is initially 0. For an element ''a'' in ''T'', let ''S''(''a'') be the collection of sets from ''S'' containing ''a''. During the algorithm the following invariant is kept :

\forall a \in T: \sum_ p_j \leq W(a).\,

We say that an element, ''a'', from ''T'' is ''tight'' if

\Sigma_ p_j = W(a)

. The main part of the algorithm consists of a loop: As long as there is a set in ''S'' that contains no element from ''T'' which is tight, the price of this set is increased as much as possible without violating the invariant above. When this loop exits, all sets contain some tight element. Pick all the tight elements to be the hitting set.

Correctness

The algorithm always terminates because in each iteration of the loop the price of some set in ''S'' is increased enough to make one more element from ''T'' tight. If it cannot increase any price, it exits. It runs in polynomial time because the loop will not make more iterations than the number of elements in the union of all the sets of ''S''. It returns a hitting set, because when the loop exits, all sets in ''S'' contain a tight element from ''T'', and the set of these tight elements are returned. Note that for any hitting set ''T*'' and any prices

p_1, \ldots, p_

where the invariant from the algorithm is true, the total weight of the hitting set is an upper limit to the sum over all members of ''T*'' of the sum of the prices of sets containing this element, that is:

\Sigma_ \Sigma_ p_j \leq \Sigma_ W(a)

. This follows from the invariant property. Further, since the price of every set must occur at least once on the left hand side, we get

\Sigma_ p_j \leq \Sigma_ W(a)

. Especially, this property is true for the optimal hitting set. Further, for the hitting set ''H'' returned from the algorithm above, we have

\Sigma_ \Sigma_ p_j = \Sigma_ W(a)

. Since any price

p_j

can appear at most ''k'' times in the left-hand side (since subsets of ''S'' can contain no more than ''k'' element from ''T'') we get

\Sigma_ W(a) \leq k \cdot \Sigma_ p_j

Combined with the previous paragraph we get

\Sigma_ W(a) \leq k \cdot \Sigma_ W(a)

, where ''T*'' is the optimal hitting set. This is exactly the guarantee that it is a k-approximation algorithm.

Relation to linear programming

This algorithm is an instance of the primal-dual method, also called the pricing method. The intuition is that it is dual to a

linear programming Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. Linear programming is ...

algorithm. For an explanation see http://algo.inria.fr/seminars/sem00-01/vazirani.html.

References

* . * *{{cite book, first1=M. X., last1=Goemans, author1-link=Michel Goemans, first2=D. P., last2=Williamson, author2-link=David P. Williamson, contribution=The primal-dual method for approximation algorithms and its application to network design problems, title=Approximation Algorithms for NP-Hard Problems, editor-first=Dorit S., editor-last=Hochbaum, editor-link=Dorit S. Hochbaum, publisher=PWS Publishing Company, year=1997, isbn=0-534-94968-1. Approximation algorithms