Metric K-center
   HOME

TheInfoList



OR:

In
graph theory In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conne ...
, the metric -center problem is a
combinatorial optimization Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or can be reduced to a discrete set. Typical combi ...
problem studied in
theoretical computer science Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumsc ...
. Given cities with specified distances, one wants to build warehouses in different cities and minimize the maximum distance of a city to a warehouse. In
graph theory In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conne ...
, this means finding a set of vertices for which the largest distance of any point to its closest vertex in the -set is minimum. The vertices must be in a
metric space In mathematics, a metric space is a set together with a notion of ''distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general settin ...
, providing a
complete graph In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. A complete digraph is a directed graph in which every pair of distinct vertices is c ...
that satisfies the
triangle inequality In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of degenerate triangles, but ...
.


Formal definition

Let (X,d) be a
metric space In mathematics, a metric space is a set together with a notion of ''distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general settin ...
where X is a set and d is a
metric Metric or metrical may refer to: * Metric system, an internationally adopted decimal system of measurement * An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement Mathematics In mathem ...

A set \mathbf\subseteq\mathcal, is provided together with a parameter k. The goal is to find a subset \mathcal\subseteq \mathbf with , \mathcal, =k such that the maximum distance of a point in \mathbf to the closest point in \mathcal is minimized. The problem can be formally defined as follows:
For a metric space (\mathcal,d), * Input: a set \mathbf\subseteq\mathcal, and a parameter k. * Output: a set \mathcal\subseteq\mathbf of k points. * Goal: Minimize the cost r^\mathcal(\mathbf) = \underset d(v,\mathcal) That is, Every point in a cluster is in distance at most r^\mathcal(V) from its respective center. The k-Center Clustering problem can also be defined on a complete undirected graph ''G'' = (''V'', ''E'') as follows:
Given a complete undirected graph ''G'' = (''V'', ''E'') with distances ''d''(''v''''i'', ''v''''j'') ∈ ''N'' satisfying the triangle inequality, find a subset ''C'' ⊆ ''V'' with , ''C'',  = ''k'' while minimizing: : \max_ \min_ d(v,c)


Computational complexity

In a complete undirected graph ''G'' = (''V'', ''E''), if we sort the edges in non-decreasing order of the distances: ''d''(''e''1) ≤ ''d''(''e''2) ≤ … ≤ ''d''(''e''m) and let ''G''i = (V, ''E''i), where ''E''i = . The ''k''-center problem is equivalent to finding the smallest index ''i'' such that ''G''i has a
dominating set In graph theory, a dominating set for a graph is a subset of its vertices, such that any vertex of is either in , or has a neighbor in . The domination number is the number of vertices in a smallest dominating set for . The dominating set ...
of size at most ''k''. Although Dominating Set is
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
, the ''k''-center problem remains
NP-hard In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
. This is clear, since the optimality of a given feasible solution for the ''k''-center problem can be determined through the Dominating Set reduction only if we know in first place the size of the optimal solution (i.e. the smallest index ''i'' such that ''G''i has a
dominating set In graph theory, a dominating set for a graph is a subset of its vertices, such that any vertex of is either in , or has a neighbor in . The domination number is the number of vertices in a smallest dominating set for . The dominating set ...
of size at most ''k''), which is precisely the difficult core of the
NP-Hard In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
problems. Although a
Turing reduction In computability theory, a Turing reduction from a decision problem A to a decision problem B is an oracle machine which decides problem A given an oracle for B (Rogers 1967, Soare 1987). It can be understood as an algorithm that could be used to s ...
can get around this issue by trying all values of ''k''.


Approximations


A simple greedy algorithm

A simple greedy
approximation algorithm In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solu ...
that achieves an approximation factor of 2 builds \mathcal using a
farthest-first traversal In computational geometry, the farthest-first traversal of a compact metric space is a sequence of points in the space, where the first point is selected arbitrarily and each successive point is as far as possible from the set of previously-selec ...
in ''k'' iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows: * Pick an arbitrary point \bar_1 into C_1 * For every point v\in \mathbf compute d_1 /math> from \bar_1 * Pick the point \bar_2 with highest distance from \bar_1. * Add it to the set of centers and denote this expanded set of centers as C_2. Continue this till ''k'' centers are found


Running time

* The ith iteration of choosing the ith center takes \mathcal(n) time. * There are ''k'' such iterations. * Thus, overall the algorithm takes \mathcal(nk) time.


Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor. Given a set of ''n'' points \mathbf\subseteq\mathcal, belonging to a metric space (\mathcal,d), the greedy ''K''-center algorithm computes a set K of ''k'' centers, such that K is a 2-approximation to the optimal ''k''-center clustering of V. ''i.e.'' r^(\mathbf)\leq 2r^(\mathbf,\textit) This theorem can be proven using two cases as follows, Case 1: Every cluster of \mathcal_ contains exactly one point of \mathbf * Consider a point v\in \mathbf * Let \bar be the center it belongs to in \mathcal_ * Let \bar be the center of \mathbf that is in \Pi(\mathcal_,\bar) * d(v,\bar)=d(v,\mathcal_)\leq r^(\mathbf,k) * Similarly, d(\bar,\bar)=d(\bar,\mathcal_)\leq r^ * By the triangle inequality: d(v,\bar)\leq d(v,\bar)+d(\bar,\bar)\leq 2r^
Case 2: There are two centers \bar and \bar of \mathbf that are both in \Pi(\mathcal_,\bar), for some \bar\in \mathcal_ (By pigeon hole principle, this is the only other possibility) * Assume, without loss of generality, that \bar was added later to the center set \mathbf by the greedy algorithm, say in ith iteration. * But since the greedy algorithm always chooses the point furthest away from the current set of centers, we have that \bar\in\mathcal_and, \begin r^\mathbf(\mathbf)\leq r^(\mathbf)&=d(\bar,\mathcal_)\\ &\leq d(\bar,\bar)\\ &\leq d(\bar,\bar)+d(\bar,\bar)\\ &\leq 2r^ \end


Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the ''k''-Center problem is equivalent to finding the smallest index ''i'' such that ''G''i has a dominating set of size at most ''k'' and computes a maximal independent set of ''G''i, looking for the smallest index ''i'' that has a maximal independent set with a size of at least ''k''. It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. Furthermore, the distances of all edges in G must satisfy the triangle inequality if the ''k''-center problem is to be approximated within any constant factor, unless P = NP.


Parameterized approximations

It can be shown that the ''k''-Center problem is W hard to approximate within a factor of 2 − ε for any ε > 0, when using ''k'' as the parameter. This is also true when parameterizing by the
doubling dimension In mathematics, a metric space with metric is said to be doubling if there is some doubling constant such that for any and , it is possible to cover the ball with the union of at most balls of radius . The base-2 logarithm of is called the d ...
(in fact the dimension of a
Manhattan metric A taxicab geometry or a Manhattan geometry is a geometry whose usual distance function or metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the absolute differences of their Cartesian co ...
), unless
P=NP The P versus NP problem is a major unsolved problem in theoretical computer science. In informal terms, it asks whether every problem whose solution can be quickly verified can also be quickly solved. The informal term ''quickly'', used abov ...
. When considering the combined parameter given by ''k'' and the
doubling dimension In mathematics, a metric space with metric is said to be doubling if there is some doubling constant such that for any and , it is possible to cover the ball with the union of at most balls of radius . The base-2 logarithm of is called the d ...
, ''k''-Center is still W hard but it is possible to obtain a parameterized approximation scheme. This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.


See also

*
Traveling salesman problem The travelling salesman problem (also called the travelling salesperson problem or TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each cit ...
*
Minimum k-cut In mathematics, the minimum ''k''-cut, is a combinatorial optimization problem that requires finding a set of edges whose removal would partition the graph to at least ''k'' connected components. These edges are referred to as ''k''-cut. The goal ...
*
Dominating set In graph theory, a dominating set for a graph is a subset of its vertices, such that any vertex of is either in , or has a neighbor in . The domination number is the number of vertices in a smallest dominating set for . The dominating set ...
*
Independent set (graph theory) In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set S of vertices such that for every two vertices in S, there is no edge connecting the two ...
*
Facility location problem The study of facility location problems (FLP), also known as location analysis, is a branch of operations research and computational geometry concerned with the optimal placement of facilities to minimize transportation costs while considering fact ...


References


Further reading

* {{citation , last1 = Hochbaum , first1 = Dorit S. , author1-link = Dorit S. Hochbaum , last2 = Shmoys , first2 = David B. , author2-link = David Shmoys , contribution = A Best Possible Heuristic for the k-Center Problem , title = Mathematics of Operations Research , year = 1985 , volume = 10 , issue = 2 , pages = 180–184 Combinatorial optimization Computational problems in graph theory Approximation algorithms NP-hard problems