Color-coding
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
and
graph theory In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conn ...
, the term color-coding refers to an algorithmic technique which is useful in the discovery of network motifs. For example, it can be used to detect a simple path of length in a given
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
. The traditional color-coding algorithm is probabilistic, but it can be derandomized without much overhead in the running time. Color-coding also applies to the detection of cycles of a given length, and more generally it applies to the
subgraph isomorphism problem In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs ''G'' and ''H'' are given as input, and one must determine whether ''G'' contains a subgraph that is isomorphic to ''H''. Subgraph isomor ...
(an
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
problem), where it yields polynomial time algorithms when the subgraph pattern that it is trying to detect has bounded
treewidth In graph theory, the treewidth of an undirected graph is an integer number which specifies, informally, how far the graph is from being a tree. The smallest treewidth is 1; the graphs with treewidth 1 are exactly the trees and the forests. The gra ...
. The color-coding method was proposed and analyzed in 1994 by
Noga Alon Noga Alon ( he, נוגה אלון; born 17 February 1956) is an Israeli mathematician and a professor of mathematics at Princeton University noted for his contributions to combinatorics and theoretical computer science, having authored hundreds of ...
, Raphael Yuster, and
Uri Zwick Uri Zwick is an Israeli computer scientist and mathematician known for his work on graph algorithms, in particular on distances in graphs and on the color-coding technique for subgraph isomorphism. With Howard Karloff, he is the namesake of the ...
.Alon, N., Yuster, R., and Zwick, U. 1995. Color-coding. J. ACM 42, 4 (Jul. 1995), 844–856. DOI= http://doi.acm.org/10.1145/210332.210337


Results

The following results can be obtained through the method of color-coding: * For every fixed constant , if a graph contains a simple cycle of size , then such a cycle can be found in: ** O(, V, ^\omega) expected time, or ** O(, V, ^\omega \log , V, ) worst-case time, where is the exponent of
matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...
. * For every fixed constant , and every graph that is in any nontrivial minor-closed graph family (e.g., a planar graph), if contains a simple cycle of size , then such cycle can be found in: ** expected time, or ** worst-case time. * If a graph contains a subgraph isomorphic to a bounded
treewidth In graph theory, the treewidth of an undirected graph is an integer number which specifies, informally, how far the graph is from being a tree. The smallest treewidth is 1; the graphs with treewidth 1 are exactly the trees and the forests. The gra ...
graph which has vertices, then such a subgraph can be found in polynomial time.


The method

To solve the problem of finding a subgraph H = (V_H, E_H) in a given graph , where can be a path, a cycle, or any bounded
treewidth In graph theory, the treewidth of an undirected graph is an integer number which specifies, informally, how far the graph is from being a tree. The smallest treewidth is 1; the graphs with treewidth 1 are exactly the trees and the forests. The gra ...
graph where , V_H, = O(\log , V, ), the method of color-coding begins by randomly coloring each vertex of with k = , V_H, colors, and then tries to find a colorful copy of in colored . Here, a graph is colorful if every vertex in it is colored with a distinct color. This method works by repeating (1) random coloring a graph and (2) finding colorful copy of the target subgraph, and eventually the target subgraph can be found if the process is repeated a sufficient number of times. Suppose a copy of in becomes colorful with some non-zero probability . It immediately follows that if the random coloring is repeated times, then this copy is expected to become colorful once. Note that though is small, it is shown that if , V_H, = O(\log , V, ), is only polynomially small. Suppose again there exists an algorithm such that, given a graph and a coloring which maps each vertex of to one of the colors, it finds a copy of colorful , if one exists, within some runtime . Then the expected time to find a copy of in , if one exists, is O(\tfrac). Sometimes it is also desirable to use a more restricted version of colorfulness. For example, in the context of finding cycles in planar graphs, it is possible to develop an algorithm that finds well-colored cycles. Here, a cycle is well-colored if its vertices are colored by consecutive colors.


Example

An example would be finding a simple cycle of length in graph . By applying random coloring method, each simple cycle has a probability of k!/k^k > e^ to become colorful, since there are k^k ways of coloring the vertices on the cycle, among which there are k! colorful occurrences. Then an algorithm (described next) can be used to find colorful cycles in the randomly colored graph in time O(V^\omega), where \omega is the matrix multiplication constant. Therefore, it takes e^k\cdot O(V^\omega) overall time to find a simple cycle of length in . The colorful cycle-finding algorithm works by first finding all pairs of vertices in that are connected by a simple path of length , and then checking whether the two vertices in each pair are connected. Given a coloring function to color graph , enumerate all partitions of the color set into two subsets of size k/2 each. Note that can be divided into and accordingly, and let and denote the subgraphs induced by and respectively. Then, recursively find colorful paths of length k/2 - 1 in each of and . Suppose the boolean matrix and represent the connectivity of each pair of vertices in and by a colorful path, respectively, and let be the matrix describing the adjacency relations between vertices of and those of , the boolean product A_1BA_2 gives all pairs of vertices in that are connected by a colorful path of length . Thus, the recursive relation of matrix multiplications is t(k) \le 2^k\cdot t(k/2), which yields a runtime of 2^\cdot V^\omega = O(V^\omega). Although this algorithm finds only the end points of the colorful path, another algorithm by Alon and Naor that finds colorful paths themselves can be incorporated into it.


Derandomization

The derandomization of color-coding involves enumerating possible colorings of a graph , such that the randomness of coloring is no longer required. For the target subgraph in to be discoverable, the enumeration has to include at least one instance where the is colorful. To achieve this, enumerating a -perfect family of hash functions from to is sufficient. By definition, is -perfect if for every subset of where , S, = k, there exists a hash function in such that is perfect. In other words, there must exist a hash function in that colors any given vertices with distinct colors. There are several approaches to construct such a -perfect hash family: # The best explicit construction is by
Moni Naor Moni Naor ( he, מוני נאור) is an Israeli computer scientist, currently a professor at the Weizmann Institute of Science. Naor received his Ph.D. in 1989 at the University of California, Berkeley. His advisor was Manuel Blum. He works i ...
, Leonard J. Schulman, and Aravind Srinivasan, where a family of size e^k k^ \log , V, can be obtained. This construction does not require the target subgraph to exist in the original subgraph finding problem. # Another explicit construction by Jeanette P. Schmidt and Alan Siegel yields a family of size 2^\log^2 , V, . # Another construction that appears in the original paper of
Noga Alon Noga Alon ( he, נוגה אלון; born 17 February 1956) is an Israeli mathematician and a professor of mathematics at Princeton University noted for his contributions to combinatorics and theoretical computer science, having authored hundreds of ...
et al. can be obtained by first building a -perfect family that maps to followed by building another -perfect family that maps to In the first step, it is possible to construct such a family with random bits that are almost -wise independent,Alon, N., Goldreich, O., Hastad, J., and Peralta, R. 1990. Simple construction of almost k-wise independent random variables. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (October 22–24, 1990). SFCS. IEEE Computer Society, Washington, DC, 544-553 vol.2. and the sample space needed for generating those random bits can be as small as k^\log , V, . In the second step, it has been shown by Jeanette P. Schmidt and Alan Siegel that the size of such -perfect family can be 2^. Consequently, by composing the -perfect families from both steps, a -perfect family of size 2^\log , V, that maps from to can be obtained. In the case of derandomizing well-coloring, where each vertex on the subgraph is colored consecutively, a -perfect family of hash functions from to is needed. A sufficient -perfect family which maps from to can be constructed in a way similar to the approach 3 above (the first step). In particular, it is done by using random bits that are almost independent, and the size of the resulting -perfect family will be k^\log , V, . The derandomization of color-coding method can be easily parallelized, yielding efficient NC algorithms.


Applications

Recently, color-coding has attracted much attention in the field of bioinformatics. One example is the detection of
signaling pathways Signal transduction is the process by which a chemical or physical signal is transmitted through a cell as a series of molecular events, most commonly protein phosphorylation catalyzed by protein kinases, which ultimately results in a cellula ...
in protein-protein interaction (PPI) networks. Another example is to discover and to count the number of motifs in PPI networks. Studying both
signaling pathways Signal transduction is the process by which a chemical or physical signal is transmitted through a cell as a series of molecular events, most commonly protein phosphorylation catalyzed by protein kinases, which ultimately results in a cellula ...
and motifs allows a deeper understanding of the similarities and differences of many biological functions, processes, and structures among organisms. Due to the huge amount of gene data that can be collected, searching for pathways or motifs can be highly time consuming. However, by exploiting the color-coding method, the motifs or signaling pathways with k=O(\log n) vertices in a network with vertices can be found very efficiently in polynomial time. Thus, this enables us to explore more complex or larger structures in PPI networks.


Further reading

* *


References

{{DEFAULTSORT:Color-Coding Graph algorithms