In mathematics, a graph partition is the reduction of a

graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...

to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing,

VLSI Very large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining millions or billions of MOS transistors onto a single chip. VLSI began in the 1970s when MOS integrated circuit (Metal Oxide Semiconductor) c ...

circuit design, and task scheduling in multiprocessor computers, among others. Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see . Two common examples of graph partitioning are

minimum cut In graph theory, a minimum cut or min-cut of a graph is a cut (a partition of the vertices of a graph into two disjoint subsets) that is minimal in some metric. Variations of the minimum cut problem consider weighted graphs, directed graphs, ter ...

and

maximum cut For a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets and , such that the number of edges between and is as large as possible. Fin ...

problems.

Problem complexity

Typically, graph partition problems fall under the category of

NP-hard In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...

problems. Solutions to these problems are generally derived using heuristics and approximation algorithms. However, uniform graph partitioning or a balanced graph partition problem can be shown to be

NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...

to approximate within any finite factor. Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist, unless

P=NP The P versus NP problem is a major unsolved problem in theoretical computer science. In informal terms, it asks whether every problem whose solution can be quickly verified can also be quickly solved. The informal term ''quickly'', used abov ...

. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.

Problem

Consider a graph ''G'' = (''V'', ''E''), where ''V'' denotes the set of ''n'' vertices and ''E'' the set of edges. For a (''k'',''v'') balanced partition problem, the objective is to partition ''G'' into ''k'' components of at most size ''v'' · (''n''/''k''), while minimizing the capacity of the edges between separate components. Also, given ''G'' and an integer ''k'' > 1, partition ''V'' into ''k'' parts (subsets) ''V''₁, ''V''₂, ..., ''V_k'' such that the parts are disjoint and have equal size, and the number of edges with endpoints in different parts is minimized. Such partition problems have been discussed in literature as bicriteria-approximation or resource augmentation approaches. A common extension is to

hypergraph In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices. Formally, an undirected hypergraph H is a pair H = (X,E) wh ...

s, where an edge can connect more than two vertices. A hyperedge is not cut if all vertices are in one partition, and cut exactly once otherwise, no matter how many vertices are on each side. This usage is common in

electronic design automation Electronic design automation (EDA), also referred to as electronic computer-aided design (ECAD), is a category of software tools for designing Electronics, electronic systems such as integrated circuits and printed circuit boards. The tools wo ...

Analysis

For a specific (''k'', 1 + ''ε'') balanced partition problem, we seek to find a minimum cost partition of ''G'' into ''k'' components with each component containing a maximum of (1 + ''ε'')·(''n''/''k'') nodes. We compare the cost of this approximation algorithm to the cost of a (''k'',1) cut, wherein each of the ''k'' components must have the same size of (''n''/''k'') nodes each, thus being a more restricted problem. Thus, :

\max_i , V_i,  \le (1+\varepsilon) \left\lceil\frac\right\rceil.

We already know that (2,1) cut is the minimum bisection problem and it is NP-complete. Next, we assess a 3-partition problem wherein ''n'' = 3''k'', which is also bounded in polynomial time. Now, if we assume that we have a finite approximation algorithm for (''k'', 1)-balanced partition, then, either the 3-partition instance can be solved using the balanced (''k'',1) partition in ''G'' or it cannot be solved. If the 3-partition instance can be solved, then (''k'', 1)-balanced partitioning problem in ''G'' can be solved without cutting any edge. Otherwise, if the 3-partition instance cannot be solved, the optimum (''k'', 1)-balanced partitioning in ''G'' will cut at least one edge. An approximation algorithm with a finite approximation factor has to differentiate between these two cases. Hence, it can solve the 3-partition problem which is a contradiction under the assumption that ''P'' = ''NP''. Thus, it is evident that (''k'',1)-balanced partitioning problem has no polynomial-time approximation algorithm with a finite approximation factor unless ''P'' = ''NP''. The

planar separator theorem In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of verti ...

states that any ''n''-vertex

planar graph In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross ...

can be partitioned into roughly equal parts by the removal of O() vertices. This is not a partition in the sense described above, because the partition set consists of vertices rather than edges. However, the same result also implies that every planar graph of bounded degree has a balanced cut with O() edges.

Graph partition methods

Since graph partitioning is a hard problem, practical solutions are based on heuristics. There are two broad categories of methods, local and global. Well-known local methods are the Kernighan–Lin algorithm, and Fiduccia-Mattheyses algorithms, which were the first effective 2-way cuts by local search strategies. Their major drawback is the arbitrary initial partitioning of the vertex set, which can affect the final solution quality. Global approaches rely on properties of the entire graph and do not rely on an arbitrary initial partition. The most common example is spectral partitioning, where a partition is derived from approximate eigenvectors of the adjacency matrix, or

spectral clustering In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as ...

that groups graph vertices using the

eigendecomposition In linear algebra, eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. When the matr ...

of the

graph Laplacian In the mathematical field of graph theory, the Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph. Named after Pierre-Simon Laplace, the graph Lapl ...

matrix.

Multi-level methods

A multi-level graph partitioning algorithm works by applying one or more stages. Each stage reduces the size of the graph by collapsing vertices and edges, partitions the smaller graph, then maps back and refines this partition of the original graph. A wide variety of partitioning and refinement methods can be applied within the overall multi-level scheme. In many cases, this approach can give both fast execution times and very high quality results. One widely used example of such an approach is

METIS Metis or Métis may refer to: Ethnic groups * Métis, recognized Indigenous communities in Canada and America whose distinct culture and language emerged after early intermarriage between First Nations peoples and early European settlers, prima ...

, a graph partitioner, and hMETIS, the corresponding partitioner for hypergraphs. An alternative approach originated from and implemented, e.g., in

scikit-learn scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector ...

with the partitioning determined from

eigenvectors In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted ...

of the

matrix for the original graph computed by

LOBPCG Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a matrix-free method for finding the largest (or smallest) eigenvalues and the corresponding eigenvectors of a symmetric generalized eigenvalue problem :A x= \lambda B x, for a ...

solver with

multigrid In numerical analysis, a multigrid method (MG method) is an algorithm for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhi ...

preconditioning In mathematics, preconditioning is the application of a transformation, called the preconditioner, that conditions a given problem into a form that is more suitable for numerical solving methods. Preconditioning is typically related to reducin ...

Spectral partitioning and spectral bisection

Given a graph

G=(V,E)

with

adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...

A

, where an entry

A_

implies an edge between node

i

and

j

, and

degree matrix In the mathematical field of algebraic graph theory, the degree matrix of an undirected graph is a diagonal matrix which contains information about the degree of each vertex—that is, the number of edges attached to each vertex.. It is used togeth ...

D

, which is a diagonal matrix, where each diagonal entry of a row

i

d_

, represents the node degree of node

i

. The Laplacian matrix

L

is defined as

L = D - A

. Now, a ratio-cut partition for graph

G = (V, E)

is defined as a partition of

V

into disjoint

U

, and

W

, minimizing the ratio :

\frac

of the number of edges that actually cross this cut to the number of pairs of vertices that could support such edges. Spectral graph partitioning can be motivated by analogy with partitioning of a vibrating string or a mass-spring system and similarly extended to the case of negative weights of the graph.

Fiedler eigenvalue and eigenvector

In such a scenario, the second smallest eigenvalue (

\lambda_2

) of

L

, yields a ''lower bound'' on the optimal cost (

c

) of ratio-cut partition with

c\geq \frac

. The eigenvector (

V_2

) corresponding to

\lambda_2

, called the ''Fiedler vector'', bisects the graph into only two communities based on the ''sign of the corresponding vector entry''. Division into a larger number of communities can be achieved by repeated ''bisection'' or by using ''multiple eigenvectors'' corresponding to the smallest eigenvalues. The examples in Figures 1,2 illustrate the spectral bisection approach. Bisected network

Modularity and ratio-cut

Minimum cut partitioning however fails when the number of communities to be partitioned, or the partition sizes are unknown. For instance, optimizing the cut size for free group sizes puts all vertices in the same community. Additionally, cut size may be the wrong thing to minimize since a good division is not just one with small number of edges between communities. This motivated the use of Modularity (Q) as a metric to optimize a balanced graph partition. The example in Figure 3 illustrates 2 instances of the same graph such that in ''(a)'' modularity (Q) is the partitioning metric and in ''(b)'', ratio-cut is the partitioning metric. Graph comparison

Applications

Conductance

Another objective function used for graph partitioning is Conductance which is the ratio between the number of cut edges and the volume of the smallest part. Conductance is related to electrical flows and random walks. The

Cheeger bound In mathematics, the Cheeger bound is a bound of the second largest eigenvalue of the transition matrix of a finite-state, discrete-time, reversible stationary Markov chain. It can be seen as a special case of Cheeger inequalities in expander graph ...

guarantees that spectral bisection provides partitions with nearly optimal conductance. The quality of this approximation depends on the second smallest eigenvalue of the Laplacian λ₂.

Immunization

Graph partition can be useful for identifying the minimal set of nodes or links that should be immunized in order to stop epidemics.

Other graph partition methods

Spin models have been used for clustering of multivariate data wherein similarities are translated into coupling strengths. The properties of ground state spin configuration can be directly interpreted as communities. Thus, a graph is partitioned to minimize the Hamiltonian of the partitioned graph. The

Hamiltonian Hamiltonian may refer to: * Hamiltonian mechanics, a function that represents the total energy of a system * Hamiltonian (quantum mechanics), an operator corresponding to the total energy of that system ** Dyall Hamiltonian, a modified Hamiltonian ...

(H) is derived by assigning the following partition rewards and penalties. * Reward internal edges between nodes of same group (same spin) * Penalize missing edges in same group * Penalize existing edges between different groups * Reward non-links between different groups. Additionally, Kernel-PCA-based Spectral clustering takes a form of least squares Support Vector Machine framework, and hence it becomes possible to project the data entries to a kernel induced feature space that has maximal variance, thus implying a high separation between the projected communities. Some methods express graph partitioning as a multi-criteria optimization problem which can be solved using local methods expressed in a game theoretic framework where each node makes a decision on the partition it chooses. For very large-scale distributed graphs classical partition methods might not apply (e.g., spectral partitioning, Metis) since they require full access to graph data in order to perform global operations. For such large-scale scenarios distributed graph partitioning is used to perform partitioning through asynchronous local operations only.

Software tools

implements

with the partitioning determined from

of the

matrix for the original graph computed by

ARPACK ARPACK, the ARnoldi PACKage, is a numerical computation, numerical software library written in Fortran, FORTRAN 77 for solving large scale eigenvalue problems in the matrix-free methods, matrix-free fashion. The package is designed to compute a fe ...

, or by

solver with

. Chaco, due to Hendrickson and Leland, implements the multilevel approach outlined above and basic local search algorithms. Moreover, they implement spectral partitioning techniques.

is a graph partitioning family by Karypis and Kumar. Among this family, kMetis aims at greater partitioning speed, hMetis, applies to hypergraphs and aims at partition quality, and ParMetis is a parallel implementation of the Metis graph partitioning algorithm. PaToH is another hypergraph partitioner. KaHyPar is a multilevel hypergraph partitioning framework providing direct k-way and recursive bisection based partitioning algorithms. It instantiates the multilevel approach in its most extreme version, removing only a single vertex in every level of the hierarchy. By using this very fine grained ''n''-level approach combined with strong local search heuristics, it computes solutions of very high quality. Scotch is graph partitioning framework by Pellegrini. It uses recursive multilevel bisection and includes sequential as well as parallel partitioning techniques. Jostle is a sequential and parallel graph partitioning solver developed by Chris Walshaw. The commercialized version of this partitioner is known as NetWorks. Party implements the Bubble/shape-optimized framework and the Helpful Sets algorithm. The software packages DibaP and its MPI-parallel variant PDibaP by Meyerhenke implement the Bubble framework using diffusion; DibaP also uses AMG-based techniques for coarsening and solving linear systems arising in the diffusive approach. Sanders and Schulz released a graph partitioning package KaHIP (Karlsruhe High Quality Partitioning) that implements for example flow-based methods, more-localized local searches and several parallel and sequential meta-heuristics. The tools Parkway by Trifunovic and Knottenbelt as well as Zoltan by Devine et al. focus on hypergraph partitioning. List of free open-source frameworks:

References

External links

* Chamberlain, Bradford L. (1998)
"Graph Partitioning Algorithms for Distributing Workloads of Parallel Computations"

Bibliography

* * An exhaustive analysis of the problem from a theoretical point of view. * One of the early fundamental works in the field. However, performance is O(n²), so it is no longer commonly used. * A later variant that is linear time, very commonly used, both by itself and as part of multilevel partitioning, see below. * Multi-level partitioning is the current state of the art. This paper also has good explanations of many other methods, and comparisons of the various methods on a wide variety of problems. * Graph partitioning (and in particular, hypergraph partitioning) has many applications to IC design. * Simulated annealing can be used as well. * {{cite journal , title=New spectral methods for ratio cut partitioning and clustering , last1=Hagen , first1=L. , last2=Kahng , first2=A. B. , journal=IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , date=September 1992 , volume=11 , issue=9 , pages= 1074–1085 , doi=10.1109/43.159993. There is a whole class of ''spectral partitioning'' methods, which use the Eigenvectors of the Laplacian of the connectivity graph. You can se

using Matlab. NP-complete problems Computational problems in graph theory