computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, Prim's algorithm is a

greedy algorithm A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...

that finds a

minimum spanning tree A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. ...

for a weighted

undirected graph In discrete mathematics, particularly in graph theory, a graph is a structure consisting of a set of objects where some pairs of the objects are in some sense "related". The objects are represented by abstractions called '' vertices'' (also call ...

. This means it finds a subset of the

edge Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed by ...

s that forms a

tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...

that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex. The algorithm was developed in 1930 by

Czech Czech may refer to: * Anything from or related to the Czech Republic, a country in Europe ** Czech language ** Czechs, the people of the area ** Czech culture ** Czech cuisine * One of three mythical brothers, Lech, Czech, and Rus *Czech (surnam ...

mathematician

Vojtěch Jarník Vojtěch Jarník (; 22 December 1897 – 22 September 1970) was a Czech mathematician. He worked for many years as a professor and administrator at Charles University, and helped found the Czechoslovak Academy of Sciences. He is the namesake of ...

and later rediscovered and republished by

computer scientist A computer scientist is a scientist who specializes in the academic study of computer science. Computer scientists typically work on the theoretical side of computation. Although computer scientists can also focus their work and research on ...

s Robert C. Prim in 1957 and

Edsger W. Dijkstra Edsger Wybe Dijkstra ( ; ; 11 May 1930 – 6 August 2002) was a Dutch computer scientist, programmer, software engineer, mathematician, and science essayist. Born in Rotterdam in the Netherlands, Dijkstra studied mathematics and physics and the ...

in 1959. Therefore, it is also sometimes called the Jarník's algorithm, Prim–Jarník algorithm, Prim–Dijkstra algorithm. or the DJP algorithm.. Other well-known algorithms for this problem include

Kruskal's algorithm Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a greedy algorithm that in each step adds to the forest the lowest-weight edge that ...

and

Borůvka's algorithm Borůvka's algorithm is a greedy algorithm for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected. It was first published in 1926 by Otakar Borůvka as a method of constructing an ...

. These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, running Prim's algorithm separately for each connected component of the graph, it can also be used to find the minimum spanning forest. In terms of their asymptotic

time complexity In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...

, these three algorithms are equally fast for

sparse graph In mathematics, a dense graph is a Graph (discrete mathematics), graph in which the number of edges is close to the maximal number of edges (where every pair of Vertex (graph theory), vertices is connected by one edge). The opposite, a graph with ...

s, but slower than other more sophisticated algorithms. However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in

linear time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...

, meeting or improving the time bounds for other algorithms., p. 77.

Description

The algorithm may informally be described as performing the following steps: In more detail, it may be implemented following the

pseudocode In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...

below. function Prim(vertices, edges) is for each vertex in vertices do cheapestCost ertex← ∞ cheapestEdge ertex← null explored ← empty set unexplored ← set containing all vertices startVertex ← any element of vertices cheapestCost tartVertex← 0 while unexplored is not empty do // Select vertex in unexplored with minimum cost currentVertex ← vertex in unexplored with minimum cheapestCost ertex unexplored.remove(currentVertex) explored.add(currentVertex) for each edge (currentVertex, neighbor) in edges do if neighbor in unexplored and weight(currentVertex, neighbor) < cheapestCost eighborTHEN cheapestCost eighbor← weight(currentVertex, neighbor) cheapestEdge eighbor← (currentVertex, neighbor) resultEdges ← empty list for each vertex in vertices do if cheapestEdge ertex≠ null THEN resultEdges.append(cheapestEdge ertex return resultEdges As described above, the starting vertex for the algorithm will be chosen arbitrarily, because the first iteration of the main loop of the algorithm will have a set of vertices in ''Q'' that all have equal weights, and the algorithm will automatically start a new tree in ''F'' when it completes a spanning tree of each connected component of the input graph. The algorithm may be modified to start with any particular vertex ''s'' by setting ''C'' 's''to be a number smaller than the other values of ''C'' (for instance, zero), and it may be modified to only find a single spanning tree rather than an entire spanning forest (matching more closely the informal description) by stopping whenever it encounters another vertex flagged as having no associated edge. Different variations of the algorithm differ from each other in how the set ''Q'' is implemented: as a simple

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...

array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...

of vertices, or as a more complicated

priority queue In computer science, a priority queue is an abstract data type similar to a regular queue (abstract data type), queue or stack (abstract data type), stack abstract data type. In a priority queue, each element has an associated ''priority'', which ...

data structure. This choice leads to differences in the

of the algorithm. In general, a priority queue will be quicker at finding the vertex ''v'' with minimum cost, but will entail more expensive updates when the value of ''C'' 'w''changes.

Time complexity

The time complexity of Prim's algorithm depends on the data structures used for the graph and for ordering the edges by weight, which can be done using a

. The following table shows the typical choices: A simple implementation of Prim's, using an

adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph (discrete mathematics), graph. The elements of the matrix (mathematics), matrix indicate whether pairs of Vertex (graph theory), vertices ...

or an

adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This ...

graph representation and linearly searching an array of weights to find the minimum weight edge to add, requires O(, V, ²) running time. However, this running time can be greatly improved by using heaps to implement finding minimum weight edges in the algorithm's inner loop. A first improved version uses a heap to store all edges of the input graph, ordered by their weight. This leads to an O(, E, log , E, ) worst-case running time. But storing vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed

(MST) (or infinity if no such edge exists). Every time a vertex ''v'' is chosen and added to the MST, a decrease-key operation is performed on all vertices ''w'' outside the partial MST such that ''v'' is connected to ''w'', setting the key to the minimum of its previous value and the edge cost of (''v'',''w''). Using a simple

binary heap A binary heap is a heap (data structure), heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure fo ...

data structure, Prim's algorithm can now be shown to run in time O(, E, log , V, ) where , E, is the number of edges and , V, is the number of vertices. Using a more sophisticated

Fibonacci heap In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binar ...

, this can be brought down to O(, E, + , V, log , V, ), which is asymptotically faster when the graph is

dense Density (volumetric mass density or specific mass) is the ratio of a substance's mass to its volume. The symbol most often used for density is ''ρ'' (the lower case Greek letter rho), although the Latin letter ''D'' (or ''d'') can also be use ...

enough that , E, is ω(, V, ), and

when , E, is at least , V, log , V, . For graphs of even greater density (having at least , V, ^''c'' edges for some ''c'' > 1), Prim's algorithm can be made to run in linear time even more simply, by using a ''d''-ary heap in place of a Fibonacci heap.

Proof of correctness

Let ''P'' be a connected, weighted

graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discret ...

. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since ''P'' is connected, there will always be a path to every vertex. The output ''Y'' of Prim's algorithm is a

, because the edge and vertex added to tree ''Y'' are connected. Let ''Y₁'' be a minimum spanning tree of graph P. If ''Y₁''=''Y'' then ''Y'' is a minimum spanning tree. Otherwise, let ''e'' be the first edge added during the construction of tree ''Y'' that is not in tree ''Y₁'', and ''V'' be the set of vertices connected by the edges added before edge ''e''. Then one endpoint of edge ''e'' is in set ''V'' and the other is not. Since tree ''Y₁'' is a spanning tree of graph ''P'', there is a path in tree ''Y₁'' joining the two endpoints. As one travels along the path, one must encounter an edge ''f'' joining a vertex in set ''V'' to one that is not in set ''V''. Now, at the iteration when edge ''e'' was added to tree ''Y'', edge ''f'' could also have been added and it would be added instead of edge ''e'' if its weight was less than ''e'', and since edge ''f'' was not added, we conclude that :

w(f) \ge w(e).

Let tree ''Y₂'' be the graph obtained by removing edge ''f'' from and adding edge ''e'' to tree ''Y₁''. It is easy to show that tree ''Y₂'' is connected, has the same number of edges as tree ''Y₁'', and the total weights of its edges is not larger than that of tree ''Y₁'', therefore it is also a minimum spanning tree of graph ''P'' and it contains edge ''e'' and all the edges added before it during the construction of set ''V''. Repeat the steps above and we will eventually obtain a minimum spanning tree of graph ''P'' that is identical to tree ''Y''. This shows ''Y'' is a minimum spanning tree. The minimum spanning tree allows for the first subset of the sub-region to be expanded into a larger subset ''X'', which we assume to be the minimum.

Parallel algorithm

Distributed adjacency matrix for parallel prim

The main loop of Prim's algorithm is inherently sequential and thus not

parallelizable In mathematics, a differentiable manifold M of dimension ''n'' is called parallelizable if there exist Smooth function, smooth vector fields \ on the manifold, such that at every point p of M the tangent vectors \ provide a Basis of a vector space, ...

. However, the

inner loop In computer programs, an important form of control flow is the Loop (computing), loop which causes a block of code to be executed more than once. A common idiom is to have a loop Nested loop, nested inside another loop, with the contained loop be ...

, which determines the next edge of minimum weight that does not form a cycle, can be parallelized by dividing the vertices and edges between the available processors. The following

demonstrates this. This algorithm can generally be implemented on distributed machines as well as on shared memory machines. The running time is

O(\tfrac) + O(, V,  \log , P, )

, assuming that the ''reduce'' and ''broadcast'' operations can be performed in

O(\log , P, )

. A variant of Prim's algorithm for shared memory machines, in which Prim's sequential algorithm is being run in parallel, starting from different vertices, has also been explored. It should, however, be noted that more sophisticated algorithms exist to solve the distributed minimum spanning tree problem in a more efficient manner.

References

External links