computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...

, Prim's algorithm (also known as Jarník's algorithm) is a greedy algorithm that finds a

minimum spanning tree A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. ...

for a

weighted A weight function is a mathematical device used when performing a sum, integral, or average to give some elements more "weight" or influence on the result than other elements in the same set. The result of this application of a weight function is ...

undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' ve ...

. This means it finds a subset of the

edge Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed ...

s that forms a

tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...

that includes every

vertex Vertex, vertices or vertexes may refer to: Science and technology Mathematics and computer science *Vertex (geometry), a point where two or more curves, lines, or edges meet *Vertex (computer graphics), a data structure that describes the position ...

, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex. The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník and later rediscovered and republished by

computer scientist A computer scientist is a person who is trained in the academic study of computer science. Computer scientists typically work on the theoretical side of computation, as opposed to the hardware side on which computer engineers mainly focus (a ...

s Robert C. Prim in 1957 and Edsger W. Dijkstra in 1959. Therefore, it is also sometimes called the Jarník's algorithm, Prim–Jarník algorithm, Prim–Dijkstra algorithm. or the DJP algorithm.. Other well-known algorithms for this problem include

Kruskal's algorithm Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. (A minimum spanning tree of a connected graph is a subset of the edges that forms a tree that ...

and

Borůvka's algorithm Borůvka's algorithm is a greedy algorithm for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected. It was first published in 1926 by Otakar Borůvka as a method of constructing ...

. These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, running Prim's algorithm separately for each connected component of the graph, it can also be used to find the minimum spanning forest. In terms of their asymptotic

time complexity In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...

, these three algorithms are equally fast for sparse graphs, but slower than other more sophisticated algorithms. However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in

linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...

, meeting or improving the time bounds for other algorithms., p. 77. Prim's algorithm

Description

The algorithm may informally be described as performing the following steps: In more detail, it may be implemented following the

pseudocode In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...

below. As described above, the starting vertex for the algorithm will be chosen arbitrarily, because the first iteration of the main loop of the algorithm will have a set of vertices in ''Q'' that all have equal weights, and the algorithm will automatically start a new tree in ''F'' when it completes a spanning tree of each connected component of the input graph. The algorithm may be modified to start with any particular vertex ''s'' by setting ''C'' 's''to be a number smaller than the other values of ''C'' (for instance, zero), and it may be modified to only find a single spanning tree rather than an entire spanning forest (matching more closely the informal description) by stopping whenever it encounters another vertex flagged as having no associated edge. Different variations of the algorithm differ from each other in how the set ''Q'' is implemented: as a simple

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which ...

or array of vertices, or as a more complicated

priority queue In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure in which each element additionally has a ''priority'' associated with it. In a priority queue, an element with high priority is se ...

data structure. This choice leads to differences in the

of the algorithm. In general, a priority queue will be quicker at finding the vertex ''v'' with minimum cost, but will entail more expensive updates when the value of ''C'' 'w''changes.

Time complexity

The time complexity of Prim's algorithm depends on the data structures used for the graph and for ordering the edges by weight, which can be done using a

. The following table shows the typical choices: A simple implementation of Prim's, using an

adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...

or an

adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This is ...

graph representation and linearly searching an array of weights to find the minimum weight edge to add, requires O(, V, ²) running time. However, this running time can be greatly improved further by using heaps to implement finding minimum weight edges in the algorithm's inner loop. A first improved version uses a heap to store all edges of the input graph, ordered by their weight. This leads to an O(, E, log , E, ) worst-case running time. But storing vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed

(MST) (or infinity if no such edge exists). Every time a vertex ''v'' is chosen and added to the MST, a decrease-key operation is performed on all vertices ''w'' outside the partial MST such that ''v'' is connected to ''w'', setting the key to the minimum of its previous value and the edge cost of (''v'',''w''). Using a simple binary heap data structure, Prim's algorithm can now be shown to run in time O(, E, log , V, ) where , E, is the number of edges and , V, is the number of vertices. Using a more sophisticated

Fibonacci heap In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the bina ...

, this can be brought down to O(, E, + , V, log , V, ), which is asymptotically faster when the graph is dense enough that , E, is ω(, V, ), and

when , E, is at least , V, log , V, . For graphs of even greater density (having at least , V, ^''c'' edges for some ''c'' > 1), Prim's algorithm can be made to run in linear time even more simply, by using a ''d''-ary heap in place of a Fibonacci heap. Prim's algorithm proof

Proof of correctness

Let ''P'' be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since ''P'' is connected, there will always be a path to every vertex. The output ''Y'' of Prim's algorithm is a

, because the edge and vertex added to tree ''Y'' are connected. Let ''Y₁'' be a minimum spanning tree of graph P. If ''Y₁''=''Y'' then ''Y'' is a minimum spanning tree. Otherwise, let ''e'' be the first edge added during the construction of tree ''Y'' that is not in tree ''Y₁'', and ''V'' be the set of vertices connected by the edges added before edge ''e''. Then one endpoint of edge ''e'' is in set ''V'' and the other is not. Since tree ''Y₁'' is a spanning tree of graph ''P'', there is a path in tree ''Y₁'' joining the two endpoints. As one travels along the path, one must encounter an edge ''f'' joining a vertex in set ''V'' to one that is not in set ''V''. Now, at the iteration when edge ''e'' was added to tree ''Y'', edge ''f'' could also have been added and it would be added instead of edge ''e'' if its weight was less than ''e'', and since edge ''f'' was not added, we conclude that :

w(f) \ge w(e).

Let tree ''Y₂'' be the graph obtained by removing edge ''f'' from and adding edge ''e'' to tree ''Y₁''. It is easy to show that tree ''Y₂'' is connected, has the same number of edges as tree ''Y₁'', and the total weights of its edges is not larger than that of tree ''Y₁'', therefore it is also a minimum spanning tree of graph ''P'' and it contains edge ''e'' and all the edges added before it during the construction of set ''V''. Repeat the steps above and we will eventually obtain a minimum spanning tree of graph ''P'' that is identical to tree ''Y''. This shows ''Y'' is a minimum spanning tree. The minimum spanning tree allows for the first subset of the sub-region to be expanded into a smaller subset ''X'', which we assume to be the minimum.

Parallel algorithm

Distributed adjacency matrix for parallel prim

The main loop of Prim's algorithm is inherently sequential and thus not parallelizable. However, the inner loop, which determines the next edge of minimum weight that does not form a cycle, can be parallelized by dividing the vertices and edges between the available processors. The following

demonstrates this. This algorithm can generally be implemented on distributed machines as well as on shared memory machines. The running time is

O(\tfrac) + O(, V,  \log , P, )

, assuming that the ''reduce'' and ''broadcast'' operations can be performed in

O(\log , P, )

. A variant of Prim's algorithm for shared memory machines, in which Prim's sequential algorithm is being run in parallel, starting from different vertices, has also been explored. It should, however, be noted that more sophisticated algorithms exist to solve the

distributed minimum spanning tree The distributed minimum spanning tree (MST) problem involves the construction of a minimum spanning tree by a distributed algorithm, in a network where nodes communicate by message passing. It is radically different from the classical sequential pr ...

problem in a more efficient manner.

References

External links