In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a topological sort or topological ordering of a
directed graph
In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs.
Definition
In formal terms, a directed graph is an ordered pa ...
is a
linear ordering
In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X:
# a \leq a ( reflexi ...
of its
vertices such that for every directed edge ''uv'' from vertex ''u'' to vertex ''v'', ''u'' comes before ''v'' in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node ''v'' is visited only after all its dependencies are visited''.'' A topological ordering is possible if and only if the graph has no
directed cycle
Director may refer to:
Literature
* ''Director'' (magazine), a British magazine
* ''The Director'' (novel), a 1971 novel by Henry Denker
* ''The Director'' (play), a 2000 play by Nancy Hasty
Music
* Director (band), an Irish rock band
* ''Di ...
s, that is, if it is a
directed acyclic graph
In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ve ...
(DAG). Any DAG has at least one topological ordering, and
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s are known for constructing a topological ordering of any DAG in
linear time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
. Topological sorting has many applications especially in ranking problems such as
feedback arc set
In graph theory and graph algorithms, a feedback arc set or feedback edge set in a directed graph is a subset of the edges of the graph that contains at least one edge out of every cycle in the graph. Removing these edges from the graph breaks al ...
. Topological sorting is possible even when the DAG has
disconnected components.
Examples
The canonical application of topological sorting is in
scheduling
A schedule or a timetable, as a basic time-management tool, consists of a list of times at which possible task (project management), tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order ...
a sequence of jobs or tasks based on their
dependencies. The jobs are represented by vertices, and there is an edge from ''x'' to ''y'' if job ''x'' must be completed before job ''y'' can be started (for example, when washing clothes, the washing machine must finish before we put the clothes in the dryer). Then, a topological sort gives an order in which to perform the jobs. A closely related application of topological sorting algorithms was first studied in the early 1960s in the context of the
PERT
Pert or PERT may refer to:
Ships
* - see List of United States Navy ships: P
* , a World War II corvette, originally HMS ''Nepeta''
* ''Pert'' (sidewheeler), a 19th-century steamboat that operated in British Columbia, Canada
Statistics
* PE ...
technique for scheduling in
project management
Project management is the process of leading the work of a team to achieve all project goals within the given constraints. This information is usually described in project documentation, created at the beginning of the development process. Th ...
. In this application, the vertices of a graph represent the milestones of a project, and the edges represent tasks that must be performed between one milestone and another. Topological sorting forms the basis of linear-time algorithms for finding the
critical path of the project, a sequence of milestones and tasks that controls the length of the overall project schedule.
In computer science, applications of this type arise in
instruction scheduling
In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing ...
, ordering of formula cell evaluation when recomputing formula values in
spreadsheet
A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cel ...
s,
logic synthesis
In computer engineering, logic synthesis is a process by which an abstract specification of desired circuit behavior, typically at register transfer level (RTL), is turned into a design implementation in terms of logic gates, typically by a compu ...
, determining the order of compilation tasks to perform in
makefile
In software development, Make is a build automation tool that automatically builds executable programs and libraries from source code by reading files called ''Makefiles'' which specify how to derive the target program. Though integrated develo ...
s, data
serialization
In computing, serialization (or serialisation) is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e ...
, and resolving symbol dependencies in
linkers. It is also used to decide in which order to load tables with foreign keys in databases.
Algorithms
The usual algorithms for topological sorting have running time linear in the number of nodes plus the number of edges, asymptotically,
Kahn's algorithm
One of these algorithms, first described by , works by choosing vertices in the same order as the eventual topological sort. First, find a list of "start nodes" which have no incoming edges and insert them into a set S; at least one such node must exist in a non-empty acyclic graph. Then:
''L'' ← Empty list that will contain the sorted elements
''S'' ← Set of all nodes with no incoming edge
while ''S'' is not empty do
remove a node ''n'' from ''S''
add ''n'' to ''L''
for each node ''m'' with an edge ''e'' from ''n'' to ''m'' do
remove edge ''e'' from the graph
if ''m'' has no other incoming edges then
insert ''m'' into ''S''
if ''graph'' has edges then
return error ''(graph has at least one cycle)''
else
return ''L'' ''(a topologically sorted order)''
If the graph is a
DAG, a solution will be contained in the list L (the solution is not necessarily unique). Otherwise, the graph must have at least one cycle and therefore a topological sort is impossible.
Reflecting the non-uniqueness of the resulting sort, the structure S can be simply a set or a queue or a stack. Depending on the order that nodes n are removed from set S, a different solution is created. A variation of Kahn's algorithm that breaks ties
lexicographically
In mathematics, the lexicographic or lexicographical order (also known as lexical order, or dictionary order) is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a ...
forms a key component of the
Coffman–Graham algorithm
In job shop scheduling and graph drawing, the Coffman–Graham algorithm is an algorithm, named after Edward G. Coffman, Jr. and Ronald Graham, for arranging the elements of a partially ordered set into a sequence of levels. The algorithm chooses a ...
for parallel scheduling and
layered graph drawing
Layered graph drawing or hierarchical graph drawing is a type of graph drawing in which the vertices of a directed graph are drawn in horizontal rows or layers with the edges generally directed downwards.... It is also known as Sugiyama-style gra ...
.
Depth-first search
An alternative algorithm for topological sorting is based on
depth-first search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alon ...
. The algorithm loops through each node of the graph, in an arbitrary order, initiating a depth-first search that terminates when it hits any node that has already been visited since the beginning of the topological sort or the node has no outgoing edges (i.e. a leaf node):
''L'' ← Empty list that will contain the sorted nodes
while exists nodes without a permanent mark do
select an unmarked node ''n''
visit(''n'')
function visit(node ''n'')
if ''n'' has a permanent mark then
return
if ''n'' has a temporary mark then
stop ''(graph has at least one cycle)''
mark ''n'' with a temporary mark
for each node ''m'' with an edge from ''n'' to ''m'' do
visit(''m'')
remove temporary mark from ''n''
mark ''n'' with a permanent mark
add ''n'' to head of ''L''
Each node ''n'' gets ''prepended'' to the output list L only after considering all other nodes which depend on ''n'' (all descendants of ''n'' in the graph). Specifically, when the algorithm adds node ''n'', we are guaranteed that all nodes which depend on ''n'' are already in the output list L: they were added to L either by the recursive call to visit() which ended before the call to visit ''n'', or by a call to visit() which started even before the call to visit ''n''. Since each edge and node is visited once, the algorithm runs in linear time. This depth-first-search-based algorithm is the one described by ; it seems to have been first described in print by Tarjan in 1976.
Parallel algorithms
On a
parallel random-access machine
In computer science, a parallel random-access machine (parallel RAM or PRAM) is a shared-memory abstract machine. As its name indicates, the PRAM is intended as the parallel-computing analogy to the random-access machine (RAM) (not to be confused ...
, a topological ordering can be constructed in ''O''(log
2 ''n'') time using a polynomial number of processors, putting the problem into the complexity class
NC2.
One method for doing this is to repeatedly square the
adjacency matrix
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
In the special case of a finite simp ...
of the given graph, logarithmically many times, using
min-plus matrix multiplication Min-plus matrix multiplication, also known as distance product, is an operation on matrices.
Given two n \times n matrices A = (a_) and B = (b_), their distance product C = (c_) = A \star B is defined as an n \times n matrix such that c_ = \min_^n ...
with maximization in place of minimization. The resulting matrix describes the
longest path
In graph theory and theoretical computer science, the longest path problem is the problem of finding a simple path of maximum length in a given graph. A path is called ''simple'' if it does not have any repeated vertices; the length of a path may ...
distances in the graph. Sorting the vertices by the lengths of their longest incoming paths produces a topological ordering.
An algorithm for parallel topological sorting on
distributed memory
In computer science, distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, and if remote data are required, the computational task mu ...
machines parallelizes the algorithm of Kahn for a
DAG . On a high level, the algorithm of Kahn repeatedly removes the vertices of indegree 0 and adds them to the topological sorting in the order in which they were removed. Since the outgoing edges of the removed vertices are also removed, there will be a new set of vertices of indegree 0, where the procedure is repeated until no vertices are left. This algorithm performs
iterations, where is the longest path in . Each iteration can be parallelized, which is the idea of the following algorithm.
In the following it is assumed that the graph partition is stored on processing elements (PE) which are labeled
. Each PE initializes a set of local vertices
with
indegree
In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs.
Definition
In formal terms, a directed graph is an ordered pa ...
0, where the upper index represents the current iteration. Since all vertices in the local sets
have indegree 0, i.e. they are not adjacent, they can be given in an arbitrary order for a valid topological sorting. To assign a global index to each vertex, a
prefix sum In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers is a second sequence of numbers , the sums of prefixes ( running totals) of the input sequence:
:
:
:
:...
For instance, the prefix sums ...
is calculated over the sizes of
. So each step, there are
vertices added to the topological sorting.
In the first step, PE assigns the indices
to the local vertices in
. These vertices in
are removed, together with their corresponding outgoing edges. For each outgoing edge
with endpoint in another PE
, the message
is posted to PE . After all vertices in
are removed, the posted messages are sent to their corresponding PE. Each message
received updates the indegree of the local vertex . If the indegree drops to zero, is added to
. Then the next iteration starts.
In step , PE assigns the indices
, where
is the total amount of processed vertices after step . This procedure repeats until there are no vertices left to process, hence
. Below is a high level,
single program, multiple data pseudo code overview of this algorithm.
Note that the
prefix sum In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers is a second sequence of numbers , the sums of prefixes ( running totals) of the input sequence:
:
:
:
:...
For instance, the prefix sums ...
for the local offsets
can be efficiently calculated in parallel.
p processing elements with IDs from 0 to ''p''-1
Input: G = (V, E) DAG, distributed to PEs, PE index j = 0, ..., p - 1
Output: topological sorting of G
function traverseDAGDistributed
δ incoming degree of local vertices ''V''
// All vertices with indegree 0
nrOfVerticesProcessed = 0
do
global build prefix sum over size of ''Q'' // get offsets and total amount of vertices in this step
offset = nrOfVerticesProcessed + sum(Q
i, i = 0 to j - 1) // ''j'' is the processor index
foreach u in Q
localOrder
= index++;
foreach (u,v) in E do post message (''u, v'') to PE owning vertex ''v''
nrOfVerticesProcessed += sum(, Q
i, , i = 0 to p - 1)
deliver all messages to neighbors of vertices in Q
receive messages for local vertices V
remove all vertices in Q
foreach message (''u, v'') received:
if --δ
= 0
add ''v'' to ''Q''
while global size of ''Q'' > 0
return localOrder
The communication cost depends heavily on the given graph partition. As for runtime, on a
CRCW-PRAM model that allows fetch-and-decrement in constant time, this algorithm runs in
, where is again the longest path in and the maximum degree.
Application to shortest path finding
The topological ordering can also be used to quickly compute
shortest paths through a
weighted
A weight function is a mathematical device used when performing a sum, integral, or average to give some elements more "weight" or influence on the result than other elements in the same set. The result of this application of a weight function is ...
directed acyclic graph. Let be the list of vertices in such a graph, in topological order. Then the following algorithm computes the shortest path from some source vertex to all other vertices:
* Let be an array of the same length as ; this will hold the shortest-path distances from . Set , all other .
* Let be an array of the same length as , with all elements initialized to . Each will hold the predecessor of in the shortest path from to .
* Loop over the vertices as ordered in , starting from :
** For each vertex directly following (i.e., there exists an edge from to ):
*** Let be the weight of the edge from to .
*** Relax the edge: if , set
**** ,
**** .
Equivalently:
* Let be an array of the same length as ; this will hold the shortest-path distances from . Set , all other .
* Let be an array of the same length as , with all elements initialized to . Each will hold the predecessor of in the shortest path from to .
* Loop over the vertices as ordered in , starting from :
** For each vertex into (i.e., there exists an edge from to ):
*** Let be the weight of the edge from to .
*** Relax the edge: if , set
**** ,
**** .
On a graph of vertices and edges, this algorithm takes , i.e.,
linear
Linearity is the property of a mathematical relationship (''function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear r ...
, time.
Uniqueness
If a topological sort has the property that all pairs of consecutive vertices in the sorted order are connected by edges, then these edges form a directed
Hamiltonian path
In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a cycle that visits each vertex ...
in the
DAG. If a Hamiltonian path exists, the topological sort order is unique; no other order respects the edges of the path. Conversely, if a topological sort does not form a Hamiltonian path, the DAG will have two or more valid topological orderings, for in this case it is always possible to form a second valid ordering by swapping two consecutive vertices that are not connected by an edge to each other. Therefore, it is possible to test in linear time whether a unique ordering exists, and whether a Hamiltonian path exists, despite the
NP-hard
In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
ness of the Hamiltonian path problem for more general directed graphs (i.e. cyclic directed graphs).
Relation to partial orders
Topological orderings are also closely related to the concept of a
linear extension
In order theory, a branch of mathematics, a linear extension of a partial order is a total order (or linear order) that is compatible with the partial order. As a classic example, the lexicographic order of totally ordered sets is a linear extens ...
of a
partial order
In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary ...
in mathematics. A partially ordered set is just a set of objects together with a definition of the "≤" inequality relation, satisfying the axioms of reflexivity (''x'' ≤ ''x''), antisymmetry (if ''x'' ≤ ''y'' and ''y'' ≤ ''x'' then ''x'' = ''y'') and
transitivity (if ''x'' ≤ ''y'' and ''y'' ≤ ''z'', then ''x'' ≤ ''z''). A
total order
In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X:
# a \leq a ( reflexive) ...
is a partial order in which, for every two objects ''x'' and ''y'' in the set, either ''x'' ≤ ''y'' or ''y'' ≤ ''x''. Total orders are familiar in computer science as the comparison operators needed to perform
comparison sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occ ...
ing algorithms. For finite sets, total orders may be identified with linear sequences of objects, where the "≤" relation is true whenever the first object precedes the second object in the order; a comparison sorting algorithm may be used to convert a total order into a sequence in this way. A linear extension of a partial order is a total order that is compatible with it, in the sense that, if ''x'' ≤ ''y'' in the partial order, then ''x'' ≤ ''y'' in the total order as well.
One can define a partial ordering from any DAG by letting the set of objects be the vertices of the DAG, and defining ''x'' ≤ ''y'' to be true, for any two vertices ''x'' and ''y'', whenever there exists a
directed path
In graph theory, a path in a graph is a finite or infinite sequence of edges which joins a sequence of vertices which, by most definitions, are all distinct (and since the vertices are distinct, so are the edges). A directed path (sometimes c ...
from ''x'' to ''y''; that is, whenever ''y'' is
reachable from ''x''. With these definitions, a topological ordering of the DAG is the same thing as a linear extension of this partial order. Conversely, any partial ordering may be defined as the reachability relation in a DAG. One way of doing this is to define a DAG that has a vertex for every object in the partially ordered set, and an edge ''xy'' for every pair of objects for which ''x'' ≤ ''y''. An alternative way of doing this is to use the
transitive reduction
In the mathematical field of graph theory, a transitive reduction of a directed graph is another directed graph with the same vertices and as few edges as possible, such that for all pairs of vertices , a (directed) path from to in exists if ...
of the partial ordering; in general, this produces DAGs with fewer edges, but the reachability relation in these DAGs is still the same partial order. By using these constructions, one can use topological ordering algorithms to find linear extensions of partial orders.
Relation to scheduling optimisation
By definition, the solution of a scheduling problem that includes a precedence graph is a valid solution to topological sort (irrespective of the number of machines), however, topological sort in itself is ''not'' enough to optimally solve a scheduling optimisation problem. Hu's algorithm is a popular method used to solve scheduling problems that require a precedence graph and involve processing times (where the goal is to minimise the largest completion time amongst all the jobs). Like topological sort, Hu's algorithm is not unique and can be solved using DFS (by finding the largest path length and then assigning the jobs).
See also
*
tsort
The tsort program is a command line utility on Unix and Unix-like platforms, that performs a topological sort on its input. , it is part of the POSIX.1 standard.
History
According to its info page, this command was initially written for providi ...
, a Unix program for topological sorting
*
Feedback arc set
In graph theory and graph algorithms, a feedback arc set or feedback edge set in a directed graph is a subset of the edges of the graph that contains at least one edge out of every cycle in the graph. Removing these edges from the graph breaks al ...
, a set of edges whose removal allows the remaining subgraph to be topologically sorted
*
Tarjan's strongly connected components algorithm
Tarjan's strongly connected components algorithm is an algorithm in graph theory for finding the strongly connected components (SCCs) of a directed graph. It runs in linear time, matching the time bound for alternative methods including Kosaraju's ...
, an algorithm that gives the topologically sorted list of strongly connected components in a graph
*
Pre-topological order
References
Further reading
*
D. E. Knuth
Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sci ...
,
The Art of Computer Programming
''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of compu ...
, Volume 1, section 2.2.3, which gives an algorithm for topological sorting of a partial ordering, and a brief history.
External links
NIST Dictionary of Algorithms and Data Structures: topological sort*
{{sorting
Graph algorithms
Sorting algorithms
Directed acyclic graphs
Articles with example pseudocode