Tree search algorithm

In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a tree data structure exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.


Types

Unlike linked lists, one-dimensional arrays and other linear data structures, which are canonically traversed in linear order, trees may be traversed in multiple ways. They may be traversed in depth-first or breadth-first order. There are three common ways to traverse them in depth-first order: in-order, pre-order and post-order. Beyond these basic traversals, various more complex or hybrid schemes are possible, such as depth-limited searches like iterative deepening depth-first search. The latter, as well as breadth-first search, can also be used to traverse infinite trees, see below.


Data structures for tree traversal

Traversing a tree involves iterating over all nodes in some manner. Because there is more than one possible next node from a given node (a tree is not a linear data structure), some nodes must be deferred, that is, stored in some way for later visiting, assuming sequential (not parallel) computation. This is often done via a stack (LIFO) or queue (FIFO). As a tree is a self-referential (recursively defined) data structure, traversal can be defined by recursion or, more subtly, corecursion, in a natural and clear fashion; in these cases the deferred nodes are stored implicitly in the call stack.

Depth-first search is easily implemented via a stack, including recursively (via the call stack), while breadth-first search is easily implemented via a queue, including corecursively.
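The role of the deferred-node container can be illustrated with a small Python sketch. The `Node` class and the `traverse` function are hypothetical names introduced only for this example; the same loop yields a depth-first (pre-order) or a breadth-first order depending solely on whether pending nodes are removed in LIFO or FIFO fashion.

```python
from collections import deque

class Node:
    """Hypothetical binary-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def traverse(root, breadth_first):
    """Return keys in level order (queue) or pre-order (stack)."""
    pending = deque([root] if root is not None else [])
    order = []
    while pending:
        # FIFO removal gives breadth-first, LIFO removal gives depth-first.
        node = pending.popleft() if breadth_first else pending.pop()
        order.append(node.key)
        children = [node.left, node.right]
        if not breadth_first:
            children.reverse()  # push right first so left is popped first
        pending.extend(c for c in children if c is not None)
    return order

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(traverse(tree, breadth_first=False))  # [1, 2, 4, 5, 3]
print(traverse(tree, breadth_first=True))   # [1, 2, 3, 4, 5]
```

Using one `deque` for both cases is a design choice of the sketch; a plain list would serve equally well as the stack.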


Depth-first search

In depth-first search (DFS), the search tree is deepened as much as possible before going to the next sibling.

To traverse binary trees with depth-first search, perform the following operations at each node:

1. If the current node is empty then return.
2. Execute the following three operations in a certain order:
   N: Visit the current node.
   L: Recursively traverse the current node's left subtree.
   R: Recursively traverse the current node's right subtree.

The trace of a traversal is called a sequentialisation of the tree. The traversal trace is a list of each visited node. No single sequentialisation according to pre-, in- or post-order describes the underlying tree uniquely. Given a tree with distinct elements, either pre-order or post-order paired with in-order is sufficient to describe the tree uniquely. However, pre-order with post-order leaves some ambiguity in the tree structure.

There are three positions in the traversal, relative to the node, at which the visit of the node can take place (in the figure: red, green, or blue). Choosing exactly one color determines exactly one visit of each node, as described below. Visiting at all three colors results in a threefold visit of the same node, yielding the "all-order" sequentialisation.
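The claim that pre-order paired with in-order determines a tree with distinct elements can be made concrete with a short Python sketch. The function name `build` and the nested-tuple representation `(key, left, right)` are assumptions made for this illustration only.

```python
def build(preorder, inorder):
    """Rebuild a binary tree as nested (key, left, right) tuples.

    Assumes both lists contain the same distinct keys.
    """
    if not preorder:
        return None
    root = preorder[0]                 # pre-order lists the root first
    split = inorder.index(root)        # keys before the root form the left subtree
    left = build(preorder[1:split + 1], inorder[:split])
    right = build(preorder[split + 1:], inorder[split + 1:])
    return (root, left, right)

print(build([1, 2, 4, 5, 3], [4, 2, 5, 1, 3]))
# (1, (2, (4, None, None), (5, None, None)), (3, None, None))
```

By contrast, the two trees `(1, (2, None, None), None)` and `(1, None, (2, None, None))` share the pre-order `[1, 2]` and the post-order `[2, 1]`, which is the kind of ambiguity mentioned above.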


Pre-order, NLR

1. Visit the current node (in the figure: position red).
2. Recursively traverse the current node's left subtree.
3. Recursively traverse the current node's right subtree.

The pre-order traversal is a topologically sorted one, because a parent node is processed before any of its child nodes.


Post-order, LRN

1. Recursively traverse the current node's left subtree.
2. Recursively traverse the current node's right subtree.
3. Visit the current node (in the figure: position blue).

Post-order traversal can be useful to get the postfix expression of a binary expression tree.


In-order, LNR

1. Recursively traverse the current node's left subtree.
2. Visit the current node (in the figure: position green).
3. Recursively traverse the current node's right subtree.

In a binary search tree ordered such that in each node the key is greater than all keys in its left subtree and less than all keys in its right subtree, in-order traversal retrieves the keys in ascending sorted order.


Reverse pre-order, NRL

1. Visit the current node.
2. Recursively traverse the current node's right subtree.
3. Recursively traverse the current node's left subtree.


Reverse post-order, RLN

1. Recursively traverse the current node's right subtree.
2. Recursively traverse the current node's left subtree.
3. Visit the current node.


Reverse in-order, RNL

1. Recursively traverse the current node's right subtree.
2. Visit the current node.
3. Recursively traverse the current node's left subtree.

In a binary search tree ordered such that in each node the key is greater than all keys in its left subtree and less than all keys in its right subtree, reverse in-order traversal retrieves the keys in descending sorted order.


Arbitrary trees

To traverse arbitrary trees (not necessarily binary trees) with depth-first search, perform the following operations at each node:

1. If the current node is empty then return.
2. Visit the current node for pre-order traversal.
3. For each i from 1 to the current node's number of subtrees − 1, or from the latter to the former for reverse traversal, do:
   a. Recursively traverse the current node's i-th subtree.
   b. Visit the current node for in-order traversal.
4. Recursively traverse the current node's last subtree.
5. Visit the current node for post-order traversal.

Depending on the problem at hand, the pre-order, post-order, and especially any of the (number of subtrees − 1) in-order operations may be optional. Also, in practice more than one of the pre-order, post-order, and in-order operations may be required. For example, when inserting into a ternary tree, a pre-order operation is performed by comparing items. A post-order operation may be needed afterwards to re-balance the tree.
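The steps above can be sketched in Python as one function with optional hooks for the three visit positions. The `Node` class and the hook parameter names `pre`, `inner` and `post` are hypothetical and chosen only for this illustration.

```python
class Node:
    """Hypothetical n-ary tree node used only for this sketch."""
    def __init__(self, key, children=()):
        self.key = key
        self.children = list(children)

def traverse(node, pre=None, inner=None, post=None):
    """Depth-first traversal calling the optional hooks at each visit position."""
    if node is None:
        return
    if pre:
        pre(node)                       # pre-order position
    last = len(node.children) - 1
    for i, child in enumerate(node.children):
        traverse(child, pre, inner, post)
        if inner and i < last:
            inner(node)                 # in-order position, between two subtrees
    if post:
        post(node)                      # post-order position

tree = Node("a", [Node("b"), Node("c", [Node("e")]), Node("d")])
seen = []
traverse(tree, pre=lambda n: seen.append(n.key))
print(seen)  # ['a', 'b', 'c', 'e', 'd']
```

Passing only the hooks that a given problem needs mirrors the remark that some of the operations may be optional.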


Breadth-first search

In breadth-first search (BFS) or level-order search, the search tree is broadened as much as possible before going to the next depth.


Other types

There are also tree traversal algorithms that classify as neither depth-first search nor breadth-first search. One such algorithm is Monte Carlo tree search, which concentrates on analyzing the most promising moves, basing the expansion of the search tree on random sampling of the search space.


Applications

Pre-order traversal can be used to make a prefix expression (Polish notation) from expression trees. For example, traversing the tree of the arithmetic expression (A * (B − C)) + (D + E) in pre-order yields "+ * A − B C + D E". In prefix notation, there is no need for any parentheses as long as each operator has a fixed number of operands. Pre-order traversal is also used to create a copy of the tree.

Post-order traversal can generate a postfix representation (Reverse Polish notation) of a binary tree. Traversing the same arithmetic expression in post-order yields "A B C − * D E + +"; the latter can easily be transformed into machine code to evaluate the expression by a stack machine. Post-order traversal is also used to delete the tree: each node is freed after freeing its children.

In-order traversal is very commonly used on binary search trees because it returns values from the underlying set in order, according to the comparator that set up the binary search tree.
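As a hedged illustration of the expression-tree application, the following Python sketch builds the tree whose prefix form is quoted above, which corresponds to (A * (B - C)) + (D + E), and derives both notations from it. The `Node` class, the helper names and the sample operand values are assumptions of this example, and the ASCII hyphen stands in for the minus sign.

```python
import operator

class Node:
    """Hypothetical expression-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def prefix(node):
    """Pre-order (NLR) token list."""
    return [] if node is None else [node.key] + prefix(node.left) + prefix(node.right)

def postfix(node):
    """Post-order (LRN) token list."""
    return [] if node is None else postfix(node.left) + postfix(node.right) + [node.key]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate_postfix(tokens, values):
    """Evaluate a postfix token list the way a simple stack machine would."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(values[tok])
    return stack.pop()

expr = Node("+",
            Node("*", Node("A"), Node("-", Node("B"), Node("C"))),
            Node("+", Node("D"), Node("E")))
print(" ".join(prefix(expr)))   # + * A - B C + D E
print(" ".join(postfix(expr)))  # A B C - * D E + +
print(evaluate_postfix(postfix(expr), {"A": 2, "B": 5, "C": 3, "D": 1, "E": 4}))  # 9
```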


Implementations


Depth-first search implementation


Pre-order implementation
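A minimal Python sketch of pre-order traversal, in a recursive form and in an iterative form with an explicit stack, might look as follows. The `Node` class, the function names and the sample tree are hypothetical and serve only as an illustration.

```python
class Node:
    """Hypothetical binary-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def preorder(node, visit):
    """Recursive pre-order (NLR)."""
    if node is None:
        return
    visit(node.key)
    preorder(node.left, visit)
    preorder(node.right, visit)

def preorder_iterative(root, visit):
    """Same order with an explicit stack instead of the call stack."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        visit(node.key)
        # Push right first so the left subtree is processed first.
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)

tree = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
keys = []
preorder_iterative(tree, keys.append)
print("".join(keys))  # FBADCEGIH
```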


Post-order implementation
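Post-order traversal can be sketched in Python in the same two styles. The iterative variant below remembers the most recently visited node to decide whether a right subtree has already been handled; this is one common technique, not the only one. The `Node` class, the function names and the sample tree are hypothetical.

```python
class Node:
    """Hypothetical binary-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def postorder(node, visit):
    """Recursive post-order (LRN)."""
    if node is None:
        return
    postorder(node.left, visit)
    postorder(node.right, visit)
    visit(node.key)

def postorder_iterative(root, visit):
    """Post-order with an explicit ancestor stack."""
    stack, node, last = [], root, None
    while stack or node is not None:
        if node is not None:
            stack.append(node)          # descend to the leftmost node
            node = node.left
        else:
            peek = stack[-1]
            if peek.right is not None and last is not peek.right:
                node = peek.right       # right subtree not yet traversed
            else:
                visit(peek.key)         # both subtrees are done
                last = stack.pop()

tree = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
keys = []
postorder_iterative(tree, keys.append)
print("".join(keys))  # ACEDBHIGF
```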


In-order implementation
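In-order traversal admits the same treatment. In the Python sketch below, the `Node` class, the function names and the sample tree are hypothetical; the sample tree happens to be a binary search tree on letters, so the in-order output is sorted.

```python
class Node:
    """Hypothetical binary-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(node, visit):
    """Recursive in-order (LNR)."""
    if node is None:
        return
    inorder(node.left, visit)
    visit(node.key)
    inorder(node.right, visit)

def inorder_iterative(root, visit):
    """In-order with an explicit ancestor stack."""
    stack, node = [], root
    while stack or node is not None:
        if node is not None:
            stack.append(node)      # defer this node until its left subtree is done
            node = node.left
        else:
            node = stack.pop()
            visit(node.key)
            node = node.right

tree = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
keys = []
inorder_iterative(tree, keys.append)
print("".join(keys))  # ABCDEFGHI
```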


Another variant of Pre-order

If the tree is represented by an array (first index is 0), it is possible to calculate the index of the next element. The index arithmetic below uses integer division and assumes a perfect binary tree stored in level order:

    procedure bubbleUp(array, i, leaf)
        k ← 1
        i ← (i - 1)/2
        while (leaf + 1) % (k * 2) ≠ k
            i ← (i - 1)/2
            k ← 2 * k
        return i

    procedure preorder(array)
        size ← array.size
        i ← 0
        while i ≠ size
            visit(array[i])
            if i = size - 1
                i ← size
            else if i < size/2
                i ← i * 2 + 1
            else
                leaf ← i - size/2
                parent ← bubbleUp(array, i, leaf)
                i ← parent * 2 + 2
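The index calculation can be checked with a direct Python transcription. The function names are hypothetical, `//` is used for the integer division, and the sketch assumes a perfect binary tree stored in level order (array size 2^h − 1); for other sizes the arithmetic can skip elements.

```python
def bubble_up(i, leaf):
    """Climb from leaf index i to the ancestor whose right subtree comes next."""
    k = 1
    i = (i - 1) // 2
    while (leaf + 1) % (k * 2) != k:
        i = (i - 1) // 2
        k *= 2
    return i

def preorder_indices(size):
    """Return the array indices of a perfect tree of the given size in pre-order."""
    order, i = [], 0
    while i != size:
        order.append(i)
        if i == size - 1:
            i = size                        # last leaf: traversal is finished
        elif i < size // 2:
            i = i * 2 + 1                   # inner node: go to the left child
        else:
            leaf = i - size // 2            # position of this leaf among the leaves
            i = bubble_up(i, leaf) * 2 + 2  # right child of the found ancestor
    return order

print(preorder_indices(7))  # [0, 1, 3, 4, 2, 5, 6]
```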


Advancing to the next or previous node

The node to be started with may have been found in the binary search tree bst by means of a standard search function, which is shown here in an implementation without parent pointers, i.e. it uses a stack for holding the ancestor pointers.

    procedure search(bst, key)
        // returns a (node, stack); stack holds the proper ancestors of node
        node ← bst.root
        stack ← empty stack
        while node ≠ null
            if key = node.key
                return (node, stack)
            stack.push(node)
            if key < node.key
                node ← node.left
            else
                node ← node.right
        return (null, empty stack)

The function inorderNext returns an in-order-neighbor of node, either the in-order-successor (for dir=1) or the in-order-predecessor (for dir=0), and the updated stack, so that the binary search tree may be sequentially in-order-traversed and searched in the given direction dir further on. Here child[0] denotes the left child and child[1] the right child.

    procedure inorderNext(node, dir, stack)
        newnode ← node.child[dir]
        if newnode ≠ null
            do
                stack.push(node)
                node ← newnode
                newnode ← node.child[1-dir]
            until newnode = null
            return (node, stack)
        // node does not have a dir-child:
        do
            if stack.isEmpty()
                return (null, empty stack)
            oldnode ← node
            node ← stack.pop()   // parent of oldnode
        until oldnode ≠ node.child[dir]
        // now oldnode = node.child[1-dir],
        // i.e. node = ancestor (and predecessor/successor) of original node
        return (node, stack)

Note that the function does not use keys, which means that the sequential structure is completely recorded by the binary search tree's edges. For traversals without change of direction, the (amortised) average complexity is O(1), because a full traversal takes 2n − 2 steps for a BST of size n, 1 step for edge up and 1 for edge down. The worst-case complexity is O(h) with h as the height of the tree.

All the above implementations require stack space proportional to the height of the tree, which is a call stack for the recursive and a parent (ancestor) stack for the iterative ones. In a poorly balanced tree, this can be considerable. With the iterative implementations we can remove the stack requirement by maintaining parent pointers in each node, or by threading the tree (next section).
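Stepping to the in-order neighbor with an ancestor stack can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the `Node` class with a two-element `child` list (index 0 for left, 1 for right) is hypothetical, and the sketch keeps only the proper ancestors of the current node on the stack.

```python
class Node:
    """Hypothetical BST node; child[0] is the left child, child[1] the right."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.child = [left, right]

def search(root, key):
    """Return (node, stack) where stack holds the proper ancestors of node."""
    node, stack = root, []
    while node is not None:
        if key == node.key:
            return node, stack
        stack.append(node)
        node = node.child[0] if key < node.key else node.child[1]
    return None, []

def inorder_next(node, direction, stack):
    """Return the in-order successor (direction=1) or predecessor (direction=0)."""
    newnode = node.child[direction]
    if newnode is not None:
        # One step in the given direction, then as far as possible the other way.
        while newnode is not None:
            stack.append(node)
            node = newnode
            newnode = node.child[1 - direction]
        return node, stack
    # No child in that direction: climb until we arrive from the opposite side.
    while stack:
        oldnode, node = node, stack.pop()
        if oldnode is not node.child[direction]:
            return node, stack
    return None, []

bst = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
node, stack = search(bst, 1)
keys = []
while node is not None:
    keys.append(node.key)
    node, stack = inorder_next(node, 1, stack)
print(keys)  # [1, 2, 3, 4, 5, 6, 7]
```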


Morris in-order traversal using threading

A binary tree is threaded by making every left child pointer (that would otherwise be null) point to the in-order predecessor of the node (if it exists) and every right child pointer (that would otherwise be null) point to the in-order successor of the node (if it exists).

Advantages:

1. Avoids recursion, which uses a call stack and consumes memory and time.
2. The node keeps a record of its parent.

Disadvantages:

1. The tree is more complex.
2. We can make only one traversal at a time.
3. It is more prone to errors when both the children are not present and both values of nodes point to their ancestors.

Morris traversal is an implementation of in-order traversal that uses threading:

1. Create links to the in-order successor.
2. Print the data using these links.
3. Revert the changes to restore the original tree.
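The three steps of Morris traversal can be sketched in Python as below. The `Node` class, the function name and the sample tree are hypothetical; the sketch creates each thread temporarily in a null right pointer and removes it again, so the tree is unchanged once the traversal has finished.

```python
class Node:
    """Hypothetical binary-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def morris_inorder(root):
    """Return keys in in-order using O(1) extra space besides the output."""
    out, node = [], root
    while node is not None:
        if node.left is None:
            out.append(node.key)
            node = node.right              # may follow a thread back up
        else:
            # Find the in-order predecessor of node in its left subtree.
            pred = node.left
            while pred.right is not None and pred.right is not node:
                pred = pred.right
            if pred.right is None:
                pred.right = node          # step 1: create the thread
                node = node.left
            else:
                pred.right = None          # step 3: remove the thread again
                out.append(node.key)       # step 2: visit via the thread
                node = node.right
    return out

tree = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(morris_inorder(tree)))  # ABCDEFGHI
```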


Breadth-first search

Listed below is pseudocode for a simple queue-based level-order traversal. It requires space proportional to the maximum number of nodes at a given depth, which can be as much as half the total number of nodes. A more space-efficient approach for this type of traversal can be implemented using an iterative deepening depth-first search.

    procedure levelorder(node)
        queue ← empty queue
        queue.enqueue(node)
        while not queue.isEmpty()
            node ← queue.dequeue()
            visit(node)
            if node.left ≠ null
                queue.enqueue(node.left)
            if node.right ≠ null
                queue.enqueue(node.right)

If the tree is represented by an array (first index is 0), it is sufficient to iterate through all elements:

    procedure levelorder(array)
        for i from 0 to array.size − 1
            visit(array[i])
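A runnable counterpart of the queue-based pseudocode, as a minimal Python sketch: the `Node` class and the sample tree are hypothetical, and `collections.deque` serves as the FIFO queue.

```python
from collections import deque

class Node:
    """Hypothetical binary-tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def levelorder(root):
    """Return the keys of the tree level by level, left to right."""
    out = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        out.append(node.key)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return out

tree = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(levelorder(tree)))  # FBGADICEH
```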


Infinite trees

While traversal is usually done for trees with a finite number of nodes (and hence finite depth and finite branching factor) it can also be done for infinite trees. This is of particular interest in functional programming (particularly with lazy evaluation), as infinite data structures can often be easily defined and worked with, though they are not (strictly) evaluated, as this would take infinite time. Some finite trees are too large to represent explicitly, such as the game tree for chess or go, and so it is useful to analyze them as if they were infinite.

A basic requirement for traversal is to visit every node eventually. For infinite trees, simple algorithms often fail this. For example, given a binary tree of infinite depth, a depth-first search will go down one side (by convention the left side) of the tree, never visiting the rest, and indeed an in-order or post-order traversal will never visit any nodes, as it has not reached a leaf (and in fact never will). By contrast, a breadth-first (level-order) traversal will traverse a binary tree of infinite depth without problem, and indeed will traverse any tree with bounded branching factor.

On the other hand, given a tree of depth 2, where the root has infinitely many children, and each of these children has two children, a depth-first search will visit all nodes, as once it exhausts the grandchildren (children of children of one node), it will move on to the next (assuming it is not post-order, in which case it never reaches the root). By contrast, a breadth-first search will never reach the grandchildren, as it seeks to exhaust the children first.

A more sophisticated analysis of running time can be given via infinite ordinal numbers; for example, the breadth-first search of the depth 2 tree above will take ω·2 steps: ω for the first level, and then another ω for the second level.

Thus, simple depth-first or breadth-first searches do not traverse every infinite tree, and are not efficient on very large trees. However, hybrid methods can traverse any (countably) infinite tree, essentially via a diagonal argument ("diagonal", a combination of vertical and horizontal, corresponds to a combination of depth and breadth).

Concretely, given the infinitely branching tree of infinite depth, label the root (), the children of the root (1), (2), …, the grandchildren (1, 1), (1, 2), …, (2, 1), (2, 2), …, and so on. The nodes are thus in a one-to-one correspondence with finite (possibly empty) sequences of positive numbers, which are countable and can be placed in order first by sum of entries, and then by lexicographic order within a given sum. Only finitely many sequences sum to a given value, so all entries are reached; formally, there are a finite number of compositions of a given natural number, specifically 2^(n−1) compositions of n ≥ 1. This gives a traversal. Explicitly:

1. ()
2. (1)
3. (1, 1) (2)
4. (1, 1, 1) (1, 2) (2, 1) (3)
5. (1, 1, 1, 1) (1, 1, 2) (1, 2, 1) (1, 3) (2, 1, 1) (2, 2) (3, 1) (4)

etc.

This can be interpreted as mapping the infinite depth binary tree onto this tree and then applying breadth-first search: replace the "down" edges connecting a parent node to its second and later children with "right" edges from the first child to the second child, from the second child to the third child, etc. Thus at each step one can either go down (append a 1 to the end) or go right (add one to the last number), except at the root, which is extra and can only go down. This shows the correspondence between the infinite binary tree and the above numbering; the sum of the entries (minus one) corresponds to the distance from the root, which agrees with the 2^(n−1) nodes at depth n − 1 in the infinite binary tree (2 corresponds to binary).
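The diagonal ordering can be sketched as a lazy Python generator that yields node labels by increasing sum of entries and lexicographically within each sum. The function names `compositions` and `diagonal_traversal` are hypothetical; the generator never terminates by itself, so the example takes only a finite prefix with `itertools.islice`.

```python
from itertools import count, islice

def compositions(n):
    """Yield the compositions of n (tuples of positive integers) in lexicographic order."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def diagonal_traversal():
    """Yield every node label of the infinitely branching, infinitely deep tree."""
    for total in count(0):
        yield from compositions(total)

print(list(islice(diagonal_traversal(), 8)))
# [(), (1,), (1, 1), (2,), (1, 1, 1), (1, 2), (2, 1), (3,)]
```

Because each sum contributes only finitely many labels, any given label is reached after finitely many steps, which is the property that plain depth-first and breadth-first search lack on this tree.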



