Tree (computing)
   HOME

TheInfoList



OR:

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected
nodes In general, a node is a localized swelling (a "knot") or a point of intersection (a Vertex (graph theory), vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two ...
. Each node in the tree can be connected to many children (depending on the type of tree), but must be connected to exactly one parent, except for the ''root'' node, which has no parent. These constraints mean there are no cycles or "loops" (no node can be its own ancestor), and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to
linear data structure This is a list of well-known data structures. For a wider list of terms, see list of terms relating to algorithms and data structures. For a comparison of running times for a subset of this list see comparison of data structures. Data types P ...
s, many trees cannot be represented by relationships between neighboring nodes in a single straight line.
Binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
s are a commonly used type, which constrain the number of children for each parent to exactly two. When the order of the children is specified, this data structure corresponds to an ordered tree in graph theory. A value or pointer to other data may be associated with every node in the tree, or sometimes only with the ''leaf nodes'', which have no children. The abstract data type can be represented in a number of ways, including a list of parents with pointers to children, a list of children with pointers to parents, or a list of nodes and a separate list of parent-child relations (a specific type of
adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This is ...
). Representations might also be more complicated, for example using indexes or ancestor lists for performance. Trees as used in computing are similar to but can be different from mathematical constructs of trees in graph theory, trees in set theory, and trees in descriptive set theory.


Applications

Trees are commonly used to represent or manipulate
hierarchical A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an important ...
data in applications such as: * File systems for: ** Directory structure used to organize subdirectories and files ( symbolic links create non-tree graphs, as do multiple
hard link In computing, a hard link is a directory entry (in a directory-based file system) that associates a name with a file. Thus, each file must have at least one hard link. Creating additional hard links for a file makes the contents of that file acc ...
s to the same file or directory) ** The mechanism used to allocate and link blocks of data on the storage device * Class hierarchy or "inheritance tree" showing the relationships among classes in object-oriented programming; multiple inheritance produces non-tree graphs * Abstract syntax trees for computer languages *
Natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
: **
Parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
s ** Modeling utterances in a generative grammar ** Dialogue tree for generating conversations * Document Object Models ("DOM tree") of XML and HTML documents * Search trees store data in a way that makes an efficient search algorithm possible via tree traversal ** A binary search tree is a type of
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
* Representing sorted lists of data *
Computer-generated imagery Computer-generated imagery (CGI) is the use of computer graphics to create or contribute to images in art, printed media, video games, simulators, and visual effects in films, television programs, shorts, commercials, and videos. The images may ...
: **
Space partitioning In geometry, space partitioning is the process of dividing a space (usually a Euclidean space) into two or more disjoint subsets (see also partition of a set). In other words, space partitioning divides a space into non-overlapping regions. Any ...
, including binary space partitioning **
Digital compositing Digital compositing is the process of digitally assembling multiple images to make a final image, typically for print, motion pictures or screen display. It is the digital analogue of optical film compositing. Mathematics The basic operation use ...
* Storing Barnes–Hut trees used to simulate galaxies * Implementing heaps *
Nested set In a naive set theory, a nested set is a set containing a chain of subsets, forming a hierarchical structure, like Matryoshka doll, Russian dolls. It is used as reference-concept in all hierarchy, scientific hierarchy definitions, and many techn ...
s * Hierarchical taxonomies such as the Dewey Decimal Classification with sections of increasing specificity. *
Hierarchical temporal memory Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book ''On Intelligence'' by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for ...
* Genetic programming *
Hierarchical clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into ...
Trees can be used to represent and manipulate various mathematical structures, such as: * Paths through an arbitrary node-and-edge graph (including multigraphs), by making multiple nodes in the tree for each graph node used in multiple paths * Any mathematical hierarchy Tree structures are often used for mapping the relationships between things, such as: * Components and subcomponents which can be visualized in an
exploded-view drawing An exploded-view drawing is a diagram, picture, schematic or technical drawing of an object, that shows the relationship or order of assembly of various parts. It shows the components of an object slightly separated by distance, or suspended in ...
*
Subroutine call In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Functions may ...
s used to identify which subroutines in a program call other subroutines non recursively * Inheritance of DNA among species by evolution, of source code by software projects (e.g.
Linux distribution timeline Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which in ...
), of designs in various types of cars, etc. * The contents of hierarchical namespaces
JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
and YAML documents can be thought of as trees, but are typically represented by nested
lists A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby unio ...
and
dictionaries A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by radical and stroke for ideographic languages), which may include information on definitions, usage, etymologies, p ...
.


Terminology

A node is a structure which may contain data and connections to other nodes, sometimes called edges or links. Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn with ''descendants'' going downwards). A node that has a child is called the child's parent node (or
superior Superior may refer to: *Superior (hierarchy), something which is higher in a hierarchical structure of any kind Places *Superior (proposed U.S. state), an unsuccessful proposal for the Upper Peninsula of Michigan to form a separate state *Lake ...
). All nodes have exactly one parent, except the topmost root node, which has none. A node might have many ancestor nodes, such as the parent's parent. Child nodes with the same parent are sibling nodes. Typically siblings have an order, with the first one conventionally drawn on the left. Some definitions allow a tree to have no nodes at all, in which case it is called ''empty''. An internal node (also known as an inner node, inode for short, or branch node) is any node of a tree that has child nodes. Similarly, an external node (also known as an outer node, leaf node, or terminal node) is any node that does not have child nodes. The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its ''root path''). When using zero-based counting, the root node has depth zero, leaf nodes have height zero, and a tree with only a single node (hence both a root and leaf) has depth and height zero. Conventionally, an empty tree (tree with no nodes, if such are allowed) has height −1. Each non-root node can be treated as the root node of its own subtree, which includes that node and all its descendants. Other terms used with trees: This is the same as depth when using zero-based counting.


Examples of trees and non-trees


Common operations

* Enumerating all the items * Enumerating a section of a tree * Searching for an item * Adding a new item at a certain position on the tree * Deleting an item * Pruning: Removing a whole section of a tree *
Grafting Grafting or graftage is a horticultural technique whereby tissues of plants are joined so as to continue their growth together. The upper part of the combined plant is called the scion () while the lower part is called the rootstock. The succ ...
: Adding a whole section to a tree * Finding the root for any node * Finding the lowest common ancestor of two nodes


Traversal and search methods

Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a ''walk'' of the tree. Often, an operation might be performed when a pointer arrives at a particular node. A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents are traversed is called a post-order walk; a walk in which a node's left subtree, then the node itself, and finally its right subtree are traversed is called an in-order traversal. (This last scenario, referring to exactly two subtrees, a left subtree and a right subtree, assumes specifically a
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
.) A level-order walk effectively performs a breadth-first search over the entirety of a tree; nodes are traversed level by level, where the root node is visited first, followed by its direct child nodes and their siblings, followed by its grandchild nodes and their siblings, etc., until all nodes in the tree have been traversed.


Representations

There are many different ways to represent trees. In working memory, nodes are typically dynamically allocated records with pointers to their children, their parents, or both, as well as any associated data. If of a fixed size, the nodes might be stored in a list. Nodes and relationships between nodes might be stored in a separate special type of
adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This is ...
. In
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
s, nodes are typically represented as table rows, with indexed row IDs facilitating pointers between parents and children. Nodes can also be stored as items in an array, with relationships between them determined by their positions in the array (as in a binary heap). A
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
can be implemented as a list of lists: the head of a list (the value of the first term) is the left child (subtree), while the tail (the list of second and subsequent terms) is the right child (subtree). This can be modified to allow values as well, as in Lisp S-expressions, where the head (value of first term) is the value of the node, the head of the tail (value of second term) is the left child, and the tail of the tail (list of third and subsequent terms) is the right child. Ordered trees can be naturally encoded by finite sequences, for example with natural numbers.


Type theory

As an abstract data type, the abstract tree type with values of some type is defined, using the abstract forest type (list of trees), by the functions: : value: → : children: → : nil: () → : node: × → with the axioms: : value(node(, )) = : children(node(, )) = In terms of type theory, a tree is an
inductive type In type theory, a system has inductive types if it has facilities for creating a new type from constants and functions that create terms of that type. The feature serves a role similar to data structures in a programming language and allows a ty ...
defined by the constructors (empty forest) and (tree with root node with given value and children).


Mathematical terminology

Viewed as a whole, a tree data structure is an ordered tree, generally with values attached to each node. Concretely, it is (if required to be non-empty): * A rooted tree with the "away from root" direction (a more narrow term is an " arborescence"), meaning: ** A directed graph, ** whose underlying undirected graph is a tree (any two vertices are connected by exactly one simple path), ** with a distinguished root (one vertex is designated as the root), ** which determines the direction on the edges (arrows point away from the root; given an edge, the node that the edge points from is called the ''parent'' and the node that the edge points to is called the ''child''), together with: * an ordering on the child nodes of a given node, and * a value (of some data type) at each node. Often trees have a fixed (more properly, bounded) branching factor ( outdegree), particularly always having two child nodes (possibly empty, hence ''at most'' two ''non-empty'' child nodes), hence a "binary tree". Allowing empty trees makes some definitions simpler, some more complicated: a rooted tree must be non-empty, hence if empty trees are allowed the above definition instead becomes "an empty tree or a rooted tree such that ...". On the other hand, empty trees simplify defining fixed branching factor: with empty trees allowed, a binary tree is a tree such that every node has exactly two children, each of which is a tree (possibly empty). The complete sets of operations on the tree must include the fork operation.


See also

* Tree structure (general) * :Trees (data structures) (catalogs types of computational trees)


Notes


References


Further reading

* Donald Knuth. '' The Art of Computer Programming: Fundamental Algorithms'', Third Edition. Addison-Wesley, 1997. . Section 2.3: Trees, pp. 308–423. *
Thomas H. Cormen Thomas H. Cormen is the co-author of ''Introduction to Algorithms'', along with Charles Leiserson, Ron Rivest, and Cliff Stein. In 2013, he published a new book titled '' Algorithms Unlocked''. He is a professor of computer science at Dartmout ...
, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. '' Introduction to Algorithms'', Second Edition. MIT Press and McGraw-Hill, 2001. . Section 10.4: Representing rooted trees, pp. 214–217. Chapters 12–14 (Binary Search Trees, Red–Black Trees, Augmenting Data Structures), pp. 253–320.


External links


Data Trees as a Means of Presenting Complex Data Analysis
by Sally Knipe on August 8, 2013

from the Dictionary of Algorithms and Data Structures
CRAN package data.tree
– implementation of a tree data structure in the R programming language
WormWeb.org: Interactive Visualization of the ''C. elegans'' Cell Tree
– Visualize the entire cell lineage tree of the nematode ''C. elegans'' (javascript)
''Binary Trees'' by L. Allison
{{DEFAULTSORT:Tree (Data Structure) Data types Knowledge representation Abstract data types de:Baum (Graphentheorie)