The nested set model is a technique for representing
nested set collection
A nested set collection or nested set family is a collection of sets that consists of chains of subsets forming a hierarchical structure, like Matryoshka doll, Russian dolls.
It is used as reference concept in hierarchy, scientific hierarchy def ...
s (also known as
tree
In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
s or
hierarchies
A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an importan ...
) in
relational database
A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.
A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s.
It is based on Nested Intervals, that "are immune to hierarchy reorganization problem, and allow answering ancestor path hierarchical queries algorithmically — without accessing the stored hierarchy relation".
Motivation
The standard
relational algebra
In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics (computer science), semantics. The theory was introduced by Edgar F. Codd.
The main applica ...
and
relational calculus
The relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that is part of the relational model for databases and provide a declarative way to specify database queries. The raison d'être ...
, and the
SQL
Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel")
is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
operations based on them, are unable to express directly all desirable operations on hierarchies. The nested set model is a solution to that problem.
An alternative solution is the expression of the hierarchy as a parent-child relation.
Joe Celko called this the
adjacency list model. If the hierarchy can have arbitrary depth, the adjacency list model does not allow the expression of operations such as comparing the contents of hierarchies of two elements, or determining whether an element is somewhere in the subhierarchy of another element. When the hierarchy is of fixed or bounded depth, the operations are possible, but expensive, due to the necessity of performing one
relational join per level. This is often known as the
bill of materials
A bill of materials or product structure (sometimes bill of material, BOM or associated list) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an Prod ...
problem.
Hierarchies may be expressed easily by switching to a
graph database
A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the dat ...
. Alternatively, several resolutions exist for the relational model and are available as a workaround in some
relational database management system
A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.
A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s:
* support for a dedicated
hierarchy data type, such as in SQL's
hierarchical query
A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an important ...
facility;
* extending the relational language with hierarchy manipulations, such as in the
nested relational algebra.
* extending the relational language with
transitive closure
In mathematics, the transitive closure of a homogeneous binary relation on a set (mathematics), set is the smallest Relation (mathematics), relation on that contains and is Transitive relation, transitive. For finite sets, "smallest" can be ...
, such as SQL's CONNECT statement; this allows a parent-child relation to be used, but execution remains expensive;
* the queries can be expressed in a language that supports iteration and is wrapped around the relational operations, such as
PL/SQL
PL/SQL (Procedural Language for SQL) is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database (since version 6 - stored PL/SQL procedures/functions/packages/triggers sinc ...
,
T-SQL or a
general-purpose programming language
In computer software, a general-purpose programming language (GPL) is a programming language for building software in a wide variety of application Domain (software engineering), domains. Conversely, a Domain-specific language, domain-specific pro ...
When these solutions are not available or not feasible, another approach must be taken.
Technique
The nested set model is to number the nodes according to a
tree traversal
In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a Tree (data structure), tree data stru ...
, which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating requires renumbering and is therefore expensive. Refinements that use
rational number
In mathematics, a rational number is a number that can be expressed as the quotient or fraction of two integers, a numerator and a non-zero denominator . For example, is a rational number, as is every integer (for example,
The set of all ...
s instead of integers can avoid renumbering, and so are faster to update, although much more complicated.
Example
In a clothing store catalog, clothing may be categorized according to the hierarchy given on the left:

The "Clothing" category, with the highest position in the hierarchy, encompasses all subordinating categories. It is therefore given left and right domain values of 1 and 22, the latter value being the double of the total number of nodes being represented. The next hierarchical level contains "Men's" and "Women's", both containing levels within themselves that must be accounted for. Each level's data node is assigned left and right domain values according to the number of sublevels contained within, as shown in the table data.
Performance
Queries using nested sets can be expected to be faster than queries using a
stored procedure
A stored procedure (also termed prc, proc, storp, sproc, StoPro, StoredProc, StoreProc, sp, or SP) is a subroutine available to applications that access a relational database management system (RDBMS). Such procedures are stored in the database d ...
to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as
MySQL
MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
5.x. However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as
PostgreSQL
PostgreSQL ( ) also known as Postgres, is a free and open-source software, free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transaction processing, transactions ...
,
Oracle
An oracle is a person or thing considered to provide insight, wise counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. If done through occultic means, it is a form of divination.
Descript ...
, and
Microsoft SQL Server
Microsoft SQL Server is a proprietary relational database management system developed by Microsoft using Structured Query Language (SQL, often pronounced "sequel"). As a database server, it is a software product with the primary function of ...
.
MySQL
MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
used to lack recursive query constructs but added such features in version 8.
Drawbacks
The use case for a dynamic endless database tree hierarchy is rare. The Nested Set model is appropriate where the tree element and one or two attributes are the only data, but is a poor choice when more complex relational data exists for the elements in the tree. Given an arbitrary starting depth for a category of 'Vehicles' and a child of 'Cars' with a child of 'Mercedes', a foreign key table relationship must be established unless the tree table is natively non-normalized. Attributes of a newly created tree item may not share all attributes with a parent, child or even a sibling. If a foreign key table is established for a table of 'Plants' attributes, no integrity is given to the child attribute data of 'Trees' and its child 'Oak'. Therefore, in each case of an item inserted into the tree, a foreign key table of the item's attributes must be created for all but the most trivial of use cases.
If the tree isn't expected to change often, a properly normalized hierarchy of attribute tables can be created in the initial design of a system, leading to simpler, more portable SQL statements; specifically ones that don't require an arbitrary number of runtime, programmatically created or deleted tables for changes to the tree. For more complex systems, hierarchy can be developed through relational models rather than an implicit numeric tree structure. Depth of an item is simply another attribute rather than the basis for an entire DB architecture. As stated in ''SQL Antipatterns'':
Nested Sets is a clever solution – maybe too clever. It also fails to support referential integrity. It’s best used when you need to query a tree more frequently than you need to modify the tree.
The model doesn't allow for multiple parent categories. For example, an 'Oak' could be a child of 'Tree-Type', but also 'Wood-Type'. An additional tagging or taxonomy has to be established to accommodate this, again leading to a design more complex than a straightforward fixed model.
Nested sets are very slow for inserts because it requires updating left and right domain values for all records in the table after the insert. This can cause a lot of database stress as many rows are rewritten and indexes rebuilt. However, if it is possible to store a forest of small trees in table instead of a single big tree, the overhead may be significantly reduced, since only one small tree must be updated.
The
nested interval model does not suffer from this problem, but is more complex to implement, and is not as well known. It still suffers from the relational foreign-key table problem. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d)
Variations
Using the nested set model as described above has some performance limitations during certain tree traversal operations. For example, trying to find the immediate child nodes given a parent node requires pruning the subtree to a specific level as in the following
SQL
Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel")
is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
code example:
SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Parent, Tree as Child
WHERE
Child.Left BETWEEN Parent.Left AND Parent.Right
AND NOT EXISTS ( -- No Middle Node
SELECT *
FROM Tree as Mid
WHERE Mid.Left BETWEEN Parent.Left AND Parent.Right
AND Child.Left BETWEEN Mid.Left AND Mid.Right
AND Mid.Node NOT IN (Parent.Node, Child.Node)
)
AND Parent.Left = 1 -- Given Parent Node Left Index
Or, equivalently:
SELECT DISTINCT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
WHERE Parent.Left < Child.Left AND Parent.Right > Child.Right -- associate Child Nodes with ancestors
GROUP BY Child.Node, Child.Left, Child.Right
HAVING max(Parent.Left) = 1 -- Subset for those with the given Parent Node as the nearest ancestor
The query will be more complicated when searching for children more than one level deep. To overcome this limitation and simplify
tree traversal
In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a Tree (data structure), tree data stru ...
an additional column is added to the model to maintain the depth of a node within a tree.
In this model, finding the immediate children given a parent node can be accomplished with the following
SQL
Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel")
is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
code:
SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
WHERE
Child.Depth = Parent.Depth + 1
AND Child.Left > Parent.Left
AND Child.Right < Parent.Right
AND Parent.Depth = 1 -- Given Parent Node Left Index
See also
*
Adjacency list
In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This ...
*
Calkin–Wilf tree
*
Tree traversal
In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a Tree (data structure), tree data stru ...
*
Tree (data structure)
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be co ...
References
External links
Troels' links to Hierarchical data in RDBMSsManaging hierarchical data in relational databasesPHP PEAR Implementation for Nested Sets– by Daniel Khan
Transform any Adjacency List to Nested Sets using MySQL stored proceduresPHP Doctrine DBAL implementation for Nested Sets– by PreviousNext
R Nested Set– Nested Set example in R
{{DEFAULTSORT:Nested Set Model
Database theory