HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a radix tree (also radix trie or compact prefix tree or compressed trie) is a
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
that represents a space-optimized
trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...
(prefix tree) in which each node that is the only child is merged with its parent. The result is that the number of children of every internal node is at most the
radix In a positional numeral system, the radix or base is the number of unique digits, including the digit zero, used to represent numbers. For example, for the decimal/denary system (the most common system in use today) the radix (base number) is t ...
of the radix tree, where is a positive integer and a power of 2, having ≥ 1. Unlike regular trees, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes. Unlike regular trees (where whole keys are compared ''en masse'' from their beginning up to the point of inequality), the key at each node is compared chunk-of-bits by chunk-of-bits, where the quantity of bits in that chunk at that node is the radix of the radix trie. When is 2, the radix trie is binary (i.e., compare that node's 1-bit portion of the key), which minimizes sparseness at the expense of maximizing trie depth—i.e., maximizing up to conflation of nondiverging bit-strings in the key. When ≥ 4 is a power of 2, then the radix trie is an -ary trie, which lessens the depth of the radix trie at the expense of potential sparseness. As an optimization, edge labels can be stored in constant size by using two pointers to a string (for the first and last elements). Note that although the examples in this article show strings as sequences of characters, the type of the string elements can be chosen arbitrarily; for example, as a bit or byte of the string representation when using
multibyte character A variable-width encoding is a type of character encoding scheme in which codes of differing lengths are used to encode a character set (a repertoire of symbols) for representation, usually in a computer. Most common variable-width encodings ar ...
encodings or
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...
.


Applications

Radix trees are useful for constructing
associative array In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an as ...
s with keys that can be expressed as strings. They find particular application in the area of IP
routing Routing is the process of selecting a path for traffic in a network or between or across multiple networks. Broadly, routing is performed in many types of networks, including circuit-switched networks, such as the public switched telephone netw ...
, where the ability to contain large ranges of values with a few exceptions is particularly suited to the hierarchical organization of
IP address An Internet Protocol address (IP address) is a numerical label such as that is connected to a computer network that uses the Internet Protocol for communication.. Updated by . An IP address serves two main functions: network interface ident ...
es. They are also used for
inverted index In computer science, an inverted index (also referred to as a postings list, postings file, or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of do ...
es of text documents in
information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
.


Operations

Radix trees support insertion, deletion, and searching operations. Insertion adds a new string to the trie while trying to minimize the amount of data stored. Deletion removes a string from the trie. Searching operations include (but are not necessarily limited to) exact lookup, find predecessor, find successor, and find all strings with a prefix. All of these operations are O(''k'') where k is the maximum length of all strings in the set, where length is measured in the quantity of bits equal to the radix of the radix trie.


Lookup

The lookup operation determines if a string exists in a trie. Most operations modify this approach in some way to handle their specific tasks. For instance, the node where a string terminates may be of importance. This operation is similar to tries except that some edges consume multiple elements. The following pseudo code assumes that these classes exist. Edge * ''Node'' targetNode * ''string'' label Node * ''Array of Edges'' edges * ''function'' isLeaf() function lookup(''string'' x)


Insertion

To insert a string, we search the tree until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining elements in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed. This splitting step ensures that no node has more children than there are possible string elements. Several cases of insertion are shown below, though more may exist. Note that r simply represents the root. It is assumed that edges can be labelled with empty strings to terminate strings where necessary and that the root has no incoming edge. (The lookup algorithm described above will not work when using empty-string edges.) File:Inserting the string 'water' into a Patricia trie.png, Insert 'water' at the root File:Insert 'slower' with a null node into a Patricia trie.png, Insert 'slower' while keeping 'slow' File:Insert 'test' into a Patricia trie when 'tester' exists.png, Insert 'test' which is a prefix of 'tester' File:Inserting the word 'team' into a Patricia trie with a split.png, Insert 'team' while splitting 'test' and creating a new edge label 'st' File:Insert 'toast' into a Patricia trie with a split and a move.png, Insert 'toast' while splitting 'te' and moving previous strings a level lower


Deletion

To delete a string x from a tree, we first locate the leaf representing x. Then, assuming x exists, we remove the corresponding leaf node. If the parent of our leaf node has only one other child, then that child's incoming label is appended to the parent's incoming label and the child is removed.


Additional operations

* Find all strings with common prefix: Returns an array of strings that begin with the same prefix. * Find predecessor: Locates the largest string less than a given string, by lexicographic order. * Find successor: Locates the smallest string greater than a given string, by lexicographic order.


History

The datastructure was invented in 1968 by Donald R. Morrison, with whom it is primarily associated, and by Gernot Gwehenberger.
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
, pages 498-500 in Volume III of
The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of compu ...
, calls these "Patricia's trees", presumably after the acronym in the title of Morrison's paper: "PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric". Today, Patricia tries are seen as radix trees with radix equals 2, which means that each bit of the key is compared individually and each node is a two-way (i.e., left versus right) branch.


Comparison to other data structures

(In the following comparisons, it is assumed that the keys are of length ''k'' and the data structure contains ''n'' members.) Unlike balanced trees, radix trees permit lookup, insertion, and deletion in O(''k'') time rather than O(log ''n''). This does not seem like an advantage, since normally ''k'' ≥ log ''n'', but in a balanced tree every comparison is a string comparison requiring O(''k'') worst-case time, many of which are slow in practice due to long common prefixes (in the case where comparisons begin at the start of the string). In a trie, all comparisons require constant time, but it takes ''m'' comparisons to look up a string of length ''m''. Radix trees can perform these operations with fewer comparisons, and require many fewer nodes. Radix trees also share the disadvantages of tries, however: as they can only be applied to strings of elements or elements with an efficiently reversible mapping to strings, they lack the full generality of balanced search trees, which apply to any data type with a
total ordering In mathematics, a total or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation \leq on some set X, which satisfies the following for all a, b and c in X: # a \leq a ( reflexive) ...
. A reversible mapping to strings can be used to produce the required total ordering for balanced search trees, but not the other way around. This can also be problematic if a data type only provides a comparison operation, but not a (de)
serialization In computing, serialization (or serialisation) is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e ...
operation.
Hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
s are commonly said to have expected O(1) insertion and deletion times, but this is only true when considering computation of the hash of the key to be a constant-time operation. When hashing the key is taken into account, hash tables have expected O(''k'') insertion and deletion times, but may take longer in the worst case depending on how collisions are handled. Radix trees have worst-case O(''k'') insertion and deletion. The successor/predecessor operations of radix trees are also not implemented by hash tables.


Variants

A common extension of radix trees uses two colors of nodes, 'black' and 'white'. To check if a given string is stored in the tree, the search starts from the top and follows the edges of the input string until no further progress can be made. If the search string is consumed and the final node is a black node, the search has failed; if it is white, the search has succeeded. This enables us to add a large range of strings with a common prefix to the tree, using white nodes, then remove a small set of "exceptions" in a space-efficient manner by ''inserting'' them using black nodes. The
HAT-trie The HAT-trie is a type of radix trie that uses array nodes to collect individual key–value pairs under radix nodes and hash buckets into an associative array. Unlike a simple hash table, HAT-tries store key–value in an ordered collection. The ...
is a cache-conscious data structure based on radix trees that offers efficient string storage and retrieval, and ordered iterations. Performance, with respect to both time and space, is comparable to the cache-conscious
hashtable In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', also ...
. See HAT trie implementation notes a

A PATRICIA trie is a special variant of the radix 2 (binary) trie, in which rather than explicitly store every bit of every key, the nodes store only the position of the first bit which differentiates two sub-trees. During traversal the algorithm examines the indexed bit of the search key and chooses the left or right sub-tree as appropriate. Notable features of the PATRICIA trie include that the trie only requires one node to be inserted for every unique key stored, making PATRICIA much more compact than a standard binary trie. Also, since the actual keys are no longer explicitly stored it is necessary to perform one full key comparison on the indexed record in order to confirm a match. In this respect PATRICIA bears a certain resemblance to indexing using a hash table. The adaptive radix tree is a radix tree variant that integrates adaptive node sizes to the radix tree. One major drawback of the usual radix trees is the use of space, because it uses a constant node size in every level. The major difference between the radix tree and the adaptive radix tree is its variable size for each node based on the number of child elements, which grows while adding new entries. Hence, the adaptive radix tree leads to a better use of space without reducing its speed. A common practice is to relax the criteria of disallowing parents with only one child in situations where the parent represents a valid key in the data set. This variant of radix tree achieves a higher space efficiency than the one which only allows internal nodes with at least two children.Can a node of Radix tree which represents a valid key have one child?
/ref>


See also

*
Prefix tree In computer science, a trie, also called digital tree or prefix tree, is a type of M-ary tree, ''k''-ary search tree, a tree (data structure), tree data structure used for locating specific keys from within a set. These keys are most often Str ...
(also known as a Trie) *
Deterministic acyclic finite state automaton In computer science, a deterministic acyclic finite state automaton (DAFSA),Jan Daciuk, Stoyan Mihov, Bruce Watson and Richard Watson (2000). Incremental construction of minimal acyclic finite state automata. Computational Linguistics 26(1):3-16. ...
(DAFSA) *
Ternary search tries In computer science, a ternary search tree is a type of trie (sometimes called a ''prefix tree'') where nodes are arranged in a manner similar to a binary search tree, but with up to three children rather than the binary tree's limit of two. Lik ...
*
Hash trie In computer science, hash trie can refer to: * Hash tree (persistent data structure), a trie used to map hash values to keys * A space-efficient implementation of a sparse trie, in which the descendants of each node may be interleaved in memory. ( ...
*
Deterministic finite automata In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automa ...
*
Judy array Judy is a short form of the name Judith. Judy may refer to: Places * Judy, Kentucky, village in Montgomery County, United States * Judy Woods, woodlands in Bradford, West Yorkshire, England, United Kingdom Animals * Judy (dog) (1936–1950) ...
*
Search algorithm In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the search space of a problem domain, with eith ...
*
Extendible hashing Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation (done one bucket at a time, as needed). Thi ...
*
Hash array mapped trie A hash array mapped trie (HAMT) is an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie. It is a refined version of the more general notion of a Hash tree (persistent data structure), ...
*
Prefix hash tree A prefix hash tree (PHT) is a distributed data structure that enables more sophisticated queries over a distributed hash table (DHT). The prefix hash tree uses the lookup interface of a DHT to construct a trie-based data structure that is both eff ...
*
Burstsort Burstsort and its variants are cache-efficient algorithms for sorting strings. They are variants of the traditional radix sort but faster for large data sets of common strings, first published in 2003, with some optimizing versions published in ...
*
Luleå algorithm The Luleå algorithm of computer science, designed by , is a technique for storing and searching internet routing tables efficiently. It is named after the Luleå University of Technology, the home institute/university of the technique's authors. T ...
*
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algori ...


References


External links


Algorithms and Data Structures Research & Reference Material: PATRICIA
by Lloyd Allison,
Monash University Monash University () is a public research university based in Melbourne, Victoria, Australia. Named for prominent World War I general Sir John Monash, it was founded in 1958 and is the second oldest university in the state. The university has a ...

Patricia Tree
NIST Dictionary of Algorithms and Data Structures The NIST ''Dictionary of Algorithms and Data Structures'' is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data structures. For algorithms and ...

Crit-bit trees
by
Daniel J. Bernstein Daniel Julius Bernstein (sometimes known as djb; born October 29, 1971) is an American German mathematician, cryptologist, and computer scientist. He is a visiting professor at CASA at Ruhr University Bochum, as well as a research professor of ...

Radix Tree API in the Linux Kernel
by Jonathan Corbet
Kart (key alteration radix tree)
by Paul Jarc
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases
by Viktor Leis, Alfons Kemper,
Thomas Neumann Thomas Neumann (born 1977) is a German computer scientist and full professor for ''Data Science and Engineering'' at the Technical University of Munich (TUM). Education and career Thomas Neumann finished his studies in business informatics at ...
,
Technical University of Munich The Technical University of Munich (TUM or TU Munich; german: Technische Universität München) is a public research university in Munich, Germany. It specializes in engineering, technology, medicine, and applied and natural sciences. Establis ...


Implementations


FreeBSD Implementation
used for paging, forwarding and other things.
Linux Kernel Implementation
used for the page cache, among other things.
Java implementation of Concurrent Radix Tree
by Niall Gallagher
Practical Algorithm Template Library
a C++ library on PATRICIA tries (VC++ >=2003, GCC G++ 3.x), by Roman S. Klyujkov
Patricia Trie C++ template class implementation
by Radu Gruian

"based on big-endian patricia trees". Web-browsabl


Patricia Trie implementation in Java
by Roger Kapsi and Sam Berlin
Crit-bit trees
forked from C code by Daniel J. Bernstein

i
libcpropsPatricia Trees : efficient sets and maps over integers in
OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
, by Jean-Christophe Filliâtre
Radix DB (Patricia trie) implementation in C
by G. B. Versiani
Libart - Adaptive Radix Trees implemented in C
by Armon Dadgar with other contributors (Open Source, BSD 3-clause license)
rax
a radix tree implementation in ANSI C by
Salvatore Sanfilippo Redis (; Remote Dictionary Server) is an in-memory data structure store, used as a distributed cache, distributed, in-memory database, in-memory Key-value database, key–value database, cache and message broker, with optional durability (databa ...
(the creator o
REDIS
{{CS-Trees Trees (data structures) String data structures