computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...

, a ternary search tree is a type of

trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...

(sometimes called a ''prefix tree'') where nodes are arranged in a manner similar to a

binary search tree In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and ...

, but with up to three children rather than the binary tree's limit of two. Like other prefix trees, a ternary search tree can be used as an

associative map In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an a ...

structure with the ability for incremental

string search In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string ...

. However, ternary search trees are more space efficient compared to standard prefix trees, at the cost of speed. Common applications for ternary search trees include spell-checking and

auto-completion Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing. In Android and iOS smartphones, this is called predictive text. In graphical user interfaces, users can typically press the tab ...

Description

Each node of a ternary search tree stores a single

character Character or Characters may refer to: Arts, entertainment, and media Literature * ''Character'' (novel), a 1936 Dutch novel by Ferdinand Bordewijk * ''Characters'' (Theophrastus), a classical Greek set of character sketches attributed to The ...

, an

object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an ai ...

(or a pointer to an object depending on implementation), and pointers to its three children conventionally named ''equal kid'', ''lo kid'' and ''hi kid'', which can also be referred respectively as ''middle (child)'', ''lower (child)'' and ''higher (child)''. A node may also have a pointer to its parent node as well as an indicator as to whether or not the node marks the end of a word. The ''lo kid'' pointer must point to a node whose character value is ''less than the current node''. The ''hi kid'' pointer must point to a node whose character is ''greater than the current node''. The ''equal kid'' points to the next character in the word. The figure below shows a ternary search tree with the strings "cute","cup","at","as","he","us" and "i": c / , \ a u h , , , \ t t e u / / , / , s p e i s As with other trie data structures, each node in a ternary search tree represents a prefix of the stored strings. All strings in the middle subtree of a node start with that prefix.

Operations

Insertion

Inserting a value into a ternary search can be defined recursively or iteratively much as lookups are defined. This recursive method is continually called on nodes of the tree given a key which gets progressively shorter by pruning characters off the front of the key. If this method reaches a node that has not been created, it creates the node and assigns it the character value of the first character in the key. Whether a new node is created or not, the method checks to see if the first character in the string is greater than or less than the character value in the node and makes a recursive call on the appropriate node as in the lookup operation. If, however, the key's first character is equal to the node's value then the insertion procedure is called on the equal kid and the key's first character is pruned away. Like

s and other

data structures In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...

, ternary search trees can become degenerate depending on the order of the keys. Inserting keys in alphabetical order is one way to attain the worst possible degenerate tree. Inserting the keys in random order often produces a well-balanced tree. function insertion(string key) is node p := root //initialized to be equal in case root is null node last := root int idx := 0 while p is not null do //recurse on proper subtree if key dx< p.splitchar then last := p p := p.left else if key dx> p.splitchar then last := p p := p.right else: // key is already in our Tree if idx

length(key) then return //trim character from our key idx := idx+1 last := p p := p.mid p := node() //add p in as a child of the last non-null node (or root if root is null) if root

null then root := p else if last.splitchar < key dxthen last.right := p else if last.splitchar > key dxthen last.left := p else last.mid := p p.splitchar := key dx idx := idx+1 // Insert remainder of key while idx < length(key) do p.mid := node() p.mid.splitchar := key dx idx += 1

Search

To look up a particular node or the data associated with a node, a string key is needed. A lookup procedure begins by checking the root node of the tree and determining which of the following conditions has occurred. If the first character of the string is less than the character in the root node, a recursive lookup can be called on the tree whose root is the lo kid of the current root. Similarly, if the first character is greater than the current node in the tree, then a recursive call can be made to the tree whose root is the hi kid of the current node. As a final case, if the first character of the string is equal to the character of the current node then the function returns the node if there are no more characters in the key. If there are more characters in the key then the first character of the key must be removed and a recursive call is made given the equal kid node and the modified key. This can also be written in a non-recursive way by using a pointer to the current node and a pointer to the current character of the key.

Pseudocode

function search(string query) is if is_empty(query) then return false node p := root int idx := 0 while p is not null do if query dx< p.splitchar then p := p.left else if query dx> p.splitchar then p := p.right; else if idx = length(query) then return true idx := idx + 1 p := p.mid return false

Deletion

The delete operation consists of searching for a key string in the search tree and finding a node, called firstMid in the below pseudocode, such that the path from the middle child of firstMid to the end of the search path for the key string has no left or right children. This would represent a unique suffix in the ternary tree corresponding to the key string. If there is no such path, this means that the key string is either fully contained as a prefix of another string, or is not in the search tree. Many implementations make use of an end of string character to ensure only the latter case occurs. The path is then deleted from firstMid.mid to the end of the search path. In the case that firstMid is the root, the key string must have been the last string in the tree, and thus the root is set to null after the deletion. function delete(string key) is if is_empty(key) then return node p := root int idx := 0 node firstMid := null while p is not null do if key dx< p.splitchar then firstMid := null p := p.left else if key dx> p.splitchar then firstMid := null p := p.right else firstMid := p while p is not null and key dx

p.splitchar do idx := idx + 1 p := p.mid if firstMid

null then return // No unique string suffix // At this point, firstMid points to the node before the strings unique suffix occurs node q := firstMid.mid node p := q firstMid.mid := null // disconnect suffix from tree while q is not null do //walk down suffix path and delete nodes p := q q := q.mid delete(p) // free memory associated with node p if firstMid

root then delete(root) //delete the entire tree root := null

Traversal

Partial-match searching

Near-neighbor searching

Running time

The running time of ternary search trees varies significantly with the input. Ternary search trees run best when given several ''similar strings'', especially when those strings ''share a common prefix''. Alternatively, ternary search trees are effective when storing a large number of relatively ''short strings'' (such as words in a

dictionary A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by radical and stroke for ideographic languages), which may include information on definitions, usage, etymologies ...

). Running times for ternary search trees are similar to binary search trees, in that they typically run in logarithmic time, but can run in linear time in the degenerate (worst) case. Further, the size of the strings must also be kept in mind when considering runtime. For example, in the search path for a string of length ''k'', there will be ''k'' traversals down middle children in the tree, as well as a logarithmic number of traversals down left and right children in the tree. Thus, in a ternary search tree on a small number of very large strings the lengths of the strings can dominate the runtime. Time complexities for ternary search tree operations:

Comparison to other data structures

Tries

While being slower than other

prefix tree In computer science, a trie, also called digital tree or prefix tree, is a type of M-ary tree, ''k''-ary search tree, a tree (data structure), tree data structure used for locating specific keys from within a set. These keys are most often Str ...

s, ternary search trees can be better suited for larger data sets due to their space-efficiency.

Hash maps

Hashtable In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', also ...

s can also be used in place of ternary search trees for mapping strings to values. However, hash maps also frequently use more memory than ternary search trees (but not as much as tries). Additionally, hash maps are typically slower at reporting a string that is not in the same data structure, because it must compare the entire string rather than just the first few characters. There is some evidence that shows ternary search trees running faster than hash maps. Additionally, hash maps do not allow for many of the uses of ternary search trees, such as ''near-neighbor lookups''.

DAFSAs (
deterministic acyclic finite state automaton In computer science, a deterministic acyclic finite state automaton (DAFSA),Jan Daciuk, Stoyan Mihov, Bruce Watson and Richard Watson (2000). Incremental construction of minimal acyclic finite state automata. Computational Linguistics 26(1):3-16. ...
)

If storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie or a ternary search tree. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored.

Uses

Ternary search trees can be used to solve many problems in which a large number of strings must be stored and retrieved in an arbitrary order. Some of the most common or most useful of these are below: * Anytime a

could be used but a less memory-consuming structure is preferred. * A quick and space-saving data structure for mapping strings to other data. * To implement

. * As a

spell check In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic di ...

. * Near-neighbor searching (of which spell-checking is a special case). * As a

database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...

especially when indexing by several non-key fields is desirable. * In place of a

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...

References

External links

Ternary Search Trees
page with papers (by Jon Bentley and Robert Sedgewick) about ternary search trees and algorithms for "sorting and searching strings"
Ternary Search Tries
– a video by Robert Sedgewick

Implementation in Java of a TST by Robert Sedgewick and Kevin Wayne {{Strings , state=collapsed Trees (data structures) Search algorithms