Predecessor Problem
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, the predecessor problem involves maintaining a
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of items to, given an element, efficiently query which element precedes or succeeds that element in an order.
Data structures In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
used to solve the problem include
balanced binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
s,
van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...
s, and
fusion tree In computer science, a fusion tree is a type of tree data structure that implements an associative array on -bit integers on a finite universe, where each of the input integer has size less than 2w and is non-negative. When operating on a collecti ...
s. In the static predecessor problem, the set of elements does not change, but in the dynamic predecessor problem, insertions into and deletions from the set are allowed. The predecessor problem is a simple case of the nearest neighbor problem, and data structures that solve it have applications in problems like
integer sorting In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numb ...
.


Definition

The problem consists of maintaining a set , which contains a subset of integers. Each of these
integers An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language o ...
can be stored with a
word size In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word si ...
of , implying that U \le 2^w. Data structures that solve the problem support these operations: * predecessor(x), which returns the largest element in less than or equal to * successor(x), which returns the smallest element in greater than or equal to In addition, data structures which solve the
dynamic Dynamics (from Greek δυναμικός ''dynamikos'' "powerful", from δύναμις ''dynamis'' "power") or dynamic may refer to: Physics and engineering * Dynamics (mechanics) ** Aerodynamics, the study of the motion of air ** Analytical dynam ...
version of the problem also support these operations: * insert(x), which adds to the set * delete(x), which removes from the set The problem is typically analyzed in a transdichotomous
model of computation In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how ...
such as
word RAM In theoretical computer science, the word RAM (word random-access machine) model is a model of computation in which a random-access machine does bitwise operations on a word of bits. Michael Fredman and Dan Willard created it in 1990 to simulate pr ...
.


Data structures

One simple solution to this problem is to use a
balanced binary search tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
, which achieves (in
Big O notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Lan ...
) a
running time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
of O(\log n) for predecessor queries. The
Van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...
achieves a query time of O(\log \log U), but requires O(U)
space Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consider ...
.
Dan Willard Dan Edward Willard is an American computer scientist and logician, and is a professor of computer science at the University at Albany. Education and career Willard did his undergraduate studies in mathematics at Stony Brook University, graduati ...
proposed an improvement on this space usage with the
x-fast trie In computer science, an x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time ''O''(log log ''M''), using ''O''(''n'' log ''M'') space, where ...
, which requires O(n \log U) space and the same query time, and the more complicated
y-fast trie In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time ''O''(log log ''M''), using ''O''(''n'') space, where ''n'' is the number ...
, which only requires O(n) space.
Fusion tree In computer science, a fusion tree is a type of tree data structure that implements an associative array on -bit integers on a finite universe, where each of the input integer has size less than 2w and is non-negative. When operating on a collecti ...
s, introduced by
Michael Fredman Michael Lawrence Fredman is an emeritus professor at the Computer Science Department at Rutgers University, United States. He earned his Ph.D. degree from Stanford University in 1972 under the supervision of Donald Knuth. He was a member of the ...
and Willard, achieve O(\log_w n) query time and O(n) for predecessor queries for the static problem. The dynamic problem has been solved using exponential trees with O(\log_w n + \log \log n) query time, and with
expected time In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case com ...
O(\log_w n) using
hashing Hash, hashes, hash mark, or hashing may refer to: Substances * Hash (food), a coarse mixture of ingredients * Hash, a nickname for hashish, a cannabis product Hash mark * Hash mark (sports), a marking on hockey rinks and gridiron football fiel ...
..


Mathematical properties

There have been a number of papers proving lower bounds on the predecessor problem, or identifying what the running time of
asymptotically optimal In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor (independent of the input size) worse than the best possible algorithm. It is a term commonly en ...
solutions would be. For example, Michael Beame and
Faith Ellen Faith Ellen (formerly known as Faith E. Fich) is a professor of computer science at the University of Toronto who studies distributed data structures and the theory of distributed computing. She earned her bachelor's degree and masters from the U ...
proved that
for all In mathematical logic, a universal quantification is a type of quantifier, a logical constant which is interpreted as "given any" or "for all". It expresses that a predicate can be satisfied by every member of a domain of discourse. In other w ...
values of ,
there exists In predicate logic, an existential quantification is a type of quantifier, a logical constant which is interpreted as "there exists", "there is at least one", or "for some". It is usually denoted by the logical operator symbol ∃, which, w ...
a value of with query time (in Big Theta notation) \Omega\left(\tfrac\right), and similarly, for all values of , there exists a value of such that the query time is \Omega\left(\sqrt\right). Other proofs of lower bounds include the notion of
communication complexity In theoretical computer science, communication complexity studies the amount of communication required to solve a problem when the input to the problem is distributed among two or more parties. The study of communication complexity was first intro ...
. For the static predecessor problem, Mihai Pătrașcu and
Mikkel Thorup Mikkel Thorup (born 1965) is a Danish computer scientist working at University of Copenhagen. He completed his undergraduate education at Technical University of Denmark and his doctoral studies at Oxford University in 1993. From 1993 to 1998, h ...
showed the following lower bound for the optimal search time, in the cell-probe model: O(1) \min \left\{ \begin{array}{l} \log_{w} n \\ \lg \frac{\ell - \lg n}{a} \\ \frac{\lg \frac{\ell}{a{\lg \left( \frac{a}{\lg n} \,\cdot\, \lg \frac{\ell}{a} \right)} \\ \frac{\lg \frac{\ell}{a{\lg \left( \lg \frac{\ell}{a} \right/\left. \lg\frac{\lg n}{a} \right)} \end{array} \right. where the RAM has word length w, the set contains n integers of \ell bits each and is represented in the RAM using S words of space, and defining a = \lg \frac{S}{n} + \lg w. In the case where w = \ell = \gamma \lg n for \gamma > 1 and S = n \cdot \lg^{O(1)} n, the optimal search time is \Theta(\lg \ell) and the
van Emde Boas tree A van Emde Boas tree (), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boa ...
achieves this bound.


See also

*
Integer sorting In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numb ...
*
y-fast trie In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time ''O''(log log ''M''), using ''O''(''n'') space, where ''n'' is the number ...
*
Fusion tree In computer science, a fusion tree is a type of tree data structure that implements an associative array on -bit integers on a finite universe, where each of the input integer has size less than 2w and is non-negative. When operating on a collecti ...


References

{{reflist Data structures Computational problems