Wavelet Tree
The wavelet tree is a succinct data structure for storing strings in compressed space. It generalizes the \mathbf{rank}_q and \mathbf{select}_q operations defined on bitvectors to arbitrary alphabets. Originally introduced to represent compressed suffix arrays, it has since found application in several contexts. The tree is defined by recursively partitioning the alphabet into pairs of subsets; the leaves correspond to individual symbols of the alphabet, and at each node a bitvector records, for each symbol of the string, which of the two subsets it belongs to. The name derives from an analogy with the wavelet transform for signals, which recursively decomposes a signal into low-frequency and high-frequency components.
Properties
Let \Sigma be a finite alphabet with \sigma = |\Sigma|. By using succinct dictionaries in the nodes, a string s \in \Sigma^* can be stored in |s| H_0(s) + o(|s| \log \sigma) bits, where H_0(s) is the order-0 empirical entropy of s. If the tree is balanced, the operations \mathbf{access}, \mathbf{rank}_q, and ...
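
The recursive alphabet partition described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a succinct implementation: the class name `WaveletTree` and its method names are hypothetical, the bitvectors are plain Python lists with prefix counts (so nothing is compressed), and `select` falls back to a binary search over `rank` rather than using a bitvector select structure.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a sorted alphabet (illustrative only)."""

    def __init__(self, s, alphabet=None):
        self.n = len(s)
        self.alphabet = sorted(set(s)) if alphabet is None else alphabet
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return  # leaf: a single symbol (or an empty string)
        mid = len(self.alphabet) // 2
        self.left_syms = set(self.alphabet[:mid])
        # bit 0: symbol belongs to the left subset; bit 1: the right subset
        self.bits = [0 if c in self.left_syms else 1 for c in s]
        # ones[i] = number of 1-bits in bits[:i]
        self.ones = [0]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree(
            [c for c in s if c in self.left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [c for c in s if c not in self.left_syms], self.alphabet[mid:])

    def _rank_bit(self, bit, i):
        """Number of positions in bits[:i] equal to `bit`."""
        return self.ones[i] if bit else i - self.ones[i]

    def access(self, i):
        """Return the symbol s[i]."""
        node = self
        while node.left is not None:
            bit = node.bits[i]
            i = node._rank_bit(bit, i)
            node = node.right if bit else node.left
        return node.alphabet[0]

    def rank(self, q, i):
        """Number of occurrences of symbol q in s[:i]."""
        node = self
        while node.left is not None:
            bit = 0 if q in node.left_syms else 1
            i = node._rank_bit(bit, i)
            node = node.right if bit else node.left
        return i if node.alphabet and node.alphabet[0] == q else 0

    def select(self, q, k):
        """0-based position of the k-th occurrence (k >= 1) of q, or -1."""
        if k < 1 or self.rank(q, self.n) < k:
            return -1
        lo, hi = 1, self.n
        while lo < hi:  # smallest prefix length containing k occurrences
            mid = (lo + hi) // 2
            if self.rank(q, mid) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo - 1


wt = WaveletTree("abracadabra")
print(wt.access(4), wt.rank("a", 11), wt.select("a", 3))  # c 5 5
```

Each query walks one root-to-leaf path, so in a balanced tree the number of nodes visited by `access` and `rank` grows with the logarithm of the alphabet size.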

Succinct Data Structure
In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures can be used in place, without decompressing them first. A related notion is that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself. Suppose that Z is the information-theoretically optimal number of bits needed to store some data. A representation of this data is called:
* implicit if it takes Z + O(1) bits of space,
* succinct if it takes Z + o(Z) bits of space, and
* compact if it takes O(Z) bits of space.
For example, a data ...
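
To make the idea of answering queries in place concrete, here is a minimal Python sketch of a bit vector with a rank directory. The class name `RankBitVector` and the block size are assumptions chosen for illustration. It is a simplified one-level layout: with one counter per 64-bit block the directory is not o(Z) bits, so this sketch is closer to compact than succinct; Jacobson-style structures use a two-level directory to bring the overhead down to o(n) bits.

```python
class RankBitVector:
    """Bit vector with a one-level rank directory (illustrative only)."""

    BLOCK = 64  # assumed block size; one stored count per block

    def __init__(self, bits):
        self.bits = list(bits)
        # block_ranks[b] = number of 1-bits before block b
        self.block_ranks = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[start:start + self.BLOCK])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Number of 1-bits in bits[:i], using the directory plus one block scan."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        return self.block_ranks[block] + sum(self.bits[start:start + offset])


bv = RankBitVector([1, 0, 1, 1, 0] * 40)
print(bv.rank1(5))  # 3
```

The query reads one directory entry and scans at most one block, so the raw bits are never rewritten or decoded into another form.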

Compressed Suffix Array
In computer science, a compressed suffix array is a compressed data structure for pattern matching (R. Grossi and J. S. Vitter, "Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching", SIAM Journal on Computing, 35(2), 2005, 378–407, an earlier version of which appeared in Proceedings of the 32nd ACM Symposium on Theory of Computing, May 2000, 397–406; Paolo Ferragina and Giovanni Manzini, "Opportunistic Data Structures with Applications", Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000, p. 390; R. Grossi, A. Gupta, and J. S. Vitter, "High-Order Entropy-Compressed Text Indexes", Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms, January 2003, 841–850). Compressed suffix arrays are a general class of data structures that improve on the suffix array. These data structures enable quick search for an arbitrary string with a comparatively small index. Given a text T of n characters from ...
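
For orientation, the following Python sketch shows the plain, uncompressed suffix array that compressed suffix arrays improve on, together with a binary search for a pattern. It is a baseline only and does not implement any compression; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction sorts whole suffixes, which is simple but slow for large texts.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Sorted start positions of `pattern` in `text`, via two binary searches."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


text = "abracadabra"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "abra"))  # [0, 7]
```

The index here stores one integer per text position, which is the space cost that compressed variants aim to reduce.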

Entropy (Information Theory)
In information theory, the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable's potential states or possible outcomes. This measures the expected amount of information needed to describe the state of the variable, considering the distribution of probabilities across all potential states. Given a discrete random variable X, which takes values x in the set \mathcal{X} and is distributed according to p\colon \mathcal{X} \to [0, 1], the entropy is \mathrm{H}(X) := -\sum_{x \in \mathcal{X}} p(x) \log p(x), where \Sigma denotes the sum over the variable's possible values. The choice of base for \log, the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" (nats), and base 10 gives units of "dits", "bans", or "hartleys". An equivalent definition of entropy is the expected value of the self-information of a v ...
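
The entropy formula above translates directly into a few lines of Python. The sketch below is illustrative; the function names `shannon_entropy` and `empirical_entropy` are hypothetical, and terms with p(x) = 0 are skipped under the usual convention that 0 log 0 is taken to be 0. The second function applies the same formula to the symbol frequencies of a string, which is the order-0 empirical entropy used in the wavelet tree entry.

```python
import math
from collections import Counter


def shannon_entropy(probs, base=2.0):
    """H(X) = -sum p(x) log p(x) over a list of probabilities."""
    return -sum(p * math.log(p) / math.log(base) for p in probs if p > 0)


def empirical_entropy(s, base=2.0):
    """Order-0 empirical entropy of a string, from its symbol frequencies."""
    n = len(s)
    return shannon_entropy([c / n for c in Counter(s).values()], base)


print(shannon_entropy([0.5, 0.5]))   # a fair coin carries 1 bit
print(empirical_entropy("abab"))     # two equally frequent symbols
```

Changing `base` switches units: 2 gives bits, `math.e` gives nats, and 10 gives hartleys.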

FM-index
In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini (Paolo Ferragina and Giovanni Manzini, "Opportunistic Data Structures with Applications", Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000, p. 390), who describe it as an opportunistic data structure because it allows compression of the input text while still permitting fast substring queries. The name stands for Full-text index in Minute space. It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as to locate the position of each occurrence. The query time, as well as the required storage space, has a sublinear complexity with respect to the size of the input data. The original authors have devised improvements to their original approach and dubbed it "FM-Index version 2". A further improvem ...
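
Counting pattern occurrences with an FM-index is usually done by backward search over the Burrows–Wheeler transform. The Python sketch below illustrates that idea under simplifying assumptions: the function names `bwt`, `build_fm`, and `count` are hypothetical, the transform is built by sorting all rotations, the text is assumed not to contain the `"\0"` sentinel, and the occurrence table is stored uncompressed, so this shows the query logic rather than the space savings.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform of text with a "\\0" sentinel appended."""
    text += "\0"  # assumed absent from the input; sorts before every symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def build_fm(text):
    """Return (C, occ, n) for backward search."""
    last = bwt(text)
    counts = Counter(last)
    # C[c] = number of symbols in the text strictly smaller than c
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = occurrences of c in last[:i]
    occ = {c: [0] * (len(last) + 1) for c in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(last)


def count(fm, pattern):
    """Number of occurrences of pattern, processing it right to left."""
    C, occ, n = fm
    lo, hi = 0, n  # current range of matching rows in the sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


fm = build_fm("abracadabra")
print(count(fm, "abra"))  # 2
```

The `occ` lookups are rank queries on the transformed text, which is where a structure such as a wavelet tree can replace the uncompressed table.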

Trie
In computer science, a trie, also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store their associated key. Instead, each node's position within the trie determines its associated key, with the connections between nodes defined by individual characters rather than the entire key. Tries are particularly effective for tasks such as autocomplete, spell checking, and IP routing, offering advantages over hash tables due to their prefix-based organization and lack of hash collisions. Every child node shares a common prefix with its parent node, and the root node represents the empty string. While basic trie implementations can be memory-intensive, various optimization techniques such as compression and bitwise representations have been developed to improve their efficiency. A n ...
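
The description above, where a node's position spells out its key one character at a time, can be sketched as a small Python class. This is a basic, uncompressed trie for illustration; the class name `Trie` and the method `with_prefix` are hypothetical names, and each node keeps its children in a dictionary keyed by character.

```python
class Trie:
    """Basic trie: edges are labeled by characters, keys are root-to-node paths."""

    def __init__(self):
        self.children = {}   # character -> child Trie node
        self.is_key = False  # True if the path to this node is a stored key

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._find(word)
        return node is not None and node.is_key

    def with_prefix(self, prefix):
        """All stored keys starting with prefix, in sorted order."""
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, word = stack.pop()
            if node.is_key:
                found.append(word)
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        return sorted(found)


t = Trie()
for w in ["tea", "ten", "to", "inn"]:
    t.insert(w)
print("tea" in t, "te" in t, t.with_prefix("te"))  # True False ['tea', 'ten']
```

The `with_prefix` method is the autocomplete use case: it walks to the node for the prefix once and then enumerates only the subtree below it.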

Trees (Data Structures)
In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only plants that are usable as lumber, or only plants above a specified height. But wider definitions include taller palms, tree ferns, bananas, and bamboos. Trees are not a monophyletic taxonomic group but consist of a wide variety of plant species that have independently evolved a trunk and branches as a way to tower above other plants to compete for sunlight. The majority of tree species are angiosperms or hardwoods; of the rest, many are gymnosperms or softwoods. Trees tend to be long-lived, some trees reaching several thousand years old. Trees evolved around 400 million years ago, and it is estimated that there are around three trillion mature trees in the world currently. A tree typically has many secondary branches supported clear ...