Compressed Suffix Array
   HOME





Compressed Suffix Array
In computer science, a compressed suffix arrayR. Grossi and J. S. VitterCompressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching ''SIAM Journal on Computing,'' 35(2), 2005, 378–407. An earlier version appeared in ''Proceedings of the 32nd ACM Symposium on Theory of Computing,'' May 2000, 397–406.Paolo Ferragina and Giovanni Manzini (2000)"Opportunistic Data Structures with Applications" Proceedings of the 41st Annual Symposium on Foundations of Computer Science. p.390.R. Grossi, A. Gupta, and J. S. VitterHigh-Order Entropy-Compressed Text Indexes ''Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms,'' January 2003, 841–850. is a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that improve on the suffix array. These data structures enable quick search for an arbitrary string with a comparatively small index. Given a text ''T'' of ''n'' characters from ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Computer Science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, applied disciplines (including the design and implementation of Computer architecture, hardware and Software engineering, software). Algorithms and data structures are central to computer science. The theory of computation concerns abstract models of computation and general classes of computational problem, problems that can be solved using them. The fields of cryptography and computer security involve studying the means for secure communication and preventing security vulnerabilities. Computer graphics (computer science), Computer graphics and computational geometry address the generation of images. Programming language theory considers different ways to describe computational processes, and database theory concerns the management of re ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


External Links
An internal link is a type of hyperlink on a web page to another page or resource, such as an image or document, on the same website or domain. It is the opposite of an external link, a link that directs a user to content that is outside its domain. Hyperlinks are considered either "external" or "internal" depending on their target or destination. Generally, a link to a page outside the same domain or website is considered external, whereas one that points at another section of the same web page or to another page of the same website or domain is considered internal. Both internal and external links allow users of the website to navigate to another web page or resource. These definitions become clouded, however, when the same organization operates multiple domains functioning as a single web experience, e.g. when a secure commerce website is used for purchasing things displayed on a non-secure website. In these cases, links that are "external" by the above definition can conce ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Database Index Techniques
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database. Before digital storage and retrieval of data have become widespread, index cards were used for data storage in a wide range of applications and environments: in the home to record and store recipes, shopping lists, contact information and other organizational data; in business to record presentation notes, project research and notes, and contact information; in schools as flash cards ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


String Data Structures
String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian animated short * ''Strings'' (2004 film), a film directed by Anders Rønnow Klarlund * ''Strings'' (2011 film), an American dramatic thriller film * ''Strings'' (2012 film), a British film by Rob Savage * '' Bravetown'' (2015 film), an American drama film originally titled ''Strings'' * '' The String'' (2009), a French film Music Instruments * String (music), the flexible element that produces vibrations and sound in string instruments * String instrument, a musical instrument that produces sound through vibrating strings ** List of string instruments * String piano, a pianistic extended technique in which sound is produced by direct manipulation of the strings, rather than striking the piano's keys Types of groups * String band, music ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Suffix Array
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics. Suffix arrays were introduced by as a simple, space efficient alternative to suffix trees. They had independently been discovered by Gaston Gonnet in 1987 under the name ''PAT array'' . gave the first in-place \mathcal(n) time suffix array construction algorithm that is optimal both in time and space, where ''in-place'' means that the algorithm only needs \mathcal(1) additional space beyond the input string and the output suffix array. Enhanced suffix arrays (ESAs) are suffix arrays with additional tables that reproduce the full functionality of suffix trees preserving the same time and memory complexity. A sorted array of only some (rather than all) suffixes of a string is called a sparse suffix array. Definition Let S=S ..S /math> be an n-string and let S ,j/math ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




FM-index
In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini,Paolo Ferragina and Giovanni Manzini (2000)"Opportunistic Data Structures with Applications".Proceedings of the 41st Annual Symposium on Foundations of Computer Science. p.390. who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for Full-text index in Minute space. It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. The query time, as well as the required storage space, has a sublinear complexity with respect to the size of the input data. The original authors have devised improvements to their original approach and dubbed it "FM-Index version 2". A further improvem ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Bioinformatics
Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, data science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data can sometimes be referred to as computational biology, however this distinction between the two terms is often disputed. To some, the term ''computational biology'' refers to building and using models of biological systems. Computational, statistical, and computer programming techniques have been used for In silico, computer simulation analyses of biological queries. They include reused specific analysis "pipelines", particularly in the field of genomics, such as by the identification of genes and single nucleotide polymorphis ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Sequence Alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural biology, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix (mathematics), matrix. Gaps are inserted between the Residue (chemistry), residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences such as calculating the Edit distance, distance cost between strings in a natural language, or to display financial data. Interpretation If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. In sequence ali ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Auxiliary Memory
Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer is what manipulates data by performing computations. In practice, almost all computers use a storage hierarchy, which puts fast but expensive and small storage options close to the CPU and slower but less expensive and larger options further away. Generally, the fast technologies are referred to as "memory", while slower persistent technologies are referred to as "storage". Even the first computer designs, Charles Babbage's Analytical Engine and Percy Ludgate's Analytical Machine, clearly distinguished between processing and memory (Babbage stored numbers as rotations of gears, while Ludgate stored numbers as displacements of rods in shuttles). This distinction was extended in the Von Neumann architecture, wh ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Compressed Data Structure
The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers to a data structure whose operations are roughly as fast as those of a conventional data structure for the problem, but whose size can be substantially smaller. The size of the compressed data structure is typically highly dependent upon the information entropy of the data being represented. Important examples of compressed data structures include the compressed suffix array and the FM-index, both of which can represent an arbitrary text of characters ''T'' for pattern matching. Given any input pattern ''P'', they support the operation of finding if and where ''P'' appears in ''T''. The search time is proportional to the sum of the length of pattern ''P'', a very slow-growing function of the length of the text ''T'', and the number of reported matches. The space they occupy is roughly equal to the size of the text ''T'' in entrop ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Wavelet Tree
The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes the \mathbf_q and \mathbf_q operations defined on bitvectors to arbitrary alphabets. Originally introduced to represent compressed suffix arrays, it has found application in several contexts. The tree is defined by recursively partitioning the alphabet into pairs of subsets; the leaves correspond to individual symbols of the alphabet, and at each node a bitvector stores whether a symbol of the string belongs to one subset or the other. The name derives from an analogy with the wavelet transform for signals, which recursively decomposes a signal into low-frequency and high-frequency components. Properties Let \Sigma be a finite alphabet with \sigma=. By using succinct dictionaries in the nodes, a string s \in \Sigma^* can be stored in H_0(s) + o(\log \sigma), where H_0(s) is the order-0 empirical entropy of s. If the tree is balanced, the operations \mathbf, \mathbf_q, and \ma ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Entropy (information Theory)
In information theory, the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable's potential states or possible outcomes. This measures the expected amount of information needed to describe the state of the variable, considering the distribution of probabilities across all potential states. Given a discrete random variable X, which may be any member x within the set \mathcal and is distributed according to p\colon \mathcal\to[0, 1], the entropy is \Eta(X) := -\sum_ p(x) \log p(x), where \Sigma denotes the sum over the variable's possible values. The choice of base for \log, the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannon (unit), shannons"), while base Euler's number, ''e'' gives "natural units" nat (unit), nat, and base 10 gives units of "dits", "bans", or "Hartley (unit), hartleys". An equivalent definition of entropy is the expected value of the self-information of a v ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]