Burrows–Wheeler Transform
   HOME





Burrows–Wheeler Transform
The Burrows–Wheeler transform (BWT) rearranges a character string into runs of similar characters, in a manner that can be reversed to recover the original string. Since compression techniques such as move-to-front transform and run-length encoding are more effective when such runs are present, the BWT can be used as a preparatory step to improve the efficiency of a compression algorithm, and is used this way in software such as bzip2. The algorithm can be implemented efficiently using a suffix array thus reaching linear time complexity. It was invented by David Wheeler in 1983, and later published by him and Michael Burrows in 1994. Their paper included a compression algorithm, called the Block-sorting Lossless Data Compression Algorithm or BSLDCA, that compresses data by using the BWT followed by move-to-front coding and Huffman coding or arithmetic coding. Description The transform is done by constructing a matrix (known as the Burrows-Wheeler Matrix) whose rows are the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   [Amazon]


picture info

Character String (computer Science)
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. More general, ''string'' may also denote a sequence (or list) of data other than just characters. Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements. When a string appears literally in source code, it is known as a string literal or an anonymous string. In formal languages, which are used in mathematical logic and theoretical computer science, a stri ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   [Amazon]


Lyndon Word
In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investigated them in 1954, calling them standard lexicographic sequences. Anatoly Shirshov introduced Lyndon words in 1953 calling them regular words. Lyndon words are a special case of Hall words; almost all properties of Lyndon words are shared by Hall words. Definitions Several equivalent definitions exist. A k-ary Lyndon word of length n > 0 is an n-character string over an alphabet of size k, and which is the unique minimum element in the lexicographical ordering in the multiset of all its rotations. Being the singularly smallest rotation implies that a Lyndon word differs from any of its non-trivial rotations, and is therefore aperiodic.; . Alternately, a word w is a Lyndon word if and only if it is nonempty and lexicographically stri ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   [Amazon]


ChIP-Seq
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with Massively parallel signature sequencing, massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations. Uses ChIP-seq is primarily used to determine how transcription factors and other chromatin-associated proteins influence phenotype-affecting mechanisms. Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. This epigenetic information is complementary to genotype and expression analysis. ChIP-seq technology is currently seen primarily as an alternative to ChIP-chip which requires a protein microarray, hybridization ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   [Amazon]



MORE