Re-Pair

Re-Pair
Re-Pair (short for recursive pairing) is a grammar-based compression algorithm that, given an input text, builds a straight-line program, i.e. a context-free grammar generating a single string: the input text. To perform the compression in linear time, it uses an amount of memory approximately five times the size of its input. The grammar is built by recursively replacing the most frequent pair of adjacent characters occurring in the text. Once no pair of characters occurs twice, the resulting string is used as the axiom of the grammar. Therefore, in the output grammar all rules but the axiom have two symbols on the right-hand side. How it works Re-Pair was first introduced by N. J. Larsson ...
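The pair-replacement loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the linear-time algorithm: it recounts every pair on each round (quadratic time), counts overlapping pairs as a simplification, and the names `repair`, `expand` and the `R0`, `R1`, ... rule labels are assumptions made for the example.

```python
from collections import Counter


def repair(text):
    """Naive Re-Pair sketch: returns (axiom, rules).

    Recounts all pairs on every round, so it is quadratic,
    unlike the linear-time algorithm described in the text.
    """
    seq = list(text)   # terminals are single characters
    rules = {}         # nonterminal name -> (left symbol, right symbol)
    while True:
        # Count adjacent pairs; overlapping occurrences (as in "aaa")
        # are counted too, which is a simplification.
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break      # no pair occurs twice: seq becomes the axiom
        name = f"R{len(rules)}"   # hypothetical naming scheme
        rules[name] = pair
        # Replace occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand one grammar symbol back into the text it generates."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


axiom, rules = repair("abracadabra")
restored = "".join(expand(s, rules) for s in axiom)
print(axiom, rules, restored == "abracadabra")
```

Every rule stored in `rules` has exactly two symbols on its right-hand side, and only the axiom may be longer, matching the shape of the output grammar described above.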

Grammar-based Code
Grammar-based codes or grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence $x = x_1 \cdots x_n$, a grammar-based code transforms $x$ into a context-free grammar $G$. The problem of finding a smallest grammar for an input sequence (smallest grammar problem) is known to be NP-hard, so many grammar-transform algorithms have been proposed from theoretical and practical viewpoints. Generally, the produced grammar $G$ is further compressed by statistical encoders like arithmetic coding. Examples and characteristics The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing Lempel-Ziv code, and many other new universal lossless compression algorithms. Grammar-based codes are univers ...
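To make the idea of representing a sequence by a grammar concrete, the following sketch decodes a small hand-written grammar back into its string and compares sizes. The grammar, the helper names, and the size measure (total number of right-hand-side symbols, a common convention) are assumptions chosen for illustration and are not produced by any particular grammar transform.

```python
# A hand-written grammar that generates exactly one string.
# Keys are nonterminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": ["A", "A", "c"],
    "A": ["B", "B"],
    "B": ["a", "b"],
}


def expand(symbol, grammar):
    """Recursively expand a symbol into the terminal string it derives."""
    if symbol not in grammar:
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


def grammar_size(grammar):
    """Total number of symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())


x = expand("S", GRAMMAR)
print(x, len(x), grammar_size(GRAMMAR))  # prints: ababababc 9 7
```

Here the grammar uses 7 right-hand-side symbols to describe a 9-character string; in practice the grammar would then be serialized and passed to a statistical encoder.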




Byte Pair Encoding
Byte-pair encoding (also known as BPE, or digram coding) is an algorithm, first described in 1994 by Philip Gage, for encoding strings of text into smaller strings by creating and using a translation table. A slightly modified version of the algorithm is used in large language model tokenizers. The original version of the algorithm focused on compression. It replaces the highest-frequency pair of bytes with a new byte that was not contained in the initial dataset. A lookup table of the replacements is required to rebuild the initial dataset. The modified version builds "tokens" (units of recognition) that match varying amounts of source text, from single characters (including single digits or single punctuation marks) to whole words (even long compound words). Original algorithm The original BPE algorithm operates by iteratively replacing the most common contiguous sequences of characters in a target text with unused 'placeholder' bytes. The iteration ends when no sequences can be ...
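The original compression-oriented scheme can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, pairs are recounted from scratch each round, and the loop stops when no pair occurs twice or when no byte value absent from the input remains to serve as a placeholder.

```python
from collections import Counter


def bpe_compress(data: bytes):
    """Replace the most frequent byte pair with an unused byte, repeatedly.

    Returns the packed bytes and the lookup table needed to undo it.
    """
    present = set(data)
    free = [b for b in range(256) if b not in present]  # unused byte values
    seq = list(data)
    table = {}  # placeholder byte -> (first byte, second byte)
    while free:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        placeholder = free.pop(0)
        table[placeholder] = pair
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                merged.append(placeholder)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return bytes(seq), table


def bpe_decompress(packed: bytes, table):
    """Undo the replacements in reverse order of creation."""
    seq = list(packed)
    for placeholder, (a, b) in reversed(list(table.items())):
        expanded = []
        for value in seq:
            if value == placeholder:
                expanded.extend((a, b))
            else:
                expanded.append(value)
        seq = expanded
    return bytes(seq)


data = b"aaabdaaabac"
packed, table = bpe_compress(data)
print(len(data), len(packed), bpe_decompress(packed, table) == data)
```

The table must be stored alongside the packed bytes, so the scheme only pays off when the savings exceed the table's size.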

Compression Algorithms
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding is done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction or line coding, the means for mapping data onto a signal ...


Straight-line Program
In computer science, a straight-line program is, informally, a program that does not contain any loop or any test, and is formed by a sequence of steps, each of which applies an operation to previously computed elements. This article is devoted to the case where the allowed operations are the operations of a group, that is, multiplication and inversion. More specifically, a straight-line program (SLP) for a finite group G = ⟨S⟩ is a finite sequence L of elements of G such that every element of L either belongs to S, is the inverse of a preceding element, or is the product of two preceding elements. An SLP L is said to compute a group element g ∈ G if g ∈ L, where g is encoded by a word in S and its inverses. Intuitively, an SLP computing some g ∈ G is an efficient way of storing g as a group word over S; observe that if g is constructed in i steps, the word l ...
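The definition can be illustrated with a small evaluator over permutation groups. The step encoding (`"gen"`, `"inv"`, `"mul"`), the function names, and the choice of a 5-cycle as the single generator are assumptions made for this sketch.

```python
def compose(p, q):
    """Product p*q of permutations stored as tuples: apply q, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Inverse of a permutation stored as a tuple of images."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def evaluate_slp(generators, steps):
    """Evaluate an SLP; each step may only refer to earlier positions."""
    values = []
    for step in steps:
        kind = step[0]
        if kind == "gen":                 # an element of the generating set S
            values.append(generators[step[1]])
        elif kind == "inv":               # inverse of a preceding element
            values.append(inverse(values[step[1]]))
        elif kind == "mul":               # product of two preceding elements
            values.append(compose(values[step[1]], values[step[2]]))
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return values


# S = {s}, where s is the 5-cycle i -> i + 1 (mod 5).
gens = {"s": (1, 2, 3, 4, 0)}
steps = [("gen", "s"), ("mul", 0, 0), ("mul", 1, 1), ("mul", 2, 2)]
values = evaluate_slp(gens, steps)
print(values[-1])  # s^8, reached by repeated squaring
```

The last element is s^8, obtained with three multiplications, whereas writing it out as a word over S would take eight letters.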

Context-free Grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form $A \to \alpha$ with $A$ a single nonterminal symbol, and $\alpha$ a string of terminals and/or nonterminals ($\alpha$ can be empty). Regardless of which symbols surround it, the single nonterminal $A$ on the left hand side can always be replaced by $\alpha$ on the right hand side. This distinguishes it from a context-sensitive grammar, which can have production rules in the form $\alpha A \beta \rightarrow \alpha \gamma \beta$ with $A$ a nonterminal symbol and $\alpha$, $\beta$, and $\gamma$ strings of terminal and/or nonterminal symbols. A formal grammar is essentially a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements. For example, the first rule in the picture, : \lan ...
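The idea that a production rewrites a nonterminal regardless of the surrounding symbols can be shown with a short derivation sketch. The grammar used here (balanced parentheses, S → ( S ) S | ε) and the helper `derive` are illustrative assumptions and are not taken from the text above.

```python
# One nonterminal S with two alternatives: "( S ) S" and the empty string.
RULES = {"S": [["(", "S", ")", "S"], []]}


def derive(start, choices):
    """Leftmost derivation: each step rewrites the leftmost nonterminal
    with the chosen alternative, whatever symbols surround it."""
    form = [start]
    history = ["".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the sentential form.
        idx = next(i for i, sym in enumerate(form) if sym in RULES)
        form[idx:idx + 1] = RULES[form[idx]][choice]
        history.append("".join(form))
    return history


for sentential_form in derive("S", [0, 0, 1, 1, 1]):
    print(sentential_form)
# prints, one per line: S, (S)S, ((S)S)S, (()S)S, (())S, (())
```

Each step consults only the nonterminal being rewritten, which is exactly what separates a context-free production from a context-sensitive one.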


Re Pair Example
Re or RE may refer to: Arts, media and entertainment * ...Re, a 2016 Indian Kannada-language film * Realencyclopädie der classischen Altertumswissenschaft, a German encyclopedia of classical scholarship * Resident Evil, a horror game franchise Music * Re, the second syllable of the scale in solfège ** D (musical note) or Re, the second note of the musical scale in fixed do solfège * Re: (band), a musical duo based in Canada and the US Albums * Re (Café Tacuba album) * Re (Les Rita Mitsouko album) * Re., by Aya Ueto * Re: (EP), by Kard Language * re (interjection), in Greek * Re (kana) (れ and レ), Japanese syllables * In re, Latin for 'in the matter of...' ** RE: and Re:, a standard email subject line prefix Organisations * Renew Europe, a political group in the European Parliament * Renovación Española, a former Spanish monarchist political party * Royal Engineers, a part of the British Army * Royal Society of Painter-Pr ...

Structure Repair
A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings and machines and natural objects such as biological organisms, minerals and chemicals. Abstract structures include data structures in computer science and musical form. Types of structure include a hierarchy (a cascade of one-to-many relationships), a network featuring many-to-many links, or a lattice featuring connections between components that are neighbors in space. Load-bearing Buildings, aircraft, skeletons, anthills, beaver dams, bridges and salt domes are all examples of load-bearing structures. The results of construction are divided into buildings and non-building structures, and make up the infrastructure of a human society. Built structures are broadly divided by their varying design approaches and standards, into categories including building structures, arch ...


Slide00
Slide or Slides may refer to: Places * Slide, California, former name of Fortuna, California Arts, entertainment, and media Music Albums * Slide (Lisa Germano album), 1998 * Slide (George Clanton album), 2018 * Slide, by Patrick Gleeson, 2007 * Slide (Luna EP), 1993 * Slide (Madeline Merlo EP), 2022 Songs * "Slide" (Slave song), 1977 * "Slide" (The Big Dish song), 1986 * "Slide" (Goo Goo Dolls song), 1998 * "Slide" (Calvin Harris song), 2017 * "Slide" (FBG Duck song), 2018 * "Slide" (French Montana song), 2019 * "Slide" (H.E.R. song), 2019 * "Slide" (Madeline Merlo song), 2022 * "Slide" (¥$ song), 2024 * "Step Back"/"Slide", by Superheist, 2001 * "Slide", by Chris Brown from Breezy * "Slide", by Dido from No Angel * "Slide", by Doechii from Alligator Bites Never Heal * "The Slide", by Cowboy Junkies from One Soul Now Other uses in music * Slide (musical ornament), a musical embellishment found particularly in Baroque music * Slide (tune type), ...






Variable-length Code
In coding theory, a variable-length code is a code which maps source symbols to a variable number of bits. The equivalent concept in computer science is bit string. Variable-length codes can allow sources to be compressed and decompressed with zero error (lossless data compression) and still be read back symbol by symbol. With the right coding strategy, an independent and identically-distributed source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure. Some examples of well-known variable-length coding strategies are Huffman coding, Lempel–Ziv coding, arithmetic coding, and context-adaptive variable-length coding. Codes and their extensions The extension of a code is the m ...
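Reading a variable-length code back symbol by symbol can be sketched with a small prefix code. The code table below is a hand-picked assumption (no codeword is a prefix of another), not the output of Huffman coding or any other strategy named above, and bits are kept as a text string of "0" and "1" for readability.

```python
# Hypothetical prefix code: shorter codewords for symbols assumed more frequent.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(message, code=CODE):
    """Concatenate the codeword of every source symbol."""
    return "".join(code[symbol] for symbol in message)


def decode(bits, code=CODE):
    """Read bits left to right, emitting a symbol whenever a codeword completes."""
    lookup = {word: symbol for symbol, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:
            symbols.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(symbols)


bits = encode("abacad")
print(bits, decode(bits))  # prints: 01001100111 abacad
```

The six symbols take 11 bits here, against 12 bits for a fixed 2-bit code over the same four-symbol alphabet; the gain depends entirely on how skewed the symbol frequencies are.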


