HOME
*



picture info

Grammar-based Code
Grammar-based codes or Grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence x = x_1 \cdots x_n, a grammar-based code transforms x into a context-free grammar G. The problem of finding a smallest grammar for an input sequence ( smallest grammar problem) is known to be NP-hard, so many grammar-transform algorithms are proposed from theoretical and practical viewpoints. Generally, the produced grammar G is further compressed by statistical encoders like arithmetic coding. Examples and characteristics The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing Lempel-Ziv code, and many other new universal lossless compression algorithms. Grammar-based codes are universal in the sense that they can ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Ergodic
In mathematics, ergodicity expresses the idea that a point of a moving system, either a dynamical system or a stochastic process, will eventually visit all parts of the space that the system moves in, in a uniform and random sense. This implies that the average behavior of the system can be deduced from the trajectory of a "typical" point. Equivalently, a sufficiently large collection of random samples from a process can represent the average statistical properties of the entire process. Ergodicity is a property of the system; it is a statement that the system cannot be reduced or factored into smaller components. Ergodic theory is the study of systems possessing ergodicity. Ergodic systems occur in a broad range of systems in physics and in geometry. This can be roughly understood to be due to a common phenomenon: the motion of particles, that is, geodesics on a hyperbolic manifold are divergent; when that manifold is compact, that is, of finite size, those orbits return to the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Straight-line Grammar
A straight-line grammar (sometimes abbreviated as SLG) is a formal grammar that generates exactly one string.Florian Benz and Timo Kötzing, “An effective heuristic for the smallest grammar problem,” Proceedings of the fifteenth annual conference on Genetic and evolutionary computation conference - GECCO ’13, 2013. , p. 488 Consequently, it does not branch (every non-terminal has only one associated production rule) nor loop (if non-terminal ''A'' appears in a derivation of ''B'', then ''B'' does not appear in a derivation of ''A''). Areas of usefulness Straight-line grammars are widely used in the development of algorithms that execute directly on compressed structures (without prior decompression). SLGs are of interest in fields like Kolmogorov complexity, Lossless data compression, Structure discovery and Compressed data structures. The problem of finding a context-free grammar (equivalently: an SLG) of minimal size that generates a given string is called the small ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Grammar Induction
Grammar induction (or grammatical inference) is the process in machine learning of learning a formal grammar (usually as a collection of ''re-write rules'' or '' productions'' or alternatively as a finite state machine or automaton of some kind) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs. Grammar classes Grammatical inference has often been very focused on the problem of learning finite state machines of various types (see the article Induction of regular languages for details on these approaches), since there have been efficient algorithms for this problem since the 1980s. Since the beginning of the century, these approaches have been extended to the problem of inference of context-free grammars and richer formalisms, such as multiple ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Dictionary Coder
A dictionary coder, also sometimes known as a substitution coder, is a class of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary') maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure. Methods and applications Some dictionary coders use a 'static dictionary', one whose full set of strings is determined before coding begins and does not change during the coding process. This approach is most often used when the message or set of messages to be encoded is fixed and large; for instance, an application that stores the contents of a book in the limited storage space of a PDA generally builds a static dictionary from a concordance of the text and then uses that dictionary to compress the verses. This scheme of using Huffman coding to represent indices into a concor ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


GLZA
Grammar-based codes or Grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence x = x_1 \cdots x_n, a grammar-based code transforms x into a context-free grammar G. The problem of finding a smallest grammar for an input sequence (smallest grammar problem) is known to be NP-hard, so many grammar-transform algorithms are proposed from theoretical and practical viewpoints. Generally, the produced grammar G is further compressed by statistical encoders like arithmetic coding. Examples and characteristics The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing Lempel-Ziv code, and many other new universal lossless compression algorithms. Grammar-based codes are universal in the sense that they can a ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Re-Pair
Re-Pair (short for Recursive Pairing) is a grammar-based compression algorithm that, given an input text, builds a straight-line program, i.e. a context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ... generating a single string: the input text. The grammar is built by recursively replacing the most frequent pair of characters occurring in the text. Once there is no pair of characters occurring twice, the resulting string is used as the axiom of the grammar. Therefore, the output grammar is such that all rules but the axiom have two symbols on the right-hand side. How it works Re-Pair was first introduced by NJ. Larsson and A. MoffatLarsson, N. J., & Moffat, A. (2000). Off-line dictionary-based compression. Proceedings of the IEEE, 88(11), 1722-1732. in 1999. ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


SEQUITUR Algorithm
Sequitur (or Nevill-Manning algorithm) is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997 that infers a hierarchical structure (context-free grammar) from a sequence of discrete symbols. The algorithm operates in linear space and time. It can be used in data compression software applications. Constraints The sequitur algorithm constructs a grammar by substituting repeating phrases in the given sequence with new rules and therefore produces a concise representation of the sequence. For example, if the sequence is : S→abcab, the algorithm will produce : S→AcA, A→ab. While scanning the input sequence, the algorithm follows two constraints for generating its grammar efficiently: digram uniqueness and rule utility. Digram uniqueness Whenever a new symbol is scanned from the sequence, it is appended with the last scanned symbol to form a new digram. If this digram has been formed earlier then a new rule is made to replace both occurrences of ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Entropy Rate
In the mathematical theory of probability, the entropy rate or source information rate of a stochastic process is, informally, the time density of the average information in a stochastic process. For stochastic processes with a countable index, the entropy rate H(X) is the limit of the joint entropy of n members of the process X_k divided by n, as n tends to infinity: :H(X) = \lim_ \frac H(X_1, X_2, \dots X_n) when the limit exists. An alternative, related quantity is: :H'(X) = \lim_ H(X_n, X_, X_, \dots X_1) For strongly stationary stochastic processes, H(X) = H'(X). The entropy rate can be thought of as a general property of stochastic sources; this is the asymptotic equipartition property. The entropy rate may be used to estimate the complexity of stochastic processes. It is used in diverse applications ranging from characterizing the complexity of languages, blind source separation, through to optimizing quantizers and data compression algorithms. For example, a maximum ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Data Compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding; encoding done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction or line coding, the means for mapping data onto a signal. ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




LZ77 And LZ78
LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations including LZW, LZSS, LZMA and others. Besides their academic influence, these algorithms formed the basis of several ubiquitous compression schemes, including GIF and the DEFLATE algorithm used in PNG and ZIP. They are both theoretically dictionary coders. LZ77 maintains a sliding window during compression. This was later shown to be equivalent to the ''explicit dictionary'' constructed by LZ78—however, they are only equivalent when the entire data is intended to be decompressed. Since LZ77 encodes and decodes from a sliding window over previously seen characters, decompression must always start at the beginning of the input. Conceptually, LZ78 decompression could allow random access to the input if the entire dictionary were known in ad ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Block Code
In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks. There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract definition of block codes is conceptually useful because it allows coding theorists, mathematicians, and computer scientists to study the limitations of ''all'' block codes in a unified way. Such limitations often take the form of ''bounds'' that relate different parameters of the block code to each other, such as its rate and its ability to detect and correct errors. Examples of block codes are Reed–Solomon codes, Hamming codes, Hadamard codes, Expander codes, Golay codes, and Reed–Muller codes. These examples also belong to the class of linear codes, and hence they are called linear block codes. More particularly, these codes are known as algebraic block codes, or cyclic block codes, because they can be generated using boolean polyno ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]