HOME
*





ZPAQ
ZPAQ is an open source command line archiver for Windows and Linux. It uses a journaling or append-only format which can be rolled back to an earlier state to retrieve older versions of files and directories. It supports fast incremental update by adding only files whose last-modified date has changed since the previous update. It compresses using deduplication and several algorithms (LZ77, BWT, and context mixing) depending on the data type and the selected compression level. To preserve forward and backward compatibility between versions as the compression algorithm is improved, it stores the decompression algorithm in the archive. The ZPAQ source code includes a public domain API, libzpaq, which provides compression and decompression services to C++ applications. The format is believed to be unencumbered by patent A patent is a type of intellectual property that gives its owner the legal right to exclude others from making, using, or selling an invention for a limited per ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Matt Mahoney (computer Scientist)
PAQ is a series of lossless data compression archivers that have gone through collaborative development to top rankings on several benchmarks measuring compression ratio (although at the expense of speed and memory usage). Specialized versions of PAQ have won the Hutter Prize and the Calgary Challenge. PAQ is free software distributed under the GNU General Public License. Algorithm PAQ uses a context mixing algorithm. Context mixing is related to prediction by partial matching (PPM) in that the compressor is divided into a predictor and an arithmetic coder, but differs in that the next-symbol prediction is computed using a weighted combination of probability estimates from a large number of models conditioned on different contexts. Unlike PPM, a context doesn't need to be contiguous. Most PAQ versions collect next-symbol statistics for the following contexts: * ''n''-grams; the context is the last bytes before the predicted symbol (as in PPM); * whole-word ''n''-grams, ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Context Mixing
Context mixing is a type of data compression algorithm in which the next-symbol predictions of two or more statistical models are combined to yield a prediction that is often more accurate than any of the individual predictions. For example, one simple method (not necessarily the best) is to average the probabilities assigned by each model. The random forest is another method: it outputs the prediction that is the mode of the predictions output by individual models. Combining models is an active area of research in machine learning. The PAQ series of data compression programs use context mixing to assign probabilities to individual bits of the input. Application to Data Compression Suppose that we are given two conditional probabilities, P(X, A) and P(X, B), and we wish to estimate P(X, A,B), the probability of event X given both conditions A and B. There is insufficient information for probability theory to give a result. In fact, it is possible to construct scenarios in which ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

X86-64
x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode. With 64-bit mode and the new paging mode, it supports vastly larger amounts of virtual memory and physical memory than was possible on its 32-bit predecessors, allowing programs to store larger amounts of data in memory. x86-64 also expands general-purpose registers to 64-bit, and expands the number of them from 8 (some of which had limited or fixed functionality, e.g. for stack management) to 16 (fully general), and provides numerous other enhancements. Floating-point arithmetic is supported via mandatory SSE2-like instructions, and x87/ MMX style registers are generally not used (but still available even in 64-bit mode); instead, a set of 16 vector registers, 128 bits each, is used. (Each register can store one or two double-preci ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Patent
A patent is a type of intellectual property that gives its owner the legal right to exclude others from making, using, or selling an invention for a limited period of time in exchange for publishing an enabling disclosure of the invention."A patent is not the grant of a right to make or use or sell. It does not, directly or indirectly, imply any such right. It grants only the right to exclude others. The supposition that a right to make is created by the patent grant is obviously inconsistent with the established distinctions between generic and specific patents, and with the well-known fact that a very considerable portion of the patents granted are in a field covered by a former relatively generic or basic patent, are tributary to such earlier patent, and cannot be practiced unless by license thereunder." – ''Herman v. Youngstown Car Mfg. Co.'', 191 F. 579, 584–85, 112 CCA 185 (6th Cir. 1911) In most countries, patent rights fall under private law and the patent holder mus ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Free Data Compression Software
Free may refer to: Concept * Freedom, having the ability to do something, without having to obey anyone/anything * Freethought, a position that beliefs should be formed only on the basis of logic, reason, and empiricism * Emancipate, to procure political rights, as for a disenfranchised group * Free will, control exercised by rational agents over their actions and decisions * Free of charge, also known as gratis. See Gratis vs libre. Computing * Free (programming), a function that releases dynamically allocated memory for reuse * Free format, a file format which can be used without restrictions * Free software, software usable and distributable with few restrictions and no payment * Freeware, a broader class of software available at no cost Mathematics * Free object ** Free abelian group ** Free algebra ** Free group ** Free module ** Free semigroup * Free variable People * Free (surname) * Free (rapper) (born 1968), or Free Marie, American rapper and media personality ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Archive Formats
In computing, an archive file is a computer file that is composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files to use less storage space. Archive files often store directory structures, error detection and correction information, arbitrary comments, and sometimes use built-in encryption. Applications Portability Archive files are particularly useful in that they store file system data and metadata within the contents of a particular file, and thus can be stored on systems or sent over channels that do not support the file system in question, only file contents – examples include sending a directory structure over email, files with names unsupported on the target file system due to length or characters, and retaining files' date and time information. Additionally, it facilitates transferring high numbers of small files such as resource ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

FASTQ Format
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity. It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA formatted sequence and its quality data, but has recently become the ''de facto'' standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer. Format A FASTQ file has four line-separated fields per sequence: * Field 1 begins with a '@' character and is followed by a sequence identifier and an ''optional'' description (like a FASTA title line). * Field 2 is the raw sequence letters. * Field 3 begins with a '+' character and is ''optionally'' followed by the same sequence identifier (and any description) again. * Field 4 encodes the quality values for the sequence in Field 2, and must contain the same num ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Codec
A codec is a device or computer program that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder on a signal or data stream, and hence is a type of codec. ''Endec'' is a portmanteau of encoder/decoder. A coder or encoder encodes a data stream or a signal for transmission or storage, possibly in encrypted form, and the decoder function reverses the encoding for playback or editing. Codecs are used in videoconferencing, streaming media, and video editing applications. History In the mid-20th century, a codec was a device that coded analog signals into digital form using pulse-code modulation (PCM). Later, the name was also applied to software for converting between digital signal formats, including companding functions. Examples An audio codec converts analog audio signals into digital signals for transmission or encodes them for storage. A receiv ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Grep
grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command ''g/re/p'' (''globally search for a regular expression and print matching lines''), which has the same effect. grep was originally developed for the Unix operating system, but later available for all Unix-like systems and some others such as OS-9. History Before it was named, grep was a private utility written by Ken Thompson to search files for certain patterns. Doug McIlroy, unaware of its existence, asked Thompson to write such a program. Responding that he would think about such a utility overnight, Thompson actually corrected bugs and made improvements for about an hour on his own program called s (short for "search"). The next day he presented the program to McIlroy, who said it was exactly what he wanted. Thompson's account may explain the belief that grep was written overnight. Thompson wrote the first version in PDP-11 assembly ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Advanced Encryption Standard
The Advanced Encryption Standard (AES), also known by its original name Rijndael (), is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001. AES is a variant of the Rijndael block cipher developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen, who submitted a proposal to NIST during the AES selection process. Rijndael is a family of ciphers with different key and block sizes. For AES, NIST selected three members of the Rijndael family, each with a block size of 128 bits, but three different key lengths: 128, 192 and 256 bits. AES has been adopted by the U.S. government. It supersedes the Data Encryption Standard (DES), which was published in 1977. The algorithm described by AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data. In the United States, AES was announced by the NIST as U.S. FIPS PUB 197 (FIPS 197) on Novemb ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Rolling Hash
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a moving average function can be computed much more quickly than other low-pass filters. One of the main applications is the Rabin–Karp string search algorithm, which uses the rolling hash described below. Another popular application is the rsync program, which uses a checksum based on Mark Adler's adler-32 as its rolling hash. Low Bandwidth Network Filesystem (LBFS) uses a Rabin fingerprint as its rolling hash. FastCDC (Fast Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash. At best, rolling hash values are pairwise independentDaniel Lemire, Owen K ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Rabin Fingerprint
The Rabin fingerprinting scheme is a method for implementing fingerprints using polynomials over a finite field. It was proposed by Michael O. Rabin. Scheme Given an ''n''-bit message ''m''0,...,''m''n-1, we view it as a polynomial of degree ''n''-1 over the finite field GF(2). : f(x) = m_0 + m_1 x + \ldots + m_ x^ We then pick a random irreducible polynomial of degree ''k'' over GF(2), and we define the fingerprint of the message ''m'' to be the remainder r(x) after division of f(x) by p(x) over GF(2) which can be viewed as a polynomial of degree or as a ''k''-bit number. Applications Many implementations of the Rabin–Karp algorithm internally use Rabin fingerprints. The ''Low Bandwidth Network Filesystem'' (LBFS) from MIT uses Rabin fingerprints to implement variable size shift-resistant blocks.Athicha Muthitacharoen, Benjie Chen, and David Mazières"A Low-bandwidth Network File System"/ref> The basic idea is that the filesystem computes the cryptographic hash of each bl ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]