Bzip2
bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities such as tar for tasks such as handling multiple files, and other tools for encryption, and archive splitting. bzip2 was initially released in 1996 by Julian Seward. It compresses most files more effectively than older LZW and Deflate compression algorithms but is slower. bzip2 is particularly efficient for text data, and decompression is relatively fast. The algorithm uses several layers of compression techniques, such as run-length encoding (RLE), Burrows–Wheeler transform (BWT), move-to-front transform (MTF), and Huffman coding. bzip2 compresses data in blocks between 100 and 900 kB and uses the Burrows–Wheeler transform to convert frequently recurring character sequences into strings of identical letters. The move-to-front transform and Huffman coding are then ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Tar (computing)
In computing, tar is a shell command for combining multiple computer files into a single archive file. It was originally developed for magnetic tape storage reading and writing data for a sequential I/O device with no file system, and the name is short for the format description "tape archive". When stored in a file system, a file that tar reads and writes is often called a ''tarball''. A tarball contains metadata for the contained files including the name, ownership, timestamps, permissions and directory organization. As a file containing other files with associated metadata, a tarball is useful for software distribution and backup. POSIX abandoned ''tar'' in favor of '' pax'', yet ''tar'' continues to have widespread use. History The command was introduced to Unix in January 1979, replacing the tp program (which in turn replaced "tap"). The file structure was standardized in POSIX.1-1988 and later POSIX.1-2001, and became a format supported by most modern file ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Julian Seward
Julian Seward is a British compiler writer and Free Software contributor who lives in Stuttgart. He is commonly known for creating the bzip2 compression tool in 1996, as well as the valgrind memory debugging toolset founded in 2000. In 2006, he won a second O'Reilly Open Source Award for his work on Valgrind. As of 2009, Seward worked at Mozilla. Contributions * bzip2 (1996), a data compressor * cacheprof (1999), a tool for locating the sources of D-cache misses * Valgrind Valgrind () is a programming tool for memory debugging, memory leak detection, and profiling. Valgrind was originally designed to be a freely licensed memory debugging tool for Linux on x86, but has since evolved to become a generic framework ..., a memory debugger Awards * July 2006 – Julian Seward won a Google-O'Reilly Open Source Award for "Best Toolmaker" for his work on Valgrind References External links Interview with Valgrind Author Julian Seward on techrepublic.com January 20 ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
File Archiver
In computing, a file archiver is utility software that combines computer file, files into a single archive file or in less common cases, multiple files. A minimally designed archiver might concatenate the content of files along with file file name, name and length. A more advanced archiver stores additional metadata, such as the Timestamping (computing), timestamps, file attributes and access control information. An archiver might lossless data compression, compress input file content to reduce the size of the resulting archive. The process of making an archive file is called ''archiving'' or ''packing''. Reconstructing the original files from an archive is called ''unarchiving'', ''unpacking'' or ''extracting''. Multics In the early days of computing, Multics provided the command a basic archiver without compression that descended from the Compatible Time-Sharing System, CTSS command of the same name. Multics also provided a magnetic tape archiver command, , which was perha ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Burrows–Wheeler Transform
The Burrows–Wheeler transform (BWT) rearranges a character string into runs of similar characters, in a manner that can be reversed to recover the original string. Since compression techniques such as move-to-front transform and run-length encoding are more effective when such runs are present, the BWT can be used as a preparatory step to improve the efficiency of a compression algorithm, and is used this way in software such as bzip2. The algorithm can be implemented efficiently using a suffix array thus reaching linear time complexity. It was invented by David Wheeler in 1983, and later published by him and Michael Burrows in 1994. Their paper included a compression algorithm, called the Block-sorting Lossless Data Compression Algorithm or BSLDCA, that compresses data by using the BWT followed by move-to-front coding and Huffman coding or arithmetic coding. Description The transform is done by constructing a matrix (known as the Burrows-Wheeler Matrix) whose rows are the ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Move-to-front Transform
The move-to-front (MTF) transform is an encoding of data (typically a stream of bytes) designed to improve the performance of entropy encoding techniques of compression. When efficiently implemented, it is fast enough that its benefits usually justify including it as an extra step in data compression algorithm. This algorithm was first published by Boris Ryabko under the name of "book stack" in 1980. Subsequently, it was rediscovered by J.K. Bentley et al. in 1986, as attested in the explanatory note. The transform The main idea is that each symbol in the data is replaced by its index in the stack of “recently used symbols”. For example, long sequences of identical symbols are replaced by as many zeroes, whereas when a symbol that has not been used in a long time appears, it is replaced with a large number. Thus at the end the data is transformed into a sequence of integers; if the data exhibits a lot of local correlations, then these integers tend to be small. Let us gi ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Bijective Numeration
Bijective numeration is any numeral system in which every non-negative integer can be represented in exactly one way using a finite string of digits. The name refers to the bijection (i.e. one-to-one correspondence) that exists in this case between the set of non-negative integers and the set of finite strings using a finite set of symbols (the "digits"). Most ordinary numeral systems, such as the common decimal system, are not bijective because more than one string of digits can represent the same positive integer. In particular, adding leading zeroes does not change the value represented, so "1", "01" and "001" all represent the number one. Even though only the first is usual, the fact that the others are possible means that the decimal system is not bijective. However, the unary numeral system, with only one digit, ''is'' bijective. A bijective base-''k'' numeration is a bijective positional notation. It uses a string of digits from the set (where ''k'' ≥ 1) to encode ea ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Bit Array
A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores ''kw'' bits, where ''w'' is the number of bits in the unit of storage, such as a byte or Word (computer architecture), word, and ''k'' is some nonnegative integer. If ''w'' does not divide the number of bits to be stored, some space is wasted due to Fragmentation (computing), internal fragmentation. Definition A bit array is a mapping from some domain (almost always a range of integers) to values in the set . The values can be interpreted as dark/light, absent/present, locked/unlocked, valid/invalid, et cetera. The point is that there are only two possible values, so they can be stored in one bit. As with other arrays, the access to a single bit can be managed ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Elias Delta Coding
Elias δ code or Elias delta code is a universal code encoding the positive integers developed by Peter Elias. Encoding To code a number ''X'' ≥ 1: # Let ''N'' = ⌊log2 ''X''⌋; be the highest power of 2 in ''X'', so 2''N'' ≤ ''X'' < 2''N''+1. # Let ''L'' = ⌊log2 ''N''+1⌋ be the highest power of 2 in ''N''+1, so 2''L'' ≤ ''N''+1 < 2''L''+1. # Write ''L'' zeros, followed by # the ''L''+1-bit binary representation of ''N''+1, followed by # all but the leading bit (i.e. the last ''N'' bits) of ''X''. An equivalent way to express the same process: #Separate ''X'' into the highest power of 2 it contains (2''N'') and the remaining ''N'' binary digits. #Encode ''N''+1 with Elias gamma coding. #Append the remaining ''N'' binary digits to this representation of ''N''+1. To represent a number [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Cluster Computing
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike Grid computing, grid computers, computer clusters have each Node (networking), node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing. The components of a cluster are usually connected to each other through fast local area networks, with each Node (networking), node (computer used as a server) running its own instance of an operating system. In most circumstances, all of the nodes use the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application Resources (OSCAR)), different operating systems can be used on each computer, or different hardware. Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable s ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Canonical Huffman Code
In computer science and information theory, a canonical Huffman code is a particular type of Huffman code with unique properties which allow it to be described in a very compact manner. Rather than storing the structure of the code tree explicitly, canonical Huffman codes are ordered in such a way that it suffices to only store the lengths of the codewords, which reduces the overhead of the codebook. Motivation Data compressors generally work in one of two ways. Either the decompressor can infer what codebook the compressor has used from previous context, or the compressor must tell the decompressor what the codebook is. Since a canonical Huffman codebook can be stored especially efficiently, most compressors start by generating a "normal" Huffman codebook, and then convert it to canonical Huffman before using it. In order for a symbol code scheme such as the Huffman code to be decompressed, the same model that the encoding algorithm used to compress the source data must be ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |