




Data Differencing
In computer science and information theory, data differencing or differential compression is producing a technical description of the difference between two sets of data – a source and a target. Formally, a data differencing algorithm takes as input source data and target data, and produces difference data such that given the source data and the difference data, one can reconstruct the target data ("patching" the source with the difference to produce the target).

Examples

One of the best-known examples of data differencing is the diff utility, which produces line-by-line differences of text files (and in some implementations, binary files, thus being a general-purpose differencing tool). Differencing of general binary files goes under the rubric of delta encoding, with a widely used example being the algorithm used in rsync. A standardized generic differencing format is VCDIFF, implemented in such utilities as xdelta version 3. A high-efficiency (small patch f ...
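To make the formal definition above concrete, here is a minimal sketch in Python (not any standardized format): the delta records how much of the source to keep at the front and at the back and what to splice in between, and patching replays that description against the source to rebuild the target.

    def make_delta(source: bytes, target: bytes):
        # Longest common prefix of source and target.
        p = 0
        while p < min(len(source), len(target)) and source[p] == target[p]:
            p += 1
        # Longest common suffix that does not overlap the prefix.
        s = 0
        while (s < len(source) - p and s < len(target) - p
               and source[len(source) - 1 - s] == target[len(target) - 1 - s]):
            s += 1
        # Delta = (bytes kept at the front, bytes kept at the back, new middle).
        return p, s, target[p:len(target) - s]

    def apply_patch(source: bytes, delta):
        # Reconstruct the target from the source plus the difference data.
        p, s, middle = delta
        tail = source[len(source) - s:] if s else b""
        return source[:p] + middle + tail

    old = b"hello world"
    new = b"hello brave new world"
    assert apply_patch(old, make_delta(old, new)) == new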



Computer Science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (including the design and implementation of hardware and software). Computer science is generally considered an area of academic research and distinct from computer programming. Algorithms and data structures are central to computer science. The theory of computation concerns abstract models of computation and general classes of problems that can be solved using them. The fields of cryptography and computer security involve studying the means for secure communication and for preventing security vulnerabilities. Computer graphics and computational geometry address the generation of images. Progr ...


Xdelta
xdelta is a command line program for delta encoding, which generates the difference between two files. This is similar to diff and patch, but it is targeted at binary files and does not generate human-readable output. It was first released in 1997. The developer of xdelta is Joshua MacDonald, who currently maintains the program. The algorithm of xdelta1 was based on the algorithm of rsync, developed by Andrew Tridgell, though it uses a smaller block size. xdelta3 can generate the standardized VCDIFF format, giving it compatibility with other delta encoding software that supports VCDIFF. It runs on Unix-like operating systems and Microsoft Windows. xdelta can handle files of up to 2^64 bytes, and it is suitable for ...
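As an illustration, a small Python wrapper around the xdelta3 command line tool might look like the sketch below. It assumes an xdelta3 binary is on the PATH and uses its common -e (encode) and -d (decode) modes with -s to name the source file; option details can vary between versions, so treat this as a sketch rather than a reference.

    import subprocess

    def xdelta3_encode(source_path, target_path, delta_path):
        # Produce a VCDIFF-format delta that turns source into target.
        subprocess.run(["xdelta3", "-e", "-s", source_path, target_path, delta_path],
                       check=True)

    def xdelta3_decode(source_path, delta_path, output_path):
        # Apply the delta to the source to reconstruct the target.
        subprocess.run(["xdelta3", "-d", "-s", source_path, delta_path, output_path],
                       check=True)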



Entropy (information Theory)
In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet \mathcal{X} and is distributed according to p : \mathcal{X} \to [0, 1], the entropy is H(X) := -\sum_{x \in \mathcal{X}} p(x) \log p(x) = \mathbb{E}[-\log p(X)], where \Sigma denotes the sum over the variable's possible values. The choice of base for \log, the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" (nats), and base 10 gives units of "dits", "bans", or "hartleys". An equivalent definition of entropy is the expected value of the self-information of a variable. The concept of information entropy was introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication", and is also referred to as Shannon entropy. Shannon's theory defi ...
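As a small illustration, the definition can be estimated from observed symbol frequencies; the snippet below (a sketch, using base-2 logarithms so the result is in bits) treats the empirical frequencies as the distribution p.

    import math
    from collections import Counter

    def shannon_entropy(symbols, base=2):
        # H(X) = -sum over x of p(x) * log p(x), with p estimated from frequencies.
        counts = Counter(symbols)
        n = len(symbols)
        return -sum((c / n) * math.log(c / n, base) for c in counts.values())

    print(shannon_entropy("HT"))    # 1.0 bit: like a fair coin flip
    print(shannon_entropy("AAAB"))  # ~0.811 bits: a biased source is less surprising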



John Wiley & Sons
John Wiley & Sons, Inc., commonly known as Wiley, is an American multinational publishing company founded in 1807 that focuses on academic publishing and instructional materials. The company produces books, journals, and encyclopedias, in print and electronically, as well as online products and services, training materials, and educational materials for undergraduate, graduate, and continuing education students.

History

The company was established in 1807 when Charles Wiley opened a print shop in Manhattan. The company was the publisher of 19th-century American literary figures like James Fenimore Cooper, Washington Irving, Herman Melville, and Edgar Allan Poe, as well as of legal, religious, and other non-fiction titles. The firm took its current name in 1865. Wiley later shifted its focus to scientific, technical, and engineering subject areas, abandoning its literary interests. Wiley's son John (born in Flatbush, New York, October 4, 1808; died in East Orange, New Je ...



Data Compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; no information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding (for error detection and correction) or line coding (the means for mapping data onto a signal). ...
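For example, a lossless encoder/decoder round trip can be demonstrated with Python's standard zlib module (just an illustration of the lossless property, not of any particular codec discussed here):

    import zlib

    original = b"abcabcabcabc" * 32          # highly redundant input
    compressed = zlib.compress(original)      # encoder: exploit statistical redundancy
    restored = zlib.decompress(compressed)    # decoder: reverse the process

    assert restored == original               # lossless: no information is lost
    print(len(original), "->", len(compressed), "bytes")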



Google Chrome
Google Chrome is a cross-platform web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS, iOS, and also for Android, where it is the default browser. The browser is also the main component of ChromeOS, where it serves as the platform for web applications. Most of Chrome's source code comes from Google's free and open-source software project Chromium, but Chrome is licensed as proprietary freeware. WebKit was the original rendering engine, but Google eventually forked it to create the Blink engine; all Chrome variants except iOS now use Blink. StatCounter estimates that Chrome has a 67% worldwide browser market share (after peaking at 72.38% in November 2018) on personal computers (PC), is most used on tablets (having surpassed Safari), and is also dominant on smartphones and at 65% across all platforms combined. ...


Colin Percival
Colin A. Percival (born 1980) is a Canadian computer scientist and computer security researcher. He completed his undergraduate education at Simon Fraser University and a doctorate at the University of Oxford. While at university he joined the FreeBSD project, and achieved some notoriety for discovering a security weakness in Intel's hyper-threading technology. Besides his work in delta compression and the introduction of memory-hard functions, he is also known for developing the Tarsnap online backup service, which became his full-time job.

Education

Percival began taking mathematics courses at Simon Fraser University (SFU) at age 13, as a student at Burnaby Central Secondary School. He graduated from Burnaby Central and officially enrolled at SFU in 1998. At SFU he studied number theory under Peter Borwein, and competed in the William Lowell Putnam Mathematical Competition, placing in the top 15 in 1998 and as a Putnam Fellow (in the top six) in 1999. From 1998 to 2000 he ran t ...




VCDIFF
VCDIFF is a format and an algorithm for delta encoding, described in IETF RFC 3284. The algorithm is based on Jon Bentley and Douglas McIlroy's paper "Data Compression Using Long Common Strings", written in 1999. VCDIFF is used as one of the delta encoding algorithms in "Delta encoding in HTTP" (RFC 3229) and was employed in Google's Shared Dictionary Compression Over HTTP technology, formerly used in their Chrome browser.

Delta instructions

VCDIFF has three delta instructions: ADD, COPY, and RUN. ADD adds a new sequence, COPY copies from an old sequence, and RUN adds repeated data; a toy illustration of the three appears at the end of this entry.

Implementations

Free software implementations include xdelta (version 3) and open-vcdiff.
* Google's Shared Dictionary Compression Over HTTP proposal uses this algorithm, and was included in the Google Chrome browser up to version 58.
* xdelta – a tool that is an open source VCDIFF delta compression implementation.
* google/open-vcdiff – another open source VCDIFF delta compression implementation as part of vc ...
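The three instruction kinds can be illustrated with a toy decoder like the one below. This is not the actual VCDIFF byte format from RFC 3284 (which also lets COPY reference already-decoded target data and uses windows and address caches); it only shows how ADD, COPY, and RUN cooperate to rebuild a target from a source.

    def apply_toy_delta(source: bytes, instructions):
        out = bytearray()
        for op in instructions:
            if op[0] == "ADD":                       # append new literal bytes
                out += op[1]
            elif op[0] == "COPY":                    # reuse bytes from the source
                offset, length = op[1], op[2]
                out += source[offset:offset + length]
            elif op[0] == "RUN":                     # repeat a single byte
                byte, count = op[1], op[2]
                out += bytes([byte]) * count
        return bytes(out)

    source = b"the quick brown fox"
    delta = [("COPY", 0, 10), ("ADD", b"red"), ("RUN", ord("!"), 3)]
    print(apply_toy_delta(source, delta))            # b'the quick red!!!'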



Information Theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering. A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome of a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual informat ...
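The coin/die comparison can be made numerical: with b equally likely outcomes the entropy is log2(b) bits, so a fair coin carries 1 bit and a fair die about 2.585 bits, as the quick computation below shows.

    import math

    coin = -sum((1/2) * math.log2(1/2) for _ in range(2))   # 2 equally likely outcomes
    die  = -sum((1/6) * math.log2(1/6) for _ in range(6))   # 6 equally likely outcomes
    print(coin)   # 1.0 bit
    print(die)    # ~2.585 bits (= log2 6)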


Rsync
rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operating systems and is under the GPL-3.0-or-later license. Rsync is written in C as a single-threaded application. The rsync algorithm is a type of delta encoding, and is used for minimizing network usage. Zlib may be used for additional data compression, and SSH or stunnel can be used for security. Rsync is typically used for synchronizing files and directories between two different systems. For example, if the command rsync local-file user@remote-host:remote-file is run, rsync will use SSH to connect as user to remote-host. Once connected, it will invoke the remote host's rsync and then the two programs will determine what parts of the local file need to be transferred so that the remote file matches the local one. One application of r ...
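The block-matching idea behind the algorithm can be sketched as follows. This is a simplification: real rsync uses a rolling weak checksum plus a stronger hash, and much larger blocks, but the flow of "describe the new file as references to blocks the other side already has, plus literal bytes" is the same.

    import hashlib

    BLOCK = 4  # toy block size for illustration only

    def block_signatures(old: bytes):
        # Hash each fixed-size block of the copy the receiver already has.
        return {hashlib.md5(old[i:i + BLOCK]).digest(): i
                for i in range(0, len(old), BLOCK)}

    def describe(new: bytes, signatures):
        # Express the new file as (block reference) or (literal byte) items.
        out, i = [], 0
        while i < len(new):
            chunk = new[i:i + BLOCK]
            digest = hashlib.md5(chunk).digest()
            if digest in signatures:
                out.append(("block", signatures[digest]))  # receiver already has it
                i += len(chunk)
            else:
                out.append(("literal", new[i:i + 1]))      # must be transferred
                i += 1
        return out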


Delta Encoding
Delta encoding is a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required (e.g., in revision control software). The differences are recorded in discrete files called "deltas" or "diffs". In situations where differences are small – for example, the change of a few words in a large document or the change of a few records in a large table – delta encoding greatly reduces data redundancy. Collections of unique deltas are substantially more space-efficient than their non-encoded equivalents. From a logical point of view the difference between two data values is the information required to obtain one value from the other – see relative entropy. The difference between identical values (under some equivalence) is often called 0 or the neut ...
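A minimal illustration of the idea with a numeric sequence: instead of storing every value, store the first value and then only the change from the previous one, which is typically much more compressible when successive values are close.

    def delta_encode(values):
        # First value verbatim, then only the difference to the previous value.
        if not values:
            return []
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    def delta_decode(deltas):
        # Rebuild the original sequence by accumulating the differences.
        out, total = [], 0
        for d in deltas:
            total += d
            out.append(total)
        return out

    readings = [1000, 1002, 1001, 1005, 1005]
    print(delta_encode(readings))                            # [1000, 2, -1, 4, 0]
    assert delta_decode(delta_encode(readings)) == readings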