Rabin Fingerprint
   HOME
*





Rabin Fingerprint
The Rabin fingerprinting scheme is a method for implementing fingerprints using polynomials over a finite field. It was proposed by Michael O. Rabin. Scheme Given an ''n''-bit message ''m''0,...,''m''n-1, we view it as a polynomial of degree ''n''-1 over the finite field GF(2). : f(x) = m_0 + m_1 x + \ldots + m_ x^ We then pick a random irreducible polynomial of degree ''k'' over GF(2), and we define the fingerprint of the message ''m'' to be the remainder r(x) after division of f(x) by p(x) over GF(2) which can be viewed as a polynomial of degree or as a ''k''-bit number. Applications Many implementations of the Rabin–Karp algorithm internally use Rabin fingerprints. The ''Low Bandwidth Network Filesystem'' (LBFS) from MIT uses Rabin fingerprints to implement variable size shift-resistant blocks.Athicha Muthitacharoen, Benjie Chen, and David Mazières"A Low-bandwidth Network File System"/ref> The basic idea is that the filesystem computes the cryptographic hash of each bl ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Fingerprint (computing)
In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item (such as a computer file) to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposesA. Z. Broder. Some applications of Rabin's fingerprinting method. In Sequences II: Methods in Communications, Security, and Computer Science, pages 143--152. Springer-Verlag, 1993 just as human fingerprints uniquely identify people for practical purposes. This fingerprint may be used for data deduplication purposes. This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting. Fingerprints are typically used to avoid the comparison and transmission of bulky data. For instance, a web browser or proxy server can efficiently check whether a remote file has been modified, by fetching only its fingerprint and comparing it with that of the previously fetched copy.Detecting duplicate and near-duplicate ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Pseudo-random
A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic and repeatable process. Background The generation of random numbers has many uses, such as for random sampling, Monte Carlo methods, board games, or gambling. In physics, however, most processes, such as gravitational acceleration, are deterministic, meaning that they always produce the same outcome from the same starting point. Some notable exceptions are radioactive decay and quantum measurement, which are both modeled as being truly random processes in the underlying physics. Since these processes are not practical sources of random numbers, people use pseudorandom numbers, which ideally have the unpredictability of a truly random sequence, despite being generated by a deterministic process. In many applications, the deterministic process is a computer algorithm called a pseudorandom number generator, which must first be provided wi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Rolling Hash
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a moving average function can be computed much more quickly than other low-pass filters. One of the main applications is the Rabin–Karp string search algorithm, which uses the rolling hash described below. Another popular application is the rsync program, which uses a checksum based on Mark Adler's adler-32 as its rolling hash. Low Bandwidth Network Filesystem (LBFS) uses a Rabin fingerprint as its rolling hash. FastCDC (Fast Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash. At best, rolling hash values are pairwise independentDaniel Lemire, Owen K ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




W-shingling
In natural language processing a ''w-shingling'' is a set of ''unique'' ''shingles'' (therefore ''n-grams'') each of which is composed of contiguous subsequences of tokens within a document, which can then be used to ascertain the similarity between documents. The symbol ''w'' denotes the quantity of tokens in each shingle selected, or solved for. The document, "a rose is a rose is a rose" can therefore be maximally tokenized as follows: :(a,rose,is,a,rose,is,a,rose) The set of all contiguous ''sequences of 4 tokens'' (Thus 4=''n'', thus 4-''grams'') is : Which can then be reduced, or maximally shingled in this particular instance to . Resemblance For a given shingle size, the degree to which two documents ''A'' and ''B'' resemble each other can be expressed as the ratio of the magnitudes of their shinglings' intersection and union, or :r(A,B)= where , A, is the size of set A. The resemblance is a number in the range ,1 where 1 indicates that two documents are ident ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Rsync
rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operating systems and is under the GPL-3.0-or-later license. Rsync is written in C as a single threaded application. The rsync algorithm is a type of delta encoding, and is used for minimizing network usage. Zlib may be used for additional data compression, and SSH or stunnel can be used for security. Rsync is typically used for synchronizing files and directories between two different systems. For example, if the command rsync local-file user@remote-host:remote-file is run, rsync will use SSH to connect as user to remote-host. Once connected, it will invoke the remote host's rsync and then the two programs will determine what parts of the local file need to be transferred so that the remote file matches the local one. One application of r ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Rolling Hash
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input. A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a moving average function can be computed much more quickly than other low-pass filters. One of the main applications is the Rabin–Karp string search algorithm, which uses the rolling hash described below. Another popular application is the rsync program, which uses a checksum based on Mark Adler's adler-32 as its rolling hash. Low Bandwidth Network Filesystem (LBFS) uses a Rabin fingerprint as its rolling hash. FastCDC (Fast Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash. At best, rolling hash values are pairwise independentDaniel Lemire, Owen K ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Cryptographic Hash Function
A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output result (hash value) for a random input string ("message") is 2^ (like for any good hash), so the hash value can be used as a representative of the message; * finding an input string that matches a given hash value (a ''pre-image'') is unfeasible, unless the value is selected from a known pre-calculated dictionary (" rainbow table"). The ''resistance'' to such search is quantified as security strength, a cryptographic hash with n bits of hash value is expected to have a ''preimage resistance'' strength of n bits. A ''second preimage'' resistance strength, with the same expectations, refers to a similar problem of finding a second message that matches the given hash value when one message is already known; * finding any pair of different messa ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Hash Function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually used to index a fixed-size table called a ''hash table''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter storage addressing''. Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally and storage space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and structured trees, and the often exponential storage requirements of direct access of state spaces of large or variable-length keys. Use of ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Cryptographic Hash
A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output result (hash value) for a random input string ("message") is 2^ (like for any good hash), so the hash value can be used as a representative of the message; * finding an input string that matches a given hash value (a ''pre-image'') is unfeasible, unless the value is selected from a known pre-calculated dictionary ("rainbow table"). The ''resistance'' to such search is quantified as security strength, a cryptographic hash with n bits of hash value is expected to have a ''preimage resistance'' strength of n bits. A ''second preimage'' resistance strength, with the same expectations, refers to a similar problem of finding a second message that matches the given hash value when one message is already known; * finding any pair of different messag ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Polynomial
In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An example of a polynomial of a single indeterminate is . An example with three indeterminates is . Polynomials appear in many areas of mathematics and science. For example, they are used to form polynomial equations, which encode a wide range of problems, from elementary word problems to complicated scientific problems; they are used to define polynomial functions, which appear in settings ranging from basic chemistry and physics to economics and social science; they are used in calculus and numerical analysis to approximate other functions. In advanced mathematics, polynomials are used to construct polynomial rings and algebraic varieties, which are central concepts in algebra and algebraic geometry. Etymology The word ''polynomial'' join ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

David Mazières
David (; , "beloved one") (traditional spelling), , ''Dāwūd''; grc-koi, Δαυΐδ, Dauíd; la, Davidus, David; gez , ዳዊት, ''Dawit''; xcl, Դաւիթ, ''Dawitʿ''; cu, Давíдъ, ''Davidŭ''; possibly meaning "beloved one". was, according to the Hebrew Bible, the third king of the United Kingdom of Israel. In the Books of Samuel, he is described as a young shepherd and harpist who gains fame by slaying Goliath, a champion of the Philistines, in southern Canaan. David becomes a favourite of Saul, the first king of Israel; he also forges a notably close friendship with Jonathan, a son of Saul. However, under the paranoia that David is seeking to usurp the throne, Saul attempts to kill David, forcing the latter to go into hiding and effectively operate as a fugitive for several years. After Saul and Jonathan are both killed in battle against the Philistines, a 30-year-old David is anointed king over all of Israel and Judah. Following his rise to power, David c ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Rabin–Karp Algorithm
In computer science, the Rabin–Karp algorithm or Karp–Rabin algorithm is a string-searching algorithm created by that uses hashing to find an exact match of a pattern string in a text. It uses a rolling hash to quickly filter out positions of the text that cannot match the pattern, and then checks for a match at the remaining positions. Generalizations of the same idea can be used to find more than one match of a single pattern, or to find matches for more than one pattern. To find a single match of a single pattern, the expected time of the algorithm is linear in the combined length of the pattern and text, although its worst-case time complexity is the product of the two lengths. To find multiple matches, the expected time is linear in the input lengths, plus the combined length of all the matches, which could be greater than linear. In contrast, the Aho–Corasick algorithm can find all matches of multiple patterns in worst-case time and space linear in the input length ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]