Pearson hashing is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte lookup table containing a permutation of the values 0 through 255.

This hash function is a CBC-MAC that uses an 8-bit substitution cipher implemented via the substitution table. An 8-bit cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong, but it is useful for implementing hash tables or as a data integrity check code, for which purposes it offers these benefits:

* It is extremely simple.
* It executes quickly on resource-limited processors.
* There is no simple class of inputs for which collisions (identical outputs) are especially likely.
* Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.
* Two input strings differing by exactly one character never collide. E.g., applying the algorithm to the strings ABC and AEC will never produce the same value.

One of its drawbacks when compared with other hashing algorithms designed for 8-bit processors is the suggested 256-byte lookup table, which can be prohibitively large for a small microcontroller with a program memory size on the order of hundreds of bytes. A workaround is to use a simple permutation function instead of a table stored in program memory. However, a function that is too simple, such as T[i] = 255 - i, partly defeats its usability as a hash function because anagrams will result in the same hash value; a function that is too complex, on the other hand, will slow the hash down. Using a function rather than a table also allows extending the block size. Such functions naturally have to be bijective, like their table variants.

The algorithm can be described by the following pseudocode, which computes the hash of message C using the permutation table T:

    algorithm pearson hashing is
        h := 0
        for each c in C loop
            h := T[h xor c]
        end loop
        return h

The hash variable (h) may be initialized differently, e.g. to the length of the data modulo 256; this particular choice is used in the Python implementation example below.
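Two of the properties described above can be checked with a short experiment. The following is a minimal sketch, not a reference implementation: the function name `pearson`, the seed value, and the sample strings are all assumptions made for illustration. It checks that equal-length inputs differing in one byte never collide under a shuffled table, and that the overly simple permutation T[i] = 255 - i maps anagrams to the same value.

```python
import random


def pearson(data: bytes, table: list[int]) -> int:
    """Hash data to one byte, with h starting at 0 as in the pseudocode."""
    h = 0
    for c in data:
        h = table[h ^ c]
    return h


# Hypothetical table: a seeded shuffle of 0..255, seeded only for repeatability.
rng = random.Random(2024)
shuffled = list(range(256))
rng.shuffle(shuffled)

# Equal-length inputs that differ in exactly one byte never collide: the table
# is a bijection, so the difference survives every later step.
for middle in range(256):
    if middle != ord("B"):
        other = bytes([ord("A"), middle, ord("C")])
        assert pearson(b"ABC", shuffled) != pearson(other, shuffled)

# T[i] = 255 - i turns each step into an XOR followed by a bit flip, so the
# order of the input bytes no longer affects the result.
complement = [255 - i for i in range(256)]
assert pearson(b"listen", complement) == pearson(b"silent", complement)
```

The first check holds for any permutation table, not just this seed. The second shows why a function-based permutation needs more mixing than a plain complement.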


Example implementations


Python, 8-bit output

The 'table' parameter requires a pseudo-randomly shuffled list of the range [0..255]. This may easily be generated by using Python's built-in range function and using shuffle from the random module to permute it:

    from random import shuffle

    # Build a random permutation of the values 0..255.
    example_table = list(range(0, 256))
    shuffle(example_table)

    def hash8(message: str, table) -> int:
        """Pearson hashing."""
        # Seed with the message length; every character must have a code point below 256.
        h = len(message) % 256
        for i in message:
            h = table[h ^ ord(i)]
        return h
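The perfect-hashing use mentioned earlier, where a small set of reserved words gets distinct hash values, can be explored in the same style. The sketch below is only one possible approach and rests on assumptions: it simply reshuffles the table until no two keys collide, instead of adjusting an existing table in any targeted way. The helper name `find_perfect_table`, the keyword list, the seed, and the attempt limit are all hypothetical.

```python
import random


def pearson8(message: bytes, table: list[int]) -> int:
    """8-bit Pearson hash over bytes, seeded with the message length."""
    h = len(message) % 256
    for byte in message:
        h = table[h ^ byte]
    return h


def find_perfect_table(keys: list[bytes], seed: int = 0,
                       max_attempts: int = 1000) -> list[int]:
    """Reshuffle until every key gets a different hash (hypothetical helper)."""
    distinct = set(keys)
    if len(distinct) > 256:
        raise ValueError("an 8-bit hash cannot separate more than 256 keys")
    rng = random.Random(seed)
    table = list(range(256))
    for _ in range(max_attempts):
        rng.shuffle(table)
        if len({pearson8(k, table) for k in distinct}) == len(distinct):
            return list(table)  # return a copy of the collision-free table
    raise RuntimeError("no collision-free table found; try more attempts")


# Illustrative keyword set, not taken from any particular compiler.
RESERVED = [b"if", b"else", b"while", b"for", b"return", b"break", b"continue", b"def"]
perfect_table = find_perfect_table(RESERVED)
slots = {word: pearson8(word, perfect_table) for word in RESERVED}
```

For a handful of keys, a random table is usually collision-free within a few attempts. As the number of keys approaches 256, this brute-force search becomes impractical and a more deliberate construction would be needed.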


C#, 8-bit

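    using System.Text;

    public class PearsonHashing
    {
        // Lookup table. const cannot be applied to an array, so this is a static readonly field.
        private static readonly byte[] T = { /* Permutation of 0-255 */ };

        public byte Hash(string input)
        {
            byte hash = 0;
            byte[] bytes = Encoding.UTF8.GetBytes(input);
            foreach (var b in bytes)
            {
                // XOR the running hash with the next byte, then substitute via the table.
                hash = T[(byte)(hash ^ b)];
            }
            return hash;
        }
    }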


See also

* Non-cryptographic hash functions

