Cryptographic hash function
   HOME

TheInfoList



OR:

A cryptographic hash function (CHF) is a
hash algorithm A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
(a
map A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes. Many maps are static, fixed to paper or some other durable medium, while others are dynamic or interactive. Although ...
of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for
cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
: * the probability of a particular n-bit output result (
hash value A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually u ...
) for a random input string ("message") is 2^ (like for any good hash), so the hash value can be used as a representative of the message; * finding an input string that matches a given hash value (a ''pre-image'') is unfeasible, unless the value is selected from a known pre-calculated dictionary (" rainbow table"). The ''resistance'' to such search is quantified as
security strength In cryptography, security level is a measure of the strength that a cryptographic primitive — such as a cipher or hash function — achieves. Security level is usually expressed as a number of " bits of security" (also security strengt ...
, a cryptographic hash with n bits of hash value is expected to have a ''preimage resistance'' strength of n bits. A ''second preimage'' resistance strength, with the same expectations, refers to a similar problem of finding a second message that matches the given hash value when one message is already known; * finding any pair of different messages that yield the same hash value (a ''collision'') is also unfeasible, a cryptographic hash is expected to have a ''collision resistance'' strength of n/2 bits (lower due to the
birthday paradox In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...
). Cryptographic hash functions have many information-security applications, notably in digital signatures, message authentication codes (MACs), and other forms of
authentication Authentication (from ''authentikos'', "real, genuine", from αὐθέντης ''authentes'', "author") is the act of proving an assertion, such as the identity of a computer system user. In contrast with identification, the act of indicati ...
. They can also be used as ordinary
hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually u ...
s, to index data in
hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
s, for
fingerprinting A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfac ...
, to detect duplicate data or uniquely identify files, and as
checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data ...
s to detect accidental data corruption. Indeed, in information-security contexts, cryptographic hash values are sometimes called (''digital'') ''fingerprints'', ''checksums'', or just ''hash values'', even though all these terms stand for more general functions with rather different properties and purposes.


Properties

Most cryptographic hash functions are designed to take a string of any length as input and produce a fixed-length hash value. A cryptographic hash function must be able to withstand all known types of cryptanalytic attack. In theoretical cryptography, the security level of a cryptographic hash function has been defined using the following properties: ; Pre-image resistance: Given a hash value , it should be difficult to find any message such that . This concept is related to that of a
one-way function In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, s ...
. Functions that lack this property are vulnerable to
preimage attack In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs). In the context of attack, the ...
s. ; Second pre-image resistance: Given an input , it should be difficult to find a different input such that . This property is sometimes referred to as ''weak collision resistance''. Functions that lack this property are vulnerable to second-preimage attacks. ;
Collision resistance In cryptography, collision resistance is a property of cryptographic hash functions: a hash function ''H'' is collision-resistant if it is hard to find two inputs that hash to the same output; that is, two inputs ''a'' and ''b'' where ''a'' ≠ '' ...
: It should be difficult to find two different messages and such that . Such a pair is called a cryptographic hash collision. This property is sometimes referred to as ''strong collision resistance''. It requires a hash value at least twice as long as that required for pre-image resistance; otherwise collisions may be found by a birthday attack. Collision resistance implies second pre-image resistance but does not imply pre-image resistance. The weaker assumption is always preferred in theoretical cryptography, but in practice, a hash-function which is only second pre-image resistant is considered insecure and is therefore not recommended for real applications. Informally, these properties mean that a malicious adversary cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical. Second pre-image resistance prevents an attacker from crafting a document with the same hash as a document the attacker cannot control. Collision resistance prevents an attacker from creating two distinct documents with the same hash. A function meeting these criteria may still have undesirable properties. Currently, popular cryptographic hash functions are vulnerable to ''length-extension'' attacks: given and but not , by choosing a suitable an attacker can calculate , where ∥ denotes
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
. This property can be used to break naive authentication schemes based on hash functions. The HMAC construction works around these problems. In practice, collision resistance is insufficient for many practical uses. In addition to collision resistance, it should be impossible for an adversary to find two messages with substantially similar digests; or to infer any useful information about the data, given only its digest. In particular, a hash function should behave as much as possible like a random function (often called a random oracle in proofs of security) while still being deterministic and efficiently computable. This rules out functions like the SWIFFT function, which can be rigorously proven to be collision-resistant assuming that certain problems on ideal lattices are computationally difficult, but, as a linear function, does not satisfy these additional properties. Checksum algorithms, such as CRC32 and other cyclic redundancy checks, are designed to meet much weaker requirements and are generally unsuitable as cryptographic hash functions. For example, a CRC was used for message integrity in the WEP encryption standard, but an attack was readily discovered, which exploited the linearity of the checksum.


Degree of difficulty

In cryptographic practice, "difficult" generally means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important". The meaning of the term is therefore somewhat dependent on the application since the effort that a malicious agent may put into the task is usually proportional to their expected gain. However, since the needed effort usually multiplies with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a dozen bits to the latter. For messages selected from a limited set of messages, for example
password A password, sometimes called a passcode (for example in Apple devices), is secret data, typically a string of characters, usually used to confirm a user's identity. Traditionally, passwords were expected to be memorized, but the large number of ...
s or other short messages, it can be feasible to invert a hash by trying all possible messages in the set. Because cryptographic hash functions are typically designed to be computed quickly, special key derivation functions that require greater computing resources have been developed that make such
brute-force attack In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correc ...
s more difficult. In some theoretical analyses "difficult" has a specific mathematical meaning, such as "not solvable in asymptotic
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
". Such interpretations of ''difficulty'' are important in the study of
provably secure cryptographic hash function In cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on mathematical problems, and whose security thus follows from rigorous mathematical proofs, c ...
s but do not usually have a strong connection to practical security. For example, an exponential-time algorithm can sometimes still be fast enough to make a feasible attack. Conversely, a polynomial-time algorithm (e.g., one that requires steps for -digit keys) may be too slow for any practical use.


Illustration

An illustration of the potential use of a cryptographic hash is as follows: Alice poses a tough math problem to
Bob Bob, BOB, or B.O.B. may refer to: Places * Mount Bob, New York, United States *Bob Island, Palmer Archipelago, Antarctica People, fictional characters, and named animals *Bob (given name), a list of people and fictional characters *Bob (surname ...
and claims that she has solved it. Bob would like to try it himself, but would yet like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, computes its hash, and tells Bob the hash value (whilst keeping the solution secret). Then, when Bob comes up with the solution himself a few days later, Alice can prove that she had the solution earlier by revealing it and having Bob hash it and check that it matches the hash value given to him before. (This is an example of a simple
commitment scheme A commitment scheme is a cryptographic primitive that allows one to commit to a chosen value (or chosen statement) while keeping it hidden to others, with the ability to reveal the committed value later.Oded Goldreich (2001). Foundations of Crypt ...
; in actual practice, Alice and Bob will often be computer programs, and the secret would be something less easily spoofed than a claimed puzzle solution.)


Applications


Verifying the integrity of messages and files

An important application of secure hashes is the verification of
message integrity Information security, sometimes shortened to InfoSec, is the practice of protecting information by mitigating information risks. It is part of information risk management. It typically involves preventing or reducing the probability of unauthorize ...
. Comparing message digests (hash digests over the message) calculated before, and after, transmission can determine whether any changes have been made to the message or
file File or filing may refer to: Mechanical tools and processes * File (tool), a tool used to ''remove'' fine amounts of material from a workpiece **Filing (metalworking), a material removal process in manufacturing ** Nail file, a tool used to gent ...
. MD5,
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20- byte) hash value known as a message digest – typically rendered as 40 hexadec ...
, or
SHA-2 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...
hash digests are sometimes published on websites or forums to allow verification of integrity for downloaded files, including files retrieved using
file sharing File sharing is the practice of distributing or providing access to digital media, such as computer programs, multimedia (audio, images and video), documents or electronic books. Common methods of storage, transmission and dispersion include r ...
such as mirroring. This practice establishes a
chain of trust In computer security, a chain of trust is established by validating each component of hardware and software from the end entity up to the root certificate. It is intended to ensure that only trusted software and hardware can be used while still ...
as long as the hashes are posted on a trusted site – usually the originating site – authenticated by
HTTPS Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is enc ...
. Using a cryptographic hash and a chain of trust detects malicious changes to the file. Non-cryptographic
error-detecting code In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communica ...
s such as cyclic redundancy checks only prevent against ''non-malicious'' alterations of the file, since an intentional
spoof Spoof, spoofs, spoofer, or spoofing may refer to: * Forgery of goods or documents * Semen, in Australian slang * Spoof (game), a guessing game * Spoofing (finance), a disruptive algorithmic-trading tactic designed to manipulate markets __NOTOC__ ...
can readily be crafted to have the colliding code value.


Signature generation and verification

Almost all digital signature schemes require a cryptographic hash to be calculated over the message. This allows the signature calculation to be performed on the relatively small, statically sized hash digest. The message is considered authentic if the signature verification succeeds given the signature and recalculated hash digest over the message. So the message integrity property of the cryptographic hash is used to create secure and efficient digital signature schemes.


Password verification

Password verification commonly relies on cryptographic hashes. Storing all user passwords as
cleartext In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. Overview With the advent of comp ...
can result in a massive security breach if the password file is compromised. One way to reduce this danger is to only store the hash digest of each password. To authenticate a user, the password presented by the user is hashed and compared with the stored hash. A password reset method is required when password hashing is performed; original passwords cannot be recalculated from the stored hash value. Standard cryptographic hash functions are designed to be computed quickly, and, as a result, it is possible to try guessed passwords at high rates. Common
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...
s can try billions of possible passwords each second. Password hash functions that perform
key stretching In cryptography, key stretching techniques are used to make a possibly weak key, typically a password or passphrase, more secure against a brute-force attack by increasing the resources (time and possibly space) it takes to test each possible ke ...
– such as
PBKDF2 In cryptography, PBKDF1 and PBKDF2 (Password-Based Key Derivation Function 1 and 2) are key derivation functions with a sliding computational cost, used to reduce vulnerabilities of brute-force attacks. PBKDF2 is part of RSA Laboratories' Publ ...
,
scrypt In cryptography, scrypt (pronounced "ess crypt") is a password-based key derivation function created by Colin Percival in March 2009, originally for the Tarsnap online backup service. The algorithm was specifically designed to make it costly ...
or
Argon2 Argon2 is a key derivation function that was selected as the winner of the 2015 Password Hashing Competition. It was designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich from the University of Luxembourg. The reference implementation o ...
– commonly use repeated invocations of a cryptographic hash to increase the time (and in some cases computer memory) required to perform
brute-force attack In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correc ...
s on stored password hash digests. A password hash requires the use of a large random, non-secret
salt Salt is a mineral composed primarily of sodium chloride (NaCl), a chemical compound belonging to the larger class of salts; salt in the form of a natural crystalline mineral is known as rock salt or halite. Salt is present in vast quantitie ...
value which can be stored with the password hash. The salt randomizes the output of the password hash, making it impossible for an adversary to store tables of passwords and precomputed hash values to which the password hash digest can be compared.


Proof-of-work

A proof-of-work system (or protocol, or function) is an economic measure to deter
denial-of-service attack In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host conn ...
s and other service abuses such as spam on a network by requiring some work from the service requester, usually meaning processing time by a computer. A key feature of these schemes is their asymmetry: the work must be moderately hard (but feasible) on the requester side but easy to check for the service provider. One popular system – used in
Bitcoin mining The bitcoin network is a peer-to-peer payment network that operates on a cryptographic protocol. Users send and receive bitcoins, the units of currency, by broadcasting digitally-signed messages to the network using bitcoin cryptocurrency ...
and
Hashcash Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin (and other cryptocurrencies) as part of the mining algorithm. Hashcash was proposed in 1997 by Adam ...
– uses partial hash inversions to prove that work was done, to unlock a mining reward in Bitcoin, and as a good-will token to send an e-mail in Hashcash. The sender is required to find a message whose hash value begins with a number of zero bits. The average work that the sender needs to perform in order to find a valid message is exponential in the number of zero bits required in the hash value, while the recipient can verify the validity of the message by executing a single hash function. For instance, in Hashcash, a sender is asked to generate a header whose 160-bit SHA-1 hash value has the first 20 bits as zeros. The sender will, on average, have to try times to find a valid header.


File or data identifier

A message digest can also serve as a means of reliably identifying a file; several
source code management In software engineering, version control (also known as revision control, source control, or source code management) is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections o ...
systems, including
Git Git () is a distributed version control system: tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development. Its goals include speed, data in ...
,
Mercurial Mercurial is a distributed revision control tool for software developers. It is supported on Microsoft Windows and Unix-like systems, such as FreeBSD, macOS, and Linux. Mercurial's major design goals include high performance and scalability, d ...
and
Monotone Monotone refers to a sound, for example music or speech, that has a single unvaried tone. See: monophony. Monotone or monotonicity may also refer to: In economics *Monotone preferences, a property of a consumer's preference ordering. *Monotonic ...
, use the
sha1sum is a computer program that calculates and verifies SHA-1 hashes. It is commonly used to verify the integrity of files. It (or a variant) is installed by default on most Linux distributions. Typically distributed alongside are , , and , whic ...
of various types of content (file content, directory trees, ancestry information, etc.) to uniquely identify them. Hashes are used to identify files on
peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer n ...
filesharing File sharing is the practice of distributing or providing access to digital media, such as computer programs, multimedia (audio, images and video), documents or electronic books. Common methods of storage, transmission and dispersion include re ...
networks. For example, in an
ed2k link In computing, eD2k links (''ed2k://'') are hyperlinks used to denote files stored on computers connected to the eDonkey filesharing P2P network. General Many programs, such as eMule, MLDonkey and the original eDonkey2000 client by '' MetaMachi ...
, an
MD4 The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms. The initialism "MD" s ...
-variant hash is combined with the file size, providing sufficient information for locating file sources, downloading the file, and verifying its contents.
Magnet links Magnet is a URI scheme that defines the format of magnet links, a de facto standard for identifying files ( URN) by their content, via cryptographic hash value rather than by their location. Although magnet links can be used in a number of con ...
are another example. Such file hashes are often the top hash of a
hash list In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (hash tables) and distributed databases (distributed has ...
or a
hash tree Hash, hashes, hash mark, or hashing may refer to: Substances * Hash (food), a coarse mixture of ingredients * Hash, a nickname for hashish, a cannabis product Hash mark *Hash mark (sports), a marking on hockey rinks and gridiron football fiel ...
which allows for additional benefits. One of the main applications of a
hash function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually u ...
is to allow the fast look-up of data in a
hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
. Being hash functions of a particular kind, cryptographic hash functions lend themselves well to this application too. However, compared with standard hash functions, cryptographic hash functions tend to be much more expensive computationally. For this reason, they tend to be used in contexts where it is necessary for users to protect themselves against the possibility of forgery (the creation of data with the same digest as the expected data) by potentially malicious participants.


Hash functions based on block ciphers

There are several methods to use a block cipher to build a cryptographic hash function, specifically a
one-way compression function In cryptography, a one-way compression function is a function that transforms two fixed-length inputs into a fixed-length output. The transformation is "one-way", meaning that it is difficult given a particular output to compute inputs which compre ...
. The methods resemble the
block cipher modes of operation In cryptography, a block cipher mode of operation is an algorithm that uses a block cipher to provide information security such as confidentiality or authenticity. A block cipher by itself is only suitable for the secure cryptographic transform ...
usually used for encryption. Many well-known hash functions, including
MD4 The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms. The initialism "MD" s ...
, MD5,
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20- byte) hash value known as a message digest – typically rendered as 40 hexadec ...
and
SHA-2 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...
, are built from block-cipher-like components designed for the purpose, with feedback to ensure that the resulting function is not invertible.
SHA-3 SHA-3 (Secure Hash Algorithm 3) is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like struc ...
finalists included functions with block-cipher-like components (e.g.,
Skein Skein may refer to: * A flock of geese or ducks in flight * A wound ball of yarn with a centre pull strand; see Hank * A metal piece fitted over the end of a wagon axle, to which the wheel is mounted * Skein (unit), a unit of length used by wea ...
,
BLAKE Blake is a surname which originated from Old English. Its derivation is uncertain; it could come from "blac", a nickname for someone who had dark hair or skin, or from "blaac", a nickname for someone with pale hair or skin. Another theory, presuma ...
) though the function finally selected,
Keccak SHA-3 (Secure Hash Algorithm 3) is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like stru ...
, was built on a cryptographic sponge instead. A standard block cipher such as AES can be used in place of these custom block ciphers; that might be useful when an
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' as ...
needs to implement both encryption and hashing with minimal code size or hardware area. However, that approach can have costs in efficiency and security. The ciphers in hash functions are built for hashing: they use large keys and blocks, can efficiently change keys every block, and have been designed and vetted for resistance to
related-key attack In cryptography, a related-key attack is any form of cryptanalysis where the attacker can observe the operation of a cipher under several different keys whose values are initially unknown, but where some mathematical relationship connecting the ke ...
s. General-purpose ciphers tend to have different design goals. In particular, AES has key and block sizes that make it nontrivial to use to generate long hash values; AES encryption becomes less efficient when the key changes each block; and related-key attacks make it potentially less secure for use in a hash function than for encryption.


Hash function design


Merkle–Damgård construction

A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equally sized blocks, and operating on them in sequence using a
one-way compression function In cryptography, a one-way compression function is a function that transforms two fixed-length inputs into a fixed-length output. The transformation is "one-way", meaning that it is difficult given a particular output to compute inputs which compre ...
. The compression function can either be specially designed for hashing or be built from a block cipher. A hash function built with the Merkle–Damgård construction is as resistant to collisions as is its compression function; any collision for the full hash function can be traced back to a collision in the compression function. The last block processed should also be unambiguously length padded; this is crucial to the security of this construction. This construction is called the
Merkle–Damgård construction In cryptography, the Merkle–Damgård construction or Merkle–Damgård hash function is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. Goldwasser, S. and Bellare, M ...
. Most common classical hash functions, including
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20- byte) hash value known as a message digest – typically rendered as 40 hexadec ...
and MD5, take this form.


Wide pipe versus narrow pipe

A straightforward application of the Merkle–Damgård construction, where the size of hash output is equal to the internal state size (between each compression step), results in a narrow-pipe hash design. This design causes many inherent flaws, including length-extension, multicollisions, long message attacks, generate-and-paste attacks, and also cannot be parallelized. As a result, modern hash functions are built on wide-pipe constructions that have a larger internal state size – which range from tweaks of the Merkle–Damgård construction to new constructions such as the
sponge construction In cryptography, a sponge function or sponge construction is any of a class of algorithms with finite state (computer science), internal state that take an input bit stream of any length and produce an output bit stream of any desired length. Spon ...
and
HAIFA construction The HAIFA construction (''hash iterative framework'') is a cryptographic structure used in the design of Cryptographic hash function, hash functions. It is one of the modern alternatives to the Merkle–Damgård construction, avoiding its weakness ...
. None of the entrants in the
NIST hash function competition The NIST hash function competition was an open competition held by the US National Institute of Standards and Technology (NIST) to develop a new hash function called SHA-3 to complement the older SHA-1 and SHA-2. The competition was formally an ...
use a classical Merkle–Damgård construction. Meanwhile, truncating the output of a longer hash, such as used in SHA-512/256, also defeats many of these attacks.


Use in building other cryptographic primitives

Hash functions can be used to build other
cryptographic primitives Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
. For these other primitives to be cryptographically secure, care must be taken to build them correctly.
Message authentication code In cryptography, a message authentication code (MAC), sometimes known as a ''tag'', is a short piece of information used for authenticating a message. In other words, to confirm that the message came from the stated sender (its authenticity) and ...
s (MACs) (also called keyed hash functions) are often built from hash functions. HMAC is such a MAC. Just as block ciphers can be used to build hash functions, hash functions can be used to build block ciphers. Luby-Rackoff constructions using hash functions can be provably secure if the underlying hash function is secure. Also, many hash functions (including
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20- byte) hash value known as a message digest – typically rendered as 40 hexadec ...
and
SHA-2 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...
) are built by using a special-purpose block cipher in a Davies–Meyer or other construction. That cipher can also be used in a conventional mode of operation, without the same security guarantees; for example,
SHACAL SHACAL-1 (originally simply SHACAL) is a 160-bit block cipher based on SHA-1, and supports keys from 128-bit to 512-bit. SHACAL-2 is a 256-bit block cipher based upon the larger hash function SHA-256. Both SHACAL-1 and SHACAL-2 were selected fo ...
,
BEAR Bears are carnivoran mammals of the family Ursidae. They are classified as caniforms, or doglike carnivorans. Although only eight species of bears are extant, they are widespread, appearing in a wide variety of habitats throughout the Nor ...
and
LION The lion (''Panthera leo'') is a large Felidae, cat of the genus ''Panthera'' native to Africa and India. It has a muscular, broad-chested body; short, rounded head; round ears; and a hairy tuft at the end of its tail. It is sexually dimorphi ...
.
Pseudorandom number generator A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generate ...
s (PRNGs) can be built using hash functions. This is done by combining a (secret) random seed with a counter and hashing it. Some hash functions, such as
Skein Skein may refer to: * A flock of geese or ducks in flight * A wound ball of yarn with a centre pull strand; see Hank * A metal piece fitted over the end of a wagon axle, to which the wheel is mounted * Skein (unit), a unit of length used by wea ...
,
Keccak SHA-3 (Secure Hash Algorithm 3) is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like stru ...
, and
RadioGatún RadioGatún is a cryptographic hash primitive created by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. It was first publicly presented at the NIST Second Cryptographic Hash Workshop, held in Santa Barbara, California, on ...
, output an arbitrarily long stream and can be used as a
stream cipher stream cipher is a symmetric key cipher where plaintext digits are combined with a pseudorandom cipher digit stream (keystream). In a stream cipher, each plaintext digit is encrypted one at a time with the corresponding digit of the keystream ...
, and stream ciphers can also be built from fixed-length digest hash functions. Often this is done by first building a
cryptographically secure pseudorandom number generator A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography. It is also loosely kno ...
and then using its stream of random bytes as
keystream In cryptography, a keystream is a stream of random or pseudorandom characters that are combined with a plaintext message to produce an encrypted message (the ciphertext). The "characters" in the keystream can be bits, bytes, numbers or actual chara ...
.
SEAL Seal may refer to any of the following: Common uses * Pinniped, a diverse group of semi-aquatic marine mammals, many of which are commonly called seals, particularly: ** Earless seal, or "true seal" ** Fur seal * Seal (emblem), a device to impr ...
is a stream cipher that uses
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20- byte) hash value known as a message digest – typically rendered as 40 hexadec ...
to generate internal tables, which are then used in a keystream generator more or less unrelated to the hash algorithm. SEAL is not guaranteed to be as strong (or weak) as SHA-1. Similarly, the key expansion of the HC-128 and HC-256 stream ciphers makes heavy use of the
SHA-256 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...
hash function.


Concatenation

Concatenating In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenati ...
outputs from multiple hash functions provide collision resistance as good as the strongest of the algorithms included in the concatenated result. For example, older versions of Transport Layer Security (TLS) and Secure Sockets Layer (SSL) used concatenated MD5 and
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20- byte) hash value known as a message digest – typically rendered as 40 hexadec ...
sums. This ensures that a method to find collisions in one of the hash functions does not defeat data protected by both hash functions. For
Merkle–Damgård construction In cryptography, the Merkle–Damgård construction or Merkle–Damgård hash function is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. Goldwasser, S. and Bellare, M ...
hash functions, the concatenated function is as collision-resistant as its strongest component, but not more collision-resistant.
Antoine Joux Antoine Joux (born 1967) is a French cryptographer,"Antoine Joux, Prix Gödel 2013"
Bullet ...
observed that 2-collisions lead to -collisions: if it is feasible for an attacker to find two messages with the same MD5 hash, then they can find as many additional messages with that same MD5 hash as they desire, with no greater difficulty. Among those messages with the same MD5 hash, there is likely to be a collision in SHA-1. The additional work needed to find the SHA-1 collision (beyond the exponential birthday search) requires only
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
.


Cryptographic hash algorithms

There are many cryptographic hash algorithms; this section lists a few algorithms that are referenced relatively often. A more extensive list can be found on the page containing a comparison of cryptographic hash functions.


MD5

MD5 was designed by
Ronald Rivest Ronald Linn Rivest (; born May 6, 1947) is a cryptographer and an Institute Professor at MIT. He is a member of MIT's Department of Electrical Engineering and Computer Science (EECS) and a member of MIT's Computer Science and Artificial Int ...
in 1991 to replace an earlier hash function, MD4, and was specified in 1992 as RFC 1321. Collisions against MD5 can be calculated within seconds which makes the algorithm unsuitable for most use cases where a cryptographic hash is required. MD5 produces a digest of 128 bits (16 bytes).


SHA-1

SHA-1 was developed as part of the U.S. Government's
Capstone CAPSTONE (Cislunar Autonomous Positioning System Technology Operations and Navigation Experiment) is a lunar orbiter that will test and verify the calculated orbital stability planned for the Lunar Gateway space station. The spacecraft is a 12- ...
project. The original specification – now commonly called SHA-0 – of the algorithm was published in 1993 under the title Secure Hash Standard, FIPS PUB 180, by U.S. government standards agency NIST (National Institute of Standards and Technology). It was withdrawn by the NSA shortly after publication and was superseded by the revised version, published in 1995 in FIPS  PUB 180-1 and commonly designated SHA-1. Collisions against the full SHA-1 algorithm can be produced using the shattered attack and the hash function should be considered broken. SHA-1 produces a hash digest of 160 bits (20 bytes). Documents may refer to SHA-1 as just "SHA", even though this may conflict with the other Secure Hash Algorithms such as SHA-0, SHA-2, and SHA-3.


RIPEMD-160

RIPEMD (RACE Integrity Primitives Evaluation Message Digest) is a family of cryptographic hash functions developed in Leuven, Belgium, by Hans Dobbertin, Antoon Bosselaers, and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven, and first published in 1996. RIPEMD was based upon the design principles used in MD4 and is similar in performance to the more popular SHA-1. RIPEMD-160 has, however, not been broken. As the name implies, RIPEMD-160 produces a hash digest of 160 bits (20 bytes).


Whirlpool

Whirlpool is a cryptographic hash function designed by Vincent Rijmen and Paulo S. L. M. Barreto, who first described it in 2000. Whirlpool is based on a substantially modified version of the Advanced Encryption Standard (AES). Whirlpool produces a hash digest of 512 bits (64 bytes).


SHA-2

SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA), first published in 2001. They are built using the Merkle–Damgård structure, from a one-way compression function itself built using the Davies–Meyer structure from a (classified) specialized block cipher. SHA-2 basically consists of two hash algorithms: SHA-256 and SHA-512. SHA-224 is a variant of SHA-256 with different starting values and truncated output. SHA-384 and the lesser-known SHA-512/224 and SHA-512/256 are all variants of SHA-512. SHA-512 is more secure than SHA-256 and is commonly faster than SHA-256 on 64-bit machines such as
AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
. The output size in bits is given by the extension to the "SHA" name, so SHA-224 has an output size of 224 bits (28 bytes); SHA-256, 32 bytes; SHA-384, 48 bytes; and SHA-512, 64 bytes.


SHA-3

SHA-3 (Secure Hash Algorithm 3) was released by NIST on August 5, 2015. SHA-3 is a subset of the broader cryptographic primitive family Keccak. The Keccak algorithm is the work of Guido Bertoni, Joan Daemen, Michael Peeters, and Gilles Van Assche. Keccak is based on a sponge construction which can also be used to build other cryptographic primitives such as a stream cipher. SHA-3 provides the same output sizes as SHA-2: 224, 256, 384, and 512 bits. Configurable output sizes can also be obtained using the SHAKE-128 and SHAKE-256 functions. Here the -128 and -256 extensions to the name imply the
security strength In cryptography, security level is a measure of the strength that a cryptographic primitive — such as a cipher or hash function — achieves. Security level is usually expressed as a number of " bits of security" (also security strengt ...
of the function rather than the output size in bits.


BLAKE2

BLAKE2, an improved version of BLAKE, was announced on December 21, 2012. It was created by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein with the goal of replacing the widely used but broken MD5 and SHA-1 algorithms. When run on 64-bit x64 and ARM architectures, BLAKE2b is faster than SHA-3, SHA-2, SHA-1, and MD5. Although BLAKE and BLAKE2 have not been standardized as SHA-3 has, BLAKE2 has been used in many protocols including the
Argon2 Argon2 is a key derivation function that was selected as the winner of the 2015 Password Hashing Competition. It was designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich from the University of Luxembourg. The reference implementation o ...
password hash, for the high efficiency that it offers on modern CPUs. As BLAKE was a candidate for SHA-3, BLAKE and BLAKE2 both offer the same output sizes as SHA-3 – including a configurable output size.


BLAKE3

BLAKE3, an improved version of BLAKE2, was announced on January 9, 2020. It was created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn. BLAKE3 is a single algorithm, in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. The BLAKE3 compression function is closely based on that of BLAKE2s, with the biggest difference being that the number of rounds is reduced from 10 to 7. Internally, BLAKE3 is a
Merkle tree In cryptography and computer science, a hash tree or Merkle tree is a tree in which every "leaf" (node) is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a ''branch'', ''inner node'', or ''inode'') ...
, and it supports higher degrees of parallelism than BLAKE2.


Attacks on cryptographic hash algorithms

There is a long list of cryptographic hash functions but many have been found to be vulnerable and should not be used. For instance, NIST selected 51 hash functionsAndrew Regenscheid, Ray Perlner, Shu-Jen Chang, John Kelsey, Mridul Nandi, Souradyuti Paul
Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition
/ref> as candidates for round 1 of the SHA-3 hash competition, of which 10 were considered broken and 16 showed significant weaknesses and therefore did not make it to the next round; more information can be found on the main article about the
NIST hash function competition The NIST hash function competition was an open competition held by the US National Institute of Standards and Technology (NIST) to develop a new hash function called SHA-3 to complement the older SHA-1 and SHA-2. The competition was formally an ...
s. Even if a hash function has never been broken, a successful attack against a weakened variant may undermine the experts' confidence. For instance, in August 2004 collisions were found in several then-popular hash functions, including MD5.XiaoyunWang, Dengguo Feng, Xuejia Lai, Hongbo Yu
Collisions for Hash Functions MD4, MD5, HAVAL-128, and RIPEMD
/ref> These weaknesses called into question the security of stronger algorithms derived from the weak hash functions – in particular, SHA-1 (a strengthened version of SHA-0), RIPEMD-128, and RIPEMD-160 (both strengthened versions of RIPEMD). On August 12, 2004, Joux, Carribault, Lemuel, and Jalby announced a collision for the full SHA-0 algorithm. Joux et al. accomplished this using a generalization of the Chabaud and Joux attack. They found that the collision had complexity and took about 80,000 CPU hours on a
supercomputer A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second ( FLOPS) instead of million instructions ...
with 256 Itanium 2 processors – equivalent to 13 days of full-time use of the supercomputer. In February 2005, an attack on SHA-1 was reported that would find collision in about 269 hashing operations, rather than the 280 expected for a 160-bit hash function. In August 2005, another attack on SHA-1 was reported that would find collisions in 263 operations. Other theoretical weaknesses of SHA-1 have been known:Xiaoyun Wang,
Yiqun Lisa Yin Yiqun Lisa Yin is a Chinese-American cryptographer and independent security consultant. Yin is known for breaking the SHA-1 cryptographic hash function, for developing the RC6 block cipher, and for her service as editor of the IEEE P1363 project fo ...
, and Hongbo Yu
Finding Collisions in the Full SHA-1
/ref>Bruce Schneier

(summarizes Wang et al. results and their implications)
and in February 2017 Google announced a collision in SHA-1. Security researchers recommend that new applications can avoid these problems by using later members of the SHA family, such as
SHA-2 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...
, or using techniques such as randomized hashingShai Halevi and Hugo Krawczyk
Randomized Hashing and Digital Signatures
/ref> that do not require collision resistance. A successful, practical attack broke MD5 used within certificates for
Transport Layer Security Transport Layer Security (TLS) is a cryptographic protocol designed to provide communications security over a computer network. The protocol is widely used in applications such as email, instant messaging, and voice over IP, but its use in securi ...
in 2008.Alexander Sotirov, Marc Stevens, Jacob Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, Benne de Weger
MD5 considered harmful today: Creating a rogue CA certificate
accessed March 29, 2009.
Many cryptographic hashes are based on the
Merkle–Damgård construction In cryptography, the Merkle–Damgård construction or Merkle–Damgård hash function is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. Goldwasser, S. and Bellare, M ...
. All cryptographic hashes that directly use the full output of a Merkle–Damgård construction are vulnerable to
length extension attack In cryptography and computer security, a length extension attack is a type of attack where an attacker can use Hash(''message1'') and the length of ''message1'' to calculate Hash(''message1'' ‖ ''message2'') for an attacker-controlled ''message2 ...
s. This makes the MD5, SHA-1, RIPEMD-160, Whirlpool, and the SHA-256 / SHA-512 hash algorithms all vulnerable to this specific attack. SHA-3, BLAKE2, BLAKE3, and the truncated SHA-2 variants are not vulnerable to this type of attack.


Attacks on hashed passwords

A common use of hashes is to store password authentication data. Rather than store the plaintext of user passwords, a controlled access system stores the hash of each user's password in a file or database. When someone requests access, the password they submit is hashed and compared with the stored value. If the database is stolen (an all too frequent occurrence), the thief will only have the hash values, not the passwords. However, most people choose passwords in predictable ways. Lists of common passwords are widely circulated and many passwords are short enough that all possible combinations can be tested if fast hashes are used. The use of
cryptographic salt In cryptography, a salt is random data that is used as an additional input to a one-way function that hashes data, a password or passphrase. Salts are used to safeguard passwords in storage. Historically, only the output from an invocation of a ...
prevents some attacks, such as building files of precomputing hash values, e.g. rainbow tables. But searches on the order of 100 billion tests per second are possible with high-end
graphics processor A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s, making direct attacks possible even with salt. The United States
National Institute of Standards and Technology The National Institute of Standards and Technology (NIST) is an agency of the United States Department of Commerce whose mission is to promote American innovation and industrial competitiveness. NIST's activities are organized into physical sci ...
recommends storing passwords using special hashes called key derivation functions (KDFs) that have been created to slow brute force searches. Slow hashes include
pbkdf2 In cryptography, PBKDF1 and PBKDF2 (Password-Based Key Derivation Function 1 and 2) are key derivation functions with a sliding computational cost, used to reduce vulnerabilities of brute-force attacks. PBKDF2 is part of RSA Laboratories' Publ ...
,
bcrypt bcrypt is a password-hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher and presented at USENIX in 1999. Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive fu ...
,
scrypt In cryptography, scrypt (pronounced "ess crypt") is a password-based key derivation function created by Colin Percival in March 2009, originally for the Tarsnap online backup service. The algorithm was specifically designed to make it costly ...
,
argon2 Argon2 is a key derivation function that was selected as the winner of the 2015 Password Hashing Competition. It was designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich from the University of Luxembourg. The reference implementation ...
,
Balloon A balloon is a flexible bag that can be inflated with a gas, such as helium, hydrogen, nitrous oxide, oxygen, and air. For special tasks, balloons can be filled with smoke, liquid water, granular media (e.g. sand, flour or rice), or light so ...
and some recent modes of Unix crypt. For KSFs that perform multiple hashes to slow execution, NIST recommends an iteration count of 10,000 or more.


See also

*
Avalanche effect In cryptography, the avalanche effect is the desirable property of cryptographic algorithms, typically block ciphers and cryptographic hash functions, wherein if an input is changed slightly (for example, flipping a single bit), the output changes ...
* Comparison of cryptographic hash functions * Cryptographic agility *
CRYPTREC CRYPTREC is the Cryptography Research and Evaluation Committees set up by the Japanese Government to evaluate and recommend cryptographic techniques for government and industrial use. It is comparable in many respects to the European Union's NESSIE ...
*
File fixity File fixity is a digital preservation term referring to the property of a digital file being fixed, or unchanged. Fixity checking is the process of verifying that a digital object has not been altered or corrupted. During transfer, a repository may ...
* HMAC *
Hash chain A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method to produce many one-time keys from a single key or password. For non-repudiation a hash function can ...
*
Length extension attack In cryptography and computer security, a length extension attack is a type of attack where an attacker can use Hash(''message1'') and the length of ''message1'' to calculate Hash(''message1'' ‖ ''message2'') for an attacker-controlled ''message2 ...
* MD5CRK *
Message authentication code In cryptography, a message authentication code (MAC), sometimes known as a ''tag'', is a short piece of information used for authenticating a message. In other words, to confirm that the message came from the stated sender (its authenticity) and ...
*
NESSIE NESSIE (New European Schemes for Signatures, Integrity and Encryption) was a European research project funded from 2000 to 2003 to identify secure cryptographic primitives. The project was comparable to the NIST AES process and the Japanese Gov ...
*
PGP word list The PGP Word List ("Pretty Good Privacy word list", also called a biometric word list for reasons explained below) is a list of words for conveying data bytes in a clear unambiguous way via a voice channel. They are analogous in purpose to the NATO ...
*
Random oracle In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every ''unique query'' with a (truly) random response chosen uniformly from its output domain. If a query is repeated, it responds the same way every time th ...
* Security of cryptographic hash functions *
SHA-3 SHA-3 (Secure Hash Algorithm 3) is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like struct ...
*
Universal one-way hash function In cryptography a universal one-way hash function (UOWHF, often pronounced "woof"), is a type of universal hash function of particular importance to cryptography. UOWHF's are proposed as an alternative to collision-resistant hash functions (CRHFs ...


References


Citations


Sources

* * * * * * * * * *


External links

* (companion web site contains online cryptography course that covers hash functions) * *
Open source python based application with GUI used to verify downloads.
{{DEFAULTSORT:Cryptographic Hash Function Cryptography Cryptographic primitives Hashing