cryptography Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or ''-logy, -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of Adversary (cryptography), ...

cryptographic hash functions A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with a fixed size of n bits) that has special properties desirable for a cryptographic application: * the probability of a particula ...

can be divided into two main categories. In the first category are those functions whose designs are based on mathematical problems, and whose security thus follows from rigorous mathematical proofs, complexity theory and formal reduction. These functions are called ''provably secure cryptographic hash functions''. To construct these is very difficult, and few examples have been introduced. Their practical use is limited. In the second category are functions which are not based on mathematical problems, but on an ad-hoc constructions, in which the bits of the message are mixed to produce the hash. These are then believed to be hard to break, but no formal proof is given. Almost all hash functions in widespread use reside in this category. Some of these functions are already broken, and are no longer in use. ''See''

Hash function security summary This article summarizes publicly known attacks against cryptographic hash functions. Note that not all entries may be up to date. For a summary of other hash function parameters, see comparison of cryptographic hash functions. Table color key ...

Types of security of hash functions

Generally, the ''basic'' security of

can be seen from different angles: pre-image resistance, second pre-image resistance, collision resistance, and pseudo-randomness. * Pre-image resistance: given a hash , it should be ''hard'' to find any message such that . This concept is related to that of the

one-way function In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, s ...

. Functions that lack this property are vulnerable to

pre-image attack In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs). In the context of attack, the ...

s. * Second pre-image resistance: given an input , it should be ''hard'' to find another input such that . This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to second pre-image attacks. * Collision resistance: it should be ''hard'' to find two different messages and such that . Such a pair is called a (cryptographic) hash collision. This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as what is required for pre-image resistance; otherwise, collisions may be found by a

birthday attack A birthday attack is a bruteforce collision attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likeli ...

. * Pseudo-randomness: it should be ''hard'' to distinguish a pseudo-random number generator based on the hash function from true random number generator; for example, it passes usual randomness tests.

The meaning of ''hard''

The basic question is the meaning of ''hard''. There are two approaches to answer this question. First is the intuitive/practical approach: "''hard'' means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The second approach is theoretical and is based on the

computational complexity theory In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and explores the relationships between these classifications. A computational problem ...

: if problem ''A'' is hard, then there exists a formal security reduction from a problem which is widely considered unsolvable in

polynomial time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...

, such as

integer factorization In mathematics, integer factorization is the decomposition of a positive integer into a product of integers. Every positive integer greater than 1 is either the product of two or more integer factors greater than 1, in which case it is a comp ...

or the

discrete logarithm In mathematics, for given real numbers a and b, the logarithm \log_b(a) is a number x such that b^x=a. Analogously, in any group G, powers b^k can be defined for all integers k, and the discrete logarithm \log_b(a) is an integer k such that b^k=a ...

problem. However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. The difficulty of a problem also depends on its size. For example, RSA public-key cryptography (which relies on the difficulty of

) is considered secure only with keys that are at least 2048 bits long, whereas keys for the ElGamal cryptosystem (which relies on the difficulty of the

problem) are commonly in the range of 256–512 bits.

Password case

If the set of inputs to the hash is relatively small or is ordered by likelihood in some way, then a brute force search may be practical, regardless of theoretical security. The likelihood of recovering the preimage depends on the input set size and the speed or cost of computing the hash function. A common example is the use of hashes to store

password A password, sometimes called a passcode, is secret data, typically a string of characters, usually used to confirm a user's identity. Traditionally, passwords were expected to be memorized, but the large number of password-protected services t ...

validation data. Rather than store the plaintext of user passwords, an access control system typically stores a hash of the password. When a person requests access, the password they submit is hashed and compared with the stored value. If the stored validation data is stolen, then the thief will only have the hash values, not the passwords. However, most users choose passwords in predictable ways, and passwords are often short enough so that all possible combinations can be tested if fast hashes are used. Special hashes called

key derivation function In cryptography, a key derivation function (KDF) is a cryptographic algorithm that derives one or more secret keys from a secret value such as a master key, a password, or a passphrase using a pseudorandom function (which typically uses a cr ...

s have been created to slow searches. ''See''

Password cracking In cryptanalysis and computer security, password cracking is the process of guessing passwords protecting a computer system. A common approach (brute-force attack) is to repeatedly try guesses for the password and to check them against an availab ...

Cryptographic hash functions

Most hash functions are built on an ad-hoc basis, where the bits of the message are nicely mixed to produce the hash. Various

bitwise operation In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operatio ...

s (e.g. rotations), modular additions, and compression functions are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions,

SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits. It was designed by the United States ...

, was shown to be less secure than its length suggested: collisions could be found in only 2⁵¹ tests, rather than the brute-force number of 2⁸⁰. In other words, most of the hash functions in use nowadays are not provably collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing functions, but with the risk that a weakness of such a function will be eventually used to find collisions. One famous case is

MD5 The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as Request for Comments, RFC 1321. MD5 ...

Provably secure hash functions

In this approach, the security of a hash function is based on some hard mathematical problem, and it is proved that finding collisions of the hash function is as hard as breaking the underlying problem. This gives a somewhat stronger notion of security than just relying on complex mixing of bits as in the classical approach. A cryptographic hash function has ''provable security against collision attacks'' if finding collisions is provably polynomial-time reducible from a problem ''P'' which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable. It means that if finding collisions would be feasible in polynomial time by algorithm ''A'', then one could find and use polynomial time algorithm ''R'' (reduction algorithm) that would use algorithm ''A'' to solve problem ''P'', which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means that finding collisions cannot be easier than solving ''P''. However, this only indicates that finding collisions is difficult in ''some'' cases, as not all instances of a computationally hard problem are typically hard. Indeed, very large instances of NP-hard problems are routinely solved, while only the hardest are practically impossible to solve.

Hard problems

Examples of problems that are assumed to be not solvable in polynomial time include *

Discrete logarithm In mathematics, for given real numbers a and b, the logarithm \log_b(a) is a number x such that b^x=a. Analogously, in any group G, powers b^k can be defined for all integers k, and the discrete logarithm \log_b(a) is an integer k such that b^k=a ...

* Modular square roots *

Integer factorization In mathematics, integer factorization is the decomposition of a positive integer into a product of integers. Every positive integer greater than 1 is either the product of two or more integer factors greater than 1, in which case it is a comp ...

* Subset sum

Downsides of the provable approach

* Current collision-resistant hash algorithms that have provable security

reductions Reductions (, also called ; ) were settlements established by Spanish rulers and Roman Catholic missionaries in Spanish America and the Spanish East Indies (the Philippines). In Portuguese-speaking Latin America, such reductions were also ...

are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes.

Very smooth hash In cryptography, Very Smooth Hash (VSH) is a secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra, and Ron Steinfeld. Provably secure means that finding collisions is as difficult as some known hard mathematical pr ...

is an example. * Constructing a hash function with provable security is much more difficult than using a classical approach where one just hopes that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions. * The proof is often a reduction to a problem with asymptotically hard worst-case or

average-case complexity In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case comp ...

. Worst-case complexity measures the difficulty of solving pathological cases rather than typical cases of the underlying problem. Even a reduction to a problem with hard average-case complexity offers only limited security, as there still can be an algorithm that easily solves the problem for a subset of the problem space. For example, early versions of Fast Syndrome Based Hash turned out to be insecure. This problem was solved in the latest version. SWIFFT is an example of a hash function that circumvents these security problems. It can be shown that, for any algorithm that can break SWIFFT with probability within an estimated time , one can find an algorithm that solves the ''worst-case'' scenario of a certain difficult mathematical problem within time depending on and .

Example of (impractical) provably secure hash function

Let , where is a hard-to-factor composite number, and is some prespecified base value. A collision reveals a multiple of the

multiplicative order In number theory, given a positive integer ''n'' and an integer ''a'' coprime to ''n'', the multiplicative order of ''a'' modulo ''n'' is the smallest positive integer ''k'' such that a^k\ \equiv\ 1 \pmod n. In other words, the multiplicative orde ...

of modulo . This information can be used to factor in polynomial time, assuming certain properties of . But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo per message-bit.

More practical provably secure hash functions

* VSH—Very Smooth Hash—a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number (this is proven to be as hard as factoring ). * MuHASH * ECOH— Elliptic Curve Only hash function—based on the concept of

elliptic curves In mathematics, an elliptic curve is a Smoothness, smooth, Projective variety, projective, algebraic curve of Genus of an algebraic curve, genus one, on which there is a specified point . An elliptic curve is defined over a field (mathematics), ...

, the

subset sum problem The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sum to precisely T''.'' ...

, and summation of polynomials. The security proof of the collision resistance was based on weakened assumptionsm, and eventually a second pre-image attack was found. * FSB— Fast Syndrome-Based hash function—it can be proven that breaking FSB is at least as difficult as solving ''regular syndrome decoding'', which is known to be NP-complete. * SWIFFT—SWIFFT is based on the

fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). A Fourier transform converts a signal from its original domain (often time or space) to a representation in ...

and is provably collision resistant, under a relatively mild assumption about the worst-case difficulty of finding short vectors in cyclic/ ideal lattices. * Chaum, van Heijst, Pfitzmann hash function—a compression function where finding collisions is as hard as solving the

discrete logarithm problem In mathematics, for given real numbers a and b, the logarithm \log_b(a) is a number x such that b^x=a. Analogously, in any group G, powers b^k can be defined for all integers k, and the discrete logarithm \log_b(a) is an integer k such that b^k=a ...

in a finite group . * Knapsack-based hash functions—a family of hash functions based on the

knapsack problem The knapsack problem is the following problem in combinatorial optimization: :''Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given lim ...

. * The Zémor-Tillich hash function—a family of hash functions that relies on the arithmetic of the group of matrices SL₂. Finding collisions is at least as difficult as finding factorization of certain elements in this group. This is supposed to be hard, at least

PSPACE-complete In computational complexity theory, a decision problem is PSPACE-complete if it can be solved using an amount of memory that is polynomial in the input length (PSPACE, polynomial space) and if every other problem that can be solved in polynomial sp ...

. For this hash, an attack was eventually discovered with a time complexity close to . This beat by far the birthday bound and ideal pre-image complexities, which are and for the Zémor-Tillich hash function. As the attacks include a birthday search in a reduced set of size , they indeed do not destroy the idea of provable security or invalidate the scheme but rather suggest that the initial parameters were too small. * Hash functions from Sigma Protocols—there exists a general way of constructing a provably secure hash, specifically from any (suitable) sigma protocol. A faster version of VSH (called VSH*) could be obtained in this way.

References

{{Cryptography navbox , hash Cryptographic hash functions