Provably Secure Cryptographic Hash Function
   HOME

TheInfoList



OR:

In
cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or '' -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adv ...
,
cryptographic hash functions A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output r ...
can be divided into two main categories. In the first category are those functions whose designs are based on mathematical problems, and whose security thus follows from rigorous mathematical proofs, complexity theory and formal reduction. These functions are called Provably Secure Cryptographic Hash Functions. To construct these is very difficult, and few examples have been introduced. Their practical use is limited. In the second category are functions which are not based on mathematical problems, but on an ad-hoc constructions, in which the bits of the message are mixed to produce the hash. These are then believed to be hard to break, but no formal proof is given. Almost all hash functions in widespread use reside in this category. Some of these functions are already broken, and are no longer in use. ''See''
Hash function security summary This article summarizes publicly known attacks against cryptographic hash functions. Note that not all entries may be up to date. For a summary of other hash function parameters, see comparison of cryptographic hash functions. Table color key ...
.


Types of security of hash functions

Generally, the ''basic'' security of
cryptographic hash functions A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output r ...
can be seen from different angles: pre-image resistance, second pre-image resistance, collision resistance, and pseudo-randomness. * Pre-image resistance: given a hash h it should be ''hard'' to find any message m such that h = hash(m). This concept is related to that of the
one-way function In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, s ...
. Functions that lack this property are vulnerable to
pre-image attack In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs). In the context of attack, the ...
s. * Second pre-image resistance: given an input m_1, it should be ''hard'' to find another input, m_2 (not equal to m_1) such that hash(m_1) = hash(m_2). This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to
second pre-image attack The second (symbol: s) is the unit of time in the International System of Units (SI), historically defined as of a day – this factor derived from the division of the day first into 24 hours, then to 60 minutes and finally to 60 seconds each ...
s. * Collision resistance: it should be ''hard'' to find two different messages m_1 and m_2 such that hash(m_1) = hash(m_2). Such a pair is called a (cryptographic) hash collision. This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as what is required for pre-image resistance, otherwise collisions may be found by a
birthday attack A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likel ...
. * Pseudo-randomness: it should be ''hard'' to distinguish the pseudo-random number generator based on the hash function from a random number generator, e.g., it passes usual randomness tests.


The meaning of "hard"

The basic question is the meaning of "hard". There are two approaches to answer this question. First is the intuitive/practical approach: "hard means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The second approach is theoretical and is based on the
computational complexity theory In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved ...
. If problem A is hard, there exists a formal security reduction from a problem which is widely considered unsolvable in
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
, such as
integer factorization In number theory, integer factorization is the decomposition of a composite number into a product of smaller integers. If these factors are further restricted to prime numbers, the process is called prime factorization. When the numbers are s ...
problem or
discrete logarithm In mathematics, for given real numbers ''a'' and ''b'', the logarithm log''b'' ''a'' is a number ''x'' such that . Analogously, in any group ''G'', powers ''b'k'' can be defined for all integers ''k'', and the discrete logarithm log''b ...
problem. However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. The difficulty of a problem also depends on its size. For example,
RSA public key cryptography RSA (Rivest–Shamir–Adleman) is a public-key cryptosystem that is widely used for secure data transmission. It is also one of the oldest. The acronym "RSA" comes from the surnames of Ron Rivest, Adi Shamir and Leonard Adleman, who publicly ...
relies on the difficulty of
integer factorization In number theory, integer factorization is the decomposition of a composite number into a product of smaller integers. If these factors are further restricted to prime numbers, the process is called prime factorization. When the numbers are s ...
. However, it is considered secure only with keys that are at least 2048 bits large.


Password case

Also, if the set of inputs to the hash is relatively small or is ordered by likelihood in some way, then a brute force search may be practical, regardless of theoretical security. Likelihood of recovering the preimage depends on the input set size and the speed or cost of computing the hash function. A common example is the use of hashes to store
password A password, sometimes called a passcode (for example in Apple devices), is secret data, typically a string of characters, usually used to confirm a user's identity. Traditionally, passwords were expected to be memorized, but the large number of ...
validation data. Rather than store the plaintext of user passwords, an access control system typically stores a hash of the password. When a person requests access, the password they submit is hashed and compared with the stored value. If the stored validation data is stolen, the thief will only have the hash values, not the passwords. However most users choose passwords in predictable ways and often passwords are short enough so that all possible combinations can be tested if fast hashes are used. Special hashes called
key derivation function In cryptography, a key derivation function (KDF) is a cryptographic algorithm that derives one or more secret keys from a secret value such as a master key, a password, or a passphrase using a pseudorandom function (which typically uses a cry ...
s have been created to slow searches. ''See''
Password cracking In cryptanalysis and computer security, password cracking is the process of recovering passwords from data that has been stored in or transmitted by a computer system in scrambled form. A common approach ( brute-force attack) is to repeatedly t ...
.


Cryptographic hash functions

Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various
bitwise operation In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic oper ...
s (e.g. rotations), modular additions and compression functions are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions,
SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160- bit (20- byte) hash value known as a message digest – typically rendered as 40 hexa ...
, was shown to be less secure than its length suggested: collisions could be found in only 251 tests, rather than the brute-force number of 280. In other words, most of the hash functions in use nowadays are not provably collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing functions, but with the risk that a weakness of such a function will be eventually used to find collisions. The famous case is MD5. Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain.


Provably secure hash functions

In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions of the hash functions is as hard as breaking the underlying problem. This gives a somewhat stronger notion of security than just relying on complex mixing of bits as in the classical approach. A cryptographic hash function has provable security against collision attacks if finding collisions is provably polynomial-time reducible from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable. It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means that finding collisions cannot be easier than solving P. However, this only indicates that finding collisions is difficult in 'some' cases, as not all instances of a computationally hard problem are typically hard. Indeed, very large instances of NP-hard problems are routinely solved, while only the hardest are practically impossible to solve. Hash functions with the proof of their security are based on mathematical functions.


Hard problems

Examples of problems, that are assumed to be not solvable in polynomial time *
Discrete Logarithm Problem In mathematics, for given real numbers ''a'' and ''b'', the logarithm log''b'' ''a'' is a number ''x'' such that . Analogously, in any group ''G'', powers ''b'k'' can be defined for all integers ''k'', and the discrete logarithm log''b' ...
* Finding Modular Square Roots * Integer Factorization Problem *
Subset Sum Problem The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sum to precisely T''.'' T ...


Downsides of provable approach

* Current collision-resistant hash algorithms that have provable security
reductions Reductions ( es, reducciones, also called ; , pl. ) were settlements created by Spanish rulers and Roman Catholic missionaries in Spanish America and the Spanish East Indies (the Philippines). In Portuguese-speaking Latin America, such r ...
are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes. Very smooth hash is an example. * Constructing a hash function with provable security is much more difficult than using a classical approach where we just hope that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions. * The proof is often a reduction to a problem with asymptotically hard
worst-case In computer science, best, worst, and average cases of a given algorithm express what the resource usage is ''at least'', ''at most'' and ''on average'', respectively. Usually the resource being considered is running time, i.e. time complexity, b ...
or
average-case complexity In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case com ...
. Worst-case measures the difficulty of solving pathological cases rather than typical cases of the underlying problem. Even a reduction to a problem with hard average complexity offers only limited security as there still can be an algorithm that easily solves the problem for a subset of the problem space. For example, early versions of
Fast Syndrome Based Hash In cryptography, the fast syndrome-based hash functions (FSB) are a family of cryptographic hash functions introduced in 2003 by Daniel Augot, Matthieu Finiasz, and Nicolas Sendrier. Unlike most other cryptographic hash functions in use today, ...
turned out to be insecure. This problem was solved in the latest version.
SWIFFT In cryptography, SWIFFT is a collection of provably secure hash functions. It is based on the concept of the fast Fourier transform (FFT). SWIFFT is not the first hash function based on FFT, but it sets itself apart by providing a mathematical p ...
is an example of a hash function that circumvents these security problems. It can be shown that for any algorithm that can break SWIFFT with probability P within an estimated time T, we can find an algorithm that solves the ''worst-case'' scenario of a certain difficult mathematical problem within time T' depending on T and P.


Example of (impractical) Provably Secure Hash Function

Hash(''m'') = ''x''''m'' mod ''n'' where ''n'' is hard to factor composite number, and ''x'' is some prespecified base value. A collision ''x''''m1'' congruent to ''x''''m2'' reveals a multiple ''m1 - m2'' of the order of ''x''. Such information can be used to factor ''n'' in polynomial time assuming certain properties of ''x''. But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo ''n'' per message-bit.


More practical provably secure hash functions

* VSH - Very Smooth Hash function - a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number n (this is proven to be as hard as factoring n). * MuHASH * ECOH - Elliptic Curve Only hash function - based on the concept of Elliptic curves, Subset Sum Problem and summation of polynomials. The security proof of the collision resistance was based on weakened assumptions and eventually a second pre-image attack was found. * FSB - Fast Syndrome-Based hash function - it can be proven that breaking FSB is at least as difficult as solving a certain NP-complete problem known as Regular Syndrome Decoding. *
SWIFFT In cryptography, SWIFFT is a collection of provably secure hash functions. It is based on the concept of the fast Fourier transform (FFT). SWIFFT is not the first hash function based on FFT, but it sets itself apart by providing a mathematical p ...
- SWIFFT is based on the
Fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in th ...
and is provably collision resistant, under a relatively mild assumption about the worst-case difficulty of finding short vectors in cyclic/ ideal lattices. * Chaum, van Heijst, Pfitzmann hash function - A compression function where finding collisions is as hard as solving the
discrete logarithm problem In mathematics, for given real numbers ''a'' and ''b'', the logarithm log''b'' ''a'' is a number ''x'' such that . Analogously, in any group ''G'', powers ''b'k'' can be defined for all integers ''k'', and the discrete logarithm log''b'' ...
in a finite group F_. * Knapsack-based hash functions - A family of hash functions based on the
Knapsack problem The knapsack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit an ...
. * The Zémor-Tillich hash function - A family of hash functions that rely on the arithmetic of the group of matrices SL2. Finding collisions is at least as difficult as finding factorization of certain elements in this group. This is supposed to be hard, at least
PSPACE-complete In computational complexity theory, a decision problem is PSPACE-complete if it can be solved using an amount of memory that is polynomial in the input length (polynomial space) and if every other problem that can be solved in polynomial space can b ...
. For this hash, an attack was eventually discovered with a time complexity close to 2^. This beat by far the birthday bound and ideal pre-image complexities which are 2^ and 2^ for the Zémor-Tillich hash function. As the attacks include a birthday search in a reduced set of size 2n they indeed do not destroy the idea of provable security or invalidate the scheme but rather suggest that the initial parameters were too small. * Hash functions from Sigma Protocols - there exists a general way of constructing a provably secure hash, specifically from any (suitable) sigma protocol. A faster version of VSH (called VSH*) could be obtained in this way.


References

{{Cryptography navbox , hash Cryptographic hash functions