cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...

, a collision attack on a

cryptographic hash A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output re ...

tries to find two inputs producing the same hash value, i.e. a

hash collision In computer science, a hash collision or hash clash is when two pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits. Al ...

. This is in contrast to a

preimage attack In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs). In the context of attack, th ...

where a specific target hash value is specified. There are roughly two types of collision attacks: ;Classical collision attack: Find two different messages ''m''₁ and ''m''₂ such that ''hash''(''m''₁) = ''hash''(''m''₂). More generally: ;Chosen-prefix collision attack: Given two different prefixes ''p''₁ and ''p''₂, find two appendages ''m''₁ and ''m''₂ such that ''hash''(''p''₁ ∥ ''m''₁) = ''hash''(''p''₂ ∥ ''m''₂), where ∥ denotes the

concatenation In formal language, formal language theory and computer programming, string concatenation is the operation of joining character string (computer science), character strings wikt:end-to-end, end-to-end. For example, the concatenation of "sno ...

operation.

Classical collision attack

Mathematically stated, a collision attack finds two different messages ''m1'' and ''m2'', such that ''hash(m1)'' = ''hash(m2)''. In a classical collision attack, the attacker has no control over the content of either message, but they are arbitrarily chosen by the algorithm. Much like symmetric-key ciphers are vulnerable to

brute force attack In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correct ...

s, every

cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output re ...

is inherently vulnerable to collisions using a

birthday attack A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likeli ...

. Due to the

birthday problem In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...

, these attacks are much faster than a brute force would be. A hash of ''n'' bits can be broken in 2^''n''/2 time steps (evaluations of the hash function). More efficient attacks are possible by employing

cryptanalysis Cryptanalysis (from the Greek ''kryptós'', "hidden", and ''analýein'', "to analyze") refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic sec ...

to specific hash functions. When a collision attack is discovered and is found to be faster than a birthday attack, a hash function is often denounced as "broken". The

NIST hash function competition The NIST hash function competition was an open competition held by the US National Institute of Standards and Technology (NIST) to develop a new hash function called SHA-3 to complement the older SHA-1 and SHA-2. The competition was formally ann ...

was largely induced by published collision attacks against two very commonly used hash functions, MD5Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu
Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD
Cryptology ePrint Archive Report 2004/199, 16 Aug 2004, revised 17 Aug 2004. Retrieved July 27, 2008. and

SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecima ...

. The collision attacks against MD5 have improved so much that, as of 2007, it takes just a few seconds on a regular computer. Hash collisions created this way are usually constant length and largely unstructured, so cannot directly be applied to attack widespread document formats or protocols. However, workarounds are possible by abusing dynamic constructs present in many formats. In this way, two documents would be created which are as similar as possible in order to have the same hash value. One document would be shown to an authority to be signed, and then the signature could be copied to the other file. Such a malicious document would contain two different messages in the same document, but conditionally display one or the other through subtle changes to the file: * Some document formats like

PostScript PostScript (PS) is a page description language in the electronic publishing and desktop publishing realm. It is a dynamically typed, concatenative programming language. It was created at Adobe Systems by John Warnock, Charles Geschke, Doug Br ...

, or macros in

Microsoft Word Microsoft Word is a word processing software developed by Microsoft. It was first released on October 25, 1983, under the name ''Multi-Tool Word'' for Xenix systems. Subsequent versions were later written for several other platforms includin ...

, have conditional constructs. (if-then-else) that allow testing whether a location in the file has one value or another in order to control what is displayed. *

TIFF Tag Image File Format, abbreviated TIFF or TIF, is an image file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers. TIFF is widely supported by scanning, faxing, word processin ...

files can contain cropped images, with a different part of an image being displayed without affecting the hash value. *

PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...

files are vulnerable to collision attacks by using color value (such that text of one message is displayed with a white color that blends into the background, and text of the other message is displayed with a dark color) which can then be altered to change the signed document's content.

Chosen-prefix collision attack

An extension of the collision attack is the chosen-prefix collision attack, which is specific to Merkle–Damgård hash functions. In this case, the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value. This attack is much more powerful than a classical collision attack. Mathematically stated, given two different prefixes ''p''₁, ''p''₂, the attack finds two appendages ''m''₁ and ''m''₂ such that ''hash''(''p''₁ ∥ ''m''₁) = ''hash''(''p''₂ ∥ ''m''₂) (where ∥ is the

operation). In 2007, a chosen-prefix collision attack was found against MD5, requiring roughly 2⁵⁰ evaluations of the MD5 function. The paper also demonstrates two

X.509 In cryptography, X.509 is an International Telecommunication Union (ITU) standard defining the format of public key certificates. X.509 certificates are used in many Internet protocols, including TLS/SSL, which is the basis for HTTPS, the secu ...

certificates for different domain names, with colliding hash values. This means that a

certificate authority In cryptography, a certificate authority or certification authority (CA) is an entity that stores, signs, and issues digital certificates. A digital certificate certifies the ownership of a public key by the named subject of the certificate. This ...

could be asked to sign a certificate for one domain, and then that certificate (specially its signature) could be used to create a new rogue certificate to impersonate another domain. A real-world collision attack was published in December 2008 when a group of security researchers published a forged

signing certificate that could be used to impersonate a

, taking advantage of a prefix collision attack against the MD5 hash function. This meant that an attacker could impersonate any SSL-secured website as a

man-in-the-middle In cryptography and computer security, a man-in-the-middle, monster-in-the-middle, machine-in-the-middle, monkey-in-the-middle, meddler-in-the-middle, manipulator-in-the-middle (MITM), person-in-the-middle (PITM) or adversary-in-the-middle (AiTM) ...

, thereby subverting the certificate validation built in every

web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...

to protect

electronic commerce E-commerce (electronic commerce) is the activity of electronically buying or selling of products on online services or over the Internet. E-commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain manageme ...

. The rogue certificate may not be revokable by real authorities, and could also have an arbitrary forged expiry time. Even though MD5 was known to be very weak in 2004, certificate authorities were still willing to sign MD5-verified certificates in December 2008, and at least one Microsoft code-signing certificate was still using MD5 in May 2012. The

Flame A flame (from Latin ''flamma'') is the visible, gaseous part of a fire. It is caused by a highly exothermic chemical reaction taking place in a thin zone. When flames are hot enough to have ionized gaseous components of sufficient density they ...

malware successfully used a new variation of a chosen-prefix collision attack to spoof

code signing Code signing is the process of digitally signing executables and scripts to confirm the software author and guarantee that the code has not been altered or corrupted since it was signed. The process employs the use of a cryptographic hash to v ...

of its components by a Microsoft root certificate that still used the compromised MD5 algorithm. In 2019, researchers found a chosen-prefix collision attack against

with computing complexity between 2^66.9 and 2^69.4 and cost less than 100,000 US dollars. In 2020, researchers reduced the complexity of chosen-prefix collision attack against SHA-1 to 2^63.4.

Attack scenarios

Many applications of cryptographic hash functions do not rely on

collision resistance In cryptography, collision resistance is a property of cryptographic hash functions: a hash function ''H'' is collision-resistant if it is hard to find two inputs that hash to the same output; that is, two inputs ''a'' and ''b'' where ''a'' ≠ '' ...

, thus collision attacks do not affect their security. For example,

HMAC In cryptography, an HMAC (sometimes expanded as either keyed-hash message authentication code or hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret ...

s are not vulnerable. For the attack to be useful, the attacker must be in control of the input to the hash function.

Digital signatures

Because

digital signature A digital signature is a mathematical scheme for verifying the authenticity of digital messages or documents. A valid digital signature, where the prerequisites are satisfied, gives a recipient very high confidence that the message was created b ...

algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce ("compress") the amount of data that needs to be signed down to a constant size. Digital signature schemes often become vulnerable to hash collisions as soon as the underlying hash function is practically broken; techniques like randomized (salted) hashing will buy extra time by requiring the harder

. The usual attack scenario goes like this: # Mallory creates two different documents A and B that have an identical hash value, i.e., a collision. Mallory seeks to deceive Bob into accepting document B, ostensibly from Alice. # Mallory sends document A to Alice, who agrees to what the document says, signs its hash, and sends the signature to Mallory. # Mallory attaches the signature from document A to document B. # Mallory then sends the signature and document B to Bob, claiming that Alice signed B. Because the digital signature matches document B's hash, Bob's software is unable to detect the substitution. In 2008, researchers used a chosen-prefix collision attack against MD5 using this scenario, to produce a rogue

certificate. They created two versions of a TLS

public key certificate In cryptography, a public key certificate, also known as a digital certificate or identity certificate, is an electronic document used to prove the validity of a public key. The certificate includes information about the key, information about the ...

, one of which appeared legitimate and was submitted for signing by the RapidSSL certificate authority. The second version, which had the same MD5 hash, contained flags which signal web browsers to accept it as a legitimate authority for issuing arbitrary other certificates.

Hash flooding

Hash flooding (also known as HashDoS) is a

denial of service In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host connect ...

attack that uses hash collisions to exploit the worst-case (linear probe) runtime of

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...

lookups. It was originally described in 2003. To execute such an attack, the attacker sends the server multiple pieces of data that hash to the same value and then tries to get the server to perform slow lookups. As the main focus of hash functions used in hash tables was speed instead of security, most major programming languages were affected, with new vulnerabilities of this class still showing up a decade after the original presentation. To prevent hash flooding without making the hash function overly complex, newer keyed hash functions are introduced, with the security objective that collisions are hard to find as long as the key is unknown. They may be slower than previous hashes, but are still much easier to compute than cryptographic hashes. As of 2021, Daniel J. Bernstein's

SipHash SipHash is an add–rotate–xor (ARX) based family of pseudorandom functions created by Jean-Philippe Aumasson and Daniel J. Bernstein in 2012, in response to a spate of "hash flooding" denial-of-service attacks (HashDoS) in late 2011. Althou ...

(2012) is the most widely-used hash function in this class. (Non-keyed "simple" hashes remain safe to use as long as the application's hash table is not controllable from the outside.) It is possible to perform an analogous attack to fill up

Bloom filter A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in ...

s using a (partial) preimage attack.

References

External links

"Meaningful Collisions", attack scenarios for exploiting cryptographic hash collisions

Fast MD5 and MD4 Collision Generators
- Bishop Fox (formerly Stach & Liu). Create MD4 and MD5 hash collisions using groundbreaking new code that improves upon the techniques originally developed by Xiaoyun Wang. Using a 1.6 GHz Pentium 4, MD5 collisions can be generated in an average of 45 minutes, and MD4 collisions can be generated in an average of 5 seconds. Originally released on 22Jun2006. {{Cryptography navbox, hash Cryptographic attacks Cryptographic hash functions