Spurious Key

In cryptography, unicity distance is the length of an original ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute force attack. That is, after trying every possible key, there should be just one decipherment that makes sense; equivalently, it is the expected amount of ciphertext needed to determine the key completely, assuming the underlying message has redundancy. Claude Shannon defined the unicity distance in his 1949 paper "Communication Theory of Secrecy Systems".

Consider an attack on the ciphertext string "WNAIW" encrypted using a Vigenère cipher with a five-letter key. Conceivably, this string could be deciphered into any other five-letter string: RIVER and WATER are both possibilities for certain keys. This is a general rule of cryptanalysis: with no additional information it is impossible to decode this message. Of course, even in this case, only a certain number of five-letter keys will result in English words. Trying all possible keys, we will not only get RIVER and WATER, but SXOOS and KHDOP as well. The number of "working" keys will likely be very much smaller than the set of all possible keys. The problem is knowing which of these "working" keys is the right one; the rest are spurious.
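As an illustration, here is a minimal Python sketch (not from Shannon's paper; the helper name key_for is my own) that recovers, for each candidate plaintext, the Vigenère key that would produce the observed ciphertext, confirming that RIVER and WATER are both reachable from WNAIW:

    def key_for(ciphertext: str, plaintext: str) -> str:
        """Return the Vigenere key mapping the given plaintext to the
        given ciphertext: key[i] = (cipher[i] - plain[i]) mod 26."""
        return "".join(
            chr((ord(c) - ord(p)) % 26 + ord("A"))
            for c, p in zip(ciphertext, plaintext)
        )

    for candidate in ["RIVER", "WATER", "SXOOS", "KHDOP"]:
        print(candidate, "<-", key_for("WNAIW", candidate))
    # Every five-letter string is reachable under some key, so the
    # ciphertext alone cannot single out the true plaintext.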


Relation with key size and possible plaintexts

In general, given particular assumptions about the size of the key and the number of possible messages, there is an average ciphertext length at which there is only one key (on average) that will generate a readable message. In the example above we see only upper-case English characters, so if we assume that the plaintext has this form, then there are 26 possible letters for each position in the string. Likewise, if we assume five-character upper-case keys, there are K = 26^5 possible keys, the majority of which will not "work". A tremendous number of possible messages, N, can be generated using even this limited set of characters: N = 26^L, where L is the length of the message. However, only a smaller set of them is readable plaintext due to the rules of the language, perhaps M of them, where M is likely to be very much smaller than N. Moreover, M has a one-to-one relationship with the number of keys that work, so given K possible keys, only K × (M/N) of them will "work". One of these is the correct key; the rest are spurious. Since M/N gets arbitrarily small as the length L of the message increases, there is eventually some L that is large enough to make the number of spurious keys equal to zero. Roughly speaking, this is the L that makes K × M/N = 1. This L is the unicity distance.
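To make this concrete, here is a back-of-the-envelope Python sketch for the five-letter Vigenère example. It assumes readable English carries about 1.5 bits of information per character (the estimate discussed in the next section), so that M/N ≈ 2^((1.5 − 4.7) × L):

    import math

    K = 26 ** 5                        # five-letter upper-case keys
    bits_per_char_max = math.log2(26)  # ~4.7 bits a character could carry
    bits_per_char_real = 1.5           # assumed information rate of English

    for L in range(1, 12):
        ratio = 2 ** ((bits_per_char_real - bits_per_char_max) * L)  # M/N
        print(L, round(K * ratio, 2))  # expected number of "working" keys
    # K*M/N drops to about 1 near L = 8, matching the unicity distance
    # U = H(k)/D = 5*log2(26)/3.2, roughly 7.3 characters.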


Relation with key entropy and plaintext redundancy

The unicity distance can equivalently be defined as the minimum amount of ciphertext required to permit a computationally unlimited adversary to recover the unique encryption key. The expected unicity distance can then be shown to be

    U = H(k) / D

where U is the unicity distance, H(k) is the entropy of the key space (e.g. 128 for 2^128 equiprobable keys, rather less if the key is a memorized pass-phrase), and D is the plaintext redundancy in bits per character.

An alphabet of 32 characters can carry 5 bits of information per character (since 32 = 2^5). In general, the number of bits of information per character is log2(N), where N is the number of characters in the alphabet and log2 is the binary logarithm. So for English each character can convey log2(26) ≈ 4.7 bits of information. However, the average amount of actual information carried per character in meaningful English text is only about 1.5 bits per character. So the plaintext redundancy is D = 4.7 − 1.5 = 3.2. Broadly speaking, the bigger the unicity distance, the better. For a one-time pad of unlimited size, given the unbounded entropy of the key space, we have U = ∞, which is consistent with the one-time pad being unbreakable.
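A minimal sketch of the formula in Python, assuming the D = 3.2 figure above (the function name unicity_distance is my own):

    import math

    def unicity_distance(key_entropy_bits: float, redundancy: float = 3.2) -> float:
        """Expected ciphertext length, in characters, at which the key
        becomes unique. The default D = 3.2 bits/character is the
        English-text figure from the text above."""
        return key_entropy_bits / redundancy

    print(unicity_distance(128))                 # 128-bit key: 40.0 characters
    print(unicity_distance(5 * math.log2(26)))   # five-letter Vigenere key: ~7.3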


Unicity distance of substitution cipher

For a simple substitution cipher, the number of possible keys is 26! ≈ 4.03 × 10^26, the number of ways in which the alphabet can be permuted. Assuming all keys are equally likely, H(k) = log2(26!) ≈ 88.4 bits. For English text D = 3.2, thus U = 88.4/3.2 ≈ 28. So given 28 characters of ciphertext, it should be theoretically possible to work out an English plaintext and hence the key.
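The same arithmetic in a few lines of Python (standard library only):

    import math

    H_k = math.log2(math.factorial(26))  # key entropy: log2(26!) ~ 88.4 bits
    D = 3.2                              # redundancy of English, bits/character
    print(H_k)      # 88.38...
    print(H_k / D)  # 27.6..., i.e. about 28 ciphertext characters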


Practical application

Unicity distance is a useful theoretical measure, but it does not say much about the security of a block cipher when attacked by an adversary with real-world (limited) resources. Consider a block cipher with a unicity distance of three ciphertext blocks. Although there is clearly enough information for a computationally unbounded adversary to find the right key (simple exhaustive search), this may be computationally infeasible in practice.

The unicity distance can be increased by reducing the plaintext redundancy. One way to do this is to deploy data compression techniques prior to encryption, for example by removing redundant vowels while retaining readability. This is a good idea anyway, as it reduces the amount of data to be encrypted.

Ciphertexts longer than the unicity distance can be assumed to have only one meaningful decryption. Ciphertexts shorter than the unicity distance may have multiple plausible decryptions. Unicity distance is not a measure of how much ciphertext is required for cryptanalysis, but of how much ciphertext is required for there to be only one reasonable solution for cryptanalysis.
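A minimal sketch of this effect, using an assumed 128-bit key and assumed redundancy figures for raw English and for partly and heavily compressed plaintext:

    # Shrinking the redundancy D stretches the unicity distance U = H(k)/D.
    H_k = 128.0                # entropy of an assumed 128-bit key
    for D in (3.2, 1.6, 0.4):  # bits/char: raw English; lighter redundancy
        print(f"D = {D}: U = {H_k / D:.0f} characters")
    # D = 3.2 -> U = 40; D = 1.6 -> U = 80; D = 0.4 -> U = 320 characters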



External links

* Bruce Schneier: How to Recognize Plaintext (Crypto-Gram Newsletter, December 15, 1998)
* Unicity Distance computed for common ciphers