Spurious Key

In cryptography, unicity distance is the length of an original ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute force attack. That is, after trying every possible key, there should be just one decipherment that makes sense; equivalently, it is the expected amount of ciphertext needed to determine the key completely, assuming the underlying message has redundancy. Claude Shannon defined the unicity distance in his 1949 paper "Communication Theory of Secrecy Systems".

Consider an attack on the ciphertext string "WNAIW" encrypted using a Vigenère cipher with a five-letter key. Conceivably, this string could be deciphered into any other five-letter string: RIVER and WATER are both possibilities for certain keys. This is a general rule of cryptanalysis: with no additional information it is impossible to decode this message. Of course, even in this case, only a certain number of five-letter keys will result in English words. Trying all possible keys, we will not only get RIVER and WATER, but SXOOS and KHDOP as well. The number of "working" keys will likely be very much smaller than the set of all possible keys. The problem is knowing which of these "working" keys is the right one; the rest are spurious.
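As an illustration, here is a minimal Python sketch (not from Shannon's paper; the helper name key_for is my own) that recovers, for each candidate plaintext, the Vigenère key that would produce the observed ciphertext, confirming that RIVER and WATER are both reachable from WNAIW:

    def key_for(ciphertext: str, plaintext: str) -> str:
        """Return the Vigenere key mapping the given plaintext to the
        given ciphertext: key[i] = (cipher[i] - plain[i]) mod 26."""
        return "".join(
            chr((ord(c) - ord(p)) % 26 + ord("A"))
            for c, p in zip(ciphertext, plaintext)
        )

    for candidate in ["RIVER", "WATER", "SXOOS", "KHDOP"]:
        print(candidate, "<-", key_for("WNAIW", candidate))
    # Every five-letter string is reachable under some key, so the
    # ciphertext alone cannot single out the true plaintext.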


Relation with key size and possible plaintexts

In general, given particular assumptions about the size of the key and the number of possible messages, there is an average ciphertext length at which there is only one key (on average) that will generate a readable message. In the example above we see only upper-case English characters, so if we assume that the plaintext has this form, then there are 26 possible letters for each position in the string. Likewise, if we assume five-character upper-case keys, there are K = 26^5 possible keys, the majority of which will not "work". A tremendous number of possible messages, N, can be generated using even this limited set of characters: N = 26^L, where L is the length of the message. However, only a smaller set of them is readable plaintext due to the rules of the language, perhaps M of them, where M is likely to be very much smaller than N. Moreover, M has a one-to-one relationship with the number of keys that work, so given K possible keys, only K × (M/N) of them will "work". One of these is the correct key; the rest are spurious. Since M/N gets arbitrarily small as the length L of the message increases, there is eventually some L that is large enough to make the number of spurious keys equal to zero. Roughly speaking, this is the L that makes K × M/N = 1. This L is the unicity distance.
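To make this concrete, here is a back-of-the-envelope Python sketch for the five-letter Vigenère example. It assumes readable English carries about 1.5 bits of information per character (the estimate discussed in the next section), so that M/N ≈ 2^((1.5 − 4.7) × L):

    import math

    K = 26 ** 5                        # five-letter upper-case keys
    bits_per_char_max = math.log2(26)  # ~4.7 bits a character could carry
    bits_per_char_real = 1.5           # assumed information rate of English

    for L in range(1, 12):
        ratio = 2 ** ((bits_per_char_real - bits_per_char_max) * L)  # M/N
        print(L, round(K * ratio, 2))  # expected number of "working" keys
    # K*M/N drops to about 1 near L = 8, matching the unicity distance
    # U = H(k)/D = 5*log2(26)/3.2, roughly 7.3 characters.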


Relation with key entropy and plaintext redundancy

The unicity distance can equivalently be defined as the minimum amount of ciphertext required to permit a computationally unlimited adversary to recover the unique encryption key. The expected unicity distance can then be shown to be

    U = H(k) / D

where U is the unicity distance, H(k) is the entropy of the key space (e.g. 128 for 2^128 equiprobable keys, rather less if the key is a memorized pass-phrase), and D is the plaintext redundancy in bits per character.

An alphabet of 32 characters can carry 5 bits of information per character (since 32 = 2^5). In general, the number of bits of information per character is log2(N), where N is the number of characters in the alphabet and log2 is the binary logarithm. So for English each character can convey log2(26) ≈ 4.7 bits of information. However, the average amount of actual information carried per character in meaningful English text is only about 1.5 bits per character. So the plaintext redundancy is D = 4.7 − 1.5 = 3.2. Broadly speaking, the bigger the unicity distance, the better. For a one-time pad of unlimited size, given the unbounded entropy of the key space, we have U = ∞, which is consistent with the one-time pad being unbreakable.
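A minimal sketch of the formula in Python, assuming the D = 3.2 figure above (the function name unicity_distance is my own):

    import math

    def unicity_distance(key_entropy_bits: float, redundancy: float = 3.2) -> float:
        """Expected ciphertext length, in characters, at which the key
        becomes unique. The default D = 3.2 bits/character is the
        English-text figure from the text above."""
        return key_entropy_bits / redundancy

    print(unicity_distance(128))                 # 128-bit key: 40.0 characters
    print(unicity_distance(5 * math.log2(26)))   # five-letter Vigenere key: ~7.3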


Unicity distance of substitution cipher

For a simple substitution cipher, the number of possible keys is 26! ≈ 4.03 × 10^26, the number of ways in which the alphabet can be permuted. Assuming all keys are equally likely, H(k) = log2(26!) ≈ 88.4 bits. For English text D = 3.2, thus U = 88.4/3.2 ≈ 28. So given 28 characters of ciphertext, it should be theoretically possible to work out an English plaintext and hence the key.
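The same arithmetic in a few lines of Python (standard library only):

    import math

    H_k = math.log2(math.factorial(26))  # key entropy: log2(26!) ~ 88.4 bits
    D = 3.2                              # redundancy of English, bits/character
    print(H_k)      # 88.38...
    print(H_k / D)  # 27.6..., i.e. about 28 ciphertext characters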


Practical application

Unicity distance is a useful theoretical measure, but it does not say much about the security of a block cipher when attacked by an adversary with real-world (limited) resources. Consider a block cipher with a unicity distance of three ciphertext blocks. Although there is clearly enough information for a computationally unbounded adversary to find the right key (simple exhaustive search), this may be computationally infeasible in practice.

The unicity distance can be increased by reducing the plaintext redundancy. One way to do this is to deploy data compression techniques prior to encryption, for example by removing redundant vowels while retaining readability. This is a good idea anyway, as it reduces the amount of data to be encrypted.

Ciphertexts longer than the unicity distance can be assumed to have only one meaningful decryption. Ciphertexts shorter than the unicity distance may have multiple plausible decryptions. Unicity distance is not a measure of how much ciphertext is required for cryptanalysis, but of how much ciphertext is required for there to be only one reasonable solution for cryptanalysis.
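A minimal sketch of this effect, using an assumed 128-bit key and assumed redundancy figures for raw English and for partly and heavily compressed plaintext:

    # Shrinking the redundancy D stretches the unicity distance U = H(k)/D.
    H_k = 128.0                # entropy of an assumed 128-bit key
    for D in (3.2, 1.6, 0.4):  # bits/char: raw English; lighter redundancy
        print(f"D = {D}: U = {H_k / D:.0f} characters")
    # D = 3.2 -> U = 40; D = 1.6 -> U = 80; D = 0.4 -> U = 320 characters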



External links

* Bruce Schneier: How to Recognize Plaintext (Crypto-Gram Newsletter, December 15, 1998)
* Unicity Distance computed for common ciphers