PGP word list
   HOME

TheInfoList



OR:

The PGP Word List ("
Pretty Good Privacy Pretty Good Privacy (PGP) is an encryption program that provides cryptographic privacy and authentication for data communication. PGP is used for signing, encrypting, and decrypting texts, e-mails, files, directories, and whole disk partition ...
word list", also called a biometric word list for reasons explained below) is a list of
word A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
s for conveying data
bytes The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
in a clear unambiguous way via a voice channel. They are analogous in purpose to the
NATO phonetic alphabet The (International) Radiotelephony Spelling Alphabet, commonly known as the NATO phonetic alphabet, is the most widely used set of clear code words for communicating the letters of the Roman alphabet, technically a ''radiotelephonic spellin ...
used by pilots, except a longer list of words is used, each word corresponding to one of the 256 distinct numeric byte values.


History and structure

The PGP Word List was designed in 1995 by
Patrick Juola Patrick Juola is an internationally noted expert in text analysis, security, forensics, and stylometry. He is a professor of computer science at Duquesne University. As a faculty member at Duquesne University, he has authored two books and more t ...
, a computational linguist, and
Philip Zimmermann Philip R. Zimmermann (born 1954) is an American computer scientist and cryptographer. He is the creator of Pretty Good Privacy (PGP), the most widely used email encryption software in the world. He is also known for his work in VoIP encryption ...
, creator of PGP. The words were carefully chosen for their
phonetic Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. ...
distinctiveness, using
genetic algorithms In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to gene ...
to select lists of words that had optimum separations in
phoneme In phonology and linguistics, a phoneme () is a unit of sound that can distinguish one word from another in a particular language. For example, in most dialects of English, with the notable exception of the West Midlands and the north-wes ...
space. The candidate word lists were randomly drawn from
Grady Ward William Grady Ward (born April 4, 1951) is an American software engineer, lexicographer, and Internet activist who has been prominent in the Scientology versus the Internet controversy. Biography Grady Ward created the Moby Project, an extensive ...
's Moby Pronunciator list as raw material for the search, successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a
DEC Alpha Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set compute ...
, a particularly fast machine in that era. The Zimmermann–Juola list was originally designed to be used in
PGPfone PGPfone was a secure voice telephony system developed by Philip Zimmermann in 1995. The PGPfone protocol had little in common with Zimmermann's popular PGP email encryption package, except for the use of the name. It used ephemeral Diffie-Hel ...
, a secure VoIP application, to allow the two parties to verbally compare a short authentication string to detect a
man-in-the-middle attack In cryptography and computer security, a man-in-the-middle, monster-in-the-middle, machine-in-the-middle, monkey-in-the-middle, meddler-in-the-middle, manipulator-in-the-middle (MITM), person-in-the-middle (PITM) or adversary-in-the-middle (AiTM) ...
(MiTM). It was called a
biometric Biometrics are body measurements and calculations related to human characteristics. Biometric authentication (or realistic authentication) is used in computer science as a form of identification and access control. It is also used to identify in ...
word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel, binding the identity of the speaker with the words, which helped protect against the MiTM attack. The list can be used in many other situations where a biometric binding of identity is not needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP
public key Public-key cryptography, or asymmetric cryptography, is the field of cryptographic systems that use pairs of related keys. Each key pair consists of a public key and a corresponding private key. Key pairs are generated with cryptographic al ...
fingerprints A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surf ...
over a voice channel. This is known in PGP applications as the "biometric" representation. When it was applied to PGP, the list of words was further refined, with contributions by
Jon Callas Jon Callas is an American computer security expert, software engineer, user experience designer, and technologist who is the co-founder and former CTO of the global encrypted communications service Silent Circle.http://www.linkedin.com/in/joncal ...
. More recently, it has been used in Zfone and the
ZRTP ZRTP (composed of Z and Real-time Transport Protocol) is a cryptographic key-agreement protocol to negotiate the keys for encryption between two end points in a Voice over IP (VoIP) phone telephony call based on the Real-time Transport Protocol. ...
protocol, the successor to PGPfone. The list is actually composed of two lists, each containing 256
phonetically Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. ...
distinct words, in which each word represents a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of human words usually risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicate words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is actually represented by two different words, depending on whether that byte appears at an even or an odd offset from the beginning of the byte sequence. The two lists are readily distinguished by the number of syllables; the even list has words of two syllables, the odd list has three. The two lists have a maximum word length of 9 and 11 letters, respectively. Using a two-list scheme was suggested by Zhahai Stewart.


Word lists

Here are the two lists of words as presented in the PGPfone Owner's Manual.


Examples

Each byte in a bytestring is encoded as a single word. A sequence of bytes is rendered in
network byte order In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most sig ...
, from left to right. For example, the leftmost (i.e. byte 0) is considered "even" and is encoded using the PGP Even Word table. The next byte to the right (i.e. byte 1) is considered "odd" and is encoded using the PGP Odd Word table. This process repeats until all bytes are encoded. Thus, "E582" produces "topmost Istanbul", whereas "82E5" produces "miser travesty". A PGP public key fingerprint that displayed in hexadecimal as
:E582 94F2 E9A2 2748 6E8B :061B 31CC 528F D7FA 3F19 would display in PGP Words (the "biometric" fingerprint) as
:topmost Istanbul Pluto vagabond treadmill Pacific brackish dictator goldfish Medusa :afflict bravadochatter revolver Dupont midsummer stopwatch whimsical cowbell bottomless The order of bytes in a bytestring depends on
Endianness In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the mos ...
.


Other word lists for data

There are several other word lists for conveying data in a clear unambiguous way via a voice channel: * the
NATO phonetic alphabet The (International) Radiotelephony Spelling Alphabet, commonly known as the NATO phonetic alphabet, is the most widely used set of clear code words for communicating the letters of the Roman alphabet, technically a ''radiotelephonic spellin ...
maps individual letters and digits to individual words * the S/KEY system maps 64 bit numbers to 6 short words of 1 to 4 characters each from a publicly accessible 2048-word dictionary. The same dictionary is used in RFC 1760 and RFC 2289. * the
Diceware Diceware is a method for creating passphrases, passwords, and other cryptographic variables using ordinary dice as a hardware random number generator. For each word in the passphrase, five rolls of a six-sided die are required. The numbers from ...
system maps five base-6 random digits (almost 13 bits of entropy) to a word from a dictionary of 7,776 distinct words. ** the Electronic Frontier Foundation has published a set of improved word lists based on the same concept * FIPS 181: Automated Password Generator converts random numbers into somewhat pronounceable "words". * mnemonic encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words.mnemonic encoding
an
updated code
/ref> *
what3words what3words is a proprietary geocode system designed to identify any location with a resolution of about . It is owned by What3words Limited, based in London, England. The system encodes geographic coordinates into three permanently fixed dictio ...
encodes geographic coordinates in 3 dictionary words. * the BIP39 standard permits enconding a cryptographic key of fixed size (128 or 256 bits, usually the unencrypted master key of a
Cryptocurrency wallet A cryptocurrency wallet is a device, physical medium, program or a service which stores the public and/or private keys for cryptocurrency transactions. In addition to this basic function of storing the keys, a cryptocurrency wallet more often al ...
) into a short sequence of readable words known as the seed phrase, for the purpose of storing the key offline. This is used in cryptocurrencies such as Bitcoin or
Monero Monero (; Abbreviation: XMR) is a decentralized cryptocurrency. It uses a public distributed ledger with privacy-enhancing technologies that obfuscate transactions to achieve anonymity and fungibility. Observers cannot decipher addresses t ...
.


References

:''This article incorporates material that is copyrighted by PGP Corporation and has been licensed under the GNU Free Documentation License. (per Jon Callas, CTO, CSO PGP Corporation, 4-Jan-2007)'' {{reflist Spelling alphabets Military communications Cryptography OpenPGP