In communications and information processing, code is a system of rules to convert information, such as a letter, word, sound, image, or gesture, into another form, sometimes shortened or secret, for communication through a communication channel or storage in a storage medium. An early example is the invention of language, which enabled a person, through speech, to communicate what they thought, saw, heard, or felt to others. But speech limits the range of communication to the distance a voice can carry and limits the audience to those present when the speech is uttered. The invention of writing, which converted spoken language into visual symbols, extended the range of communication across space and time.
The process of encoding converts information from a source into symbols for communication or storage. Decoding is the reverse process, converting code symbols back into a form that the recipient understands, such as English or Spanish.
One reason for coding is to enable communication in places where ordinary plain language, spoken or written, is difficult or impossible. For example, in semaphore, the configuration of flags held by a signaler or the arms of a semaphore tower encodes parts of the message, typically individual letters and numbers. Another person standing a great distance away can interpret the flags and reproduce the words sent.
Theory
In information theory and computer science, a code is usually considered as an algorithm that uniquely represents symbols from some source alphabet by ''encoded'' strings, which may be in some other target alphabet. An extension of the code for representing sequences of symbols over the source alphabet is obtained by concatenating the encoded strings.
Before giving a mathematically precise definition, here is a brief example. The mapping
: $C = \{\, a \mapsto 0,\ b \mapsto 01,\ c \mapsto 011 \,\}$
is a code, whose source alphabet is the set $\{a, b, c\}$ and whose target alphabet is the set $\{0, 1\}$. Using the extension of the code, the encoded string 0011001 can be grouped into codewords as 0 011 0 01, and these in turn can be decoded to the sequence of source symbols ''acab''.
Using terms from formal language theory, the precise mathematical definition of this concept is as follows: let $S$ and $T$ be two finite sets, called the source and target alphabets, respectively. A code $C : S \to T^*$ is a total function mapping each symbol from $S$ to a sequence of symbols over $T$. The extension $C'$ of $C$ is a homomorphism of $S^*$ into $T^*$, which naturally maps each sequence of source symbols to a sequence of target symbols.
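As an illustration, the example code and its extension can be sketched in Python. The function names `extend` and `decode` are assumptions made for this sketch, and the decoder relies on a property specific to this example (no codeword is a suffix of another), so it is not a general decoding method.

```python
# The example code from the text: source alphabet {a, b, c}, target alphabet {0, 1}.
CODE = {"a": "0", "b": "01", "c": "011"}


def extend(code, message):
    """Extension of the code: concatenate the codeword of each source symbol."""
    return "".join(code[symbol] for symbol in message)


def decode(code, encoded):
    """Group an encoded string into codewords by reading from right to left.

    Assumption for this sketch: no codeword is a suffix of another codeword
    (true for 0, 01, 011), so the match found at the end is unambiguous.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    symbols = []
    end = len(encoded)
    while end > 0:
        for start in range(end - 1, -1, -1):
            if encoded[start:end] in inverse:
                symbols.append(inverse[encoded[start:end]])
                end = start
                break
        else:
            raise ValueError("not a concatenation of codewords")
    return "".join(reversed(symbols))


print(extend(CODE, "acab"))     # 0011001
print(decode(CODE, "0011001"))  # acab
```

The homomorphism property corresponds to the fact that `extend(CODE, x + y)` equals `extend(CODE, x) + extend(CODE, y)` for any two source strings `x` and `y`.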
Variable-length codes
In this section, we consider codes that encode each source (clear text) character by a code word from some dictionary, and concatenation of such code words gives us an encoded string. Variable-length codes are especially useful when clear text characters have different probabilities; see also entropy encoding.
A ''prefix code'' is a code with the "prefix property": there is no valid code word in the system that is a prefix (start) of any other valid code word in the set.
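The prefix property can be checked mechanically, and it is what allows an encoded string to be decoded left to right without lookahead. The following is a minimal sketch; the helper names and the sample code `PREFIX` are hypothetical and chosen only for illustration.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    words = sorted(set(codewords))
    # In sorted order, a word that is a prefix of some other word is
    # immediately followed by a word that starts with it.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def decode_prefix(code, encoded):
    """Decode left to right, emitting a symbol as soon as a codeword is complete."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, buffer = [], ""
    for ch in encoded:
        buffer += ch
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing symbols do not form a codeword")
    return "".join(symbols)


PREFIX = {"a": "0", "b": "10", "c": "11"}  # hypothetical sample code

print(is_prefix_code(PREFIX.values()))      # True
print(is_prefix_code(["0", "01", "011"]))   # False: "0" is a prefix of "01"
print(decode_prefix(PREFIX, "010110"))      # abca
```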
Huffman coding is the best-known algorithm for deriving prefix codes. Prefix codes are widely referred to as "Huffman codes" even when the code was not produced by a Huffman algorithm. Other examples of prefix codes are country calling codes, the country and publisher parts of ISBNs, and the Secondary Synchronization Codes used in the UMTS WCDMA 3G Wireless Standard.
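For illustration, a Huffman construction can be sketched with a priority queue: the two least frequent subtrees are merged repeatedly, and the symbols in each merged subtree gain one leading bit. This is a simplified sketch, not a production implementation; the function name `huffman_code` is an assumption, and the exact codewords depend on how ties are broken.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a binary prefix code from the symbol frequencies in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The integer tie_breaker keeps the dictionaries from being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


code = huffman_code("aaaabbc")
# The most frequent symbol receives the shortest codeword.
print({symbol: len(word) for symbol, word in sorted(code.items())})
# {'a': 1, 'b': 2, 'c': 2}
```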
Kraft's inequality characterizes the sets of codeword lengths that are possible in a prefix code. Every uniquely decodable code, not necessarily a prefix one, must also satisfy Kraft's inequality.
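For a target alphabet of size r and codeword lengths l_1, ..., l_n, the inequality states that the sum of r^(-l_i) over all codewords is at most 1. A small sketch of the check follows; the function names are assumptions, and exact fractions are used to avoid rounding error.

```python
from fractions import Fraction


def kraft_sum(lengths, radix=2):
    """Sum of radix**(-length) over all codeword lengths, computed exactly."""
    return sum(Fraction(1, radix ** n) for n in lengths)


def satisfies_kraft(lengths, radix=2):
    """True if a prefix code with these codeword lengths can exist."""
    return kraft_sum(lengths, radix) <= 1


print(kraft_sum([1, 2, 3]))        # 7/8, the lengths of the code 0, 01, 011
print(satisfies_kraft([1, 2, 3]))  # True
print(satisfies_kraft([1, 1, 2]))  # False: 1/2 + 1/2 + 1/4 exceeds 1
```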
Error-correcting codes
Codes may also be used to represent data in a way more resistant to errors in transmission or storage. Such an error-correcting code works by including carefully crafted redundancy with the stored (or transmitted) data. Examples include Hamming codes, Reed–Solomon, Reed–Muller, Walsh–Hadamard, Bose–Chaudhuri–Hochquenghem, Turbo, Golay, Goppa, low-density parity-check codes, and space–time codes.
Error-detecting codes can be optimised to detect ''burst errors'' or ''random errors''.
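To make the idea of crafted redundancy concrete, the following sketch implements a Hamming(7,4) code, which adds three parity bits to four data bits and can correct any single flipped bit. The bit layout (parity bits at positions 1, 2 and 4) is one common convention and is an assumption of this sketch, as are the function names.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word):
    """Return (data bits, error position); position 0 means no error detected."""
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single-bit error.
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]], syndrome


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # flip the bit at position 5 to simulate a transmission error
print(hamming74_correct(received))  # ([1, 0, 1, 1], 5)
```

If two bits are flipped, this decoder silently produces wrong data, which is why practical systems often add a further overall parity bit or use stronger codes.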
Examples
Codes in communication used for brevity
A cable code replaces words (e.g. ''ship'' or ''invoice'') with shorter words, allowing the same information to be sent with fewer characters, more quickly, and less expensively.
Codes can be used for brevity. When telegraph messages were the state of the art in rapid long-distance communication, elaborate systems of commercial codes that encoded complete phrases into single words (commonly five-letter groups) were developed, so that telegraphers became conversant with such "words" as ''BYOXO'' ("Are you trying to weasel out of our deal?"), ''LIOUY'' ("Why do you not answer my question?"), ''BMULD'' ("You're a skunk!"), or ''AYYLU'' ("Not clearly coded, repeat more clearly.").
Code words were chosen for various reasons: length, pronounceability, etc. Meanings were chosen to fit perceived needs: commercial negotiations, military terms for military codes, diplomatic terms for diplomatic codes, any and all of the preceding for espionage codes. Codebooks and codebook publishers proliferated, including one operated as a front for the American Black Chamber, run by Herbert Yardley between the First and Second World Wars. The purpose of most of these codes was to save on cable costs. The use of data coding for data compression predates the computer era; an early example is the telegraph Morse code, where more-frequently used characters have shorter representations. Techniques such as Huffman coding are now used by computer-based algorithms to compress large data files into a more compact form for storage or transmission.
Character encodings
Character encodings are representations of textual data. A given character encoding may be associated with a specific character set (the collection of characters which it can represent), though some character sets have multiple character encodings and vice versa. Character encodings may be broadly grouped according to the number of bytes required to represent a single character: there are single-byte encodings, multibyte (also called wide) encodings, and variable-width (also called variable-length) encodings. The earliest character encodings were single-byte, the best-known example of which is ASCII. ASCII remains in use today, for example in HTTP headers. However, single-byte encodings cannot model character sets with more than 256 characters. Scripts that require large character sets such as Chinese, Japanese and Korean must be represented with multibyte encodings. Early multibyte encodings were fixed-length, meaning that although each character was represented by more than one byte, all characters used the same number of bytes ("word length"), making them suitable for decoding with a lookup table. The final group, variable-width encodings, is a subset of multibyte encodings. These use more complex encoding and decoding logic to efficiently represent large character sets while keeping the representations of more commonly used characters shorter or maintaining backward compatibility properties. This group includes UTF-8, an encoding of the Unicode character set; UTF-8 is the most common encoding of text media on the Internet.
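The variable-width behaviour of UTF-8 can be observed directly with Python's built-in string encoding. The sample characters below are arbitrary choices for this sketch, and the helper name `utf8_width` is an assumption.

```python
def utf8_width(ch):
    """Number of bytes UTF-8 uses to represent a single character."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "\u0041",          # Latin capital letter A (also an ASCII character)
    "e-acute": "\u00e9",    # Latin small letter e with acute
    "CJK": "\u4e2d",        # a Chinese character
    "emoji": "\U0001F600",  # a character outside the Basic Multilingual Plane
}

print({name: utf8_width(ch) for name, ch in samples.items()})
# {'A': 1, 'e-acute': 2, 'CJK': 3, 'emoji': 4}

# Backward compatibility: ASCII text has the same bytes in both encodings.
print("Code".encode("ascii") == "Code".encode("utf-8"))  # True
```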
Genetic code
Biological organisms contain genetic material that is used to control their function and development. This is DNA, which contains units named genes from which messenger RNA is derived. This in turn produces proteins through a genetic code in which a series of triplets (codons) of four possible nucleotides can be translated into one of twenty possible amino acids. A sequence of codons results in a corresponding sequence of amino acids that form a protein molecule; a type of codon called a stop codon signals the end of the sequence.
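Translation can be viewed as decoding a fixed-length code with three-letter codewords. The sketch below uses only a small excerpt of the standard codon table (the full table has 4 × 4 × 4 = 64 entries); the function name `translate` and the sample sequence are assumptions made for illustration.

```python
# Partial codon table for illustration only; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna):
    """Read codons left to right until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if outside the excerpt
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGUUUGGCUAAGCU"))  # ['Met', 'Phe', 'Gly']
```

The codon GCU after the stop codon UAA is not translated, which mirrors the role of the stop codon described above.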
Gödel code
In mathematics, a Gödel code was the basis for the proof of Gödel's incompleteness theorem. Here, the idea was to map mathematical notation to a natural number (using a Gödel numbering).
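One classical way to build such a numbering assigns each symbol a positive integer and encodes a sequence as a product of prime powers, so that the sequence can be recovered by factorization. The sketch below assumes the symbol-to-integer assignment has already been made; the function names and the sample sequence are hypothetical.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def godel_encode(symbol_codes):
    """Map a sequence of positive integers to 2**c1 * 3**c2 * 5**c3 * ..."""
    number = 1
    for p, code in zip(primes(), symbol_codes):
        if code < 1:
            raise ValueError("symbol codes must be positive")
        number *= p ** code
    return number


def godel_decode(number):
    """Recover the sequence by reading off the exponents of successive primes."""
    if number < 1:
        raise ValueError("not a valid encoding")
    codes = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if exponent == 0:
            raise ValueError("not a valid encoding")
        codes.append(exponent)
    return codes


print(godel_encode([3, 1, 2]))  # 600, which is 2**3 * 3**1 * 5**2
print(godel_decode(600))        # [3, 1, 2]
```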
Other
There are codes using colors, like traffic lights, the color code employed to mark the nominal value of electrical resistors, or that of the trashcans devoted to specific types of garbage (paper, glass, organic, etc.).
In marketing, coupon codes can be used for a financial discount or rebate when purchasing a product from a (usually internet) retailer.
In military environments, specific sounds with the cornet are used for different purposes: to mark some moments of the day, to command the infantry on the battlefield, etc.
Communication systems for sensory impairments, such as sign language for deaf people and braille for blind people, are based on movement or tactile codes.
Musical scores are the most common way to encode music.
Specific games have their own code systems to record the matches, e.g. chess notation.
Cryptography
In the history of cryptography, codes were once common for ensuring the confidentiality of communications, although ciphers are now used instead.
Secret codes intended to obscure the real messages, ranging from serious (mainly espionage in military, diplomacy, business, etc.) to trivial (romance, games), can be any kind of imaginative encoding: flowers, game cards, clothes, fans, hats, melodies, birds, etc., in which the sole requirement is the pre-agreement on the meaning by both the sender and the receiver.
Other examples
Other examples of encoding include:
* Encoding (in cognition) - a basic perceptual process of interpreting incoming stimuli; technically speaking, it is a complex, multi-stage process of converting relatively objective sensory input (e.g., light, sound) into a subjectively meaningful experience.
* A content format - a specific encoding format for converting a specific type of data to information.
* Text encoding uses a markup language to tag the structure and other features of a text to facilitate processing by computers. (See also Text Encoding Initiative.)
* Semantics encoding of formal language A in formal language B is a method of representing all terms (e.g. programs or descriptions) of language A using language B.
* Data compression transforms a signal into a code optimized for transmission or storage, generally done with a codec.
* Neural encoding - the way in which information is represented in neurons.
* Memory encoding - the process of converting sensations into memories.
* Television encoding: NTSC, PAL and SECAM
Other examples of decoding include:
* Decoding (computer science)
* Decoding methods, methods in communication theory for decoding codewords sent over a noisy channel
* Digital signal processing, the study of signals in a digital representation and the processing methods of these signals
* Digital-to-analog converter, the use of analog circuits for decoding operations
* Word decoding, the use of phonics to decipher print patterns and translate them into the sounds of language
Codes and acronyms
Acronyms and abbreviations can be considered codes, and in a sense, all languages and writing systems are codes for human thought.
International Air Transport Association airport codes are three-letter codes used to designate airports and used for bag tags. Station codes are similarly used on railways but are usually national, so the same code can be used for different stations if they are in different countries.
Occasionally, a code word achieves an independent existence (and meaning) while the original equivalent phrase is forgotten or at least no longer has the precise meaning attributed to the code word. For example, '30' was widely used in journalism to mean "end of story", and has been used in other contexts to signify "the end".
See also
* Asemic writing
* Cipher
* Code (semiotics)
* Equipment codes
* Quantum error correction
* Semiotics
* Universal language
Further reading
* ''Codes and Abbreviations for the Use of the International Telecommunication Services'' (2nd ed.). Geneva, Switzerland: International Telecommunication Union. 1963. OCLC 13677884.