Information theory is the scientific study of the quantification, storage, and communication of digital information. The field was fundamentally established by the works of Harry Nyquist and Ralph Hartley in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.
A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy. Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory and information-theoretic security.
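The coin and die comparison can be checked numerically. The following is a minimal sketch, assuming the standard result that a choice among n equally likely outcomes carries log2(n) bits; the function name `uniform_entropy_bits` is illustrative only.

```python
import math


def uniform_entropy_bits(n_outcomes: int) -> float:
    """Entropy, in bits, of a choice among n equally likely outcomes."""
    if n_outcomes < 1:
        raise ValueError("need at least one outcome")
    return math.log2(n_outcomes)


coin_bits = uniform_entropy_bits(2)  # fair coin: two equally likely outcomes
die_bits = uniform_entropy_bits(6)   # fair die: six equally likely outcomes

print(f"coin: {coin_bits:.3f} bits, die: {die_bits:.3f} bits")
```

The die outcome carries more bits than the coin outcome, in line with the comparison above.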
Applications of fundamental topics of information theory include source coding/data compression (e.g. for ZIP files), and channel coding/error detection and correction (e.g. for DSL). Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones and the development of the Internet. The theory has also found applications in other areas, including statistical inference, cryptography, neurobiology, perception, linguistics, the evolution and function of molecular codes (bioinformatics), thermal physics, molecular dynamics, quantum computing, information retrieval, intelligence gathering, plagiarism detection, pattern recognition, anomaly detection and even art creation.
Overview

Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was formalized in 1948 by Claude Shannon in a paper entitled ''A Mathematical Theory of Communication'', in which information is thought of as a set of possible messages, and the goal is to send these messages over a noisy channel and to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem, showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.
Coding theory is concerned with finding explicit methods, called ''codes'', for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.
A third class of information theory codes consists of cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. ''See the article ban (unit) for a historical application.''
Historical background

The landmark event ''establishing'' the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the ''Bell System Technical Journal'' in July and October 1948.
Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, ''Certain Factors Affecting Telegraph Speed'', contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation $W = K \log m$ (recalling Boltzmann's constant), where ''W'' is the speed of transmission of intelligence, ''m'' is the number of different voltage levels to choose from at each time step, and ''K'' is a constant. Ralph Hartley's 1928 paper, ''Transmission of Information'', uses the word ''information'' as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as $H = \log S^n = n \log S$, where ''S'' was the number of possible symbols, and ''n'' the number of symbols in a transmission. The unit of information was therefore the decimal digit, which has since sometimes been called the hartley in his honor as a unit or scale or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German Second World War Enigma ciphers.
Much of the mathematics behind information theory with events of different probabilities was developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in ''Entropy in thermodynamics and information theory''.
In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion:
:"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."
With it came the ideas of
* the information entropy and redundancy of a source, and its relevance through the source coding theorem;
* the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
* the practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; as well as
* the bit, a new way of seeing the most fundamental unit of information.
Quantities of information

Information theory is based on probability theory and statistics. Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.
The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.
In what follows, an expression of the form $p \log p$ is considered by convention to be equal to zero whenever $p = 0$. This is justified because $\lim_{p \rightarrow 0+} p \log p = 0$ for any logarithmic base.
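The limit behind this convention can be observed numerically. The sketch below is illustrative only: the helper `plogp` and the sample points are assumptions made for the example, and base 2 is chosen arbitrarily.

```python
import math


def plogp(p: float) -> float:
    """Return p * log2(p), using the convention that the value is 0 at p = 0."""
    if p < 0.0 or p > 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return 0.0 if p == 0.0 else p * math.log2(p)


# Probabilities shrinking towards zero: 1e-1, 1e-2, ..., 1e-12.
samples = [10.0 ** -k for k in range(1, 13)]
magnitudes = [abs(plogp(p)) for p in samples]

for p, m in zip(samples, magnitudes):
    print(f"p = {p:.0e}   |p log2 p| = {m:.3e}")
```

The printed magnitudes shrink steadily as p approaches zero, which is why terms with zero probability can be dropped from the sums that follow.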
Entropy of an information source

Based on the probability mass function of each source symbol to be communicated, the Shannon entropy $H$, in units of bits (per symbol), is given by
:$H = - \sum_{i} p_i \log_2 (p_i)$
where $p_i$ is the probability of occurrence of the $i$-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the shannon in his honor. Entropy is also commonly computed using the natural logarithm (base $e$, where $e$ is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base $2^8 = 256$ will produce a measurement in bytes per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.
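The effect of the logarithmic base on the unit can be illustrated with a short sketch. The function name `shannon_entropy` and the example distribution are assumptions made for illustration; zero-probability terms are skipped, following the convention stated earlier.

```python
import math


def shannon_entropy(probs, base=2.0):
    """H = -sum_i p_i * log_base(p_i), skipping terms with p_i = 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


pmf = [0.5, 0.25, 0.25]  # hypothetical three-symbol source

h_bits = shannon_entropy(pmf, 2)           # bits (shannons) per symbol
h_nats = shannon_entropy(pmf, math.e)      # nats per symbol
h_hartleys = shannon_entropy(pmf, 10)      # decimal digits (hartleys) per symbol
h_bytes = shannon_entropy(pmf, 256)        # bytes per symbol

print(h_bits, h_nats, h_hartleys, h_bytes)
```

The four values describe the same uncertainty; they differ only by the constant factor that converts between logarithmic bases.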
Intuitively, the entropy $H_X$ of a discrete random variable $X$ is a measure of the amount of ''uncertainty'' associated with the value of $X$ when only its distribution is known.
The entropy of a source that emits a sequence of $N$ symbols that are independent and identically distributed (iid) is $N \cdot H$ bits (per message of $N$ symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length $N$ will be less than $N \cdot H$.
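Both statements can be checked by brute force on a small example. The sketch below assumes a hypothetical three-symbol source and a message length of 4; the dependent source shown (one symbol repeated N times) is just one extreme case chosen for illustration.

```python
import itertools
import math


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


symbol_pmf = [0.7, 0.2, 0.1]  # hypothetical per-symbol distribution
N = 4                         # symbols per message

# iid source: a message's probability is the product of its symbol probabilities.
iid_message_probs = [
    math.prod(combo) for combo in itertools.product(symbol_pmf, repeat=N)
]

# Fully dependent source: the first symbol is repeated N times, so there is
# one possible message per symbol, with that symbol's probability.
dependent_message_probs = list(symbol_pmf)

h_symbol = entropy_bits(symbol_pmf)
h_iid_message = entropy_bits(iid_message_probs)
h_dependent_message = entropy_bits(dependent_message_probs)

print(h_symbol, h_iid_message, h_dependent_message)
```

The iid message entropy matches N times the per-symbol entropy up to rounding, while the dependent source falls well short of it.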
If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If $\mathbb{X}$ is the set of all messages $\{x_1, \ldots, x_n\}$ that $X$ could be, and $p(x)$ is the probability of some $x \in \mathbb{X}$, then the entropy, $H$, of $X$ is defined:
:$H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x).$
(Here, $I(x)$ is the self-information, which is the entropy contribution of an individual message, and $\mathbb{E}_X$ is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable, $p(x) = 1/n$; i.e., most unpredictable, in which case $H(X) = \log n$.
The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:
:$H_{\mathrm{b}}(p) = - p \log_2 p - (1-p) \log_2 (1-p).$
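The binary entropy function also gives a simple view of the maximization property: over a grid of probabilities it peaks at the equiprobable case. The following is a minimal sketch; the grid resolution and the name `binary_entropy` are choices made for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """H_b(p) = -p log2 p - (1 - p) log2 (1 - p), in shannons."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log 0 = 0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Evaluate on p = 0.00, 0.01, ..., 1.00 and locate the maximum.
grid = [i / 100 for i in range(101)]
values = [binary_entropy(p) for p in grid]
best_p = grid[values.index(max(values))]

print(f"maximum H_b = {max(values):.3f} Sh at p = {best_p}")
```

A certain outcome (p equal to 0 or 1) yields zero entropy, and the function is symmetric about p = 1/2.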
Joint entropy

The ''joint entropy'' of two discrete random variables $X$ and $Y$ is merely the entropy of their pairing: $(X, Y)$. This implies that if $X$ and $Y$ are independent, then their joint entropy is the sum of their individual entropies.
For example, if $(X, Y)$ represents the position of a chess piece, with $X$ the row and $Y$ the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.
:$H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y)$
Despite similar notation, joint entropy should not be confused with ''cross-entropy''.
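The chess example can be worked through numerically. The sketch below assumes, purely for illustration, that the piece is equally likely to be on any of the 64 squares, so that row and column are independent and uniform.

```python
import math
from collections import defaultdict


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Assumed joint distribution: uniform over the 8 x 8 board.
joint = {(row, col): 1 / 64 for row in range(8) for col in range(8)}

# Marginal distributions of the row X and the column Y.
p_row = defaultdict(float)
p_col = defaultdict(float)
for (row, col), p in joint.items():
    p_row[row] += p
    p_col[col] += p

h_position = entropy_bits(joint.values())  # H(X, Y)
h_row = entropy_bits(p_row.values())       # H(X)
h_col = entropy_bits(p_col.values())       # H(Y)

print(h_position, h_row, h_col)
```

Under this assumption the position carries 6 bits, which splits into 3 bits for the row and 3 bits for the column, as expected for independent variables.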
Conditional entropy (equivocation)

The ''conditional entropy'' or ''conditional uncertainty'' of $X$ given random variable $Y$ (also called the ''equivocation'' of $X$ about $Y$) is the average conditional entropy over $Y$:
:$H(X|Y) = \mathbb{E}_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y).$
Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:
: $H(X|Y) = H(X,Y) - H(Y) .\,$
Mutual information (transinformation)

''Mutual information'' measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of $X$ relative to $Y$ is given by:
:$I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}$
where $SI$ (''S''pecific mutual ''I''nformation) is the pointwise mutual information.
A basic property of the mutual information is that
: $I(X;Y) = H(X) - H(X|Y).\,$
That is, knowing ''Y'', we can save an average of $I(X;Y)$ bits in encoding ''X'' compared to not knowing ''Y''.
Mutual information is symmetric:
: $I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\,$
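These identities can be verified on a small joint distribution. The sketch below uses a hypothetical pair of correlated binary variables; conditional entropy is computed directly as the average of $-\log_2 p(x|y)$ and then compared against the entropy-based identities.

```python
import math
from collections import defaultdict


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical joint distribution p(x, y) of two correlated binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

h_x = entropy_bits(p_x.values())
h_y = entropy_bits(p_y.values())
h_xy = entropy_bits(joint.values())

# Direct definitions: H(X|Y) = -sum p(x,y) log2 p(x|y), and likewise H(Y|X).
h_x_given_y = -sum(
    p * math.log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0
)
h_y_given_x = -sum(
    p * math.log2(p / p_x[x]) for (x, y), p in joint.items() if p > 0
)

mi_from_x = h_x - h_x_given_y    # I(X;Y) = H(X) - H(X|Y)
mi_from_y = h_y - h_y_given_x    # I(Y;X) = H(Y) - H(Y|X)
mi_symmetric = h_x + h_y - h_xy  # H(X) + H(Y) - H(X,Y)

print(h_x_given_y, mi_from_x, mi_from_y, mi_symmetric)
```

All three expressions for the mutual information agree up to floating-point rounding, at roughly 0.278 bits for this example.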
Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of ''X'' given the value of ''Y'' and the prior distribution on ''X'':
:$I(X;Y) = \mathbb{E}_{p(y)}[D_{\mathrm{KL}}(p(X|Y=y) \| p(X))].$
In other words, this is a measure of how much, on the average, the probability distribution on ''X'' will change if we are given the value of ''Y''. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:
:$I(X;Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)).$
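
As a rough check that the two divergence-based expressions describe the same quantity, the sketch below evaluates both on a hypothetical joint distribution. The table `joint` and the helper `kl` are assumptions made for illustration; base-2 logarithms are used so the results are in bits.

```python
import math

# Hypothetical joint distribution p(x, y) (illustration only).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def kl(p, q):
    """D_KL(p || q) in bits for two dicts over the same outcomes."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

# Marginal distributions p(x) and p(y).
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Form 1: expected divergence of the posterior p(X | Y=y) from the prior p(X).
mi_posterior = 0.0
for y, py in p_y.items():
    posterior = {x: joint[(x, y)] / py for x in p_x}
    mi_posterior += py * kl(posterior, p_x)

# Form 2: divergence of the joint from the product of the marginals.
product = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
mi_joint = kl(joint, product)

print(mi_posterior, mi_joint)
```

The two printed values should coincide up to floating-point rounding.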
Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ² test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

Kullback–Leibler divergence (information gain)

The ''Kullback–Leibler divergence'' (or ''information divergence'', ''information gain'', or ''relative entropy'') is a way of comparing two distributions: a "true" probability distribution $p(X)$, and an arbitrary probability distribution $q(X)$. If we compress data in a manner that assumes $q(X)$ is the distribution underlying some data, when, in reality, $p(X)$ is the correct distribution, the Kullback–Leibler divergence is the average number of additional bits per datum necessary for compression. It is thus defined
:$D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log q(x) \, - \, \sum_{x \in X} -p(x) \log p(x) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.$
Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).
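
The "additional bits" reading and the lack of symmetry can both be seen in a small calculation. This is a sketch under assumed inputs: the distributions `p` and `q` and the function names are hypothetical, chosen only to illustrate the definition.

```python
import math

def entropy(p):
    """H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average code length in bits when data from p is coded as if it came from q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]      # assumed "true" distribution
q = [1 / 3, 1 / 3, 1 / 3]  # assumed model distribution

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
extra_bits = cross_entropy(p, q) - entropy(p)

print(f"D(p||q) = {kl_pq:.4f}, D(q||p) = {kl_qp:.4f}, extra bits = {extra_bits:.4f}")
```

Here $D_{\mathrm{KL}}(p \| q)$ equals the gap between the cross-entropy and the entropy of `p` (about 0.085 bits per symbol), while $D_{\mathrm{KL}}(q \| p)$ takes a different value, which illustrates why the divergence is not a metric.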
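Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number ''X'' is about to be drawn randomly from a discrete set with probability distribution $p(x)$. If Alice knows the true distribution $p(x)$, while Bob believes (has a prior) that the distribution is $q(x)$, then Bob will be more surprised than Alice, on average, upon seeing the value of ''X''. The KL divergence is the expected value of Bob's surprisal minus Alice's surprisal, measured in bits if the logarithm is in base 2.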
Information ''rate'' is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is
:$r = \lim_{n \to \infty} H(X_n | X_{n-1}, X_{n-2}, X_{n-3}, \ldots);$
that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the ''average rate'' is
:$r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n);$
that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.
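
The agreement of the two expressions for a stationary source can be illustrated with a two-state Markov chain. The sketch below is only an example under stated assumptions: the transition matrix `P` and its stationary distribution `pi` are hypothetical values, and the joint entropy is computed by brute-force enumeration of all sequences, which is feasible only for short blocks.

```python
import itertools
import math

# Hypothetical two-state Markov source: P[a][b] = Pr(next = b | current = a).
P = [[0.9, 0.1], [0.5, 0.5]]
# Stationary distribution of P (satisfies pi = pi P).
pi = [5 / 6, 1 / 6]

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Conditional form: for a stationary Markov chain, the entropy of a symbol
# given the whole past reduces to H(X_n | X_{n-1}).
conditional_rate = sum(pi[a] * entropy(P[a]) for a in range(2))

def joint_entropy(n):
    """H(X_1, ..., X_n) in bits, by enumerating all 2**n sequences."""
    total = 0.0
    for seq in itertools.product(range(2), repeat=n):
        p = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        total -= p * math.log2(p)
    return total

# Per-symbol joint entropy for growing block lengths.
per_symbol = [joint_entropy(n) / n for n in (1, 4, 12)]
print(conditional_rate, per_symbol)
```

The per-symbol joint entropy decreases toward the conditional rate as the block length grows, consistent with the two limits being equal for stationary sources.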
Information rate is defined as
:$r = \lim_{n \to \infty} \frac{1}{n} I(X_1, X_2, \dots X_n; Y_1, Y_2, \dots Y_n).$
It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.

Channel capacity

Communication over a channel is the primary motivation of information theory. However, channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.
Consider the communications process over a discrete channel. A simple model of the process is shown below:
:$\xrightarrow[\text{Message}]{W} \begin{array}{|c|}\hline \text{Encoder} \\ f_n \\ \hline\end{array} \xrightarrow[\text{Encoded sequence}]{X^n} \begin{array}{|c|}\hline \text{Channel} \\ p(y|x) \\ \hline\end{array} \xrightarrow[\text{Received sequence}]{Y^n} \begin{array}{|c|}\hline \text{Decoder} \\ g_n \\ \hline\end{array} \xrightarrow[\text{Estimated message}]{\hat W}$
Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel. Let $p(y|x)$ be the conditional probability distribution function of ''Y'' given ''X''. We will consider $p(y|x)$ to be an inherent fixed property of our communications channel (representing the nature of the ''noise'' of our channel). Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of $f(x)$, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the ''signal'', we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the ''channel capacity'' and is given by:
:$C = \max_{f} I(X;Y).$
This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol).
For any information rate ''R'' < ''C'' and coding error ''ε'' > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ ''R'' and a decoding algorithm, such that the maximal probability of block error is ≤ ''ε''; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate ''R'' > ''C'', it is impossible to transmit with arbitrarily small block error.
''Channel coding'' is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

Capacity of particular channel models
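
The maximization of $I(X;Y)$ over input distributions can be sketched numerically for small discrete channels. The example below is an illustration under assumptions, not a general-purpose method: it handles only binary-input channels, uses a plain grid search over the input distribution (the function names and the `steps` parameter are hypothetical), and compares the result with the well-known closed forms $1 - H_\mathrm{b}(p)$ for a binary symmetric channel that flips each bit with probability ''p'' and $1 - p$ for a binary erasure channel that erases each bit with probability ''p''.

```python
import math

def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(input_dist, channel):
    """I(X;Y) in bits; channel[x][y] is the transition probability p(y|x)."""
    n_out = len(channel[0])
    p_y = [sum(input_dist[x] * channel[x][y] for x in range(len(input_dist)))
           for y in range(n_out)]
    mi = 0.0
    for x, fx in enumerate(input_dist):
        for y in range(n_out):
            p_xy = fx * channel[x][y]
            if p_xy > 0:
                mi += p_xy * math.log2(channel[x][y] / p_y[y])
    return mi

def capacity_binary_input(channel, steps=1000):
    """Grid search over input distributions (a, 1 - a) of a binary-input channel."""
    return max(mutual_information([a / steps, 1 - a / steps], channel)
               for a in range(steps + 1))

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]              # binary symmetric channel
bec = [[1 - p, 0.0, p], [0.0, 1 - p, p]]    # binary erasure channel; third output is the erasure

cap_bsc = capacity_binary_input(bsc)
cap_bec = capacity_binary_input(bec)
print(cap_bsc, 1 - binary_entropy(p))
print(cap_bec, 1 - p)
```

For both channels the maximum is attained at the uniform input distribution, which lies on the search grid, so the numerical values match the closed forms up to rounding. For channels with larger input alphabets an iterative method such as the Blahut–Arimoto algorithm would be the more usual choice.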

* A continuous-time analog communications channel subject to Gaussian noise: see the Shannon–Hartley theorem.
* A binary symmetric channel (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability ''p''. The BSC has a capacity of $1 - H_\mathrm{b}(p)$ bits per channel use, where $H_\mathrm{b}$ is the binary entropy function to the base-2 logarithm.
* A binary erasure channel (BEC) with erasure probability ''p'' is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is $1 - p$ bits per channel use.

Channels with memory and directed information

In practice many channels have memory. Namely, at time $i$ the channel is given by the conditional probability $P(y_i | x_i, x_{i-1}, x_{i-2}, \ldots, x_1, y_{i-1}, y_{i-2}, \ldots, y_1)$. It is often more convenient to use the notation $x^i = (x_i, x_{i-1}, x_{i-2}, \ldots, x_1)$, so that the channel becomes $P(y_i | x^i, y^{i-1})$. In such a case the capacity is given by the mutual information rate when there is no feedback available, and by the directed information rate whether or not there is feedback (if there is no feedback, the directed information equals the mutual information).
Applications to other fields

Intelligence uses and secrecy applications

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.
Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.
Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.

Pseudorandom number generation

Pseudorandom number generators are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed cryptographically secure pseudorandom number generators, but even they require random seeds external to the software to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is min-entropy, a value related to Shannon entropy through Rényi entropy; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.
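
The gap between Shannon entropy and min-entropy can be made concrete with a deliberately skewed distribution. The following sketch uses a hypothetical source (one outcome with probability 1/2 and the remaining mass spread evenly over $2^{16}$ rare outcomes); the function names are illustrative, and the Rényi formula is the standard definition for orders other than 1.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha in bits (alpha > 0 and alpha != 1)."""
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)

def min_entropy(probs):
    """Min-entropy in bits, the limit of the Rényi entropy as alpha grows."""
    return -math.log2(max(probs))

# Hypothetical source: one likely outcome plus 2**16 equally rare ones.
n_rare = 2 ** 16
probs = [0.5] + [0.5 / n_rare] * n_rare

h_shannon = shannon_entropy(probs)
h_collision = renyi_entropy(probs, 2)
h_min = min_entropy(probs)
print(h_shannon, h_collision, h_min)
```

This source has 9 bits of Shannon entropy but only 1 bit of min-entropy, because an adversary who always guesses the likely outcome is right half of the time. That is the sense in which high Shannon entropy alone does not make a source suitable for an extractor.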
Seismic exploration

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.

Semiotics

Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics. Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing." Concepts from information theory such as redundancy and code control have been used by semioticians such as Umberto Eco and Ferruccio Rossi-Landi to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones (Nöth, Winfried (1981). "Semiotics of ideology". ''Semiotica'', Issue 148).

Miscellaneous applications

Information theory also has applications in gambling, black holes, and bioinformatics.
See also

* Algorithmic probability
* Bayesian inference
* Communication theory
* Constructor theory - a generalization of information theory that includes quantum information
* Inductive probability
* Info-metrics
* Minimum message length
* Minimum description length
* List of important publications
* Philosophy of information

Applications

* Active networking
* Cryptanalysis
* Cryptography
* Cybernetics
* Entropy in thermodynamics and information theory
* Gambling
* Intelligence (information gathering)
* Seismic exploration

History

* Hartley, R.V.L.
* History of information theory
* Shannon, C.E.
* Timeline of information theory
* Yockey, H.P.

Theory

* Coding theory
* Detection theory
* Estimation theory
* Fisher information
* Information algebra
* Information asymmetry
* Information field theory
* Information geometry
* Information theory and measure theory
* Kolmogorov complexity
* List of unsolved problems in information theory
* Logic of information
* Network coding
* Philosophy of information
* Quantum information science
* Source coding

Concepts

* Ban (unit)
* Channel capacity
* Communication channel
* Communication source
* Conditional entropy
* Covert channel
* Data compression
* Decoder
* Differential entropy
* Fungible information
* Information fluctuation complexity
* Information entropy
* Joint entropy
* Kullback–Leibler divergence
* Mutual information
* Pointwise mutual information (PMI)
* Receiver (information theory)
* Redundancy
* Rényi entropy
* Self-information
* Unicity distance
* Variety
* Hamming distance

Further reading

The classic work

* Shannon, C.E. (1948), "A Mathematical Theory of Communication", ''Bell System Technical Journal'', 27, pp. 379–423 & 623–656, July & October, 1948.
* R.V.L. Hartley, "Transmission of Information", ''Bell System Technical Journal'', July 1928.
* Andrey Kolmogorov (1968), "Three approaches to the quantitative definition of information", in ''International Journal of Computer Mathematics''.

Other journal articles

* J. L. Kelly, Jr., "A New Interpretation of Information Rate", ''Bell System Technical Journal'', Vol. 35, July 1956, pp. 917–26.
* R. Landauer, "Information is Physical", ''Proc. Workshop on Physics and Computation PhysComp'92'' (IEEE Comp. Sci. Press, Los Alamitos, 1993), pp. 1–4.

Textbooks on information theory

* Arndt, C. ''Information Measures, Information and its Description in Science and Engineering'' (Springer Series: Signals and Communication Technology), 2004.
* Ash, RB. ''Information Theory''. New York: Interscience, 1965. New York: Dover 1990.
* Gallager, R. ''Information Theory and Reliable Communication.'' New York: John Wiley and Sons, 1968.
* Goldman, S. ''Information Theory''. New York: Prentice Hall, 1953. New York: Dover 1968, 2005.
* Csiszar, I, Korner, J. ''Information Theory: Coding Theorems for Discrete Memoryless Systems''. Akademiai Kiado: 2nd edition, 1997.
* MacKay, David J. C. ''Information Theory, Inference, and Learning Algorithms''. Cambridge: Cambridge University Press, 2003.
* Mansuripur, M. ''Introduction to Information Theory''. New York: Prentice Hall, 1987.
* McEliece, R. ''The Theory of Information and Coding''. Cambridge, 2002.
* Pierce, JR. ''An introduction to information theory: symbols, signals and noise''. Dover (2nd Edition). 1961 (reprinted by Dover 1980).
* Reza, F. ''An Introduction to Information Theory''. New York: McGraw-Hill 1961. New York: Dover 1994.
* Stone, JV. Chapter 1 of book ''Information Theory: A Tutorial Introduction'', University of Sheffield, England, 2014.
* Yeung, RW. ''A First Course in Information Theory''. Kluwer Academic/Plenum Publishers, 2002.
* Yeung, RW. ''Information Theory and Network Coding''. Springer 2008, 2002.

Other books

* Leon Brillouin, ''Science and Information Theory'', Mineola, N.Y.: Dover, [1956, 1962] 2004.
* James Gleick, ''The Information: A History, a Theory, a Flood'', New York: Pantheon, 2011.
* A. I. Khinchin, ''Mathematical Foundations of Information Theory'', New York: Dover, 1957.
* H. S. Leff and A. F. Rex, Editors, ''Maxwell's Demon: Entropy, Information, Computing'', Princeton University Press, Princeton, New Jersey (1990).
* Robert K. Logan, ''What is Information? - Propagating Organization in the Biosphere, the Symbolosphere, the Technosphere and the Econosphere'', Toronto: DEMO Publishing.
* Tom Siegfried, ''The Bit and the Pendulum'', Wiley, 2000.
* Charles Seife, ''Decoding the Universe'', Viking, 2006.
* Jeremy Campbell, ''Grammatical Man'', Touchstone/Simon & Schuster, 1982.
* Henri Theil, ''Economics and Information Theory'', Rand McNally & Company - Chicago, 1967.
* Escolano, Suau, Bonev, ''Information Theory in Computer Vision and Pattern Recognition'', Springer, 2009.
* Vlatko Vedral, ''Decoding Reality: The Universe as Quantum Information'', Oxford University Press 2010.

MOOC on information theory

* Raymond W. Yeung (The Chinese University of Hong Kong)

External links

* Lambert F. L. (1999), "Shuffled Cards, Messy Desks, and Disorderly Dorm Rooms - Examples of Entropy Increase? Nonsense!", ''Journal of Chemical Education''
* IEEE Information Theory Society and ITSOC Monographs, Surveys, and Reviews
