Information-theoretic
   HOME

TheInfoList



OR:

Information theory is the scientific study of the quantification, storage, and
communication Communication (from la, communicare, meaning "to share" or "to be in relation with") is usually defined as the transmission of information. The term may also refer to the message communicated through such transmissions or the field of inquir ...
of
information Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random ...
. The field was originally established by the works of
Harry Nyquist Harry Nyquist (, ; February 7, 1889 – April 4, 1976) was a Swedish-American physicist and electronic engineer who made important contributions to communication theory. Personal life Nyquist was born in the village Nilsby of the parish Stora Ki ...
and
Ralph Hartley Ralph Vinton Lyon Hartley (November 30, 1888 – May 1, 1970) was an American electronics researcher. He invented the Hartley oscillator and the Hartley transform, and contributed to the foundations of information theory. Biography Hartley was ...
, in the 1920s, and
Claude Shannon Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was an American people, American mathematician, electrical engineering, electrical engineer, and cryptography, cryptographer known as a "father of information theory". As a 21-year-o ...
in the 1940s. The field is at the intersection of
probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
,
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
,
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
,
statistical mechanics In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. It does not assume or postulate any natural laws, but explains the macroscopic be ...
,
information engineering Information engineering is the engineering discipline that deals with the generation, distribution, analysis, and use of information, data, and knowledge in systems. The field first became identifiable in the early 21st century. The component ...
, and
electrical engineering Electrical engineering is an engineering discipline concerned with the study, design, and application of equipment, devices, and systems which use electricity, electronics, and electromagnetism. It emerged as an identifiable occupation in the l ...
. A key measure in information theory is
entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
. Entropy quantifies the amount of uncertainty involved in the value of a
random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
or the outcome of a
random process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appe ...
. For example, identifying the outcome of a fair
coin flip Coin flipping, coin tossing, or heads or tails is the practice of throwing a coin in the air and checking which side is showing when it lands, in order to choose between two alternatives, heads or tails, sometimes used to resolve a dispute betwe ...
(with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a
die Die, as a verb, refers to death, the cessation of life. Die may also refer to: Games * Die, singular of dice, small throwable objects used for producing random numbers Manufacturing * Die (integrated circuit), a rectangular piece of a semicondu ...
(with six equally likely outcomes). Some other important measures in information theory are
mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such ...
, channel capacity,
error exponent In information theory, the error exponent of a channel code or source code over the block length of the code is the rate at which the error probability decays exponentially with the block length of the code. Formally, it is defined as the limiting ...
s, and
relative entropy Relative may refer to: General use *Kinship and family, the principle binding the most basic social units society. If two people are connected by circumstances of birth, they are said to be ''relatives'' Philosophy *Relativism, the concept that ...
. Important sub-fields of information theory include
source coding In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
,
algorithmic complexity theory In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produ ...
,
algorithmic information theory Algorithmic information theory (AIT) is a branch of theoretical computer science that concerns itself with the relationship between computation and information of computably generated objects (as opposed to stochastically generated), such as st ...
and
information-theoretic security A cryptosystem is considered to have information-theoretic security (also called unconditional security) if the system is secure against adversaries with unlimited computing resources and time. In contrast, a system which depends on the computati ...
. Applications of fundamental topics of information theory include source coding/
data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
(e.g. for ZIP files), and channel coding/
error detection and correction In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communi ...
(e.g. for
DSL Digital subscriber line (DSL; originally digital subscriber loop) is a family of technologies that are used to transmit digital data over telephone lines. In telecommunications marketing, the term DSL is widely understood to mean asymmetric dig ...
). Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the
compact disc The compact disc (CD) is a Digital media, digital optical disc data storage format that was co-developed by Philips and Sony to store and play digital audio recordings. In August 1982, the first compact disc was manufactured. It was then rele ...
, the feasibility of mobile phones and the development of the Internet. The theory has also found applications in other areas, including
statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution, distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical ...
,
cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
,
neurobiology Neuroscience is the scientific study of the nervous system (the brain, spinal cord, and peripheral nervous system), its functions and disorders. It is a multidisciplinary science that combines physiology, anatomy, molecular biology, development ...
,
perception Perception () is the organization, identification, and interpretation of sensory information in order to represent and understand the presented information or environment. All perception involves signals that go through the nervous system ...
, linguistics, the evolution and function of molecular codes (
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
),
thermal physics Thermal physics is the combined study of thermodynamics, statistical mechanics, and kinetic theory of gases. This umbrella-subject is typically designed for physics students and functions to provide a general introduction to each of three core hea ...
,
molecular dynamics Molecular dynamics (MD) is a computer simulation method for analyzing the physical movements of atoms and molecules. The atoms and molecules are allowed to interact for a fixed period of time, giving a view of the dynamic "evolution" of the ...
,
quantum computing Quantum computing is a type of computation whose operations can harness the phenomena of quantum mechanics, such as superposition, interference, and entanglement. Devices that perform quantum computations are known as quantum computers. Though ...
,
black hole A black hole is a region of spacetime where gravitation, gravity is so strong that nothing, including light or other Electromagnetic radiation, electromagnetic waves, has enough energy to escape it. The theory of general relativity predicts t ...
s,
information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
,
intelligence gathering This is a list of intelligence gathering disciplines. HUMINT Human intelligence (HUMINT) are gathered from a person in the location in question. Sources can include the following: * Advisors or foreign internal defense (FID) personnel wor ...
,
plagiarism detection Plagiarism detection or content similarity detection is the process of locating instances of plagiarism or copyright infringement within a work or document. The widespread use of computers and the advent of the Internet have made it easier to pla ...
,
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphi ...
,
anomaly detection In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority ...
and even art creation.


Overview

Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was formalized in 1948 by Claude Shannon in a paper entitled ''
A Mathematical Theory of Communication "A Mathematical Theory of Communication" is an article by mathematician Claude E. Shannon published in '' Bell System Technical Journal'' in 1948. It was renamed ''The Mathematical Theory of Communication'' in the 1949 book of the same name, a sm ...
'', in which information is thought of as a set of possible messages, and the goal is to send these messages over a noisy channel, and to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the
noisy-channel coding theorem In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (dig ...
showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent. Coding theory is concerned with finding explicit methods, called ''codes'', for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and
error-correction In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communica ...
(channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible. A third class of information theory codes are cryptographic algorithms (both
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
s and
cipher In cryptography, a cipher (or cypher) is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is ''encipherment''. To encipher or encode i ...
s). Concepts, methods and results from coding theory and information theory are widely used in cryptography and
cryptanalysis Cryptanalysis (from the Greek ''kryptós'', "hidden", and ''analýein'', "to analyze") refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic sec ...
. ''See the article
ban (unit) The hartley (symbol Hart), also called a ban, or a dit (short for decimal digit), is a logarithmic unit that measures information or entropy, based on base 10 logarithms and powers of 10. One hartley is the information content of an event if th ...
for a historical application.''


Historical background

The landmark event ''establishing'' the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the ''
Bell System Technical Journal The ''Bell Labs Technical Journal'' is the in-house scientific journal for scientists of Nokia Bell Labs, published yearly by the IEEE society. The managing editor is Charles Bahr. The journal was originally established as the ''Bell System Techn ...
'' in July and October 1948. Prior to this paper, limited information-theoretic ideas had been developed at
Bell Labs Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984), then AT&T Bell Laboratories (1984–1996) and Bell Labs Innovations (1996–2007), is an American industrial research and scientific development company owned by mult ...
, all implicitly assuming events of equal probability.
Harry Nyquist Harry Nyquist (, ; February 7, 1889 – April 4, 1976) was a Swedish-American physicist and electronic engineer who made important contributions to communication theory. Personal life Nyquist was born in the village Nilsby of the parish Stora Ki ...
's 1924 paper, ''Certain Factors Affecting Telegraph Speed'', contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation (recalling the
Boltzmann constant The Boltzmann constant ( or ) is the proportionality factor that relates the average relative kinetic energy of particles in a gas with the thermodynamic temperature of the gas. It occurs in the definitions of the kelvin and the gas constant, ...
), where ''W'' is the speed of transmission of intelligence, ''m'' is the number of different voltage levels to choose from at each time step, and ''K'' is a constant.
Ralph Hartley Ralph Vinton Lyon Hartley (November 30, 1888 – May 1, 1970) was an American electronics researcher. He invented the Hartley oscillator and the Hartley transform, and contributed to the foundations of information theory. Biography Hartley was ...
's 1928 paper, ''Transmission of Information'', uses the word ''information'' as a measurable quantity, reflecting the receiver's ability to distinguish one
sequence of symbols In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). ...
from any other, thus quantifying information as , where ''S'' was the number of possible symbols, and ''n'' the number of symbols in a transmission. The unit of information was therefore the
decimal digit A numerical digit (often shortened to just digit) is a single symbol used alone (such as "2") or in combinations (such as "25"), to represent numbers in a positional numeral system. The name "digit" comes from the fact that the ten digits (Latin ...
, which since has sometimes been called the
hartley Hartley may refer to: Places Australia *Hartley, New South Wales *Hartley, South Australia **Electoral district of Hartley, a state electoral district Canada *Hartley Bay, British Columbia United Kingdom *Hartley, Cumbria *Hartley, Plymou ...
in his honor as a unit or scale or measure of information.
Alan Turing Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war
Enigma Enigma may refer to: *Riddle, someone or something that is mysterious or puzzling Biology *ENIGMA, a class of gene in the LIM domain Computing and technology * Enigma (company), a New York-based data-technology startup * Enigma machine, a family ...
ciphers. Much of the mathematics behind information theory with events of different probabilities were developed for the field of
thermodynamics Thermodynamics is a branch of physics that deals with heat, work, and temperature, and their relation to energy, entropy, and the physical properties of matter and radiation. The behavior of these quantities is governed by the four laws of the ...
by
Ludwig Boltzmann Ludwig Eduard Boltzmann (; 20 February 1844 – 5 September 1906) was an Austrian physicist and philosopher. His greatest achievements were the development of statistical mechanics, and the statistical explanation of the second law of thermodyn ...
and
J. Willard Gibbs Josiah Willard Gibbs (; February 11, 1839 – April 28, 1903) was an American scientist who made significant theoretical contributions to physics, chemistry, and mathematics. His work on the applications of thermodynamics was instrumental in t ...
. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by
Rolf Landauer Rolf William Landauer (February 4, 1927 – April 27, 1999) was a German-American physicist who made important contributions in diverse areas of the thermodynamics of information processing, condensed matter physics, and the conductivity of disor ...
in the 1960s, are explored in ''
Entropy in thermodynamics and information theory The mathematical expressions for thermodynamic entropy in the statistical thermodynamics formulation established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s are similar to the information entropy by Claude Shannon and Ralph Hartley, develo ...
''. In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion: :"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point." With it came the ideas of * the information entropy and redundancy of a source, and its relevance through the
source coding theorem In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy. Named after Claude Shannon, the source coding theorem ...
; * the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem; * the practical result of the Shannon–Hartley law for the channel capacity of a
Gaussian channel Additive white Gaussian noise (AWGN) is a basic noise model used in information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics: * ''Additive'' because it is added to any nois ...
; as well as * the
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
—a new way of seeing the most fundamental unit of information.


Quantities of information

Information theory is based on
probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
and statistics, where quantified information is usually described in terms of bits. Information theory often concerns itself with measures of information of the distributions associated with random variables. One of the most important measures is called entropy, which forms the building block of many other measures. Entropy allows quantification of measure of information in a single random variable. Another useful concept is mutual information defined on two random variables, which describes the measure of information in common between those variables, which can be used to describe their correlation. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy
channel Channel, channels, channeling, etc., may refer to: Geography * Channel (geography), in physical geography, a landform consisting of the outline (banks) of the path of a narrow body of water. Australia * Channel Country, region of outback Austral ...
in the limit of long block lengths, when the channel statistics are determined by the joint distribution. The choice of logarithmic base in the following formulae determines the
unit Unit may refer to: Arts and entertainment * UNIT, a fictional military organization in the science fiction television series ''Doctor Who'' * Unit of action, a discrete piece of action (or beat) in a theatrical presentation Music * ''Unit'' (alb ...
of information entropy that is used. A common unit of information is the bit, based on the
binary logarithm In mathematics, the binary logarithm () is the power to which the number must be raised to obtain the value . That is, for any real number , :x=\log_2 n \quad\Longleftrightarrow\quad 2^x=n. For example, the binary logarithm of is , the b ...
. Other units include the
nat Nat or NAT may refer to: Computing * Network address translation (NAT), in computer networking Organizations * National Actors Theatre, New York City, U.S. * National AIDS trust, a British charity * National Archives of Thailand * National As ...
, which is based on the
natural logarithm The natural logarithm of a number is its logarithm to the base of the mathematical constant , which is an irrational and transcendental number approximately equal to . The natural logarithm of is generally written as , , or sometimes, if ...
, and the
decimal digit A numerical digit (often shortened to just digit) is a single symbol used alone (such as "2") or in combinations (such as "25"), to represent numbers in a positional numeral system. The name "digit" comes from the fact that the ten digits (Latin ...
, which is based on the
common logarithm In mathematics, the common logarithm is the logarithm with base 10. It is also known as the decadic logarithm and as the decimal logarithm, named after its base, or Briggsian logarithm, after Henry Briggs, an English mathematician who pioneered i ...
. In what follows, an expression of the form is considered by convention to be equal to zero whenever . This is justified because \lim_ p \log p = 0 for any logarithmic base.


Entropy of an information source

Based on the
probability mass function In probability and statistics, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete density function. The probability mass ...
of each source symbol to be communicated, the Shannon
entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
, in units of bits (per symbol), is given by :H = - \sum_ p_i \log_2 (p_i) where is the probability of occurrence of the -th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the shannon in his honor. Entropy is also commonly computed using the natural logarithm (base , where is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base will produce a measurement in
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
s per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol. Intuitively, the entropy of a discrete random variable is a measure of the amount of ''uncertainty'' associated with the value of when only its distribution is known. The entropy of a source that emits a sequence of symbols that are
independent and identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usua ...
(iid) is bits (per message of symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length will be less than . If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If \mathbb is the set of all messages that could be, and is the probability of some x \in \mathbb X, then the entropy, , of is defined: : H(X) = \mathbb_
(x) An emoticon (, , rarely , ), short for "emotion icon", also known simply as an emote, is a pictorial representation of a facial expression using characters—usually punctuation marks, numbers, and letters—to express a person's feelings, ...
= -\sum_ p(x) \log p(x). (Here, is the
self-information In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative ...
, which is the entropy contribution of an individual message, and \mathbb_X is the
expected value In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a l ...
.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable ; i.e., most unpredictable, in which case . The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit: :H_(p) = - p \log_2 p - (1-p)\log_2 (1-p).


Joint entropy

The of two discrete random variables and is merely the entropy of their pairing: . This implies that if and are
independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s * Independ ...
, then their joint entropy is the sum of their individual entropies. For example, if represents the position of a chess piece— the row and the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece. :H(X, Y) = \mathbb_ \log p(x,y)= - \sum_ p(x, y) \log p(x, y) \, Despite similar notation, joint entropy should not be confused with .


Conditional entropy (equivocation)

The or ''conditional uncertainty'' of given random variable (also called the ''equivocation'' of about ) is the average conditional entropy over : : H(X, Y) = \mathbb E_Y y)= -\sum_ p(y) \sum_ p(x, y) \log p(x, y) = -\sum_ p(x,y) \log p(x, y). Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that: : H(X, Y) = H(X,Y) - H(Y) .\,


Mutual information (transinformation)

''
Mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such ...
'' measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of relative to is given by: :I(X;Y) = \mathbb_
I(x,y) I, or i, is the ninth letter and the third vowel letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''i'' (pronounced ), plural ...
= \sum_ p(x,y) \log \frac where (''S''pecific mutual ''I''nformation) is the
pointwise mutual information In statistics, probability theory and information theory, pointwise mutual information (PMI), or point mutual information, is a measure of association. It compares the probability of two events occurring together to what this probability would b ...
. A basic property of the mutual information is that : I(X;Y) = H(X) - H(X, Y).\, That is, knowing ''Y'', we can save an average of bits in encoding ''X'' compared to not knowing ''Y''. Mutual information is
symmetric Symmetry (from grc, συμμετρία "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance. In mathematics, "symmetry" has a more precise definiti ...
: : I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\, Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the
posterior probability distribution The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior p ...
of ''X'' given the value of ''Y'' and the
prior distribution In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken int ...
on ''X'': : I(X;Y) = \mathbb E_ Y=y) \, p(X) ) In other words, this is a measure of how much, on the average, the probability distribution on ''X'' will change if we are given the value of ''Y''. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution: : I(X; Y) = D_(p(X,Y) \, p(X)p(Y)). Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the
multinomial distribution In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a ''k''-sided dice rolled ''n'' times. For ''n'' independent trials each of w ...
and to Pearson's χ2 test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.


Kullback–Leibler divergence (information gain)

The ''
Kullback–Leibler divergence In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_\text(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' is different fro ...
'' (or ''information divergence'', ''information gain'', or ''relative entropy'') is a way of comparing two distributions: a "true"
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
, and an arbitrary probability distribution . If we compress data in a manner that assumes is the distribution underlying some data, when, in reality, is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined :D_(p(X) \, q(X)) = \sum_ -p(x) \log \, - \, \sum_ -p(x) \log = \sum_ p(x) \log \frac. Although it is sometimes used as a 'distance metric', KL divergence is not a true
metric Metric or metrical may refer to: * Metric system, an internationally adopted decimal system of measurement * An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement Mathematics In mathem ...
since it is not symmetric and does not satisfy the
triangle inequality In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of degenerate triangles, but ...
(making it a semi-quasimetric). Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number ''X'' is about to be drawn randomly from a discrete set with probability distribution . If Alice knows the true distribution , while Bob believes (has a
prior Prior (or prioress) is an ecclesiastical title for a superior in some religious orders. The word is derived from the Latin for "earlier" or "first". Its earlier generic usage referred to any monastic superior. In abbeys, a prior would be l ...
) that the distribution is , then Bob will be more surprised than Alice, on average, upon seeing the value of ''X''. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the ''log'' is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.


Directed Information

Directed information Directed information, I(X^n\to Y^n) , is an information theory measure that quantifies the information flow from the random process X^n = \ to the random process Y^n = \. The term ''directed information'' was coined by James Massey and is defined ...
, I(X^n\to Y^n) , is an
information theory Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
measure that quantifies the
information Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random ...
flow from the random process X^n = \ to the random process Y^n = \. The term ''directed information'' was coined by
James Massey James Lee Massey (February 11, 1934 – June 16, 2013) was an American information theorist and cryptographer, Professor Emeritus of Digital Technology at ETH Zurich. His notable work includes the application of the Berlekamp–Massey algorithm to ...
and is defined as :I(X^n\to Y^n) \triangleq \sum_^n I(X^i;Y_i, Y^), where I(X^;Y_i, Y^) is the conditional mutual information I(X_1,X_2,...,X_;Y_i, Y_1,Y_2,...,Y_). Differently from the Mutual informaion, the dircted information is not symmetric. The I(X^n\to Y^n) measures the information bits that are transmitted caussaly from X^n to Y^n. The Directed information has many applications in problems where
causality Causality (also referred to as causation, or cause and effect) is influence by which one event, process, state, or object (''a'' ''cause'') contributes to the production of another event, process, state, or object (an ''effect'') where the cau ...
plays an important role such as capacity of channel with feedback, capacity of discrete
memoryless In probability and statistics, memorylessness is a property of certain probability distributions. It usually refers to the cases when the distribution of a "waiting time" until a certain event does not depend on how much time has elapsed already ...
networks with feedback,
gambling Gambling (also known as betting or gaming) is the wagering of something of value ("the stakes") on a random event with the intent of winning something else of value, where instances of strategy are discounted. Gambling thus requires three el ...
with causal side information,
compression Compression may refer to: Physical science *Compression (physics), size reduction due to forces *Compression member, a structural element such as a column *Compressibility, susceptibility to compression * Gas compression *Compression ratio, of a ...
with causal side information, and in
real-time control Real-time computing (RTC) is the computer science term for hardware and software systems subject to a "real-time constraint", for example from event to system response. Real-time programs must guarantee response within specified time constrai ...
communication settings, statistical physics.


Other quantities

Other important information theoretic quantities include Rényi entropy (a generalization of entropy),
differential entropy Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continu ...
(a generalization of quantities of information to continuous distributions), and the conditional mutual information.


Coding theory

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source. * Data compression (source coding): There are two formulations for the compression problem: **
lossless data compression Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistic ...
: the data must be reconstructed exactly; **
lossy data compression In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...
: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called ''
rate–distortion theory Rate–distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression; it addresses the problem of determining the minimal number of bits per symbol, as measured by the rate ''R'', ...
''. * Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an
error-correcting code In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea is ...
adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel. This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the
broadcast channel Broadcasting is the distribution of audio or video content to a dispersed audience via any electronic mass communications medium, but typically one using the electromagnetic spectrum ( radio waves), in a one-to-many model. Broadcasting began w ...
) or intermediary "helpers" (the
relay channel In information theory, a relay channel is a probability model of the communication between a sender A sender was a special type of circuit in 20th-century electromechanical telephone exchanges which registered the telephone numbers dialed by the s ...
), or more general
networks Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
, compression followed by transmission may no longer be optimal.


Source theory

Any process that generates successive messages can be considered a of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of
ergodicity In mathematics, ergodicity expresses the idea that a point of a moving system, either a dynamical system or a stochastic process, will eventually visit all parts of the space that the system moves in, in a uniform and random sense. This implies th ...
and stationarity impose less restrictive constraints. All such sources are
stochastic Stochastic (, ) refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselv ...
. These terms are well studied in their own right outside information theory.


Rate

Information '' rate'' is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is :r = \lim_ H(X_n, X_,X_,X_, \ldots); that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the ''average rate'' is :r = \lim_ \frac H(X_1, X_2, \dots X_n); that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result. Information rate is defined as :r = \lim_ \frac I(X_1, X_2, \dots X_n;Y_1,Y_2, \dots Y_n); It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of .


Channel capacity

Communications over a channel is the primary motivation of information theory. However, channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality. Consider the communications process over a discrete channel. A simple model of the process is shown below: : \xrightarrow
text Text may refer to: Written word * Text (literary theory), any object that can be read, including: **Religious text, a writing that a religious tradition considers to be sacred **Text, a verse or passage from scripture used in expository preachin ...
\begin\hline \text \\ f_n \\ \hline\end \xrightarrow mathrm\begin\hline \text \\ p(y, x) \\ \hline\end \xrightarrow mathrm\begin\hline \text \\ g_n \\ \hline\end \xrightarrow mathrm/math> Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel. Let be the
conditional probability In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred. This particular method relies on event B occur ...
distribution function of ''Y'' given ''X''. We will consider to be an inherent fixed property of our communications channel (representing the nature of the ''
noise Noise is unwanted sound considered unpleasant, loud or disruptive to hearing. From a physics standpoint, there is no distinction between noise and desired sound, as both are vibrations through a medium, such as air or water. The difference arise ...
'' of our channel). Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of , the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the ''
signal In signal processing, a signal is a function that conveys information about a phenomenon. Any quantity that can vary over space or time can be used as a signal to share messages between observers. The ''IEEE Transactions on Signal Processing'' ...
'', we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the and is given by: : C = \max_ I(X;Y).\! This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol). For any information rate ''R'' < ''C'' and coding error ''ε'' > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ''ε''; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate ''R'' > ''C'', it is impossible to transmit with arbitrarily small block error. ''
Channel coding In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea is ...
'' is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.


Capacity of particular channel models

* A continuous-time analog communications channel subject to
Gaussian noise Gaussian noise, named after Carl Friedrich Gauss, is a term from signal processing theory denoting a kind of signal noise that has a probability density function (pdf) equal to that of the normal distribution (which is also known as the Gaussia ...
—see
Shannon–Hartley theorem In information theory, the Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding ...
. * A binary symmetric channel (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability ''p''. The BSC has a capacity of bits per channel use, where is the binary entropy function to the base-2 logarithm: :: * A
binary erasure channel In coding theory and information theory, a binary erasure channel (BEC) is a communications channel model. A transmitter sends a bit (a zero or a one), and the receiver either receives the bit correctly, or with some probability P_e receives a me ...
(BEC) with erasure probability ''p'' is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is bits per channel use. ::


Channels with memory and directed information

In practice many channels have memory. Namely, at time i the channel is given by the conditional probability P(y_i, x_i,x_,x_,...,x_1,y_,y_,...,y_1). . It is often more comfortable to use the notation x^i=(x_i,x_,x_,...,x_1) and the channel become P(y_i, x^i,y^). . In such a case the capacity is given by the
mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such ...
rate when there is no feedback available and the
Directed information Directed information, I(X^n\to Y^n) , is an information theory measure that quantifies the information flow from the random process X^n = \ to the random process Y^n = \. The term ''directed information'' was coined by James Massey and is defined ...
rate in the case that either there is feedback or not (if there is no feedback the directed information equals the mutual information).


Applications to other fields


Intelligence uses and secrecy applications

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the
Ultra adopted by British military intelligence in June 1941 for wartime signals intelligence obtained by breaking high-level encrypted enemy radio and teleprinter communications at the Government Code and Cypher School (GC&CS) at Bletchley Park. '' ...
project, breaking the German Enigma machine code and hastening the
end of World War II in Europe The final battle of the European Theatre of World War II continued after the definitive overall surrender of Nazi Germany to the Allies, signed by Field marshal Wilhelm Keitel on 8 May 1945 in Karlshorst, Berlin. After German dictator Adolf H ...
. Shannon himself defined an important concept now called the
unicity distance In cryptography, unicity distance is the length of an original ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute force attack. That is, after trying every possible key, there should be just ...
. Based on the redundancy of the
plaintext In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. Overview With the advent of comp ...
, it attempts to give a minimum amount of
ciphertext In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext ...
necessary to ensure unique decipherability. Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A
brute force attack In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correct ...
can break systems based on asymmetric key algorithms or on most commonly used methods of
symmetric key algorithms Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between th ...
(sometimes called secret key algorithms), such as
block cipher In cryptography, a block cipher is a deterministic algorithm operating on fixed-length groups of bits, called ''blocks''. Block ciphers are specified cryptographic primitive, elementary components in the design of many cryptographic protocols and ...
s. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.
Information theoretic security A cryptosystem is considered to have information-theoretic security (also called unconditional security) if the system is secure against adversaries with unlimited computing resources and time. In contrast, a system which depends on the computatio ...
refers to methods such as the
one-time pad In cryptography, the one-time pad (OTP) is an encryption technique that cannot be cracked, but requires the use of a single-use pre-shared key that is not smaller than the message being sent. In this technique, a plaintext is paired with a ran ...
that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the
key Key or The Key may refer to: Common meanings * Key (cryptography), a piece of information that controls the operation of a cryptography algorithm * Key (lock), device used to control access to places or facilities restricted by a lock * Key (map ...
) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the
Venona project The Venona project was a United States counterintelligence program initiated during World War II by the United States Army's Signal Intelligence Service (later absorbed by the National Security Agency), which ran from February 1, 1943, until Octob ...
was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.


Pseudorandom number generation

Pseudorandom number generator A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generate ...
s are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed
cryptographically secure pseudorandom number generator A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography. It is also loosely kno ...
s, but even they require
random seed A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator. For a seed to be used in a pseudorandom number generator, it does not need to be random. Because of the nature of number gene ...
s external to the software to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is
min-entropy The min-entropy, in information theory, is the smallest of the Rényi family of entropies, corresponding to the most conservative way of measuring the unpredictability of a set of outcomes, as the negative logarithm of the probability of the ''mos ...
, a value related to Shannon entropy through Rényi entropy; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.


Seismic exploration

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and
digital signal processing Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are ...
offer a major improvement of resolution and image clarity over previous analog methods.


Semiotics

Semioticians Semiotics (also called semiotic studies) is the systematic study of sign processes (semiosis) and meaning making. Semiosis is any activity, conduct, or process that involves signs, where a sign is defined as anything that communicates something, ...
Doede Nauta and
Winfried Nöth Winfried Nöth (born September 12, 1944 in Gerolzhofen) is a German linguist and semiotician. After graduating from high school in 1963 in Brunswick, from 1965 to 1969 Nöth studied English, French and Portuguese in Münster, Geneva, Lisbon and ...
both considered
Charles Sanders Peirce Charles Sanders Peirce ( ; September 10, 1839 – April 19, 1914) was an American philosopher, logician, mathematician and scientist who is sometimes known as "the father of pragmatism". Educated as a chemist and employed as a scientist for t ...
as having created a theory of information in his works on semiotics. Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing." Concepts from information theory such as redundancy and code control have been used by semioticians such as
Umberto Eco Umberto Eco (5 January 1932 – 19 February 2016) was an Italian medievalist, philosopher, semiotician, novelist, cultural critic, and political and social commentator. In English, he is best known for his popular 1980 novel ''The Name of the ...
and
Ferruccio Rossi-Landi Ferruccio is an Italian given name derived from the Latin Ferrutio (the name of a 3rd-century Christian saint). It is also used as a surname. People with the name include: Given name A–L *Ferruccio Amendola (1930–2001), Italian actor *Ferruc ...
to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.


Integrated process organization of neural information

Quantitative information theoretic methods have been applied in cognitive science to analyze the integrated process organization of neural information in the context of the
binding problem The consciousness and binding problem is the problem of how objects, background and abstract or emotional features are combined into a single experience. The binding problem refers to the overall encoding of our brain circuits for the combination o ...
in
cognitive neuroscience Cognitive neuroscience is the scientific field that is concerned with the study of the biological processes and aspects that underlie cognition, with a specific focus on the neural connections in the brain which are involved in mental proces ...
. In this context, either an information-theoretical measure, such as (
Gerald Edelman Gerald Maurice Edelman (; July 1, 1929 – May 17, 2014) was an American biologist who shared the 1972 Nobel Prize in Physiology or Medicine for work with Rodney Robert Porter on the immune system. Edelman's Nobel Prize-winning research concern ...
and
Giulio Tononi Giulio Tononi () is a neuroscientist and psychiatrist who holds the David P. White Chair in Sleep Medicine, as well as a Distinguished Chair in Consciousness Science, at the University of Wisconsin. He is best known for his Integrated Informati ...
's functional clustering model and dynamic core hypothesis (DCH)) or (Tononi's
integrated information theory Integrated information theory (IIT) attempts to provide a framework capable of explaining why some physical systems (such as human brains) are conscious, why they feel the particular way they do in particular states (e.g. why our visual field appe ...
(IIT) of consciousness), is defined (on the basis of a reentrant process organization, i.e. the synchronization of neurophysiological activity between groups of neuronal populations), or the measure of the minimization of free energy on the basis of statistical methods (
Karl J. Friston Karl John Friston FRS FMedSci FRSB (born 12 July 1959) is a British neuroscientist and theoretician at University College London. He is an authority on brain imaging and theoretical neuroscience, especially the use of physics-inspired stati ...
's
free energy principle The free energy principle is a mathematical principle in biophysics and cognitive science that provides a formal account of the representational capacities of physical systems: that is, why things that exist look as if they track properties of the ...
(FEP), an information-theoretical measure which states that every adaptive change in a self-organized system leads to a minimization of free energy, and the Bayesian brain hypothesisKirchhoff, M., T. Parr, E. Palacios, K. Friston and J. Kiverstein. (2018). The Markov blankets of life: autonomy, active inference and the free energy principle. Journal of the Royal Society Interface 15: 20170792.).


Miscellaneous applications

Information theory also has applications in
gambling Gambling (also known as betting or gaming) is the wagering of something of value ("the stakes") on a random event with the intent of winning something else of value, where instances of strategy are discounted. Gambling thus requires three el ...
,
black holes A black hole is a region of spacetime where gravity is so strong that nothing, including light or other electromagnetic waves, has enough energy to escape it. The theory of general relativity predicts that a sufficiently compact mass can def ...
, and
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
.


See also

*
Algorithmic probability In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in induct ...
*
Bayesian inference Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, a ...
*
Communication theory Communication theory is a proposed description of communication phenomena, the relationships among them, a storyline describing these relationships, and an argument for these three elements. Communication theory provides a way of talking about a ...
*
Constructor theory Constructor theory is a proposal for a new mode of explanation in fundamental physics in the language of ergodic theory, developed by physicists David Deutsch and Chiara Marletto, at the University of Oxford, since 2012. Constructor theory expre ...
- a generalization of information theory that includes quantum information *
Formal science Formal science is a branch of science studying disciplines concerned with abstract structures described by formal systems, such as logic, mathematics, statistics, theoretical computer science, artificial intelligence, information theory, ga ...
*
Inductive probability Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about ...
*
Info-metrics Info-metrics is an interdisciplinary approach to scientific modeling, inference and efficient information processing. It is the science of modeling, reasoning, and drawing inferences under conditions of noisy and limited information. From the po ...
*
Minimum message length Minimum message length (MML) is a Bayesian information-theoretic method for statistical model comparison and selection. It provides a formal information theory restatement of Occam's Razor: even when models are equal in their measure of fit-accurac ...
*
Minimum description length Minimum Description Length (MDL) is a model selection principle where the shortest description of the data is the best model. MDL methods learn through a data compression perspective and are sometimes described as mathematical applications of Occam ...
* List of important publications *
Philosophy of information The philosophy of information (PI) is a branch of philosophy that studies topics relevant to information processing, representational system and consciousness, cognitive science, computer science, information science and information technology. ...


Applications

* Active networking *
Cryptanalysis Cryptanalysis (from the Greek ''kryptós'', "hidden", and ''analýein'', "to analyze") refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic sec ...
*
Cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
*
Cybernetics Cybernetics is a wide-ranging field concerned with circular causality, such as feedback, in regulatory and purposive systems. Cybernetics is named after an example of circular causal feedback, that of steering a ship, where the helmsperson m ...
*
Entropy in thermodynamics and information theory The mathematical expressions for thermodynamic entropy in the statistical thermodynamics formulation established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s are similar to the information entropy by Claude Shannon and Ralph Hartley, develo ...
*
Gambling Gambling (also known as betting or gaming) is the wagering of something of value ("the stakes") on a random event with the intent of winning something else of value, where instances of strategy are discounted. Gambling thus requires three el ...
*
Intelligence (information gathering) Intelligence assessment, or simply intel, is the development of behavior forecasts or recommended courses of action to the leadership of an organisation, based on wide ranges of available overt and covert information (intelligence). Assessments d ...
*
Seismic exploration Reflection seismology (or seismic reflection) is a method of exploration geophysics that uses the principles of seismology to estimate the properties of the Earth's subsurface from reflected seismic waves. The method requires a controlled seismi ...


History

* Hartley, R.V.L. *
History of information theory The decisive event which established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the ''Bell System Tech ...
* Shannon, C.E. *
Timeline of information theory A timeline of events related to  information theory,  quantum information theory and statistical physics,  data compression,  error correcting codes and related subjects. * 1872 – Ludwig Boltzmann presents his H-theorem, an ...
* Yockey, H.P.


Theory

*
Coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are stud ...
*
Detection theory Detection theory or signal detection theory is a means to measure the ability to differentiate between information-bearing patterns (called stimulus in living organisms, signal in machines) and random patterns that distract from the information (ca ...
*
Estimation theory Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their valu ...
*
Fisher information In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
*
Information algebra The term "information algebra" refers to mathematical techniques of information processing. Classical information theory goes back to Claude Shannon. It is a theory of information transmission, looking at communication and storage. However, it has ...
*
Information asymmetry In contract theory and economics, information asymmetry deals with the study of decisions in transactions where one party has more or better information than the other. Information asymmetry creates an imbalance of power in transactions, which ca ...
*
Information field theory Information field theory (IFT) is a Bayesian statistical field theory relating to signal reconstruction, cosmography, and other related areas. IFT summarizes the information available on a physical field using Bayesian probabilities. It uses comput ...
*
Information geometry Information geometry is an interdisciplinary field that applies the techniques of differential geometry to study probability theory and statistics. It studies statistical manifolds, which are Riemannian manifolds whose points correspond to prob ...
*
Information theory and measure theory This article discusses how information theory (a branch of mathematics studying the transmission, processing and storage of information) is related to measure theory (a branch of mathematics related to integration and probability). Measures in ...
*
Kolmogorov complexity In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produ ...
* List of unsolved problems in information theory *
Logic of information The logic of information, or the logical theory of information, considers the information content of logical signs and expressions along the lines initially developed by Charles Sanders Peirce. In this line of work, the concept of information serve ...
*
Network coding In computer networking, linear network coding is a program in which intermediate nodes transmit data from source nodes to sink nodes by means of linear combinations. Linear network coding may be used to improve a network's throughput, efficiency, ...
*
Philosophy of information The philosophy of information (PI) is a branch of philosophy that studies topics relevant to information processing, representational system and consciousness, cognitive science, computer science, information science and information technology. ...
*
Quantum information science Quantum information science is an interdisciplinary field that seeks to understand the analysis, processing, and transmission of information using quantum mechanics principles. It combines the study of Information science with quantum effects in p ...
*
Source coding In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...


Concepts

*
Ban (unit) The hartley (symbol Hart), also called a ban, or a dit (short for decimal digit), is a logarithmic unit that measures information or entropy, based on base 10 logarithms and powers of 10. One hartley is the information content of an event if th ...
* Channel capacity *
Communication channel A communication channel refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel in telecommunications and computer networking. A channel is used for informa ...
*
Communication source A source or sender is one of the basic concepts of communication and information processing. Sources are objects which encode message data and transmit the information, via a channel, to one or more observers (or receivers). In the strictest ...
*
Conditional entropy In information theory, the conditional entropy quantifies the amount of information needed to describe the outcome of a random variable Y given that the value of another random variable X is known. Here, information is measured in shannons, na ...
*
Covert channel In computer security, a covert channel is a type of attack that creates a capability to transfer information objects between processes that are not supposed to be allowed to communicate by the computer security policy. The term, originated in 197 ...
*
Data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
* Decoder *
Differential entropy Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continu ...
* Fungible information * Information fluctuation complexity *
Information entropy In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet \ ...
*
Joint entropy In information theory, joint entropy is a measure of the uncertainty associated with a set of variables. Definition The joint Shannon entropy (in bits) of two discrete random variables X and Y with images \mathcal X and \mathcal Y is defined ...
*
Kullback–Leibler divergence In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_\text(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' is different fro ...
*
Mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such ...
*
Pointwise mutual information In statistics, probability theory and information theory, pointwise mutual information (PMI), or point mutual information, is a measure of association. It compares the probability of two events occurring together to what this probability would b ...
(PMI) *
Receiver (information theory) The receiver in information theory is the receiving end of a communication channel. It receives decoded messages/information from the sender, who first encoded them. Sometimes the receiver is modeled so as to include the decoder. Real-world receiver ...
* Redundancy * Rényi entropy *
Self-information In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative ...
*
Unicity distance In cryptography, unicity distance is the length of an original ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute force attack. That is, after trying every possible key, there should be just ...
*
Variety Variety may refer to: Arts and entertainment Entertainment formats * Variety (radio) * Variety show, in theater and television Films * ''Variety'' (1925 film), a German silent film directed by Ewald Andre Dupont * ''Variety'' (1935 film), ...
*
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chan ...


References


Further reading


The classic work

* Shannon, C.E. (1948), "
A Mathematical Theory of Communication "A Mathematical Theory of Communication" is an article by mathematician Claude E. Shannon published in '' Bell System Technical Journal'' in 1948. It was renamed ''The Mathematical Theory of Communication'' in the 1949 book of the same name, a sm ...
", ''Bell System Technical Journal'', 27, pp. 379–423 & 623–656, July & October, 1948
PDF.


* R.V.L. Hartley
"Transmission of Information"
''Bell System Technical Journal'', July 1928 *
Andrey Kolmogorov Andrey Nikolaevich Kolmogorov ( rus, Андре́й Никола́евич Колмого́ров, p=ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf, a=Ru-Andrey Nikolaevich Kolmogorov.ogg, 25 April 1903 – 20 October 1987) was a Sovi ...
(1968),
Three approaches to the quantitative definition of information
in International Journal of Computer Mathematics.


Other journal articles

* J. L. Kelly, Jr.
Princeton
"A New Interpretation of Information Rate" ''Bell System Technical Journal'', Vol. 35, July 1956, pp. 917–26. * R. Landauer
IEEE.org
"Information is Physical" ''Proc. Workshop on Physics and Computation PhysComp'92'' (IEEE Comp. Sci.Press, Los Alamitos, 1993) pp. 1–4. * *


Textbooks on information theory

* Arndt, C. ''Information Measures, Information and its Description in Science and Engineering'' (Springer Series: Signals and Communication Technology), 2004, * Ash, RB. ''Information Theory''. New York: Interscience, 1965. . New York: Dover 1990. * Gallager, R. ''Information Theory and Reliable Communication.'' New York: John Wiley and Sons, 1968. * Goldman, S. ''Information Theory''. New York: Prentice Hall, 1953. New York: Dover 1968 , 2005 * * Csiszar, I, Korner, J. ''Information Theory: Coding Theorems for Discrete Memoryless Systems'' Akademiai Kiado: 2nd edition, 1997. * MacKay, David J. C.
Information Theory, Inference, and Learning Algorithms
' Cambridge: Cambridge University Press, 2003. * Mansuripur, M. ''Introduction to Information Theory''. New York: Prentice Hall, 1987. * McEliece, R. ''The Theory of Information and Coding''. Cambridge, 2002. * Pierce, JR. "An introduction to information theory: symbols, signals and noise". Dover (2nd Edition). 1961 (reprinted by Dover 1980). * Reza, F. ''An Introduction to Information Theory''. New York: McGraw-Hill 1961. New York: Dover 1994. * * Stone, JV. Chapter 1 of boo
"Information Theory: A Tutorial Introduction"
University of Sheffield, England, 2014. . * Yeung, RW.
A First Course in Information Theory
' Kluwer Academic/Plenum Publishers, 2002. . * Yeung, RW.
Information Theory and Network Coding
' Springer 2008, 2002.


Other books

* Leon Brillouin, ''Science and Information Theory'', Mineola, N.Y.: Dover, 956, 19622004. *
James Gleick James Gleick (; born August 1, 1954) is an American author and historian of science whose work has chronicled the cultural impact of modern technology. Recognized for his writing about complex subjects through the techniques of narrative nonficti ...
, '' The Information: A History, a Theory, a Flood'', New York: Pantheon, 2011. * A. I. Khinchin, ''Mathematical Foundations of Information Theory'', New York: Dover, 1957. * H. S. Leff and A. F. Rex, Editors, ''Maxwell's Demon: Entropy, Information, Computing'', Princeton University Press, Princeton, New Jersey (1990). *
Robert K. Logan __NOTOC__ Robert K. Logan (born August 31, 1939), originally trained as a physicist, is a media ecologist. Career He received from MIT a BS in 1961 and a PhD in 1965 under the supervision of Francis E. Low. After two post-doctoral appointments ...
. ''What is Information? - Propagating Organization in the Biosphere, the Symbolosphere, the Technosphere and the Econosphere'', Toronto: DEMO Publishing. * Tom Siegfried, ''The Bit and the Pendulum'', Wiley, 2000. *
Charles Seife Charles Seife is an American author and journalist, and a professor at New York University. He has written extensively on scientific and mathematical topics. Career Seife holds a mathematics degree from Princeton University (1993),Greenwood, Kath ...
, ''
Decoding the Universe ''Decoding the Universe: How the New Science of Information Is Explaining Everything in the Cosmos, from Our Brains to Black Holes'' is the third non-fiction book by American author and journalist Charles Seife. The book was initially published ...
'', Viking, 2006. * Jeremy Campbell, ''
Grammatical Man ''Grammatical Man: Information, Entropy, Language, and Life'' is a 1982 book written by the Evening Standard's Washington correspondent, Jeremy Campbell. The book touches on topics of probability, Information Theory, cybernetics, genetics and l ...
'', Touchstone/Simon & Schuster, 1982, * Henri Theil, ''Economics and Information Theory'', Rand McNally & Company - Chicago, 1967. * Escolano, Suau, Bonev,
Information Theory in Computer Vision and Pattern Recognition
', Springer, 2009. * Vlatko Vedral, ''Decoding Reality: The Universe as Quantum Information'', Oxford University Press 2010.


External links

* * Lambert F. L. (1999),

, ''Journal of Chemical Education''
IEEE Information Theory Society
an
ITSOC Monographs, Surveys, and Reviews
{{DEFAULTSORT:Information Theory Claude Shannon Computer-related introductions in 1948 Cybernetics Formal sciences History of logic History of mathematics Information Age