Directed information, <math>I(X^n \to Y^n)</math>, is an information theory measure that quantifies the information flow from the random process <math>X^n = (X_1, X_2, \dots, X_n)</math> to the random process <math>Y^n = (Y_1, Y_2, \dots, Y_n)</math>. The term ''directed information'' was coined by James Massey and is defined as

:<math>I(X^n \to Y^n) \triangleq \sum_{i=1}^n I(X^i; Y_i | Y^{i-1})</math>,

where <math>I(X^i; Y_i | Y^{i-1})</math> is the conditional mutual information <math>I(X_1, X_2, \dots, X_i; Y_i | Y_1, Y_2, \dots, Y_{i-1})</math>.
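For example, for <math>n = 2</math> the definition expands as

:<math>I(X^2 \to Y^2) = I(X_1; Y_1) + I(X_1, X_2; Y_2 | Y_1)</math>,

whereas the ordinary mutual information expands by the chain rule as <math>I(X^2; Y^2) = I(X_1, X_2; Y_1) + I(X_1, X_2; Y_2 | Y_1)</math>; the directed information conditions each output <math>Y_i</math> only on the inputs up to time <math>i</math>, which is what gives it its causal interpretation.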
Directed information has many applications in problems where causality plays an important role, such as the capacity of channels with feedback, the capacity of discrete memoryless networks with feedback, gambling with causal side information, compression with causal side information, real-time control in communication settings, and statistical physics.
Marko's theory of bidirectional communication
Massey's directed information was motivated by Marko's work on developing a theory of bidirectional communication.
Estimation and optimization
Estimating and optimizing the directed information is challenging because it is a sum of <math>n</math> terms, where <math>n</math> may be large. In many cases, one is interested in optimizing the limiting average, that is, the limit of <math>\tfrac{1}{n} I(X^n \to Y^n)</math> as <math>n</math> grows to infinity, termed a multi-letter expression.
Estimation
The estimation of directed information from given samples is a hard problem since the directed information expression does not depend on the samples directly but on the joint distribution of <math>(X^n, Y^n)</math>, which is unknown. There exist several estimation algorithms based on context tree weighting, on empirical parametric distributions, and on long short-term memory (LSTM) networks.
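As a minimal illustration of the empirical (plug-in) approach, the sketch below assumes that many independent sample blocks of <math>(X^n, Y^n)</math> over small finite alphabets are available; it replaces the unknown joint distribution by its empirical estimate and evaluates the sum of conditional mutual informations directly. The function name and interface are illustrative, and the estimate is only trustworthy when <math>n</math> is small and the number of blocks is large.

<syntaxhighlight lang="python">
import numpy as np
from collections import Counter

def directed_information_plugin(x_blocks, y_blocks):
    """Plug-in estimate of I(X^n -> Y^n) in bits from i.i.d. sample blocks.

    x_blocks, y_blocks: arrays of shape (num_blocks, n) over finite alphabets.
    """
    x_blocks, y_blocks = np.asarray(x_blocks), np.asarray(y_blocks)
    num_blocks, n = x_blocks.shape

    # Empirical joint pmf over whole blocks (x^n, y^n).
    counts = Counter((tuple(x), tuple(y)) for x, y in zip(x_blocks, y_blocks))
    pmf = {key: c / num_blocks for key, c in counts.items()}

    def entropy(project):
        """Entropy (in bits) of the marginal obtained by projecting (x^n, y^n)."""
        marginal = Counter()
        for (x, y), p in pmf.items():
            marginal[project(x, y)] += p
        return -sum(p * np.log2(p) for p in marginal.values() if p > 0)

    # I(X^i; Y_i | Y^{i-1}) = H(X^i, Y^{i-1}) + H(Y^i) - H(X^i, Y^i) - H(Y^{i-1})
    total = 0.0
    for i in range(1, n + 1):
        total += (entropy(lambda x, y: (x[:i], y[:i - 1]))
                  + entropy(lambda x, y: y[:i])
                  - entropy(lambda x, y: (x[:i], y[:i]))
                  - entropy(lambda x, y: y[:i - 1]))
    return total
</syntaxhighlight>

For instance, on blocks generated by a noiseless channel in which <math>Y_i = X_i</math> with i.i.d. inputs, the estimate approaches <math>n</math> times the entropy of a single input symbol, as expected from the definition.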
Optimization
The maximization of the directed information is a fundamental problem in information theory. For a fixed sequence of channel distributions <math>\{P(y_i | x^i, y^{i-1})\}_{i=1}^n</math>, the objective is to optimize the directed information <math>I(X^n \to Y^n)</math> over the channel input distributions <math>\{P(x_i | x^{i-1}, y^{i-1})\}_{i=1}^n</math>. There exist algorithms for optimizing the directed information based on the Blahut–Arimoto algorithm, Markov decision processes, recurrent neural networks, reinforcement learning, and graphical methods (the Q-graphs).
For the Blahut–Arimoto approach, the main idea is to start with the last element of the directed information and go backward. For the Markov decision process approach, the main idea is to transform the optimization into an infinite-horizon average-reward Markov decision process. For the recurrent neural network approach, the main idea is to model the input distribution using a recurrent neural network and to optimize its parameters using gradient descent. For the reinforcement learning approach, the main idea is to solve the Markov decision process formulation of the capacity using reinforcement learning tools, which makes it possible to handle large or even continuous alphabets.
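As a heavily simplified illustration of the Blahut–Arimoto idea, consider the special case of a discrete memoryless channel, for which feedback does not increase capacity and the multi-letter optimization collapses to the classical single-letter problem <math>\max_{P(x)} I(X; Y)</math>. The sketch below (variable names, iteration count, and tolerance are illustrative) implements the classical Blahut–Arimoto iteration; the feedback-capacity algorithms referenced above extend this kind of alternating update to causally conditioned input distributions.

<syntaxhighlight lang="python">
import numpy as np

def blahut_arimoto(W, iters=500, tol=1e-12):
    """Classical Blahut-Arimoto iteration for a discrete memoryless channel.

    W[x, y] = P(y | x).  Returns (capacity in bits, capacity-achieving input pmf).
    """
    num_x, num_y = W.shape
    r = np.full(num_x, 1.0 / num_x)           # input pmf, start from uniform
    for _ in range(iters):
        q = r[:, None] * W                    # unnormalized joint p(x, y)
        q /= q.sum(axis=0, keepdims=True)     # backward channel q(x | y)
        # r_new(x) is proportional to exp( sum_y W(y|x) * log q(x|y) )
        log_r = np.sum(W * np.log(q + 1e-300), axis=1)
        r_new = np.exp(log_r - log_r.max())
        r_new /= r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    # Mutual information I(X; Y) at the resulting input distribution.
    p_y = r @ W
    capacity = sum(r[x] * W[x, y] * np.log2(W[x, y] / p_y[y])
                   for x in range(num_x) for y in range(num_y)
                   if r[x] > 0 and W[x, y] > 0)
    return capacity, r

# Binary symmetric channel with crossover 0.1: capacity is 1 - h(0.1) ~ 0.531 bits.
W_bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(W_bsc))
</syntaxhighlight>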
Causal conditioning
The essence of directed information is the causal conditioning operation, denoted by <math>P(x^n \| y^n)</math>. Recall the notation <math>x^n = (x_1, x_2, \dots, x_n)</math> and <math>y^n = (y_1, y_2, \dots, y_n)</math>. The probability of <math>x^n</math> causally conditioned on <math>y^n</math> is denoted by <math>P(x^n \| y^n)</math> and is defined as

:<math>P(x^n \| y^n) \triangleq \prod_{i=1}^n P(x_i | x^{i-1}, y^{i})</math>.

Note that this is very similar to the chain rule of regular conditioning, <math>P(x^n | y^n) = \prod_{i=1}^n P(x_i | x^{i-1}, y^{n})</math>, where the difference is the conditioning on the sequence <math>y^{i}</math> versus the whole sequence <math>y^{n}</math>. One can also introduce a delay into the causal conditioning, i.e.,

:<math>P(x^n \| y^{n-1}) \triangleq \prod_{i=1}^n P(x_i | x^{i-1}, y^{i-1})</math>.
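For example, for <math>n = 2</math>,

:<math>P(x^2 \| y^2) = P(x_1 | y_1)\, P(x_2 | x_1, y_1, y_2)</math>,

whereas the regular conditional probability is <math>P(x^2 | y^2) = P(x_1 | y_1, y_2)\, P(x_2 | x_1, y_1, y_2)</math>, and the causal conditioning with a unit delay is <math>P(x^2 \| y^{1}) = P(x_1)\, P(x_2 | x_1, y_1)</math>: each <math>x_i</math> is conditioned only on the outputs available up to (or, with the delay, strictly before) time <math>i</math>.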
Properties
The chain rule for causal conditioning is

:<math>P(x^n, y^n) = P(x^n \| y^{n-1})\, P(y^n \| x^{n})</math>.

The chain rule implies two things: first, any joint distribution <math>P(x^n, y^n)</math> can be decomposed into a product of the form <math>P(x^n \| y^{n-1})\, P(y^n \| x^{n})</math>; and second, any product of causally conditioned distributions <math>P(x^n \| y^{n-1})\, P(y^n \| x^{n})</math> results in a joint distribution <math>P(x^n, y^n)</math>.

The causal conditioning probability <math>P(y^n \| x^n)</math> is indeed a probability vector, i.e.,

:<math>P(y^n \| x^n) \ge 0</math>,
:<math>\sum_{y^n} P(y^n \| x^n) = 1</math>.

Directed information can be written in terms of causal conditioning as follows:

:<math>I(X^n \to Y^n) = \mathbf{E}\left[\log \frac{P(Y^n \| X^n)}{P(Y^n)}\right]</math>.
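This identity follows directly from the definition: each conditional mutual information term can be written as an expectation of a log-likelihood ratio,

:<math>I(X^i; Y_i | Y^{i-1}) = \mathbf{E}\left[\log \frac{P(Y_i | X^i, Y^{i-1})}{P(Y_i | Y^{i-1})}\right]</math>,

and, summing over <math>i</math>, the numerators multiply to <math>\prod_{i=1}^n P(Y_i | X^i, Y^{i-1}) = P(Y^n \| X^n)</math> while the denominators multiply to <math>\prod_{i=1}^n P(Y_i | Y^{i-1}) = P(Y^n)</math>, which yields the expression above.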