AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory.
AIXI was first proposed by Marcus Hutter in 2000, and several results regarding AIXI are proved in Hutter's 2005 book ''Universal Artificial Intelligence''.
AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
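The weighting idea can be illustrated with a small, purely illustrative sketch. In the toy Python fragment below, the "programs" are just a few hand-written predictors with assumed code lengths; the names and numbers are made up, and real AIXI sums over all programs of a universal Turing machine, which cannot actually be computed.

<syntaxhighlight lang="python">
# Toy illustration of AIXI's Occam-style weighting (not AIXI itself).
# Each "hypothesis" is a hand-written environment model with an assumed
# code length in bits; its prior weight is 2**(-length), so shorter
# hypotheses dominate the mixture.

ACTIONS = ["left", "right"]

hypotheses = [
    {"length": 3, "predict": lambda a: 1.0 if a == "left" else 0.0},
    {"length": 5, "predict": lambda a: 0.0 if a == "left" else 1.0},
    {"length": 7, "predict": lambda a: 0.5},
]

def expected_reward(action):
    """Weighted average of the rewards predicted by all toy hypotheses."""
    weights = [2.0 ** -h["length"] for h in hypotheses]
    total = sum(w * h["predict"](action) for w, h in zip(weights, hypotheses))
    return total / sum(weights)  # normalise to a proper expectation

best_action = max(ACTIONS, key=expected_reward)
print(best_action)  # "left": the shortest hypothesis dominates the weighted sum
</syntaxhighlight>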
Definition
AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment <math>\mu</math>. The interaction proceeds in time steps, from <math>t=1</math> to <math>t=m</math>, where <math>m \in \mathbb{N}</math> is the lifespan of the AIXI agent. At time step ''t'', the agent chooses an action <math>a_t \in \mathcal{A}</math> (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" <math>e_t \in \mathcal{E}</math>, which consists of an "observation" <math>o_t \in \mathcal{O}</math> (e.g., a camera image) and a reward <math>r_t \in \mathbb{R}</math>, distributed according to the conditional probability <math>\mu(o_t r_t \mid a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t)</math>, where <math>a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t</math> is the "history" of actions, observations and rewards. The environment <math>\mu</math> is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the ''full'' history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is ''unknown'' to the AIXI agent. Furthermore, note again that <math>\mu</math> is computable, that is, the observations and rewards received by the agent from the environment <math>\mu</math> can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.
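As a schematic of this interaction protocol, the following Python sketch uses a made-up toy environment whose percepts may depend on the entire history, so there is no Markov assumption; the names and the environment itself are illustrative only, not part of the formalism.

<syntaxhighlight lang="python">
import random

def toy_environment(history, action):
    """Hypothetical computable environment: the percept may depend on the
    whole history of (action, observation, reward) triples, not just the
    most recent one, so there is no Markov assumption."""
    observation = len(history) % 2               # depends on how long the history is
    reward = 1.0 if action == observation else 0.0
    return observation, reward

def random_policy(history):
    """Placeholder policy; AIXI instead uses the expectimax expression below."""
    return random.choice([0, 1])

history = []                                     # the history h_1 ... h_{t-1}
for t in range(1, 11):                           # a finite lifespan, here m = 10
    action = random_policy(history)
    observation, reward = toy_environment(history, action)
    history.append((action, observation, reward))

print(sum(r for _, _, r in history))             # the total reward the agent tries to maximise
</syntaxhighlight>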
The ''only'' goal of the AIXI agent is to maximise <math>\sum_{t=1}^m r_t</math>, that is, the sum of rewards from time step 1 to ''m''.
The AIXI agent is associated with a stochastic policy <math>\pi : (\mathcal{A} \times \mathcal{E})^* \rightarrow \mathcal{A}</math>, which is the function it uses to choose actions at every time step, where <math>\mathcal{A}</math> is the space of all possible actions that AIXI can take and <math>\mathcal{E}</math> is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) <math>\mu</math> can also be thought of as a stochastic policy (which is a function): <math>\mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \rightarrow \mathcal{E}</math>, where the <math>*</math> is the Kleene star operation.
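In programming terms, these two function types can be sketched roughly as follows; the type names are illustrative only and are not taken from Hutter's book.

<syntaxhighlight lang="python">
from typing import Callable, Sequence, Tuple

Action = int                                 # an element of the action space A
Percept = Tuple[int, float]                  # (observation, reward), an element of E
History = Sequence[Tuple[Action, Percept]]   # an element of (A x E)*, the Kleene star

# pi : (A x E)* -> A   (written here as a deterministic function for simplicity,
# although the policy in the article is allowed to be stochastic)
Policy = Callable[[History], Action]

# mu : (A x E)* x A -> E   (likewise shown as a deterministic stand-in)
Environment = Callable[[History, Action], Percept]
</syntaxhighlight>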
In general, at time step <math>t</math> (which ranges from 1 to ''m''), AIXI, having previously executed actions <math>a_1 a_2 \ldots a_{t-1}</math> (which is often abbreviated in the literature as <math>a_{<t}</math>) and having observed the history of percepts <math>e_1 e_2 \ldots e_{t-1}</math> (which can be abbreviated as <math>e_{<t}</math>), chooses and executes in the environment the action, <math>a_t</math>, defined as follows

:<math>a_t := \arg\max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}</math>

or, using parentheses, to disambiguate the precedences

:<math>a_t := \arg\max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} \right) \right) \right)</math>
Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to <math>m - t</math> time steps ahead (that is, from <math>t</math> to <math>m</math>), weighs each of them by the complexity of programs <math>q</math> (that is, by <math>2^{-\textrm{length}(q)}</math>) consistent with the agent's past (that is, the previously executed actions, <math>a_{<t}</math>, and received percepts, <math>e_{<t}</math>) that can generate that future, and then picks the action that maximises expected future rewards.
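The alternating structure of the formula (maximisation over the next action, weighted summation over the next percept, and so on up to the horizon) can be sketched for a toy setting in the following Python fragment. The "programs" here are two hand-written deterministic models with made-up code lengths, and the whole fragment is only a schematic of the shape of the computation; it is not an implementation of AIXI, which is incomputable.

<syntaxhighlight lang="python">
# Schematic of the max/sum alternation in AIXI's definition (toy setting).

ACTIONS = [0, 1]
PERCEPTS = [(0, 0.0), (0, 1.0), (1, 0.0), (1, 1.0)]      # (observation, reward) pairs

# Hypothetical deterministic environment programs q: action history -> percept.
# Their assumed lengths give the complexity weights 2**(-length).
programs = [
    {"length": 2, "run": lambda acts: (acts[-1], float(acts[-1]))},          # rewards action 1
    {"length": 4, "run": lambda acts: (1 - acts[-1], float(1 - acts[-1]))},  # rewards action 0
]

def consistent(q, actions, percepts):
    """Does program q reproduce the given percept sequence from the actions?"""
    return all(q["run"](actions[:i + 1]) == percepts[i] for i in range(len(percepts)))

def action_value(actions, percepts, a, horizon):
    """Sum over possible next percepts, each weighted by the total complexity
    weight 2**(-length) of the programs still consistent with the history."""
    total = 0.0
    for p in PERCEPTS:
        weight = sum(2.0 ** -q["length"] for q in programs
                     if consistent(q, actions + [a], percepts + [p]))
        if weight > 0.0:
            reward = p[1]
            total += weight * (reward + value(actions + [a], percepts + [p], horizon - 1))
    return total

def value(actions, percepts, horizon):
    """Expectimax value: alternate max over actions and weighted sum over percepts."""
    if horizon == 0:
        return 0.0
    return max(action_value(actions, percepts, a, horizon) for a in ACTIONS)

def choose_action(actions, percepts, horizon):
    """a_t := the action whose weighted expected future reward is largest."""
    return max(ACTIONS, key=lambda a: action_value(actions, percepts, a, horizon))

print(choose_action([], [], horizon=2))   # prints 1: the shorter program rewards action 1
</syntaxhighlight>

With only two toy programs the mixture is dominated by the shorter one, which mirrors the role Occam's razor plays in the full definition.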
Let us break this definition down in order to attempt to fully understand it.

<math>o_t r_t</math> is the "percept" (which consists of the observation <math>o_t</math> and reward <math>r_t</math>) received by the AIXI agent at time step <math>t</math> from the environment (which is unknown and stochastic). Similarly, <math>o_m r_m</math> is the percept received by AIXI at time step <math>m</math> (the last time step where AIXI is active).

<math>r_t + \ldots + r_m</math> is the sum of rewards from time step <math>t</math> to time step <math>m</math>, so AIXI needs to look into the future to choose its action at time step <math>t</math>.
<math>U</math> denotes a monotone universal Turing machine, and <math>q</math> ranges over all (deterministic) programs on the universal machine <math>U</math>, which receives as input the program <math>q</math> and the sequence of actions <math>a_1 \ldots a_m</math> (that is, all actions), and produces the sequence of percepts <math>o_1 r_1 \ldots o_m r_m</math>. The universal Turing machine <math>U</math> is thus used to "simulate" or compute the environment responses or percepts, given the program <math>q</math> (which "models" the environment) and all actions of the AIXI agent: in this sense, the environment is "computable" (as stated above). Note that, in general, the program which "models" the ''current'' and actual environment (where AIXI needs to act) is unknown because the current environment is also unknown.
<math>\textrm{length}(q)</math> is the length of the program <math>q</math> (which is encoded as a string of bits). Note that <math>2^{-\textrm{length}(q)} = \frac{1}{2^{\textrm{length}(q)}}</math>. Hence, in the definition above, <math>\sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}</math> should be interpreted as a mixture (in this case, a sum) over all computable environments (which are consistent with the agent's past), each weighted by its complexity <math>2^{-\textrm{length}(q)}</math>. Note that <math>a_1 \ldots a_m</math> can also be written as <math>a_1 \ldots a_{t-1} a_t \ldots a_m</math>, and <math>a_1 \ldots a_{t-1} = a_{<t}</math> is the sequence of actions already executed in the environment by the AIXI agent. Similarly, <math>o_1 r_1 \ldots o_m r_m = o_1 r_1 \ldots o_{t-1} r_{t-1} o_t r_t \ldots o_m r_m</math>, and <math>o_1 r_1 \ldots o_{t-1} r_{t-1}</math> is the sequence of percepts produced by the environment so far.
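As a small worked example of how quickly these weights fall off, a 3-bit program and a 10-bit program receive the prior weights

:<math>2^{-3} = \tfrac{1}{8} = 0.125 \qquad \text{and} \qquad 2^{-10} = \tfrac{1}{1024} \approx 0.001,</math>

so every additional bit of description length halves a hypothesis' weight; for a prefix-free encoding of programs, the Kraft inequality guarantees that the weights of all programs sum to at most 1.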
Let us now put all these components together in order to understand this equation or definition. At time step ''t'', AIXI chooses the action <math>a_t</math> where the function

:<math>\sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}</math>

attains its maximum.
Parameters
The parameters to AIXI are the universal Turing machine ''U'' and the agent's lifetime ''m'', which need to be chosen. The latter parameter can be removed by the use of discounting.
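For example, one common form of discounting (a geometric discount, used here only as an illustration; Hutter's book also treats more general discount sequences) replaces the finite-horizon sum of rewards by an infinite discounted sum,

:<math>\sum_{k=t}^{m} r_k \;\longrightarrow\; \sum_{k=t}^{\infty} \gamma^{k-t} r_k, \qquad 0 < \gamma < 1,</math>

so that the lifetime ''m'' no longer needs to be fixed in advance.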
The meaning of the word AIXI
According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by <math>\xi</math> (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations.
Optimality
AIXI's performance is measured by the expected total number of rewards it receives.
AIXI has been proven to be optimal in the following ways.
* Pareto optimality: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment (a formal restatement in value-function notation is sketched after this list).
* Balanced Pareto optimality: like Pareto optimality, but considering a weighted sum of environments.
* Self-optimizing: a policy ''p'' is called self-optimizing for an environment <math>\mu</math> if the performance of ''p'' approaches the theoretical maximum for <math>\mu</math> when the length of the agent's lifetime (not time) goes to infinity. For environment classes where self-optimizing policies exist, AIXI is self-optimizing.
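Written out roughly in value-function notation (a paraphrase; the precise statement and notation are those of Hutter's book), Pareto optimality of AIXI's policy <math>\pi^{AIXI}</math> over a class of environments <math>\mathcal{M}</math> says that there is no policy <math>\pi</math> with

:<math>V^{\pi}_{\mu} \geq V^{\pi^{AIXI}}_{\mu} \text{ for all } \mu \in \mathcal{M} \qquad \text{and} \qquad V^{\pi}_{\mu} > V^{\pi^{AIXI}}_{\mu} \text{ for at least one } \mu \in \mathcal{M},</math>

where <math>V^{\pi}_{\mu}</math> denotes the expected total reward of policy <math>\pi</math> in environment <math>\mu</math>.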
It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.
However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable.
Computational aspects
Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXI''tl'', which performs at least as well as the provably best time ''t'' and space ''l'' limited agent.
Another approximation to AIXI with a restricted environment class is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.<ref>Playing Pacman using AIXI Approximation – YouTube</ref>
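The "Monte Carlo" part of such approximations can be illustrated, very loosely, by estimating each action's value from sampled rollouts under an environment model. The sketch below is not the MC-AIXI (FAC-CTW) algorithm, which combines a context-tree-weighting model with a tree search; it only shows the generic rollout idea, with a made-up toy model in place of the learned one.

<syntaxhighlight lang="python">
import random

ACTIONS = [0, 1]

def toy_model(history, action):
    """Stand-in for a learned environment model (MC-AIXI uses a
    context-tree-weighting mixture instead). Returns (observation, reward)."""
    observation = random.randint(0, 1)
    reward = 1.0 if action == 1 else 0.2    # this toy world simply prefers action 1
    return observation, reward

def rollout_return(history, first_action, depth=5):
    """Total reward of one sampled rollout that starts with first_action."""
    h, total, action = list(history), 0.0, first_action
    for _ in range(depth):
        observation, reward = toy_model(h, action)
        total += reward
        h.append((action, observation, reward))
        action = random.choice(ACTIONS)     # random rollout policy
    return total

def monte_carlo_action(history, samples=200):
    """Pick the action with the highest average sampled return."""
    averages = {a: sum(rollout_return(history, a) for _ in range(samples)) / samples
                for a in ACTIONS}
    return max(averages, key=averages.get)

print(monte_carlo_action(history=[]))       # almost always prints 1 in this toy world
</syntaxhighlight>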
See also
* Gödel machine – a hypothetical self-improving computer program that rewrites its own code when it can prove that the new code provides a better strategy
References
* "Universal Algorithmic Intelligence: A mathematical top->down approach", Marcus Hutter, ; also in ''Artificial General Intelligence'', eds. B. Goertzel and C. Pennachin, Springer, 2007, , pp. 227–290, {{doi, 10.1007/978-3-540-68677-4_8.