Universal artificial intelligence

AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000, and several results regarding AIXI are proved in Hutter's 2005 book ''Universal Artificial Intelligence''. AIXI is a reinforcement learning (RL) agent: it maximizes the expected total reward received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how much reward that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
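The length-based weighting can be illustrated with a toy calculation (a minimal sketch; the model names and bit lengths below are hypothetical, standing in for actual binary programs on a universal Turing machine):

```python
# Toy illustration of the Occam prior 2^-length used by AIXI.
# Hypothetical program lengths, in bits, for three candidate
# environment models consistent with the agent's history so far.
lengths = {"model_a": 5, "model_b": 8, "model_c": 12}

# Each program q gets prior weight 2^-length(q); shorter programs
# (simpler hypotheses) receive exponentially more weight.
weights = {name: 2.0 ** -l for name, l in lengths.items()}

total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}
for name, p in sorted(posterior.items()):
    print(f"{name}: {p:.3f}")
```

The shortest hypothesis dominates the mixture, but longer ones retain nonzero weight, so no computable environment is ever ruled out a priori.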


Definition

AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment \mu. The interaction proceeds in time steps, from t=1 to t=m, where m \in \mathbb{N} is the lifespan of the AIXI agent. At time step ''t'', the agent chooses an action a_t \in \mathcal{A} (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" e_t \in \mathcal{E} = \mathcal{O} \times \mathbb{R}, which consists of an "observation" o_t \in \mathcal{O} (e.g., a camera image) and a reward r_t \in \mathbb{R}, distributed according to the conditional probability \mu(o_t r_t \mid a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t), where a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t is the "history" of actions, observations and rewards. The environment \mu is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depends on the ''full'' history, so there is no Markov assumption (as opposed to other RL algorithms). Note that this probability distribution is ''unknown'' to the AIXI agent, but that \mu is computable: the observations and rewards received by the agent from the environment \mu can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The ''only'' goal of the AIXI agent is to maximise \sum_{t=1}^m r_t, that is, the sum of rewards from time step 1 to m. The AIXI agent is associated with a stochastic policy \pi : (\mathcal{A} \times \mathcal{E})^* \rightarrow \mathcal{A}, which is the function it uses to choose actions at every time step, where \mathcal{A} is the space of all possible actions that AIXI can take and \mathcal{E} is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) \mu can also be thought of as a stochastic policy (which is a function): \mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \rightarrow \mathcal{E}, where the * is the
Kleene star operation. In general, at time step t (which ranges from 1 to m), AIXI, having previously executed actions a_1 \ldots a_{t-1} (often abbreviated in the literature as a_{<t}) and having observed the history of percepts o_1 r_1 \ldots o_{t-1} r_{t-1} (which can be abbreviated as e_{<t}), chooses and executes in the environment the action a_t, defined as follows:

a_t := \arg \max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q : \, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}

or, using parentheses to disambiguate the precedences:

a_t := \arg \max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q : \, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} \right) \right) \right)

Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to m - t time steps ahead (that is, from t to m), weighs each of them by the complexity of the programs q consistent with the agent's past (that is, the previously executed actions a_{<t} and received percepts e_{<t}) that can generate that future, namely by 2^{-\textrm{length}(q)}, and then picks the action that maximises expected future rewards.

Let us break this definition down in order to understand it fully. o_t r_t is the "percept" (which consists of the observation o_t and reward r_t) received by the AIXI agent at time step t from the environment (which is unknown and stochastic). Similarly, o_m r_m is the percept received by AIXI at time step m (the last time step where AIXI is active). r_t + \ldots + r_m is the sum of rewards from time step t to time step m, so AIXI needs to look into the future to choose its action at time step t. U denotes a
monotone universal Turing machine, and q ranges over all (deterministic) programs on the universal machine U, which receives as input the program q and the sequence of actions a_1 \ldots a_m (that is, all actions), and produces the sequence of percepts o_1 r_1 \ldots o_m r_m. The universal Turing machine U is thus used to "simulate" or compute the environment responses or percepts, given the program q (which "models" the environment) and all actions of the AIXI agent: in this sense, the environment is "computable" (as stated above). Note that, in general, the program which "models" the ''current'' and actual environment (where AIXI needs to act) is unknown because the current environment is also unknown. \textrm{length}(q) is the length of the program q (which is encoded as a string of bits). Note that 2^{-\textrm{length}(q)} = \frac{1}{2^{\textrm{length}(q)}}. Hence, in the definition above, \sum_{q : \, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} should be interpreted as a
mixture (in this case, a sum) over all computable environments (which are consistent with the agent's past), each weighted by its complexity 2^{-\textrm{length}(q)}. Note that a_1 \ldots a_m can also be written as a_1 \ldots a_{t-1} a_t \ldots a_m, where a_1 \ldots a_{t-1} = a_{<t} is the sequence of actions already executed in the environment by the AIXI agent. Similarly, o_1 r_1 \ldots o_m r_m = o_1 r_1 \ldots o_{t-1} r_{t-1} o_t r_t \ldots o_m r_m, where o_1 r_1 \ldots o_{t-1} r_{t-1} is the sequence of percepts produced by the environment so far. Putting all these components together: at time step t, AIXI chooses the action a_t at which the function \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q : \, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} attains its maximum.
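The expectimax expression above cannot be evaluated over all programs, but its structure can be sketched on a hand-picked finite class of toy environments. In the sketch below (all names, program lengths, and the two-action, two-step setting are invented for illustration), the environments are deterministic, so the sums over percepts collapse and only the max over actions and the 2^{-\textrm{length}(q)}-weighted mixture remain:

```python
# Hypothetical finite stand-in for "all programs q": each toy
# environment deterministically maps an action sequence to rewards,
# and carries an invented code length in bits for the 2^-length prior.
def env_copy(actions):      # rewards the agent for playing action 1
    return [1.0 if a == 1 else 0.0 for a in actions]

def env_contrary(actions):  # rewards action 0 instead
    return [1.0 if a == 0 else 0.0 for a in actions]

ENVS = [(env_copy, 3), (env_contrary, 7)]  # (program, length in bits)
ACTIONS = (0, 1)
HORIZON = 2  # plan from t=1 to m=2

def value(action_prefix):
    """Expectimax value of a sequence of committed actions."""
    t = len(action_prefix)
    if t == HORIZON:
        # Weigh total reward of each environment by its prior 2^-length.
        return sum(2.0 ** -l * sum(env(list(action_prefix)))
                   for env, l in ENVS)
    # Max over the next action (deterministic envs: no sum over percepts).
    return max(value(action_prefix + (a,)) for a in ACTIONS)

best_action = max(ACTIONS, key=lambda a: value((a,)))
print(best_action)
```

Running this selects action 1, because the shorter, and hence exponentially higher-weighted, toy environment rewards it.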


Parameters

The parameters to AIXI are the universal Turing machine ''U'' and the agent's lifetime ''m'', which need to be chosen. The latter parameter can be removed by the use of discounting.
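For instance (a sketch of one standard construction; the discount factor \gamma is a free parameter here, not something fixed by AIXI itself), geometric discounting replaces the finite-horizon sum of rewards with an infinite but convergent one, so no lifetime m has to be chosen:

```latex
% Finite-horizon objective (the lifetime m must be chosen):
\sum_{k=t}^{m} r_k
% Geometrically discounted objective (no lifetime parameter):
\sum_{k=t}^{\infty} \gamma^{k-t} r_k , \qquad 0 < \gamma < 1
```

With bounded rewards the discounted sum converges, which is what allows the horizon parameter to be dropped.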


The meaning of the word AIXI

According to Hutter, the word "AIXI" can have several interpretations. It can stand for AI based on Solomonoff's distribution, denoted by \xi (the Greek letter xi), or it can stand for AI "crossed" (X) with induction (I). There are other interpretations.


Optimality

AIXI's performance is measured by the expected total number of rewards it receives. AIXI has been proven to be optimal in the following ways.

* Pareto optimality: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment.
* Balanced Pareto optimality: like Pareto optimality, but considering a weighted sum of environments.
* Self-optimizing: a policy ''p'' is called self-optimizing for an environment \mu if the performance of ''p'' approaches the theoretical maximum for \mu when the length of the agent's lifetime (not time) goes to infinity. For environment classes where self-optimizing policies exist, AIXI is self-optimizing.

It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.

However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable.


Computational aspects

Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXI''tl'', which performs at least as well as the provably best time-''t'' and space-''l'' limited agent. Another approximation to AIXI, with a restricted environment class, is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI with FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man ("Playing Pacman using AIXI Approximation", YouTube).
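The Monte Carlo half of such approximations can be sketched as a simple rollout planner (a toy sketch only, not the actual MC-AIXI (FAC-CTW) algorithm, which learns its environment model with context-tree weighting; the model stub and its dynamics here are invented):

```python
import random

# Invented stub standing in for a learned environment model: given a
# history of actions, sample a reward for the next action.
def sample_reward(history, action, rng):
    # Toy dynamics: action 1 is better on average.
    return rng.random() + (0.5 if action == 1 else 0.0)

def rollout_value(action, depth, rng, n_rollouts=200):
    """Average total reward of random play after committing to `action`."""
    total = 0.0
    for _ in range(n_rollouts):
        history, ret = [action], sample_reward([], action, rng)
        for _ in range(depth - 1):
            a = rng.choice((0, 1))  # uniform random rollout policy
            ret += sample_reward(history, a, rng)
            history.append(a)
        total += ret
    return total / n_rollouts

rng = random.Random(0)
best = max((0, 1), key=lambda a: rollout_value(a, depth=4, rng=rng))
print(best)
```

MC-AIXI replaces the uniform rollout policy and the hand-written stub with a UCT-style tree search over a model learned from interaction, but the estimate-values-by-sampling structure is the same.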


See also

* Gödel machine


References

* Marcus Hutter, "Universal Algorithmic Intelligence: A mathematical top->down approach"; also in ''Artificial General Intelligence'', eds. B. Goertzel and C. Pennachin, Springer, 2007, pp. 227–290, doi:10.1007/978-3-540-68677-4_8.