A learning automaton is one type of

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

algorithm studied since 1970s. Learning automata select their current action based on past experiences from the environment. It will fall into the range of reinforcement learning if the environment is

stochastic Stochastic (, ) refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselv ...

and a Markov decision process (MDP) is used.

History

Research in learning automata can be traced back to the work of

Michael Lvovitch Tsetlin Michael Lvovitch Tsetlin (the surname is also written Cetlin, Tzetlin, Zeitlin, Zetlin; cyrillic: Михаил Львович Цетлин) (22 September 1924 – 30 May 1966) was a Soviet mathematician and physicist who worked on cybernetics. He i ...

in the early 1960s in the Soviet Union. Together with some colleagues, he published a collection of papers on how to use matrices to describe automata functions. Additionally, Tsetlin worked on ''reasonable'' and ''collective automata behaviour'', and on ''automata games''. Learning automata were also investigated by researches in the United States in the 1960s. However, the term ''learning automaton'' was not used until Narendra and Thathachar introduced it in a survey paper in 1974.

Definition

A learning automaton is an adaptive decision-making unit situated in a random environment that learns the optimal action through repeated interactions with its environment. The actions are chosen according to a specific probability distribution which is updated based on the environment response the automaton obtains by performing a particular action. With respect to the field of

reinforcement learning Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine ...

, learning automata are characterized as policy iterators. In contrast to other reinforcement learners, policy iterators directly manipulate the policy π. Another example for policy iterators are

evolutionary algorithm In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduc ...

s. Formally, Narendra and Thathachar define a stochastic automaton to consist of: * a set ''X'' of possible inputs, * a set Φ = of possible internal states, * a set α = of possible outputs, or actions, with ''r'' ≤ ''s'', * an initial state probability vector ''p(0)'' = ≪ ''p''₁(0), ..., ''p''_''s''(0) ≫, * a

computable function Computable functions are the basic objects of study in computability theory. Computable functions are the formalized analogue of the intuitive notion of algorithms, in the sense that a function is computable if there exists an algorithm that can do ...

''A'' which after each time step ''t'' generates ''p''(''t''+1) from ''p''(''t''), the current input, and the current state, and * a function ''G'': Φ → α which generates the output at each time step. In their paper, they investigate only stochastic automata with ''r'' = ''s'' and ''G'' being

bijective In mathematics, a bijection, also known as a bijective function, one-to-one correspondence, or invertible function, is a function between the elements of two sets, where each element of one set is paired with exactly one element of the other s ...

, allowing them to confuse actions and states. The states of such an automaton correspond to the states of a "discrete-state discrete-parameter

Markov process A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...

". At each time step ''t''=0,1,2,3,..., the automaton reads an input from its environment, updates ''p''(''t'') to ''p''(''t''+1) by ''A'', randomly chooses a successor state according to the probabilities ''p''(''t''+1) and outputs the corresponding action. The automaton's environment, in turn, reads the action and sends the next input to the automaton. Frequently, the input set ''X'' = is used, with 0 and 1 corresponding to a ''nonpenalty'' and a ''penalty'' response of the environment, respectively; in this case, the automaton should learn to minimize the number of ''penalty'' responses, and the feedback loop of automaton and environment is called a "P-model". More generally, a "Q-model" allows an arbitrary finite input set ''X'', and an "S-model" uses the interval ,1of

real numbers In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every real ...

as ''X''. A visualised demo / Art Work of a single Learning Automaton had been developed by µSystems (microSystems) Research Group at Newcastle University.

Finite action-set learning automata

Finite action-set learning automata (FALA) are a class of learning automata for which the number of possible actions is finite or, in more mathematical terms, for which the size of the action-set is finite.

Literature

* Philip Aranzulla and John Mellor

History

Definition

Finite action-set learning automata

See also

Literature