History
Research in learning automata can be traced back to the work of Michael Lvovitch Tsetlin in the early 1960s in the Soviet Union.

Definition
A learning automaton is an adaptive decision-making unit situated in a random environment that learns the optimal action through repeated interactions with its environment. The actions are chosen according to a specific probability distribution which is updated based on the environment response the automaton obtains by performing a particular action.

With respect to the field of reinforcement learning, learning automata are characterized as policy iterators. In contrast to other reinforcement learners, policy iterators directly manipulate the policy π. Another example of a policy iterator is the evolutionary algorithm.

Formally, Narendra and Thathachar define a stochastic automaton to consist of:
* a set ''X'' of possible inputs,
* a set Φ = { Φ<sub>1</sub>, ..., Φ<sub>''s''</sub> } of possible internal states,
* a set α = { α<sub>1</sub>, ..., α<sub>''r''</sub> } of possible outputs, or actions, with ''r'' ≤ ''s'',
* an initial state probability vector ''p''(0) = ≪ ''p''<sub>1</sub>(0), ..., ''p''<sub>''s''</sub>(0) ≫,
* a computable function ''A'' which after each time step ''t'' generates ''p''(''t''+1) from ''p''(''t''), the current input, and the current state, and
* a function ''G'': Φ → α which generates the output at each time step.

In their paper, they investigate only stochastic automata with ''r'' = ''s'' and ''G'' bijective, allowing the notions of action and state to be identified with one another. The states of such an automaton correspond to the states of a "discrete-state discrete-parameter Markov process". At each time step ''t'' = 0, 1, 2, 3, ..., the automaton reads an input from its environment, updates ''p''(''t'') to ''p''(''t''+1) by ''A'', randomly chooses a successor state according to the probabilities ''p''(''t''+1), and outputs the corresponding action. The automaton's environment, in turn, reads the action and sends the next input to the automaton.
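The operation just described can be sketched in Python. This is an illustrative sketch, not an implementation from the original text: the update function ''A'' is supplied by the caller, and, as in the special case the authors investigate, it assumes ''r'' = ''s'' with ''G'' the identity so that states and actions coincide. The class and parameter names are hypothetical.

```python
import random

class StochasticAutomaton:
    """Sketch of a stochastic automaton with r = s and G the identity."""

    def __init__(self, s, A):
        self.p = [1.0 / s] * s   # p(0): uniform initial state probability vector
        self.state = 0
        self.A = A               # the computable update function A

    def step(self, x):
        # A generates p(t+1) from p(t), the current input x, and the current state.
        self.p = self.A(self.p, x, self.state)
        # Randomly choose a successor state according to p(t+1).
        self.state = random.choices(range(len(self.p)), weights=self.p)[0]
        # G is the identity, so the output action is the state itself.
        return self.state
```

Each call to `step` consumes one environment input and emits one action; the environment closes the loop by computing the next input from that action.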
Frequently, the input set ''X'' = { 0, 1 } is used, with 0 and 1 corresponding to a ''nonpenalty'' and a ''penalty'' response of the environment, respectively; in this case, the automaton should learn to minimize the number of ''penalty'' responses, and the feedback loop of automaton and environment is called a "P-model". More generally, a "Q-model" allows an arbitrary finite input set ''X'', and an "S-model" uses the interval [0, 1] of real numbers as the environment response.

Finite action-set learning automata
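A P-model feedback loop over a finite action set can be illustrated with the classical linear reward-inaction (L<sub>R-I</sub>) scheme, a standard update rule from the learning-automata literature that the text above does not itself specify; the learning rate `a` and the toy environment below are assumptions for the sketch.

```python
import random

def lri_update(p, chosen, beta, a=0.1):
    """Linear reward-inaction (L_R-I) update for a P-model environment.

    beta = 0 is a nonpenalty response, beta = 1 a penalty.
    On penalty the action probabilities are left unchanged;
    the learning rate a is an assumed parameter.
    """
    if beta == 1:
        return p
    return [pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj
            for j, pj in enumerate(p)]

# Toy environment: action 0 is penalized with probability 0.2, action 1
# with probability 0.8, so the automaton should come to prefer action 0.
penalty_prob = [0.2, 0.8]
p = [0.5, 0.5]
for t in range(2000):
    action = random.choices(range(len(p)), weights=p)[0]
    beta = 1 if random.random() < penalty_prob[action] else 0
    p = lri_update(p, action, beta)
```

With high probability the vector `p` concentrates on the less-penalized action; L<sub>R-I</sub> is only ε-optimal, so convergence to the wrong action remains possible for any fixed learning rate.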
Finite action-set learning automata (FALA) are a class of learning automata for which the number of possible actions is finite or, in more mathematical terms, for which the size of the action set is finite.

See also
* Reinforcement learning