State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was only mentioned as a footnote.
This name reflects the fact that the main function for updating the Q-value depends on the current state of the agent $S_1$, the action the agent chooses $A_1$, the reward $R_2$ the agent gets for choosing this action, the state $S_2$ that the agent enters after taking that action, and finally the next action $A_2$ the agent chooses in its new state. The acronym for the quintuple $(S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})$ is SARSA. Some authors use a slightly different convention and write the quintuple $(S_t, A_t, R_t, S_{t+1}, A_{t+1})$, depending on which time step the reward is formally assigned. The rest of the article uses the former convention.
Algorithm
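: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$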
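As an illustration, one update on a single quintuple can be sketched in Python. This is a minimal tabular sketch, not the formulation from the original technical note: the function names `sarsa_update` and `epsilon_greedy`, the dictionary-based Q-table, the state and action labels, and the default values of the learning rate `alpha` and discount factor `gamma` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """Apply one SARSA step for the quintuple (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1}).

    q maps (state, action) pairs to estimated Q-values and is updated in place.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Target uses the action actually chosen in the next state (on-policy).
    td_target = r + gamma * q[(s_next, a_next)]
    # Move the current estimate a fraction alpha toward the target.
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda act: q[(state, act)])


# Hypothetical two-state example: unseen pairs default to a Q-value of 0.0.
q = defaultdict(float)
q[("s2", "right")] = 1.0
new_value = sarsa_update(q, "s1", "left", r=2.0, s_next="s2", a_next="right")
```

The next action is passed in explicitly because SARSA evaluates the action the agent actually takes in the new state, typically chosen by the same exploratory policy (here, the assumed `epsilon_greedy` helper) that selected the current action.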