In the formal language theory
of computer science
, left recursion is a special case of recursion
where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance,
can be recognized as a sum because it can be broken into
, also a sum, and
, a suitable suffix.
In terms of context-free grammar
, a nonterminal
is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).
A grammar is left-recursive if and only if there exists a nonterminal symbol
that can derive to a sentential form
with itself as the leftmost symbol. Notes on Formal Language Theory and Parsing
James Power, Department of Computer Science National University of Ireland, Maynooth Maynooth, Co. Kildare, Ireland.JPR02 Symbolically,
where indicates the operation of making one or more substitutions, and is any sequence of terminal and nonterminal symbols.
Direct left recursion
Direct left recursion occurs when the definition can be satisfied with only one substitution. It requires a rule of the form
where is a sequence of nonterminals and terminals . For example, the rule
is directly left-recursive. A left-to-right recursive descent parser for this rule might look like
and such code would fall into infinite recursion when executed.
Indirect left recursion
Indirect left recursion occurs when the definition of left recursion is satisfied via several substitutions. It entails a set of rules following the pattern
where are sequences that can each yield the empty string, while may be any sequences of terminal and nonterminal symbols at all. Note that these sequences may be empty. The derivation
then gives as leftmost in its final sentential form.
Removing left recursion
Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many bottom-up parsers, including the CYK algorithm). Therefore, a grammar is often preprocessed to eliminate the left recursion.
Removing direct left recursion
The general algorithm to remove direct left recursion follows. Several improvements to this method have been made.
For a left-recursive nonterminal , discard any rules of the form and consider those that remain:
* each is a nonempty sequence of nonterminals and terminals, and
* each is a sequence of nonterminals and terminals that does not start with .
Replace these with two sets of productions, one set for :
and another set for the fresh nonterminal (often called the "tail" or the "rest"):
Repeat this process until no direct left recursion remains.
As an example, consider the rule set
This could be rewritten to avoid left recursion as
Removing all left recursion
By establishing a topological ordering on nonterminals, the above process can be extended to also eliminate indirect left recursion :
:Inputs ''A grammar: a set of nonterminals and their productions''
:Output ''A modified grammar generating the same language but without left recursion''
:# ''For each nonterminal :''
:## ''Repeat until an iteration leaves the grammar unchanged:''
:### ''For each rule , being a sequence of terminals and nonterminals:''
:#### ''If begins with a nonterminal and