In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance, $1+2+3$ can be recognized as a sum because it can be broken into $1+2$, also a sum, and $+3$, a suitable suffix.
In terms of context-free grammar, a nonterminal is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).

** Definition **

A grammar is left-recursive if and only if there exists a nonterminal symbol $A$ that can derive a sentential form with itself as the leftmost symbol (James Power, ''Notes on Formal Language Theory and Parsing'', Department of Computer Science, National University of Ireland, Maynooth).

Symbolically,
:$A \Rightarrow^+ A\alpha$
where $\Rightarrow^+$ indicates the operation of making one or more substitutions, and $\alpha$ is any sequence of terminal and nonterminal symbols.

** Direct left recursion **

Direct left recursion occurs when the definition can be satisfied with only one substitution. It requires a rule of the form
:$A \to A\alpha$
where $\alpha$ is a sequence of nonterminals and terminals. For example, the rule
:$\mathit{Expression} \to \mathit{Expression} + \mathit{Term}$
is directly left-recursive. A left-to-right recursive descent parser for this rule might look like
 void Expression() {
     Expression();
     match('+');
     Term();
 }
and such code would fall into infinite recursion when executed.
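In practice, a hand-written recursive descent parser avoids this by rewriting the rule as iteration, equivalent to $\mathit{Expression} \to \mathit{Term}\ (+\ \mathit{Term})^*$, so that input is consumed before any repetition. A minimal Python sketch (the function names and the toy tokenizer, where terms are single integer tokens, are illustrative assumptions, not from the source):

```python
# Sketch: the left-recursive rule Expression -> Expression + Term
# rewritten as a loop, Expression -> Term ('+' Term)*.
# The parser consumes a Term before looping, so it terminates.

def parse_expression(tokens, pos=0):
    """Parse Term ('+' Term)*; returns (value, next position)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs  # left-associative accumulation
    return value, pos

def parse_term(tokens, pos):
    # Toy grammar: a Term is a single integer token.
    return int(tokens[pos]), pos + 1
```

Note how the loop replaces the self-call that would otherwise occur before any token is consumed.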

** Indirect left recursion **

Indirect left recursion occurs when the definition of left recursion is satisfied via several substitutions. It entails a set of rules following the pattern
:$A_0 \to \beta_0 A_1 \alpha_0$
:$A_1 \to \beta_1 A_2 \alpha_1$
:$\cdots$
:$A_n \to \beta_n A_0 \alpha_n$
where $\beta_0, \beta_1, \ldots, \beta_n$ are sequences that can each yield the empty string, while $\alpha_0, \alpha_1, \ldots, \alpha_n$ are any sequences of terminal and nonterminal symbols. Note that these sequences may be empty. The derivation
:$A_0 \Rightarrow \beta_0 A_1 \alpha_0 \Rightarrow^+ A_1 \alpha_0 \Rightarrow \beta_1 A_2 \alpha_1 \alpha_0 \Rightarrow^+ \cdots \Rightarrow^+ A_0 \alpha_n \dots \alpha_1 \alpha_0$
then gives $A\_0$ as leftmost in its final sentential form.
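This definition can be checked mechanically by closing the "can appear leftmost" relation transitively. The following Python sketch makes a simplifying assumption not in the source: it follows only the first symbol of each production (i.e. it treats the $\beta$ prefixes as empty); a full check would also skip over nullable prefixes. The representation (a dict from nonterminal names to lists of productions, each a tuple of symbols) is likewise an illustrative choice:

```python
# Sketch: detect (direct or indirect) left recursion via the transitive
# closure of the "appears leftmost" relation. Simplification: only the
# first symbol of each production is followed; nullable prefixes are
# not skipped.

def left_corners(grammar):
    """Map each nonterminal to the nonterminals derivable leftmost."""
    corners = {a: {prod[0] for prod in prods if prod and prod[0] in grammar}
               for a, prods in grammar.items()}
    changed = True
    while changed:  # iterate to a fixed point (transitive closure)
        changed = False
        for a, cs in corners.items():
            new = set().union(*(corners[b] for b in cs)) if cs else set()
            if not new <= cs:
                cs |= new
                changed = True
    return corners

def is_left_recursive(grammar):
    # A grammar is left-recursive if some nonterminal is its own left corner.
    return any(a in cs for a, cs in left_corners(grammar).items())
```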

** Removing left recursion **

Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many bottom-up parsers, including the CYK algorithm). Therefore, a grammar is often preprocessed to eliminate the left recursion.

** Removing direct left recursion **

The general algorithm to remove direct left recursion follows. Several improvements to this method have been made.
For a left-recursive nonterminal $A$, discard any rules of the form $A \rightarrow A$ and consider those that remain:
:$A \rightarrow A\alpha_1 \mid \ldots \mid A\alpha_n \mid \beta_1 \mid \ldots \mid \beta_m$
where:
* each $\alpha$ is a nonempty sequence of nonterminals and terminals, and
* each $\beta$ is a sequence of nonterminals and terminals that does not start with $A$.
Replace these with two sets of productions, one set for $A$:
:$A \rightarrow \beta_1 A' \mid \ldots \mid \beta_m A'$
and another set for the fresh nonterminal $A'$ (often called the "tail" or the "rest"):
:$A' \rightarrow \alpha_1 A' \mid \ldots \mid \alpha_n A' \mid \epsilon$
Repeat this process until no direct left recursion remains.
As an example, consider the rule set
:$\mathit{Expr} \rightarrow \mathit{Expr} + \mathit{Expr} \mid \mathit{Integer} \mid \mathit{String}$
This could be rewritten to avoid left recursion as
:$\mathit{Expr} \rightarrow \mathit{Integer}\,\mathit{Expr'} \mid \mathit{String}\,\mathit{Expr'}$
:$\mathit{Expr'} \rightarrow +\mathit{Expr}\,\mathit{Expr'} \mid \epsilon$
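The transformation just illustrated can be sketched in Python. The representation is an assumption for illustration: a grammar is a dict from nonterminal names to lists of productions (tuples of symbols), the empty tuple stands for $\epsilon$, and the fresh tail nonterminal is named by appending a prime:

```python
# Sketch of direct-left-recursion removal as described above.
# Rules of the form A -> A are discarded; A-alpha rules feed the tail
# nonterminal A', beta rules become beta A'.

def remove_direct_left_recursion(grammar, a):
    tail = a + "'"  # fresh "tail" nonterminal A'
    # alpha parts of A -> A alpha (A -> A alone is discarded by len(p) > 1)
    recursive = [p[1:] for p in grammar[a] if p and p[0] == a and len(p) > 1]
    # beta rules: those not starting with A
    others = [p for p in grammar[a] if not p or p[0] != a]
    if not recursive:
        return grammar  # no direct left recursion to remove
    grammar[a] = [beta + (tail,) for beta in others]
    grammar[tail] = [alpha + (tail,) for alpha in recursive] + [()]  # () = epsilon
    return grammar
```

Applied to the $\mathit{Expr}$ example above, this reproduces the two rewritten rule sets.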

** Removing all left recursion **

By establishing a topological ordering on nonterminals, the above process can be extended to also eliminate indirect left recursion:
:Inputs ''A grammar: a set of nonterminals $A_1, \ldots, A_n$ and their productions''
:Output ''A modified grammar generating the same language but without left recursion''
:# ''For each nonterminal $A_i$:''
:## ''Repeat until an iteration leaves the grammar unchanged:''
:### ''For each rule $A_i \rightarrow \alpha_i$, $\alpha_i$ being a sequence of terminals and nonterminals:''
:#### ''If $\alpha_i$ begins with a nonterminal $A_j$ and $j < i$:''
:##### ''Let $\beta_i$ be $\alpha_i$ without its leading $A_j$.''
:##### ''Remove the rule $A_i \rightarrow \alpha_i$.''
:##### ''For each rule $A_j \rightarrow \alpha_j$:''
:###### ''Add the rule $A_i \rightarrow \alpha_j\beta_i$.''
:## ''Remove direct left recursion for $A_i$ as described above.''
Note that this algorithm is highly sensitive to the nonterminal ordering; optimizations often focus on choosing this ordering well.
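The steps above can be sketched as a single Python function. As before, the grammar representation (dict of nonterminal names to lists of symbol tuples, with the empty tuple as $\epsilon$ and a primed name for the tail nonterminal) is an illustrative assumption:

```python
# Sketch of the full elimination: process nonterminals in a fixed order,
# expand rules whose first symbol is an earlier nonterminal, then remove
# direct left recursion for the current nonterminal.

def eliminate_left_recursion(grammar, order):
    for i, ai in enumerate(order):
        changed = True
        while changed:  # repeat until an iteration leaves A_i unchanged
            changed = False
            for prod in list(grammar[ai]):
                if prod and prod[0] in order[:i]:  # starts with A_j, j < i
                    aj, beta = prod[0], prod[1:]
                    grammar[ai].remove(prod)
                    # substitute every A_j production for the leading A_j
                    grammar[ai].extend(g + beta for g in grammar[aj])
                    changed = True
        # remove direct left recursion for A_i, as described above
        tail = ai + "'"
        rec = [p[1:] for p in grammar[ai] if p and p[0] == ai and len(p) > 1]
        other = [p for p in grammar[ai] if not p or p[0] != ai]
        if rec:
            grammar[ai] = [b + (tail,) for b in other]
            grammar[tail] = [a + (tail,) for a in rec] + [()]
    return grammar
```

On the indirectly left-recursive pair $A \to Ba$, $B \to Ab \mid c$, processing in the order $A, B$ expands $B$'s rule to $B \to Bab \mid c$ and then removes the resulting direct recursion.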

** Pitfalls **

Although the above transformations preserve the language generated by a grammar, they may change the parse trees that witness strings' recognition. With suitable bookkeeping, tree rewriting can recover the originals, but if this step is omitted, the differences may change the semantics of a parse.
Associativity is particularly vulnerable; left-associative operators typically appear in right-associative-like arrangements under the new grammar. For example, starting with this grammar:
:$\mathit{Expression} \rightarrow \mathit{Expression}\,-\,\mathit{Term} \mid \mathit{Term}$
:$\mathit{Term} \rightarrow \mathit{Term}\,*\,\mathit{Factor} \mid \mathit{Factor}$
:$\mathit{Factor} \rightarrow (\mathit{Expression}) \mid \mathit{Integer}$
the standard transformations to remove left recursion yield the following:
:$\mathit{Expression} \rightarrow \mathit{Term}\ \mathit{Expression'}$
:$\mathit{Expression'} \rightarrow -\ \mathit{Term}\ \mathit{Expression'} \mid \epsilon$
:$\mathit{Term} \rightarrow \mathit{Factor}\ \mathit{Term'}$
:$\mathit{Term'} \rightarrow *\ \mathit{Factor}\ \mathit{Term'} \mid \epsilon$
:$\mathit{Factor} \rightarrow (\mathit{Expression}) \mid \mathit{Integer}$
Parsing the string "1 - 2 - 3" with the first grammar in an LALR parser (which can handle left-recursive grammars) results in a parse tree that groups the terms on the left, giving the correct semantics ''(1 - 2) - 3''.
Parsing with the second grammar instead nests the terms to the right, giving a tree which, properly interpreted, signifies ''1 + (-2 + (-3))'', also correct, but less faithful to the input and much harder to implement for some operators. Notice how terms to the right appear deeper in the tree, much as a right-recursive grammar would arrange them for ''1 - (2 - 3)''.
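The stakes of the tree shape can be shown numerically. The following toy Python sketch (not a parser; the function names are illustrative) contrasts the naive right-nested reading of "1 - 2 - 3" with the left-accumulated reading that matches the original grammar's semantics:

```python
# Sketch: why tree shape matters for a left-associative operator like '-'.

def right_nested(nums):
    """Naive right grouping, e.g. 1 - (2 - 3). Wrong for '-'."""
    if len(nums) == 1:
        return nums[0]
    return nums[0] - right_nested(nums[1:])

def left_accumulated(nums):
    """Left grouping, e.g. (1 - 2) - 3, the original grammar's meaning."""
    total = nums[0]
    for n in nums[1:]:
        total -= n
    return total
```

For [1, 2, 3] the two readings disagree (2 versus -4), which is why a parser for the transformed grammar must accumulate left-to-right, or rewrite its trees, to preserve the intended semantics.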

** Accommodating left recursion in top-down parsing**

A formal grammar that contains left recursion cannot be parsed by an LL(k) parser or other naive recursive descent parser unless it is converted to a weakly equivalent right-recursive form. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion. However, more sophisticated top-down parsers can implement general context-free grammars by use of curtailment. In 2006, Frost and Hafiz described an algorithm which accommodates ambiguous grammars with direct left-recursive production rules (available from the author at http://hafiz.myweb.cs.uwindsor.ca/pub/p46-frost.pdf). That algorithm was extended to a complete parsing algorithm that accommodates indirect as well as direct left recursion in polynomial time, and that generates compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars, by Frost, Hafiz and Callaghan in 2007. The authors then implemented the algorithm as a set of parser combinators written in the Haskell programming language (Frost, R., R. Hafiz and P. Callaghan, "Parser Combinators for Ambiguous Left-Recursive Grammars", ''10th International Symposium on Practical Aspects of Declarative Languages (PADL)'', Lecture Notes in Computer Science 4902, January 2008, pp. 167–181, doi:10.1007/978-3-540-77442-6_12, http://cs.uwindsor.ca/~richard/PUBLICATIONS/PADL_08.pdf).

** See also **

* Tail recursion

** References **

** External links **

Practical Considerations for LALR(1) Grammars

Category:Control flow Category:Formal languages Category:Parsing Category:Recursion
