Left Recursion
   HOME

TheInfoList



OR:

In the
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
of
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, left recursion is a special case of
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance, 1+2+3 can be recognized as a sum because it can be broken into 1+2, also a sum, and +3, a suitable suffix. In terms of
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
, a
nonterminal In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).


Definition

A grammar is left-recursive if and only if there exists a nonterminal symbol A that can derive to a
sentential form In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
with itself as the leftmost symbol. Notes on Formal Language Theory and Parsing
James Power, Department of Computer Science National University of Ireland, Maynooth Maynooth, Co. Kildare, Ireland.
JPR02 JPR may stand for: * J. P. R. Williams (born 1949), Welsh rugby player * Institute for Jewish Policy Research, a British think tank * Jefferson Public Radio, headquartered in Ashland, Oregon * Jeppiaar (1931–2016), Indian educationalist * Ji-Para ...
Symbolically, : A \Rightarrow^+ A\alpha, where \Rightarrow^+ indicates the operation of making one or more substitutions, and \alpha is any sequence of terminal and nonterminal symbols.


Direct left recursion

Direct left recursion occurs when the definition can be satisfied with only one substitution. It requires a rule of the form :A \to A\alpha where \alpha is a sequence of nonterminals and terminals . For example, the rule :\mathit \to \mathit + \mathit is directly left-recursive. A left-to-right
recursive descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
for this rule might look like void Expression() and such code would fall into infinite recursion when executed.


Indirect left recursion

Indirect left recursion occurs when the definition of left recursion is satisfied via several substitutions. It entails a set of rules following the pattern :A_0 \to \beta_0A_1\alpha_0 :A_1 \to \beta_1A_2\alpha_1 :\cdots :A_n \to \beta_nA_0\alpha_n where \beta_0, \beta_1, \ldots, \beta_n are sequences that can each yield the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
, while \alpha_0, \alpha_1, \ldots, \alpha_n may be any sequences of terminal and nonterminal symbols at all. Note that these sequences may be empty. The derivation :A_0\Rightarrow\beta_0A_1\alpha_0\Rightarrow^+ A_1\alpha_0\Rightarrow\beta_1A_2\alpha_1\alpha_0\Rightarrow^+\cdots\Rightarrow^+ A_0\alpha_n\dots\alpha_1\alpha_0 then gives A_0 as leftmost in its final sentential form.


Removing left recursion

Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many bottom-up parsers, including the
CYK algorithm In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke ...
). Therefore, a grammar is often preprocessed to eliminate the left recursion.


Removing direct left recursion

The general algorithm to remove direct left recursion follows. Several improvements to this method have been made. For a left-recursive nonterminal A, discard any rules of the form A\rightarrow A and consider those that remain: :A \rightarrow A\alpha_1 \mid \ldots \mid A\alpha_n \mid \beta_1 \mid \ldots \mid \beta_m where: * each \alpha is a nonempty sequence of nonterminals and terminals, and * each \beta is a sequence of nonterminals and terminals that does not start with A. Replace these with two sets of productions, one set for A: :A \rightarrow \beta_1A^\prime \mid \ldots \mid \beta_mA^\prime and another set for the fresh nonterminal A' (often called the "tail" or the "rest"): :A^\prime \rightarrow \alpha_1A^\prime \mid \ldots \mid \alpha_nA^\prime \mid \epsilon Repeat this process until no direct left recursion remains. As an example, consider the rule set :\mathit \rightarrow \mathit+\mathit \mid \mathit \mid \mathit This could be rewritten to avoid left recursion as :\mathit \rightarrow \mathit\,\mathit' \mid \mathit\,\mathit' :\mathit' \rightarrow +\mathit \,\mathit'\mid \epsilon


Removing all left recursion

By establishing a
topological ordering In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge ''uv'' from vertex ''u'' to vertex ''v'', ''u'' comes before ''v'' in the ordering. For in ...
on nonterminals, the above process can be extended to also eliminate indirect left recursion : :Inputs ''A grammar: a set of nonterminals A_1,\ldots,A_n and their productions'' :Output ''A modified grammar generating the same language but without left recursion'' :# ''For each nonterminal A_i:'' :## ''Repeat until an iteration leaves the grammar unchanged:'' :### ''For each rule A_i\rightarrow\alpha_i, \alpha_i being a sequence of terminals and nonterminals:'' :#### ''If \alpha_i begins with a nonterminal A_j and j:'' :##### ''Let \beta_i be \alpha_i without its leading A_j.'' :##### ''Remove the rule A_i\rightarrow\alpha_i.'' :##### ''For each rule A_j\rightarrow\alpha_j:'' :###### ''Add the rule A_i\rightarrow\alpha_j\beta_i.'' :## ''Remove direct left recursion for A_i as described above.'' Note that this algorithm is highly sensitive to the nonterminal ordering; optimizations often focus on choosing this ordering well.


Pitfalls

Although the above transformations preserve the language generated by a grammar, they may change the
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
s that
witness In law, a witness is someone who has knowledge about a matter, whether they have sensed it or are testifying on another witnesses' behalf. In law a witness is someone who, either voluntarily or under compulsion, provides testimonial evidence, e ...
strings' recognition. With suitable bookkeeping, tree rewriting can recover the originals, but if this step is omitted, the differences may change the semantics of a parse. Associativity is particularly vulnerable; left-associative operators typically appear in right-associative-like arrangements under the new grammar. For example, starting with this grammar: :\mathit \rightarrow \mathit\,-\,\mathit \mid \mathit :\mathit \rightarrow \mathit\,*\,\mathit \mid \mathit :\mathit \rightarrow (\mathit) \mid \mathit the standard transformations to remove left recursion yield the following: :\mathit \rightarrow \mathit\ \mathit' :\mathit' \rightarrow - \mathit\ \mathit' \mid \epsilon :\mathit \rightarrow \mathit\ \mathit' :\mathit' \rightarrow * \mathit\ \mathit' \mid \epsilon :\mathit \rightarrow (\mathit) \mid \mathit Parsing the string "1 - 2 - 3" with the first grammar in an LALR parser (which can handle left-recursive grammars) would have resulted in the parse tree: This parse tree groups the terms on the left, giving the correct semantics ''(1 - 2) - 3''. Parsing with the second grammar gives which, properly interpreted, signifies ''1 + (-2 + (-3))'', also correct, but less faithful to the input and much harder to implement for some operators. Notice how terms to the right appear deeper in the tree, much as a right-recursive grammar would arrange them for ''1 - (2 - 3)''.


Accommodating left recursion in top-down parsing

A
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
that contains left recursion cannot be
parse Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
d by a LL(k)-parser or other naive
recursive descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
unless it is converted to a weakly equivalent right-recursive form. In contrast, left recursion is preferred for
LALR parser In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
s because it results in lower stack usage than
right recursion In computer science, a grammar is informally called a recursive grammar if it contains production rules that are recursive, meaning that expanding a non-terminal according to these rules can eventually lead to a string that includes the same non-t ...
. However, more sophisticated top-down parsers can implement general
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s by use of curtailment. In 2006, Frost and Hafiz described an algorithm which accommodates
ambiguous grammar In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string ...
s with direct left-recursive production rules., available from the author at http://hafiz.myweb.cs.uwindsor.ca/pub/p46-frost.pdf That algorithm was extended to a complete
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
algorithm to accommodate indirect as well as direct left recursion in
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An exa ...
time, and to generate compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars by Frost, Hafiz and Callaghan in 2007. The authors then implemented the algorithm as a set of
parser combinator In computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming i ...
s written in the
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
programming language.{{cite book, last=Frost, first=R. , author2=R. Hafiz , author3=P. Callaghan, date=January 2008, title=Parser Combinators for Ambiguous Left-Recursive Grammars, journal=10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, volume=4902, issue=2008, pages=167–181, url=https://cs.uwindsor.ca/~richard/PUBLICATIONS/PADL_08.pdf, doi=10.1007/978-3-540-77442-6_12, series=Lecture Notes in Computer Science, isbn=978-3-540-77441-9


See also

*
Tail recursion In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recur ...


References


External links


Practical Considerations for LALR(1) Grammars
Control flow Formal languages Parsing Recursion