LALR parser
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, an LALR parser (look-ahead, left-to-right, rightmost derivation parser) is part of the compiling process where human readable text is converted into a structured representation to be read by computers. An LALR parser is a software tool to process (
parse Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term ''pa ...
) text into a very specific internal representation that other programs, such as compilers, can work with. This process happens according to a set of production rules specified by a
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
for a
computer language A computer language is a formal language used to communicate with a computer. Types of computer languages include: * Software construction#Construction languages, Construction language – all forms of communication by which a human can Comput ...
. An LALR parser is a simplified version of a
canonical LR parser A canonical LR parser (also called a LR(1) parser) is a type of bottom-up parsing algorithm used in computer science to analyze and process programming languages. It is based on the LR parsing technique, which stands for "left-to-right, rightmost d ...
. The LALR parser was invented by Frank DeRemer in his 1969 PhD dissertation, ''Practical Translators for LR(k) languages'', in his treatment of the practical difficulties at that time of implementing LR(1) parsers. He showed that the LALR parser has more language recognition power than the LR(0) parser, while requiring the same number of states as the LR(0) parser for a language that can be recognized by both parsers. This makes the LALR parser a memory-efficient alternative to the LR(1) parser for languages that are LALR. It was also proven that there exist LR(1) languages that are not LALR. Despite this weakness, the power of the LALR parser is sufficient for many mainstream computer languages,''LR Parsing: Theory and Practice,'' Nigel P. Chapman
p. 86–87
/ref> including
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, though the reference grammars for many languages fail to be LALR due to being
ambiguous Ambiguity is the type of meaning in which a phrase, statement, or resolution is not explicitly defined, making for several interpretations; others describe it as a concept or statement that has no real reference. A common aspect of ambiguit ...
. The original dissertation gave no algorithm for constructing such a parser given a formal grammar. The first algorithms for LALR parser generation were published in 1973. In 1982, DeRemer and Tom Pennello published an algorithm that generated highly memory-efficient LALR parsers. LALR parsers can be automatically generated from a grammar by an
LALR parser generator A lookahead LR parser (LALR) generator is a software tool that reads a context-free grammar (CFG) and creates an LALR parser which is capable of parsing files written in the context-free language defined by the CFG. LALR parsers are desirable b ...
such as
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
or GNU Bison. The automatically generated code may be augmented by hand-written code to augment the power of the resulting parser.


History

In 1965,
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
invented the
LR parser In computer science, LR parsers are a type of bottom-up parsing, bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, canonical LR parser, canonica ...
(Left to Right, Rightmost derivation). The LR parser can recognize any
deterministic context-free language In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, meanin ...
in linear-bounded time. Rightmost derivation has very large memory requirements and implementing an LR parser was impractical due to the limited
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
of computers at that time. To address this shortcoming, in 1969, Frank DeRemer proposed two simplified versions of the LR parser, namely the Look-Ahead LR (LALR) and the
Simple LR parser In computer science, a Simple LR or SLR parser is a type of LR parser with small parse tables and a relatively simple parser generator algorithm. As with other types of LR(1) parser, an SLR parser is quite efficient at finding the single correct ...
(SLR) that had much lower memory requirements at the cost of less language-recognition power, with the LALR parser being the most-powerful alternative. In 1977, memory optimizations for the LR parser were invented but still the LR parser was less memory-efficient than the simplified alternatives. In 1979, Frank DeRemer and Tom Pennello announced a series of optimizations for the LALR parser that would further improve its memory efficiency. Their work was published in 1982.


Overview

Generally, the LALR parser refers to the LALR(1) parser, just as the LR parser generally refers to the LR(1) parser. The "(1)" denotes one-token lookahead, to resolve differences between rule patterns during parsing. Similarly, there is an LALR(2) parser with two-token lookahead, and LALR(''k'') parsers with ''k''-token lookup, but these are rare in actual use. The LALR parser is based on the LR(0) parser, so it can also be denoted LALR(1) = LA(1)LR(0) (1 token of lookahead, LR(0)) or more generally LALR(''k'') = LA(''k'')LR(0) (k tokens of lookahead, LR(0)). There is in fact a two-parameter family of LA(''k'')LR(''j'') parsers for all combinations of ''j'' and ''k'', which can be derived from the LR(''j'' + ''k'') parser, but these do not see practical use. As with other types of LR parsers, an LALR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, because it does not need to use
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
. Being a lookahead parser by definition, it always uses a lookahead, with being the most-common case.


Relation to other parsers


LR parsers

The LALR(1) parser is less powerful than the LR(1) parser, and more powerful than the SLR(1) parser, though they all use the same production rules. The simplification that the LALR parser introduces consists in merging rules that have identical kernel item sets, because during the LR(0) state-construction process the lookaheads are not known. This reduces the power of the parser because not knowing the lookahead symbols can confuse the parser as to which grammar rule to pick next, resulting in reduce/reduce conflicts. All conflicts that arise in applying a LALR(1) parser to an unambiguous LR(1) grammar are reduce/reduce conflicts. The SLR(1) parser performs further merging, which introduces additional conflicts. The standard example of an LR(1) grammar that cannot be parsed with the LALR(1) parser, exhibiting such a reduce/reduce conflict, is: S → a E c → a F d → b F c → b E d E → e F → e In the LALR table construction, two states will be merged into one state and later the lookaheads will be found to be ambiguous. The one state with lookaheads is: E → e. F → e. An LR(1) parser will create two different states (with non-conflicting lookaheads), neither of which is ambiguous. In an LALR parser this one state has conflicting actions (given lookahead c or d, reduce to E or F), a "reduce/reduce conflict"; the above grammar will be declared ambiguous by a
LALR parser generator A lookahead LR parser (LALR) generator is a software tool that reads a context-free grammar (CFG) and creates an LALR parser which is capable of parsing files written in the context-free language defined by the CFG. LALR parsers are desirable b ...
and conflicts will be reported. To recover, this ambiguity is resolved by choosing E, because it occurs before F in the grammar. However, the resultant parser will not be able to recognize the valid input sequence b e c, since the ambiguous sequence e c is reduced to (E → e) c, rather than the correct (F → e) c, but b E c is not in the grammar.


LL parsers

The LALR(''j'') parsers are incomparable with LL(''k'') parsers: for any ''j'' and ''k'' both greater than 0, there are LALR(''j'') grammars that are not LL(''k'') grammars and vice versa. In fact, it is undecidable whether a given LL(1) grammar is LALR(''k'') for any k > 0. Depending on the presence of empty derivations, a LL(1) grammar can be equal to a SLR(1) or a LALR(1) grammar. If the LL(1) grammar has no empty derivations it is SLR(1) and if all symbols with empty derivations have non-empty derivations it is LALR(1). If symbols having only an empty derivation exist, the grammar may or may not be LALR(1).


See also

* Comparison of parser generators *
Context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
* Lookahead *
Parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
* Token scanner


Notes


References

* *


External links


Parsing Simulator
This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
JS/CC
JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line. * , a flash card-like tutorial on LALR(1) parsing. {{Parsers Parsing algorithms