HOME

TheInfoList



OR:

A GLR parser (GLR standing for "Generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)") is an extension of an
LR parser In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parse ...
algorithm to handle non-deterministic and
ambiguous grammar In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string ...
s. The theoretical foundation was provided in a 1974 paper by Bernard Lang (along with other general Context-Free parsers such as GLL). It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by
Masaru Tomita is a Japanese molecular biologist and computer scientist, best known as the director of the E-Cell simulation environment software and/or the inventor of GLR parser algorithm.Tomita M. (1984).LR parsers for natural languages. COLING. 10th Intern ...
, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, though in practice it is the second stage that is recognized as the GLR parser. Though the algorithm has evolved since its original forms, the principles have remained intact. As shown by an earlier publication, Lang was primarily interested in more easily used and more flexible parsers for
extensible programming Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment. Extensible programming languages, supporting this ...
languages. Tomita's goal was to parse natural language text thoroughly and efficiently. Standard
LR parsers In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers ...
cannot accommodate the nondeterministic and ambiguous nature of natural language, and the GLR algorithm can.


Algorithm

Briefly, the GLR algorithm works in a manner similar to the
LR parser In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parse ...
algorithm, except that, given a particular grammar, a GLR parser will process all possible interpretations of a given input in a
breadth-first search Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next de ...
. On the front-end, a GLR
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
converts an input grammar into parser tables, in a manner similar to an LR generator. However, where LR parse tables allow for only one
state transition State may refer to: Arts, entertainment, and media Literature * ''State Magazine'', a monthly magazine published by the U.S. Department of State * ''The State'' (newspaper), a daily newspaper in Columbia, South Carolina, United States * ''Our S ...
(given a state and an input token), GLR parse tables allow for multiple transitions. In effect, GLR allows for shift/reduce and reduce/reduce conflicts. When a conflicting transition is encountered, the parse stack is forked into two or more parallel parse stacks, where the state corresponding to each possible transition is at the top. Then, the next input token is read and used to determine the next transition(s) for each of the "top" states – and further forking can occur. If any given top state and input token do not result in at least one transition, then that "path" through the parse tables is invalid and can be discarded. A crucial optimization known as a
Graph-structured stack (GSS) In computer science, a graph-structured stack (GSS) is a directed acyclic graph where each directed path represents a stack. The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown ...
allows sharing of common prefixes and suffixes of these stacks, which constrains the overall search space and memory usage required to parse input text. The complex structures that arise from this improvement make the search graph a
directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one v ...
(with additional restrictions on the "depths" of various nodes), rather than a tree.


Advantages

Recognition using the GLR algorithm has the same worst-case time complexity as the
CYK algorithm In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke, ...
and
Earley algorithm In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though (depending on the variant) it may suffer problems with certain nullable grammars. The algorithm, named after its inve ...
: ''O''(''n''3). However, GLR carries two additional advantages: * The time required to run the algorithm is proportional to the degree of nondeterminism in the grammar: on deterministic grammars the GLR algorithm runs in ''O''(''n'') time (this is not true of the Earley and CYK algorithms, but the original Earley algorithms can be modified to ensure it) * The GLR algorithm is "online" – that is, it consumes the input tokens in a specific order and performs as much work as possible after consuming each token (also true for Earley). In practice, the grammars of most programming languages are deterministic or "nearly deterministic", meaning that any nondeterminism is usually resolved within a small (though possibly unbounded) number of tokens. Compared to other algorithms capable of handling the full class of context-free grammars (such as Earley or CYK), the GLR algorithm gives better performance on these "nearly deterministic" grammars, because only a single stack will be active during the majority of the parsing process. GLR can be combined with the
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
(1) algorithm, in a hybrid parser, allowing still higher performance.


See also

*
Comparison of parser generators This is a list of notable lexer generators and parser generators for various language classes. Regular languages Regular languages are a category of languages (sometimes termed Chomsky Type 3) which can be matched by a state machine (more sp ...
*
DMS Software Reengineering Toolkit The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source langu ...
* GNU Bison, a parser generator that can create LALR and GLR parsers


References


Further reading

* * * {{Parsers Parsing algorithms