HOME

TheInfoList



OR:

Top-down parsing in
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
is a
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
strategy where one first looks at the highest level of the
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
and works down the parse tree by using the rewriting rules of a
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
.
LL parser In computer science, an LL parser (Left-to-right, leftmost derivation) is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence. An LL parser is called a ...
s are a type of parser that uses a top-down parsing strategy. Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
s and
computer language A computer language is a formal language used to communicate with a computer. Types of computer languages include: * Construction language – all forms of communication by which a human can specify an executable problem solution to a compu ...
s. Top-down parsing can be viewed as an attempt to find left-most derivations of an input-stream by searching for parse-trees using a top-down expansion of the given
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
rules. Inclusive choice is used to accommodate
ambiguity Ambiguity is the type of meaning in which a phrase, statement or resolution is not explicitly defined, making several interpretations plausible. A common aspect of ambiguity is uncertainty. It is thus an attribute of any idea or statement ...
by expanding all alternative right-hand-sides of grammar rules. Simple implementations of top-down parsing do not terminate for left-recursive grammars, and top-down parsing with backtracking may have
exponential time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
complexity with respect to the length of the input for ambiguous CFGs. However, more sophisticated top-down parsers have been created by Frost, Hafiz, and Callaghan,Frost, R., Hafiz, R. and Callaghan, P. (2007)
Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars
." ''10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE '', Pages: 109 - 120, June 2007, Prague.
Archived
from the original on 12 November 2018.
Frost, R., Hafiz, R. and Callaghan, P. (2008)
Parser Combinators for Ambiguous Left-Recursive Grammars
" '' 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN '', Volume 4902/2008, Pages: 167-181, January 2008, San Francisco.
which do accommodate ambiguity and left recursion in
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
and which generate polynomial-sized representations of the potentially exponential number of parse trees.


Programming language application

A
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
parses input from a programming language to an internal representation by matching the incoming symbols to production rules. Production rules are commonly defined using Backus–Naur form. An
LL parser In computer science, an LL parser (Left-to-right, leftmost derivation) is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence. An LL parser is called a ...
is a type of parser that does top-down parsing by applying each production rule to the incoming symbols, working from the left-most symbol yielded on a production rule and then proceeding to the next production rule for each non-terminal symbol encountered. In this way the parsing starts on the Left of the result side (right side) of the production rule and evaluates non-terminals from the Left first and, thus, proceeds down the parse tree for each new non-terminal before continuing to the next symbol for a production rule. For example: * A \rightarrow aBC * B \rightarrow c \mid cd * C \rightarrow df \mid eg which produces the string ''A''=''acdf'' would match A \rightarrow aBC and attempt to match B \rightarrow c \mid cd next. Then C \rightarrow df \mid eg would be tried. As one may expect, some languages are more ambiguous than others. For a non-ambiguous language, in which all productions for a non-terminal produce distinct strings, the string produced by one production will not start with the same symbol as the string produced by another production. A non-ambiguous language may be parsed by an LL(1) grammar where the (1) signifies the parser reads ahead one token at a time. For an ambiguous language to be parsed by an LL parser, the parser must lookahead more than 1 symbol, e.g. LL(3). The common solution to this problem is to use an
LR parser In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parse ...
, which is a type of
shift-reduce parser A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing a ...
, and does
bottom-up parsing In computer science, parsing reveals the grammatical structure of linear input text, as a first step in working out its meaning. Bottom-up parsing recognizes the text's lowest-level small details first, before its mid-level structures, and leavin ...
.


Accommodating left recursion in top-down parsing

A
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
that contains
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
cannot be
parsed Parsing, syntax analysis, or syntactic analysis is the process of analyzing a String (computer science), string of Symbol (formal), symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal gra ...
by a naive
recursive descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
unless they are converted to a weakly equivalent right-recursive form. However, recent research demonstrates that it is possible to accommodate left-recursive grammars (along with all other forms of general CFGs) in a more sophisticated top-down parser by use of curtailment. A
recognition Recognition may refer to: *Award, something given in recognition of an achievement Machine learning *Pattern recognition, a branch of machine learning which encompasses the meanings below Biometric * Recognition of human individuals, or biomet ...
algorithm that accommodates
ambiguous grammar In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string ...
s and curtails an ever-growing direct left-recursive parse by imposing depth restrictions with respect to input length and current input position, is described by Frost and Hafiz in 2006.Frost, R. and Hafiz, R. (2006)
A New Top-Down Parsing Algorithm to Accommodate Ambiguity and Left Recursion in Polynomial Time
" ''ACM SIGPLAN Notices'', Volume 41 Issue 5, Pages: 46 - 54.
That algorithm was extended to a complete
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
algorithm to accommodate indirect (by comparing previously computed context with current context) as well as direct left-recursion in
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An exa ...
time, and to generate compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars by Frost, Hafiz and Callaghan in 2007. The algorithm has since been implemented as a set of
parser combinator In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as ...
s written in the
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
programming language. The implementation details of these new set of combinators can be found in a paper by the authors, which was presented in PADL'08. Th
X-SAIGA
site has more about the algorithms and implementation details. Additionally, one may use a
Graph-structured stack (GSS) In computer science, a graph-structured stack (GSS) is a directed acyclic graph where each directed path represents a stack. The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown ...
in addition to the aforementioned curtailment in order to accommodate left recursion by 'merging' stacks with common prefixes and by preventing infinite recursion, thereby reducing the number and contents of each stack, thereby reducing the time and space complexity of the parser. This leads to a algorithm known as
Generalized LL parsing A generalization is a form of abstraction whereby common properties of specific instances are formulated as general concepts or claims. Generalizations posit the existence of a domain or set of elements, as well as one or more common character ...
, in which you use a GSS, left-recursion curtailment, and an LL(k) parser to parse input strings relative to a given CFG.


Time and space complexity of top-down parsing

When top-down parser tries to parse an ambiguous input with respect to an ambiguous CFG, it may need exponential number of steps (with respect to the length of the input) to try all alternatives of the CFG in order to produce all possible parse trees, which eventually would require exponential memory space. The problem of exponential time complexity in top-down parsers constructed as sets of mutually recursive functions has been solved by Norvig in 1991.Norvig, P. (1991) �
Techniques for automatic memoisation with applications to context-free parsing
” ''Journal - Computational Linguistics'' Volume 17, Issue 1, Pages: 91 - 98.
His technique is similar to the use of dynamic programming and state-sets in Earley's algorithm (1970), and tables in the
CYK algorithm In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke, ...
of Cocke, Younger and Kasami. The key idea is to store results of applying a parser p at position j in a memorable and to reuse results whenever the same situation arises. Frost, Hafiz and Callaghan also use memoization for refraining redundant computations to accommodate any form of CFG in
polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An exa ...
time ( Θ(n4) for left-recursive grammars and Θ(n3) for non left-recursive grammars). Their top-down parsing algorithm also requires polynomial space for potentially exponential ambiguous parse trees by 'compact representation' and 'local ambiguities grouping'. Their compact representation is comparable with Tomita's compact representation of
bottom-up parsing In computer science, parsing reveals the grammatical structure of linear input text, as a first step in working out its meaning. Bottom-up parsing recognizes the text's lowest-level small details first, before its mid-level structures, and leavin ...
.Tomita, M. (1985) �
Efficient Parsing for Natural Language
” ''Kluwer, Boston, MA''.
Using PEG's, another representation of grammars, packrat parsers provide an elegant and powerful parsing algorithm. See
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
.


Examples

Some of the parsers that use top-down parsing include: *
Definite clause grammar A definite clause grammar (DCG) is a way of expressing grammar, either for natural or formal languages, in a logic programming language such as Prolog. It is closely related to the concept of attribute grammars / affix grammars from which Prolog ...
parsersPereira, Fernando CN, and David HD Warren.
Definite clause grammars for language analysis—a survey of the formalism and a comparison with augmented transition networks
" Artificial intelligence 13.3 (1980): 231-278.
*
Recursive descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
* Predictive parser *
Earley parser In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though (depending on the variant) it may suffer problems with certain nullable grammars. The algorithm, named after its inve ...


See also

*
Bottom-up parsing In computer science, parsing reveals the grammatical structure of linear input text, as a first step in working out its meaning. Bottom-up parsing recognizes the text's lowest-level small details first, before its mid-level structures, and leavin ...
*
Parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
*
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...


References


External links


X-SAIGA
- eXecutable SpecificAtIons of GrAmmars {{DEFAULTSORT:Top-Down Parsing Parsing algorithms