HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the
grammar In linguistics, the grammar of a natural language is its set of structural constraints on speakers' or writers' composition of clauses, phrases, and words. The term can also refer to the study of such constraints, a field that includes doma ...
. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes. A ''predictive parser'' is a recursive descent parser that does not require
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it d ...
. Predictive parsing is possible only for the class of LL(''k'') grammars, which are the
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. The LL(''k'') grammars therefore exclude all
ambiguous grammar In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string ...
s, as well as all grammars that contain
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(''k'') grammar. A predictive parser runs in
linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
. Recursive descent with backtracking is a technique that determines which production to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(''k'') grammars, but is not guaranteed to terminate unless the grammar is LL(''k''). Even when they terminate, parsers that use recursive descent with backtracking may require exponential time. Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a parser generator, either for an LL(''k'') language or using an alternative parser, such as
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
or LR. This is particularly the case if a grammar is not in LL(''k'') form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like ANTLR. Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.


Example parser

The following EBNF-like
grammar In linguistics, the grammar of a natural language is its set of structural constraints on speakers' or writers' composition of clauses, phrases, and words. The term can also refer to the study of such constraints, a field that includes doma ...
(for
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal, and pioneered several classic topics in software engineering. In 1984, he won the Turing Award, generally ...
's
PL/0 PL/0 is a programming language, intended as an educational programming language, that is similar to but much simpler than Pascal, a general-purpose programming language. It serves as an example of how to construct a compiler. It was originally intr ...
programming language, from ''
Algorithms + Data Structures = Programs ''Algorithms + Data Structures = Programs'' is a 1976 book written by Niklaus Wirth covering some of the fundamental topics of computer programming, particularly that algorithms and data structures are inherently related. For example, if one has a ...
'') is in LL(1) form: program = block "." . block = const" ident "=" number ";" var" ident ";" statement . statement = ident ":=" expression , "call" ident , "begin" statement "end" , "if" condition "then" statement , "while" condition "do" statement . condition = "odd" expression , expression ("=", "#", "<", "<=", ">", ">=") expression . expression = "-"term . term = factor . factor = ident , number , "(" expression ")" . Terminals are expressed in quotes. Each
nonterminal In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
is defined by a rule in the grammar, except for ''ident'' and ''number'', which are assumed to be implicitly defined.


C implementation

What follows is an implementation of a recursive descent parser for the above language in C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly. Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called. The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity. typedef enum Symbol; Symbol sym; void nextsym(void); void error(const char msg[]); int accept(Symbol s) int expect(Symbol s) void factor(void) void term(void) void expression(void) void condition(void) void statement(void) void block(void) void program(void)


Examples

Some recursive descent parser generators: * TMG – an early compiler-compiler used in the 1960s and early 1970s * JavaCC * Coco/R * ANTLR * Spirit Parser Framework – a C++ recursive descent parser generator framework requiring no pre-compile step * parboiled (Java) – a recursive descent PEG parsing library for
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...


See also

*
Parser combinator In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as ...
– a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs *
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
– another form representing recursive descent grammar * Recursive ascent parser * Tail recursive parser – a variant of the recursive descent parser


References


General references

* '' Compilers: Principles, Techniques, and Tools'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4. * ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, . * ''Recursive Programming Techniques'', W.H. Burge, 1975, * ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, . * ''Compiling with C# and Java'', Pat Terry, 2005, , 624 * ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, * ''Compiler Construction'', Niklaus Wirth, 1996,


External links


Introduction to Parsing
- an easy to read introduction to parsing, with a comprehensive section on recursive descent parsing

- a brief tutorial on implementing recursive descent parser
Simple mathematical expressions parser
in
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called ...

Simple Top Down Parsing in PythonJack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)
in
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Frenc ...
, with
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
output, using a "keep it simple" approach
Functional Pearls: Monadic Parsing in Haskell
{{Parsers Parsing algorithms Articles with example C code