recursive descent parser
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a recursive descent parser is a kind of
top-down parser Top-down parsing in computer science is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-d ...
built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the
grammar In linguistics, the grammar of a natural language is its set of structure, structural constraints on speakers' or writers' composition of clause (linguistics), clauses, phrases, and words. The term can also refer to the study of such constraint ...
. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes. A ''predictive parser'' is a recursive descent parser that does not require
backtracking Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
. Predictive parsing is possible only for the class of LL(''k'') grammars, which are the
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. The LL(''k'') grammars therefore exclude all
ambiguous grammar In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string ...
s, as well as all grammars that contain
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(''k'') grammar. A predictive parser runs in
linear time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
. Recursive descent with backtracking is a technique that determines which
production Production may refer to: Economics and business * Production (economics) * Production, the act of manufacturing goods * Production, in the outline of industrial organization, the act of making products (goods and services) * Production as a stati ...
to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(''k'') grammars, but is not guaranteed to terminate unless the grammar is LL(''k''). Even when they terminate, parsers that use recursive descent with backtracking may require
exponential time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
. Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
, either for an LL(''k'') language or using an alternative parser, such as
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
or LR. This is particularly the case if a grammar is not in LL(''k'') form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
. Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.


Example parser

The following EBNF-like
grammar In linguistics, the grammar of a natural language is its set of structure, structural constraints on speakers' or writers' composition of clause (linguistics), clauses, phrases, and words. The term can also refer to the study of such constraint ...
(for
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal (programming language), Pascal, and pioneered several classic topics in software engineering. In 1984, he w ...
's
PL/0 PL/0 is a programming language, intended as an educational programming language, that is similar to but much simpler than Pascal, a general-purpose programming language. It serves as an example of how to construct a compiler. It was originally intr ...
programming language, from ''
Algorithms + Data Structures = Programs In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing c ...
'') is in
LL(1) In computer science, an LL parser (Left-to-right, leftmost derivation) is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence. An LL parser is called an ...
form: program = block "." . block = const" ident "=" number ";" var" ident ";" statement . statement = ident ":=" expression , "call" ident , "begin" statement "end" , "if" condition "then" statement , "while" condition "do" statement . condition = "odd" expression , expression ("=", "#", "<", "<=", ">", ">=") expression . expression = "-"term . term = factor . factor = ident , number , "(" expression ")" . Terminals are expressed in quotes. Each
nonterminal In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
is defined by a rule in the grammar, except for ''ident'' and ''number'', which are assumed to be implicitly defined.


C implementation

What follows is an implementation of a recursive descent parser for the above language in C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly. Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called. The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity. typedef enum Symbol; Symbol sym; void nextsym(void); void error(const char msg[]); int accept(Symbol s) int expect(Symbol s) void factor(void) void term(void) void expression(void) void condition(void) void statement(void) void block(void) void program(void)


Examples

Some recursive descent parser generators: * TMG – an early compiler-compiler used in the 1960s and early 1970s *
JavaCC JavaCC (Java Compiler Compiler) is an open-source software, open-source parser generator and Lexical analysis, lexical analyzer generator written in the Java (programming language), Java programming language. JavaCC is similar to yacc in that it ...
*
Coco/R Coco/R is a compiler generator that takes wirth syntax notationIn the manual, however, it is referred as L-attributed Extended Backus–Naur Form syntax (EBNF). grammars of a source language and generates a scanner and a parser for that lang ...
*
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
*
Spirit Parser Framework The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of extended Backus–Naur form (EBNF) ...
– a C++ recursive descent parser generator framework requiring no pre-compile step *
parboiled (Java) parboiled is an open-source Java library released under an Apache License. It provides support for defining PEG parsers directly in Java source code. parboiled is commonly used as an alternative for regular expressions or parser generators (like ...
– a recursive descent PEG parsing library for
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...


See also

*
Parser combinator In computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming i ...
– a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs *
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
– another form representing recursive descent grammar *
Recursive ascent parser In computer science, recursive ascent parsing is a technique for implementing an LALR parser which uses mutually-recursive functions rather than tables. Thus, the parser is ''directly encoded'' in the host language similar to recursive descent. Di ...
*
Tail recursive parser In computer science, tail recursive parsers are a derivation from the more common recursive descent parsers. Tail recursive parsers are commonly used to parse left recursive grammars. They use a smaller amount of stack space than regular recursive ...
– a variant of the recursive descent parser


References


General references

* '' Compilers: Principles, Techniques, and Tools'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4. * ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, . * ''Recursive Programming Techniques'', W.H. Burge, 1975, * ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, . * ''Compiling with C# and Java'', Pat Terry, 2005, , 624 * ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, * ''Compiler Construction'', Niklaus Wirth, 1996,


External links


Introduction to Parsing
- an easy to read introduction to parsing, with a comprehensive section on recursive descent parsing

- a brief tutorial on implementing recursive descent parser
Simple mathematical expressions parser
in
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...

Simple Top Down Parsing in PythonJack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)
in
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
, with
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
output, using a "keep it simple" approach
Functional Pearls: Monadic Parsing in Haskell
{{Parsers Parsing algorithms Articles with example C code