
Top-Down Parsing Language (TDPL) is a type of analytic formal grammar developed by Alexander Birman in the early 1970s to study formally the behavior of a common class of practical top-down parsers that support a limited form of backtracking. Birman originally named his formalism ''the TMG Schema'' (TS), after TMG, an early parser generator, but it was later given the name TDPL by Aho and Ullman in their classic textbook ''The Theory of Parsing, Translation and Compiling''.


Definition of a TDPL grammar

Formally, a TDPL grammar ''G'' is a tuple consisting of the following components:
* A finite set ''N'' of ''nonterminal symbols''.
* A finite set Σ of ''terminal symbols'' that is disjoint from ''N''.
* A finite set ''P'' of ''production rules'', where a rule has one of the following forms:
** ''A'' ← ε, where ''A'' is a nonterminal and ε is the empty string.
** ''A'' ← ''f'', where ''f'' is a distinguished symbol representing ''unconditional failure''.
** ''A'' ← ''a'', where ''a'' is any terminal symbol.
** ''A'' ← ''BC/D'', where ''B'', ''C'', and ''D'' are nonterminals.


Interpretation of a grammar

A TDPL grammar can be viewed as an extremely minimalistic formal representation of a recursive descent parser, in which each nonterminal schematically represents a parsing function. Each of these nonterminal-functions takes as its input argument a string to be recognized, and yields one of two possible outcomes:
* ''success'', in which case the function may optionally move forward or ''consume'' one or more characters of the input string supplied to it, or
* ''failure'', in which case no input is consumed.

Note that a nonterminal-function may succeed without actually consuming any input, and this is considered an outcome distinct from failure.

A nonterminal ''A'' defined by a rule of the form ''A'' ← ε always succeeds without consuming any input, regardless of the input string provided. Conversely, a rule of the form ''A'' ← ''f'' always fails regardless of input. A rule of the form ''A'' ← ''a'' succeeds and consumes one character if the next character in the input string is the terminal ''a''; if the next input character does not match (or there is no next character), then the nonterminal fails.

A nonterminal ''A'' defined by a rule of the form ''A'' ← ''BC/D'' first recursively invokes nonterminal ''B'', and if ''B'' succeeds, invokes ''C'' on the remainder of the input string left unconsumed by ''B''. If both ''B'' and ''C'' succeed, then ''A'' in turn succeeds and consumes the same total number of input characters that ''B'' and ''C'' together did. If either ''B'' or ''C'' fails, however, then ''A'' backtracks to the original point in the input string where it was first invoked, and then invokes ''D'' on that original input string, returning whatever result ''D'' produces.
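
The four rule forms and their interpretation can be sketched as a small interpreter. This is only an illustrative sketch, not a reference implementation: the tuple encoding of rules (`"eps"`, `"fail"`, `"term"`, `"seq"`), the function name `parse`, and the sample grammar `DEMO` are assumptions made for this example. Success is modelled as the new input position and failure as `None`, so that succeeding without consuming input (returning the same position) stays distinct from failing.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule encoding (an assumption of this sketch):
#   ("eps",)             A <- ε      always succeeds, consumes nothing
#   ("fail",)            A <- f      always fails
#   ("term", "a")        A <- a      matches one terminal
#   ("seq", B, C, D)     A <- BC/D   B then C, otherwise backtrack and try D
Rule = Tuple[str, ...]
Grammar = Dict[str, Rule]


def parse(grammar: Grammar, name: str, text: str, pos: int = 0) -> Optional[int]:
    """Invoke nonterminal `name` at `pos`.

    Returns the position after the consumed input on success,
    or None on failure (in which case nothing is consumed).
    """
    rule = grammar[name]
    kind = rule[0]
    if kind == "eps":
        return pos
    if kind == "fail":
        return None
    if kind == "term":
        if pos < len(text) and text[pos] == rule[1]:
            return pos + 1
        return None
    if kind == "seq":
        _, b, c, d = rule
        after_b = parse(grammar, b, text, pos)
        if after_b is not None:
            after_c = parse(grammar, c, text, after_b)
            if after_c is not None:
                return after_c
        # Either B or C failed: backtrack to the original position and try D.
        return parse(grammar, d, text, pos)
    raise ValueError(f"unknown rule form: {kind!r}")


# Sample grammar (hypothetical): X <- AB/E, A <- a, B <- b, E <- ε
DEMO: Grammar = {
    "X": ("seq", "A", "B", "E"),
    "A": ("term", "a"),
    "B": ("term", "b"),
    "E": ("eps",),
}
```

With this encoding, `parse(DEMO, "X", "ab")` consumes both characters, while `parse(DEMO, "X", "ac")` backtracks after ''B'' fails and falls through to ''E'', succeeding without consuming any input.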


Examples

The following TDPL grammar describes the regular language consisting of an arbitrary-length sequence of a's and b's:
: ''S'' ← ''AS/T''
: ''T'' ← ''BS/E''
: ''A'' ← a
: ''B'' ← b
: ''E'' ← ε

The following grammar describes the context-free ''parentheses language'' consisting of arbitrary-length strings of matched braces, such as "{}" and "{{}{}}":
: ''S'' ← ''OT/E''
: ''T'' ← ''SU/F''
: ''U'' ← ''CS/F''
: ''O'' ← {
: ''C'' ← }
: ''E'' ← ε
: ''F'' ← ''f''

The above examples can be represented equivalently but much more succinctly in parsing expression grammar notation as ''S'' ← (a/b)* and ''S'' ← ({''S''})*, respectively.
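
To make the correspondence with recursive descent concrete, the matched-braces grammar can be written with one Python function per nonterminal. This is a hedged sketch: it assumes that ''O'' matches the opening brace `{` and ''C'' the closing brace `}`, and the helper names `_terminal`, `_seq_or`, and `matches` are invented for this example. Each function takes the input and a position and returns the new position on success or `None` on failure.

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def _terminal(ch: str, text: str, pos: int) -> Optional[int]:
    """A <- a: consume one character if it equals `ch`."""
    if pos < len(text) and text[pos] == ch:
        return pos + 1
    return None


def _seq_or(b: Parser, c: Parser, d: Parser, text: str, pos: int) -> Optional[int]:
    """A <- BC/D: try B then C; on any failure, retry D from `pos`."""
    after_b = b(text, pos)
    if after_b is not None:
        after_c = c(text, after_b)
        if after_c is not None:
            return after_c
    return d(text, pos)


# One function per nonterminal of the matched-braces grammar.
def S(text: str, pos: int) -> Optional[int]:
    return _seq_or(O, T, E, text, pos)      # S <- OT/E


def T(text: str, pos: int) -> Optional[int]:
    return _seq_or(S, U, F, text, pos)      # T <- SU/F


def U(text: str, pos: int) -> Optional[int]:
    return _seq_or(C, S, F, text, pos)      # U <- CS/F


def O(text: str, pos: int) -> Optional[int]:
    return _terminal("{", text, pos)        # O <- {  (assumed terminal)


def C(text: str, pos: int) -> Optional[int]:
    return _terminal("}", text, pos)        # C <- }  (assumed terminal)


def E(text: str, pos: int) -> Optional[int]:
    return pos                              # E <- ε


def F(text: str, pos: int) -> Optional[int]:
    return None                             # F <- f


def matches(text: str) -> bool:
    """True if S succeeds and consumes the whole input."""
    return S(text, 0) == len(text)
```

Note that ''S'' itself never fails, because its fallback ''E'' always succeeds; a string is rejected only because ''S'' stops before the end of the input. That is why `matches` compares the returned position with the input length.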


Generalized TDPL

A slight variation of TDPL, known as Generalized TDPL or GTDPL, greatly increases the apparent expressiveness of TDPL while retaining the same minimalist approach (although the two formalisms are in fact equivalent in recognition power). In GTDPL, in place of TDPL's recursive rule form ''A'' ← ''BC/D'', we instead use the alternate rule form ''A'' ← ''B''[''C'',''D''], which is interpreted as follows. When nonterminal ''A'' is invoked on some input string, it first recursively invokes ''B''. If ''B'' succeeds, then ''A'' subsequently invokes ''C'' on the remainder of the input left unconsumed by ''B'', and returns the result of ''C'' to the original caller. If ''B'' fails, on the other hand, then ''A'' invokes ''D'' on the original input string, and passes the result back to the caller.

The important difference between this rule form and the ''A'' ← ''BC/D'' rule form used in TDPL is that ''C'' and ''D'' are never ''both'' invoked in the same call to ''A'': that is, the GTDPL rule acts more like a "pure" if/then/else construct using ''B'' as the condition.

In GTDPL it is straightforward to express interesting non-context-free languages such as the classic example {''a''^''n'' ''b''^''n'' ''c''^''n''}. A GTDPL grammar can be reduced to an equivalent TDPL grammar that recognizes the same language, although the process is not straightforward and may greatly increase the number of rules required (Ford, Bryan. ''Parsing Expression Grammars: A Recognition-Based Syntactic Foundation'').

Also, both TDPL and GTDPL can be viewed as very restricted forms of parsing expression grammars, and all three formalisms recognize the same class of languages.
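
The if/then/else behavior of the GTDPL rule form can be sketched in the same style as a TDPL interpreter. Everything named here is an assumption of the sketch: the tuple encoding, the function names `gparse` and `in_anbncn`, and in particular the grammar `ANBNCN`, which is one possible GTDPL encoding of the language a^n b^n c^n (n ≥ 1) constructed for illustration and not taken from the source. It builds a "not" test as ''X''[''F'',''E''] and a lookahead test as a double "not".

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule encoding (an assumption of this sketch):
#   ("eps",), ("fail",), ("term", "a") as in TDPL, and
#   ("cond", B, C, D) for the GTDPL form A <- B[C,D]
Grammar = Dict[str, Tuple[str, ...]]


def gparse(grammar: Grammar, name: str, text: str, pos: int = 0) -> Optional[int]:
    """Return the position after the match, or None on failure."""
    rule = grammar[name]
    kind = rule[0]
    if kind == "eps":
        return pos
    if kind == "fail":
        return None
    if kind == "term":
        ok = pos < len(text) and text[pos] == rule[1]
        return pos + 1 if ok else None
    if kind == "cond":
        _, b, c, d = rule
        after_b = gparse(grammar, b, text, pos)
        if after_b is not None:
            # B succeeded: the result is whatever C yields; D is never tried.
            return gparse(grammar, c, text, after_b)
        # B failed: the result is whatever D yields on the original input.
        return gparse(grammar, d, text, pos)
    raise ValueError(f"unknown rule form: {kind!r}")


# One possible grammar for a^n b^n c^n, n >= 1 (illustrative construction).
ANBNCN: Grammar = {
    "E": ("eps",), "F": ("fail",),
    "Ta": ("term", "a"), "Tb": ("term", "b"), "Tc": ("term", "c"),
    "A": ("cond", "Ta", "A1", "F"),      # A  = a A? b   (matches a^n b^n)
    "A1": ("cond", "A", "Tb", "Tb"),
    "B": ("cond", "Tb", "B1", "F"),      # B  = b B? c   (matches b^n c^n)
    "B1": ("cond", "B", "Tc", "Tc"),
    "AC": ("cond", "A", "Tc", "F"),      # a^n b^n followed by c
    "NotAC": ("cond", "AC", "F", "E"),   # succeeds only if AC fails
    "AndAC": ("cond", "NotAC", "F", "E"),  # lookahead: AC matches, consume nothing
    "As": ("cond", "Ta", "As0", "F"),    # one or more a
    "As0": ("cond", "Ta", "As0", "E"),   # zero or more a
    "S1": ("cond", "As", "B", "F"),
    "S": ("cond", "AndAC", "S1", "F"),   # check a^n b^n c, then skip a's, match b^n c^n
}


def in_anbncn(text: str) -> bool:
    """True if S succeeds and consumes the whole input."""
    return gparse(ANBNCN, "S", text) == len(text)
```

The lookahead rule `AndAC` shows why the pure conditional is convenient: a successful condition can be followed by unconditional failure (`F`), and a failed condition by unconditional success (`E`), which is awkward to express with the ''A'' ← ''BC/D'' form.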


See also

* Formal grammar
* Recursive descent parser




External links


The Packrat Parsing and Parsing Expression Grammars Page