In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, an LL parser (Left-to-right,
leftmost derivation
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
) is a
top-down parser Top-down parsing in computer science is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-d ...
for a restricted
context-free language
In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).
Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
. It parses the input from Left to right, performing
Leftmost derivation of the sentence.
An LL parser is called an LL(''k'') parser if it uses ''k''
tokens of
lookahead when parsing a sentence. A grammar is called an
LL(''k'') grammar if an LL(''k'') parser can be constructed from it. A formal language is called an LL(''k'') language if it has an LL(''k'') grammar. The set of LL(''k'') languages is properly contained in that of LL(''k''+1) languages, for each ''k'' ≥ 0. A corollary of this is that not all context-free languages can be recognized by an LL(''k'') parser.
An LL parser is called LL-regular (LLR) if it parses an
LL-regular language. The class of
LLR grammars contains every LL(k) grammar for every k. For every LLR grammar there exists an LLR parser that parses the grammar in linear time.
Two nomenclative outlier parser types are LL(*) and LL(finite). A parser is called LL(*)/LL(finite) if it uses the LL(*)/LL(finite) parsing strategy. LL(*) and LL(finite) parsers are functionally more closely resemblant to
PEG parsers. An LL(finite) parser can parse an arbitrary LL(k) grammar optimally in the amount of lookahead and lookahead comparisons. The class of grammars parsable by the LL(*) strategy encompasses some context-sensitive languages due to the use of syntactic and semantic predicates and has not been identified. It has been suggested that LL(*) parsers are better thought of as
TDPL parsers.
Against the popular misconception, LL(*) parsers are not LLR in general, and are guaranteed by construction to perform worse on average (super-linear against linear time) and far worse in the worst-case (exponential against linear time).
LL grammars, particularly LL(1) grammars, are of great practical interest, as parsers for these grammars are easy to construct, and many
computer language
A computer language is a formal language used to communicate with a computer. Types of computer languages include:
* Construction language – all forms of communication by which a human can specify an executable problem solution to a compu ...
s are designed to be LL(1) for this reason. LL parsers may be table-based, i.e. similar to
LR parser
In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers ...
s, but LL grammars can also be parsed by
recursive descent parser
In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
s. According to Waite and Goos (1984), LL(''k'') grammars were introduced by Stearns and Lewis (1969).
Overview
For a given
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
, the parser attempts to find the
leftmost derivation
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
.
Given an example grammar
:
#
#
#
the leftmost derivation for
is:
:
Generally, there are multiple possibilities when selecting a rule to expand the leftmost non-terminal. In step 2 of the previous example, the parser must choose whether to apply rule 2 or rule 3:
:
To be efficient, the parser must be able to make this choice deterministically when possible, without backtracking. For some grammars, it can do this by peeking on the unread input (without reading). In our example, if the parser knows that the next unread symbol is
, the only correct rule that can be used is 2.
Generally, an
parser can look ahead at
symbols. However, given a grammar, the problem of determining if there exists a
parser for some
that recognizes it is undecidable. For each
, there is a language that cannot be recognized by an
parser, but can be by an
.
We can use the above analysis to give the following formal definition:
Let
be a context-free grammar and
. We say that
is
, if and only if for any two leftmost derivations:
#
#
the following condition holds: the prefix of the string
of length
equals the prefix of the string
of length
implies
.
In this definition,
is the start symbol and
any non-terminal. The already derived input
, and yet unread
and
are strings of terminals. The Greek letters
,
and
represent any string of both terminals and non-terminals (possibly empty). The prefix length corresponds to the lookahead buffer size, and the definition says that this buffer is enough to distinguish between any two derivations of different words.
Parser
The
parser is a
deterministic pushdown automaton In automata theory, a deterministic pushdown automaton (DPDA or DPA) is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages. ...
with the ability to peek on the next
input symbols without reading. This peek capability can be emulated by storing the lookahead buffer contents in the finite state space, since both buffer and input alphabet are finite in size. As a result, this does not make the automaton more powerful, but is a convenient abstraction.
The stack alphabet is
, where:
*
is the set of non-terminals;
*
the set of terminal (input) symbols with a special end-of-input (EOI) symbol
.
The parser stack initially contains the starting symbol above the EOI: