In
computer science, an LL parser (Left-to-right,
leftmost derivation
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
) is a
top-down parser for a restricted
context-free language. It parses the input from Left to right, performing
Leftmost derivation of the sentence.
An LL parser is called an LL(''k'') parser if it uses ''k''
tokens of
lookahead when parsing a sentence. A grammar is called an
LL(''k'') grammar if an LL(''k'') parser can be constructed from it. A formal language is called an LL(''k'') language if it has an LL(''k'') grammar. The set of LL(''k'') languages is properly contained in that of LL(''k''+1) languages, for each ''k'' ≥ 0. A corollary of this is that not all context-free languages can be recognized by an LL(''k'') parser.
An LL parser is called LL-regular (LLR) if it parses an
LL-regular language. The class of
LLR grammars contains every LL(k) grammar for every k. For every LLR grammar there exists an LLR parser that parses the grammar in linear time.
Two nomenclative outlier parser types are LL(*) and LL(finite). A parser is called LL(*)/LL(finite) if it uses the LL(*)/LL(finite) parsing strategy. LL(*) and LL(finite) parsers are functionally more closely resemblant to
PEG parsers. An LL(finite) parser can parse an arbitrary LL(k) grammar optimally in the amount of lookahead and lookahead comparisons. The class of grammars parsable by the LL(*) strategy encompasses some context-sensitive languages due to the use of syntactic and semantic predicates and has not been identified. It has been suggested that LL(*) parsers are better thought of as
TDPL parsers.
Against the popular misconception, LL(*) parsers are not LLR in general, and are guaranteed by construction to perform worse on average (super-linear against linear time) and far worse in the worst-case (exponential against linear time).
LL grammars, particularly LL(1) grammars, are of great practical interest, as parsers for these grammars are easy to construct, and many
computer languages are designed to be LL(1) for this reason. LL parsers may be table-based, i.e. similar to
LR parsers, but LL grammars can also be parsed by
recursive descent parsers. According to Waite and Goos (1984), LL(''k'') grammars were introduced by Stearns and Lewis (1969).
Overview
For a given
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
, the parser attempts to find the
leftmost derivation
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
.
Given an example grammar
:
#
#
#
the leftmost derivation for
is:
:
Generally, there are multiple possibilities when selecting a rule to expand the leftmost non-terminal. In step 2 of the previous example, the parser must choose whether to apply rule 2 or rule 3:
:
To be efficient, the parser must be able to make this choice deterministically when possible, without backtracking. For some grammars, it can do this by peeking on the unread input (without reading). In our example, if the parser knows that the next unread symbol is
, the only correct rule that can be used is 2.
Generally, an
parser can look ahead at
symbols. However, given a grammar, the problem of determining if there exists a
parser for some
that recognizes it is undecidable. For each
, there is a language that cannot be recognized by an
parser, but can be by an
.
We can use the above analysis to give the following formal definition:
Let
be a context-free grammar and
. We say that
is
, if and only if for any two leftmost derivations:
#
#
the following condition holds: the prefix of the string
of length
equals the prefix of the string
of length
implies
.
In this definition,
is the start symbol and
any non-terminal. The already derived input
, and yet unread
and
are strings of terminals. The Greek letters
,
and
represent any string of both terminals and non-terminals (possibly empty). The prefix length corresponds to the lookahead buffer size, and the definition says that this buffer is enough to distinguish between any two derivations of different words.
Parser
The
parser is a
deterministic pushdown automaton with the ability to peek on the next
input symbols without reading. This peek capability can be emulated by storing the lookahead buffer contents in the finite state space, since both buffer and input alphabet are finite in size. As a result, this does not make the automaton more powerful, but is a convenient abstraction.
The stack alphabet is
, where:
*
is the set of non-terminals;
*
the set of terminal (input) symbols with a special end-of-input (EOI) symbol
.
The parser stack initially contains the starting symbol above the EOI: