The dangling else is a problem in
programming of
parser generator
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
The most common type of compiler- ...
s in which an optional else clause in an
if–then(–else) statement results in nested conditionals being ambiguous. Formally, the reference
context-free grammar of the language is
ambiguous, meaning there is more than one correct
parse tree
A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
.
In many
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
s one may write conditionally executed code in two forms: the if-then form, and the if-then-else form – the else clause is optional:
if a then s
if b then s1 else s2
This gives rise to an ambiguity in interpretation when there are nested statements, specifically whenever an if-then form appears as
s1
in an if-then-else form:
if a then if b then s else s2
In this example,
s
is unambiguously executed when
a
is true and
b
is true, but one may interpret
s2
as being executed when
a
is false (thus attaching the else to the first if) or when
a
is true and
b
is false (thus attaching the else to the second if). In other words, one may see the previous statement as either of the following expressions:
if a then (if b then s) else s2
if a then (if b then s else s2)
The dangling else problem dates to
ALGOL 60
ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a k ...
, and has been resolved in various ways in subsequent languages. In
LR parser
In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parse ...
s, the dangling else is the archetypal example of a
shift-reduce conflict.
Avoiding ambiguity while keeping the syntax
This is a problem that often comes up in
compiler construction
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
, especially
scannerless parsing
In computer science, scannerless parsing (also called lexerless parsing) performs tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking it up into a pipeli ...
. The convention when dealing with the dangling else is to attach the else to the nearby if statement,
allowing for unambiguous context-free grammars, in particular. Programming languages like Pascal, C and Java follow this convention, so there is no ambiguity in the semantics of the ''language'', though the use of a parser generator may lead to ambiguous ''grammars''. In these cases alternative grouping is accomplished by explicit blocks, such as
begin...end
in Pascal and
in C.
Depending on the compiler construction approach, one may take different corrective actions to avoid ambiguity:
*If the parser is produced by an SLR, LR(1) or LALR
LR parser
In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parse ...
generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict.
Alternatively, the grammar can be rewritten to remove the conflict, at the expense of an increase in grammar size (see
below).
*If the parser is hand written, the programmer may use a ''non-ambiguous'' context-free grammar. Alternatively, one may rely on a non-context-free grammar or a
parsing expression grammar
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
.
Avoiding ambiguity by changing the syntax
The problem can also be solved by making explicit the link between an else and its if, within the syntax. This usually helps avoid human errors.
Possible solutions are:
*Having an "end if" symbol delimiting the end of the if construct. Examples of such languages are
ALGOL 68
ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously d ...
,
Ada
Ada may refer to:
Places
Africa
* Ada Foah, a town in Ghana
* Ada (Ghana parliament constituency)
* Ada, Osun, a town in Nigeria
Asia
* Ada, Urmia, a village in West Azerbaijan Province, Iran
* Ada, Karaman, a village in Karaman Province, ...
,
Eiffel
Eiffel may refer to:
Places
* Eiffel Peak, a summit in Alberta, Canada
* Champ de Mars – Tour Eiffel station, Paris, France; a transit station
Structures
* Eiffel Tower, in Paris, France, designed by Gustave Eiffel
* Eiffel Bridge, Ungheni, M ...
,
PL/SQL
PL/SQL (Procedural Language for SQL) is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database (since version 6 - stored PL/SQL procedures/functions/packages/triggers since ...
,
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to:
* Visual Basic .NET (now simply referred to as "Visual Basic"), the current version of Visual Basic launched in 2002 which runs on .NET
* Visual Basic (cl ...
,
Modula-2
Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It w ...
, and
AppleScript
AppleScript is a scripting language created by Apple Inc. that facilitates automated control over scriptable Mac applications. First introduced in System 7, it is currently included in all versions of macOS as part of a package of system aut ...
.
*Disallowing the statement following a "then" to be an "if" itself (it may however be a pair of statement brackets containing only an if-then-clause). This approach is followed by
ALGOL 60
ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a k ...
.
*Requiring braces (parenthesize) when an "else" follows an "if".
*Requiring every "if" to be paired with an "else". To avoid a similar problem concerning semantics rather than syntax,
Racket deviates from
Scheme A scheme is a systematic plan for the implementation of a certain idea.
Scheme or schemer may refer to:
Arts and entertainment
* ''The Scheme'' (TV series), a BBC Scotland documentary series
* The Scheme (band), an English pop band
* ''The Schem ...
by considering an
if
without a fallback clause to be an error, effectively distinguishing conditional ''expressions'' (i.e
if
) from conditional ''statements'' (i.e
when
and
unless
, which do not have fallback clauses).
*Using different keywords for the one-alternative and two-alternative "if" statements.
S-algol
S-algol (St Andrews Algol) is a computer programming language derivative of ALGOL 60 developed at the University of St Andrews in 1979 by Ron Morrison and Tony Davie. The language is a modification of ALGOL to contain orthogonal data types that Mo ...
uses
if e do s
for the one-alternative case and
if e1 then e2 else e3
for the general case.
* Requiring braces unconditionally, like
Swift
Swift or SWIFT most commonly refers to:
* SWIFT, an international organization facilitating transactions between banks
** SWIFT code
* Swift (programming language)
* Swift (bird), a family of birds
It may also refer to:
Organizations
* SWIFT, ...
. This is effectively true in
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
as its indentation rules delimit every block, not just those in "if" statements.
Examples
Concrete examples follow.
C
In
C, the grammar reads, in part:
statement = ...
, selection-statement
selection-statement = ...
, IF ( expression ) statement
, IF ( expression ) statement ELSE statement
Thus, without further rules, the statement
if (a) if (b) s; else s2;
could ambiguously be parsed as if it were either:
if (a)
or:
if (a)
else
s2;
In practice in C the first tree is chosen, by associating the
else
with the nearest
if
.
Avoiding the conflict in LR parsers
The above example could be rewritten in the following way to remove the ambiguity :
statement: open_statement
, closed_statement
;
open_statement: IF '(' expression ')' statement
, IF '(' expression ')' closed_statement ELSE open_statement
;
closed_statement: non_if_statement
, IF '(' expression ')' closed_statement ELSE closed_statement
;
non_if_statement: ...
;
Any other statement-related grammar rules may also have to be duplicated in this way if they may directly or indirectly end with a
statement
or
selection-statement
non-terminal.
However, we give grammar that includes both of if and while statements.
statement: open_statement
, closed_statement
;
open_statement: IF '(' expression ')' statement
, IF '(' expression ')' closed_statement ELSE open_statement
, WHILE '(' expression ')' open_statement
;
closed_statement: simple_statement
, IF '(' expression ')' closed_statement ELSE closed_statement
, WHILE '(' expression ')' closed_statement
;
simple_statement: ...
;
Finally, we give the grammar that forbids ambiguous IF statements.
statement: open_statement
, closed_statement
;
open_statement: IF '(' expression ')' simple_statement
, IF '(' expression ')' open_statement
, IF '(' expression ')' closed_statement ELSE open_statement
, WHILE '(' expression ')' open_statement
;
closed_statement: simple_statement
, IF '(' expression ')' closed_statement ELSE closed_statement
, WHILE '(' expression ')' closed_statement
;
simple_statement: ...
;
With this grammar parsing of "if (a) if (b) c else d" fails:
statement
open_statement
IF '(' expression ')' closed_statement ELSE open_statement
'if' '(' 'a' ')' closed_statement 'else' 'd'
and then the parsing fails trying to match
closed_statement
to "if (b) c". An attempt with
closed_statement
fails in the same way.
See also
*
The lexer hack In computer programming, the lexer hack is a common solution to the problems in parsing ANSI C, due to the reference grammar being context-sensitive. In C, classifying a sequence of characters as a variable name or a type name requires contextual i ...
*
Most vexing parse
The most vexing parse is a counterintuitive form of syntactic ambiguity resolution in the C++ programming language. In certain situations, the C++ grammar cannot distinguish between the creation of an object parameter and specification of a functi ...
References
{{DEFAULTSORT:Dangling Else
Parsing
Computer programming
Ambiguity
Conditional constructs