Extended Backus–Naur Form
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, extended Backus–Naur form (EBNF) is a family of
metasyntax In logic and computer science, a metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language.Sellink, Alex, and Chr ...
notations, any of which can be used to express a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
. EBNF is used to make a formal description of a
formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
such as a computer
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
. They are extensions of the basic Backus–Naur form (BNF) metasyntax notation. The earliest EBNF was developed by
Niklaus Wirth Niklaus Emil Wirth (born 15 February 1934) is a Swiss computer scientist. He has designed several programming languages, including Pascal (programming language), Pascal, and pioneered several classic topics in software engineering. In 1984, he w ...
, incorporating some of the concepts (with a different syntax and notation) from
Wirth syntax notation Wirth syntax notation (WSN) is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus–Naur form (BNF). It has several advantages over BNF in that it contains an e ...
. Today, many variants of EBNF are in use. The
International Organization for Standardization The International Organization for Standardization (ISO ) is an international standard development organization composed of representatives from the national standards organizations of member countries. Membership requirements are given in Ar ...
adopted an EBNF
Standard Standard may refer to: Symbols * Colours, standards and guidons, kinds of military signs * Standard (emblem), a type of a large symbol or emblem used for identification Norms, conventions or requirements * Standard (metrology), an object th ...
, ISO/IEC 14977, in 1996. According to Zaytsev, however, this standard "only ended up adding yet another three dialects to the chaos" and, after noting its lack of success, also notes that the ISO EBNF is not even used in all ISO standards. Wheeler argues against using the ISO standard when using an EBNF and recommends considering alternative EBNF notations such as the one from the W3C Extensible Markup Language (XML) 1.0 (Fifth Edition). This article uses EBNF as specified by the ISO for examples applying to all EBNFs. Other EBNF variants use somewhat different syntactic conventions.


Basics

EBNF is a
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
that expresses the
syntax In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituency) ...
of a formal language. An EBNF consists of
terminal symbol In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
s and non-terminal production rules which are the restrictions governing how terminal symbols can be combined into a legal sequence. Examples of terminal symbols include alphanumeric characters, punctuation marks, and
whitespace character In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area ...
s. The EBNF defines production rules where sequences of symbols are respectively assigned to a
nonterminal In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
: digit excluding zero = "1" , "2" , "3" , "4" , "5" , "6" , "7" , "8" , "9" ; digit = "0" , digit excluding zero ; This production rule defines the nonterminal ''digit'' which is on the left side of the assignment. The vertical bar represents an alternative and the terminal symbols are enclosed with quotation marks followed by a semicolon as terminating character. Hence a ''digit'' is a ''0'' or a ''digit excluding zero'' that can be ''1'' or ''2'' or ''3'' and so forth until ''9''. A production rule can also include a sequence of terminals or nonterminals, each separated by a comma: twelve = "1", "2" ; two hundred one = "2", "0", "1" ; three hundred twelve = "3", twelve ; twelve thousand two hundred one = twelve, two hundred one ; Expressions that may be omitted or repeated can be represented through curly braces : natural number = digit excluding zero, ; In this case, the strings ''1'', ''2'', ..., ''10'', ..., ''10000'', ... are correct expressions. To represent this, everything that is set within the curly braces may be repeated arbitrarily often, including not at all. An option can be represented through squared brackets ... That is, everything that is set within the square brackets may be present just once, or not at all: integer = "0" , "-" natural number ; Therefore, an
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
is a zero (''0'') or a
natural number In mathematics, the natural numbers are those numbers used for counting (as in "there are ''six'' coins on the table") and ordering (as in "this is the ''third'' largest city in the country"). Numbers used for counting are called ''Cardinal n ...
that may be preceded by an optional
minus sign The plus and minus signs, and , are mathematical symbols used to represent the notions of positive and negative, respectively. In addition, represents the operation of addition, which results in a sum, while represents subtraction, resulti ...
. EBNF also provides, among other things, the syntax to describe repetitions (of a specified number of times), to exclude some part of a production, and to insert comments in an EBNF grammar.


Table of symbols

The following represents a proposed ISO/IEC 14977 standard, by R. S. Scowen, page 7, table 1.


Examples


Syntax diagram


EBNF

Even EBNF can be described using EBNF. Consider the sketched grammar below: letter = "A" , "B" , "C" , "D" , "E" , "F" , "G" , "H" , "I" , "J" , "K" , "L" , "M" , "N" , "O" , "P" , "Q" , "R" , "S" , "T" , "U" , "V" , "W" , "X" , "Y" , "Z" , "a" , "b" , "c" , "d" , "e" , "f" , "g" , "h" , "i" , "j" , "k" , "l" , "m" , "n" , "o" , "p" , "q" , "r" , "s" , "t" , "u" , "v" , "w" , "x" , "y" , "z" ; digit = "0" , "1" , "2" , "3" , "4" , "5" , "6" , "7" , "8" , "9" ; symbol = " " , "" , "(" , ")" , "<" , ">" , "'" , '"' , "=" , ", " , "." , "," , ";" ; character = letter , digit , symbol , "_" ; identifier = letter , ; terminal = "'" , character - "'" , , "'" , '"' , character - '"' , , '"' ; lhs = identifier ; rhs = identifier , terminal , " , rhs , " , "" , "(" , rhs , ")" , rhs , ", " , rhs , rhs , "," , rhs ; rule = lhs , "=" , rhs , ";" ; grammar = ;


Pascal

A
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
-like programming language that allows only assignments can be defined in EBNF as follows: (* a simple program syntax in EBNF - Wikipedia *) program = 'PROGRAM', white space, identifier, white space, 'BEGIN', white space, , 'END.' ; identifier = alphabetic character, ; number = "-" digit, ; string = '"' , , '"' ; assignment = identifier , ":=" , ( number , identifier , string ) ; alphabetic character = "A" , "B" , "C" , "D" , "E" , "F" , "G" , "H" , "I" , "J" , "K" , "L" , "M" , "N" , "O" , "P" , "Q" , "R" , "S" , "T" , "U" , "V" , "W" , "X" , "Y" , "Z" ; digit = "0" , "1" , "2" , "3" , "4" , "5" , "6" , "7" , "8" , "9" ; white space = ? white space characters ? ; all characters = ? all visible characters ? ; For example, a syntactically correct program then could be: PROGRAM DEMO1 BEGIN A:=3; B:=45; H:=-100023; C:=A; D123:=B34A; BABOON:=GIRAFFE; TEXT:="Hello world!"; END. The language can easily be extended with
control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''imper ...
s, arithmetical expressions, and Input/Output instructions. Then a small, usable programming language would be developed.


Advantages over BNF

Any
grammar In linguistics, the grammar of a natural language is its set of structure, structural constraints on speakers' or writers' composition of clause (linguistics), clauses, phrases, and words. The term can also refer to the study of such constraint ...
defined in EBNF can also be represented in BNF, though representations in the latter are generally lengthier. E.g., options and repetitions cannot be directly expressed in BNF and require the use of an intermediate rule or alternative production defined to be either nothing or the optional production for option, or either the repeated production of itself, recursively, for repetition. The same constructs can still be used in EBNF. The BNF uses the symbols (<, >, , , ::=) for itself, but does not include quotes around terminal strings. This prevents these characters from being used in the languages, and requires a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks ("..." or '...'). The angle brackets ("<>") for nonterminals can be omitted. BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon character “;” marks the end of a rule. Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.


Conventions

As examples, the following syntax rules illustrate the facilities for expressing repetition: aa = "A"; bb = 3 * aa, "B"; cc = 3 * a "C"; dd = , "D"; ee = aa, , "E"; ff = 3 * aa, 3 * a "F"; gg = , "G"; Terminal strings defined by these rules are as follows: aa: A bb: AAAB cc: C AC AAC AAAC dd: D AD AAD AAAD AAAAD etc. ee: AE AAE AAAE AAAAE AAAAAE etc. ff: AAAF AAAAF AAAAAF AAAAAAF gg: G AAAG AAAAAAG etc.


Extensibility

According to the ISO 14977 standard EBNF is meant to be extensible, and two facilities are mentioned. The first is part of EBNF grammar, the special sequence, which is arbitrary text enclosed with question marks. The interpretation of the text inside a special sequence is beyond the scope of the EBNF standard. For example, the space character could be defined by the following rule: space = ? ASCII character 32 ?; The second facility for extension is using the fact that parentheses in EBNF cannot be placed next to identifiers (they must be concatenated with them). The following is valid EBNF: something = foo, ( bar ); The following is ''not'' valid EBNF: something = foo ( bar ); Therefore, an extension of EBNF could use that notation. For example, in a
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lisping ...
grammar, function application could be defined by the following rule: function application = list( symbol, );


Related work

* The
W3C The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 and led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working to ...
publishes an EBN
notation
* The
W3C The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 and led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working to ...
use
a different EBNF
to specify the
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable ...
syntax. * The
British Standards Institution The British Standards Institution (BSI) is the national standards body of the United Kingdom. BSI produces technical standards on a wide range of products and services and also supplies certification and standards-related services to busines ...
published a standard for an EBNF: BS 6154 in 1981. * The
IETF The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements and a ...
uses augmented BNF (ABNF), specified in .


See also

*
Meta-II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation'' ...
– An early compiler writing tool and notation *
Phrase structure rules Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. They are used to break down a natural lang ...
– The direct equivalent of EBNF in natural languages *
Regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
*
Spirit Parser Framework The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of extended Backus–Naur form (EBNF) ...


References


External links


ISO/IEC 14977 : 1996(E)


– A table by Pete Jinks comparing several syntaxes {{DEFAULTSORT:Extended Backus-Naur form Compiler construction Formal languages Metalanguages