Backus–Naur form
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, Backus–Naur form () or Backus normal form (BNF) is a metasyntax notation for
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
s, often used to describe the
syntax In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituenc ...
of
languages Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
used in computing, such as computer
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s, document formats,
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ...
s and communication protocols. It is applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory. Many extensions and variants of the original Backus–Naur notation are used; some are exactly defined, including extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF).


Overview

A BNF specification is a set of derivation rules, written as ::= __expression__ where: * <
symbol A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
> is a '' nonterminal'' (variable) and the __expression__ consists of one or more sequences of either terminal or nonterminal symbols; * means that the symbol on the left must be replaced with the expression on the right. * more sequences f symbolsare separated by the vertical bar ", ", indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are ''
terminal Terminal may refer to: Computing Hardware * Terminal (electronics), a device for joining electrical circuits together * Terminal (telecommunication), a device communicating over a line * Computer terminal, a set of primary input and output devi ...
s''. On the other hand, symbols that appear on a left side are '' non-terminals'' and are always enclosed between the pair <>.


Example

As an example, consider this possible BNF for a U.S. postal address: ::= ::= , ::= "." , ::= ::= "," ::= "Sr." , "Jr." , , "" ::= , "" This translates into English as: * A postal address consists of a name-part, followed by a street-address part, followed by a zip-code part. * A name-part consists of either: a personal-part followed by a last name followed by an optional
suffix In linguistics, a suffix is an affix which is placed after the stem of a word. Common examples are case endings, which indicate the grammatical case of nouns, adjectives, and verb endings, which form the conjugation of verbs. Suffixes can carr ...
(Jr., Sr., or dynastic number) and
end-of-line Newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a ...
, or a personal part followed by a name part (this rule illustrates the use of
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematic ...
in BNFs, covering the case of people who use multiple first and middle names and initials). * A personal-part consists of either a first name or an
initial In a written or published work, an initial capital, also referred to as a drop capital or simply an initial cap, initial, initcapital, initcap or init or a drop cap or drop, is a letter at the beginning of a word, a chapter, or a paragraph tha ...
followed by a dot. * A street address consists of a house number, followed by a street name, followed by an optional
apartment An apartment (American English), or flat (British English, Indian English, South African English), is a self-contained housing unit (a type of residential real estate) that occupies part of a building, generally on a single story. There are ma ...
specifier, followed by an end-of-line. * A zip-part consists of a
town A town is a human settlement. Towns are generally larger than villages and smaller than cities, though the criteria to distinguish between them vary considerably in different parts of the world. Origin and use The word "town" shares an o ...
-name, followed by a comma, followed by a
state code Several sets of codes and abbreviations are used to represent the political divisions of the United States for postal addresses, data processing, general abbreviations, and other purposes. Table This table includes abbreviations for three inde ...
, followed by a ZIP-code followed by an end-of-line. * An opt-suffix-part consists of a suffix, such as "Sr.", "Jr." or a roman-numeral, or an empty string (i.e. nothing). * An opt-apt-num consists of an apartment number or an empty string (i.e. nothing). Note that many things (such as the format of a first-name, apartment number, ZIP-code, and Roman numeral) are left unspecified here. If necessary, they may be described using additional BNF rules.


History

The idea of describing the structure of language using rewriting rules can be traced back to at least the work of
Pāṇini , era = ;;6th–5th century BCE , region = Indian philosophy , main_interests = Grammar, linguistics , notable_works = ' ( Classical Sanskrit) , influenced= , notable_ideas= Descriptive linguistics (Devana ...
, an ancient Indian Sanskrit grammarian and a revered scholar in Hinduism who lived sometime between the 6th and 4th century BC. His notation to describe
Sanskrit Sanskrit (; attributively , ; nominally , , ) is a classical language belonging to the Indo-Aryan languages, Indo-Aryan branch of the Indo-European languages. It arose in South Asia after its predecessor languages had Trans-cultural diffusion ...
word structure is equivalent in power to that of Backus and has many similar properties. In Western society, grammar was long regarded as a subject for teaching, rather than scientific study; descriptions were informal and targeted at practical usage. In the first half of the 20th century,
linguists Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Lingui ...
such as
Leonard Bloomfield Leonard Bloomfield (April 1, 1887 – April 18, 1949) was an American linguist who led the development of structural linguistics in the United States during the 1930s and the 1940s. He is considered to be the father of American distributionalis ...
and
Zellig Harris Zellig Sabbettai Harris (; October 23, 1909 – May 22, 1992) was an influential American linguistics, linguist, mathematical syntactician, and methodologist of science. Originally a Semitic languages, Semiticist, he is best known for his work i ...
started attempts to formalize the description of language, including
phrase structure Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. They are used to break down a natural langu ...
. Meanwhile, string rewriting rules as
formal logical systems Mathematical logic is the study of formal logic within mathematics. Major subareas include model theory, proof theory, set theory, and recursion theory. Research in mathematical logic commonly addresses the mathematical properties of formal sy ...
were introduced and studied by mathematicians such as Axel Thue (in 1914), Emil Post (1920s–40s) and
Alan Turing Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical ...
(1936).
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky i ...
, teaching linguistics to students of
information theory Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. ...
at MIT, combined linguistics and mathematics by taking what is essentially Thue's formalism as the basis for the description of the syntax of natural language. He also introduced a clear distinction between generative rules (those of
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
s) and transformation rules (1956). John Backus, a programming language designer at IBM, proposed a metalanguage of "metalinguistic formulas" to describe the syntax of the new programming language IAL, known today as ALGOL 58 (1959). His notation was first used in the ALGOL 60 report. BNF is a notation for Chomsky's context-free grammars. Backus was familiar with Chomsky's work. As proposed by Backus, the formula defined "classes" whose names are enclosed in angle brackets. For example, <ab>. Each of these names denotes a class of basic symbols. Further development of ALGOL led to ALGOL 60. In the committee's 1963 report, Peter Naur called Backus's notation ''Backus normal form''. Donald Knuth argued that BNF should rather be read as ''Backus–Naur form'', as it is "not a normal form in the conventional sense", unlike, for instance,
Chomsky normal form In formal language theory, a context-free grammar, ''G'', is said to be in Chomsky normal form (first described by Noam Chomsky) if all of its production rules are of the form: : ''A'' → ''BC'',   or : ''A'' → ''a'',   or : ''S'' → ...
. The name ''Pāṇini Backus form'' was also once suggested in view of the fact that the expansion ''Backus normal form'' may not be accurate, and that
Pāṇini , era = ;;6th–5th century BCE , region = Indian philosophy , main_interests = Grammar, linguistics , notable_works = ' ( Classical Sanskrit) , influenced= , notable_ideas= Descriptive linguistics (Devana ...
had independently developed a similar notation earlier. BNF is described by Peter Naur in the ALGOL 60 report as ''metalinguistic formula'':Revised ALGOL 60 report section. 1.1. Another example from the ALGOL 60 report illustrates a major difference between the BNF metalanguage and a Chomsky context-free grammar. Metalinguistic variables do not require a rule defining their formation. Their formation may simply be described in natural language within the <> brackets. The following ALGOL 60 report section 2.3 comments specification, exemplifies how this works:
For the purpose of including text among the symbols of a program the following "comment" conventions hold: Equivalence here means that any of the three structures shown in the left column may be replaced, in any occurrence outside of strings, by the symbol shown in the same line in the right column without any effect on the action of the program.
Naur changed two of Backus's symbols to commonly available characters. The ::= symbol was originally a :≡. The , symbol was originally the word "" (with a bar over it). BNF is very similar to canonical-form
boolean algebra In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variables are the truth values ''true'' and ''false'', usually denoted 1 and 0, whereas i ...
equations that are, and were at the time, used in logic-circuit design. Backus was a mathematician and the designer of the FORTRAN programming language. Studies of boolean algebra is commonly part of a mathematics curriculum. Neither Backus nor Naur described the names enclosed in < > as non-terminals. Chomsky's terminology was not originally used in describing BNF. Naur later described them as classes in ALGOL course materials. In the ALGOL 60 report they were called metalinguistic variables. Anything other than the metasymbols ::=, , , and class names enclosed in < > are symbols of the language being defined. The metasymbol ::= is to be interpreted as "is defined as". The , is used to separate alternative definitions and is interpreted as "or". The metasymbols < > are delimiters enclosing a class name. BNF is described as a metalanguage for talking about ALGOL by Peter Naur and Saul Rosen. In 1947 Saul Rosen became involved in the activities of the fledgling
Association for Computing Machinery The Association for Computing Machinery (ACM) is a US-based international learned society for computing. It was founded in 1947 and is the world's largest scientific and educational computing society. The ACM is a non-profit professional member ...
, first on the languages committee that became the IAL group and eventually led to ALGOL. He was the first managing editor of the Communications of the ACM. BNF was first used as a metalanguage to talk about the ALGOL language in the ALGOL 60 report. That is how it is explained in ALGOL programming course material developed by Peter Naur in 1962. Early ALGOL manuals by IBM, Honeywell, Burroughs and Digital Equipment Corporation followed the ALGOL 60 report using it as a metalanguage. Saul Rosen in his book describes BNF as a metalanguage for talking about ALGOL. An example of its use as a metalanguage would be in defining an arithmetic expression: The first symbol of an alternative may be the class being defined, the repetition, as explained by Naur, having the function of specifying that the alternative sequence can recursively begin with a previous alternative and can be repeated any number of times. For example, above <expr> is defined as a <term> followed by any number of <addop> <term>. In some later metalanguages, such as Schorre's
META II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation' ...
, the BNF recursive repeat construct is replaced by a sequence operator and target language symbols defined using quoted strings. The < and > brackets were removed. Parentheses () for mathematical grouping were added. The <expr> rule would appear in META II as These changes enabled META II and its derivative programming languages to define and extend their own metalanguage, at the cost of the ability to use a natural language description, metalinguistic variable, language construct description. Many spin-off metalanguages were inspired by BNF. See
META II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation' ...
, TREE-META, and Metacompiler. A BNF class describes a language construct formation, with formation defined as a pattern or the action of forming the pattern. The class name expr is described in a natural language as a <term> followed by a sequence <addop> <term>. A class is an abstraction; we can talk about it independent of its formation. We can talk about term, independent of its definition, as being added or subtracted in expr. We can talk about a term being a specific data type and how an expr is to be evaluated having specific combinations of data types, or even reordering an expression to group data types and evaluation results of mixed types. The natural-language supplement provided specific details of the language class semantics to be used by a compiler implementation and a programmer writing an ALGOL program. Natural-language description further supplemented the syntax as well. The integer rule is a good example of natural and metalanguage used to describe syntax: There are no specifics on white space in the above. As far as the rule states, we could have space between the digits. In the natural language we complement the BNF metalanguage by explaining that the digit sequence can have no white space between the digits. English is only one of the possible natural languages. Translations of the ALGOL reports were available in many natural languages. The origin of BNF is not as important as its impact on programming language development. During the period immediately following the publication of the ALGOL 60 report BNF was the basis of many
compiler-compiler In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler ...
systems. Some, like "A Syntax Directed Compiler for ALGOL 60" developed by Edgar T. Irons and "A Compiler Building System" Developed by Brooker and Morris, directly used BNF. Others, like the Schorre Metacompilers, made it into a programming language with only a few changes. <class name> became symbol identifiers, dropping the enclosing <,> and using quoted strings for symbols of the target language. Arithmetic-like grouping provided a simplification that removed using classes where grouping was its only value. The META II arithmetic expression rule shows grouping use. Output expressions placed in a META II rule are used to output code and labels in an assembly language. Rules in META II are equivalent to a class definitions in BNF. The Unix utility
yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
is based on BNF with code production similar to META II. yacc is most commonly used as a parser generator, and its roots are obviously BNF. BNF today is one of the oldest computer-related languages still in use.


Further examples

BNF's syntax itself may be represented with a BNF like the following: ::= , ::= "<" ">" "::=" ::= " " , "" ::= , ", " ::= , ::= , ::= , "<" ">" ::= '"' '"' , "'" "'" ::= "" , ::= '' , ::= , , ::= "A" , "B" , "C" , "D" , "E" , "F" , "G" , "H" , "I" , "J" , "K" , "L" , "M" , "N" , "O" , "P" , "Q" , "R" , "S" , "T" , "U" , "V" , "W" , "X" , "Y" , "Z" , "a" , "b" , "c" , "d" , "e" , "f" , "g" , "h" , "i" , "j" , "k" , "l" , "m" , "n" , "o" , "p" , "q" , "r" , "s" , "t" , "u" , "v" , "w" , "x" , "y" , "z" ::= "0" , "1" , "2" , "3" , "4" , "5" , "6" , "7" , "8" , "9" ::= ", " , " " , "!" , "#" , "$" , "%" , "&" , "(" , ")" , "*" , "+" , "," , "-" , "." , "/" , ":" , ";" , ">" , "=" , "<" , "?" , "@" , " "\" , " , "^" , "_" , "`" , "" , "~" ::= , "'" ::= , '"' ::= , ::= , , "-" Note that "" is the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special c ...
. The original BNF did not use quotes as shown in <literal> rule. This assumes that no whitespace is necessary for proper interpretation of the rule. <EOL> represents the appropriate line-end specifier (in
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
, carriage-return, line-feed or both depending on the
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
). <rule-name> and <text> are to be substituted with a declared rule's name/label or literal text, respectively. In the U.S. postal address example above, the entire block-quote is a syntax. Each line or unbroken grouping of lines is a rule; for example one rule begins with <name-part> ::=. The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a pipe , . These two lists consists of some terms (three terms and two terms, respectively). Each term in this particular rule is a rule-name.


Variants


EBNF

There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of regular expression repetition operators such as * and +. The extended Backus–Naur form (EBNF) is a common one. Another common extension is the use of square brackets around optional items. Although not present in the original ALGOL 60 report (instead introduced a few years later in IBM's PL/I definition), the notation is now universally recognised.


ABNF

Augmented Backus–Naur form (ABNF) and Routing Backus–Naur form (RBNF) are extensions commonly used to describe
Internet Engineering Task Force The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements an ...
(IETF)
protocol Protocol may refer to: Sociology and politics * Protocol (politics), a formal agreement between nation states * Protocol (diplomacy), the etiquette of diplomacy and affairs of state * Etiquette, a code of personal behavior Science and technology ...
s. Parsing expression grammars build on the BNF and regular expression notations to form an alternative class of formal grammar, which is essentially
analytic Generally speaking, analytic (from el, ἀναλυτικός, ''analytikos'') refers to the "having the ability to analyze" or "division into elements or principles". Analytic or analytical can also have the following meanings: Chemistry * ...
rather than
generative Generative may refer to: * Generative actor, a person who instigates social change * Generative art, art that has been created using an autonomous system that is frequently, but not necessarily, implemented using a computer * Generative music, mus ...
in character.


Others

Many BNF specifications found online today are intended to be human-readable and are non-formal. These often include many of the following syntax rules and extensions: * Optional items enclosed in square brackets: lt;item-x>/code>. * Items existing 0 or more times are enclosed in curly brackets or suffixed with an asterisk (*) such as <word> ::= <letter> or <word> ::= <letter> <letter>* respectively. * Items existing 1 or more times are suffixed with an addition (plus) symbol, +, such as <word> ::= <letter>+. * Terminals may appear in bold rather than italics, and non-terminals in plain text rather than angle brackets. * Where items are grouped, they are enclosed in simple parentheses.


Software using BNF

* ANTLR, another parser generator written in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
* Qlik Sense, a BI tool, uses a variant of BNF for scripting * BNF Converter (BNFC), operating on a variant called "labeled Backus–Naur form" (LBNF). In this variant, each production for a given non-terminal is given a label, which can be used as a constructor of an algebraic data type representing that nonterminal. The converter is capable of producing types and parsers for abstract syntax in several languages, including Haskell and Java. * Coco/R, compiler generator accepting an attributed grammar in EBNF * DMS Software Reengineering Toolkit, program analysis and transformation system for arbitrary languages *
GOLD Gold is a chemical element with the symbol Au (from la, aurum) and atomic number 79. This makes it one of the higher atomic number elements that occur naturally. It is a bright, slightly orange-yellow, dense, soft, malleable, and ductile ...
BNF parser *
GNU bison GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in the BNF notation (a context-free language), warns about any parsing ambiguities, and generates a parser that reads sequen ...
, GNU version of yacc * RPA BNF parser. Online (PHP) demo parsing: JavaScript, XML * XACT X4MR System, a rule-based expert system for programming language translation * XPL Analyzer, a tool which accepts simplified BNF for a language and produces a parser for that language in XPL; it may be integrated into the supplied SKELETON program, with which the language may be debugged (a SHARE contributed program, which was preceded by ''A Compiler Generator'') *
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
, parser generator (most commonly used with the Lex preprocessor) * bnfparser2, a universal syntax verification utility * bnf2xml, Markup input with XML tags using advanced BNF matching. * JavaCC, Java Compiler Compiler tm (JavaCC tm) - The Java Parser Generator.
Racket's parser tools
lex and yacc-style Parsing (Beautiful Racket edition)


See also

* Compiler Description Language (CDL) * Syntax diagram – railroad diagram * Translational Backus–Naur form (TBNF) *
Wirth syntax notation Wirth syntax notation (WSN) is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus–Naur form (BNF). It has several advantages over BNF in that it contains an ex ...
– an alternative to BNF from 1977 * Definite clause grammar – a more expressive alternative to BNF used in Prolog * Van Wijngaarden grammar – used in preference to BNF to define Algol68 * Meta-II – an early compiler writing tool and notation


References


External links

* . * — Augmented BNF for Syntax Specifications: ABNF. * — Routing BNF: A Syntax Used in Various Protocol Specifications. * ISO/IEC 14977:1996(E) ''Information technology – Syntactic metalanguage – Extended BNF'', available from or from (the latter is missing the cover page, but is otherwise much cleaner)


Language grammars

* , the original BNF. * , freely available BNF grammars for SQL. * , freely available BNF grammars for SQL,
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, T ...
,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
. * , freely available BNF/ EBNF grammars for C/C++,
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Frenc ...
,
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily u ...
,
Ada 95 Ada is a structured, statically typed, imperative, and object-oriented high-level programming language, extended from Pascal and other languages. It has built-in language support for '' design by contract'' (DbC), extremely strong typing, e ...
, PL/I. * . Includes parts 11, 14, and 21 of the ISO 10303 (STEP) standard. {{DEFAULTSORT:Backus-Naur Form Formal languages Compiler construction Metalanguages