Backus–Naur Form
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, Backus–Naur form () or Backus normal form (BNF) is a
metasyntax In logic and computer science, a metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language.Sellink, Alex, and Chr ...
notation for
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s, often used to describe the
syntax In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituency) ...
of
languages Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
used in computing, such as computer
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s, document formats,
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s and
communication protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics (computer scien ...
s. It is applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory. Many extensions and variants of the original Backus–Naur notation are used; some are exactly defined, including
extended Backus–Naur form In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programm ...
(EBNF) and
augmented Backus–Naur form In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF), but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be use ...
(ABNF).


Overview

A BNF specification is a set of derivation rules, written as ::= __expression__ where: * <
symbol A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
> is a ''
nonterminal In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
'' (variable) and the __expression__ consists of one or more sequences of either terminal or nonterminal symbols; * means that the symbol on the left must be replaced with the expression on the right. * more sequences f symbolsare separated by the
vertical bar The vertical bar, , is a glyph with various uses in mathematics, computing, and typography. It has many names, often related to particular meanings: Sheffer stroke (in logic), pipe, bar, or (literally the word "or"), vbar, and others. Usage ...
", ", indicating a
choice A choice is the range of different things from which a being can choose. The arrival at a choice may incorporate motivators and models. For example, a traveler might choose a route for a journey based on the preference of arriving at a giv ...
, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are ''
terminal Terminal may refer to: Computing Hardware * Terminal (electronics), a device for joining electrical circuits together * Terminal (telecommunication), a device communicating over a line * Computer terminal, a set of primary input and output dev ...
s''. On the other hand, symbols that appear on a left side are '' non-terminals'' and are always enclosed between the pair <>.


Example

As an example, consider this possible BNF for a U.S.
postal address An address is a collection of information, presented in a mostly fixed format, used to give the location of a building, apartment, or other structure or a plot of land, generally using political boundaries and street names as references, along w ...
: ::= ::= , ::= "." , ::= ::= "," ::= "Sr." , "Jr." , , "" ::= , "" This translates into English as: * A postal address consists of a name-part, followed by a street-address part, followed by a zip-code part. * A name-part consists of either: a personal-part followed by a
last name In some cultures, a surname, family name, or last name is the portion of one's personal name that indicates one's family, tribe or community. Practices vary by culture. The family name may be placed at either the start of a person's full name, ...
followed by an optional
suffix In linguistics, a suffix is an affix which is placed after the stem of a word. Common examples are case endings, which indicate the grammatical case of nouns, adjectives, and verb endings, which form the conjugation of verbs. Suffixes can carry ...
(Jr., Sr., or dynastic number) and
end-of-line Newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a ...
, or a personal part followed by a name part (this rule illustrates the use of
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
in BNFs, covering the case of people who use multiple first and middle names and initials). * A personal-part consists of either a
first name First or 1st is the ordinal form of the number one (#1). First or 1st may also refer to: *World record, specifically the first instance of a particular achievement Arts and media Music * 1$T, American rapper, singer-songwriter, DJ, and rec ...
or an
initial In a written or published work, an initial capital, also referred to as a drop capital or simply an initial cap, initial, initcapital, initcap or init or a drop cap or drop, is a letter at the beginning of a word, a chapter, or a paragraph that ...
followed by a dot. * A street address consists of a house number, followed by a street name, followed by an optional
apartment An apartment (American English), or flat (British English, Indian English, South African English), is a self-contained housing unit (a type of residential real estate) that occupies part of a building, generally on a single story. There are ma ...
specifier, followed by an end-of-line. * A zip-part consists of a
town A town is a human settlement. Towns are generally larger than villages and smaller than cities, though the criteria to distinguish between them vary considerably in different parts of the world. Origin and use The word "town" shares an ori ...
-name, followed by a comma, followed by a
state code Several sets of codes and abbreviations are used to represent the political divisions of the United States for postal addresses, data processing, general abbreviations, and other purposes. Table This table includes abbreviations for three inde ...
, followed by a ZIP-code followed by an end-of-line. * An opt-suffix-part consists of a suffix, such as "Sr.", "Jr." or a roman-numeral, or an empty string (i.e. nothing). * An opt-apt-num consists of an apartment number or an empty string (i.e. nothing). Note that many things (such as the format of a first-name, apartment number, ZIP-code, and Roman numeral) are left unspecified here. If necessary, they may be described using additional BNF rules.


History

The idea of describing the structure of language using rewriting rules can be traced back to at least the work of
Pāṇini , era = ;;6th–5th century BCE , region = Indian philosophy , main_interests = Grammar, linguistics , notable_works = ' (Sanskrit#Classical Sanskrit, Classical Sanskrit) , influenced= , notable_ideas=Descript ...
, an ancient Indian Sanskrit grammarian and a revered scholar in Hinduism who lived sometime between the 6th and 4th century BC. His notation to describe
Sanskrit Sanskrit (; attributively , ; nominally , , ) is a classical language belonging to the Indo-Aryan branch of the Indo-European languages. It arose in South Asia after its predecessor languages had diffused there from the northwest in the late ...
word structure is equivalent in power to that of Backus and has many similar properties. In Western society, grammar was long regarded as a subject for teaching, rather than scientific study; descriptions were informal and targeted at practical usage. In the first half of the 20th century,
linguists Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguis ...
such as
Leonard Bloomfield Leonard Bloomfield (April 1, 1887 – April 18, 1949) was an American linguist who led the development of structural linguistics in the United States during the 1930s and the 1940s. He is considered to be the father of American distributionalism ...
and
Zellig Harris Zellig Sabbettai Harris (; October 23, 1909 – May 22, 1992) was an influential American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and dis ...
started attempts to formalize the description of language, including
phrase structure Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. They are used to break down a natural langu ...
. Meanwhile, string rewriting rules as formal logical systems were introduced and studied by mathematicians such as
Axel Thue Axel Thue (; 19 February 1863 – 7 March 1922) was a Norwegian mathematician, known for his original work in diophantine approximation and combinatorics. Work Thue published his first important paper in 1909. He stated in 1914 the so-calle ...
(in 1914),
Emil Post Emil Leon Post (; February 11, 1897 – April 21, 1954) was an American mathematician and logician. He is best known for his work in the field that eventually became known as computability theory. Life Post was born in Augustów, Suwałki Gove ...
(1920s–40s) and
Alan Turing Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
(1936).
Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky is ...
, teaching linguistics to students of
information theory Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
at
MIT The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the m ...
, combined linguistics and mathematics by taking what is essentially Thue's formalism as the basis for the description of the syntax of
natural language In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
. He also introduced a clear distinction between generative rules (those of
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
s) and transformation rules (1956).
John Backus John Warner Backus (December 3, 1924 – March 17, 2007) was an American computer scientist. He directed the team that invented and implemented FORTRAN, the first widely used high-level programming language, and was the inventor of the Back ...
, a programming language designer at IBM, proposed a
metalanguage In logic and linguistics, a metalanguage is a language used to describe another language, often called the ''object language''. Expressions in a metalanguage are often distinguished from those in the object language by the use of italics, quot ...
of "metalinguistic formulas" to describe the syntax of the new programming language IAL, known today as
ALGOL 58 ALGOL 58, originally named IAL, is one of the family of ALGOL computer programming languages. It was an early compromise design soon superseded by ALGOL 60. According to John Backus The Zurich ACM-GAMM Conference had two principal motives in pro ...
(1959). His notation was first used in the ALGOL 60 report. BNF is a notation for Chomsky's context-free grammars. Backus was familiar with Chomsky's work. As proposed by Backus, the formula defined "classes" whose names are enclosed in angle brackets. For example, <ab>. Each of these names denotes a class of basic symbols. Further development of
ALGOL ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...
led to
ALGOL 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a k ...
. In the committee's 1963 report,
Peter Naur Peter Naur (25 October 1928 – 3 January 2016) was a Danish computer science pioneer and Turing award winner. He is best remembered as a contributor, with John Backus, to the Backus–Naur form (BNF) notation used in describing the syntax for m ...
called Backus's notation ''Backus normal form''.
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
argued that BNF should rather be read as ''Backus–Naur form'', as it is "not a normal form in the conventional sense", unlike, for instance,
Chomsky normal form In formal language theory, a context-free grammar, ''G'', is said to be in Chomsky normal form (first described by Noam Chomsky) if all of its production rules are of the form: : ''A'' → ''BC'',   or : ''A'' → ''a'',   or : ''S'' → ...
. The name ''Pāṇini Backus form'' was also once suggested in view of the fact that the expansion ''Backus normal form'' may not be accurate, and that
Pāṇini , era = ;;6th–5th century BCE , region = Indian philosophy , main_interests = Grammar, linguistics , notable_works = ' (Sanskrit#Classical Sanskrit, Classical Sanskrit) , influenced= , notable_ideas=Descript ...
had independently developed a similar notation earlier. BNF is described by Peter Naur in the ALGOL 60 report as ''metalinguistic formula'':Revised ALGOL 60 report section. 1.1. Another example from the ALGOL 60 report illustrates a major difference between the BNF metalanguage and a Chomsky context-free grammar. Metalinguistic variables do not require a rule defining their formation. Their formation may simply be described in natural language within the <> brackets. The following ALGOL 60 report section 2.3 comments specification, exemplifies how this works:
For the purpose of including text among the symbols of a program the following "comment" conventions hold: Equivalence here means that any of the three structures shown in the left column may be replaced, in any occurrence outside of strings, by the symbol shown in the same line in the right column without any effect on the action of the program.
Naur changed two of Backus's symbols to commonly available characters. The ::= symbol was originally a :≡. The , symbol was originally the word "" (with a bar over it). BNF is very similar to canonical-form
boolean algebra In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variables are the truth values ''true'' and ''false'', usually denoted 1 and 0, whereas in e ...
equations that are, and were at the time, used in logic-circuit design. Backus was a mathematician and the designer of the FORTRAN programming language. Studies of boolean algebra is commonly part of a mathematics curriculum. Neither Backus nor Naur described the names enclosed in < > as non-terminals. Chomsky's terminology was not originally used in describing BNF. Naur later described them as classes in ALGOL course materials. In the ALGOL 60 report they were called metalinguistic variables. Anything other than the metasymbols ::=, , , and class names enclosed in < > are symbols of the language being defined. The metasymbol ::= is to be interpreted as "is defined as". The , is used to separate alternative definitions and is interpreted as "or". The metasymbols < > are delimiters enclosing a class name. BNF is described as a
metalanguage In logic and linguistics, a metalanguage is a language used to describe another language, often called the ''object language''. Expressions in a metalanguage are often distinguished from those in the object language by the use of italics, quot ...
for talking about ALGOL by Peter Naur and
Saul Rosen Saul Rosen (February 8, 1922 – June 9, 1991) was an American computer science pioneer. He is known for designing the software of the first transistor-based computer ''Philco Transac S-2000'', and for his work on programming language desig ...
. In 1947
Saul Rosen Saul Rosen (February 8, 1922 – June 9, 1991) was an American computer science pioneer. He is known for designing the software of the first transistor-based computer ''Philco Transac S-2000'', and for his work on programming language desig ...
became involved in the activities of the fledgling
Association for Computing Machinery The Association for Computing Machinery (ACM) is a US-based international learned society for computing. It was founded in 1947 and is the world's largest scientific and educational computing society. The ACM is a non-profit professional member ...
, first on the languages committee that became the IAL group and eventually led to ALGOL. He was the first managing editor of the Communications of the ACM. BNF was first used as a metalanguage to talk about the ALGOL language in the ALGOL 60 report. That is how it is explained in ALGOL programming course material developed by Peter Naur in 1962. Early ALGOL manuals by IBM, Honeywell, Burroughs and Digital Equipment Corporation followed the ALGOL 60 report using it as a metalanguage. Saul Rosen in his book describes BNF as a metalanguage for talking about ALGOL. An example of its use as a metalanguage would be in defining an arithmetic expression: The first symbol of an alternative may be the class being defined, the repetition, as explained by Naur, having the function of specifying that the alternative sequence can recursively begin with a previous alternative and can be repeated any number of times. For example, above <expr> is defined as a <term> followed by any number of <addop> <term>. In some later metalanguages, such as Schorre's
META II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation'' ...
, the BNF recursive repeat construct is replaced by a sequence operator and target language symbols defined using quoted strings. The < and > brackets were removed. Parentheses () for mathematical grouping were added. The <expr> rule would appear in META II as These changes enabled META II and its derivative programming languages to define and extend their own metalanguage, at the cost of the ability to use a natural language description, metalinguistic variable, language construct description. Many spin-off metalanguages were inspired by BNF. See
META II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation'' ...
,
TREE-META The TREE-META (or Tree Meta, TREEMETA) Translator Writing System is a compiler-compiler system for context-free languages originally developed in the 1960s. Parsing statements of the metalanguage resemble augmented Backus–Naur form with embedde ...
, and
Metacompiler In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
. A BNF class describes a language construct formation, with formation defined as a pattern or the action of forming the pattern. The class name expr is described in a natural language as a <term> followed by a sequence <addop> <term>. A class is an abstraction; we can talk about it independent of its formation. We can talk about term, independent of its definition, as being added or subtracted in expr. We can talk about a term being a specific data type and how an expr is to be evaluated having specific combinations of data types, or even reordering an expression to group data types and evaluation results of mixed types. The natural-language supplement provided specific details of the language class semantics to be used by a compiler implementation and a programmer writing an ALGOL program. Natural-language description further supplemented the syntax as well. The integer rule is a good example of natural and metalanguage used to describe syntax: There are no specifics on white space in the above. As far as the rule states, we could have space between the digits. In the natural language we complement the BNF metalanguage by explaining that the digit sequence can have no white space between the digits. English is only one of the possible natural languages. Translations of the ALGOL reports were available in many natural languages. The origin of BNF is not as important as its impact on programming language development. During the period immediately following the publication of the ALGOL 60 report BNF was the basis of many compiler-compiler systems. Some, like "A Syntax Directed Compiler for ALGOL 60" developed by Edgar T. Irons and "A Compiler Building System" Developed by Brooker and Morris, directly used BNF. Others, like the Schorre Metacompilers, made it into a programming language with only a few changes. <class name> became symbol identifiers, dropping the enclosing <,> and using quoted strings for symbols of the target language. Arithmetic-like grouping provided a simplification that removed using classes where grouping was its only value. The META II arithmetic expression rule shows grouping use. Output expressions placed in a META II rule are used to output code and labels in an assembly language. Rules in META II are equivalent to a class definitions in BNF. The Unix utility
yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a com ...
is based on BNF with code production similar to META II. yacc is most commonly used as a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
, and its roots are obviously BNF. BNF today is one of the oldest computer-related languages still in use.


Further examples

BNF's syntax itself may be represented with a BNF like the following: ::= , ::= "<" ">" "::=" ::= " " , "" ::= , ", " ::= , ::= , ::= , "<" ">" ::= '"' '"' , "'" "'" ::= "" , ::= '' , ::= , , ::= "A" , "B" , "C" , "D" , "E" , "F" , "G" , "H" , "I" , "J" , "K" , "L" , "M" , "N" , "O" , "P" , "Q" , "R" , "S" , "T" , "U" , "V" , "W" , "X" , "Y" , "Z" , "a" , "b" , "c" , "d" , "e" , "f" , "g" , "h" , "i" , "j" , "k" , "l" , "m" , "n" , "o" , "p" , "q" , "r" , "s" , "t" , "u" , "v" , "w" , "x" , "y" , "z" ::= "0" , "1" , "2" , "3" , "4" , "5" , "6" , "7" , "8" , "9" ::= ", " , " " , "!" , "#" , "$" , "%" , "&" , "(" , ")" , "*" , "+" , "," , "-" , "." , "/" , ":" , ";" , ">" , "=" , "<" , "?" , "@" , " "\" , " , "^" , "_" , "`" , "" , "~" ::= , "'" ::= , '"' ::= , ::= , , "-" Note that "" is the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
. The original BNF did not use quotes as shown in <literal> rule. This assumes that no whitespace is necessary for proper interpretation of the rule. <EOL> represents the appropriate line-end specifier (in
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
, carriage-return, line-feed or both depending on the
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
). <rule-name> and <text> are to be substituted with a declared rule's name/label or literal text, respectively. In the U.S. postal address example above, the entire block-quote is a syntax. Each line or unbroken grouping of lines is a rule; for example one rule begins with <name-part> ::=. The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a pipe , . These two lists consists of some terms (three terms and two terms, respectively). Each term in this particular rule is a rule-name.


Variants


EBNF

There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
repetition operators such as * and +. The
extended Backus–Naur form In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programm ...
(EBNF) is a common one. Another common extension is the use of square brackets around optional items. Although not present in the original ALGOL 60 report (instead introduced a few years later in IBM's
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
definition), the notation is now universally recognised.


ABNF

Augmented Backus–Naur form In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF), but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be use ...
(ABNF) and Routing Backus–Naur form (RBNF) are extensions commonly used to describe
Internet Engineering Task Force The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements and a ...
(IETF)
protocol Protocol may refer to: Sociology and politics * Protocol (politics), a formal agreement between nation states * Protocol (diplomacy), the etiquette of diplomacy and affairs of state * Etiquette, a code of personal behavior Science and technolog ...
s.
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
s build on the BNF and
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
notations to form an alternative class of
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
, which is essentially analytic rather than
generative Generative may refer to: * Generative actor, a person who instigates social change * Generative art, art that has been created using an autonomous system that is frequently, but not necessarily, implemented using a computer * Generative music, mus ...
in character.


Others

Many BNF specifications found online today are intended to be human-readable and are non-formal. These often include many of the following syntax rules and extensions: * Optional items enclosed in square brackets: lt;item-x>/code>. * Items existing 0 or more times are enclosed in curly brackets or suffixed with an asterisk (*) such as <word> ::= <letter> or <word> ::= <letter> <letter>* respectively. * Items existing 1 or more times are suffixed with an addition (plus) symbol, +, such as <word> ::= <letter>+. * Terminals may appear in bold rather than italics, and non-terminals in plain text rather than angle brackets. * Where items are grouped, they are enclosed in simple parentheses.


Software using BNF

*
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
, another parser generator written in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
*
Qlik Qlik ronounced "klik"(formerly known as Qliktech) provides a business analytics platform. The software company was founded in 1993 in Lund, Sweden and is now based in King of Prussia, Pennsylvania, United States. The company's main products ...
Sense, a BI tool, uses a variant of BNF for scripting * BNF Converter (BNFC), operating on a variant called "labeled Backus–Naur form" (LBNF). In this variant, each production for a given non-terminal is given a label, which can be used as a constructor of an
algebraic data type In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types. Two common classes of algebraic types are product types (i.e., t ...
representing that nonterminal. The converter is capable of producing types and parsers for
abstract syntax In computer science, the abstract syntax of data is its structure described as a data type (possibly, but not necessarily, an abstract data type), independent of any particular representation or encoding. This is particularly used in the representa ...
in several languages, including
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
and Java. *
Coco/R Coco/R is a compiler generator that takes wirth syntax notationIn the manual, however, it is referred as L-attributed Extended Backus–Naur Form syntax (EBNF). grammars of a source language and generates a scanner and a parser for that lang ...
, compiler generator accepting an attributed grammar in EBNF *
DMS Software Reengineering Toolkit The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source langu ...
, program analysis and transformation system for arbitrary languages *
GOLD Gold is a chemical element with the symbol Au (from la, aurum) and atomic number 79. This makes it one of the higher atomic number elements that occur naturally. It is a bright, slightly orange-yellow, dense, soft, malleable, and ductile met ...
BNF parser *
GNU bison GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in the BNF notation (a context-free language), warns about any parsing ambiguities, and generates a parser that reads sequence ...
, GNU version of yacc * RPA BNF parser. Online (PHP) demo parsing: JavaScript, XML * XACT X4MR System, a rule-based expert system for programming language translation *
XPL XPL is a programming language based on PL/I, a portable one-pass compiler written in its own language, and a parser generator tool for easily implementing similar compilers for other languages. XPL was designed in 1967 as a way to teach compiler d ...
Analyzer, a tool which accepts simplified BNF for a language and produces a parser for that language in XPL; it may be integrated into the supplied SKELETON program, with which the language may be debugged (a SHARE contributed program, which was preceded by ''A Compiler Generator'') *
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a com ...
, parser generator (most commonly used with the
Lex Lex or LEX may refer to: Arts and entertainment * ''Lex'', a daily featured column in the ''Financial Times'' Games * Lex, the mascot of the word-forming puzzle video game ''Bookworm'' * Lex, the protagonist of the word-forming puzzle video ga ...
preprocessor) * bnfparser2, a universal syntax verification utility * bnf2xml, Markup input with XML tags using advanced BNF matching. * JavaCC, Java Compiler Compiler tm (JavaCC tm) - The Java Parser Generator.
Racket's parser tools
lex and yacc-style Parsing (Beautiful Racket edition)


See also

*
Compiler Description Language Compiler Description Language (CDL) is a programming language based on affix grammars. It is very similar to Backus–Naur form (BNF) notation. It was designed for the development of compilers. It is very limited in its capabilities and control f ...
(CDL) *
Syntax diagram Syntax diagrams (or railroad diagrams) are a way to represent a context-free grammar. They represent a graphical alternative to Backus–Naur form, EBNF, Augmented Backus–Naur form, and other text-based grammars as metalanguages. Early books usi ...
– railroad diagram * Translational Backus–Naur form (TBNF) *
Wirth syntax notation Wirth syntax notation (WSN) is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus–Naur form (BNF). It has several advantages over BNF in that it contains an e ...
– an alternative to BNF from 1977 *
Definite clause grammar A definite clause grammar (DCG) is a way of expressing grammar, either for natural or formal languages, in a logic programming language such as Prolog. It is closely related to the concept of attribute grammars / affix grammars from which Prolog ...
– a more expressive alternative to BNF used in Prolog *
Van Wijngaarden grammar In computer science, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adriaa ...
– used in preference to BNF to define
Algol68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously d ...
*
Meta-II META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as: Each ''syntax equation'' ...
– an early compiler writing tool and notation


References


External links

* . * — Augmented BNF for Syntax Specifications: ABNF. * — Routing BNF: A Syntax Used in Various Protocol Specifications. * ISO/IEC 14977:1996(E) ''Information technology – Syntactic metalanguage – Extended BNF'', available from or from (the latter is missing the cover page, but is otherwise much cleaner)


Language grammars

* , the original BNF. * , freely available BNF grammars for SQL. * , freely available BNF grammars for SQL,
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...
,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. * , freely available BNF/ EBNF grammars for C/C++,
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
,
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily us ...
,
Ada 95 Ada is a structured, statically typed, imperative, and object-oriented high-level programming language, extended from Pascal and other languages. It has built-in language support for '' design by contract'' (DbC), extremely strong typing, exp ...
,
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
. * . Includes parts 11, 14, and 21 of the
ISO 10303 ISO 10303 is an ISO standard for the computer-interpretable representation and exchange of product manufacturing information. It is an ASCII-based format. Its official title is: ''Automation systems and integration — Product data representa ...
(STEP) standard. {{DEFAULTSORT:Backus-Naur Form Formal languages Compiler construction Metalanguages