In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF) but consisting of its own syntax and derivation rules. The motivating principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol. It is defined by Internet Standard 68 ("STD 68", type case sic), published as RFC 5234, and it often serves as the definition language for IETF communication protocols. RFC 5234 supersedes earlier ABNF specifications. RFC 7405 updates it, adding a syntax for specifying case-sensitive string literals.


Overview

An ABNF specification is a set of derivation rules, written as

    rule = definition ; comment CR LF

where rule is a case-insensitive nonterminal, the definition consists of sequences of symbols that define the rule, the comment provides documentation, and the line ends with a carriage return and line feed.

Rule names are case-insensitive: <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all refer to the same rule. Rule names consist of a letter followed by letters, numbers, and hyphens.

Angle brackets (<, >) are not required around rule names (as they are in BNF). However, they may be used to delimit a rule name in prose so that it is easy to discern.
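The naming rules above can be illustrated with a short Python sketch. It is not part of the ABNF specification; the names RULENAME and normalize_rulename are hypothetical, and folding names to lower case is just one way to make equivalent names compare equal.

```python
import re

# A rule name is a letter followed by letters, digits, or hyphens.
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def normalize_rulename(name: str) -> str:
    """Return a canonical form of an ABNF rule name.

    Optional angle brackets are stripped and the name is lower-cased,
    so that names differing only in case map to the same key.
    """
    if name.startswith("<") and name.endswith(">"):
        name = name[1:-1]
    if not RULENAME.fullmatch(name):
        raise ValueError(f"not a valid ABNF rule name: {name!r}")
    return name.lower()
```

With this helper, normalize_rulename("<rUlENamE>") and normalize_rulename("RULENAME") yield the same key, while a name such as "9lives" is rejected because it does not start with a letter.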


Terminal values

Terminals are specified by one or more numeric characters. A numeric character is written as a percent sign (%), followed by the base (b = binary, d = decimal, x = hexadecimal), followed by the value or a concatenation of values (indicated by .). For example, a carriage return is specified by %d13 in decimal or %x0D in hexadecimal. A carriage return followed by a line feed may be specified with concatenation as %d13.10.

Literal text is specified as a string enclosed in quotation marks ("). These strings are case-insensitive, and the character set used is (US-)ASCII. Therefore, the string "abc" will match “abc”, “Abc”, “aBc”, “abC”, “ABc”, “AbC”, “aBC”, and “ABC”.

RFC 7405 added a syntax for case-sensitive strings: %s"aBc" will only match "aBc". Prior to that, a case-sensitive string could only be specified by listing the individual characters: to match “aBc”, the definition would be %d97.66.99. A string can also be explicitly specified as case-insensitive with a %i prefix.
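To make the terminal notation concrete, the following Python sketch decodes numeric values such as %d13.10 and compares text against quoted literals with the optional %s and %i prefixes. The function names decode_num_val and literal_matches are hypothetical, and the sketch handles only single tokens, not full grammars or value ranges.

```python
import re

_BASES = {"b": 2, "d": 10, "x": 16}
# A percent sign, a base letter, then one or more dot-separated values.
_NUM_VAL = re.compile(r"%([bdxBDX])([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)")


def decode_num_val(token: str) -> str:
    """Decode a numeric terminal such as %x0D or %d13.10 into a string."""
    m = _NUM_VAL.fullmatch(token)
    if m is None:
        raise ValueError(f"not a numeric terminal value: {token!r}")
    base = _BASES[m.group(1).lower()]
    # int() raises ValueError if a digit is not valid in the given base.
    return "".join(chr(int(part, base)) for part in m.group(2).split("."))


def literal_matches(literal: str, text: str) -> bool:
    """Compare text against a quoted literal, honouring %s and %i prefixes."""
    prefix = literal[:2].lower()
    case_sensitive = prefix == "%s"
    if prefix in ("%s", "%i"):
        literal = literal[2:]
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("literal must be enclosed in quotation marks")
    body = literal[1:-1]
    if case_sensitive:
        return text == body
    # Unprefixed and %i literals are case-insensitive (ASCII only).
    return text.lower() == body.lower()
```

Under these assumptions, decode_num_val("%d97.66.99") produces the string aBc, which is the pre-RFC 7405 way of spelling the case-sensitive literal %s"aBc".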


Operators


White space

White space is used to separate elements of a definition; for space to be recognized as a delimiter, it must be explicitly included. The explicit reference for a single whitespace character is WSP (white space), and LWSP (linear white space) is for zero or more whitespace characters with newlines permitted. The LWSP definition in RFC 5234 is controversial (see RFC Errata 3096) because at least one whitespace character is needed to form a delimiter between two fields.

Definitions are left-aligned. When multiple lines are required (for readability), continuation lines are indented by whitespace.


Comment

; comment

A semicolon (;) starts a comment that continues to the end of the line.


Concatenation

Rule1 Rule2

A rule may be defined by listing a sequence of rule names. To match the string “aba”, rules such as the following could be used:

    fu = %x61 ; a
    bar = %x62 ; b
    mumble = fu bar fu


Alternative

Rule1 / Rule2

A rule may be defined by a list of alternative rules separated by a solidus (/). To accept the rule fu or the rule bar, a rule such as the following could be constructed:

    fubar = fu / bar


Incremental alternatives

Rule1 =/ Rule2

Additional alternatives may be added to a rule through the use of =/ between the rule name and the definition. For example, the rule

    ruleset = alt1 / alt2
    ruleset =/ alt3
    ruleset =/ alt4 / alt5

is equivalent to

    ruleset = alt1 / alt2 / alt3 / alt4 / alt5
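One way to picture incremental alternatives is as appending to an existing definition. The Python sketch below assumes one rule per line with no comments or continuation lines; collect_rules and the rule names in the usage note are hypothetical.

```python
import re

# Rule name, then "=/" (tried first) or "=", then the definition text.
_RULE_LINE = re.compile(r"\s*([A-Za-z][A-Za-z0-9-]*)\s*(=/|=)\s*(.*?)\s*")


def collect_rules(lines):
    """Merge '=' and '=/' lines into one definition string per rule.

    Rule names are folded to lower case because ABNF rule names are
    case-insensitive.
    """
    parts = {}
    for line in lines:
        m = _RULE_LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"not a rule line: {line!r}")
        name, operator, definition = m.group(1).lower(), m.group(2), m.group(3)
        if operator == "=/":
            if name not in parts:
                raise ValueError(f"'=/' used before {name!r} was defined")
            parts[name].append(definition)
        else:
            parts[name] = [definition]
    # Alternation binds loosest, so joining with " / " preserves meaning.
    return {name: " / ".join(defs) for name, defs in parts.items()}
```

Feeding it the lines "ruleset = alt1 / alt2", "ruleset =/ alt3", and "RuleSet =/ alt4 / alt5" gives a single entry for ruleset whose definition is alt1 / alt2 / alt3 / alt4 / alt5.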


Value range

%c##-##

A range of numeric values may be specified through the use of a hyphen (-). For example, the rule

    OCTAL = %x30-37

is equivalent to

    OCTAL = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
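The equivalence between a value range and a list of alternatives can be checked mechanically. The following Python sketch expands a single range token into its individual characters; expand_range is a hypothetical helper, and the token %x30-37 (the octal digits) is used only as an illustration.

```python
import re

_BASES = {"b": 2, "d": 10, "x": 16}
# A percent sign, a base letter, a low value, a hyphen, and a high value.
_RANGE = re.compile(r"%([bdxBDX])([0-9A-Fa-f]+)-([0-9A-Fa-f]+)")


def expand_range(token: str) -> list:
    """Expand a value range such as %x30-37 into a list of characters."""
    m = _RANGE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a value range: {token!r}")
    base = _BASES[m.group(1).lower()]
    low, high = int(m.group(2), base), int(m.group(3), base)
    if low > high:
        raise ValueError("range bounds are reversed")
    # Both bounds are inclusive.
    return [chr(code) for code in range(low, high + 1)]
```

Expanding %x30-37 this way yields the eight characters "0" through "7", which is the alternative list the range stands for.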


Sequence group

(Rule1 Rule2)

Elements may be placed in parentheses to group rules in a definition. To match "a b d" or "a c d", a rule such as the following could be constructed:

    group = a (b / c) d

To match “a b” or “c d”, either of the following rules could be constructed:

    group = a b / c d
    group = (a b) / (c d)


Variable repetition

n*nRule

To indicate repetition of an element, the form <a>*<b>element is used. The optional <a> gives the minimal number of elements to be included (with the default of 0). The optional <b> gives the maximal number of elements to be included (with the default of infinity).

Use *element for zero or more elements, *1element for zero or one element, 1*element for one or more elements, and 2*3element for two or three elements (compare the regular expressions e*, e?, e+ and e{2,3}).


Specific repetition

nRule

To indicate an explicit number of elements, the form <a>element is used and is equivalent to <a>*<a>element. Use 2DIGIT to get two numeric digits, and 3DIGIT to get three numeric digits. (DIGIT is defined below under "Core rules". Also see zip-code in the example below.)
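The correspondence between ABNF repetition prefixes and regular-expression quantifiers can be sketched in Python. The function repeat_to_quantifier is hypothetical; it takes only the prefix (for example 2*3 or 5) and returns the quantifier that would follow the element in a regular expression.

```python
import re

# Either <a>*<b> with both bounds optional, or a bare count <a>.
_REPEAT = re.compile(r"(\d*)\*(\d*)|(\d+)")


def repeat_to_quantifier(prefix: str) -> str:
    """Translate an ABNF repetition prefix into a regex quantifier."""
    m = _REPEAT.fullmatch(prefix)
    if m is None:
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    if m.group(3) is not None:
        # Specific repetition: <a>element is the same as <a>*<a>element.
        return "{%d}" % int(m.group(3))
    low = int(m.group(1)) if m.group(1) else 0        # default minimum is 0
    high = int(m.group(2)) if m.group(2) else None    # default maximum is unbounded
    if high is None:
        return {0: "*", 1: "+"}.get(low, "{%d,}" % low)
    if (low, high) == (0, 1):
        return "?"
    return "{%d,%d}" % (low, high)
```

For instance, the prefixes *, *1, 1*, and 2*3 map to the quantifiers *, ?, +, and {2,3}, and a bare 5 maps to {5}.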


Optional sequence

[Rule]

To indicate an optional element, the following constructions are equivalent:

    [fubar snafu]
    *1(fubar snafu)
    0*1(fubar snafu)


Operator precedence

The following operators have the given precedence from tightest binding to loosest binding:

1. Strings, names formation
2. Comment
3. Value range
4. Repetition
5. Grouping, optional
6. Concatenation
7. Alternative

Use of the alternative operator with concatenation may be confusing, and it is recommended that grouping be used to make explicit concatenation groups.


Core rules

The core rules are defined in the ABNF standard. Note that in the core rules diagram the CHAR2 charset is inlined in char-val and CHAR3 is inlined in prose-val in the RFC spec. They are named here for clarity in the main syntax diagram.


Example

A (U.S.) postal address may be specified in ABNF as follows:

    postal-address   = name-part street zip-part

    name-part        = *(personal-part SP) last-name [SP suffix] CRLF
    name-part        =/ personal-part CRLF

    personal-part    = first-name / (initial ".")
    first-name       = *ALPHA
    initial          = ALPHA
    last-name        = *ALPHA
    suffix           = ("Jr." / "Sr." / 1*("I" / "V" / "X"))

    street           = [apt SP] house-num SP street-name CRLF
    apt              = 1*4DIGIT
    house-num        = 1*8(DIGIT / ALPHA)
    street-name      = 1*VCHAR

    zip-part         = town-name "," SP state 1*2SP zip-code CRLF
    town-name        = 1*(ALPHA / SP)
    state            = 2ALPHA
    zip-code         = 5DIGIT ["-" 4DIGIT]
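As a rough cross-check of the zip-part rule, the Python sketch below hand-translates it into a regular expression. It assumes zip-code is read as five digits optionally followed by a hyphen and four more digits, and that ALPHA and DIGIT are the ASCII letters and digits. The names ZIP_PART and matches_zip_part are hypothetical, and this is an illustration, not a general ABNF-to-regex translator.

```python
import re

DIGIT = "[0-9]"
ALPHA = "[A-Za-z]"

# zip-code = 5DIGIT ["-" 4DIGIT]
ZIP_CODE = DIGIT + "{5}(?:-" + DIGIT + "{4})?"
# town-name = 1*(ALPHA / SP)
TOWN_NAME = "(?:" + ALPHA + "| )+"
# state = 2ALPHA
STATE = ALPHA + "{2}"
# zip-part = town-name "," SP state 1*2SP zip-code CRLF
ZIP_PART = re.compile(TOWN_NAME + ", " + STATE + " {1,2}" + ZIP_CODE + "\r\n")


def matches_zip_part(line: str) -> bool:
    """Return True if the whole line matches the zip-part rule."""
    return ZIP_PART.fullmatch(line) is not None
```

A line such as "Springfield, IL 62704" followed by CRLF is accepted, while the same line with a four-digit code, or without the terminating CRLF, is rejected.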


Pitfalls


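RFC 5234 attaches a warning to the definition of LWSP: the rule permits lines containing only white space, which are no longer legal in mail headers and have caused interoperability problems in other contexts. It should not be used when defining mail headers and should be used with caution elsewhere.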
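The concern with LWSP can be seen by translating its core-rule definition, LWSP = *(WSP / CRLF WSP), into a regular expression. The Python sketch below is illustrative only; the names LWSP and is_lwsp are hypothetical, and WSP is taken to be a space or horizontal tab.

```python
import re

# LWSP = *(WSP / CRLF WSP), with WSP being a space or a horizontal tab.
LWSP = re.compile(r"(?:[ \t]|\r\n[ \t])*")


def is_lwsp(text: str) -> bool:
    """Return True if the whole string is matched by the LWSP rule."""
    return LWSP.fullmatch(text) is not None
```

Because the repetition has no lower bound, is_lwsp("") is true, so LWSP on its own does not guarantee a delimiter between two fields. A string such as a space, CRLF, a space, CRLF, and a tab also matches, which means folded lines consisting solely of white space are accepted.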

