Overview
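An ABNF specification is a set of derivation rules, written as rule = definition ; comment CR LF where rule is a case-insensitive nonterminal, definition consists of sequences of symbols that define the rule, comment is for documentation, and CR LF (carriage return, line feed) ends the line. Rule names are case-insensitive: <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all refer to the same rule. Rule names consist of a letter followed by letters, numbers, and hyphens.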
Angle brackets (<, >) are not required around rule names (as they are in BNF). However, they may be used to delimit a rule name in prose, where it could otherwise be hard to tell the name apart from the surrounding text.
Terminal values
Terminals are specified by one or more numeric characters. Numeric characters may be specified as the percent sign %, followed by the base (b = binary, d = decimal, and x = hexadecimal), followed by the value, or concatenation of values (indicated by .). For example, a carriage return is specified by %d13 in decimal or %x0D in hexadecimal. A carriage return followed by a line feed may be specified with concatenation as %d13.10.
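The numeric notation can be illustrated with a short Python sketch. The function name decode_num_val is hypothetical, and the sketch handles only single values and dot-concatenated values, not value ranges.

```python
def decode_num_val(token):
    """Decode an ABNF numeric terminal such as %d13.10 or %x0D into a string."""
    bases = {"b": 2, "d": 10, "x": 16}
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in bases:
        raise ValueError(f"not a numeric terminal: {token!r}")
    base = bases[token[1].lower()]
    # A dot concatenates several values into one terminal sequence.
    return "".join(chr(int(part, base)) for part in token[2:].split("."))


print(repr(decode_num_val("%d13")))     # '\r'
print(repr(decode_num_val("%x0D")))     # '\r'
print(repr(decode_num_val("%d13.10")))  # '\r\n'
```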
Literal text is specified through the use of a string enclosed in quotation marks ("). These strings are case-insensitive, and the character set used is (US-)ASCII. Therefore, the string "abc" will match “abc”, “Abc”, “aBc”, “abC”, “ABc”, “AbC”, “aBC”, and “ABC”. RFC 7405 added a syntax for case-sensitive strings: %s"aBc" will only match “aBc”. Prior to that, a case-sensitive string could only be specified by listing the individual characters: to match “aBc”, the definition would be %d97.66.99. A string can also be explicitly specified as case-insensitive with a %i prefix.
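The difference between the default case-insensitive strings and the %s form of RFC 7405 can be sketched by compiling a literal to a Python regular expression. The helper literal_to_regex is a hypothetical name used for illustration only; it assumes the literal contains no embedded quotation mark, which ABNF quoted strings do not allow.

```python
import re


def literal_to_regex(literal):
    """Compile an ABNF quoted string ("abc", %i"abc" or %s"abc") to a regex."""
    prefix = literal[:2].lower()
    case_sensitive = prefix == "%s"
    if prefix in ("%s", "%i"):
        literal = literal[2:]
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError(f"not a quoted string: {literal!r}")
    # Without %s, ABNF literals match regardless of letter case.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(re.escape(literal[1:-1]), flags)


print(bool(literal_to_regex('"abc"').fullmatch("aBC")))    # True
print(bool(literal_to_regex('%s"aBc"').fullmatch("aBc")))  # True
print(bool(literal_to_regex('%s"aBc"').fullmatch("abc")))  # False
```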
Operators
White space
White space is used to separate elements of a definition; for space to be recognized as a delimiter, it must be explicitly included. The explicit reference for a single whitespace character is WSP (linear white space), and LWSP is for zero or more whitespace characters with newlines permitted. The LWSP definition in RFC 5234 is controversial (see RFC Errata 3096).
Comment
; comment
A semicolon (;) starts a comment that continues to the end of the line.
Concatenation
Rule1 Rule2
A rule may be defined by listing a sequence of rule names.
To match the string “aba”, the following rules could be used:
* fu = %x61 ; a
* bar = %x62 ; b
* mumble = fu bar fu
Alternative
Rule1 / Rule2
A rule may be defined by a list of alternative rules separated by a solidus (/).
To accept the rule fu or the rule bar, the following rule could be constructed:
* fubar = fu / bar
Incremental alternatives
Rule1 =/ Rule2
Additional alternatives may be added to a rule through the use of =/ between the rule name and the definition.
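The rule
* ruleset = alt1 / alt2
* ruleset =/ alt3
* ruleset =/ alt4 / alt5
is therefore equivalent to
* ruleset = alt1 / alt2 / alt3 / alt4 / alt5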
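How a tool might fold incremental alternatives into a single definition can be sketched as follows. The function merge_rules and the rule names ruleset and alt1 to alt5 are hypothetical; the sketch assumes one rule per line, with no comments and no continuation lines.

```python
def merge_rules(lines):
    """Fold `name =/ alternatives` lines into the earlier `name = ...` rule."""
    rules = {}
    for line in lines:
        name, sep, rest = line.partition("=")
        if not sep:
            raise ValueError(f"not a rule: {line!r}")
        name = name.strip().lower()  # rule names are case-insensitive
        if rest.startswith("/"):
            # `=/` appends alternatives to a rule that must already exist.
            if name not in rules:
                raise ValueError(f"=/ used before {name!r} was defined")
            rules[name] += " / " + rest[1:].strip()
        else:
            rules[name] = rest.strip()
    return rules


print(merge_rules([
    "ruleset = alt1 / alt2",
    "ruleset =/ alt3",
    "ruleset =/ alt4 / alt5",
]))
# {'ruleset': 'alt1 / alt2 / alt3 / alt4 / alt5'}
```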
Value range
%c##-##
A range of numeric values may be specified through the use of a hyphen (-).
The rule
* OCTAL = %x30-37
is equivalent to
* OCTAL = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
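The expansion of a value range into its individual alternatives can be sketched in Python. The function expand_range is a hypothetical helper, and the range %x30-37 (the ASCII digits 0 to 7) is chosen only as an illustration.

```python
def expand_range(token):
    """Expand an ABNF value range such as %x30-37 into its characters."""
    bases = {"b": 2, "d": 10, "x": 16}
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in bases:
        raise ValueError(f"not a numeric terminal: {token!r}")
    base = bases[token[1].lower()]
    low_text, sep, high_text = token[2:].partition("-")
    if not sep:
        raise ValueError(f"not a value range: {token!r}")
    low, high = int(low_text, base), int(high_text, base)
    # Both ends of the range are inclusive.
    return [chr(code) for code in range(low, high + 1)]


print(expand_range("%x30-37"))  # ['0', '1', '2', '3', '4', '5', '6', '7']
```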
Sequence group
(Rule1 Rule2)
Elements may be placed in parentheses to group rules in a definition.
To match “a b d” or “a c d”, the following rule could be constructed:
* group = a (b / c) d
To match “a b” or “c d”, the following rules could be constructed:
* group = a b / c d
* group = (a b) / (c d)
Variable repetition
n*nRule
To indicate repetition of an element, the form <a>*<b>element is used. The optional <a> gives the minimal number of elements to be included (with the default of 0). The optional <b> gives the maximal number of elements to be included (with the default of infinity).
Use *element for zero or more elements, *1element for zero or one element, 1*element for one or more elements, and 2*3element for two or three elements, cf. the regular expressions e*, e?, e+ and e{2,3}.
Specific repetition
nRule
To indicate an explicit number of elements, the form <a>element is used and is equivalent to <a>*<a>element.
Use 2DIGIT to get two numeric digits, and 3DIGIT to get three numeric digits. (DIGIT is defined below under "Core rules". Also see zip-code in the example below.)
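The correspondence between ABNF repetition prefixes and regular-expression quantifiers can be sketched in Python. The helper repeat_to_quantifier is hypothetical and assumes the prefix is well formed (digits with at most one asterisk).

```python
import re


def repeat_to_quantifier(repeat):
    """Translate an ABNF repeat prefix (<a>*<b> or <a>) to a regex quantifier."""
    if "*" in repeat:
        low, high = repeat.split("*", 1)
        low = low or "0"            # a missing minimum defaults to 0
        return f"{{{low},{high}}}"  # a missing maximum leaves the bound open
    return f"{{{repeat}}}"          # specific repetition: exactly <a> elements


print(repeat_to_quantifier("*"))    # {0,}
print(repeat_to_quantifier("*1"))   # {0,1}
print(repeat_to_quantifier("1*"))   # {1,}
print(repeat_to_quantifier("2*3"))  # {2,3}
print(repeat_to_quantifier("2"))    # {2}

# 2DIGIT written as a regular expression:
print(bool(re.fullmatch("[0-9]" + repeat_to_quantifier("2"), "42")))  # True
```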
Optional sequence
[Rule]
To indicate an optional element, the following constructions are equivalent:
* [fubar snafu]
* *1(fubar snafu)
* 0*1(fubar snafu)
Operator precedence
The following operators have the given precedence from tightest binding to loosest binding:
1. Strings, names formation
2. Comment
3. Value range
4. Repetition
5. Grouping, optional
6. Concatenation
7. Alternative
Use of the alternative operator with concatenation may be confusing, and it is recommended that grouping be used to make explicit concatenation groups.
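The effect of this precedence can be checked with regular expressions, where juxtaposition likewise binds tighter than alternation. In the sketch below the single-letter rules a, b, c and d are hypothetical and each stands for its own letter; the helper matching is also an illustrative name.

```python
import re


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [text for text in candidates if re.fullmatch(pattern, text)]


CANDIDATES = ("ab", "cd", "abd", "acd")

# ABNF `a b / c d` is read as `(a b) / (c d)`.
UNGROUPED = r"ab|cd"
# ABNF `a (b / c) d` needs the explicit group.
GROUPED = r"a(?:b|c)d"

print(matching(UNGROUPED, CANDIDATES))  # ['ab', 'cd']
print(matching(GROUPED, CANDIDATES))    # ['abd', 'acd']
```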
Core rules
The core rules are defined in the ABNF standard.
Note that in the core rules diagram the CHAR2 charset is inlined in char-val and CHAR3 is inlined in prose-val in the RFC spec. They are named here for clarity in the main syntax diagram.
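For orientation, several of the core rules can be written as Python regular-expression fragments. The mapping below is a sketch following the definitions in RFC 5234, Appendix B.1; the dictionary name CORE_RULES is an assumption of this example, and OCTET and LWSP are left out.

```python
import re

# Regex fragments for a subset of the ABNF core rules (RFC 5234, Appendix B.1).
CORE_RULES = {
    "ALPHA": r"[A-Za-z]",       # %x41-5A / %x61-7A
    "BIT": r"[01]",             # "0" / "1"
    "CHAR": r"[\x01-\x7F]",     # any 7-bit US-ASCII character, excluding NUL
    "CR": r"\r",                # %x0D
    "CRLF": r"\r\n",            # CR LF
    "CTL": r"[\x00-\x1F\x7F]",  # controls
    "DIGIT": r"[0-9]",          # %x30-39
    "DQUOTE": r'"',             # %x22
    "HEXDIG": r"[0-9A-Fa-f]",   # DIGIT / "A" ... "F" (literals ignore case)
    "HTAB": r"\t",              # %x09
    "LF": r"\n",                # %x0A
    "SP": r" ",                 # %x20
    "VCHAR": r"[\x21-\x7E]",    # visible (printing) characters
    "WSP": r"[ \t]",            # SP / HTAB
}

# 2DIGIT expressed with the DIGIT fragment:
print(bool(re.fullmatch(CORE_RULES["DIGIT"] + "{2}", "42")))  # True
```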
Example
The (U.S.) postal address example given in the Backus–Naur form (BNF) page may be specified as follows:
postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT ["-" 4DIGIT]
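Part of this grammar can be checked with regular expressions. The following Python sketch covers only the zip-part rule and the rules it depends on, reading zip-code as five digits optionally followed by a hyphen and four more digits. The names ZIP_CODE, ZIP_PART and is_zip_part are hypothetical, and the sample address line is invented for illustration.

```python
import re

# zip-code = 5DIGIT ["-" 4DIGIT]
ZIP_CODE = r"[0-9]{5}(?:-[0-9]{4})?"
# zip-part = town-name "," SP state 1*2SP zip-code CRLF
# with town-name = 1*(ALPHA / SP) and state = 2ALPHA
ZIP_PART = re.compile(r"[A-Za-z ]+, [A-Za-z]{2} {1,2}" + ZIP_CODE + r"\r\n")


def is_zip_part(line):
    """Return True if the line matches the zip-part rule in full."""
    return ZIP_PART.fullmatch(line) is not None


print(is_zip_part("Springfield, IL 62704\r\n"))        # True
print(is_zip_part("Springfield, IL  62704-1234\r\n"))  # True
print(is_zip_part("Springfield, IL 6270\r\n"))         # False
```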
Pitfalls
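RFC 5234 adds a warning alongside the definition of LWSP: use of this rule permits lines containing only white space, which are no longer legal in mail headers and have caused interoperability problems in other contexts. The RFC advises against using LWSP when defining mail headers and urges caution in other contexts.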
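The behaviour behind this warning can be reproduced with a regular expression for the RFC 5234 definition LWSP = *(WSP / CRLF WSP). This is a sketch; the names LWSP and is_lwsp are chosen for this example only.

```python
import re

# LWSP = *(WSP / CRLF WSP), with WSP = SP / HTAB
LWSP = re.compile(r"(?:[ \t]|\r\n[ \t])*")


def is_lwsp(text):
    """Return True if the whole text is matched by LWSP."""
    return LWSP.fullmatch(text) is not None


# Three folded lines, each consisting of white space only.
folded = " \r\n \r\n\t"
print(is_lwsp(folded))       # True
print(folded.split("\r\n"))  # [' ', ' ', '\t']

# LWSP also matches the empty string, so it cannot act as a mandatory delimiter.
print(is_lwsp(""))           # True
```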