computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a formalism for defining
formal language
In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet".
The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
s. The name derives from the formalism invented by Adriaan van Wijngaarden
for the purpose of defining the
ALGOL 68
ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language member of the ALGOL family that was conceived as a successor to the ALGOL 60 language, designed with the goal of a much wider scope of application and ...
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
.
The resulting specification remains its most notable application.
Van Wijngaarden grammars address the problem that
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
s cannot express agreement or reference, where two different parts of the sentence must agree with each other in some way. For example, the sentence "The birds was eating" is not
Standard English
In an English-speaking country, Standard English (SE) is the variety of English that has undergone codification to the point of being socially perceived as the standard language, associated with formal schooling, language assessment, and off ...
because it fails to agree on number. A context-free grammar would parse "The birds was eating" and "The birds were eating" and "The bird was eating" in the same way. However, context-free grammars have the benefit of simplicity whereas van Wijngaarden grammars are considered highly complex.
Two levels
W-grammars are two-level grammars: they are defined by a pair of grammars, that operate on different levels:
* the '' hypergrammar'' is an
attribute grammar
An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are t ...
, i.e. a set of
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
rules in which the nonterminals may have attributes; and
* the '' metagrammar'' is a context-free grammar defining possible values for these attributes.
The set of strings generated by a W-grammar is defined by a two-stage process:
# within each hyperrule, for each attribute that occurs in it, pick a value for it generated by the metagrammar; the result is a normal context-free grammar rule; do this in every possible way;
# use the resulting (possibly infinite) context-free grammar to generate strings in the normal way.
The ''consistent substitution'' used in the first step is the same as substitution in predicate logic, and actually supports
logic programming
Logic programming is a programming, database and knowledge representation paradigm based on formal logic. A logic program is a set of sentences in logical form, representing knowledge about some problem domain. Computation is performed by applyin ...
Prolog
Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics.
Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
Turing complete
Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. He was highly influential in the development of theoretical comput ...
;
hence, all decision problems regarding the languages they generate, such as
* whether a W-grammar generates a given string
* whether a W-grammar generates no strings at all
are undecidable.
Curtailed variants, known as affix grammars, were developed, and applied in
compiler construction
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
and to the description of natural languages.
Definite logic programs, that is, logic programs that make no use of negation, can be viewed as a subclass of W-grammars.
Motivation and history
In the 1950s, attempts started to apply computers to the recognition, interpretation and translation of natural languages, such as English and Russian. This requires a machine-readable description of the phrase structure of sentences, that can be used to parse and interpret them, and to generate them. Context-free grammars, a concept from
structural linguistics
Structural linguistics, or structuralism, in linguistics, denotes schools or theories in which language is conceived as a self-contained, self-regulating semiotic system whose elements are defined by their relationship to other elements within th ...
, were adopted for this purpose; their rules can express how sentences are recursively built out of
parts of speech
In grammar, a part of speech or part-of-speech (abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are as ...
, such as
noun phrase
A noun phrase – or NP or nominal (phrase) – is a phrase that usually has a noun or pronoun as its head, and has the same grammatical functions as a noun. Noun phrases are very common cross-linguistically, and they may be the most frequently ...
s and
verb phrase
In linguistics, a verb phrase (VP) is a syntax, syntactic unit composed of a verb and its argument (linguistics), arguments except the subject (grammar), subject of an independent clause or coordinate clause. Thus, in the sentence ''A fat man quic ...
s, and ultimately, words, such as
noun
In grammar, a noun is a word that represents a concrete or abstract thing, like living creatures, places, actions, qualities, states of existence, and ideas. A noun may serve as an Object (grammar), object or Subject (grammar), subject within a p ...
s,
verb
A verb is a word that generally conveys an action (''bring'', ''read'', ''walk'', ''run'', ''learn''), an occurrence (''happen'', ''become''), or a state of being (''be'', ''exist'', ''stand''). In the usual description of English, the basic f ...
s, and
pronoun
In linguistics and grammar, a pronoun (Interlinear gloss, glossed ) is a word or a group of words that one may substitute for a noun or noun phrase.
Pronouns have traditionally been regarded as one of the part of speech, parts of speech, but so ...
s.
This work influenced the design and implementation of
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s, most notably, of
ALGOL 60
ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
, which introduced a syntax description in
Backus–Naur form
In computer science, Backus–Naur form (BNF, pronounced ), also known as Backus normal form, is a notation system for defining the Syntax (programming languages), syntax of Programming language, programming languages and other Formal language, for ...
.
However, context-free rules cannot express
agreement
Agreement may refer to:
Agreements between people and organizations
* Gentlemen's agreement, not enforceable by law
* Trade agreement, between countries
* Consensus (disambiguation), a decision-making process
* Contract, enforceable in a court of ...
or reference ( anaphora), where two different parts of the sentence must agree with each other in some way.
These can be readily expressed in W-grammars. (See example below.)
Programming languages have the analogous notions of
typing
Typing is the process of writing or inputting text by pressing keys on a typewriter, computer keyboard, mobile phone, or calculator. It can be distinguished from other means of text input, such as handwriting recognition, handwriting and speech ...
and scoping.
A compiler or interpreter for the language must recognize which uses of a variable belong together (refer to the same variable). This is typically subject to constraints such as:
* A variable must be initialized before its value is used.
* In strongly typed languages, each variable is assigned a type, and all uses of the variable must respect its type.
* Often, its type must be declared explicitly, before use.
W-grammars are based on the idea of providing the nonterminal symbols of context-free grammars with ''attributes'' (or ''affixes'') that pass information between the nodes of the
parse tree
A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
, used to constrain the syntax and to specify the semantics.
This idea was well known at the time; e.g.
Donald Knuth
Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
visited the ALGOL 68 design committee while developing his own version of it,
attribute grammar
An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are t ...
s.
By augmenting the syntax description with attributes, constraints like the above can be checked, ruling many invalid programs out at compile time.
As Van Wijngaarden wrote in his preface:
Quite peculiar to W-grammars was their strict treatment of attributes as strings, defined by a context-free grammar, on which concatenation is the only possible operation; complex data structures and operations can be defined by
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
. (See example below.)
After their introduction in the 1968
ALGOL 68
ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language member of the ALGOL family that was conceived as a successor to the ALGOL 60 language, designed with the goal of a much wider scope of application and ...
"Final Report", W-grammars were widely considered as too powerful and unconstrained to be practical.
This was partly a consequence of the way in which they had been applied; the 1973 ALGOL 68 "Revised Report" contains a much more readable grammar, without modifying the W-grammar formalism itself.
Meanwhile, it became clear that W-grammars, when used in their full generality, are indeed too powerful for such practical purposes as serving as the input for a
parser generator
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
The most common type of compiler- ...
.
They describe precisely all
recursively enumerable language
In mathematics, logic and computer science, a formal language is called recursively enumerable (also recognizable, partially decidable, semidecidable, Turing-acceptable or Turing-recognizable) if it is a recursively enumerable subset in the set o ...
s, which makes parsing impossible in general: it is an
undecidable problem
In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is an ...
to decide whether a given string can be generated by a given W-grammar.
Hence, their use must be seriously constrained when used for automatic parsing or translation. Restricted and modified variants of W-grammars were developed to address this, e.g.
* Extended Affix Grammars (EAGs), applied to describe the grammars of
natural language
A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
such as English and Spanish);
* Q-systems, also applied to natural language processing;
* the CDL series of languages, applied as
compiler construction
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
languages for
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s.
After the 1970s, interest in the approach waned; occasionally, new studies are published.
Examples
Agreement in English grammar
In English, nouns, pronouns and verbs have attributes such as
grammatical number
In linguistics, grammatical number is a Feature (linguistics), feature of nouns, pronouns, adjectives and verb agreement (linguistics), agreement that expresses count distinctions (such as "one", "two" or "three or more"). English and many other ...
,
gender
Gender is the range of social, psychological, cultural, and behavioral aspects of being a man (or boy), woman (or girl), or third gender. Although gender often corresponds to sex, a transgender person may identify with a gender other tha ...
, and
person
A person (: people or persons, depending on context) is a being who has certain capacities or attributes such as reason, morality, consciousness or self-consciousness, and being a part of a culturally established form of social relations suc ...
, which must agree between subject, main verb, and pronouns referring to the subject:
* I wash myself.
* She washes herself.
* We wash ourselves.
are valid sentences; invalid are, for instance:
* *I washes ourselves.
* *She wash himself.
* *We wash herself.
Here, agreement serves to stress that both pronouns (e.g. ''I'' and ''myself'') refer to the same person.
A context-free grammar to generate all such sentences:
::=
From , we can generate all combinations:
I wash myself
I wash yourself
I wash himself
.. They wash yourselves
They wash themselves
A W-grammar to generate only the valid sentences:
>
::= >
>
>
1st> ::= I
2nd> ::= You
::= He
::= She
1st> ::= We
3rd> ::= They
::= wash
::= wash
::= washes
> ::= wash
1st> ::= myself
2nd> ::= yourself
::= himself
::= herself
1st> ::= ourselves
2nd> ::= yourselves
3rd> ::= themselves
:: singular , plural
:: male , female
:: 1st , 2nd , 3rd
A standard non-context-free language
A well-known non-context-free language is
:
A two-level grammar for this language is the metagrammar
:N ::= 1 , N1
:X ::= a , b
together with grammar schema
:Start ::=
: ::= X
: ::= X
Requiring valid use of variables in ALGOL
The Revised Report on the Algorithmic Language Algol 60
defines a full context-free syntax for the language.
Assignments are defined as follows (section 4.2.1):
::= :=
, :=
::=
,
::=
,
A can be (amongst other things) an , which in turn is defined as:
::= , ,
Examples (section 4.2.2):
s:=p =n:=n+1+s
n:=n+1
A:=B/C-v-q×S
S ,k+2=3-arctan(sTIMESzeta)
V:=Q>Y^Z
Expressions and assignments must be type checked: for instance,
* in n:=n+1, n must be a number (integer or real);
* in A:=B/C-v-q×S, all variables must be numbers;
* in V:=Q>Y^Z, all variables must be of type Boolean.
The rules above distinguish between and , but they cannot verify that the same variable always has the same type.
This (non-context-free) requirement can be expressed in a W-grammar by annotating the rules with attributes that record, for each variable used or assigned to, its name and type.
This record can then be carried along to all places in the grammar where types need to be matched, and implement type checking.
Similarly, it can be used to checking initialization of variables before use, etcetera.
One may wonder how to create and manipulate such a data structure without explicit support in the formalism for data structures and operations on them. It can be done by using the metagrammar to define a string representation for the data structure and using
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
to define operations:
>
::= > :=
, > :=
>
::= >
is added to sorted >
, >
>
is added to sorted >
>
::= > >
, > >
is added to sorted >
::=
is added to sorted >
::= is added to sorted >
is lexicographically before >
is added to sorted >
::= is added to sorted >
is lexicographically before >
is added to sorted >
is lexicographically before >
::= is followed by >
is lexicographically before >
::= is followed by >
is followed by >
is lexicographically before >
is lexicographically before >
::= is followed by >
is followed by >
precedes+ precedes+
::= precedes precedes+
::= precedes+ precedes+ :
:
.. :: real , integer , Boolean
:: , , :: , ::= ::= ::= :: a , b , c , .. :: 0 , 1 , 2 , .. :: :: :: :: :: , ::
:: () :: :: ::
When compared to the original grammar, three new elements have been added:
* attributes to the nonterminals in what are now the hyperrules;
* metarules to specify the allowable values for the attributes;
* new hyperrules to specify operations on the attribute values.
The new hyperrules are -rules: they only generate the empty string.
ALGOL 68 examples
The ALGOL 68 reports use a slightly different notation without .
ALGOL 68 as in the 1968 Final Report §2.1
a) program : open symbol, standard prelude,
library prelude option, particular program, exit,
library postlude option, standard postlude, close symbol.
b) standard prelude : declaration prelude sequence.
c) library prelude : declaration prelude sequence.
d) particular program :
label sequence option, strong CLOSED void clause.
e) exit : go on symbol, letter e letter x letter i letter t, label symbol.
f) library postlude : statement interlude.
g) standard postlude : strong void clause train
ALGOL 68 as in the 1973 Revised Report §2.2.1, §10.1.1
program : strong void new closed clause
A) EXTERNAL :: standard ; library ; system ; particular.
B) STOP :: label letter s letter t letter o letter p.
a) program text : STYLE begin token, new LAYER1 preludes,
parallel token, new LAYER1 tasks PACK,
STYLE end token.
b) NEST1 preludes : NEST1 standard prelude with DECS1,
NEST1 library prelude with DECSETY2,
NEST1 system prelude with DECSETY3, where (NEST1) is
(new EMPTY new DECS1 DECSETY2 DECSETY3).
c) NEST1 EXTERNAL prelude with DECSETY1 :
strong void NEST1 series with DECSETY1, go on token ;
where (DECSETY1) is (EMPTY), EMPTY.
d) NEST1 tasks : NEST1 system task list, and also token,
NEST1 user task PACK list.
e) NEST1 system task : strong void NEST1 unit.
f) NEST1 user task : NEST2 particular prelude with DECS,
NEST2 particular program PACK, go on token,
NEST2 particular postlude,
where (NEST2) is (NEST1 new DECS STOP).
g) NEST2 particular program :
NEST2 new LABSETY3 joined label definition
of LABSETY3, strong void NEST2 new LABSETY3
ENCLOSED clause.
h) NEST joined label definition of LABSETY :
where (LABSETY) is (EMPTY), EMPTY ;
where (LABSETY) is (LAB1 LABSETY1),
NEST label definition of LAB1,
NEST joined label definition of$ LABSETY1.
i) NEST2 particular postlude :
strong void NEST2 series with STOP.
A simple example of the power of W-grammars is clause
a) program text : STYLE begin token, new LAYER1 preludes,
parallel token, new LAYER1 tasks PACK,
STYLE end token.
This allows BEGIN ... END and as block delimiters, while ruling out BEGIN ... } and { ... END.
One may wish to compare the grammar in the report with the
Yacc
Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
parser for a subset of ALGOL 68 by Marc van Leeuwen.
Implementations
Anthony Fisher wrote ''yo-yo'',.
a parser for a large class of W-grammars, with example grammars for ''expressions'', ''eva'', ''sal'' and Pascal (the actual ISO 7185 standard for Pascal uses
extended Backus–Naur form
Extension, extend or extended may refer to:
Mathematics
Logic or set theory
* Axiom of extensionality
* Extensible cardinal
* Extension (model theory)
* Extension (proof theory)
* Extension (predicate logic), the set of tuples of values ...
).
Dick Grune created a C program that would generate all possible productions of a W-grammar.
Applications outside of ALGOL 68
The applications of Extended Affix Grammars (EAG)s mentioned above can effectively be regarded as applications of W-grammars, since EAGs are so close to W-grammars.
W-grammars have also been proposed for the description of complex human actions in
ergonomics
Ergonomics, also known as human factors or human factors engineering (HFE), is the application of Psychology, psychological and Physiology, physiological principles to the engineering and design of products, processes, and systems. Primary goa ...
.
A W-Grammar Description has also been supplied for Ada.
Attribute grammar
An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are t ...
References
Further reading
*
*
* .
* {{cite journal, last=Petersson, first=Kent, date=1990, title=Syntax and Semantics of Programming Languages, journal=Draft Lecture Notes, url=http://www.cs.chalmers.se/~kentp/proglang.pdf, archiveurl= https://web.archive.org/web/20010605175158/http://www.cs.chalmers.se/~kentp/proglang.pdf, archivedate=5 June 2001
Formal languagesParsingCompiler constructionDutch inventions