An affix grammar is a kind of
formal grammar
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
; it is used to describe the
syntax
In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituency) ...
of languages, mainly
computer language
A computer language is a formal language used to communicate with a computer. Types of computer languages include:
* Construction language – all forms of communication by which a human can specify an executable problem solution to a compu ...
s, using an approach based on how natural language is typically described.
[Koster, Cornelis HA.]
Affix grammars for natural languages
" Attribute Grammars, Applications and Systems. Springer, Berlin, Heidelberg, 1991.
The grammatical rules of an affix grammar are those of a
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
, except that certain parts in the nonterminals (the
affix
In linguistics, an affix is a morpheme that is attached to a word stem to form a new word or word form. Affixes may be derivational, like English ''-ness'' and ''pre-'', or inflectional, like English plural ''-s'' and past tense ''-ed''. They ar ...
es) are used as arguments. If the same affix occurs multiple times in a rule, its value must
agree, i.e. it must be the same everywhere. In some types of affix grammar, more complex relationships between affix values are possible.
Example
We can describe an extremely simple fragment of English in the following manner:
: ''Sentence'' → ''Subject'' ''Predicate''
: ''Subject'' → ''Noun''
: ''Predicate'' → ''Verb'' ''Object''
: ''Object'' → ''Noun''
:
: ''Noun'' → John
: ''Noun'' → Mary
: ''Noun'' → children
: ''Noun'' → parents
:
: ''Verb'' → like
: ''Verb'' → likes
: ''Verb'' → help
: ''Verb'' → helps
This
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form
:A\ \to\ \alpha
with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
describes simple sentences such as
: John likes children
: Mary helps John
: children help parents
: parents like John
With more nouns and verbs, and more rules to introduce other parts of speech, a large range of English sentences can be described; so this is a promising approach for describing the syntax of English.
However, the given grammar also describes sentences such as
: John like children
: children helps parents
These sentences are wrong: in English, subject and verb have a
grammatical number
In linguistics, grammatical number is a grammatical category of nouns, pronouns, adjectives and verb agreement that expresses count distinctions (such as "one", "two" or "three or more"). English and other languages present number categories of ...
, which must agree.
An affix grammar can express this directly:
: ''Sentence'' → ''Subject'' + ''number Predicate'' + ''number''
: ''Subject'' + ''number'' → ''Noun'' + ''number''
: ''Predicate'' + ''number'' → ''Verb'' + ''number Object''
: ''Object'' → ''Noun'' + ''number''
: ''Noun'' + ''singular'' → John
: ''Noun'' + ''singular'' → Mary
: ''Noun'' + ''plural'' → children
: ''Noun'' + ''plural'' → parents
:
: ''Verb'' + ''singular'' → likes
: ''Verb'' + ''plural'' → like
: ''Verb'' + ''singular'' → helps
: ''Verb'' + ''plural'' → help
This grammar only describes correct English sentences, although it could be argued that
: John likes John
is still incorrect and should instead read
: John likes himself
This, too, can be incorporated using affixes, if the means of describing the relationships between different affix values are powerful enough. As remarked above, these means depend on the type of affix grammar chosen.
Types
In the simplest type of affix grammar, affixes can only take values from a finite domain, and affix values can only be related through agreement, as in the example.
Applied in this way, affixes increase compactness of grammars, but do not add expressive power.
Another approach is to allow affixes to take arbitrary strings as values and allow concatenations of affixes to be used in rules. The ranges of allowable values for affixes can be described with context-free grammar rules. This produces the formalism of
two-level grammars, also known as ''
Van Wijngaarden grammar
In computer science, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adriaa ...
s'' or ''2VW'' grammars. These have been successfully used to describe complicated languages, in particular, the syntax of the
Algol 68
ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously de ...
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
. However, it turns out that, even though affix values can only be manipulated with string concatenation, this formalism is
Turing complete
Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
; hence, even the most basic questions about the language described by an arbitrary 2VW grammar are
undecidable in general.
Extended Affix Grammars, developed in the 1980s, are a more restricted version of the same idea. They were mainly applied to describe the grammar of natural language, e.g. English.
Another possibility is to allow the values of affixes to be computed by code written in some programming language. Two basic approaches have been used:
* In
attribute grammar An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are resu ...
s, the affixes (called attributes) can take values from arbitrary domains (e.g. integer or real numbers, complex data structures) and arbitrary functions can be specified, written in a language of choice, to describe how affix values in rules are derived from each other.
* In CDL (the
Compiler Description Language
Compiler Description Language (CDL) is a programming language based on affix grammars. It is very similar to Backus–Naur form (BNF) notation. It was designed for the development of compilers. It is very limited in its capabilities and control f ...
) and its successor
CDL2
Compiler Description Language (CDL) is a programming language based on affix grammars. It is very similar to Backus–Naur form (BNF) notation. It was designed for the development of compilers. It is very limited in its capabilities and control f ...
, developed in the 1970s, fragments of source code (usually in
assembly language
In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
) can be used in rules instead of normal right-hand sides, allowing primitives for input scanning and affix value computations to be expressed directly. Designed as a basis for practical
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
construction, this approach was used to write compilers, and other software, e.g. a
text editor
A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be us ...
.
References
{{reflist
Formal languages
Compiler construction
Syntax
Grammar frameworks