Grammar induction (or grammatical inference) is the process in machine learning of learning a formal grammar (usually as a collection of ''re-write rules'' or ''productions'', or alternatively as a finite state machine or automaton of some kind) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.


Grammar classes

Grammatical inference has often been very focused on the problem of learning finite state machines of various types (see the article Induction of regular languages for details on these approaches), since there have been efficient algorithms for this problem since the 1980s. Since the beginning of the century, these approaches have been extended to the problem of inference of context-free grammars and richer formalisms, such as multiple context-free grammars and parallel multiple context-free grammars. Other classes of grammars for which grammatical inference has been studied are combinatory categorial grammars, stochastic context-free grammars, contextual grammars and pattern languages.


Learning models

The simplest form of learning is where the learning algorithm merely receives a set of examples drawn from the language in question: the aim is to learn the language from examples of it (and, rarely, from counter-examples, that is, examples that do not belong to the language). However, other learning models have been studied. One frequently studied alternative is the case where the learner can ask membership queries, as in the exact query learning model or the minimally adequate teacher model introduced by Angluin.
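The following Python sketch makes the query setting concrete by modelling a teacher that answers membership queries and a bounded stand-in for equivalence queries; the class name, the length bound, and the example language are illustrative assumptions, not part of Angluin's formal definitions.

```python
# A minimal sketch of the two query types in a "minimally adequate teacher":
# the hidden target language is given as a Python predicate (an assumption of
# this sketch); class and method names are illustrative, not a standard API.
from itertools import product

class Teacher:
    def __init__(self, in_language, alphabet, max_len=8):
        self.in_language = in_language   # hidden target language, as a predicate
        self.alphabet = alphabet
        self.max_len = max_len           # search bound for the equivalence check

    def membership(self, word):
        """Membership query: is this word in the target language?"""
        return self.in_language(word)

    def equivalence(self, hypothesis):
        """Equivalence query: return a word on which the hypothesis and the
        target disagree, or None if none exists up to max_len (a bounded
        stand-in for a true equivalence oracle)."""
        for n in range(self.max_len + 1):
            for word in map("".join, product(self.alphabet, repeat=n)):
                if hypothesis(word) != self.in_language(word):
                    return word
        return None

# Example target language: strings over {a, b} with an even number of a's.
teacher = Teacher(lambda w: w.count("a") % 2 == 0, alphabet="ab")
print(teacher.membership("ab"))              # False: one 'a'
print(teacher.equivalence(lambda w: True))   # 'a': shortest disagreement found
```

A query learner such as Angluin's L* would alternate these two kinds of queries to construct a deterministic finite automaton for the target language.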


Methodologies

There is a wide variety of methods for grammatical inference. Two of the classic sources are Fu (1977) and Fu (1982). Duda, Hart and Stork (2001) also devote a brief section to the problem, and cite a number of references. The basic trial-and-error method they present is discussed below. For approaches to infer subclasses of regular languages in particular, see ''Induction of regular languages''. A more recent textbook is de la Higuera (2010), which covers the theory of grammatical inference of regular languages and finite state automata. D'Ulizia, Ferri and Grifoni provide a survey that explores grammatical inference methods for natural languages.


Grammatical inference by trial-and-error

The method proposed in Section 8.7 of Duda, Hart and Stork (2001) suggests successively guessing grammar rules (productions) and testing them against positive and negative observations. The rule set is expanded so as to be able to generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This particular approach can be characterized as "hypothesis testing" and bears some similarity to Mitchell's version space algorithm. The text provides a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious.
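The following Python sketch illustrates this trial-and-error scheme on a toy problem: it enumerates rule sets drawn from a small hand-written pool of candidate productions in Chomsky normal form and returns the first set that derives every positive example and none of the negative ones. The pool, the membership test and the sample data are illustrative assumptions, not the textbook's example.

```python
# Grammar induction by trial and error ("hypothesis testing") over a fixed pool
# of candidate CNF productions. All names and data are illustrative.
from functools import lru_cache
from itertools import combinations

def accepts(rules, s, start="S"):
    """Can the CNF grammar `rules` derive the nonempty string s from `start`?"""
    rules = tuple(rules)

    @lru_cache(maxsize=None)
    def derive(lhs, w):
        for left, rhs in rules:
            if left != lhs:
                continue
            if len(rhs) == 1 and rhs == (w,):              # terminal production
                return True
            if len(rhs) == 2 and any(                      # binary production
                    derive(rhs[0], w[:i]) and derive(rhs[1], w[i:])
                    for i in range(1, len(w))):
                return True
        return False

    return bool(s) and derive(start, s)

def induce(pool, positives, negatives):
    """Try ever larger rule sets until one generates every positive example;
    discard any hypothesis that also generates a negative example."""
    for size in range(1, len(pool) + 1):
        for rules in combinations(pool, size):
            if all(accepts(rules, p) for p in positives) and \
               not any(accepts(rules, n) for n in negatives):
                return rules
    return None

# Candidate productions over {a, b}; the intended target language is a^n b^n.
pool = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B")),
        ("S", ("B", "A")), ("A", ("a",)), ("B", ("b",)), ("S", ("A", "A"))]
print(induce(pool, positives=["ab", "aabb", "aaabbb"], negatives=["a", "ba", "aab"]))
```

Even on this tiny example the search is exponential in the size of the candidate pool, which is why the unguided approach does not scale to substantial problems.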


Grammatical inference by genetic algorithms

Grammatical induction using evolutionary algorithms is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammars can easily be represented as tree structures of production rules that can be subjected to evolutionary operators. Algorithms of this sort stem from the genetic programming paradigm pioneered by John Koza. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF language made trees a more flexible approach. Koza represented Lisp programs as trees. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions of the Lisp code. Similar analogues between the tree-structured Lisp representation and the representation of grammars as trees made the application of genetic programming techniques possible for grammar induction. In the case of grammar induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent node corresponds to a non-terminal symbol (e.g. a noun phrase or a verb phrase) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.
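As a deliberately simplified illustration, the following sketch evolves grammars represented as flat lists of CNF productions rather than full parse trees: crossover swaps suffixes of two parents' rule lists, mutation replaces a single rule, and fitness counts how many sample sentences the grammar parses while penalizing parses of negative examples. The rule pool, fitness function and parameters are assumptions made for the example.

```python
# Grammar induction with a simple genetic algorithm over rule lists (a flat
# stand-in for the tree representation); all names and data are illustrative.
import random
from functools import lru_cache

POOL = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B")),
        ("S", ("B", "A")), ("A", ("a",)), ("B", ("b",)), ("S", ("A", "A"))]
POS = ["ab", "aabb", "aaabbb"]        # sentences from the target language a^n b^n
NEG = ["a", "ba", "aab", "abab"]      # strings outside the target language

def accepts(rules, s, start="S"):
    """Memoized top-down test: can `start` derive string s with these CNF rules?"""
    rules = tuple(rules)

    @lru_cache(maxsize=None)
    def derive(lhs, w):
        for left, rhs in rules:
            if left != lhs:
                continue
            if len(rhs) == 1 and rhs == (w,):
                return True
            if len(rhs) == 2 and any(derive(rhs[0], w[:i]) and derive(rhs[1], w[i:])
                                     for i in range(1, len(w))):
                return True
        return False

    return bool(s) and derive(start, s)

def fitness(rules):
    """Reward parsing positive sentences, penalize parsing negative ones."""
    return sum(accepts(rules, p) for p in POS) - sum(accepts(rules, n) for n in NEG)

def crossover(a, b):
    """Swap a random suffix of the two parents' rule lists."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(individual):
    """Replace one production with a random one from the pool."""
    i = random.randrange(len(individual))
    return individual[:i] + [random.choice(POOL)] + individual[i + 1:]

def evolve(pop_size=40, genome_len=5, generations=100):
    pop = [[random.choice(POOL) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(POS):   # parses all positives and no negatives
            return pop[0]
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

print(evolve())   # best rule list found; not guaranteed to be a perfect grammar
```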


Grammatical inference by greedy algorithms

Like all greedy algorithms, greedy grammar inference algorithms make, in an iterative manner, decisions that seem to be the best at that stage. The decisions made usually deal with things like the creation of new rules, the removal of existing rules, the choice of a rule to be applied or the merging of some existing rules. Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms.

These context-free grammar generating algorithms make the decision after every read symbol:
* Lempel-Ziv-Welch algorithm creates a context-free grammar in a deterministic way such that it is necessary to store only the start rule of the generated grammar.
* Sequitur and its modifications.

These context-free grammar generating algorithms first read the whole given symbol-sequence and then start to make decisions (a sketch of this whole-sequence style is given below):
* Byte pair encoding and its optimizations.
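The following sketch illustrates the whole-sequence, greedy style in the spirit of byte pair encoding: at each step it replaces the most frequent adjacent pair of symbols with a fresh nonterminal and records the corresponding production, yielding a straight-line grammar for the input. Names and the stopping rule are illustrative.

```python
# Greedy grammar-based compression in the spirit of byte pair encoding.
from collections import Counter

def bpe_grammar(text):
    """Greedily build a straight-line grammar for `text`."""
    seq = list(text)
    rules = {}                          # nonterminal -> pair of symbols it expands to
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                   # greedy stopping rule: no pair repeats
            break
        nonterminal = f"N{next_id}"
        next_id += 1
        rules[nonterminal] = pair
        out, i = [], 0                  # rewrite the sequence left to right
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules["S"] = tuple(seq)             # start rule expands to the compressed sequence
    return rules

print(bpe_grammar("abababcabab"))
# e.g. {'N0': ('a', 'b'), 'N1': ('N0', 'N0'), 'S': ('N1', 'N0', 'c', 'N1')}
```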


Distributional learning

A more recent approach is based on distributional learning. Algorithms using these approaches have been applied to learning context-free grammars and mildly context-sensitive languages and have been proven to be correct and efficient for large subclasses of these grammars.
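The core distributional idea can be sketched as follows: collect, for every substring of a sample, the set of contexts in which it occurs, and treat substrings with identical context sets as candidates for the same nonterminal. Practical distributional learners use more careful statistical or query-based tests; the sample and function names below are purely illustrative.

```python
# A crude sketch of the distributional idea: substrings are grouped by the set
# of (left, right) contexts in which they occur in the sample.
from collections import defaultdict

def contexts(sample):
    """Map each substring of the sample to the set of contexts it occurs in."""
    ctx = defaultdict(set)
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                ctx[w[i:j]].add((w[:i], w[j:]))
    return ctx

def congruence_classes(sample):
    """Group substrings that occur in exactly the same set of contexts; such
    classes are candidates for nonterminals of an induced grammar."""
    by_ctx = defaultdict(list)
    for substring, cs in contexts(sample).items():
        by_ctx[frozenset(cs)].append(substring)
    return [sorted(group) for group in by_ctx.values() if len(group) > 1]

# 'ab' and 'cd' occur in exactly the same contexts, so they end up in one class.
print(congruence_classes(["xaby", "xcdy"]))
```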


Learning of pattern languages

Angluin defines a ''pattern'' to be "a string of constant symbols from Σ and variable symbols from a disjoint set". The language of such a pattern is the set of all its nonempty ground instances, i.e. all strings resulting from consistent replacement of its variable symbols by nonempty strings of constant symbols. (The language of a pattern with at least two occurrences of the same variable is not regular, due to the pumping lemma.)

A pattern is called descriptive for a finite input set of strings if its language is minimal (with respect to set inclusion) among all pattern languages subsuming the input set. Angluin gives a polynomial algorithm to compute, for a given input string set, all descriptive patterns in one variable ''x'' (''x'' may occur several times, but no other variable ''y'' may occur). To this end, she builds an automaton representing all possibly relevant patterns; using sophisticated arguments about word lengths, which rely on ''x'' being the only variable, the state count can be drastically reduced. Erlebach et al. give a more efficient version of Angluin's pattern learning algorithm, as well as a parallelized version. Arimura et al. show that a language class obtained from limited unions of patterns can be learned in polynomial time.
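The membership problem for such pattern languages can be sketched with a short backtracking procedure over variable assignments; the convention of writing variables as uppercase letters and constants as lowercase letters is an illustrative assumption of the sketch.

```python
# Membership in a pattern language: variables (uppercase) must be consistently
# replaced by nonempty strings of constants (lowercase). Illustrative only.
def matches(pattern, s, bindings=None):
    """Does string s belong to the language of `pattern`?"""
    bindings = dict(bindings or {})
    if not pattern:
        return not s                           # pattern used up: string must be too
    head, rest = pattern[0], pattern[1:]
    if head.islower():                         # constant symbol: match literally
        return bool(s) and s[0] == head and matches(rest, s[1:], bindings)
    if head in bindings:                       # variable seen before: reuse its value
        value = bindings[head]
        return s.startswith(value) and matches(rest, s[len(value):], bindings)
    for i in range(1, len(s) + 1):             # fresh variable: try every nonempty prefix
        if matches(rest, s[i:], {**bindings, head: s[:i]}):
            return True
    return False

# The pattern XaXb denotes { w a w b : w a nonempty string of constants }.
print(matches("XaXb", "cacb"))    # True:  X -> "c"
print(matches("XaXb", "caab"))    # False: no consistent value for X works
```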


Pattern theory

Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to:
* Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was commonplace at the time.
* Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph.
* Study the randomness and variability of these graphs.
* Create the basic classes of stochastic models applied by listing the deformations of the patterns.
* Synthesize (sample) from the models, not just analyze signals with them.

Broad in its mathematical coverage, pattern theory spans algebra and statistics, as well as local topological and global entropic properties.


Applications

The principle of grammar induction has been applied to other aspects of natural language processing, and (among many other problems) to semantic parsing (Kwiatkowski, Tom, et al. "Lexical generalization in CCG grammar induction for semantic parsing." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011), natural language understanding, example-based translation, morpheme analysis, and place name derivations. Grammar induction has also been used for grammar-based compression and statistical inference via minimum message length (MML) and minimum description length (MDL) principles. Grammar induction has also been used in some probabilistic models of language acquisition (Chater, Nick, and Christopher D. Manning. "Probabilistic models of language processing and acquisition." Trends in Cognitive Sciences 10.7 (2006): 335–344).


See also

* Artificial grammar learning
* Example-based machine translation
* Inductive programming
* Kolmogorov complexity
* Language identification in the limit
* Straight-line grammar
* Syntactic pattern recognition



Sources

* Gold, E. Mark (1967). "Language Identification in the Limit". ''Information and Control''. 10: 447–474. http://web.mit.edu/~6.863/www/spring2009/readings/gold67limit.pdf