A stochastic grammar (statistical grammar) is a grammar framework with a probabilistic notion of grammaticality:
*Stochastic context-free grammar
*Statistical parsing
*Data-oriented parsing
*Hidden Markov model (or stochastic regular grammar)
*Estimation theory
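A hidden Markov model gives a word sequence a graded probability instead of a yes/no grammaticality judgement. The minimal Python sketch below computes that probability with the forward algorithm, which sums over all hidden state sequences. The tag set, vocabulary and all probabilities (`START`, `TRANS`, `EMIT`) are hypothetical toy values and were not estimated from any corpus.

```python
# Minimal sketch of an HMM (stochastic regular grammar) scoring a word sequence.
# All states, words and probabilities are hypothetical toy values.
START = {"Det": 0.6, "Noun": 0.4}            # P(first state)
TRANS = {                                    # P(next state | current state)
    "Det": {"Det": 0.1, "Noun": 0.9},
    "Noun": {"Det": 0.3, "Noun": 0.7},
}
EMIT = {                                     # P(word | state)
    "Det": {"the": 0.7, "a": 0.3},
    "Noun": {"dog": 0.5, "cat": 0.5},
}


def forward_probability(words):
    """Return P(words), summed over every hidden state sequence (forward algorithm)."""
    if not words:
        raise ValueError("expected at least one word")
    # alpha[s] = P(words seen so far, current hidden state = s)
    alpha = {s: START[s] * EMIT[s].get(words[0], 0.0) for s in START}
    for w in words[1:]:
        alpha = {
            s: sum(alpha[r] * TRANS[r][s] for r in alpha) * EMIT[s].get(w, 0.0)
            for s in START
        }
    return sum(alpha.values())
```

With these values, `forward_probability(["the", "dog"])` is about 0.189 and `forward_probability(["dog", "the"])` about 0.042, so the model prefers the first word order without ruling out the second.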
The grammar is realized as a language model. Allowed sentences are stored in a database together with a frequency indicating how common each sentence is.
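A sentence-frequency database of this kind can be read as a language model by dividing each stored count by the total count. The sketch below assumes a hypothetical table `SENTENCE_COUNTS` with invented counts, and uses plain relative-frequency estimation.

```python
from collections import Counter

# Hypothetical toy "database" of allowed sentences and how often each was observed.
SENTENCE_COUNTS = Counter({
    "the dog barks": 6,
    "the cat sleeps": 3,
    "a dog sleeps": 1,
})


def sentence_probability(sentence, counts=SENTENCE_COUNTS):
    """Relative-frequency estimate; a sentence not in the table gets probability 0."""
    total = sum(counts.values())
    # Counter returns 0 for missing keys, so unseen sentences score 0.
    return counts[sentence] / total if total else 0.0
```

Here "the dog barks" gets 0.6 and "a dog sleeps" 0.1, while any sentence absent from the table gets 0. That zero for unseen sentences is one reason practical models assign probabilities to smaller units, such as word sequences or grammar rules, instead of whole sentences.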
Statistical natural language processing uses stochastic, probabilistic and statistical methods, especially to resolve difficulties that arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. "A probabilistic model consists of a non-probabilistic model plus some numerical quantities; it is not true that probabilistic models are inherently simpler or less structural than non-probabilistic models."
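As an example of probabilistic disambiguation, a stochastic context-free grammar can choose among competing analyses by keeping only the most probable parse. The minimal Python sketch below does this with Viterbi-style CKY parsing. It assumes a grammar in Chomsky normal form with only binary and lexical rules. The rules and probabilities in `BINARY` and `LEXICAL` are hypothetical toy values.

```python
# Hypothetical toy PCFG in Chomsky normal form; probabilities of rules that
# share a left-hand side sum to 1.
BINARY = {  # (parent, left child, right child): probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("NP", "Det", "N"): 0.6,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {  # (parent, word): probability
    ("NP", "I"): 0.2,
    ("V", "saw"): 1.0,
    ("Det", "the"): 1.0,
    ("N", "man"): 0.5,
    ("N", "telescope"): 0.5,
    ("P", "with"): 1.0,
}


def viterbi_cky(words):
    """Return (probability, tree) of the most probable S parse, or None if none exists."""
    n = len(words)
    # best[i][j][A] = (probability, tree) of the best A spanning words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (lhs, word), p in LEXICAL.items():
            if word == w:
                best[i][i + 1][lhs] = (p, (lhs, w))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = best[i][j]
            for k in range(i + 1, j):  # split point between the two children
                left, right = best[i][k], best[k][j]
                for (lhs, b, c), p in BINARY.items():
                    if b in left and c in right:
                        prob = p * left[b][0] * right[c][0]
                        # Keep only the highest-probability analysis per label.
                        if prob > cell.get(lhs, (0.0, None))[0]:
                            cell[lhs] = (prob, (lhs, left[b][1], right[c][1]))
    return best[0][n].get("S")
```

For "I saw the man with the telescope", the toy grammar licenses two analyses: "with the telescope" can attach to the verb phrase or to the noun phrase "the man". With the probabilities above, the verb-phrase attachment scores about 0.00378 and the noun-phrase attachment about 0.00252, so `viterbi_cky` returns the verb-phrase attachment.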
Examples
Hirjee & Brown implemented a probabilistic method for rhyme detection in a 2013 study to find internal and imperfect rhyme pairs in rap lyrics. The concept is adapted from a sequence alignment technique that uses BLOSUM (BLOcks SUbstitution Matrix). With it, they were able to detect rhymes that non-probabilistic models could not.
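To show the general idea of scoring rhymes by alignment, the sketch below applies Smith-Waterman local alignment to phoneme sequences. It is not Hirjee & Brown's method and does not use their scores. The substitution scores in `substitution_score`, the `SIMILAR` pairs and the gap penalty `GAP` are hypothetical toy values. A BLOSUM-style matrix would instead hold log-odds scores estimated from data. Phonemes are written in ARPAbet with stress marks omitted.

```python
# Hypothetical toy scoring scheme (not Hirjee & Brown's learned scores).
SIMILAR = {frozenset(("IH", "EH")), frozenset(("AE", "EH")), frozenset(("M", "N"))}
GAP = -2  # penalty for skipping a phoneme in either sequence


def substitution_score(a, b):
    """Score aligning phoneme a with phoneme b."""
    if a == b:
        return 3
    if frozenset((a, b)) in SIMILAR:
        return 1  # near match, e.g. an imperfect rhyme
    return -1


def local_alignment_score(x, y):
    """Smith-Waterman score of the best local alignment of phoneme lists x and y."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = max(
                0,  # start a fresh local alignment here
                prev[j - 1] + substitution_score(x[i - 1], y[j - 1]),
                prev[j] + GAP,     # x[i-1] aligned to a gap
                cur[j - 1] + GAP,  # y[j-1] aligned to a gap
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

With these toy scores, "time" (T AY M) scores 6 against the perfect rhyme "rhyme" (R AY M), 4 against the imperfect rhyme "line" (L AY N), and 0 against "dog" (D AO G).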
See also
*Colorless green ideas sleep furiously
*Computational linguistics
*L-system § Stochastic grammars
*Stochastic context-free grammar
*Statistical language acquisition
References
Further reading
*Christopher D. Manning, Hinrich Schütze: ''Foundations of Statistical Natural Language Processing'', MIT Press (1999).
*Stefan Wermter, Ellen Riloff, Gabriele Scheler (eds.): ''Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing'', Springer (1996).
*Giancarlo Pirani (ed.): ''Advanced Algorithms and Architectures for Speech Understanding'', Vol. 1, Springer Science & Business Media (2013).