Quantitative linguistics (QL) is a sub-discipline of general linguistics and, more specifically, of mathematical linguistics. It deals with language learning, language change, and the application as well as the structure of natural languages.
QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws.
Synergetic linguistics was from its very beginning specifically designed for this purpose.
QL is empirically based on the results of language statistics, a field which can be interpreted as statistics of languages or as statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions.
Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.
History
The earliest QL approaches date back to the ancient Indian world. One of the historical sources consists of applications of combinatorics to linguistic matters; another is based on elementary statistical studies, which can be found under the headings colometry and stichometry.
Quantitative laws

In QL, the concept of law is understood as the class of law hypotheses which have been deduced from theoretical assumptions, are mathematically formulated, are interrelated with other laws in the field, and have been sufficiently and successfully tested on empirical data, i.e. which could not be refuted despite considerable effort to do so. Reinhard Köhler writes about QL laws:
Linguistic laws
In quantitative linguistics, linguistic laws are statistical regularities emerging across different linguistic scales (i.e. phonemes, syllables, words or sentences) that can be formulated mathematically and that have been deduced from certain theoretical assumptions. They are also required to have been successfully tested through the use of data, that is, not to have been refuted by empirical evidence. Among the main linguistic laws proposed by various authors, the following can be highlighted:
* Zipf's law: The frequency of words is inversely proportional to their rank in frequency lists. A similar relation between rank and frequency can be observed for sounds, phonemes, and letters.
* Heaps' law: It describes the number of distinct words in a document (or set of documents) as a function of the document length.
* Brevity law or Zipf's law of abbreviation: It qualitatively states that the more frequently a word is used, the shorter that word tends to be.
* Menzerath's law (also Menzerath-Altmann law): This law states that the sizes of the constituents of a construction decrease with increasing size of the construction under study. For example, the longer a sentence (measured in number of clauses), the shorter its clauses (measured in number of words); or the longer a word (measured in syllables or morphs), the shorter its syllables or morphs (measured in sounds).
* Law of diversification: If linguistic categories such as parts of speech or inflectional endings appear in various forms, it can be shown that the frequencies of their occurrences in texts are controlled by laws.
* Martin's law: This law concerns lexical chains which are obtained by looking up the definition of a word in a dictionary, then looking up the definition of the definition just obtained, and so on. Together, these definitions form a hierarchy of more and more general meanings, in which the number of definitions decreases with increasing generality. A number of lawful relations hold among the levels of this kind of hierarchy.
* Piotrowski's law of language change: Growth processes in language such as vocabulary growth, the dispersion of foreign or loan words, changes in the inflectional system etc. correspond to growth models in other scientific disciplines. Piotrowski's law is an application of the logistic function. It was shown that it also covers language acquisition processes (cf. language acquisition law).
* Text block law: Linguistic units (e.g. words, letters, syntactic functions and constructions) show a specific frequency distribution in equally large text blocks.
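Several of the laws listed above can be illustrated on a small scale. The following Python sketch is only an illustration under stated assumptions, not a method prescribed by the QL literature: it builds the rank-frequency list that Zipf's law describes, the type-token growth curve that Heaps' law describes, and a logistic curve of the kind Piotrowski's law applies. The function names, the crude tokenizer, the sample sentence and the parameterization `limit / (1 + a * exp(-b * t))` are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


def type_token_curve(tokens):
    """Return the number of distinct words seen after each successive token."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


def logistic(t, limit, a, b):
    """Logistic growth curve; this parameterization is an assumption of the sketch."""
    return limit / (1.0 + a * math.exp(-b * t))


# Invented sample text, far too short for any real test of a law.
sample = "the cat sat on the mat and the dog sat on the log"
tokens = tokenize(sample)
table = rank_frequency(tokens)
curve = type_token_curve(tokens)

for rank, word, freq in table[:3]:
    print(rank, word, freq)
print("distinct words after each token:", curve)
print("logistic value at t=0:", logistic(0, limit=1.0, a=1.0, b=1.0))
```

On real data, the rank-frequency list and the growth curve would be computed from a large corpus and then compared with a fitted model; a toy sentence like the one above only shows how the quantities are defined.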
Stylistics
The study of poetic and non-poetic styles can be based on statistical methods. Moreover, it is possible to conduct corresponding investigations on the basis of the specific forms (parameters) that language laws take in texts of different styles. In such cases, QL supports research into stylistics: one of the overall aims is to make evidence for stylistic phenomena as objective as possible by referring to language laws. One of the central assumptions of QL is that some laws (e.g. the distribution of word lengths) require different models, or at least different parameter values of the laws (distributions or functions), depending on the corpus that a text belongs to. If poetic texts are under study, QL methods form a sub-discipline of the Quantitative Study of Literature (stylometrics).
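The idea that word-length distributions differ between kinds of text can be sketched in a few lines of Python. This is a minimal illustration only: the two snippets are invented, the function names are hypothetical, and word length is counted in letters as a simple proxy, whereas QL studies often measure it in syllables or morphs.

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Return {length_in_letters: relative_frequency} for the words of a text."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(len(word) for word in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def mean_word_length(distribution):
    """Mean word length implied by a relative-frequency distribution."""
    return sum(length * share for length, share in distribution.items())


# Two invented snippets standing in for texts of different styles.
plain = "the cat sat on the mat"
ornate = "luminous twilight descended upon slumbering meadows"

plain_dist = word_length_distribution(plain)
ornate_dist = word_length_distribution(ornate)

print("plain: ", plain_dist, mean_word_length(plain_dist))
print("ornate:", ornate_dist, mean_word_length(ornate_dist))
```

In an actual study, a theoretical distribution would be fitted to each empirical distribution, and the fitted parameter values would then be compared across styles.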
Important authors
* Gabriel Altmann (1931–2020)
* Otto Behaghel (1854–1936); cf. Behaghel's laws
* (1943)
* (1897–1966)
* William Palin Elderton (1877–1962)
* Ernst Wilhelm Förstemann (1822–1906)
* (1902–1990)
* (1957–2019)
* (1897–1968)
* (1934–2015)
* (1843–1928)
* (1951)
* Snježana Kordić (1964)
* Werner Lehfeldt (1943)
* (1938–2012)
* Haitao Liu
* (1897–1973)
* Paul Menzerath (1883–1954); cf. Menzerath's law
* (1926–2014)
* Augustus De Morgan (1806–1871)
* (1909–2015)
* L.A. Sherman
* (1922–2003)
* Andrew Wilson, Lancaster
* (1865–1915)
* George Kingsley Zipf (1902–1950); cf. Zipf's law
* Eberhard Zwirner (1899–1984); phonometry
See also
* Quantitative comparative linguistics
References
* Karl-Heinz Best: ''Quantitative Linguistik. Eine Annäherung''. 3rd, substantially revised and expanded edition. Peust & Gutschmidt, Göttingen 2006.
* Karl-Heinz Best, Otto Rottmann: ''Quantitative Linguistics, an Invitation''. RAM-Verlag, Lüdenscheid 2017.
* Reinhard Köhler with the assistance of Christiane Hoffmann: ''Bibliography of Quantitative Linguistics''. Benjamins, Amsterdam/Philadelphia 1995.
* Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): ''Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch – An International Handbook''. de Gruyter, Berlin/New York 2005.
* Haitao Liu, Wei Huang: Quantitative Linguistics: State of the Art, Theories and Methods. ''Journal of Zhejiang University (Humanities and Social Science)''. 2012, 43(2): 178–192 (in Chinese).
External links
* IQLA - International Quantitative Linguistics Association