Menzerath's Law

Menzerath's law, also known as the Menzerath–Altmann law (named after Paul Menzerath and Gabriel Altmann), is a linguistic law according to which an increase in the size of a linguistic construct results in a decrease in the size of its constituents, and vice versa. For example, the longer a sentence (measured in number of clauses), the shorter its clauses (measured in number of words); likewise, the longer a word (measured in syllables or morphs), the shorter its syllables or morphs (measured in sounds).


History

In the 19th century, Eduard Sievers observed that vowels in short words are pronounced longer than the same vowels in long words. Menzerath & de Oleza (1928) extended this observation, stating that as the number of syllables in a word increases, the syllables themselves become shorter on average. From this, the following hypothesis was developed:
"The larger the whole, the smaller its parts."
In particular, for linguistics:
"The larger a linguistic construct, the smaller its constituents."
In the early 1980s, Altmann, Heups, and Köhler demonstrated, using quantitative methods, that this postulate also applies to larger constructs of natural language: the longer the sentence, the shorter its individual clauses, and so on. A prerequisite for such relationships is that a unit (here, the sentence) is examined in relation to its direct constituents (here, the clause).


Mathematics

According to Altmann (1980), the law can be stated mathematically as:

y = a \cdot x^{-b} \cdot e^{-cx}

where:
* y is the constituent size (e.g. syllable length);
* x is the size of the linguistic construct that is being inspected (e.g. number of syllables per word);
* a, b, c are positive parameters.

The law can be explained by assuming that linguistic segments contain information about their structure in addition to the information that needs to be communicated. The assumption that the length of the structure information is independent of the length of the other content of the segment yields an alternative formula, which has also been tested successfully on empirical data.
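To make the shape of the formula concrete, the following Python sketch evaluates y = a · x^{-b} · e^{-cx} for a few construct sizes. The parameter values A, B, C are hypothetical, chosen only for illustration and not fitted to any dataset; with a, b, c > 0 the predicted constituent size falls as the construct size x grows.

```python
import math


def menzerath_altmann(x: float, a: float, b: float, c: float) -> float:
    """Predicted constituent size y = a * x**(-b) * exp(-c * x).

    x is the size of the construct (e.g. syllables per word);
    a, b, c are assumed to be positive parameters.
    """
    if x <= 0:
        raise ValueError("construct size x must be positive")
    return a * x ** (-b) * math.exp(-c * x)


# Hypothetical parameters for illustration only (not fitted to real data).
A, B, C = 3.0, 0.3, 0.05

for size in range(1, 6):  # e.g. words of 1 to 5 syllables
    print(size, round(menzerath_altmann(size, A, B, C), 3))
```

Setting c = 0 reduces the sketch to the simplest power-law form y = ax^{-b}.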


Examples


Linguistics

Gerlach (1982) checked a German dictionary with about 15,000 entries, recording for each word length x (the number of morphs per word) the number n of words in the dictionary with that length, the observed average morph length y (number of phonemes per morph), and the prediction y^* according to y = ax^{-b}, where a and b are fitted to the data. An F-test of the fit gives p < 0.001. As another example, the simplest form of Menzerath's law, y = ax^{-b}, holds for the duration of vowels in Hungarian words. More examples are on the German Wikipedia pages on phoneme duration, syllable duration, word length, clause length, and sentence length. This law also seems to hold true for at least a subclass of Japanese kanji characters.
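Gerlach's fitting procedure is not described here, so the following Python sketch swaps in one common, simple approach: because y = ax^{-b} becomes the straight line log y = log a - b log x after taking logarithms, a and b can be estimated by ordinary (optionally weighted) least squares on the log-transformed values. The helper name `fit_power_law` and the idea of using the word counts n as weights are assumptions for illustration; least squares in log space generally gives slightly different estimates than a nonlinear fit on the original scale.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_power_law(
    xs: Sequence[float],
    ys: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Estimate a, b in y = a * x**(-b) by least squares on log y vs. log x.

    Hypothetical helper: weights (e.g. word counts n per length x) are optional.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    w = list(weights) if weights is not None else [1.0] * len(xs)
    if len(w) != len(xs):
        raise ValueError("weights must match the number of data points")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    total = sum(w)
    mean_lx = sum(wi * u for wi, u in zip(w, lx)) / total
    mean_ly = sum(wi * v for wi, v in zip(w, ly)) / total
    sxx = sum(wi * (u - mean_lx) ** 2 for wi, u in zip(w, lx))
    sxy = sum(wi * (u - mean_lx) * (v - mean_ly) for wi, u, v in zip(w, lx, ly))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx                       # slope of the log-log line equals -b
    intercept = mean_ly - slope * mean_lx   # intercept equals log a
    return math.exp(intercept), -slope


# Synthetic check: points generated exactly from a = 4.0, b = 0.25 (not real data).
xs = [1, 2, 3, 4, 5]
ys = [4.0 * x ** -0.25 for x in xs]
a_hat, b_hat = fit_power_law(xs, ys)
print(round(a_hat, 6), round(b_hat, 6))
```

Because the synthetic points lie exactly on the curve, any choice of positive weights recovers the generating parameters; with real dictionary counts the weights would change the estimates.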


Non-linguistics

Beyond quantitative linguistics, Menzerath's law can be discussed in any multi-level complex system. Given three levels, let x be the number of middle-level units contained in a high-level unit and y the average number of low-level units contained in those middle-level units; Menzerath's law then predicts a negative correlation between y and x. Menzerath's law has been shown to hold both at the base-exon-gene levels in the human genome and at the base-chromosome-genome levels in genomes from a collection of species. In addition, Menzerath's law was shown to accurately predict the distribution of protein lengths, in terms of number of amino acids, in the proteomes of ten organisms. Furthermore, studies have shown that the social behavior of baboon groups also conforms to Menzerath's law: the larger the entire group, the smaller the subordinate social groups. In 2016, a research group at the University of Michigan found that the calls of geladas obey Menzerath's law, observing that calls are abbreviated when used in longer sequences.
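The three-level formulation can be sketched in Python as follows: each high-level unit is represented as a list of its middle-level units' sizes (counts of low-level units), from which x (the number of middle-level units) and y (their mean size) are computed, and the sign of the Pearson correlation between x and y is then checked. The data layout, the function names `menzerath_pairs` and `pearson`, and the toy numbers (genes as lists of exon lengths in bases) are hypothetical; the published genome and proteome studies may have used other statistics and fitting methods.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def menzerath_pairs(units: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (x, y) for each high-level unit.

    x: number of middle-level units it contains;
    y: mean number of low-level units per middle-level unit.
    Empty high-level units are skipped.
    """
    return [(len(unit), mean(unit)) for unit in units if unit]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy data (invented): each inner list holds the exon lengths, in bases, of one gene.
genes = [
    [300],
    [200, 180],
    [150, 140, 130],
    [120, 110, 100, 90],
]
pairs = menzerath_pairs(genes)
r = pearson([x for x, _ in pairs], [y for _, y in pairs])
print(pairs)
print(f"r = {r:.3f}")  # Menzerath's law predicts r < 0
```

A rank correlation could be substituted for Pearson's coefficient when the relationship is monotone but clearly nonlinear.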


See also

* Zipf's law
* Brevity law
* Heaps' law
* Bradford's law
* Benford's law
* Pareto distribution
* Principle of least effort
* Rank–size distribution


References
