In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it. The probability mass function is given by:
:f(k;N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}
where H_{N,q,s} is given by:
:H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}
which may be thought of as a generalization of a harmonic number. In the formula, k is the rank of the data, and q and s are parameters of the distribution. In the limit as N approaches infinity (with s>1, so that the sum converges), H_{N,q,s} becomes the Hurwitz zeta function \zeta(s,q+1). For finite N and q=0 the Zipf–Mandelbrot law becomes Zipf's law. For infinite N and q=0 it becomes a Zeta distribution.
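The formulas above can be sketched in Python as a quick numerical check. This is a minimal illustration, assuming N ≥ 1, q ≥ 0 and s > 0 (parameter ranges not stated in this article); the function names normalizer, zipf_mandelbrot_pmf and make_sampler are hypothetical. The sampler uses plain inverse-transform sampling over the precomputed cumulative distribution, one simple technique among several; it is not presented as the method used by the C++ library listed under External links.

```python
import bisect
import random


def normalizer(N, q, s):
    """H_{N,q,s}: the sum over i = 1..N of 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, N + 1))


def zipf_mandelbrot_pmf(k, N, q, s):
    """f(k; N, q, s) = (1 / (k + q)^s) / H_{N,q,s}; zero outside ranks 1..N."""
    if not 1 <= k <= N:
        return 0.0
    # Recomputes H_{N,q,s} on every call (O(N)); cache it if calling in a loop.
    return (1.0 / (k + q) ** s) / normalizer(N, q, s)


def make_sampler(N, q, s, seed=None):
    """Return a function that draws ranks 1..N by inverse-transform sampling."""
    rng = random.Random(seed)
    H = normalizer(N, q, s)  # computed once, not per rank
    cdf = []
    running = 0.0
    for k in range(1, N + 1):
        running += (1.0 / (k + q) ** s) / H
        cdf.append(running)

    def sample():
        u = rng.random()  # uniform on [0, 1)
        # 0-based index of the first CDF entry >= u, i.e. rank - 1.
        idx = bisect.bisect_left(cdf, u)
        # cdf[-1] may round to slightly below 1.0; clamp to the last rank.
        return min(idx, N - 1) + 1

    return sample


if __name__ == "__main__":
    N, q, s = 10, 2.0, 1.1
    probs = [zipf_mandelbrot_pmf(k, N, q, s) for k in range(1, N + 1)]
    print("P(k = 1..3):", [round(p, 4) for p in probs[:3]])
    print("sum of pmf:", sum(probs))
    draw = make_sampler(N, q, s, seed=42)
    print("ten draws:", [draw() for _ in range(10)])
```

Setting q = 0 makes zipf_mandelbrot_pmf coincide with Zipf's law, (1/k^s) divided by the sum of 1/i^s for i = 1..N, and with q = 0 and s = 1 the normalizer is the ordinary harmonic number H_N.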


Applications

The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law. If one plots the frequency rank of words contained in a moderately sized corpus of text data against the number of occurrences or actual frequencies, one obtains a power-law distribution with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size, because the harmonic series with s=1 does not converge, while the Zipf–Mandelbrot generalization with s>1 does. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register.

In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law. Within music, many metrics for measuring "pleasing" music conform to Zipf–Mandelbrot distributions.
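The rank-frequency plot described for word counts can be sketched in Python. This is an illustration under stated assumptions: the text is already split into word tokens, words with equal counts are ranked in arbitrary order, and the exponent is estimated with an ordinary least-squares line through the log-log points. That fitting method is a crude heuristic chosen here for brevity; least-squares fits on log-log plots are known to be biased, and maximum-likelihood estimation is generally preferred. The helper names rank_frequency and loglog_slope are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Counts of each distinct token, sorted in decreasing order.

    Position r - 1 in the returned list holds the count of the rank-r word;
    words with equal counts are ranked in arbitrary order.
    """
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Ordinary least-squares slope of log(frequency) against log(rank).

    For data following f(k) proportional to 1 / k^s, the slope is close to -s.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw a cat near the door"
    freqs = rank_frequency(text.split())
    print("rank-frequency:", freqs)
    print("estimated slope:", loglog_slope(freqs))
```

For the Zipf–Mandelbrot form, log f(k) is linear in log(k + q), so the same regression against log(k + q) for a chosen q gives a slope near -s; choosing q itself needs a separate search or a likelihood-based fit.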




References

* Van Droogenbroeck F.J., 'An essential rephrasing of the Zipf–Mandelbrot law to solve authorship attribution applications by Gaussian statistics' (2019)


External links


* Z. K. Silagadze: Citations and the Zipf–Mandelbrot's law






* https://github.com/gkohri/discreteRNG C++ Library for generating random Zipf–Mandelbrot deviates.