Zipf–Mandelbrot law

In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf,
who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.

The probability mass function is given by
:f(k;N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}
where H_{N,q,s} is given by
:H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}
which may be thought of as a generalization of a harmonic number. In the formula, k is the rank of the data (an integer from 1 to N), and q and s are parameters of the distribution. In the limit as N approaches infinity with s>1, the normalizing sum H_{N,q,s} converges to the Hurwitz zeta function \zeta(s,q+1), using the convention \zeta(s,a)=\sum_{n=0}^\infty \frac{1}{(n+a)^s}. For finite N and q=0 the Zipf–Mandelbrot law becomes Zipf's law. For infinite N and q=0 it becomes a zeta distribution.
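The probability mass function above can be evaluated directly from its definition. The following is a minimal Python sketch, not a reference implementation; the function names `generalized_harmonic` and `zipf_mandelbrot_pmf` and the example parameter values are assumptions chosen for illustration.

```python
import math


def generalized_harmonic(N, q, s):
    """Normalizing constant H_{N,q,s} = sum_{i=1}^{N} 1/(i+q)^s."""
    return math.fsum((i + q) ** -s for i in range(1, N + 1))


def zipf_mandelbrot_pmf(k, N, q, s):
    """f(k; N, q, s) for an integer rank k; zero outside 1..N."""
    if not 1 <= k <= N:
        return 0.0
    return (k + q) ** -s / generalized_harmonic(N, q, s)


# Illustrative parameters (assumed, not taken from any dataset).
N, q, s = 10, 2.7, 1.2
probs = [zipf_mandelbrot_pmf(k, N, q, s) for k in range(1, N + 1)]
print(probs[:3], sum(probs))
```

Setting q=0 in this sketch gives Zipf's law for finite N, for example `zipf_mandelbrot_pmf(1, 3, 0, 1)` equals 1/(1 + 1/2 + 1/3). The normalizer is recomputed on every call for clarity; for repeated evaluation it would make sense to compute it once and reuse it.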


Applications

The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law. If one plots the frequency rank of words in a moderately sized corpus of text against the number of occurrences (the actual frequencies), one obtains a power-law distribution with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size, but the harmonic series with ''s''=1 does not converge, while the Zipf–Mandelbrot generalization with ''s''>1 does. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register.

In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.

Within music, many metrics of measuring "pleasing" music conform to Zipf–Mandelbrot distributions.
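The rank-frequency relationship described for text corpora can be checked roughly by regressing log frequency on log rank. The sketch below is one simple way to do this, assuming a pre-tokenized word list. The helper names `rank_frequency` and `loglog_slope` are hypothetical, and an ordinary least-squares fit on log-log axes is only a quick diagnostic, not a rigorous estimator of the exponent.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Word counts sorted in decreasing order; index 0 holds rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Idealized frequencies proportional to 1/rank (synthetic, not corpus data).
ideal = [1000 / k for k in range(1, 51)]
print(loglog_slope(ideal))
```

For the idealized input the fitted slope is close to -1, which corresponds to an exponent of one. On real text one would pass `rank_frequency(tokens)` to `loglog_slope`; a nonzero q bends the curve at low ranks, so a straight-line fit then describes only the tail.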


References

* Van Droogenbroeck F.J., 'An essential rephrasing of the Zipf–Mandelbrot law to solve authorship attribution applications by Gaussian statistics' (2019)


External links


* Z. K. Silagadze: Citations and the Zipf–Mandelbrot's law
* https://github.com/gkohri/discreteRNG C++ library for generating random Zipf–Mandelbrot deviates.