In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous.
Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.
The Tweedie distributions were named by Bent Jørgensen after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984.
Definitions
The (reproductive) Tweedie distributions are defined as a subfamily of (reproductive) exponential dispersion models (ED), with a special mean-variance relationship.
A random variable ''Y'' is Tweedie distributed ''Tw<sub>p</sub>''(''μ'', ''σ''<sup>2</sup>) if ''Y'' ~ ED(''μ'', ''σ''<sup>2</sup>) with mean ''μ'' = E(''Y''), positive dispersion parameter ''σ''<sup>2</sup> and
:<math>\operatorname{Var}(Y) = \sigma^2\,\mu^p,</math>
where the real number ''p'' is called the Tweedie power parameter.
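The probability distribution ''P''<sub>''θ'',''σ''<sup>2</sup></sub> on the measurable sets ''A'' is given by
:<math>P_{\theta,\sigma^2}(Y\in A) = \int_A \exp\left(\frac{\theta z - \kappa_p(\theta)}{\sigma^2}\right)\nu_\lambda(dz),</math>
for some σ-finite measure ''ν<sub>λ</sub>''.
This representation uses the canonical parameter ''θ'' of an exponential dispersion model and the cumulant function
:<math>\kappa_p(\theta) = \begin{cases} \dfrac{\alpha-1}{\alpha}\left(\dfrac{\theta}{\alpha-1}\right)^\alpha, & p \neq 1,2 \\ -\log(-\theta), & p = 2 \\ e^\theta, & p = 1 \end{cases}</math>
where we used <math>\alpha = \frac{p-2}{p-1}</math>, or equivalently <math>p = \frac{\alpha-2}{\alpha-1}</math>.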
Properties
Additive exponential dispersion models
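The models just described are in the reproductive form. An exponential dispersion model always has a dual: the additive form. If ''Y'' is reproductive, then <math>Z = \lambda Y</math> with <math>\lambda = 1/\sigma^2</math> is in the additive form ED<sup>*</sup>(''θ'',''λ''), for Tweedie ''Tw''<sup>*</sup><sub>''p''</sub>(''μ'', ''λ''). Additive models have the property that the distribution of the sum of independent random variables,
:<math>Z_+ = Z_1 + \cdots + Z_n,</math>
for which ''Z''<sub>''i''</sub> ~ ED<sup>*</sup>(''θ'',''λ''<sub>''i''</sub>) with fixed ''θ'' and various ''λ'', is a member of the family of distributions with the same ''θ'',
:<math>Z_+ \sim \operatorname{ED}^*(\theta, \lambda_1 + \cdots + \lambda_n).</math>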
Reproductive exponential dispersion models
A second class of exponential dispersion models exists, designated by the random variable
:<math>Y = Z/\lambda \sim \operatorname{ED}(\mu, \sigma^2),</math>
where ''σ''<sup>2</sup> = 1/''λ'', known as reproductive exponential dispersion models. They have the property that for ''n'' independent random variables ''Y''<sub>''i''</sub> ~ ED(''μ'',''σ''<sup>2</sup>/''w''<sub>''i''</sub>), with weighting factors ''w''<sub>''i''</sub> and
:<math>w = \sum_{i=1}^n w_i,</math>
a weighted average of the variables gives,
:<math>\frac{1}{w}\sum_{i=1}^n w_i Y_i \sim \operatorname{ED}(\mu, \sigma^2/w).</math>
For reproductive models the weighted average of independent random variables with fixed ''μ'' and ''σ''<sup>2</sup> and various values for ''w''<sub>''i''</sub> is a member of the family of distributions with the same ''μ'' and ''σ''<sup>2</sup>.
The Tweedie exponential dispersion models are both additive and reproductive; we thus have the ''duality transformation''
:<math>Y \mapsto Z = Y/\sigma^2.</math>
Scale invariance
A third property of the Tweedie models is that they are scale invariant: For a reproductive exponential dispersion model ''Tw<sub>p</sub>''(''μ'', ''σ''<sup>2</sup>) and any positive constant ''c'' we have the property of closure under scale transformation,
:<math>c\,\operatorname{Tw}_p(\mu, \sigma^2) = \operatorname{Tw}_p(c\mu, c^{2-p}\sigma^2).</math>
The Tweedie power variance function
To define the variance function for exponential dispersion models we make use of the mean value mapping, the relationship between the canonical parameter ''θ'' and the mean ''μ''. It is defined by the function
:<math>\tau(\theta) = \kappa'(\theta) = \mu,</math>
with cumulant function <math>\kappa(\theta)</math>.
The variance function ''V''(''μ'') is constructed from the mean value mapping,
:<math>V(\mu) = \tau'[\tau^{-1}(\mu)].</math>
Here the minus exponent in ''τ''<sup>−1</sup>(''μ'') denotes an inverse function rather than a reciprocal. The mean and variance of an additive random variable are then E(''Z'') = ''λμ'' and var(''Z'') = ''λV''(''μ'').
Scale invariance implies that the variance function obeys the relationship ''V''(''μ'') = ''μ''<sup>''p''</sup>.
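The power variance function can be checked numerically. The following is a minimal sketch, assuming the standard Tweedie cumulant function ''κ<sub>p</sub>''(''θ'') with ''α'' = (''p'' − 2)/(''p'' − 1); the function names and the finite-difference step are illustrative choices, not part of any library. It differentiates the cumulant function twice and compares the result with ''μ''<sup>''p''</sup>.

```python
import math


def cumulant(theta, p):
    """Tweedie cumulant function kappa_p(theta).

    Assumes theta lies in the valid canonical domain for the chosen p
    (for example theta < 0 when p > 1); no domain checking is done.
    """
    if p == 1:
        return math.exp(theta)
    if p == 2:
        return -math.log(-theta)
    alpha = (p - 2.0) / (p - 1.0)
    return (alpha - 1.0) / alpha * (theta / (alpha - 1.0)) ** alpha


def mean_and_unit_variance(theta, p, h=1e-3):
    """Approximate mu = kappa'(theta) and V = kappa''(theta) by central differences."""
    k_minus = cumulant(theta - h, p)
    k_zero = cumulant(theta, p)
    k_plus = cumulant(theta + h, p)
    mu = (k_plus - k_minus) / (2.0 * h)
    v = (k_plus - 2.0 * k_zero + k_minus) / h ** 2
    return mu, v


if __name__ == "__main__":
    for theta, p in [(-1.0, 1.5), (-0.5, 2), (0.3, 1), (-0.125, 3)]:
        mu, v = mean_and_unit_variance(theta, p)
        print(f"p={p}: mu={mu:.5f}, V={v:.5f}, mu**p={mu ** p:.5f}")
```

The two printed columns ''V'' and ''μ''<sup>''p''</sup> should agree up to the finite-difference error.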
The Tweedie deviance
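The unit deviance of a reproductive Tweedie distribution is given by
:<math>d(y,\mu) = \begin{cases} (y-\mu)^2, & p = 0 \\ 2\left(y\log(y/\mu) + \mu - y\right), & p = 1 \\ 2\left(\log(\mu/y) + y/\mu - 1\right), & p = 2 \\ 2\left(\dfrac{\max(y,0)^{2-p}}{(1-p)(2-p)} - \dfrac{y\mu^{1-p}}{1-p} + \dfrac{\mu^{2-p}}{2-p}\right), & \text{otherwise} \end{cases}</math>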
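As an illustration, the unit deviance can be evaluated case by case. This is a sketch under the assumption that the deviance takes the standard four-case Tweedie form (''p'' = 0, ''p'' = 1, ''p'' = 2, and all other admissible ''p''); the function name is hypothetical.

```python
import math


def tweedie_unit_deviance(y, mu, p):
    """Unit deviance d(y, mu) of a reproductive Tweedie model with power p.

    Assumes mu > 0 (any real mu for p = 0) and y in the support of the
    model: y >= 0 for 1 <= p < 2, y > 0 for p >= 2.
    """
    if p == 0:
        return (y - mu) ** 2
    if p == 1:
        # y * log(y / mu) is taken as 0 at y = 0 (its limiting value).
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (log_term + mu - y)
    if p == 2:
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return 2.0 * (
        max(y, 0.0) ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )


if __name__ == "__main__":
    print(tweedie_unit_deviance(0.0, 4.0, 1.5))  # zero observation, compound Poisson-gamma
    print(tweedie_unit_deviance(1.0, 2.0, 3))    # inverse Gaussian case
```

The deviance vanishes when ''y'' = ''μ'', and for ''p'' = 3 it reduces to (''y'' − ''μ'')<sup>2</sup>/(''yμ''<sup>2</sup>).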
The Tweedie cumulant generating functions
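The properties of exponential dispersion models give us two differential equations.
The first relates the mean value mapping and the variance function to each other,
:<math>\frac{\partial \tau^{-1}(\mu)}{\partial \mu} = \frac{1}{V(\mu)}.</math>
The second shows how the mean value mapping is related to the cumulant function,
:<math>\frac{\partial \kappa(\theta)}{\partial \theta} = \tau(\theta).</math>
These equations can be solved to obtain the cumulant function for different cases of the Tweedie models. A cumulant generating function (CGF) may then be obtained from the cumulant function. The additive CGF is generally specified by the equation
:<math>K^*(s) = \log[\operatorname{E}(e^{sZ})] = \lambda[\kappa(\theta + s) - \kappa(\theta)],</math>
and the reproductive CGF by
:<math>K(s) = \log[\operatorname{E}(e^{sY})] = \lambda[\kappa(\theta + s/\lambda) - \kappa(\theta)],</math>
where ''s'' is the generating function variable.
For the additive Tweedie models the CGFs take the form,
:<math>K^*_p(s;\theta,\lambda) = \begin{cases} \lambda\kappa_p(\theta)\left[(1 + s/\theta)^\alpha - 1\right], & p \neq 1,2 \\ -\lambda\log(1 + s/\theta), & p = 2 \\ \lambda e^\theta(e^s - 1), & p = 1 \end{cases}</math>
and for the reproductive models,
:<math>K_p(s;\theta,\lambda) = \begin{cases} \lambda\kappa_p(\theta)\left\{\left[1 + s/(\theta\lambda)\right]^\alpha - 1\right\}, & p \neq 1,2 \\ -\lambda\log\left[1 + s/(\theta\lambda)\right], & p = 2 \\ \lambda e^\theta\left(e^{s/\lambda} - 1\right), & p = 1 \end{cases}</math>
The additive and reproductive Tweedie models are conventionally denoted by the symbols ''Tw''<sup>*</sup><sub>''p''</sub>(''θ'',''λ'') and ''Tw''<sub>''p''</sub>(''θ'',''σ''<sup>2</sup>), respectively.
The first and second derivatives of the CGFs, with ''s'' = 0, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law,
:<math>\operatorname{var}(Z) \propto \operatorname{E}(Z)^p.</math>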
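The derivative argument can be illustrated numerically. The sketch below assumes the closed form of the additive Tweedie CGF for ''p'' ≠ 1, 2, namely ''λκ<sub>p</sub>''(''θ'')[(1 + ''s''/''θ'')<sup>''α''</sup> − 1]; the function names and step size are illustrative. It differentiates the CGF at ''s'' = 0 and checks that var(''Z'') = ''λ''<sup>1−''p''</sup> E(''Z'')<sup>''p''</sup>, which is the power law with its constant of proportionality made explicit.

```python
def additive_cgf(s, theta, lam, p):
    """Additive Tweedie CGF K*(s) for p not in {1, 2} (assumed closed form)."""
    alpha = (p - 2.0) / (p - 1.0)
    kappa = (alpha - 1.0) / alpha * (theta / (alpha - 1.0)) ** alpha
    return lam * kappa * ((1.0 + s / theta) ** alpha - 1.0)


def cgf_mean_variance(theta, lam, p, h=1e-3):
    """First and second derivative of K*(s) at s = 0 by central differences."""
    k_minus = additive_cgf(-h, theta, lam, p)
    k_plus = additive_cgf(h, theta, lam, p)
    mean = (k_plus - k_minus) / (2.0 * h)
    # K*(0) = 0, so the second difference needs only the two outer values.
    variance = (k_plus + k_minus) / h ** 2
    return mean, variance


if __name__ == "__main__":
    theta, lam, p = -1.0, 3.0, 1.5
    mean, variance = cgf_mean_variance(theta, lam, p)
    print(mean, variance, lam ** (1.0 - p) * mean ** p)
```

With ''θ'' = −1, ''λ'' = 3 and ''p'' = 1.5 the unit mean is ''μ'' = 4, so the mean and variance should be close to ''λμ'' = 12 and ''λμ''<sup>''p''</sup> = 24.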
The Tweedie convergence theorem
The Tweedie exponential dispersion models are fundamental in statistical theory consequent to their roles as foci of convergence for a wide range of statistical processes. Jørgensen ''et al.'' proved a theorem that specifies the asymptotic behaviour of variance functions, known as the Tweedie convergence theorem. This theorem, in technical terms, is stated thus:
The unit variance function is regular of order ''p'' at zero (or infinity) provided that ''V''(''μ'') ~ ''c''<sub>0</sub>''μ''<sup>''p''</sup> for ''μ'' as it approaches zero (or infinity) for all real values of ''p'' and ''c''<sub>0</sub> > 0. Then for a unit variance function regular of order ''p'' at either zero or infinity and for
:<math>p \notin (0,1),</math>
for any <math>\mu > 0</math>, and <math>\sigma^2 > 0</math> we have
:<math>c^{-1}\operatorname{ED}(c\mu, \sigma^2 c^{2-p}) \rightarrow \operatorname{Tw}_p(\mu, c_0\sigma^2)</math>
as <math>c \downarrow 0</math> or <math>c \rightarrow \infty</math>, respectively, where the convergence is through values of ''c'' such that ''cμ'' is in the domain of ''θ'' and ''c''<sup>''p''−2</sup>/''σ''<sup>2</sup> is in the domain of ''λ''. The model must be infinitely divisible as ''c''<sup>2−''p''</sup> approaches infinity.
In nontechnical terms this theorem implies that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model. Almost all distribution functions with finite cumulant generating functions qualify as exponential dispersion models and most exponential dispersion models manifest variance functions of this form. Hence many probability distributions have variance functions that express this asymptotic behaviour, and the Tweedie distributions become foci of convergence for a wide range of data types.
Related distributions
The Tweedie distributions include a number of familiar distributions as well as some unusual ones, each being specified by the domain of the index parameter. We have the
*extreme stable distribution, ''p'' < 0,
*normal distribution, ''p'' = 0,
*Poisson distribution, ''p'' = 1,
*compound Poisson–gamma distribution, 1 < ''p'' < 2,
*gamma distribution, ''p'' = 2,
*positive stable distributions, 2 < ''p'' < 3,
*inverse Gaussian distribution, ''p'' = 3,
*positive stable distributions, ''p'' > 3, and
*extreme stable distributions, ''p'' = ∞.
For 0 < ''p'' < 1 no Tweedie model exists. Note that every mention of ''stable'' distributions here actually means distributions ''generated by stable distributions''.
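The classification by index parameter can be written as a small lookup. This is only an illustrative sketch; the function name and the returned labels are hypothetical, and `None` is used to signal the interval 0 < ''p'' < 1, where no Tweedie model exists.

```python
import math


def tweedie_member(p):
    """Return the name of the Tweedie family member for power parameter p.

    Returns None for 0 < p < 1, where no Tweedie model exists.
    """
    if p < 0 or p == math.inf:
        return "extreme stable"
    if p == 0:
        return "normal"
    if p < 1:
        return None
    if p == 1:
        return "Poisson"
    if p < 2:
        return "compound Poisson-gamma"
    if p == 2:
        return "gamma"
    if p < 3:
        return "positive stable"
    if p == 3:
        return "inverse Gaussian"
    return "positive stable"


if __name__ == "__main__":
    for p in (-1, 0, 0.5, 1, 1.5, 2, 2.5, 3, 4, math.inf):
        print(p, tweedie_member(p))
```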
Occurrence and applications
The Tweedie models and Taylor’s power law
Taylor's law is an empirical law in ecology that relates the variance of the number of individuals of a species per unit area of habitat to the corresponding mean by a power-law relationship.
For the population count ''Y'' with mean ''µ'' and variance var(''Y''), Taylor's law is written,
:<math>\operatorname{var}(Y) = a\mu^p,</math>
where ''a'' and ''p'' are both positive constants. Since L. R. Taylor described this law in 1961, many different explanations have been offered for it, ranging from animal behavior, a random walk model, and a stochastic birth, death, immigration and emigration model, to a consequence of equilibrium and non-equilibrium statistical mechanics.
No consensus exists as to an explanation for this model.
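In practice the constants ''a'' and ''p'' of Taylor's law are often estimated from paired sample means and variances. The sketch below uses ordinary least squares on the log-log scale, which is one common approach and is an assumption here rather than a method prescribed by the text; the function name and the synthetic data are hypothetical.

```python
import math


def fit_taylor_law(means, variances):
    """Estimate (a, p) in var = a * mean**p by least squares on log-log data.

    All means and variances must be strictly positive.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope = power-law exponent
    a = math.exp(y_bar - p * x_bar)    # intercept = log(a)
    return a, p


if __name__ == "__main__":
    # Synthetic, noise-free data that follow var = 3 * mean**1.5 exactly.
    means = [1.0, 2.0, 4.0, 8.0]
    variances = [3.0 * m ** 1.5 for m in means]
    print(fit_taylor_law(means, variances))
```

An estimated exponent between 1 and 2 would point to the compound Poisson–gamma member of the Tweedie family.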
Since Taylor's law is mathematically identical to the variance-to-mean power law that characterizes the Tweedie models, it seemed reasonable to use these models and the Tweedie convergence theorem to explain the observed clustering of animals and plants associated with Taylor's law.
The majority of the observed values for the power-law exponent ''p'' have fallen in the interval (1,2) and so the Tweedie compound Poisson–gamma distribution would seem applicable. Comparison of the empirical distribution function to the theoretical compound Poisson–gamma distribution has provided a means to verify consistency of this hypothesis.
Whereas conventional models for Taylor's law have tended to involve ''ad hoc'' animal behavioral or population dynamic assumptions, the Tweedie convergence theorem would imply that Taylor's law results from a general mathematical convergence effect much as how the central limit theorem governs the convergence behavior of certain types of random data. Indeed, any mathematical model, approximation or simulation that is designed to yield Taylor's law (on the basis of this theorem) is required to converge to the form of the Tweedie models.
Tweedie convergence and 1/''f'' noise
Pink noise, or 1/''f'' noise, refers to a pattern of noise characterized by a power-law relationship between its intensities ''S''(''f'') at different frequencies ''f'',
:<math>S(f) \propto \frac{1}{f^\gamma},</math>
where the dimensionless exponent ''γ'' ∈ [0,1]. It is found within a diverse number of natural processes.
Many different explanations for 1/''f'' noise exist; a widely held hypothesis is based on self-organized criticality, where dynamical systems close to a critical point are thought to manifest scale-invariant spatial and/or temporal behavior.
In this subsection a mathematical connection between 1/''f'' noise and the Tweedie variance-to-mean power law will be described. To begin, we first need to introduce self-similar processes: For the sequence of numbers
:<math>Y = (Y_i : i = 0, 1, 2, \ldots, N)</math>
with mean
:<math>\hat{\mu} = \operatorname{E}(Y_i),</math>
deviations
:<math>y_i = Y_i - \hat{\mu},</math>
variance
:<math>\hat{\sigma}^2 = \operatorname{E}(y_i^2),</math>
and autocorrelation function
:<math>r(k) = \frac{\operatorname{E}(y_i\, y_{i+k})}{\operatorname{E}(y_i^2)}</math>
with lag ''k'', if the autocorrelation of this sequence has the long range behavior
:<math>r(k) \sim k^{-d} L(k)</math>
as ''k'' → ∞ and where ''L''(''k'') is a slowly varying function at large values of ''k'', this sequence is called a self-similar process.
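The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of ''N'' elements into groups of ''m'' equal-sized segments (''N/m'' is integer) so that new reproductive sequences, based on the mean values, can be defined:
:<math>Y_i^{(m)} = \left(Y_{im-m+1} + \cdots + Y_{im}\right)/m.</math>
The variance determined from this sequence will scale as the bin size changes such that
:<math>\operatorname{var}[Y^{(m)}] = \hat{\sigma}^2 m^{-d}</math>
if and only if the autocorrelation has the limiting form
:<math>\lim_{k \to \infty} r(k)/k^{-d} = (2-d)(1-d)/2.</math>
One can also construct a set of corresponding additive sequences
:<math>Z_i^{(m)} = m Y_i^{(m)},</math>
based on the expanding bins,
:<math>Z_i^{(m)} = \left(Y_{im-m+1} + \cdots + Y_{im}\right).</math>
Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship
:<math>\operatorname{var}[Z_i^{(m)}] = m^2 \operatorname{var}[Y^{(m)}] = \left(\frac{\hat{\sigma}^2}{\hat{\mu}^{2-d}}\right) \operatorname{E}[Z_i^{(m)}]^{2-d}.</math>
Since <math>\hat{\mu}</math> and <math>\hat{\sigma}^2</math> are constants this relationship constitutes a variance-to-mean power law, with ''p'' = 2 - ''d''.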
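The method of expanding bins can be sketched in a few lines. The helper names below are hypothetical, and the population variance (dividing by the number of bins) is an assumed convention. For a bin size ''m'' the function returns both the reproductive sequence of bin means and the additive sequence of bin sums; repeating this over several bin sizes and regressing log variance on log mean of the additive sequences gives an estimate of ''p''.

```python
def expanding_bins(sequence, m):
    """Split a sequence into non-overlapping bins of size m.

    Returns (bin_means, bin_sums): the reproductive and additive sequences.
    The sequence length must be a multiple of m.
    """
    if m <= 0 or len(sequence) % m != 0:
        raise ValueError("sequence length must be a positive multiple of m")
    bin_sums = [sum(sequence[i:i + m]) for i in range(0, len(sequence), m)]
    bin_means = [s / m for s in bin_sums]
    return bin_means, bin_sums


def population_variance(values):
    """Variance with divisor len(values)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    means, sums = expanding_bins(data, 2)
    print(means, sums)
    # The additive variance is m**2 times the reproductive variance.
    print(population_variance(sums), 2 ** 2 * population_variance(means))
```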
The biconditional relationship above between the variance-to-mean power law and power law autocorrelation function, and the Wiener–Khinchin theorem (McQuarrie DA (1976) ''Statistical mechanics''. Harper & Row) imply that any sequence that exhibits a variance-to-mean power law by the method of expanding bins will also manifest 1/''f'' noise, and vice versa. Moreover, the Tweedie convergence theorem, by virtue of its central limit-like effect of generating distributions that manifest variance-to-mean power functions, will also generate processes that manifest 1/''f'' noise. The Tweedie convergence theorem thus provides an alternative explanation for the origin of 1/''f'' noise, based on its central limit-like effect.
Much as the central limit theorem requires certain kinds of random processes to have as a focus of their convergence the Gaussian distribution and thus express white noise, the Tweedie convergence theorem requires certain non-Gaussian processes to have as a focus of convergence the Tweedie distributions that express 1/''f'' noise.
The Tweedie models and multifractality
From the properties of self-similar processes, the power-law exponent ''p'' = 2 - ''d'' is related to the Hurst exponent ''H'' and the fractal dimension ''D'' by
:<math>D = 2 - H = 2 - p/2.</math>
A one-dimensional data sequence of self-similar data may demonstrate a variance-to-mean power law with local variations in the value of ''p'' and hence in the value of ''D''. When fractal structures manifest local variations in fractal dimension, they are said to be multifractals. Examples of data sequences that exhibit local variations in ''p'' like this include the eigenvalue deviations of the Gaussian Orthogonal and Unitary Ensembles. The Tweedie compound Poisson–gamma distribution has served to model multifractality based on local variations in the Tweedie exponent ''α''. Consequently, in conjunction with the variation of ''α'', the Tweedie convergence theorem can be viewed as having a role in the genesis of such multifractals.
The variation of ''α'' has been found to obey the asymmetric Laplace distribution in certain cases. This distribution has been shown to be a member of the family of geometric Tweedie models, which manifest as limiting distributions in a convergence theorem for geometric dispersion models.
Regional organ blood flow
Regional organ blood flow has been traditionally assessed by the injection of radiolabelled polyethylene microspheres into the arterial circulation of animals, of a size such that they become entrapped within the microcirculation of organs. The organ to be assessed is then divided into equal-sized cubes and the amount of radiolabel within each cube is evaluated by liquid scintillation counting and recorded. The amount of radioactivity within each cube is taken to reflect the blood flow through that sample at the time of injection. It is possible to evaluate adjacent cubes from an organ in order to additively determine the blood flow through larger regions. Through the work of J B Bassingthwaighte and others an empirical power law has been derived between the relative dispersion of blood flow of tissue samples (''RD'' = standard deviation/mean) of mass ''m'' relative to reference-sized samples:
:<math>RD(m) = RD(m_\text{ref})\left(\frac{m}{m_\text{ref}}\right)^{1-D_s}</math>
This power law exponent ''D<sub>s</sub>'' has been called a fractal dimension. Bassingthwaighte's power law can be shown to directly relate to the variance-to-mean power law. Regional organ blood flow can thus be modelled by the Tweedie compound Poisson–gamma distribution. In this model a tissue sample could be considered to contain a random (Poisson distributed) number of entrapment sites, each with gamma distributed blood flow. Blood flow at this microcirculatory level has been observed to obey a gamma distribution, thus providing support for this hypothesis.
Cancer metastasis
The "experimental cancer metastasis assay" has some resemblance to the above method to measure regional blood flow. Groups of syngeneic and age-matched mice are given intravenous injections of equal-sized aliquots of suspensions of cloned cancer cells and then after a set period of time their lungs are removed and the number of cancer metastases enumerated within each pair of lungs. If other groups of mice are injected with different cancer cell clones then the number of metastases per group will differ in accordance with the metastatic potentials of the clones. It has been long recognized that there can be considerable intraclonal variation in the numbers of metastases per mouse despite the best attempts to keep the experimental conditions within each clonal group uniform. This variation is larger than would be expected on the basis of a Poisson distribution of numbers of metastases per mouse in each clone, and when the variance of the number of metastases per mouse was plotted against the corresponding mean a power law was found.
The variance-to-mean power law for metastases was found to also hold for spontaneous murine metastases and for case series of human metastases.
Since hematogenous metastasis occurs in direct relationship to regional blood flow, and videomicroscopic studies indicate that the passage and entrapment of cancer cells within the circulation appears analogous to the microsphere experiments, it seemed plausible to propose that the variation in numbers of hematogenous metastases could reflect heterogeneity in regional organ blood flow.
The blood flow model was based on the Tweedie compound Poisson–gamma distribution, a distribution governing a continuous random variable. For that reason in the metastasis model it was assumed that blood flow was governed by that distribution and that the number of regional metastases occurred as a Poisson process for which the intensity was directly proportional to blood flow. This led to the description of the Poisson negative binomial (PNB) distribution as a discrete equivalent to the Tweedie compound Poisson–gamma distribution. The probability generating function for the PNB distribution is
:<math>G(s) = \exp\left[\lambda\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^\alpha\left\{\left(1 - \frac{1}{\theta} + \frac{s}{\theta}\right)^\alpha - 1\right\}\right].</math>