Tweedie Distribution

In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions, which have positive mass at zero but are otherwise continuous. Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models. The Tweedie distributions were named by Bent Jørgensen after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984.


Definitions

The (reproductive) Tweedie distributions are defined as a subfamily of (reproductive) exponential dispersion models (ED) with a special mean-variance relationship. A random variable ''Y'' is Tweedie distributed, written \mathrm{Tw}_p(\mu, \sigma^2), if Y \sim \mathrm{ED}(\mu, \sigma^2) with mean \mu = \operatorname{E}(Y), positive dispersion parameter \sigma^2 and
: \operatorname{Var}(Y) = \sigma^2\,\mu^p,
where p \in \mathbf{R} is called the Tweedie power parameter.

The probability distribution P_{\theta,\sigma^2} on the measurable sets ''A'' is given by
: P_{\theta,\sigma^2}(Y\in A)=\int_A \exp\left(\frac{\theta\cdot z - \kappa_p(\theta)}{\sigma^2}\right)\cdot \nu_\lambda\,(dz),
for some σ-finite measure \nu_\lambda. This representation uses the canonical parameter ''θ'' of an exponential dispersion model and the cumulant function
: \kappa_p(\theta)= \begin{cases} \frac{\alpha-1}{\alpha} \left(\frac{\theta}{\alpha-1}\right)^\alpha, & \text{for }p\neq 1,2\\ -\log(-\theta), & \text{for }p=2\\ e^\theta, & \text{for }p=1 \end{cases}
where we used \alpha = \frac{p-2}{p-1}, or equivalently p = \frac{\alpha-2}{\alpha-1}.
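The definition can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `alpha_from_p`, `kappa` and `theta_from_mu` are illustrative, and `theta_from_mu` relies on the relation \theta = \mu^{1-p}/(1-p) (with \theta = \log\mu for ''p'' = 1), which follows from setting \kappa_p'(\theta) = \mu in the cumulant function above. Differentiating `kappa` by finite differences should then recover the mean ''μ'' and the unit variance ''μ''<sup>''p''</sup>.

```python
import math

def alpha_from_p(p):
    """Tweedie exponent alpha = (p - 2) / (p - 1); undefined at p = 1."""
    return (p - 2.0) / (p - 1.0)

def kappa(theta, p):
    """Cumulant function kappa_p(theta) from the definition above."""
    if p == 1:
        return math.exp(theta)
    if p == 2:
        return -math.log(-theta)  # requires theta < 0
    a = alpha_from_p(p)
    return (a - 1.0) / a * (theta / (a - 1.0)) ** a

def theta_from_mu(mu, p):
    """Canonical parameter whose mean is mu (assumed inverse of kappa')."""
    if p == 1:
        return math.log(mu)
    return mu ** (1.0 - p) / (1.0 - p)

def check(mu, p, h=1e-4):
    """Return finite-difference estimates of kappa' and kappa'' at theta(mu)."""
    th = theta_from_mu(mu, p)
    d1 = (kappa(th + h, p) - kappa(th - h, p)) / (2.0 * h)
    d2 = (kappa(th + h, p) - 2.0 * kappa(th, p) + kappa(th - h, p)) / h ** 2
    return d1, d2
```

Calling `check(2.0, 1.5)` should return values close to the mean 2 and to 2<sup>1.5</sup>, in line with the mean-variance relationship.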


Properties


Additive exponential dispersion models

The models just described are in the reproductive form. An exponential dispersion model always has a dual: the additive form. If ''Y'' is reproductive, then Z=\lambda Y with \lambda = \frac{1}{\sigma^2} is in the additive form ED*(''θ'',''λ''), for Tweedie \mathrm{Tw}^*_p(\mu, \lambda). Additive models have the property that the distribution of the sum of independent random variables,
: Z_+ = Z_1 +\cdots+ Z_n,
for which ''Z''<sub>''i''</sub> ~ ED*(''θ'',''λ''<sub>''i''</sub>) with fixed ''θ'' and various ''λ'', is a member of the family of distributions with the same ''θ'',
: Z_+ \sim \operatorname{ED}^*(\theta,\lambda_1+\cdots+\lambda_n).


Reproductive exponential dispersion models

A second class of exponential dispersion models exists, designated by the random variable
: Y=Z/\lambda \sim \operatorname{ED}(\mu,\sigma^2),
where ''σ''<sup>2</sup> = 1/''λ'', known as reproductive exponential dispersion models. They have the property that for ''n'' independent random variables ''Y''<sub>''i''</sub> ~ ED(''μ'',''σ''<sup>2</sup>/''w''<sub>''i''</sub>), with weighting factors ''w''<sub>''i''</sub> and
: w= \sum_{i=1}^n w_i,
a weighted average of the variables gives
: w^{-1}\sum_{i=1}^n w_iY_i \sim \operatorname{ED}(\mu,\sigma^2/w).
For reproductive models the weighted average of independent random variables with fixed ''μ'' and ''σ''<sup>2</sup> and various values for ''w''<sub>''i''</sub> is a member of the family of distributions with the same ''μ'' and ''σ''<sup>2</sup>. The Tweedie exponential dispersion models are both additive and reproductive; we thus have the ''duality transformation''
: Y \mapsto Z=Y/\sigma^2.


Scale invariance

A third property of the Tweedie models is that they are scale invariant: for a reproductive exponential dispersion model \mathrm{Tw}_p(\mu, \sigma^2) and any positive constant ''c'' we have the property of closure under scale transformation,
: c \operatorname{Tw}_p(\mu,\sigma^2) = \operatorname{Tw}_p(c\mu,c^{2-p}\sigma^2).
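The exponent 2 − ''p'' in the scale transformation can be sanity-checked against the variance relation Var(''Y'') = ''σ''<sup>2</sup>''μ''<sup>''p''</sup>: scaling ''Y'' by ''c'' multiplies the variance by ''c''<sup>2</sup>, and the transformed parameters must reproduce that value. The sketch below only verifies this moment identity; the helper names are illustrative, and it does not check the full distributional statement.

```python
def tweedie_variance(mu, sigma2, p):
    """Variance of Tw_p(mu, sigma2), i.e. sigma2 * mu**p."""
    return sigma2 * mu ** p

def scale_tweedie(mu, sigma2, p, c):
    """Parameters of c*Y for Y ~ Tw_p(mu, sigma2): (c*mu, c**(2-p)*sigma2)."""
    return c * mu, c ** (2.0 - p) * sigma2

# Example: compare Var(c*Y) = c**2 * Var(Y) with the variance implied
# by the transformed parameters.
mu, sigma2, p, c = 2.0, 0.5, 1.5, 3.0
mu_c, sigma2_c = scale_tweedie(mu, sigma2, p, c)
lhs = c ** 2 * tweedie_variance(mu, sigma2, p)
rhs = tweedie_variance(mu_c, sigma2_c, p)
```

Both `lhs` and `rhs` should agree up to floating-point error for any positive `c`.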


The Tweedie power variance function

To define the variance function for exponential dispersion models we make use of the mean value mapping, the relationship between the canonical parameter ''θ'' and the mean ''μ''. It is defined by the function
: \tau(\theta)=\kappa^\prime(\theta)=\mu,
with cumulant function \kappa(\theta). The variance function ''V''(''μ'') is constructed from the mean value mapping,
: V(\mu)=\tau^\prime[\tau^{-1}(\mu)].
Here the minus exponent in ''τ''<sup>−1</sup>(''μ'') denotes an inverse function rather than a reciprocal. The mean and variance of an additive random variable are then E(''Z'') = ''λμ'' and var(''Z'') = ''λV''(''μ''). Scale invariance implies that the variance function obeys the relationship ''V''(''μ'') = ''μ''<sup>''p''</sup>.
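As an illustration of this construction, the sketch below evaluates ''V''(''μ'') = ''τ''′[''τ''<sup>−1</sup>(''μ'')] by a central finite difference for two familiar cases: the Poisson case with ''τ''(''θ'') = ''e''<sup>''θ''</sup>, and the gamma case with ''τ''(''θ'') = −1/''θ''. The function name `variance_function` and the step size `h` are assumptions made for the example.

```python
import math

def variance_function(tau, tau_inv, mu, h=1e-6):
    """Approximate V(mu) = tau'(tau^{-1}(mu)) with a central difference."""
    theta = tau_inv(mu)
    return (tau(theta + h) - tau(theta - h)) / (2.0 * h)

# Poisson case (p = 1): tau(theta) = exp(theta), so V(mu) = mu.
v_poisson = variance_function(math.exp, math.log, 3.0)

# Gamma case (p = 2): tau(theta) = -1/theta for theta < 0, so V(mu) = mu**2.
v_gamma = variance_function(lambda t: -1.0 / t, lambda m: -1.0 / m, 3.0)
```

The two results should be close to 3 and 9 respectively, matching ''V''(''μ'') = ''μ''<sup>''p''</sup> for ''p'' = 1 and ''p'' = 2.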


The Tweedie deviance

The unit deviance of a reproductive Tweedie distribution is given by
: d(y,\mu) = \begin{cases} (y-\mu)^2, & \text{for }p=0\\ 2(y \log(y/\mu) + \mu - y), & \text{for }p=1\\ 2(\log(\mu/y) + y/\mu - 1), & \text{for }p=2\\ 2\left(\frac{\max(y,0)^{2-p}}{(1-p)(2-p)}-\frac{y\mu^{1-p}}{1-p}+\frac{\mu^{2-p}}{2-p}\right), & \text{else} \end{cases}
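A direct transcription of the unit deviance into code may help when comparing fitted means with observations. This is a hedged sketch: the function name is illustrative, and the convention that ''y'' log(''y''/''μ'') equals 0 at ''y'' = 0 for ''p'' = 1 is an assumption (the usual limiting value), not something stated above.

```python
import math

def tweedie_unit_deviance(y, mu, p):
    """Unit deviance d(y, mu) of a reproductive Tweedie model with power p."""
    if p == 0:
        return (y - mu) ** 2
    if p == 1:
        # Assumed convention: y * log(y / mu) is 0 when y == 0.
        ylog = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (ylog + mu - y)
    if p == 2:
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return 2.0 * (
        max(y, 0.0) ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )
```

The deviance is zero when ''y'' = ''μ'', and for 1 < ''p'' < 2 it remains finite at ''y'' = 0, which is what allows compound Poisson–gamma models to be fitted to data containing exact zeros.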


The Tweedie cumulant generating functions

The properties of exponential dispersion models give us two differential equations. The first relates the mean value mapping and the variance function to each other,
: \frac{\partial \tau^{-1}(\mu)}{\partial \mu}= \frac{1}{V(\mu)}.
The second shows how the mean value mapping is related to the cumulant function,
: \frac{\partial \kappa(\theta)}{\partial \theta}=\tau(\theta).
These equations can be solved to obtain the cumulant function for different cases of the Tweedie models. A cumulant generating function (CGF) may then be obtained from the cumulant function. The additive CGF is generally specified by the equation
: K^*(s)=\log[\operatorname{E}(e^{sZ})]=\lambda[\kappa(\theta+s)-\kappa(\theta)],
and the reproductive CGF by
: K(s)=\log[\operatorname{E}(e^{sY})]=\lambda[\kappa(\theta+s/\lambda)-\kappa(\theta)],
where ''s'' is the generating function variable.

For the additive Tweedie models the CGFs take the form
: K^*_p(s;\theta,\lambda) = \begin{cases} \lambda\kappa_p(\theta)[(1+s/\theta)^\alpha-1] & \quad p \ne 1,2, \\ -\lambda \log(1+s/\theta) & \quad p = 2, \\ \lambda e^\theta (e^s -1) & \quad p = 1, \end{cases}
and for the reproductive models,
: K_p(s;\theta,\lambda) = \begin{cases} \lambda\kappa_p(\theta)\left\{[1+s/(\theta\lambda)]^\alpha-1\right\} & \quad p \ne 1,2, \\ -\lambda \log[1+s/(\theta \lambda)] & \quad p = 2, \\ \lambda e^\theta (e^{s/\lambda} -1) & \quad p = 1. \end{cases}
The additive and reproductive Tweedie models are conventionally denoted by the symbols ''Tw''*<sub>''p''</sub>(''θ'',''λ'') and ''Tw''<sub>''p''</sub>(''θ'',''σ''<sup>2</sup>), respectively.

The first and second derivatives of the CGFs, evaluated at ''s'' = 0, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law,
: \mathrm{var}(Z)\propto \mathrm{E}(Z)^p.
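The statement about derivatives at ''s'' = 0 can be checked numerically for the additive form. The sketch below codes the additive CGF in its three cases and differentiates it by finite differences; the names `additive_cgf` and `mean_and_variance` are illustrative, and the canonical parameter is obtained from the assumed relation \theta = \mu^{1-p}/(1-p) (or \log\mu for ''p'' = 1), which follows from \kappa_p'(\theta)=\mu.

```python
import math

def additive_cgf(s, theta, lam, p):
    """Additive Tweedie CGF K*_p(s; theta, lambda) in its three cases."""
    if p == 1:
        return lam * math.exp(theta) * (math.exp(s) - 1.0)
    if p == 2:
        return -lam * math.log(1.0 + s / theta)
    a = (p - 2.0) / (p - 1.0)
    kappa = (a - 1.0) / a * (theta / (a - 1.0)) ** a
    return lam * kappa * ((1.0 + s / theta) ** a - 1.0)

def mean_and_variance(mu, lam, p, h=1e-4):
    """Finite-difference first and second derivatives of K* at s = 0."""
    theta = math.log(mu) if p == 1 else mu ** (1.0 - p) / (1.0 - p)
    k0 = additive_cgf(0.0, theta, lam, p)
    kp = additive_cgf(h, theta, lam, p)
    km = additive_cgf(-h, theta, lam, p)
    return (kp - km) / (2.0 * h), (kp - 2.0 * k0 + km) / h ** 2
```

For example, `mean_and_variance(2.0, 3.0, 1.5)` should return values close to ''λμ'' = 6 and ''λμ''<sup>''p''</sup> = 3 · 2<sup>1.5</sup>, consistent with E(''Z'') = ''λμ'' and var(''Z'') = ''λV''(''μ'').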


The Tweedie convergence theorem

The Tweedie exponential dispersion models are fundamental in statistical theory because of their roles as foci of convergence for a wide range of statistical processes. Jørgensen ''et al.'' proved a theorem that specifies the asymptotic behaviour of variance functions, known as the Tweedie convergence theorem. In technical terms, the theorem is stated thus: the unit variance function is regular of order ''p'' at zero (or infinity) provided that V(\mu) \sim c_0\mu^p as ''μ'' approaches zero (or infinity), for all real values of ''p'' and ''c''<sub>0</sub> > 0. Then for a unit variance function regular of order ''p'' at either zero or infinity and for
: p \notin (0,1),
for any \mu>0 and \sigma^2>0 we have
: c^{-1} \operatorname{ED}(c\mu,\sigma^2c^{2-p}) \rightarrow \operatorname{Tw}_p(\mu,c_0 \sigma^2)
as c \downarrow 0 or c \rightarrow \infty, respectively, where the convergence is through values of ''c'' such that ''cμ'' is in the domain of ''θ'' and ''c''<sup>''p''−2</sup>/''σ''<sup>2</sup> is in the domain of ''λ''. The model must be infinitely divisible as ''c''<sup>2−''p''</sup> approaches infinity.

In nontechnical terms this theorem implies that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model. Almost all distribution functions with finite cumulant generating functions qualify as exponential dispersion models, and most exponential dispersion models manifest variance functions of this form. Hence many probability distributions have variance functions that express this asymptotic behaviour, and the Tweedie distributions become foci of convergence for a wide range of data types.


Related distributions

The Tweedie distributions include a number of familiar distributions as well as some unusual ones, each being specified by the domain of the index parameter. We have the
* extreme stable distribution, ''p'' < 0,
* normal distribution, ''p'' = 0,
* Poisson distribution, ''p'' = 1,
* compound Poisson–gamma distribution, 1 < ''p'' < 2,
* gamma distribution, ''p'' = 2,
* positive stable distributions, 2 < ''p'' < 3,
* inverse Gaussian distribution, ''p'' = 3,
* positive stable distributions, ''p'' > 3, and
* extreme stable distributions, ''p'' = ∞.
For 0 < ''p'' < 1 no Tweedie model exists. Note that all ''stable'' distributions here actually mean distributions ''generated by stable distributions''.
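The classification by index parameter amounts to a lookup on ''p''. A minimal sketch follows; the function name `tweedie_family` and the returned labels are illustrative, and raising an error for 0 < ''p'' < 1 is a design choice made here to reflect that no Tweedie model exists in that interval.

```python
import math

def tweedie_family(p):
    """Name the Tweedie family member for power parameter p."""
    if math.isinf(p) and p > 0:
        return "extreme stable (p = infinity)"
    if p < 0:
        return "extreme stable"
    if p == 0:
        return "normal"
    if p < 1:
        raise ValueError("no Tweedie model exists for 0 < p < 1")
    if p == 1:
        return "Poisson"
    if p < 2:
        return "compound Poisson-gamma"
    if p == 2:
        return "gamma"
    if p < 3:
        return "positive stable"
    if p == 3:
        return "inverse Gaussian"
    return "positive stable"
```

For instance, `tweedie_family(1.5)` returns the compound Poisson–gamma label, the case most often encountered in the applications described in this article.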


Occurrence and applications


The Tweedie models and Taylor’s power law

Taylor's law is an empirical law in ecology that relates the variance of the number of individuals of a species per unit area of habitat to the corresponding mean by a power-law relationship. For the population count ''Y'' with mean ''µ'' and variance var(''Y''), Taylor's law is written
: \operatorname{var}(Y) = a\mu^p,
where ''a'' and ''p'' are both positive constants.

Since L. R. Taylor described this law in 1961, many different explanations have been offered for it, ranging from animal behavior, a random walk model, and a stochastic birth, death, immigration and emigration model, to a consequence of equilibrium and non-equilibrium statistical mechanics. No consensus exists as to an explanation for this model.

Since Taylor's law is mathematically identical to the variance-to-mean power law that characterizes the Tweedie models, it seemed reasonable to use these models and the Tweedie convergence theorem to explain the observed clustering of animals and plants associated with Taylor's law. The majority of the observed values for the power-law exponent ''p'' have fallen in the interval (1,2), and so the Tweedie compound Poisson–gamma distribution would seem applicable. Comparison of the empirical distribution function to the theoretical compound Poisson–gamma distribution has provided a means to verify consistency of this hypothesis.

Whereas conventional models for Taylor's law have tended to involve ''ad hoc'' animal behavioral or population dynamic assumptions, the Tweedie convergence theorem would imply that Taylor's law results from a general mathematical convergence effect, much as the central limit theorem governs the convergence behavior of certain types of random data. Indeed, any mathematical model, approximation or simulation that is designed to yield Taylor's law (on the basis of this theorem) is required to converge to the form of the Tweedie models.
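In practice, the constants ''a'' and ''p'' of Taylor's law are often estimated from paired sample means and variances. One simple possibility, sketched here under the assumption that ordinary least squares on the log-log scale is adequate (other estimators exist, and this is not claimed to be the method used in the studies above), is to fit log var(''Y'') = log ''a'' + ''p'' log ''μ''. The function name `fit_taylor_law` and the example data are hypothetical.

```python
import math

def fit_taylor_law(means, variances):
    """Least-squares fit of log(var) = log(a) + p * log(mean); returns (a, p)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    p = sxy / sxx
    a = math.exp(ybar - p * xbar)
    return a, p

# Synthetic data that follows var = 0.5 * mean**1.6 exactly.
means = [1.0, 2.0, 5.0, 10.0, 50.0]
variances = [0.5 * m ** 1.6 for m in means]
a_hat, p_hat = fit_taylor_law(means, variances)
```

On this noise-free input the fit recovers ''a'' = 0.5 and ''p'' = 1.6; an exponent in the interval (1,2) would point to the compound Poisson–gamma case.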


Tweedie convergence and 1/''f'' noise

Pink noise, or 1/''f'' noise, refers to a pattern of noise characterized by a power-law relationship between its intensities ''S''(''f'') at different frequencies ''f'',
: S(f)\propto \frac{1}{f^\gamma},
where the dimensionless exponent ''γ'' ∈ [0,1]. It is found within a diverse number of natural processes. Many different explanations for 1/''f'' noise exist; a widely held hypothesis is based on self-organized criticality, where dynamical systems close to a critical point are thought to manifest scale-invariant spatial and/or temporal behavior.

In this subsection a mathematical connection between 1/''f'' noise and the Tweedie variance-to-mean power law will be described. To begin, we first need to introduce self-similar processes. For the sequence of numbers
: Y=(Y_i :i=0,1,2,\ldots,N)
with mean
: \widehat{\mu}=\operatorname{E}(Y_i),
deviations
: y_i = Y_i - \widehat{\mu},
variance
: \widehat{\sigma}^2=\operatorname{E}(y_i^2),
and autocorrelation function
: r(k) = \frac{\operatorname{E}(y_i\, y_{i+k})}{\operatorname{E}(y_i^2)}
with lag ''k'', if the autocorrelation of this sequence has the long-range behavior
: r(k)\sim k^{-d} L(k)
as ''k'' → ∞, where ''L''(''k'') is a slowly varying function at large values of ''k'', this sequence is called a self-similar process.

The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of ''N'' elements into groups of ''m'' equal-sized segments (''N/m'' is an integer) so that new reproductive sequences, based on the mean values, can be defined:
: Y_i^{(m)}=(Y_{im-m+1}+\cdots+Y_{im})/m.
The variance determined from this sequence will scale as the bin size changes such that
: \operatorname{var}[Y^{(m)}]=\widehat{\sigma}^2 m^{-d}
if and only if the autocorrelation has the limiting form
: \lim_{k\to\infty}r(k)/k^{-d} = (2-d)(1-d)/2.
One can also construct a set of corresponding additive sequences
: Z_i^{(m)} = mY_i^{(m)},
based on the expanding bins,
: Z_i^{(m)}=(Y_{im-m+1}+\cdots+Y_{im}).
Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship
: \operatorname{var}[Z_i^{(m)}]= m^2 \operatorname{var}[Y^{(m)}]= \left(\frac{\widehat{\sigma}^2}{\widehat{\mu}^{2-d}} \right) \operatorname{E}[Z_i^{(m)}]^{2-d}.
Since \widehat{\mu} and \widehat{\sigma}^2 are constants, this relationship constitutes a variance-to-mean power law, with ''p'' = 2 - ''d''.

The biconditional relationship above between the variance-to-mean power law and the power-law autocorrelation function, together with the Wiener–Khinchin theorem (McQuarrie DA (1976) ''Statistical mechanics'', Harper & Row), implies that any sequence that exhibits a variance-to-mean power law by the method of expanding bins will also manifest 1/''f'' noise, and vice versa. Moreover, the Tweedie convergence theorem, by virtue of its central limit-like effect of generating distributions that manifest variance-to-mean power functions, will also generate processes that manifest 1/''f'' noise. The Tweedie convergence theorem thus provides an alternative explanation for the origin of 1/''f'' noise, based on its central limit-like effect. Much as the central limit theorem requires certain kinds of random processes to have the Gaussian distribution as a focus of their convergence and thus express white noise, the Tweedie convergence theorem requires certain non-Gaussian processes to have as a focus of convergence the Tweedie distributions that express 1/''f'' noise.
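The method of expanding bins described in this subsection can be sketched in a few lines. The code below forms, for each bin size ''m'', the reproductive sequence of bin means and the additive sequence of bin sums, and returns the variance of each. The function name is illustrative, the population variance is used, and any trailing elements that do not fill a complete bin are discarded (an assumption made here; the text takes ''N/m'' to be an integer).

```python
import statistics

def expanding_bin_variances(seq, bin_sizes):
    """Map each bin size m to (var of bin means, var of bin sums)."""
    out = {}
    for m in bin_sizes:
        n_bins = len(seq) // m  # trailing partial bin is dropped
        sums = [sum(seq[i * m:(i + 1) * m]) for i in range(n_bins)]
        means = [z / m for z in sums]
        out[m] = (statistics.pvariance(means), statistics.pvariance(sums))
    return out

# Deterministic toy sequence, used only to exercise the function.
seq = [(i * 37) % 11 for i in range(120)]
result = expanding_bin_variances(seq, [1, 2, 4, 8])
```

For every bin size the two variances satisfy var(''Z'') = ''m''<sup>2</sup> var(''Y''). Plotting the logarithm of the variance of the bin sums against the logarithm of their mean over a range of ''m'' would then give an estimate of the exponent ''p'' = 2 − ''d'' for sequences that are self-similar.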


The Tweedie models and multifractality

From the properties of self-similar processes, the power-law exponent ''p'' = 2 - ''d'' is related to the Hurst exponent ''H'' and the fractal dimension ''D'' by
: D = 2-H = 2 - p/2.
A one-dimensional data sequence of self-similar data may demonstrate a variance-to-mean power law with local variations in the value of ''p'' and hence in the value of ''D''. When fractal structures manifest local variations in fractal dimension, they are said to be multifractals. Examples of data sequences that exhibit local variations in ''p'' like this include the eigenvalue deviations of the Gaussian Orthogonal and Unitary Ensembles. The Tweedie compound Poisson–gamma distribution has served to model multifractality based on local variations in the Tweedie exponent ''α''. Consequently, in conjunction with the variation of ''α'', the Tweedie convergence theorem can be viewed as having a role in the genesis of such multifractals.

The variation of ''α'' has been found to obey the asymmetric Laplace distribution in certain cases. This distribution has been shown to be a member of the family of geometric Tweedie models, which manifest as limiting distributions in a convergence theorem for geometric dispersion models.


Regional organ blood flow

Regional organ blood flow has traditionally been assessed by the injection of radiolabelled polyethylene microspheres into the arterial circulation of animals, of a size such that they become entrapped within the microcirculation of organs. The organ to be assessed is then divided into equal-sized cubes, and the amount of radiolabel within each cube is evaluated by liquid scintillation counting and recorded. The amount of radioactivity within each cube is taken to reflect the blood flow through that sample at the time of injection. It is possible to evaluate adjacent cubes from an organ in order to additively determine the blood flow through larger regions. Through the work of J B Bassingthwaighte and others an empirical power law has been derived between the relative dispersion of blood flow of tissue samples (''RD'' = standard deviation/mean) of mass ''m'' relative to reference-sized samples:
: RD(m)=RD(m_\text{ref})\left(\frac{m}{m_\text{ref}}\right)^{1-D_s}
The power-law exponent ''D<sub>s</sub>'' has been called a fractal dimension. Bassingthwaighte's power law can be shown to directly relate to the variance-to-mean power law. Regional organ blood flow can thus be modelled by the Tweedie compound Poisson–gamma distribution. In this model a tissue sample could be considered to contain a random (Poisson) distributed number of entrapment sites, each with gamma distributed blood flow. Blood flow at this microcirculatory level has been observed to obey a gamma distribution, thus providing support for this hypothesis.
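The relative-dispersion power law is straightforward to evaluate, and its link to the variance-to-mean exponent can be made explicit with a short argument. If the mean flow of an aggregated sample is taken to be proportional to its mass ''m'' (an assumption made for this sketch) and var ∝ mean<sup>''p''</sup>, then ''RD''<sup>2</sup> = var/mean<sup>2</sup> ∝ ''m''<sup>''p''−2</sup>, so 1 − ''D<sub>s</sub>'' = (''p'' − 2)/2 and ''p'' = 4 − 2''D<sub>s</sub>''. This agrees with the relation ''D'' = 2 − ''p''/2 given for self-similar processes. The function names below are illustrative.

```python
def relative_dispersion(m, m_ref, rd_ref, d_s):
    """RD(m) = RD(m_ref) * (m / m_ref)**(1 - D_s)."""
    return rd_ref * (m / m_ref) ** (1.0 - d_s)

def power_from_fractal_dimension(d_s):
    """Variance-to-mean exponent p = 4 - 2*D_s (assumes mean flow ~ mass)."""
    return 4.0 - 2.0 * d_s

# Example with hypothetical values: reference RD of 0.2 at 1 g, D_s = 1.2.
rd_at_8g = relative_dispersion(8.0, 1.0, 0.2, 1.2)
p_implied = power_from_fractal_dimension(1.2)
```

With these hypothetical inputs the relative dispersion falls as samples are aggregated, and the implied exponent lies in the interval (1,2), the compound Poisson–gamma range.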


Cancer metastasis

The "experimental cancer
metastasis Metastasis is a pathogenic agent's spread from an initial or primary site to a different or secondary site within the host's body; the term is typically used when referring to metastasis by a cancerous tumor. The newly pathological sites, then, ...
assay" has some resemblance to the above method to measure regional blood flow. Groups of syngeneic and age matched mice are given intravenous injections of equal-sized aliquots of suspensions of cloned cancer cells and then after a set period of time their lungs are removed and the number of cancer metastases enumerated within each pair of lungs. If other groups of mice are injected with different cancer cell
clones Clone or Clones or Cloning or Cloned or The Clone may refer to: Places * Clones, County Fermanagh * Clones, County Monaghan, a town in Ireland Biology * Clone (B-cell), a lymphocyte clone, the massive presence of which may indicate a pathologi ...
then the number of metastases per group will differ in accordance with the metastatic potentials of the clones. It has been long recognized that there can be considerable intraclonal variation in the numbers of metastases per mouse despite the best attempts to keep the experimental conditions within each clonal group uniform. This variation is larger than would be expected on the basis of a
Poisson distribution of numbers of metastases per mouse in each clone, and when the variance of the number of metastases per mouse was plotted against the corresponding mean a power law was found. The variance-to-mean power law for metastases was also found to hold for spontaneous murine metastases and for case series of human metastases. Since hematogenous metastasis occurs in direct relationship to regional blood flow, and videomicroscopic studies indicate that the passage and entrapment of cancer cells within the circulation appear analogous to the microsphere experiments, it seemed plausible to propose that the variation in numbers of hematogenous metastases could reflect heterogeneity in regional organ blood flow. The blood flow model was based on the Tweedie compound Poisson–gamma distribution, a distribution governing a continuous random variable. For that reason, in the metastasis model it was assumed that blood flow was governed by that distribution and that the number of regional metastases occurred as a Poisson process for which the intensity was directly proportional to blood flow. This led to the description of the Poisson negative binomial (PNB) distribution as a discrete equivalent to the Tweedie compound Poisson–gamma distribution. The probability generating function for the PNB distribution is
: G(s) = \exp \left[ \lambda \frac{\alpha-1}{\alpha} \left( \frac{\theta}{\alpha-1} \right)^\alpha \left\{ \left( 1 - \frac{1}{\theta} + \frac{s}{\theta} \right)^\alpha - 1 \right\} \right].
The relationship between the mean and variance of the PNB distribution is then
: \operatorname{var}(Y) = a\operatorname{E}(Y)^b + \operatorname{E}(Y),
which, in the range of many experimental metastasis assays, would be indistinguishable from the variance-to-mean power law. For sparse data, however, this discrete variance-to-mean relationship would behave more like that of a Poisson distribution, where the variance equals the mean.
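The two regimes of the PNB variance function can be checked numerically. The following is a minimal sketch rather than anything taken from the original studies: the function name is hypothetical, and the values of ''a'' and ''b'' are arbitrary illustrative assumptions.

```python
def pnb_variance(mean: float, a: float, b: float) -> float:
    """Variance implied by var(Y) = a*E(Y)**b + E(Y)."""
    return a * mean**b + mean


# Assumed, purely illustrative parameters (b chosen between 1 and 2).
a, b = 1.0, 1.5

for mean in (1e-4, 1e-2, 1.0, 1e2, 1e4):
    var = pnb_variance(mean, a, b)
    power_law = a * mean**b
    # var/mean tends to 1 for sparse data (Poisson-like behaviour);
    # var/power_law tends to 1 for large means (power-law behaviour).
    print(f"mean={mean:g}  var/mean={var / mean:.4f}  "
          f"var/power_law={var / power_law:.4f}")
```

With these assumed values the extra Poisson term matters only when the mean is small, which is why the two relationships are hard to tell apart in assays with large counts.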


Genomic structure and evolution

The local density of single nucleotide polymorphisms (SNPs) within the human genome, as well as that of genes, appears to cluster in accord with the variance-to-mean power law and the Tweedie compound Poisson–gamma distribution. In the case of SNPs, their observed density reflects the assessment techniques, the availability of genomic sequences for analysis, and the nucleotide heterozygosity. The first two factors reflect ascertainment errors inherent to the collection methods; the last factor reflects an intrinsic property of the genome. In the coalescent model of population genetics each genetic locus has its own unique history. Within the evolution of a population from some species, some genetic loci could presumably be traced back to a relatively recent common ancestor whereas other loci might have more ancient genealogies. More ancient genomic segments would have had more time to accumulate SNPs and to experience recombination. R. R. Hudson has proposed a model where recombination could cause variation in the time to the most recent common ancestor for different genomic segments. A high recombination rate could cause a chromosome to contain a large number of small segments with less correlated genealogies. Assuming a constant background rate of mutation, the number of SNPs per genomic segment would accumulate in proportion to the time to the most recent common ancestor. Current population genetic theory would indicate that these times would be gamma distributed, on average. The Tweedie compound Poisson–gamma distribution would suggest a model whereby the SNP map would consist of multiple small genomic segments, with the mean number of SNPs per segment being gamma distributed as per Hudson's model.

The distribution of genes within the human genome also demonstrated a variance-to-mean power law when the method of expanding bins was used to determine the corresponding variances and means. Similarly, the number of genes per enumerative bin was found to obey a Tweedie compound Poisson–gamma distribution. This probability distribution was deemed compatible with two different biological models. In the microarrangement model, the number of genes per unit genomic length was determined by the sum of a random number of smaller genomic segments derived by random breakage and reconstruction of protochromosomes. These smaller segments would be assumed to carry on average a gamma distributed number of genes. In the alternative gene cluster model, genes would be distributed randomly within the protochromosomes. Over large evolutionary timescales there would occur tandem duplication, mutations, insertions, deletions and rearrangements that could affect the genes through a stochastic birth, death and immigration process to yield the Tweedie compound Poisson–gamma distribution. Both these mechanisms would implicate neutral evolutionary processes that would result in regional clustering of genes.
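The "sum of a random number of gamma distributed contributions" construction behind both genomic models can be sketched in a short simulation. This is an illustration under assumed parameter values (a Poisson rate of 2 segments per bin, gamma shape 3 and scale 0.5), not a fit to any genomic data; the function names are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_compound_poisson_gamma(lam, shape, scale, rng):
    """Sum of a Poisson(lam) number of independent Gamma(shape, scale) terms."""
    n_terms = sample_poisson(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_terms))


def simulate(lam=2.0, shape=3.0, scale=0.5, size=20000, seed=0):
    """Return the sample mean, sample variance and fraction of exact zeros."""
    rng = random.Random(seed)
    ys = [sample_compound_poisson_gamma(lam, shape, scale, rng)
          for _ in range(size)]
    mean = sum(ys) / size
    var = sum((y - mean) ** 2 for y in ys) / (size - 1)
    zero_fraction = sum(1 for y in ys if y == 0) / size
    return mean, var, zero_fraction


mean, var, zero_fraction = simulate()
# Theory for comparison: E(Y) = lam*shape*scale,
# var(Y) = lam*shape*(shape + 1)*scale**2, P(Y = 0) = exp(-lam).
print(mean, var, zero_fraction)
```

The simulated values should lie close to the theoretical mean, variance and probability of an exact zero noted in the comment, showing the positive mass at zero combined with an otherwise continuous distribution.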


Random matrix theory

The Gaussian unitary ensemble (GUE) consists of complex Hermitian matrices that are invariant under unitary transformations, whereas the Gaussian orthogonal ensemble (GOE) consists of real symmetric matrices invariant under orthogonal transformations. The ranked eigenvalues ''E_n'' from these random matrices obey Wigner's semicircular distribution: for an ''N''×''N'' matrix the average density for eigenvalues of size ''E'' will be
: \bar{\rho}(E) = \begin{cases} \sqrt{2N-E^2}/\pi & \quad \left\vert E \right\vert < \sqrt{2N} \\ 0 & \quad \left\vert E \right\vert > \sqrt{2N} \end{cases}
as ''N'' → ∞. Integration of the semicircular rule provides the number of eigenvalues on average less than ''E'',
: \bar{\eta}(E) = \frac{1}{2\pi}\left[ E\sqrt{2N-E^2} + 2N \arcsin \left( \frac{E}{\sqrt{2N}} \right) + \pi N \right].
The ranked eigenvalues can be unfolded, or renormalized, with the equation
: e_n = \bar{\eta}(E_n) = \int \limits_{-\infty}^{E_n} \, dE^\prime \bar{\rho}(E^\prime).
This removes the trend of the sequence from the fluctuating portion. If we look at the absolute value of the difference between the actual and expected cumulative number of eigenvalues
: \left| \bar{D}_n \right| = \left| n - \bar{\eta}(E_n) \right|
we obtain a sequence of eigenvalue fluctuations which, using the method of expanding bins, reveals a variance-to-mean power law. The eigenvalue fluctuations of both the GUE and the GOE manifest this power law with the power law exponents ranging between 1 and 2, and they similarly manifest 1/''f'' noise spectra. These eigenvalue fluctuations also correspond to the Tweedie compound Poisson–gamma distribution and they exhibit multifractality.
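The semicircular density, its integral and the unfolding step can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and it takes a list of already computed, ranked eigenvalues as input rather than generating random matrices.

```python
import math


def semicircle_density(energy: float, size: int) -> float:
    """Average eigenvalue density for a size-by-size matrix."""
    radius_sq = 2.0 * size
    if energy * energy >= radius_sq:
        return 0.0
    return math.sqrt(radius_sq - energy * energy) / math.pi


def mean_cumulative_count(energy: float, size: int) -> float:
    """Average number of eigenvalues below `energy` (integrated density)."""
    radius = math.sqrt(2.0 * size)
    # Clamp to the support, since the density vanishes outside it.
    e = max(-radius, min(radius, energy))
    return (e * math.sqrt(max(2.0 * size - e * e, 0.0))
            + 2.0 * size * math.asin(e / radius)
            + math.pi * size) / (2.0 * math.pi)


def eigenvalue_fluctuations(ranked_eigenvalues, size):
    """Return |n - eta(E_n)| for eigenvalues sorted in ascending order."""
    return [abs(rank - mean_cumulative_count(e, size))
            for rank, e in enumerate(ranked_eigenvalues, start=1)]


size = 100
radius = math.sqrt(2.0 * size)
# The integrated density runs from 0 at the lower edge to N at the upper edge.
print(mean_cumulative_count(-radius, size),
      mean_cumulative_count(0.0, size),
      mean_cumulative_count(radius, size))
```

The resulting fluctuation sequence is what would then be passed to the method of expanding bins.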


The distribution of prime numbers

The second Chebyshev function ''ψ''(''x'') is given by
: \psi(x) = \sum_{\widehat{p}^k \le x} \log \widehat{p} = \sum_{n \le x} \Lambda(n)
where the summation extends over all prime powers \widehat{p}^k not exceeding ''x'', ''x'' runs over the positive real numbers, and \Lambda(n) is the von Mangoldt function. The function ''ψ''(''x'') is related to the prime-counting function ''π''(''x''), and as such provides information with regard to the distribution of prime numbers amongst the real numbers. It is asymptotic to ''x'', a statement equivalent to the prime number theorem, and it can also be shown to be related to the zeros ''ρ'' of the Riemann zeta function located in the critical strip, where the real part of the zeta zero ''ρ'' is between 0 and 1. Then ''ψ'' expressed for ''x'' greater than one can be written:
: \psi_0(x) = x - \sum_\rho \frac{x^\rho}{\rho} - \ln 2\pi - \frac12 \ln(1-x^{-2})
where
: \psi_0(x) = \lim_{\varepsilon \to 0} \frac{\psi(x-\varepsilon) + \psi(x+\varepsilon)}{2}.
The Riemann hypothesis states that the nontrivial zeros of the Riemann zeta function all have real part ½. These zeta function zeros are related to the distribution of prime numbers. Schoenfeld has shown that if the Riemann hypothesis is true then
: \Delta(x) = \left\vert \psi(x) - x \right\vert < \sqrt{x} \log^2(x)/(8 \pi)
for all ''x'' > 73.2. If we analyze the Chebyshev deviations Δ(''n'') on the integers ''n'' using the method of expanding bins and plot the variance versus the mean, a variance-to-mean power law can be demonstrated. Moreover, these deviations correspond to the Tweedie compound Poisson–gamma distribution and they exhibit 1/''f'' noise.
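The Chebyshev deviations Δ(''n'') are straightforward to compute for small ''n''. The sketch below tabulates the von Mangoldt function with a simple sieve, accumulates ''ψ''(''n''), and compares the deviations with Schoenfeld's bound. The function names and the cutoff of 2000 are assumptions made for illustration.

```python
import math


def von_mangoldt_table(limit: int) -> list:
    """Return lam with lam[n] = log p if n is a power of a prime p, else 0."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for multiple in range(p * p, limit + 1, p):
            is_composite[multiple] = True
        power = p
        while power <= limit:
            lam[power] = math.log(p)
            power *= p
    return lam


def chebyshev_psi_table(limit: int) -> list:
    """Return psi with psi[n] = sum of Lambda(m) over m <= n."""
    lam = von_mangoldt_table(limit)
    psi = [0.0] * (limit + 1)
    for n in range(1, limit + 1):
        psi[n] = psi[n - 1] + lam[n]
    return psi


def schoenfeld_bound(x: float) -> float:
    """Right-hand side of Schoenfeld's inequality, sqrt(x) log^2(x) / (8 pi)."""
    return math.sqrt(x) * math.log(x) ** 2 / (8.0 * math.pi)


limit = 2000  # assumed cutoff, chosen only to keep the example fast
psi = chebyshev_psi_table(limit)
deviations = {n: abs(psi[n] - n) for n in range(74, limit + 1)}
violations = [n for n, d in deviations.items() if d >= schoenfeld_bound(n)]
print(psi[100], violations)
```

The `deviations` sequence is the input that the method of expanding bins would be applied to; the sketch stops short of that analysis.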


Other applications

Applications of Tweedie distributions include:
* actuarial studies
* assay analysis
* survival analysis
* ecology
* analysis of alcohol consumption in British teenagers
* medical applications (Smyth, G. K. 1996. Regression analysis of quantity data with exact zeros. Proceedings of the Second Australia–Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland, 572–580.)
* health economics
* meteorology and climatology
* fisheries
* Mertens function
* self-organized criticality


Further reading

* Chapter 12 is about Tweedie distributions and models.
* Kaas, R. (2005). "Compound Poisson distribution and GLM’s – Tweedie’s distribution". In ''Proceedings of the Contact Forum "3rd Actuarial and Financial Mathematics Day"'', pages 3–12. Brussels: Royal Flemish Academy of Belgium for Science and the Arts.