Maximum spacing estimation
In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacings estimation (MPS), is a method for estimating the parameters of a univariate statistical model. The method requires maximization of the geometric mean of the ''spacings'' in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.

The concept underlying the method is based on the probability integral transform, in that a set of independent random samples derived from any random variable should on average be uniformly distributed with respect to the cumulative distribution function of the random variable. The MPS method chooses the parameter values that make the observed data as uniform as possible, according to a specific quantitative measure of uniformity.

One of the most common methods for estimating the parameters of a distribution from data, the method of maximum likelihood (MLE), can break down in various cases, such as those involving certain mixtures of continuous distributions. In these cases the method of maximum spacing estimation may be successful. Apart from its use in pure mathematics and statistics, trial applications of the method have been reported using data from fields such as hydrology, econometrics, magnetic resonance imaging, and others.


History and usage

The MSE method was derived independently by Russel Cheng and Nik Amin at the University of Wales Institute of Science and Technology, and by Bo Ranneby at the Swedish University of Agricultural Sciences. The authors explained that, due to the probability integral transform at the true parameter, the “spacings” between observations should be uniformly distributed. This would imply that the differences between the values of the cumulative distribution function at consecutive observations should be equal. This is the case that maximizes the geometric mean of such spacings, so solving for the parameters that maximize the geometric mean achieves the “best” fit as defined this way. The method was further justified by demonstrating that it is an estimator of the Kullback–Leibler divergence, similar to maximum likelihood estimation, but with more robust properties for some classes of problems.

There are certain distributions, especially those with three or more parameters, whose likelihoods may become infinite along certain paths in the parameter space. Using maximum likelihood to estimate these parameters often breaks down, with one parameter tending to the specific value that causes the likelihood to be infinite, rendering the other parameters inconsistent. The method of maximum spacings, however, being dependent on the differences between points on the cumulative distribution function rather than on individual likelihood values, does not have this issue, and returns valid results over a much wider array of distributions.

The distributions that tend to have likelihood issues are often those used to model physical phenomena. Hall, seeking to analyze flood alleviation methods, required accurate models of river flood effects; the distributions that better model these effects are all three-parameter models, which suffer from the infinite-likelihood issue described above, leading to Hall's investigation of the maximum spacing procedure. Studies comparing the method to maximum likelihood have used various data sets, ranging from a set on the oldest ages at death in Sweden between 1905 and 1958 to a set containing annual maximum wind speeds.


Definition

Given an iid random sample of size ''n'' from a univariate distribution with continuous cumulative distribution function ''F''(''x'';''θ''0), where ''θ''0 ∈ Θ is an unknown parameter to be estimated, let ''x''(1) ≤ ''x''(2) ≤ … ≤ ''x''(''n'') be the corresponding ordered sample, that is, the result of sorting all observations from smallest to largest. For convenience also denote ''x''(0) = −∞ and ''x''(''n''+1) = +∞. Define the ''spacings'' as the “gaps” between the values of the distribution function at adjacent ordered points:

D_i(\theta) = F(x_{(i)};\,\theta) - F(x_{(i-1)};\,\theta), \quad i=1,\ldots,n+1.

Then the maximum spacing estimator of ''θ''0 is defined as a value that maximizes the logarithm of the geometric mean of the sample spacings:

\hat\theta = \underset{\theta\in\Theta}{\arg\max}\; S_n(\theta), \quad \text{where}\quad S_n(\theta) = \ln \sqrt[n+1]{D_1 D_2 \cdots D_{n+1}} = \frac{1}{n+1}\sum_{i=1}^{n+1} \ln D_i(\theta).

By the inequality of arithmetic and geometric means, the function ''S''''n''(''θ'') is bounded from above by −ln(''n''+1), and thus the maximum has to exist at least in the supremum sense.

Note that some authors define the function ''S''''n''(''θ'') somewhat differently. In particular, some multiply each ''D''''i'' by a factor of (''n''+1), whereas others omit the 1/(''n''+1) factor in front of the sum and add a minus sign in order to turn the maximization into a minimization. As these are constants with respect to ''θ'', such modifications do not alter the location of the maximum of the function ''S''''n''.


Examples

This section presents two examples of calculating the maximum spacing estimator.


Example 1

Suppose two values ''x''(1) = 2, ''x''(2) = 4 were sampled from the exponential distribution ''F''(''x'';''λ'') = 1 − e−''xλ'', ''x'' ≥ 0, with unknown parameter ''λ'' > 0. In order to construct the MSE we first find the spacings:

 ''j''   ''F''(''x''(''j''); ''λ'')   difference
 0       0                            —
 1       1 − e−2''λ''                 1 − e−2''λ''
 2       1 − e−4''λ''                 e−2''λ'' − e−4''λ''
 3       1                            e−4''λ''

The process continues by finding the ''λ'' that maximizes the geometric mean of the “difference” column. Using the convention that ignores taking the (''n''+1)st root, this turns into the maximization of the product (1 − e−2''λ'') · (e−2''λ'' − e−4''λ'') · e−4''λ''. Letting ''μ'' = e−2''λ'', the problem becomes finding the maximum of ''μ''5 − 2''μ''4 + ''μ''3. Differentiating, ''μ'' has to satisfy 5''μ''4 − 8''μ''3 + 3''μ''2 = 0. This equation has roots 0, 0.6, and 1. As ''μ'' is actually e−2''λ'', it has to be greater than zero but less than one. Therefore, the only acceptable solution is

\mu = 0.6 \quad\Rightarrow\quad \hat\lambda_{\text{MSE}} = \frac{-\ln 0.6}{2} \approx 0.255,

which corresponds to an exponential distribution with mean 1/''λ'' ≈ 3.915. For comparison, the maximum likelihood estimate of ''λ'' is the inverse of the sample mean, 3, so ''λ''MLE = ⅓ ≈ 0.333.
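The closed-form solution can be checked numerically. This short Python sketch (variable and function names are illustrative) evaluates the product of spacings around the claimed maximizer:

```python
import math

def spacings_product(lam):
    """Product of the three spacings for the sample x=(2, 4) under Exp(lam)."""
    F = lambda x: 1.0 - math.exp(-lam * x)
    u = [0.0, F(2.0), F(4.0), 1.0]
    p = 1.0
    for a, b in zip(u, u[1:]):
        p *= b - a
    return p

lam_mse = -math.log(0.6) / 2   # from mu = e^{-2 lambda} = 0.6
print(round(lam_mse, 3))       # 0.255
```

A quick sanity check: `spacings_product(lam_mse)` exceeds its value at the MLE ''λ'' = ⅓, since the two estimators optimize different criteria.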


Example 2

Suppose ''x''(1) ≤ ''x''(2) ≤ … ≤ ''x''(''n'') is the ordered sample from a uniform distribution ''U''(''a'',''b'') with unknown endpoints ''a'' and ''b''. The cumulative distribution function is ''F''(''x'';''a'',''b'') = (''x''−''a'')/(''b''−''a'') when ''x'' ∈ [''a'',''b'']. Therefore, individual spacings are given by

D_1 = \frac{x_{(1)}-a}{b-a}, \quad D_i = \frac{x_{(i)}-x_{(i-1)}}{b-a}\ \ \text{for}\ i = 2, \ldots, n, \quad D_{n+1} = \frac{b-x_{(n)}}{b-a}.

Calculating the geometric mean and then taking the logarithm, the statistic ''S''''n'' will be equal to

S_n(a,b) = \tfrac{1}{n+1}\Big(\ln(x_{(1)}-a) + \sum_{i=2}^n \ln(x_{(i)}-x_{(i-1)}) + \ln(b-x_{(n)})\Big) - \ln(b-a).

Here only three terms depend on the parameters ''a'' and ''b''. Differentiating with respect to those parameters and solving the resulting linear system, the maximum spacing estimates will be

\hat a = \frac{n x_{(1)} - x_{(n)}}{n-1}, \quad \hat b = \frac{n x_{(n)} - x_{(1)}}{n-1}.

These are known to be the uniformly minimum variance unbiased (UMVU) estimators for the continuous uniform distribution. In comparison, the maximum likelihood estimates for this problem, \hat a = x_{(1)} and \hat b = x_{(n)}, are biased and have higher mean-squared error.
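The closed-form estimates above are straightforward to compute; a minimal Python sketch (the function name is illustrative):

```python
def uniform_mse_estimates(sample):
    """Maximum-spacing (equivalently UMVU) endpoint estimates for U(a, b):
    a_hat = (n*x_(1) - x_(n))/(n-1),  b_hat = (n*x_(n) - x_(1))/(n-1)."""
    xs = sorted(sample)
    n = len(xs)
    a_hat = (n * xs[0] - xs[-1]) / (n - 1)
    b_hat = (n * xs[-1] - xs[0]) / (n - 1)
    return a_hat, b_hat

# The MLE endpoints are simply the sample min and max, which always lie
# strictly inside the MSE interval for n >= 2 distinct points.
print(uniform_mse_estimates([1.0, 2.0, 3.0, 4.0]))  # (0.0, 5.0)
```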


Properties


Consistency and efficiency

The maximum spacing estimator is a consistent estimator in that it converges in probability to the true value of the parameter, ''θ''0, as the sample size increases to infinity. The consistency of maximum spacing estimation holds under much more general conditions than for maximum likelihood estimators. In particular, in cases where the underlying distribution is J-shaped, maximum likelihood will fail where MSE succeeds. An example of a J-shaped density is the Weibull distribution, specifically a shifted Weibull, with a shape parameter less than 1. The density will tend to infinity as ''x'' approaches the location parameter, rendering estimates of the other parameters inconsistent.

Maximum spacing estimators are also at least as asymptotically efficient as maximum likelihood estimators, where the latter exist. However, MSEs may exist in cases where MLEs do not.


Sensitivity

Maximum spacing estimators are sensitive to closely spaced observations, and especially ties. Given

X_{(i+k)} = X_{(i+k-1)} = \cdots = X_{(i)},

we get

D_{i+k}(\theta) = D_{i+k-1}(\theta) = \cdots = D_{i+1}(\theta) = 0.

When the ties are due to multiple observations, the repeated spacings (those that would otherwise be zero) should be replaced by the corresponding likelihood. That is, one should substitute f(x_{(i)};\theta) for D_i(\theta), since

\lim_{x_{(i)}\to x_{(i-1)}} \frac{F(x_{(i)};\theta) - F(x_{(i-1)};\theta)}{x_{(i)}-x_{(i-1)}} = f(x_{(i-1)};\theta) = f(x_{(i)};\theta)

when x_{(i)} = x_{(i-1)}.

When ties are due to rounding error, Cheng and Stephens suggest another method to remove the effects. Given ''r'' tied observations from ''x''''i'' to ''x''''i''+''r''−1, let ''δ'' represent the round-off error. All of the true values should then fall in the range x \pm \delta. The corresponding points on the distribution should now fall between y_L = F(x-\delta;\,\hat\theta) and y_U = F(x+\delta;\,\hat\theta). Cheng and Stephens suggest assuming that the rounded values are uniformly spaced in this interval, defining

D_j = \frac{y_U - y_L}{r} \quad (j=i+1,\ldots,i+r-1).

The MSE method is also sensitive to secondary clustering. One example of this phenomenon is when a set of observations is thought to come from a single normal distribution, but in fact comes from a mixture of normals with different means. A second example is when the data is thought to come from an exponential distribution, but actually comes from a gamma distribution. In the latter case, smaller spacings may occur in the lower tail. A high value of ''M''(''θ'') would indicate this secondary clustering effect, suggesting that a closer look at the data is required.
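The likelihood substitution for tied observations can be sketched as follows. This is an illustrative Python fragment (the exponential model and the function name are assumptions for the example, not part of the original method's presentation) that replaces each zero spacing with the density at the tied point:

```python
import math

def log_spacings_with_ties(rate, sample):
    """Mean log-spacing under an Exp(rate) model, substituting the density
    f(x; rate) for spacings made zero by tied observations (a sketch of the
    likelihood substitution described above)."""
    xs = sorted(sample)
    F = lambda x: 1.0 - math.exp(-rate * x)
    f = lambda x: rate * math.exp(-rate * x)
    u = [0.0] + [F(x) for x in xs] + [1.0]
    terms = []
    for i, (a, b) in enumerate(zip(u, u[1:])):
        if b - a > 0:
            terms.append(math.log(b - a))
        else:
            # Tied observations: replace the zero spacing with log f(x).
            terms.append(math.log(f(xs[i - 1])))
    return sum(terms) / len(terms)
```

Without the substitution, a single tie would drive the objective to −∞ for every ''θ''; with it, the criterion blends spacings and likelihood contributions.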


Moran test

The statistic ''S''''n''(''θ'') is also a form of the Moran or Moran–Darling statistic, ''M''(''θ''), which can be used to test goodness of fit. It has been shown that the statistic, when defined as

S_n(\theta) = M_n(\theta) = -\sum_{i=1}^{n+1}\ln D_i(\theta),

is asymptotically normal, and that a chi-squared approximation exists for small samples. In the case where the true parameter \theta^0 is known, Cheng and Stephens show that the statistic M_n(\theta) has a normal distribution with

\begin{align} \mu_M &\approx (n+1)\big(\ln(n+1)+\gamma\big) - \tfrac{1}{2} - \tfrac{1}{12(n+1)},\\ \sigma^2_M &\approx (n+1)\Big(\tfrac{\pi^2}{6} - 1\Big) - \tfrac{1}{2} - \tfrac{1}{6(n+1)}, \end{align}

where ''γ'' is the Euler–Mascheroni constant, approximately 0.57722. The distribution can also be approximated by that of A, where

A = C_1 + C_2\,\chi^2_n,

in which

\begin{align} C_1 &= \mu_M - \sqrt{\tfrac{n}{2}\,\sigma^2_M},\\ C_2 &= \sqrt{\tfrac{\sigma^2_M}{2n}}, \end{align}

and where \chi^2_n follows a chi-squared distribution with ''n'' degrees of freedom. Therefore, to test the hypothesis H_0 that a random sample of ''n'' values comes from the distribution F(x;\theta), the statistic

T(\theta) = \frac{M_n(\theta) - C_1}{C_2}

can be calculated. Then H_0 should be rejected with significance \alpha if the value is greater than the critical value of the appropriate chi-squared distribution. Where ''θ''0 is being estimated by \hat\theta, it has been shown that S_n(\hat\theta) = M_n(\hat\theta) has the same asymptotic mean and variance as in the known case. However, the test statistic to be used requires the addition of a bias-correction term:

T(\hat\theta) = \frac{M_n(\hat\theta) + \tfrac{k}{2} - C_1}{C_2},

where ''k'' is the number of parameters in the estimate.
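The approximation above can be assembled into a test statistic in a few lines. The following Python sketch (names are illustrative; the constants follow the approximations quoted above) computes T for a given set of spacings:

```python
import math

def moran_test_statistic(spacings, k=0):
    """Moran statistic M_n with the chi-squared-style normalization quoted
    above; k is the number of estimated parameters (0 if theta is known)."""
    n = len(spacings) - 1                      # there are n+1 spacings
    M = -sum(math.log(d) for d in spacings)
    gamma = 0.57722                            # Euler-Mascheroni constant
    mu = (n + 1) * (math.log(n + 1) + gamma) - 0.5 - 1.0 / (12 * (n + 1))
    var = (n + 1) * (math.pi ** 2 / 6 - 1) - 0.5 - 1.0 / (6 * (n + 1))
    c1 = mu - math.sqrt(n * var / 2)
    c2 = math.sqrt(var / (2 * n))
    return (M + k / 2 - c1) / c2               # compare to a chi^2_n quantile
```

By the AM–GM bound, perfectly uniform spacings minimize M_n at (''n''+1)ln(''n''+1), so the resulting T falls far below any chi-squared rejection threshold, as expected for a perfect fit.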


Generalized maximum spacing


Alternate measures and spacings

The MSE method has been generalized to approximate other measures besides the Kullback–Leibler measure. It has been further expanded to investigate properties of estimators using higher-order spacings, where an ''m''-order spacing would be defined as

F(X_{(i+m)}) - F(X_{(i)}).
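For concreteness, ''m''-order spacings can be computed directly from the sorted CDF values; a minimal Python sketch (the helper name is illustrative, not from the cited work):

```python
def m_order_spacings(u, m):
    """m-order spacings F(x_{(i+m)}) - F(x_{(i)}) over sorted CDF values u
    (with u[0] = 0 and u[-1] = 1 as the conventional endpoints)."""
    return [u[i + m] - u[i] for i in range(len(u) - m)]

# With m = 1 this reduces to the ordinary spacings D_i used throughout.
print(m_order_spacings([0.0, 0.25, 0.5, 0.75, 1.0], 2))  # [0.5, 0.5, 0.5]
```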


Multivariate distributions

Extensions of maximum spacing methods to the multivariate case have also been discussed. As there is no natural order for \mathbb{R}^k (k>1), two alternative approaches are considered: a geometric approach based on Dirichlet cells and a probabilistic approach based on a “nearest neighbor ball” metric.


See also

* Kullback–Leibler divergence
* Maximum likelihood
* Probability distribution

