Well-behaved Statistic
Although the term well-behaved statistic often appears in the scientific literature in roughly the same sense as well-behaved in mathematics (that is, to mean "non-pathological"), it can also be assigned a precise mathematical meaning, and in more than one way. In the former case, the meaning of the term varies from context to context. In the latter case, the mathematical conditions can be used to derive classes of combinations of distributions and statistics that are ''well-behaved'' in each sense.

First definition: The variance of a well-behaved statistical estimator is finite, and one condition on its mean is that it is differentiable in the parameter being estimated.

Second definition: The statistic is monotonic, well-defined, and locally sufficient.


Conditions for a Well-Behaved Statistic: First Definition

More formally, the conditions can be expressed in this way. T is a statistic for \theta that is a function of the sample X_1,\ldots,X_n. For T to be ''well-behaved'' we require:

: \operatorname{Var}_\theta\left[ T\left( X_1,\ldots,X_n \right) \right] < \infty \quad \forall\ \theta \in \Theta : Condition 1

: \operatorname{E}_\theta\left( T \right) differentiable in \theta \quad \forall\ \theta \in \Theta, and the derivative satisfies:

: \frac{\partial}{\partial\theta} \int T(x_1,\ldots,x_n) \prod_{i=1}^{n} f(x_i;\theta)\, dx_1\cdots dx_n = \int T(x_1,\ldots,x_n)\, \frac{\partial}{\partial\theta}\left[\prod_{i=1}^{n} f(x_i;\theta)\right] dx_1\cdots dx_n : Condition 2
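Condition 2 is the usual regularity condition permitting differentiation under the integral sign; since the right-hand side equals \operatorname{E}_\theta[T \cdot \text{score}], it can be verified numerically. The following minimal Python sketch does this for an illustrative choice not taken from the source, T = sample mean of m Bernoulli(p) draws, for which d/dp\, \operatorname{E}_p[T] = 1:

```python
# Monte Carlo check of Condition 2 for T = sample mean of m Bernoulli(p) draws:
# d/dp E_p[T] = 1 should match E_p[T * score], with
# score = sum_i (x_i - p) / (p * (1 - p)).
# Function name and the chosen T are illustrative, not from the source.
import random

def check_condition_2(p=0.3, m=5, n_sims=200_000, seed=0):
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_sims):
        xs = [1 if rng.random() <= p else 0 for _ in range(m)]
        t = sum(xs) / m                                   # statistic T
        score = sum((x - p) / (p * (1 - p)) for x in xs)  # d log-likelihood / dp
        acc += t * score
    return acc / n_sims                                   # estimate of E_p[T * score]

print(check_condition_2())   # should be close to d/dp E_p[T] = 1
```

The agreement of the two sides is exactly what Condition 2 asserts for this well-behaved pair of distribution and statistic.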


Conditions for a Well-Behaved Statistic: Second Definition

In order to derive the distribution law of the parameter \theta compatible with an observed sample \boldsymbol x, the statistic must obey some technical properties. Namely, a statistic ''s'' is said to be well-behaved if it satisfies the following three statements:
# ''Monotonicity''. A uniformly monotone relation exists between ''s'' and \theta for any fixed seed \{z_1,\ldots,z_m\} – so as to have a unique solution of (1);
# ''Well-definedness''. On each observed ''s'' the statistic is well defined for every value of \theta, i.e. any sample specification \{x_1,\ldots,x_m\}\in\mathfrak X^m such that \rho(x_1,\ldots,x_m)=s has a probability density different from 0 – so as to avoid considering a non-surjective mapping from \mathfrak X^m to \mathfrak S, i.e. associating via ''s'' to a sample \{x_1,\ldots,x_m\} a \theta that could not generate the sample itself;
# ''Local sufficiency''. \{\breve\theta_1,\ldots,\breve\theta_N\} constitutes a true sample of \theta values for the observed ''s'', so that the same probability distribution can be attributed to each sampled value. Here \breve\theta_j= h^{-1}(s,\breve z_1^j, \ldots,\breve z_m^j) is a solution of (1) with the seed \{\breve z_1^j, \ldots,\breve z_m^j\}. Since the seeds are equally distributed, the sole caveat comes from their independence of, or conversely their dependence on, \theta itself. This check can be restricted to the seeds involved by ''s'', i.e. the drawback can be avoided by requiring that the distribution of \{Z_1,\ldots,Z_m \mid s\} is independent of \theta. An easy way to check this property is by mapping seed specifications into x_i specifications. The mapping of course depends on \theta, but the distribution of \{X_1,\ldots,X_m \mid s\} will not depend on \theta if the above seed independence holds – a condition that looks like a ''local sufficiency'' of the statistic ''S''.

The remainder of the present article is mainly concerned with the context of data-mining procedures applied to statistical inference and, in particular, with the group of computationally intensive procedures that have been called algorithmic inference.
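The monotonicity property can be illustrated with the exponential explaining function discussed in the Example below: with the seeds held fixed, s = h(\lambda, \boldsymbol z) is strictly monotone in \lambda, so the master equation (1) has a unique solution. A quick numerical sketch (variable names are illustrative):

```python
# Monotonicity (property 1) for the exponential explaining function
# g_lambda(u) = -log(u) / lambda: with seeds fixed, the statistic
# s = h(lambda, z) = -(1/lambda) * sum(log z_i) is strictly decreasing
# in lambda, so the master equation pins down a unique solution.
import math
import random

rng = random.Random(1)
seeds = [rng.random() for _ in range(10)]      # fixed uniform seeds in (0, 1)

def h(lam, zs):
    """Explaining function of the sum statistic for the exponential case."""
    return -sum(math.log(z) for z in zs) / lam

values = [h(lam, seeds) for lam in (0.5, 1.0, 2.0, 4.0)]
assert all(a > b for a, b in zip(values, values[1:]))  # strictly decreasing
```

Since -\sum_i \log z_i is a positive constant once the seeds are fixed, h is strictly decreasing in \lambda, which is all that uniqueness of the solution requires.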


Algorithmic inference

In algorithmic inference, the property of a statistic that is of most relevance is the pivoting step, which allows the transference of probability considerations from the sample distribution to the distribution of the parameters representing the population distribution, in such a way that the conclusion of this statistical inference step is compatible with the sample actually observed.

By default, capital letters (such as ''U'', ''X'') denote random variables, small letters (''u'', ''x'') their corresponding realizations, and gothic letters (such as \mathfrak U, \mathfrak X) the domains where the variables take their specifications. Facing a sample \boldsymbol x=\{x_1,\ldots,x_m\}, given a sampling mechanism (g_\theta,Z) with \theta scalar, for the random variable ''X'' we have

:\boldsymbol x=\{g_\theta(z_1),\ldots,g_\theta(z_m)\}.

The sampling mechanism (g_\theta,\boldsymbol z) of the statistic ''s'', as a function \rho of \{x_1,\ldots,x_m\} with specifications in \mathfrak S, has an explaining function defined by the master equation:

: s=\rho(x_1,\ldots,x_m)=\rho(g_\theta(z_1),\ldots,g_\theta(z_m))=h(\theta,z_1,\ldots,z_m),\qquad\qquad\qquad (1)

for suitable seeds \boldsymbol z=\{z_1,\ldots,z_m\} and parameter \theta.
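As a concrete illustration of the pivoting step, the master equation for the exponential sum statistic (see the Example below) can be solved in closed form for each fresh set of seeds, producing a population of parameter values compatible with the observed ''s''. A hedged Python sketch, in which the sample size, seed counts, and names are illustrative choices:

```python
# Pivoting via the master equation for the exponential statistic:
# s = h(lambda, z) = -(1/lambda) * sum(log z_i)
# so each fresh seed set z-breve yields the solution
# lambda-breve = -sum(log z_i) / s.  All sizes and names are illustrative.
import math
import random

rng = random.Random(42)
true_lam, m = 2.0, 50
xs = [-math.log(rng.random()) / true_lam for _ in range(m)]  # simulated sample
s = sum(xs)                                                  # observed statistic

breves = sorted(-sum(math.log(rng.random()) for _ in range(m)) / s
                for _ in range(5000))      # one solution of (1) per seed set

print(breves[2500])   # median of the compatible-parameter population
```

The sorted values `breves` describe the distribution of the parameter compatible with the observed sample, which is exactly what the pivoting step transfers probability to.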


Example

For instance, for both the Bernoulli distribution with parameter ''p'' and the exponential distribution with parameter \lambda, the statistic \sum_{i=1}^m x_i is well-behaved. The satisfaction of the above three properties is straightforward when looking at both explaining functions: g_p(u)=1 if u\leq p, 0 otherwise, in the case of the Bernoulli random variable, and g_\lambda(u)=-\log u/\lambda for the exponential random variable, giving rise to the statistics

:s_p=\sum_{i=1}^m I_{[0,p]}(u_i)

and

:s_\lambda=-\frac{1}{\lambda}\sum_{i=1}^m \log u_i.

''Vice versa'', in the case of ''X'' following a continuous uniform distribution on [0,A], the same statistic does not meet the second requirement. For instance, an observed sample may give s'_A=11/6\,c. But the explaining function of this ''X'' is g_a(u)=u\,a. Hence a master equation s_A=\sum_{i=1}^m u_i\, a would produce, with a ''U'' sample, a solution \breve a=0.76\,c. This conflicts with the observed sample, since the first observed value would then be greater than the right extreme of the ''X'' range. The statistic s_A=\max\{x_1,\ldots,x_m\} is well-behaved in this case. Analogously, for a random variable ''X'' following the Pareto distribution with parameters ''K'' and ''A'' (see the Pareto example for more detail of this case),

:s_1=\sum_{i=1}^m \log x_i

and

:s_2=\min_{i=1,\ldots,m} \{x_i\}

can be used as joint statistics for these parameters. As a general statement that holds under weak conditions, sufficient statistics are well-behaved with respect to the related parameters. The table below gives sufficient/well-behaved statistics for the parameters of some of the most commonly used probability distributions.
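The failure of the sum statistic and the adequacy of the max statistic in the uniform case can be checked numerically. The sketch below uses an illustrative fixed observed sample (not the one from the source, whose values are lost in this copy): for many seed draws the sum-based solution \breve a falls below \max\{x_i\}, an ''A'' that could not have generated the sample, while the max-based solution never does, since \max\{u_i\}\leq 1:

```python
# Uniform-on-[0, A] case: the sum statistic can violate well-definedness,
# while the max statistic cannot. The observed sample is illustrative.
import random

rng = random.Random(7)
xs = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1]          # illustrative observed sample
m = len(xs)

bad = 0
for _ in range(2000):
    us = [rng.random() for _ in range(m)]    # fresh uniform seeds
    a_sum = sum(xs) / sum(us)                # solves sum(x) = a * sum(u)
    if a_sum < max(xs):                      # an A that could not generate xs
        bad += 1
    a_max = max(xs) / max(us)                # solves max(x) = a * max(u)
    assert a_max >= max(xs)                  # always a compatible A
print(bad)   # count of incompatible sum-based solutions
```

Every max-based solution is a feasible right extreme for the range of ''X'', which is the well-definedness requirement the sum statistic misses here.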

