A statistic (singular) or sample statistic is any quantity computed from values in a
sample
Sample or samples may refer to:
Base meaning
* Sample (statistics), a subset of a population – complete data set
* Sample (signal), a digital discrete sample of a continuous analog signal
* Sample (material), a specimen or small quantity of s ...
which is considered for a statistical purpose. Statistical purposes include
estimating
Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is der ...
a
population
Population typically refers to the number of people in a single area, whether it be a city or town, region, country, continent, or the world. Governments typically quantify the size of the resident population within their jurisdiction using a ...
parameter, describing a sample, or evaluating a hypothesis. The
average (or mean) of sample values is a statistic. The term statistic is used both for the function and for the value of the function on a given sample. When a statistic is being used for a specific purpose, it may be referred to by a name indicating its purpose.
When a statistic is used for estimating a population parameter, the statistic is called an ''
estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
''. A population parameter is any characteristic of a population under study, but when it is not feasible to directly measure the value of a population parameter, statistical methods are used to infer the likely value of the parameter on the basis of a statistic computed from a sample taken from the population. For example, the
sample mean
The sample mean (or "empirical mean") and the sample covariance are statistics computed from a Sample (statistics), sample of data on one or more random variables.
The sample mean is the average value (or mean, mean value) of a sample (statistic ...
is an
unbiased estimator
In statistics, the bias of an estimator (or bias function) is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called ''unbiased''. In stat ...
of the
population mean
In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothe ...
. This means that the
expected value
In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a l ...
of the sample mean equals the true population mean.
A ''
descriptive statistic
A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and an ...
'' is used to summarize the sample data. A ''
test statistic
A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing.Berger, R. L.; Casella, G. (2001). ''Statistical Inference'', Duxbury Press, Second Edition (p.374) A hypothesis test is typically specifi ...
'' is used in
statistical hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.
...
. Note that a single statistic can be used for multiple purposes – for example the sample mean can be used to estimate the population mean, to describe a sample data set, or to test a hypothesis.
Examples
Some examples of statistics are:
* "In a recent survey of Americans, 52% of
Republicans say global warming is happening."
In this case, "52%" is a statistic, namely the percentage of Republicans in the survey sample who believe in global warming. The population is the
set
Set, The Set, SET or SETS may refer to:
Science, technology, and mathematics Mathematics
*Set (mathematics), a collection of elements
*Category of sets, the category whose objects and morphisms are sets and total functions, respectively
Electro ...
of all Republicans in the United States, and the population parameter being estimated is the percentage of ''all'' Republicans in the United States, not just those surveyed, who believe in global warming.
* "The manager of a large hotel located near Disney World indicated that 20 selected guests had a mean length of stay equal to 5.6 days."
In this example, "5.6 days" is a statistic, namely the mean length of stay for our sample of 20 hotel guests. The population is the set of all guests of this hotel, and the population parameter being estimated is the mean length of stay for ''all'' guests. Note that whether the estimator is unbiased in this case depends upon the sample selection process; see
the inspection paradox.
There are a variety of functions that are used to calculate statistics. Some include:
*
Sample mean
The sample mean (or "empirical mean") and the sample covariance are statistics computed from a Sample (statistics), sample of data on one or more random variables.
The sample mean is the average value (or mean, mean value) of a sample (statistic ...
,
sample median
Sample or samples may refer to:
Base meaning
* Sample (statistics), a subset of a population – complete data set
* Sample (signal), a digital discrete sample of a continuous analog signal
* Sample (material), a specimen or small quantity of so ...
, and
sample mode
*
Sample variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers ...
and sample
standard deviation
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while ...
* Sample
quantile
In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile th ...
s besides the
median
In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic fe ...
, e.g.,
quartile
In statistics, a quartile is a type of quantile which divides the number of data points into four parts, or ''quarters'', of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a ...
s and
percentile
In statistics, a ''k''-th percentile (percentile score or centile) is a score ''below which'' a given percentage ''k'' of scores in its frequency distribution falls (exclusive definition) or a score ''at or below which'' a given percentage falls ...
s
*
Test statistic
A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing.Berger, R. L.; Casella, G. (2001). ''Statistical Inference'', Duxbury Press, Second Edition (p.374) A hypothesis test is typically specifi ...
s, such as
t-statistic
In statistics, the ''t''-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. It is used in hypothesis testing via Student's ''t''-test. The ''t''-statistic is used in a ...
,
chi-squared statistic
A chi-squared test (also chi-square or test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables ...
,
f statistic
An ''F''-test is any statistical test in which the test statistic has an ''F''-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model th ...
*
Order statistic
In statistics, the ''k''th order statistic of a statistical sample is equal to its ''k''th-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.
Import ...
s, including sample maximum and minimum
* Sample
moments and functions thereof, including
kurtosis
In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
and
skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
For a unimodal d ...
* Various
functionals of the
empirical distribution function
In statistics, an empirical distribution function (commonly also called an empirical Cumulative Distribution Function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function ...
Properties
Observability
Statisticians often contemplate a
parameterized family of
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
s, any member of which could be the distribution of some measurable aspect of each member of a population, from which a sample is drawn randomly. For example, the parameter may be the average height of 25-year-old men in North America. The height of the members of a sample of 100 such men are measured; the average of those 100 numbers is a statistic. The average of the heights of all members of the population is not a statistic unless that has somehow also been ascertained (such as by measuring every member of the population). The average height that would be calculated using ''all'' of the individual heights of ''all'' 25-year-old North American men is a parameter, and not a statistic.
Statistical properties
Important potential properties of statistics include
completeness,
consistency
In classical deductive logic, a consistent theory is one that does not lead to a logical contradiction. The lack of contradiction can be defined in either semantic or syntactic terms. The semantic definition states that a theory is consistent ...
,
sufficiency,
unbiased
Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group, ...
ness,
minimum mean square error
In statistics and signal processing, a minimum mean square error (MMSE) estimator is an estimation method which minimizes the mean square error (MSE), which is a common measure of estimator quality, of the fitted values of a dependent variable. In ...
, low
variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers ...
,
robustness
Robustness is the property of being strong and healthy in constitution. When it is transposed into a system, it refers to the ability of tolerating perturbations that might affect the system’s functional body. In the same line ''robustness'' ca ...
, and computational convenience.
Information of a statistic
Information of a statistic on model parameters can be defined in several ways. The most common is the
Fisher information
In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
, which is defined on the statistic model induced by the statistic.
Kullback information measure can also be used.
See also
*
Statistics
Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
*
Statistical theory
The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics.
The theory covers approaches to statistical-decision problems and to statistica ...
*
Descriptive statistics
A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and an ...
*
Statistical hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.
...
*
Summary statistic
In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in
* a measure of ...
*
Well-behaved statistic
Although the term well-behaved statistic often seems to be used in the scientific literature in somewhat the same way as is well-behaved in mathematics (that is, to mean "non-pathological") it can also be assigned precise mathematical meaning, and ...
References
*
*Parker, Sybil P (editor in chief). "Statistic". McGraw-Hill Dictionary of Scientific and Technical Terms. Fifth Edition. McGraw-Hill, Inc. 1994. . Page 1912.
*DeGroot and Schervish. "Definition of a Statistic". Probability and Statistics. International Edition. Third Edition. Addison Wesley. 2002. . Pages 370 to 371.
{{Statistics, inference
*