
In
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, the standard score or ''z''-score is the number of
standard deviation
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...
s by which the value of a
raw score
Raw data, also known as primary data, are ''data'' (e.g., numbers, instrument readings, figures, etc.) collected from a source. In the context of examinations, the raw data might be described as a raw score (after test scores).
If a scientist ...
(i.e., an observed value or data point) is above or below the
mean
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
It is calculated by subtracting the
population mean
In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hyp ...
from an individual raw score and then dividing the difference by the
population
Population is a set of humans or other organisms in a given region or area. Governments conduct a census to quantify the resident population size within a given jurisdiction. The term is also applied to non-human animals, microorganisms, and pl ...
standard deviation. This process of converting a raw score into a standard score is called standardizing or normalizing (however, "normalizing" can refer to many types of ratios; see ''
Normalization
Normalization or normalisation refers to a process that makes something more normal or regular. Science
* Normalization process theory, a sociological theory of the implementation of new technologies or innovations
* Normalization model, used in ...
'' for more).
Standard scores are most commonly called ''z''-scores; the two terms may be used interchangeably, as they are in this article. Other equivalent terms in use include z-value, z-statistic, normal score, standardized variable and pull in
high energy physics
Particle physics or high-energy physics is the study of fundamental particles and forces that constitute matter and radiation. The field also studies combinations of elementary particles up to the scale of protons and neutrons, while the stu ...
.
Computing a z-score requires knowledge of the mean and standard deviation of the complete population to which a data point belongs; if one only has a
sample of observations from the population, then the analogous computation using the sample mean and sample standard deviation yields the
''t''-statistic.
Calculation
If the population mean and population standard deviation are known, a raw score
is converted into a standard score by
:
where:
: ''μ'' is the
mean
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
of the population,
: ''σ'' is the
standard deviation
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...
of the population.
The absolute value of represents the distance between that raw score and the population mean in units of the standard deviation. is negative when the raw score is below the mean, positive when above.
Calculating using this formula requires use of the population mean and the population standard deviation, not the sample mean or sample deviation. However, knowing the true mean and standard deviation of a population is often an unrealistic expectation, except in cases such as
standardized testing
A standardized test is a test that is administered and scored in a consistent or standard manner. Standardized tests are designed in such a way that the questions and interpretations are consistent and are administered and scored in a predetermine ...
, where the entire population is measured.
When the population mean and the population standard deviation are unknown, the standard score may be estimated by using the sample mean and sample standard deviation as estimates of the population values.
[ ][ ][ ][ ]
In these cases, the z-score is given by
:
where:
:
is the
mean
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
of the sample,
: ''S'' is the
standard deviation
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...
of the sample.
Though it should always be stated, the distinction between use of the population and sample statistics often is not made. In either case, the numerator and denominator of the equations have the same units of measure so that the units cancel out through division and z is left as a
dimensionless quantity
Dimensionless quantities, or quantities of dimension one, are quantities implicitly defined in a manner that prevents their aggregation into unit of measurement, units of measurement. ISBN 978-92-822-2272-0. Typically expressed as ratios that a ...
.
Applications
Z-test
The z-score is often used in the z-test in standardized testing – the analog of the
Student's t-test
Student's ''t''-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's ''t''- ...
for a population whose parameters are known, rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used.
Prediction intervals
The standard score can be used in the calculation of
prediction interval
In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval (statistics), interval in which a future observation will fall, with a certain probability, given what has already been observed. Pr ...
s. A prediction interval
'L'',''U'' consisting of a lower endpoint designated ''L'' and an upper endpoint designated ''U'', is an interval such that a future observation ''X'' will lie in the interval with high probability
, i.e.
: