In statistics, the standard score or ''z''-score is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

A standard score is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This process of converting a raw score into a standard score is called standardizing or normalizing (however, "normalizing" can refer to many types of ratios; see ''Normalization'' for more).

Standard scores are most commonly called ''z''-scores; the two terms may be used interchangeably, as they are in this article. Other equivalent terms in use include z-value, z-statistic, normal score, standardized variable and, in high energy physics, pull.

Computing a z-score requires knowledge of the mean and standard deviation of the complete population to which a data point belongs; if one only has a sample of observations from the population, then the analogous computation using the sample mean and sample standard deviation yields the ''t''-statistic.

Calculation

If the population mean and population standard deviation are known, a raw score ''x'' is converted into a standard score by
: z = \frac{x - \mu}{\sigma}
where:
: ''μ'' is the mean of the population,
: ''σ'' is the standard deviation of the population.

The absolute value of ''z'' represents the distance between that raw score ''x'' and the population mean in units of the standard deviation. ''z'' is negative when the raw score is below the mean, positive when above.

Calculating ''z'' using this formula requires use of the population mean and the population standard deviation, not the sample mean or sample deviation. However, knowing the true mean and standard deviation of a population is often an unrealistic expectation, except in cases such as standardized testing, where the entire population is measured.

When the population mean and the population standard deviation are unknown, the standard score may be estimated by using the sample mean and sample standard deviation as estimates of the population values. In these cases, the z-score is given by :

: z = \frac{x - \bar{x}}{S}
where:
: \bar{x} is the mean of the sample,
: ''S'' is the standard deviation of the sample.

Though it should always be stated, the distinction between use of the population and sample statistics often is not made. In either case, the numerator and denominator of the equations have the same units of measure so that the units cancel out through division and ''z'' is left as a dimensionless quantity.
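The two formulas in this section can be illustrated with a minimal Python sketch. The function name `z_score` and the data values are hypothetical, chosen only so that the arithmetic is easy to follow; the same list is treated first as a complete population and then as a sample.

```python
from statistics import fmean, pstdev, stdev


def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical data, used for illustration only.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = fmean(scores)

# Case 1: the list is the whole population, so divide by the population SD.
sigma = pstdev(scores)
z_population = z_score(9.0, mean, sigma)

# Case 2: the list is only a sample, so the sample SD (n - 1 denominator)
# stands in for the unknown population value.
s = stdev(scores)
z_sample = z_score(9.0, mean, s)

print(z_population, z_sample)
```

Because the sample standard deviation is slightly larger than the population standard deviation for the same data, the second value is slightly smaller in magnitude than the first.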

Applications

Z-test

The z-score is often used in the z-test in standardized testing, the analog of the Student's ''t''-test for a population whose parameters are known, rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used.

Prediction intervals

The standard score can be used in the calculation of prediction intervals. A prediction interval [''L'',''U''], consisting of a lower endpoint designated ''L'' and an upper endpoint designated ''U'', is an interval such that a future observation ''X'' will lie in the interval with high probability \gamma, i.e.
: P(L < X < U) = \gamma.
For the standard score ''Z'' of ''X'' it gives:
: P\left( \frac{L - \mu}{\sigma} < Z < \frac{U - \mu}{\sigma} \right) = \gamma.
By determining the quantile ''z'' such that
: P\left( -z < Z < z \right) = \gamma
it follows:
: L=\mu-z\sigma,\ U=\mu+z\sigma
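As a sketch of the calculation above, the following Python fragment computes ''L'' and ''U'' under the added assumption that ''X'' is normally distributed with known ''μ'' and ''σ'', so that ''Z'' is standard normal. The function name `prediction_interval` and the numerical inputs are hypothetical.

```python
from statistics import NormalDist


def prediction_interval(mu, sigma, gamma):
    """Return (L, U) with P(L < X < U) = gamma, assuming X ~ Normal(mu, sigma)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    # Quantile z with P(-z < Z < z) = gamma for a standard normal Z:
    # the upper tail beyond z must have probability (1 - gamma) / 2.
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)
    return mu - z * sigma, mu + z * sigma


# Hypothetical parameters, for illustration only.
lower, upper = prediction_interval(mu=100.0, sigma=15.0, gamma=0.95)
print(lower, upper)
```

For ''γ'' = 0.95 the quantile is approximately 1.96, so the interval extends about 1.96 standard deviations on either side of the mean.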

Process control

In process control applications, the Z value provides an assessment of the degree to which a process is operating off-target.

Comparison of scores measured on different scales: ACT and SAT

When scores are measured on different scales, they may be converted to z-scores to aid comparison. Dietz et al. give the following example, comparing student scores on the (old) SAT and ACT high school tests. Total SAT scores have a mean of 1500 and a standard deviation of 300; total ACT scores have a mean of 21 and a standard deviation of 5. Suppose that student A scored 1800 on the SAT, and student B scored 24 on the ACT. Which student performed better relative to other test-takers?

The z-score for student A is
: z = \frac{x - \mu}{\sigma} = \frac{1800 - 1500}{300} = 1

The z-score for student B is
: z = \frac{x - \mu}{\sigma} = \frac{24 - 21}{5} = 0.6

Because student A has a higher z-score than student B, student A performed better compared to other test-takers than did student B.
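The comparison can be reproduced in a few lines of Python. The means and standard deviations used here (1500 and 300 for the SAT, 21 and 5 for the ACT) are assumed values, consistent with the z-scores of 1 and 0.6 given above; the helper name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Standard score of x for a distribution with the given mean and SD."""
    return (x - mean) / sd


# Assumed test statistics (see the lead-in); the raw scores are from the example.
z_a = z_score(1800, mean=1500, sd=300)  # student A, SAT
z_b = z_score(24, mean=21, sd=5)        # student B, ACT

better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```

Comparing the raw scores 1800 and 24 directly would be meaningless; only after both are expressed in standard deviation units does the ordering become interpretable.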

Percentage of observations below a z-score

Continuing the example of ACT and SAT scores, if it can be further assumed that both ACT and SAT scores are normally distributed (which is approximately correct), then the z-scores may be used to calculate the percentage of test-takers who received lower scores than students A and B.
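Under the normality assumption just stated, the percentage of test-takers below a given z-score is the standard normal cumulative distribution function evaluated at that z-score. A minimal sketch using Python's standard library, with the z-scores 1 and 0.6 from the example:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Percentage of test-takers scoring below each student,
# assuming scores are normally distributed.
pct_below_a = 100.0 * std_normal.cdf(1.0)
pct_below_b = 100.0 * std_normal.cdf(0.6)

print(round(pct_below_a, 1), round(pct_below_b, 1))
```

This gives roughly 84% for student A and roughly 73% for student B.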

Cluster analysis and multidimensional scaling

"For some multivariate techniques such as multidimensional scaling and cluster analysis, the concept of distance between the units in the data is often of considerable interest and importance… When the variables in a multivariate data set are on different scales, it makes more sense to calculate the distances after some form of standardization."

Principal components analysis

In principal components analysis, "Variables measured on different scales or on a common scale with widely differing ranges are often standardized."
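The reason standardization matters here can be sketched numerically: without it, a variable measured in large units accounts for almost all of the total variance, and a variance-based method such as principal components analysis would be dominated by it. The data and the names `standardize`, `heights_cm` and `incomes` below are hypothetical.

```python
from statistics import fmean, pstdev, pvariance


def standardize(values):
    """Convert values to z-scores using their own mean and population SD."""
    m, s = fmean(values), pstdev(values)  # assumes the values are not all equal
    return [(v - m) / s for v in values]


# Hypothetical measurements on very different scales.
heights_cm = [160.0, 180.0, 170.0, 175.0]
incomes = [30000.0, 31000.0, 50000.0, 42000.0]

# Share of total variance contributed by income, before standardization.
raw_share = pvariance(incomes) / (pvariance(incomes) + pvariance(heights_cm))

# After standardization each variable has variance 1, so the shares are equal.
z_heights = standardize(heights_cm)
z_incomes = standardize(incomes)
std_share = pvariance(z_incomes) / (pvariance(z_incomes) + pvariance(z_heights))

print(raw_share, std_share)
```

The same reasoning applies to distance-based methods, where an unstandardized large-scale variable dominates any Euclidean distance between units.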

Relative importance of variables in multiple regression: standardized regression coefficients

Standardization of variables prior to multiple regression analysis is sometimes used as an aid to interpretation. One source (page 95) states the following: "The standardized regression slope is the slope in the regression equation if X and Y are standardized … Standardization of X and Y is done by subtracting the respective means from each set of observations and dividing by the respective standard deviations … In multiple regression, where several X variables are used, the standardized regression coefficients quantify the relative contribution of each X variable."

However, Kutner et al. (p 278) give the following caveat: "… one must be cautious about interpreting any regression coefficients, whether standardized or not. The reason is that when the predictor variables are correlated among themselves, … the regression coefficients are affected by the other predictor variables in the model … The magnitudes of the standardized regression coefficients are affected not only by the presence of correlations among the predictor variables but also by the spacings of the observations on each of these variables. Sometimes these spacings may be quite arbitrary. Hence, it is ordinarily not wise to interpret the magnitudes of standardized regression coefficients as reflecting the comparative importance of the predictor variables."
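For the simplest case of a single predictor, the standardized slope described in the first quotation can be sketched as follows. This illustrates only the one-variable case, not multiple regression; the data and the helper names `ols_slope` and `standardize` are hypothetical.

```python
from statistics import fmean, stdev


def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = fmean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

raw_slope = ols_slope(xs, ys)
std_slope = ols_slope(standardize(xs), standardize(ys))

# The standardized slope equals the raw slope rescaled by s_x / s_y.
rescaled = raw_slope * stdev(xs) / stdev(ys)
print(raw_slope, std_slope, rescaled)
```

With one predictor the standardized slope coincides with the correlation coefficient, so its magnitude cannot exceed 1. With several correlated predictors no such simple reading is available, which is the point of the caveat quoted above.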

Standardizing in mathematical statistics

In mathematical statistics, a random variable ''X'' is standardized by subtracting its expected value \operatorname{E}[X] and dividing the difference by its standard deviation \sigma(X) = \sqrt{\operatorname{Var}(X)}:
: Z = \frac{X - \operatorname{E}[X]}{\sigma(X)}

If the random variable under consideration is the sample mean of a random sample X_1,\dots, X_n of ''X'':
: \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i
then the standardized version is
: Z = \frac{\bar{X} - \operatorname{E}[\bar{X}]}{\sigma(X)/\sqrt{n}}
where the variance of the sample mean is calculated as follows, using the independence of the X_i and their common variance \sigma^2:
: \begin{align}
\operatorname{Var}\left(\sum X_i\right) &= \sum \operatorname{Var}(X_i) = n\operatorname{Var}(X_i) = n\sigma^2 \\
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\left(\frac{\sum X_i}{n}\right) = \frac{1}{n^2} \operatorname{Var}\left(\sum X_i\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{align}
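A small numerical sketch of the standardized sample mean, using the fact that the expected value of the sample mean equals the population mean ''μ''. The function name `standardized_sample_mean`, the sample values and the population parameters are hypothetical.

```python
from math import sqrt
from statistics import fmean


def standardized_sample_mean(sample, mu, sigma):
    """Standardize the sample mean using known population mean and SD.

    The standard deviation of the sample mean is sigma / sqrt(n).
    """
    n = len(sample)
    return (fmean(sample) - mu) / (sigma / sqrt(n))


# Hypothetical sample of n = 4 observations; its mean is 51.5.
sample = [52.0, 48.0, 55.0, 51.0]
z = standardized_sample_mean(sample, mu=50.0, sigma=4.0)
print(z)  # (51.5 - 50) / (4 / 2) = 0.75
```

Note that the denominator shrinks as ''n'' grows, so the same deviation of the sample mean from ''μ'' corresponds to a larger standardized value in a larger sample.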

T-score

In educational assessment, a T-score is a standard score ''Z'' shifted and scaled to have a mean of 50 and a standard deviation of 10. It is also known as ''hensachi'' in Japanese; in Japan the concept is much more widely known and used in the context of high school and university admissions.

In bone density measurements, the T-score is the standard score of the measurement compared to the population of healthy 30-year-old adults, and has the usual mean of 0 and standard deviation of 1.
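The educational T-score is a linear transformation of the standard score, ''T'' = 50 + 10''Z''. A minimal Python sketch of the conversion in both directions follows; the function names are hypothetical, and the bone-density T-score, which is an ordinary standard score, is not covered.

```python
def t_score_from_z(z):
    """Educational T-score: mean 50, standard deviation 10."""
    return 50.0 + 10.0 * z


def z_from_t_score(t):
    """Inverse conversion back to a standard score."""
    return (t - 50.0) / 10.0


print(t_score_from_z(1.0))   # one SD above the mean -> 60.0
print(t_score_from_z(-0.5))  # half an SD below the mean -> 45.0
```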
