Deviation from Mean of a Random Distribution

mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...

and

statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...

, deviation serves as a measure to quantify the disparity between an observed value of a variable and another designated value, frequently the mean of that variable. Deviations with respect to the

sample mean The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or me ...

and the

population mean In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hyp ...

(or "

true value The True Value Company is an American wholesaler and Hardware store brand. The corporate headquarters are located in Chicago. Historically True Value was a cooperative owned by retailers, but in 2018 it was purchased by ACON Investments. In Oc ...

") are called ''errors'' and ''residuals'', respectively. The

sign A sign is an object, quality, event, or entity whose presence or occurrence indicates the probable presence or occurrence of something else. A natural sign bears a causal relation to its object—for instance, thunder is a sign of storm, or me ...

of the deviation reports the direction of that difference: the deviation is positive when the observed value exceeds the reference value. The

absolute value In mathematics, the absolute value or modulus of a real number x, is the non-negative value without regard to its sign. Namely, , x, =x if x is a positive number, and , x, =-x if x is negative (in which case negating x makes -x positive), ...

of the deviation indicates the size or magnitude of the difference. In a given sample, there are as many deviations as sample points.

Summary statistics In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in * a measure of ...

can be derived from a set of deviations, such as the ''

standard deviation In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...

'' and the '' mean absolute deviation'', measures of dispersion, and the '' mean signed deviation'', a measure of bias. The deviation of each data point is calculated by subtracting the mean of the data set from the individual data point. Mathematically, the deviation ''d'' of a data point ''x'' in a data set with respect to the mean ''m'' is given by the difference: :

d = x - m

This calculation represents the "distance" of a data point from the mean and provides information about how much individual values vary from the average. Positive deviations indicate values above the mean, while negative deviations indicate values below the mean. The sum of squared deviations is a key component in the calculation of

variance In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...

, another measure of the spread or dispersion of a data set. Variance is calculated by averaging the squared deviations. Deviation is a fundamental concept in understanding the distribution and variability of data points in statistical analysis.

Types

A deviation that is a difference between an observed value and the ''true value'' of a quantity of interest (where ''true value'' denotes the Expected Value, such as the population mean) is an error.

Signed deviations

A deviation that is the difference between the observed value and an estimate of the true value (e.g. the sample mean) is a ''residual''. These concepts are applicable for data at the interval and

ratio In mathematics, a ratio () shows how many times one number contains another. For example, if there are eight oranges and six lemons in a bowl of fruit, then the ratio of oranges to lemons is eight to six (that is, 8:6, which is equivalent to the ...

levels of measurement.

Unsigned or absolute deviation

*Absolute deviation in statistics is a metric that measures the overall difference between individual data points and a central value, typically the mean or median of a dataset. It is determined by taking the absolute value of the difference between each data point and the central value and then averaging these absolute differences. The formula is expressed as follows:

D_i = , x_i - m(X), ,

where * ''D_i'' is the absolute deviation, * ''x_i'' is the data element, * ''m''(''X'') is the chosen measure of

central tendency In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution.Weisberg H.F (1992) ''Central Tendency and Variability'', Sage University Paper Series on Quantitative Applications in ...

of the data set—sometimes the

mean A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...

(

\overline

), but most often the

median The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...

. The average absolute deviation (AAD) in statistics is a measure of the dispersion or spread of a set of data points around a central value, usually the mean or median. It is calculated by taking the average of the absolute differences between each data point and the chosen central value. AAD provides a measure of the typical magnitude of deviations from the central value in a dataset, giving insights into the overall variability of the data. Least absolute deviation (LAD) is a statistical method used in regression analysis to estimate the coefficients of a linear model. Unlike the more common least squares method, which minimizes the sum of squared vertical distances (residuals) between the observed and predicted values, the LAD method minimizes the sum of the absolute vertical distances. In the context of linear regression, if (''x''₁,''y''₁), (''x''₂,''y''₂), ... are the data points, and ''a'' and ''b'' are the coefficients to be estimated for the linear model

y= b + (a * x)

the least absolute deviation estimates (''a'' and ''b'') are obtained by minimizing the sum. The LAD method is less sensitive to outliers compared to the least squares method, making it a robust regression technique in the presence of skewed or heavy-tailed residual distributions.

Summary statistics

Mean signed deviation

For an

unbiased estimator In statistics, the bias of an estimator (or bias function) is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called ''unbiased''. In stat ...

, the average of the signed deviations across the entire set of all observations from the unobserved population parameter value averages zero over an arbitrarily large number of samples. However, by construction the average of signed deviations of values from the sample mean value is always zero, though the average signed deviation from another measure of central tendency, such as the sample median, need not be zero. Mean Signed Deviation is a statistical measure used to assess the average deviation of a set of values from a central point, usually the mean. It is calculated by taking the arithmetic mean of the signed differences between each data point and the mean of the dataset. The term "signed" indicates that the deviations are considered with their respective signs, meaning whether they are above or below the mean. Positive deviations (above the mean) and negative deviations (below the mean) are included in the calculation. The mean signed deviation provides a measure of the average distance and direction of data points from the mean, offering insights into the overall trend and distribution of the data.

Dispersion

Statistics of the distribution of deviations are used as measures of

statistical dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartil ...

Standard deviation In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...

is a widely used measure of the spread or dispersion of a dataset. It quantifies the average amount of variation or deviation of individual data points from the mean of the dataset. It uses squared deviations, and has desirable properties. Standard deviation is sensitive to extreme values, making it not robust. *

Average absolute deviation The average absolute deviation (AAD) of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, ...

is a measure of the dispersion in a dataset that is less influenced by extreme values. It is calculated by finding the

absolute difference The absolute difference of two real numbers x and y is given by , x-y, , the absolute value of their difference. It describes the distance on the real line between the points corresponding to x and y, and is a special case of the Lp distance fo ...

between each data point and the mean, summing these absolute differences, and then dividing by the number of observations. This metric provides a more robust estimation of variability compared to standard deviation. * Median absolute deviation is a robust statistic that employs the median, rather than the mean, to measure the spread of a dataset. It is calculated by finding the absolute difference between each data point and the median, then computing the median of these absolute differences. This makes median absolute deviation less sensitive to outliers, offering a robust alternative to standard deviation. *

Maximum absolute deviation The average absolute deviation (AAD) of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, ...

is a straightforward measure of the maximum difference between any individual data point and the mean of the dataset. However, it is highly non-robust, as it can be disproportionately influenced by a single extreme value. This metric may not provide a reliable measure of dispersion when dealing with datasets containing outliers.

Normalization

Deviations, which measure the difference between observed values and some reference point, inherently carry units corresponding to the measurement scale used. For example, if lengths are being measured, deviations would be expressed in units like meters or feet. To make deviations unitless and facilitate comparisons across different datasets, one can nondimensionalize. One common method involves dividing deviations by a measure of scale(

), with the population standard deviation used for standardizing or the sample standard deviation for studentizing (e.g., Studentized residual). Another approach to nondimensionalization focuses on scaling by location rather than dispersion. The percent deviation offers an illustration of this method, calculated as the difference between the observed value and the accepted value, divided by the accepted value, and then multiplied by 100%. By scaling the deviation based on the accepted value, this technique allows for expressing deviations in percentage terms, providing a clear perspective on the relative difference between the observed and accepted values. Both methods of nondimensionalization serve the purpose of making deviations comparable and interpretable beyond the specific measurement units.

Examples

In one example, a series of measurements of the speed are taken of sound in a particular medium. The accepted or expected value for the speed of sound in this medium, based on theoretical calculations, is 343 meters per second. Now, during an experiment, multiple measurements are taken by different researchers. Researcher A measures the speed of sound as 340 meters per second, resulting in a deviation of −3 meters per second from the expected value. Researcher B, on the other hand, measures the speed as 345 meters per second, resulting in a deviation of +2 meters per second. In this scientific context, deviation helps quantify how individual measurements differ from the theoretically predicted or accepted value. It provides insights into the

accuracy and precision Accuracy and precision are two measures of ''observational error''. ''Accuracy'' is how close a given set of measurements (observations or readings) are to their ''true value''. ''Precision'' is how close the measurements are to each other. The ...

of experimental results, allowing researchers to assess the reliability of their data and potentially identify factors contributing to discrepancies. In another example, suppose a chemical reaction is expected to yield 100 grams of a specific compound based on stoichiometry. However, in an actual laboratory experiment, several trials are conducted with different conditions. In Trial 1, the actual yield is measured to be 95 grams, resulting in a deviation of −5 grams from the expected yield. In Trial 2, the actual yield is measured to be 102 grams, resulting in a deviation of +2 grams. These deviations from the expected value provide valuable information about the efficiency and reproducibility of the chemical reaction under different conditions. Scientists can analyze these deviations to optimize reaction conditions, identify potential sources of error, and improve the overall yield and reliability of the process. The concept of deviation is crucial in assessing the accuracy of experimental results and making informed decisions to enhance the outcomes of scientific experiments.

References

{{DEFAULTSORT:Deviation (Statistics) Statistical deviation and dispersion Statistical distance