
In statistics, the standard deviation line (or SD line) marks points on a scatter plot that are an equal number of standard deviations away from the average in each dimension. For example, in a 2-dimensional scatter diagram with variables $x$ and $y$, points that are 1 standard deviation away from the mean of $x$ and also 1 standard deviation away from the mean of $y$ are on the SD line. The SD line is a useful visual tool since points in a scatter diagram tend to cluster around it, more or less tightly depending on their correlation.
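As a minimal sketch of the definition above, the following Python function lists points that lie the same number of standard deviations from the mean in both coordinates. The function name `sd_line_points`, the sample data, and the use of population standard deviations (dividing by the number of points) are assumptions made for illustration, not part of the definition.

```python
import math

def sd_line_points(xs, ys, ks=(-1, 0, 1)):
    """Return points lying k standard deviations from the mean in both x and y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population standard deviations (divide by n); an assumption of this sketch.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # The sign of the covariance decides whether the line rises or falls.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sign = 1 if cov >= 0 else -1
    return [(mean_x + k * sd_x, mean_y + sign * k * sd_y) for k in ks]

# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
for px, py in sd_line_points(xs, ys):
    print(f"({px:.3f}, {py:.3f})")
```

The middle point (k = 0) is the point of averages; the other two lie one standard deviation below and above it in each dimension.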
Properties
Relation to regression line
The SD line goes through the point of averages and has a slope of $\sigma_y / \sigma_x$ when the correlation between $x$ and $y$ is positive, and $-\sigma_y / \sigma_x$ when the correlation is negative. Unlike the regression line, the SD line does not take into account the strength of the relationship between $x$ and $y$, only its sign. The slope of the SD line is related to that of the regression line by

$$a = r \frac{\sigma_y}{\sigma_x}$$

where $a$ is the slope of the regression line, $r$ is the correlation coefficient, and $\sigma_y / \sigma_x$ is the magnitude of the slope of the SD line.
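The relation between the two slopes can be checked numerically. The sketch below computes the SD-line slope, the least-squares regression slope of the second variable on the first, and the correlation coefficient from the same data. The function name `slopes`, the sample data, and the choice of population moments (dividing by the number of points) are illustrative assumptions.

```python
import math

def slopes(xs, ys):
    """Return (sd_line_slope, regression_slope, r) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    r = cov / (sd_x * sd_y)
    sd_slope = math.copysign(sd_y / sd_x, r)  # sign follows the correlation
    reg_slope = cov / var_x                   # least-squares slope of y on x
    return sd_slope, reg_slope, r

# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
sd_slope, reg_slope, r = slopes(xs, ys)
# The regression slope equals r times the magnitude of the SD-line slope.
assert math.isclose(reg_slope, r * abs(sd_slope))
print(sd_slope, reg_slope, r)
```

Because the magnitude of the correlation coefficient never exceeds 1, the regression line is never steeper than the SD line, and the two coincide only when the correlation is perfect.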
Typical distance of points to SD line
The root mean square vertical distance of points from the SD line is $\sqrt{2(1 - |r|)} \times \sigma_y$, where $r$ is the correlation coefficient and $\sigma_y$ is the standard deviation of $y$. This gives an idea of the spread of points around the SD line.
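A quick numerical check of this spread is sketched below: it measures the vertical residual of each point from the SD line directly and compares the root mean square of those residuals with the closed-form value, the square root of 2(1 − |r|) times the standard deviation of the vertical variable. The function name `rms_vertical_distance`, the sample data, and the use of population standard deviations are assumptions for illustration.

```python
import math

def rms_vertical_distance(xs, ys):
    """Return (empirical RMS vertical distance to the SD line, closed-form value)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    slope = math.copysign(sd_y / sd_x, r)
    # Vertical residual of each point from the SD line through the point of averages.
    residuals = [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]
    empirical = math.sqrt(sum(e * e for e in residuals) / n)
    # Clamp at zero to guard against |r| exceeding 1 by floating-point rounding.
    formula = math.sqrt(max(0.0, 2 * (1 - abs(r)))) * sd_y
    return empirical, formula

# Hypothetical sample data.
empirical, formula = rms_vertical_distance([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
assert math.isclose(empirical, formula, rel_tol=1e-9)
print(empirical, formula)
```

The two values agree because the mean squared residual expands to twice the variance of the vertical variable minus twice the slope times the covariance, which simplifies to 2(1 − |r|) times that variance.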