In statistics, standardized (regression) coefficients, also called beta coefficients or beta weights, are the estimates resulting from a regression analysis where the underlying data have been standardized so that the variances of the dependent and independent variables are equal to 1. Therefore, standardized coefficients are unitless and refer to how many standard deviations a dependent variable will change per standard deviation increase in the predictor variable.
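The definition can be illustrated with a minimal sketch, assuming NumPy and simulated data. The helper name `standardized_coefs` and the data-generating values are hypothetical, chosen only for illustration. Each variable is converted to z-scores (mean 0, sample standard deviation 1) before an ordinary least-squares fit.

```python
import numpy as np

def standardized_coefs(X, y):
    """OLS on z-scored data; returns one standardized coefficient per column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score every predictor column and the response (sample SD, ddof=1)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed: z-scored variables have mean zero.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Simulated data (an assumption of this sketch): two predictors on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * [10.0, 0.5]
y = 0.3 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(size=200)

betas = standardized_coefs(X, y)
print(betas)  # change in y, in SDs of y, per one-SD increase in each predictor
```

Using the sample standard deviation (`ddof=1`) is a choice made here; any consistent choice of divisor yields the same coefficients, because the factor cancels between predictor and response.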
Usage
Standardization of the coefficients is usually done to answer the question of which of the independent variables has a greater effect on the dependent variable in a multiple regression analysis where the variables are measured in different units of measurement (for example, income measured in dollars and family size measured in number of individuals).
It may also be considered a general measure of effect size, quantifying the "magnitude" of the effect of one variable on another.
For simple linear regression (a single predictor), and for multiple regression with mutually orthogonal (uncorrelated) predictors, each standardized regression coefficient equals the correlation between the corresponding independent variable and the dependent variable.
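For the single-predictor case, the equality with the correlation can be checked numerically. The following is a small sketch on simulated data (the seed and coefficients are arbitrary assumptions): the slope of a regression on z-scored variables is compared with the Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

# z-score both variables using the sample standard deviation
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

beta_std = np.polyfit(zx, zy, 1)[0]   # slope of the regression of zy on zx
r = np.corrcoef(x, y)[0, 1]           # Pearson correlation of the raw variables

print(beta_std, r)  # the two values agree up to floating-point error
```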
Implementation
A regression carried out on original (unstandardized) variables produces unstandardized coefficients. A regression carried out on standardized variables produces standardized coefficients. Values for standardized and unstandardized coefficients can also be re-scaled to one another subsequent to either type of analysis.
Suppose that $\beta$ is the regression coefficient resulting from a linear regression (predicting $y$ by $x$). The standardized coefficient simply results as $\beta^* = \frac{s_x}{s_y}\beta$, where $s_x$ and $s_y$ are the (estimated) standard deviations of $x$ and $y$, respectively.
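The re-scaling in both directions is simple arithmetic. A minimal sketch follows; the function names `to_standardized` and `to_unstandardized` and the input numbers are hypothetical, chosen only for illustration.

```python
def to_standardized(b, s_x, s_y):
    """Rescale an unstandardized slope b to a standardized coefficient."""
    return b * s_x / s_y

def to_unstandardized(beta_std, s_x, s_y):
    """Inverse rescaling: recover the unstandardized slope."""
    return beta_std * s_y / s_x

# Made-up values: slope 2.0, SD of x equal to 3.0, SD of y equal to 12.0
beta_std = to_standardized(2.0, 3.0, 12.0)
b_back = to_unstandardized(beta_std, 3.0, 12.0)
print(beta_std, b_back)  # 0.5 2.0
```

In a multiple regression the same rescaling would be applied to each coefficient separately, using the standard deviation of its own predictor.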
Sometimes, standardization is done only with respect to the standard deviation of the regressor (the independent variable $x$).
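A sketch of this partial standardization, assuming simulated data with arbitrary parameters: only the regressor is z-scored, so the resulting slope stays in the original units of the dependent variable and equals the unstandardized slope multiplied by the standard deviation of the regressor.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=50.0, scale=8.0, size=150)
y = 1.5 * x + rng.normal(scale=5.0, size=150)

b = np.polyfit(x, y, 1)[0]            # change in y per unit of x

zx = (x - x.mean()) / x.std(ddof=1)   # standardize the regressor only
b_x_std = np.polyfit(zx, y, 1)[0]     # change in y (original units) per SD of x

print(b_x_std, b * x.std(ddof=1))     # the two values agree up to floating-point error
```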
Advantages and disadvantages
Advocates of standardized coefficients note that the coefficients are independent of the involved variables' units of measurement (i.e., standardized coefficients are unitless), which makes comparisons easy.
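The independence from units can be sketched with simulated data. The variable names `income_usd` and `spending` and all numeric values are assumptions of this example. Expressing income in thousands of dollars instead of dollars multiplies the unstandardized slope by 1,000 but leaves the standardized coefficient unchanged.

```python
import numpy as np

def slopes(x, y):
    """Return (unstandardized slope, standardized slope) for a simple regression."""
    b = np.polyfit(x, y, 1)[0]
    return b, b * x.std(ddof=1) / y.std(ddof=1)

rng = np.random.default_rng(3)
income_usd = rng.normal(50_000, 12_000, size=300)               # income in dollars
spending = 0.2 * income_usd + rng.normal(0, 3_000, size=300)    # hypothetical outcome

b_usd, beta_usd = slopes(income_usd, spending)
b_kusd, beta_kusd = slopes(income_usd / 1_000, spending)        # income in thousands

print(b_usd, b_kusd)        # unstandardized slopes differ by a factor of 1,000
print(beta_usd, beta_kusd)  # standardized slopes coincide
```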
Critics voice concerns that such a standardization can be very misleading.
Due to the re-scaling based on sample standard deviations, any effect apparent in the standardized coefficient may be due to confounding with the particularities (especially the variability) of the involved data sample(s).
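The dependence on sample variability can be illustrated with a simulation sketch (all parameters are assumptions). Two samples share the same underlying slope of 2 and the same noise level, but the regressor is spread more widely in one of them. The unstandardized slopes come out close to each other, while the standardized coefficients differ substantially; under these simulation settings the population values are about 0.45 and 0.83.

```python
import numpy as np

def fit(x, y):
    """Return (unstandardized slope, standardized slope) for a simple regression."""
    b = np.polyfit(x, y, 1)[0]
    return b, b * x.std(ddof=1) / y.std(ddof=1)

rng = np.random.default_rng(4)
n = 5_000

x_narrow = rng.normal(scale=1.0, size=n)   # regressor with small spread
x_wide = rng.normal(scale=3.0, size=n)     # regressor with large spread

# Same slope (2.0) and same noise SD (4.0) in both samples
y_narrow = 2.0 * x_narrow + rng.normal(scale=4.0, size=n)
y_wide = 2.0 * x_wide + rng.normal(scale=4.0, size=n)

b_narrow, beta_narrow = fit(x_narrow, y_narrow)
b_wide, beta_wide = fit(x_wide, y_wide)

print(b_narrow, b_wide)        # both near the true slope
print(beta_narrow, beta_wide)  # clearly different
```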
Also, the interpretation or meaning of a "one standard deviation change" in the regressor may vary markedly between non-normal distributions (e.g., when skewed, asymmetric or multimodal).
Terminology
Some statistical software packages like PSPP, SPSS and SYSTAT label the standardized regression coefficients as "Beta" while the unstandardized coefficients are labeled "B". Others, like DAP/SAS, label them "Standardized Coefficient". Sometimes the unstandardized coefficients are also labeled as "b".
See also
* Linear regression
* Correlation coefficient
* Effect size
* Unit-weighted regression
External links
Which Predictors Are More Important?- why standardized coefficients are used