Factor analysis is a
statistical
Statistics (from German: ''Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industria ...
method used to describe
variability among observed, correlated
variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved
latent variables. The observed variables are modelled as
linear combinations of the potential factors plus "
error" terms, hence factor analysis can be thought of as a special case of
errors-in-variables models.
Simply put, the factor loading of a variable quantifies the extent to which the variable is related to a given factor.
A common rationale behind factor analytic methods is that the information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis is commonly used in
psychometrics,
personality psychology, biology,
marketing,
product management,
operations research,
finance
Finance is the study and discipline of money, currency and capital assets. It is related to, but not synonymous with economics, the study of production, distribution, and consumption of money, assets, goods and services (the discipline of fina ...
, and
machine learning. It may help to deal with data sets where there are large numbers of observed variables that are thought to reflect a smaller number of underlying/latent variables. It is one of the most commonly used inter-dependency techniques and is used when the relevant set of variables shows a systematic inter-dependence and the objective is to find out the latent factors that create a commonality.
Statistical model
Definition
The model attempts to explain a set of
observations in each of
individuals with a set of
''common factors'' (
) where there are fewer factors per unit than observations per unit (