Principal Factor Analysis

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors plus "error" terms, hence factor analysis can be thought of as a special case of errors-in-variables models. Simply put, the factor loading of a variable quantifies the extent to which the variable is related to a given factor. A common rationale behind factor analytic methods is that the information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis is commonly used in psychometrics, personality psychology, biology, marketing, product management, operations research, finance, and machine learning. It may help to deal with data sets where there are large numbers of observed variables that are thought to reflect a smaller number of underlying/latent variables. It is one of the most commonly used inter-dependency techniques and is used when the relevant set of variables shows a systematic inter-dependence and the objective is to find out the latent factors that create a commonality.


Statistical model


Definition

The model attempts to explain a set of p observations in each of n individuals with a set of k ''common factors'' (f_{j,m}), where there are fewer factors per unit than observations per unit (k < p). Each individual has k of their own common factors, and these are related to the observations via the factor ''loading matrix'' (L \in \mathbb{R}^{p\times k}), for a single observation, according to

: x_{i,m} - \mu_i = l_{i,1} f_{1,m} + \dots + l_{i,k} f_{k,m} + \varepsilon_{i,m}

where
* x_{i,m} is the value of the ith observation of the mth individual,
* \mu_i is the observation mean for the ith observation,
* l_{i,j} is the loading for the ith observation of the jth factor,
* f_{j,m} is the value of the jth factor of the mth individual, and
* \varepsilon_{i,m} is the (i,m)th ''unobserved stochastic error term'' with mean zero and finite variance.

In matrix notation

: X - \Mu = L F + \varepsilon

where observation matrix X \in \mathbb{R}^{p\times n}, loading matrix L \in \mathbb{R}^{p\times k}, factor matrix F \in \mathbb{R}^{k\times n}, error term matrix \varepsilon \in \mathbb{R}^{p\times n} and mean matrix \Mu \in \mathbb{R}^{p\times n}, whereby the (i,m)th element is simply \Mu_{i,m}=\mu_i. Also we will impose the following assumptions on F:
# F and \varepsilon are independent.
# \mathrm{E}(F) = 0, where \mathrm{E} is the expectation operator.
# \mathrm{Cov}(F)=I, where \mathrm{Cov} is the covariance matrix, to make sure that the factors are uncorrelated, and I is the identity matrix.

Suppose \mathrm{Cov}(X - \Mu)=\Sigma. Then

: \Sigma=\mathrm{Cov}(X - \Mu)=\mathrm{Cov}(LF + \varepsilon),\,

and therefore, from conditions 1 and 2 imposed on F above, \mathrm{E}[LF] = L\,\mathrm{E}[F] = 0 and \mathrm{Cov}(LF+\varepsilon)=\mathrm{Cov}(LF)+\mathrm{Cov}(\varepsilon), giving

: \Sigma = L\,\mathrm{Cov}(F)\,L^T + \mathrm{Cov}(\varepsilon),\,

or, setting \Psi:=\mathrm{Cov}(\varepsilon),

: \Sigma = LL^T + \Psi.\,

For any orthogonal matrix Q, if we set L^\prime = LQ and F^\prime = Q^T F, the criteria for being factors and factor loadings still hold. Hence a set of factors and factor loadings is unique only up to an orthogonal transformation.
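To make the notation concrete, the following sketch (not part of the original text; the dimensions, seeds and variable names are illustrative assumptions) simulates data from the model X - \Mu = LF + \varepsilon and checks numerically that the sample covariance approaches LL^T + \Psi:

```python
# Minimal sketch: simulate the factor model and verify the implied covariance.
# All specific numbers (p, k, n, loadings, error variances) are made up.
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 6, 2, 100_000           # observed variables, factors, individuals

L = rng.normal(size=(p, k))        # loading matrix, p x k
psi = rng.uniform(0.2, 1.0, p)     # error variances (diagonal of Psi)
mu = rng.normal(size=p)            # observation means

F = rng.normal(size=(k, n))                        # factors: E[F]=0, Cov(F)=I
eps = rng.normal(scale=np.sqrt(psi)[:, None], size=(p, n))
X = mu[:, None] + L @ F + eps                      # observations, p x n

Sigma_model = L @ L.T + np.diag(psi)               # implied covariance LL^T + Psi
Sigma_sample = np.cov(X)                           # sample covariance of X
print(np.max(np.abs(Sigma_model - Sigma_sample)))  # small for large n
```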


Example

Suppose a psychologist has the hypothesis that there are two kinds of intelligence, "verbal intelligence" and "mathematical intelligence", neither of which is directly observed. Evidence for the hypothesis is sought in the examination scores from each of 10 different academic fields of 1000 students. If each student is chosen randomly from a large population, then each student's 10 scores are random variables. The psychologist's hypothesis may say that for each of the 10 academic fields, the score averaged over the group of all students who share some common pair of values for verbal and mathematical "intelligences" is some constant times their level of verbal intelligence plus another constant times their level of mathematical intelligence, i.e., it is a linear combination of those two "factors". The numbers for a particular subject, by which the two kinds of intelligence are multiplied to obtain the expected score, are posited by the hypothesis to be the same for all intelligence level pairs, and are called the "factor loadings" for this subject. For example, the hypothesis may hold that the predicted average student's aptitude in the field of astronomy is

: {10 × the student's verbal intelligence} + {6 × the student's mathematical intelligence}.

The numbers 10 and 6 are the factor loadings associated with astronomy. Other academic subjects may have different factor loadings. Two students assumed to have identical degrees of verbal and mathematical intelligence may have different measured aptitudes in astronomy because individual aptitudes differ from average aptitudes (predicted above) and because of measurement error itself. Such differences make up what is collectively called the "error" — a statistical term that means the amount by which an individual, as measured, differs from what is average for or predicted by his or her levels of intelligence (see errors and residuals in statistics). The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data.


Mathematical model of the same example

In the following, matrices will be indicated by indexed variables. "Academic subject" indices will be indicated using letters a, b and c, with values running from 1 to p, which is equal to 10 in the above example. "Factor" indices will be indicated using letters p, q and r, with values running from 1 to k, which is equal to 2 in the above example. "Instance" or "sample" indices will be indicated using letters i, j and k, with values running from 1 to N. In the example above, if a sample of N=1000 students participated in the p=10 exams, the ith student's score for the ath exam is given by x_{ai}. The purpose of factor analysis is to characterize the correlations between the variables x_a of which the x_{ai} are a particular instance, or set of observations. In order for the variables to be on equal footing, they are normalized into standard scores z:

:z_{ai}=\frac{x_{ai}-\hat\mu_a}{\sqrt{N}\,\hat\sigma_a}

where the sample mean is:

:\hat\mu_a=\tfrac{1}{N}\sum_i x_{ai}

and the sample variance is given by:

:\hat\sigma_a^2=\tfrac{1}{N}\sum_i (x_{ai}-\hat\mu_a)^2

(the factor \sqrt{N} in the standardization makes each data vector of unit length, so that the plain sums over samples used below are correlations). The factor analysis model for this particular sample is then:

:\begin{matrix} z_{1,i} & = & \ell_{1,1}F_{1,i} & + & \ell_{1,2}F_{2,i} & + & \varepsilon_{1,i} \\ \vdots & & \vdots & & \vdots & & \vdots \\ z_{10,i} & = & \ell_{10,1}F_{1,i} & + & \ell_{10,2}F_{2,i} & + & \varepsilon_{10,i} \end{matrix}

or, more succinctly:

: z_{ai}=\sum_p \ell_{ap}F_{pi}+\varepsilon_{ai}

where
* F_{1,i} is the ith student's "verbal intelligence",
* F_{2,i} is the ith student's "mathematical intelligence",
* \ell_{ap} are the factor loadings for the ath subject, for p=1,2.

In matrix notation, we have

:Z=LF+\varepsilon

Observe that by doubling the scale on which "verbal intelligence"—the first component in each column of F—is measured, and simultaneously halving the factor loadings for verbal intelligence makes no difference to the model. Thus, no generality is lost by assuming that the standard deviation of the factors for verbal intelligence is 1. Likewise for mathematical intelligence. Moreover, for similar reasons, no generality is lost by assuming the two factors are uncorrelated with each other. In other words:

:\sum_i F_{pi}F_{qi}=\delta_{pq}

where \delta_{pq} is the Kronecker delta (0 when p \ne q and 1 when p=q). The errors are assumed to be independent of the factors:

:\sum_i F_{pi}\varepsilon_{ai}=0

Since any rotation of a solution is also a solution, this makes interpreting the factors difficult. See disadvantages below. In this particular example, if we do not know beforehand that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence without an outside argument.

The values of the loadings L, the averages \mu, and the variances of the "errors" \varepsilon must be estimated given the observed data X and F (the assumption about the levels of the factors is fixed for a given F). The "fundamental theorem" may be derived from the above conditions:

:\sum_i z_{ai}z_{bi}=\sum_p \ell_{ap}\ell_{bp}+\sum_i \varepsilon_{ai}\varepsilon_{bi}

The term on the left is the (a,b)-term of the correlation matrix (a p \times p matrix derived as the product of the p \times N matrix of standardized observations with its transpose) of the observed data, and its p diagonal elements will be 1s. The second term on the right will be a diagonal matrix with terms less than unity. The first term on the right is the "reduced correlation matrix" and will be equal to the correlation matrix except for its diagonal values, which will be less than unity. These diagonal elements of the reduced correlation matrix are called "communalities" (which represent the fraction of the variance in the observed variable that is accounted for by the factors):

: h_a^2=1-\psi_a=\sum_p \ell_{ap}\ell_{ap}

The sample data z_{ai} will not exactly obey the fundamental equation given above due to sampling errors, inadequacy of the model, etc. The goal of any analysis of the above model is to find the factors F_{pi} and loadings \ell_{ap} which give a "best fit" to the data. In factor analysis, the best fit is defined as the minimum of the mean square error in the off-diagonal residuals of the correlation matrix:

:\varepsilon^2 = \sum_{a\ne b} \left[\sum_i z_{ai}z_{bi}-\sum_p \ell_{ap}\ell_{bp}\right]^2

This is equivalent to minimizing the off-diagonal components of the error covariance which, in the model equations, have expected values of zero. This is to be contrasted with principal component analysis, which seeks to minimize the mean square error of all residuals. Before the advent of high-speed computers, considerable effort was devoted to finding approximate solutions to the problem, particularly in estimating the communalities by other means, which then simplifies the problem considerably by yielding a known reduced correlation matrix. This was then used to estimate the factors and the loadings. With the advent of high-speed computers, the minimization problem can be solved iteratively with adequate speed, and the communalities are calculated in the process, rather than being needed beforehand. The MinRes algorithm is particularly suited to this problem, but is hardly the only iterative means of finding a solution. If the solution factors are allowed to be correlated (as in 'oblimin' rotation, for example), then the corresponding mathematical model uses skew coordinates rather than orthogonal coordinates.
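As an illustration of the iterative approach described above (estimating communalities as part of the fit), here is a minimal principal axis factoring sketch. It is not the MinRes algorithm itself, and the function name, tolerance and starting values (squared multiple correlations) are assumptions made for the example:

```python
# Hedged sketch of iterative principal axis factoring: communalities are
# placed on the diagonal of the correlation matrix, loadings are taken from
# its leading eigenvectors, and the cycle repeats until the communalities
# stabilise. Variable names are illustrative.
import numpy as np

def principal_axis_factoring(R, k, n_iter=100, tol=1e-8):
    """R: p x p correlation matrix, k: number of factors to extract."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))    # initial communalities (SMC)
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)           # reduced correlation matrix
        eigval, eigvec = np.linalg.eigh(R_reduced)
        idx = np.argsort(eigval)[::-1][:k]        # k largest eigenvalues
        lam = np.clip(eigval[idx], 0, None)
        loadings = eigvec[:, idx] * np.sqrt(lam)  # p x k loading matrix
        h2_new = np.sum(loadings**2, axis=1)      # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            break
        h2 = h2_new
    return loadings, h2_new

# Example use (data: samples x variables array):
# loadings, communalities = principal_axis_factoring(np.corrcoef(data, rowvar=False), k=2)
```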


Geometric interpretation

The parameters and variables of factor analysis can be given a geometrical interpretation. The data (z_{ai}), the factors (F_{pi}) and the errors (\varepsilon_{ai}) can be viewed as vectors in an N-dimensional Euclidean space (sample space), represented as \mathbf{z}_a, \mathbf{F}_p and \boldsymbol{\varepsilon}_a respectively. Since the data are standardized, the data vectors are of unit length (||\mathbf{z}_a||=1). The factor vectors define a k-dimensional linear subspace (i.e. a hyperplane) in this space, upon which the data vectors are projected orthogonally. This follows from the model equation

:\mathbf{z}_a=\sum_p \ell_{ap}\mathbf{F}_p+\boldsymbol{\varepsilon}_a

and the independence of the factors and the errors: \mathbf{F}_p\cdot\boldsymbol{\varepsilon}_a=0. In the above example, the hyperplane is just a 2-dimensional plane defined by the two factor vectors. The projection of the data vectors onto the hyperplane is given by

:\hat{\mathbf{z}}_a=\sum_p \ell_{ap}\mathbf{F}_p

and the errors are vectors from that projected point to the data point and are perpendicular to the hyperplane. The goal of factor analysis is to find a hyperplane which is a "best fit" to the data in some sense, so it doesn't matter how the factor vectors which define this hyperplane are chosen, as long as they are independent and lie in the hyperplane. We are free to specify them as both orthogonal and normal (\mathbf{F}_p\cdot\mathbf{F}_q=\delta_{pq}) with no loss of generality. After a suitable set of factors are found, they may also be arbitrarily rotated within the hyperplane, so that any rotation of the factor vectors will define the same hyperplane, and also be a solution. As a result, in the above example, in which the fitting hyperplane is two dimensional, if we do not know beforehand that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence, or whether the factors are linear combinations of both, without an outside argument.

The data vectors \mathbf{z}_a have unit length. The entries of the correlation matrix for the data are given by r_{ab}=\mathbf{z}_a\cdot\mathbf{z}_b. The correlation r_{ab} can be geometrically interpreted as the cosine of the angle between the two data vectors \mathbf{z}_a and \mathbf{z}_b. The diagonal elements will clearly be 1s and the off-diagonal elements will have absolute values less than or equal to unity. The "reduced correlation matrix" is defined as

:\hat{r}_{ab}=\hat{\mathbf{z}}_a\cdot\hat{\mathbf{z}}_b.

The goal of factor analysis is to choose the fitting hyperplane such that the reduced correlation matrix reproduces the correlation matrix as nearly as possible, except for the diagonal elements of the correlation matrix which are known to have unit value. In other words, the goal is to reproduce as accurately as possible the cross-correlations in the data. Specifically, for the fitting hyperplane, the mean square error in the off-diagonal components

:\varepsilon^2=\sum_{a\ne b} \left(r_{ab}-\hat{r}_{ab}\right)^2

is to be minimized, and this is accomplished by minimizing it with respect to a set of orthonormal factor vectors. It can be seen that

: r_{ab}-\hat{r}_{ab}= \boldsymbol{\varepsilon}_a\cdot\boldsymbol{\varepsilon}_b

The term on the right is just the covariance of the errors. In the model, the error covariance is stated to be a diagonal matrix and so the above minimization problem will in fact yield a "best fit" to the model: It will yield a sample estimate of the error covariance which has its off-diagonal components minimized in the mean square sense. It can be seen that since the \hat{\mathbf{z}}_a are orthogonal projections of the data vectors, their length will be less than or equal to the length of the projected data vector, which is unity. The squares of these lengths are just the diagonal elements of the reduced correlation matrix. These diagonal elements of the reduced correlation matrix are known as "communalities":

: h_a^2=||\hat{\mathbf{z}}_a||^2= \sum_p \ell_{ap}^2

Large values of the communalities will indicate that the fitting hyperplane is rather accurately reproducing the correlation matrix. The mean values of the factors must also be constrained to be zero, from which it follows that the mean values of the errors will also be zero.


Practical implementation


Types of factor analysis


Exploratory factor analysis

Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and group items that are part of unified concepts. The researcher makes no ''a priori'' assumptions about relationships among factors.


Confirmatory factor analysis

Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated with specific factors. CFA uses structural equation modeling to test a measurement model whereby loading on the factors allows for evaluation of relationships between observed variables and unobserved variables. Structural equation modeling approaches can accommodate measurement error and are less restrictive than least-squares estimation. Hypothesized models are tested against actual data, and the analysis would demonstrate loadings of observed variables on the latent variables (factors), as well as the correlation between the latent variables.


Types of factor extraction

Principal component analysis (PCA) is a widely used method for factor extraction, which is the first phase of EFA. Factor weights are computed to extract the maximum possible variance, with successive factoring continuing until there is no further meaningful variance left. The factor model must then be rotated for analysis. Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as PCA, which uses the principal axis method. Canonical factor analysis seeks factors that have the highest canonical correlation with the observed variables. Canonical factor analysis is unaffected by arbitrary rescaling of the data. Common factor analysis, also called principal factor analysis (PFA) or principal axis factoring (PAF), seeks the fewest factors which can account for the common variance (correlation) of a set of variables. Image factoring is based on the correlation matrix of predicted variables rather than actual variables, where each variable is predicted from the others using multiple regression. Alpha factoring is based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables. All other methods assume cases to be sampled and variables fixed. Factor regression model is a combinatorial model of factor model and regression model; or alternatively, it can be viewed as the hybrid factor model, whose factors are partially known.


Terminology


Criteria for determining the number of factors

Researchers wish to avoid such subjective or arbitrary criteria for factor retention as "it made sense to me". A number of objective methods have been developed to solve this problem, allowing users to determine an appropriate range of solutions to investigate. However, these different methods often disagree with one another as to the number of factors that ought to be retained. For instance, parallel analysis may suggest 5 factors while Velicer's MAP suggests 6, so the researcher may request both 5- and 6-factor solutions and discuss each in terms of their relation to external data and theory.


Modern criteria

Horn's parallel analysis (PA): A Monte-Carlo based simulation method that compares the observed eigenvalues with those obtained from uncorrelated normal variables. A factor or component is retained if the associated eigenvalue is bigger than the 95th percentile of the distribution of eigenvalues derived from the random data. PA is among the more commonly recommended rules for determining the number of components to retain, but many programs fail to include this option (a notable exception being R). However, Formann provided both theoretical and empirical evidence that its application might not be appropriate in many cases, since its performance is considerably influenced by sample size, item discrimination, and type of correlation coefficient.

Velicer's (1976) MAP test, as described by Courtney (2013) (Determining the number of factors to retain in EFA: Using the SPSS R-Menu v2.0 to make more judicious estimations. ''Practical Assessment, Research and Evaluation'', 18(8), http://pareonline.net/getvn.asp?v=18&n=8), "involves a complete principal components analysis followed by the examination of a series of matrices of partial correlations" (p. 397, although this quote does not occur in Velicer (1976) and the cited page number is outside the pages of the citation). The squared correlation for Step "0" is the average squared off-diagonal correlation for the unpartialed correlation matrix. On Step 1, the first principal component and its associated items are partialed out. Thereafter, the average squared off-diagonal correlation for the subsequent correlation matrix is computed for Step 1. On Step 2, the first two principal components are partialed out and the resultant average squared off-diagonal correlation is again computed. The computations are carried out for k minus one steps (k representing the total number of variables in the matrix). Thereafter, all of the average squared correlations for each step are lined up, and the step number in the analyses that resulted in the lowest average squared partial correlation determines the number of components or factors to retain. By this method, components are maintained as long as the variance in the correlation matrix represents systematic variance, as opposed to residual or error variance. Although methodologically akin to principal components analysis, the MAP technique has been shown to perform quite well in determining the number of factors to retain in multiple simulation studies (Garrido, L. E., Abad, F. J., & Ponsoda, V. (2012). A new look at Horn's parallel analysis with ordinal variables. ''Psychological Methods'', advance online publication). This procedure is made available through SPSS's user interface, as well as the ''psych'' package for the R programming language.
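A minimal sketch of Horn's parallel analysis as described above; the function name, the number of simulations, and the use of plain NumPy (rather than the SPSS or R ''psych'' implementations mentioned) are illustrative assumptions:

```python
# Hedged sketch of parallel analysis: compare observed eigenvalues with the
# 95th percentile of eigenvalues from uncorrelated normal data of the same shape.
import numpy as np

def parallel_analysis(X, n_sim=500, percentile=95, seed=0):
    """X: n_samples x n_variables data matrix (rows = cases)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sim_eig = np.empty((n_sim, p))
    for s in range(n_sim):
        Z = rng.normal(size=(n, p))                # uncorrelated normal data
        sim_eig[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(sim_eig, percentile, axis=0)
    return int(np.sum(obs_eig > threshold))        # number of factors/components to retain
```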


Older methods

Kaiser criterion: The Kaiser rule is to drop all components with eigenvalues under 1.0 – this being the eigenvalue equal to the information accounted for by an average single item. The Kaiser criterion is the default in SPSS and most statistical software but is not recommended when used as the sole cut-off criterion for estimating the number of factors, as it tends to over-extract factors. A variation of this method has been created where a researcher calculates confidence intervals for each eigenvalue and retains only factors which have the entire confidence interval greater than 1.0.

Scree plot: The Cattell scree test plots the components as the X-axis and the corresponding eigenvalues as the Y-axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward less steep decline, Cattell's scree test says to drop all further components after the one starting at the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "fudging". That is, as picking the "elbow" can be subjective because the curve has multiple elbows or is a smooth curve, the researcher may be tempted to set the cut-off at the number of factors desired by their research agenda.

Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation. Where the researcher's goal emphasizes parsimony (explaining variance with as few factors as possible), the criterion could be as low as 50%.
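These older criteria can be read off directly from the eigenvalues of the correlation matrix. The following sketch (the function name and the 90% target are illustrative assumptions) computes the Kaiser count, the variance-explained count, and the eigenvalue sequence that a scree plot would display:

```python
# Hedged sketch: apply the Kaiser rule and the variance-explained rule to the
# eigenvalues of a correlation matrix; the scree plot is read by eye from `eig`.
import numpy as np

def retention_counts(R, variance_target=0.9):
    """R: p x p correlation matrix."""
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]            # descending eigenvalues
    kaiser = int(np.sum(eig > 1.0))                        # Kaiser criterion
    cum_var = np.cumsum(eig) / eig.sum()                   # cumulative variance explained
    var_rule = int(np.searchsorted(cum_var, variance_target) + 1)
    # plotting eig against component number gives the scree curve
    return {"kaiser": kaiser, "variance_explained": var_rule, "scree_values": eig}
```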


Bayesian methods

By placing a prior distribution over the number of latent factors and then applying Bayes' theorem, Bayesian models can return a probability distribution over the number of latent factors. This has been modeled using the Indian buffet process, but can be modeled more simply by placing any discrete prior (e.g. a negative binomial distribution) on the number of components.


Rotation methods

The output of PCA maximizes the variance accounted for by the first factor first, then the second factor, etc. A disadvantage of this procedure is that most items load on the early factors, while very few items load on later factors. This makes interpreting the factors by reading through a list of questions and loadings difficult, as every question is strongly correlated with the first few components, while very few questions are strongly correlated with the last few components. Rotation serves to make the output easier to interpret. By choosing a different basis for the same principal components – that is, choosing different factors to express the same correlation structure – it is possible to create variables that are more easily interpretable. Rotations can be orthogonal or oblique; oblique rotations allow the factors to correlate. This increased flexibility means that more rotations are possible, some of which may be better at achieving a specified goal. However, this can also make the factors more difficult to interpret, as some information is "double-counted" and included multiple times in different components; some factors may even appear to be near-duplicates of each other.


Orthogonal methods

Two broad classes of orthogonal rotations exist: those that look for sparse rows (where each row is a case, i.e. subject), and those that look for sparse columns (where each column is a variable).
* Simple factors: these rotations try to explain all factors by using only a few important variables. This effect can be achieved by using ''Varimax'' (the most common rotation; a sketch of it follows this list).
* Simple variables: these rotations try to explain all variables using only a few important factors. This effect can be achieved using either ''Quartimax'' or the unrotated components of PCA.
* Both: these rotations try to compromise between both of the above goals, but in the process, may achieve a fit that is poor at both tasks; as such, they are unpopular compared to the above methods. ''Equamax'' is one such rotation.
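The sketch below shows one common way to compute a Varimax rotation of a loading matrix, using the standard SVD-based iteration; the function name and defaults are illustrative assumptions, not a reference implementation:

```python
# Hedged sketch of the Varimax criterion: iteratively rotate the loading
# matrix toward simple structure (few large loadings per column).
import numpy as np

def varimax(loadings, gamma=1.0, n_iter=100, tol=1e-6):
    """Rotate a p x k loading matrix orthogonally toward simple structure."""
    p, k = loadings.shape
    R = np.eye(k)                      # accumulated rotation matrix
    var_old = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        # gradient of the varimax criterion with respect to the rotation
        G = loadings.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        U, S, Vt = np.linalg.svd(G)
        R = U @ Vt                     # nearest orthogonal matrix to G
        var_new = np.sum(S)
        if var_new < var_old * (1 + tol):
            break                      # criterion has stopped improving
        var_old = var_new
    return loadings @ R
```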


Problems with factor rotation

It can be difficult to interpret a factor structure when each variable is loading on multiple factors. Small changes in the data can sometimes tip a balance in the factor rotation criterion so that a completely different factor rotation is produced. This can make it difficult to compare the results of different experiments. This problem is illustrated by a comparison of different studies of world-wide cultural differences. Each study has used different measures of cultural variables and produced a differently rotated factor analysis result. The authors of each study believed that they had discovered something new, and invented new names for the factors they found. A later comparison of the studies found that the results were rather similar when the unrotated results were compared. The common practice of factor rotation has obscured the similarity between the results of the different studies.


Higher order factor analysis

Higher-order factor analysis is a statistical method consisting of repeated steps: factor analysis, oblique rotation, and factor analysis of the rotated factors. Its merit is to enable the researcher to see the hierarchical structure of studied phenomena. To interpret the results, one proceeds either by post-multiplying the primary factor pattern matrix by the higher-order factor pattern matrices (Gorsuch, 1983) and perhaps applying a Varimax rotation to the result (Thompson, 1990), or by using a Schmid-Leiman solution (SLS; Schmid & Leiman, 1957; also known as the Schmid-Leiman transformation), which attributes the variation from the primary factors to the second-order factors.


Exploratory factor analysis (EFA) versus principal components analysis (PCA)

Factor analysis is related to principal component analysis (PCA), but the two are not identical. There has been significant controversy in the field over differences between the two techniques. PCA can be considered as a more basic version of exploratory factor analysis (EFA) that was developed in the early days prior to the advent of high-speed computers. Both PCA and factor analysis aim to reduce the dimensionality of a set of data, but the approaches taken to do so are different for the two techniques. Factor analysis is clearly designed with the objective to identify certain unobservable factors from the observed variables, whereas PCA does not directly address this objective; at best, PCA provides an approximation to the required factors (Jolliffe, I.T. ''Principal Component Analysis'', Springer Series in Statistics, 2nd ed., Springer, NY, 2002). From the point of view of exploratory analysis, the eigenvalues of PCA are inflated component loadings, i.e., contaminated with error variance. Whilst EFA and PCA are treated as synonymous techniques in some fields of statistics, this has been criticised. Factor analysis "deals with ''the assumption of an underlying causal structure'': [it] assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables". In contrast, PCA neither assumes nor depends on such an underlying causal relationship. Researchers have argued that the distinctions between the two techniques may mean that there are objective benefits for preferring one over the other based on the analytic goal. If the factor model is incorrectly formulated or the assumptions are not met, then factor analysis will give erroneous results. Factor analysis has been used successfully where adequate understanding of the system permits good initial model formulations. PCA employs a mathematical transformation to the original data with no assumptions about the form of the covariance matrix. The objective of PCA is to determine linear combinations of the original variables and select a few that can be used to summarize the data set without losing much information.


Arguments contrasting PCA and EFA

Fabrigar et al. (1999) address a number of reasons used to suggest that PCA is not equivalent to factor analysis:
# It is sometimes suggested that PCA is computationally quicker and requires fewer resources than factor analysis. Fabrigar et al. suggest that readily available computer resources have rendered this practical concern irrelevant.
# PCA and factor analysis can produce similar results. This point is also addressed by Fabrigar et al.; in certain cases, whereby the communalities are low (e.g. 0.4), the two techniques produce divergent results. In fact, Fabrigar et al. argue that in cases where the data correspond to assumptions of the common factor model, the results of PCA are inaccurate.
# There are certain cases where factor analysis leads to 'Heywood cases'. These encompass situations whereby 100% or more of the variance in a measured variable is estimated to be accounted for by the model. Fabrigar et al. suggest that these cases are actually informative to the researcher, indicating an incorrectly specified model or a violation of the common factor model. The lack of Heywood cases in the PCA approach may mean that such issues pass unnoticed.
# Researchers gain extra information from a PCA approach, such as an individual's score on a certain component; such information is not yielded from factor analysis. However, as Fabrigar et al. contend, the typical aim of factor analysis – i.e. to determine the factors accounting for the structure of the correlations between measured variables – does not require knowledge of factor scores, and thus this advantage is negated. It is also possible to compute factor scores from a factor analysis.


Variance versus covariance

Factor analysis takes into account the random error that is inherent in measurement, whereas PCA fails to do so. This point is exemplified by Brown (2009) with respect to how the two methods treat the correlation matrices involved in the calculations. For this reason, Brown (2009) recommends using factor analysis when theoretical ideas about relationships between variables exist, whereas PCA should be used if the goal of the researcher is to explore patterns in their data.


Differences in procedure and results

The differences between PCA and factor analysis (FA) are further illustrated by Suhr (2009); a brief illustration in code follows the list:
* PCA results in principal components that account for a maximal amount of variance for observed variables; FA accounts for ''common'' variance in the data.
* PCA inserts ones on the diagonals of the correlation matrix; FA adjusts the diagonals of the correlation matrix with the unique factors.
* PCA minimizes the sum of squared perpendicular distance to the component axis; FA estimates factors that influence responses on observed variables.
* The component scores in PCA represent a linear combination of the observed variables weighted by eigenvectors; the observed variables in FA are linear combinations of the underlying and unique factors.
* In PCA, the components yielded are uninterpretable, i.e. they do not represent underlying 'constructs'; in FA, the underlying constructs can be labelled and readily interpreted, given an accurate model specification.
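As a hedged illustration of the procedural contrast (not taken from Suhr), the snippet below fits PCA and a maximum-likelihood factor analysis to the same standardized data with scikit-learn; the simulated data set and all names are assumptions made for the example:

```python
# PCA decomposes total variance of the standardised variables, while
# FactorAnalysis models common variance plus per-variable noise variances.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p, k = 500, 6, 2
true_L = rng.normal(size=(p, k))
X = rng.normal(size=(n, k)) @ true_L.T + rng.normal(scale=0.7, size=(n, p))
Z = StandardScaler().fit_transform(X)

pca = PCA(n_components=k).fit(Z)
fa = FactorAnalysis(n_components=k).fit(Z)

print("PCA component loadings:\n", pca.components_.T)
print("FA factor loadings:\n", fa.components_.T)
print("FA unique (noise) variances:", fa.noise_variance_)
```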


In psychometrics


History

Charles Spearman was the first psychologist to discuss common factor analysis and did so in his 1904 paper. It provided few details about his methods and was concerned with single-factor models. He discovered that school children's scores on a wide variety of seemingly unrelated subjects were positively correlated, which led him to postulate that a single general mental ability, or ''g'', underlies and shapes human cognitive performance. The initial development of common factor analysis with multiple factors was given by Louis Thurstone in two papers in the early 1930s, summarized in his 1935 book, ''The Vectors of Mind''. Thurstone introduced several important factor analysis concepts, including communality, uniqueness, and rotation. He advocated for "simple structure", and developed methods of rotation that could be used as a way to achieve such structure. In Q methodology, William Stephenson, a student of Spearman, distinguished between ''R'' factor analysis, oriented toward the study of inter-individual differences, and ''Q'' factor analysis, oriented toward subjective intra-individual differences. Raymond Cattell was a strong advocate of factor analysis and psychometrics and used Thurstone's multi-factor theory to explain intelligence. Cattell also developed the scree test and similarity coefficients.


Applications in psychology

Factor analysis is used to identify "factors" that explain a variety of results on different tests. For example, intelligence research found that people who get a high score on a test of verbal ability are also good on other tests that require verbal abilities. Researchers explained this by using factor analysis to isolate one factor, often called verbal intelligence, which represents the degree to which someone is able to solve problems involving verbal skills. Factor analysis in psychology is most often associated with intelligence research. However, it also has been used to find factors in a broad range of domains such as personality, attitudes, beliefs, etc. It is linked to psychometrics, as it can assess the validity of an instrument by finding if the instrument indeed measures the postulated factors.


Advantages

* Reduction of number of variables, by combining two or more variables into a single factor. For example, performance at running, ball throwing, batting, jumping and weight lifting could be combined into a single factor such as general athletic ability. Usually, in an item by people matrix, factors are selected by grouping related items. In the Q factor analysis technique, the matrix is transposed and factors are created by grouping related people. For example, liberals, libertarians, conservatives, and socialists might form into separate groups.
* Identification of groups of inter-related variables, to see how they are related to each other. For example, Carroll used factor analysis to build his Three Stratum Theory. He found that a factor called "broad visual perception" relates to how good an individual is at visual tasks. He also found a "broad auditory perception" factor, relating to auditory task capability. Furthermore, he found a global factor, called "g" or general intelligence, that relates to both "broad visual perception" and "broad auditory perception". This means someone with a high "g" is likely to have both a high "visual perception" capability and a high "auditory perception" capability, and that "g" therefore explains a good part of why someone is good or bad in both of those domains.


Disadvantages

* "...each orientation is equally acceptable mathematically. But different factorial theories proved to differ as much in terms of the orientations of factorial axes for a given solution as in terms of anything else, so that model fitting did not prove to be useful in distinguishing among theories." (Sternberg, 1977). This means all rotations represent different underlying processes, but all rotations are equally valid outcomes of standard factor analysis optimization. Therefore, it is impossible to pick the proper rotation using factor analysis alone. * Factor analysis can be only as good as the data allows. In psychology, where researchers often have to rely on less valid and reliable measures such as self-reports, this can be problematic. * Interpreting factor analysis is based on using a "heuristic", which is a solution that is "convenient even if not absolutely true". More than one interpretation can be made of the same data factored the same way, and factor analysis cannot identify causality.


In cross-cultural research

Factor analysis is a frequently used technique in cross-cultural research. It serves the purpose of extracting cultural dimensions. The best known cultural dimensions models are those elaborated by Geert Hofstede, Ronald Inglehart, Christian Welzel, Shalom Schwartz and Michael Minkov. A popular visualization is Inglehart and Welzel's cultural map of the world.


In political science

In an early 1965 study, political systems around the world are examined via factor analysis to construct related theoretical models and research, compare political systems, and create typological categories. For these purposes, in this study seven basic political dimensions are identified, which are related to a wide variety of political behaviour: these dimensions are Access, Differentiation, Consensus, Sectionalism, Legitimation, Interest, and Leadership Theory and Research. Other political scientists explore the measurement of internal political efficacy using four new questions added to the 1988 National Election Study. Factor analysis is here used to find that these items measure a single concept distinct from external efficacy and political trust, and that these four questions provided the best measure of internal political efficacy up to that point in time.


In marketing

The basic steps are:
* Identify the salient attributes consumers use to evaluate products in this category.
* Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.
* Input the data into a statistical program and run the factor analysis procedure. The computer will yield a set of underlying attributes (or factors); a sketch of this step follows the list.
* Use these factors to construct perceptual maps and other product positioning devices.
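A minimal sketch of these steps (the attribute names, the simulated ratings, and the use of scikit-learn are illustrative assumptions): attribute ratings are factor-analysed, and the resulting loadings and factor scores supply the axes and coordinates of a perceptual map.

```python
# Hedged sketch: survey-style attribute ratings -> two factors -> map coordinates.
import numpy as np
from sklearn.decomposition import FactorAnalysis

attributes = ["ease of use", "weight", "accuracy", "durability", "price", "size"]
ratings = np.random.default_rng(2).integers(1, 8, size=(200, len(attributes))).astype(float)

fa = FactorAnalysis(n_components=2).fit(ratings)
loadings = fa.components_.T          # attribute-by-factor loadings
scores = fa.transform(ratings)       # respondent coordinates on the two factors
# The loadings indicate which attributes define each axis of the perceptual map;
# averaging scores by product would give each product's position on the map.
```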


Information collection

The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product sample or descriptions of product concepts on a range of attributes. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is coded and input into a statistical program such as R, SPSS, SAS, Stata, STATISTICA, JMP, and SYSTAT.


Analysis

The analysis will isolate the underlying factors that explain the data using a matrix of associations. Factor analysis is an interdependence technique. The complete set of interdependent relationships is examined. There is no specification of dependent variables, independent variables, or causality. Factor analysis assumes that all the rating data on different attributes can be reduced down to a few important dimensions. This reduction is possible because some attributes may be related to each other. The rating given to any one attribute is partially the result of the influence of other attributes. The statistical algorithm deconstructs the rating (called a raw score) into its various components and reconstructs the partial scores into underlying factor scores. The degree of correlation between the initial raw score and the final factor score is called a ''factor loading''.


Advantages

* Both objective and subjective attributes can be used, provided the subjective attributes can be converted into scores.
* Factor analysis can identify latent dimensions or constructs that direct analysis may not.
* It is easy and inexpensive.


Disadvantages

* Usefulness depends on the researchers' ability to collect a sufficient set of product attributes. If important attributes are excluded or neglected, the value of the procedure is reduced.
* If sets of observed variables are highly similar to each other and distinct from other items, factor analysis will assign a single factor to them. This may obscure factors that represent more interesting relationships.
* Naming factors may require knowledge of theory, because seemingly dissimilar attributes can correlate strongly for unknown reasons.


In physical and biological sciences

Factor analysis has also been widely used in physical sciences such as geochemistry, hydrochemistry, astrophysics and cosmology, as well as biological sciences such as ecology, molecular biology, neuroscience and biochemistry. In groundwater quality management, it is important to relate the spatial distribution of different chemical parameters to different possible sources, which have different chemical signatures. For example, a sulfide mine is likely to be associated with high levels of acidity, dissolved sulfates and transition metals. These signatures can be identified as factors through R-mode factor analysis, and the location of possible sources can be suggested by contouring the factor scores. In geochemistry, different factors can correspond to different mineral associations, and thus to mineralisation.
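A hedged sketch of the groundwater example is given below. It runs an R-mode analysis (factoring the chemical parameters rather than the sample sites) on invented measurements; the parameter names, site coordinates, and values are hypothetical, and the per-site factor scores are the quantities that would be contoured on a map.

```python
# R-mode factor analysis on hypothetical groundwater samples: variables are
# chemical parameters, observations are sampling sites (all values invented).
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
params = ["pH", "sulfate", "Fe", "Zn", "Cu", "bicarbonate"]
samples = pd.DataFrame(rng.lognormal(size=(120, len(params))), columns=params)
samples["x"] = rng.uniform(0, 10, size=120)   # site coordinates, kept so the
samples["y"] = rng.uniform(0, 10, size=120)   # factor scores could be contoured

X = StandardScaler().fit_transform(samples[params])
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

# Loadings show which parameters vary together (e.g. a possible acid-mine
# signature); scores give each site's strength on that factor, ready to contour.
print(pd.DataFrame(fa.components_.T, index=params, columns=["F1", "F2"]))
site_scores = fa.transform(X)                 # one row of scores per site
```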


In microarray analysis

Factor analysis can be used for summarizing high-density oligonucleotide DNA microarray data at probe level for Affymetrix GeneChips. In this case, the latent variable corresponds to the RNA concentration in a sample.
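The sketch below is a purely conceptual illustration of that latent-variable interpretation, not any of the published probe-level summarization algorithms: a one-factor model is fitted to simulated intensities of a single probe set across arrays, and the per-array factor score is taken as the expression summary.

```python
# Conceptual sketch only: one latent factor (relative RNA concentration per
# array) drives the intensities of all probes in one probe set. Simulated data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n_arrays, n_probes = 20, 11
concentration = rng.normal(size=(n_arrays, 1))               # hidden signal
probe_affinity = rng.uniform(0.5, 2.0, size=(1, n_probes))   # probe-specific scale
intensities = concentration @ probe_affinity + 0.3 * rng.normal(size=(n_arrays, n_probes))

fa = FactorAnalysis(n_components=1).fit(intensities)
expression_summary = fa.transform(intensities).ravel()       # one value per array

# The recovered scores should track the hidden concentrations (up to sign/scale).
print(np.corrcoef(expression_summary, concentration.ravel())[0, 1])
```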


Implementation

Factor analysis has been implemented in several statistical analysis programs since the 1980s:
* BMDP
* JMP
* Mplus
* Python: module scikit-learn
* R (with the base function ''factanal'' or the ''fa'' function in the ''psych'' package). Rotations are implemented in the ''GPArotation'' R package.
* SAS (using PROC FACTOR or PROC CALIS)
* SPSS
* Stata
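As a minimal usage sketch for one of the implementations listed above, the call below fits a two-factor model with a varimax rotation in scikit-learn on placeholder data; the analogous base-R call would be ''factanal(x, factors = 2)'', although estimation details differ between programs.

```python
# Minimal scikit-learn usage on placeholder data; roughly analogous to R's
# factanal(x, factors = 2) or SAS PROC FACTOR, though estimation details differ.
import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.default_rng(4).normal(size=(100, 6))   # placeholder data matrix
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print(fa.components_)        # loading matrix, factors x variables
print(fa.noise_variance_)    # per-variable unique (error) variance
```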


Stand-alone

* Factor - free factor analysis software developed by the Rovira i Virgili University


See also

* Confirmatory factor analysis
* Exploratory factor analysis
* Design of experiments
* Formal concept analysis
* Independent component analysis
* Non-negative matrix factorization
* Q methodology
* Recommendation system
* Root cause analysis
* Facet theory



Further reading

* B. T. Gray (1997). Higher-Order Factor Analysis (conference paper).
* Jennrich, Robert I. (2006). "Rotation to Simple Loadings Using Component Loss Function: The Oblique Case". ''Psychometrika'', Vol. 71, No. 1, pp. 173–191, March 2006.
* Katz, Jeffrey Owen, and Rohlf, F. James (1975). "Primary product functionplane: An oblique rotation to simple structure". ''Multivariate Behavioral Research'', April 1975, Vol. 10, pp. 219–232.
* Katz, Jeffrey Owen, and Rohlf, F. James (1974). "Functionplane: A new approach to simple structure rotation". ''Psychometrika'', March 1974, Vol. 39, No. 1, pp. 37–51.
* Katz, Jeffrey Owen, and Rohlf, F. James (1973). "Function-point cluster analysis". ''Systematic Zoology'', September 1973, Vol. 22, No. 3, pp. 295–301.
* Schmid, J., and Leiman, J. M. (1957). "The development of hierarchical factor solutions". ''Psychometrika'', 22(1), 53–61.
* Wolff, Hans-Georg, and Preising, Katja (2005). "Exploring item and higher order factor structure with the Schmid-Leiman solution: Syntax codes for SPSS and SAS". ''Behavior Research Methods, Instruments & Computers'', 37(1), 48–58.


External links


* A Beginner's Guide to Factor Analysis
* Exploratory Factor Analysis. A book manuscript by Tucker, L. & MacCallum, R. (1993). Retrieved June 8, 2006.
* Garson, G. David, "Factor Analysis", from ''Statnotes: Topics in Multivariate Analysis''. Retrieved on April 13, 2009.
* Factor Analysis at 100 (http://www.fa100.info/index.html) - conference material