In statistics, confirmatory composite analysis (CCA) is a sub-type of structural equation modeling (SEM). Although CCA historically emerged from a re-orientation and re-start of partial least squares path modeling (PLS-PM), it has become an independent approach, and the two should not be confused. In many ways it is similar to, but also quite distinct from, confirmatory factor analysis (CFA). It shares with CFA the process of model specification, model identification, model estimation, and model assessment. However, in contrast to CFA, which always assumes the existence of latent variables, in CCA all variables can be observable, with their interrelationships expressed in terms of composites, i.e., linear compounds of subsets of the variables. The composites are treated as the fundamental objects, and path diagrams can be used to illustrate their relationships. This makes CCA particularly useful for disciplines examining theoretical concepts that are designed to attain certain goals, so-called artifacts, and their interplay with theoretical concepts of behavioral sciences.

Development

The initial idea of CCA was sketched by Theo K. Dijkstra and Jörg Henseler in 2014. After a lengthy scholarly publishing process, the first full description of CCA was published by Florian Schuberth, Jörg Henseler, and Theo K. Dijkstra in 2018. As is common for statistical developments, interim developments of CCA were shared with the scientific community in written form. Moreover, CCA was presented at several conferences, including the 5th Modern Modeling Methods Conference, the 2nd International Symposium on Partial Least Squares Path Modeling, the 5th CIM Community Workshop, and the Meeting of the SEM Working Group in 2018.

Statistical model

A composite is typically a linear combination of observable random variables. However, so-called second-order composites, i.e., linear combinations of latent variables and of composites, respectively, are also conceivable.

For a random column vector $\mathbf{x}$ of observable variables that is partitioned into sub-vectors $\mathbf{x}_i$, composites can be defined as weighted linear combinations. The ''i''-th composite $c_i$ equals

$$c_i = \mathbf{w}_i'\mathbf{x}_i,$$

where the weights of each composite are appropriately normalized (see Model identification). In the following, it is assumed that the weights are scaled in such a way that each composite has a variance of one, i.e., $\mathbf{w}_i' \mathbf{\Sigma}_{ii} \mathbf{w}_i = 1$. Moreover, it is assumed that the observable random variables are standardized, having a mean of zero and unit variance. Generally, the variance-covariance matrices $\mathbf{\Sigma}_{ii}$ of the sub-vectors are not constrained beyond being positive definite. Similar to the latent variables of a factor model, the composites explain the covariances between the sub-vectors, leading to the following inter-block covariance matrix:

$$\mathbf{\Sigma}_{ij} = \rho_{ij} \mathbf{\Sigma}_{ii}\mathbf{w}_i (\mathbf{\Sigma}_{jj} \mathbf{w}_j)',$$

where $\rho_{ij}$ is the correlation between the composites $c_i$ and $c_j$. The composite model imposes rank-one constraints on the inter-block covariance matrices $\mathbf{\Sigma}_{ij}$, i.e., $\text{rank}(\mathbf{\Sigma}_{ij}) = 1$. Generally, the variance-covariance matrix of $\mathbf{x}$ is positive definite if and only if the correlation matrix of the composites $\mathbf{R} := (\rho_{ij})$ and the variance-covariance matrices $\mathbf{\Sigma}_{ii}$ are all positive definite.

In addition, the composites can be related via a structural model, which constrains the correlation matrix $\mathbf{R}$ indirectly via a set of simultaneous equations:

$$\mathbf{B} \mathbf{c}_{\text{endogenous}} = \mathbf{C} \mathbf{c}_{\text{exogenous}} + \mathbf{z},$$

where the vector $\mathbf{c}$ is partitioned into an exogenous and an endogenous part, and the matrices $\mathbf{B}$ and $\mathbf{C}$ contain the so-called path (and feedback) coefficients. Moreover, the vector $\mathbf{z}$ contains the structural error terms, which have zero mean and are uncorrelated with $\mathbf{c}_{\text{exogenous}}$. As the model need not be recursive, the matrix $\mathbf{B}$ is not necessarily triangular, and the elements of $\mathbf{z}$ may be correlated.
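The covariance structure of the composite model can be illustrated with a small numerical sketch. The example below is not taken from the CCA literature: it assumes two hypothetical blocks with made-up intra-block correlation matrices (`sigma_11`, `sigma_22`), made-up weights, and a made-up composite correlation `rho_12`. It rescales the weights so that each composite has unit variance, builds the inter-block covariance matrix as the composite correlation times the outer product of the two "intra-block covariance times weights" vectors, and checks that the result has rank one and reproduces the composite correlation.

```python
# Illustrative two-block composite model; all numbers are invented.

def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def quad_form(v, m):
    """Quadratic form v' M v."""
    return sum(a * b for a, b in zip(v, matvec(m, v)))

def scale_to_unit_variance(w, sigma):
    """Rescale weights so that the composite variance w' Sigma w equals 1."""
    sd = quad_form(w, sigma) ** 0.5
    return [x / sd for x in w]

def inter_block_cov(rho, sigma_ii, w_i, sigma_jj, w_j):
    """Sigma_ij = rho_ij * (Sigma_ii w_i) (Sigma_jj w_j)'."""
    left = matvec(sigma_ii, w_i)
    right = matvec(sigma_jj, w_j)
    return [[rho * a * b for b in right] for a in left]

# Hypothetical intra-block correlation matrices (positive definite).
sigma_11 = [[1.0, 0.5], [0.5, 1.0]]
sigma_22 = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]]

# Hypothetical raw weights, normalized to unit composite variance.
w1 = scale_to_unit_variance([0.6, 0.4], sigma_11)
w2 = scale_to_unit_variance([0.5, 0.3, 0.2], sigma_22)
rho_12 = 0.4  # assumed correlation between the two composites

sigma_12 = inter_block_cov(rho_12, sigma_11, w1, sigma_22, w2)

# The composite correlation w_1' Sigma_12 w_2 recovers rho_12.
composite_corr = sum(a * b for a, b in zip(w1, matvec(sigma_12, w2)))

# Rank one: every 2x2 minor of Sigma_12 vanishes (up to rounding).
max_minor = max(
    abs(sigma_12[r][c] * sigma_12[s][d] - sigma_12[r][d] * sigma_12[s][c])
    for r in range(2) for s in range(2)
    for c in range(3) for d in range(3)
)
```

Because both weight vectors are scaled to unit composite variance, the product of the two quadratic forms is one, which is why `composite_corr` equals the chosen `rho_12`.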

Model identification

To ensure identification of the composite model, each composite must be correlated with at least one variable not forming the composite. In addition to this non-isolation condition, each composite needs to be normalized, e.g., by fixing one weight per composite, the length of each weight vector, or the composite's variance to a certain value. If the composites are embedded in a structural model, the structural model also needs to be identified. Finally, since the weight signs are still undetermined, it is recommended to select a dominant indicator per block of indicators that dictates the orientation of the composite.

The degrees of freedom of the basic composite model, i.e., with no constraints imposed on the composites' correlation matrix $\mathbf{R}$, are obtained by subtracting the number of free model parameters from the number of non-redundant off-diagonal elements of the indicator covariance matrix.
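As a hedged illustration of this count, the sketch below covers only the special case in which every observable variable belongs to exactly one block. It is derived from the parameters named in the statistical model (composite correlations, intra-block covariances, and weights with one normalization constraint per block) rather than quoted from a published formula, and the function name `composite_model_df` is hypothetical.

```python
def composite_model_df(block_sizes):
    """Degrees of freedom of the basic composite model.

    Assumes every indicator belongs to exactly one block and that the
    composites' correlation matrix is unconstrained.
    """
    p = sum(block_sizes)   # total number of indicators
    j = len(block_sizes)   # number of blocks (composites)

    # Non-redundant off-diagonal elements of the indicator covariance matrix.
    moments = p * (p - 1) // 2
    # Free correlations among the composites.
    composite_corrs = j * (j - 1) // 2
    # Free off-diagonal elements of each intra-block covariance matrix.
    intra_block = sum(k * (k - 1) // 2 for k in block_sizes)
    # Weights, minus one normalization constraint per block.
    free_weights = p - j

    return moments - composite_corrs - intra_block - free_weights


print(composite_model_df([3, 3, 3]))  # prints 18
```

For three blocks of three indicators each, the count is 36 - 3 - 9 - 6 = 18. Models that also contain indicators outside any composite would need additional terms that this sketch does not cover.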

Model estimation

To estimate the parameters of a composite model, various methods that create composites can be used, such as approaches to generalized canonical correlation analysis, principal component analysis, and linear discriminant analysis. Moreover, a maximum-likelihood estimator and composite-based methods for SEM, such as partial least squares path modeling and generalized structured component analysis, can be employed to estimate the weights and the correlations among the composites.
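A minimal sketch of one such route is given below, assuming principal component analysis is used block by block: the weights of each block are taken as the first principal axis of that block's sample correlation matrix and then rescaled so that the composite has unit variance. This is only an illustration of the general idea, not a recommended or complete CCA estimator; the sample correlations and all function names are invented for the example.

```python
def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def first_principal_axis(s, iterations=200):
    """Dominant eigenvector of a symmetric matrix via power iteration.

    Starts from a vector of ones, which is adequate for the positively
    correlated toy blocks used here.
    """
    v = [1.0] * len(s)
    for _ in range(iterations):
        v = matvec(s, v)
        norm = dot(v, v) ** 0.5
        v = [x / norm for x in v]
    return v

def unit_variance_weights(s_block):
    """First principal axis, rescaled so that w' S w = 1."""
    v = first_principal_axis(s_block)
    sd = dot(v, matvec(s_block, v)) ** 0.5
    return [x / sd for x in v]

# Hypothetical sample correlations for two blocks of two indicators each.
s_11 = [[1.0, 0.5], [0.5, 1.0]]
s_22 = [[1.0, 0.4], [0.4, 1.0]]
s_12 = [[0.30, 0.25], [0.20, 0.30]]

w1 = unit_variance_weights(s_11)
w2 = unit_variance_weights(s_22)

# Estimated correlation between the two composites: w_1' S_12 w_2.
r_12 = dot(w1, matvec(s_12, w2))
```

Other estimators named above, such as PLS-PM or generalized canonical correlation analysis, would generally yield different weights because they also use the information in the inter-block correlations.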

Evaluating model fit

In CCA, the model fit, i.e., the discrepancy between the estimated model-implied variance-covariance matrix $\hat{\mathbf{\Sigma}}$ and its sample counterpart $\mathbf{S}$, can be assessed in two non-exclusive ways. On the one hand, measures of fit can be employed; on the other hand, a test for overall model fit can be used. While the former relies on heuristic rules, the latter is based on statistical inference.

Fit measures for composite models comprise statistics such as the standardized root mean square residual (SRMR) and the root mean squared error of outer residuals (RMS$_{\theta}$). In contrast to fit measures for common factor models, fit measures for composite models are relatively unexplored, and reliable thresholds still need to be determined. To assess the overall model fit by means of statistical testing, the bootstrap test for overall model fit, also known as the Bollen-Stine bootstrap test, can be used to investigate whether a composite model fits the data.
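To make the idea of a fit measure concrete, the following sketch computes the SRMR under one common definition: the square root of the mean squared difference between the sample and model-implied correlations, taken over the lower triangle including the diagonal. It assumes standardized indicators, so both inputs are correlation matrices; the function name `srmr` and the example matrices are illustrative only.

```python
def srmr(sample, implied):
    """Standardized root mean square residual for two correlation matrices.

    Averages the squared residuals over the lower triangle, diagonal
    included. Both arguments are square nested lists of equal size.
    """
    p = len(sample)
    squared = [
        (sample[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    ]
    return (sum(squared) / len(squared)) ** 0.5


# Invented example: the model under-reproduces one correlation by 0.1.
sample_corr = [[1.0, 0.5], [0.5, 1.0]]
implied_corr = [[1.0, 0.4], [0.4, 1.0]]
value = srmr(sample_corr, implied_corr)
```

A value of zero indicates that the model-implied matrix reproduces the sample matrix exactly; as noted above, no reliable threshold for composite models is implied by this sketch.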

Alternative views on CCA

Besides the originally proposed CCA, the evaluation steps known from partial least squares structural equation modeling (PLS-SEM) have also been dubbed CCA. PLS-SEM's evaluation steps, in the following called PLS-CCA, differ from CCA in many regards: (i) while PLS-CCA aims at confirming reflective and formative measurement models, CCA aims at assessing composite models; (ii) PLS-CCA omits overall model fit assessment, which is a crucial step in CCA as well as in SEM; (iii) PLS-CCA is strongly linked to PLS-PM, whereas for CCA, PLS-PM can be employed as one estimator, but this is in no way mandatory. Hence, researchers need to be clear about which technique they are referring to.
