Functional Principal Component Analysis
Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space ''L''2 that consists of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied for representing random functions, or in functional regression and classification.


Formulation

For a square-integrable stochastic process ''X''(''t''), ''t'' ∈ 𝒯, let
: \mu(t) = \operatorname{E}(X(t))
and
: G(s, t) = \operatorname{Cov}(X(s), X(t)) = \sum_{k=1}^\infty \lambda_k \varphi_k(s) \varphi_k(t),
where \lambda_1 \geq \lambda_2 \geq \cdots \geq 0 are the eigenvalues and \varphi_1, \varphi_2, \dots are the orthonormal eigenfunctions of the linear Hilbert–Schmidt operator
: G: L^2(\mathcal{T}) \rightarrow L^2(\mathcal{T}), \quad G(f) = \int_\mathcal{T} G(s, t) f(s) \, ds.
By the Karhunen–Loève theorem, one can express the centered process in the eigenbasis,
: X(t) - \mu(t) = \sum_{k=1}^\infty \xi_k \varphi_k(t),
where
: \xi_k = \int_\mathcal{T} (X(t) - \mu(t)) \varphi_k(t) \, dt
is the principal component associated with the ''k''-th eigenfunction \varphi_k, with the properties
: \operatorname{E}(\xi_k) = 0, \quad \operatorname{Var}(\xi_k) = \lambda_k, \quad \text{and} \quad \operatorname{E}(\xi_k \xi_l) = 0 \text{ for } k \ne l.
The centered process is then equivalent to ''ξ''1, ''ξ''2, .... A common assumption is that ''X'' can be represented by only the first few eigenfunctions (after subtracting the mean function), i.e.
: X(t) \approx X_m(t) = \mu(t) + \sum_{k=1}^m \xi_k \varphi_k(t),
where
: \operatorname{E}\left(\int_\mathcal{T} \left( X(t) - X_m(t)\right)^2 dt\right) = \sum_{j>m} \lambda_j \rightarrow 0 \text{ as } m \rightarrow \infty.
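The decomposition can be illustrated numerically on a dense grid, where the integrals above become Riemann sums. The following sketch (in Python with NumPy; the simulated data and all variable names are illustrative assumptions rather than a reference implementation) generates curves from a known two-term Karhunen–Loève expansion, discretizes the covariance operator, and recovers eigenvalues, eigenfunctions, and principal component scores:

```python
import numpy as np

# Minimal sketch: simulate X(t) = mu(t) + sum_k xi_k * phi_k(t) on a dense grid
# and recover the eigen-decomposition of the covariance operator.
rng = np.random.default_rng(0)
n, p = 500, 101                       # number of curves, grid size
t = np.linspace(0, 1, p)              # domain T = [0, 1]
dt = t[1] - t[0]                      # grid spacing used to discretize integrals

mu = np.sin(2 * np.pi * t)            # true mean function
phi = np.vstack([np.sqrt(2) * np.sin(np.pi * t),       # orthonormal eigenfunctions
                 np.sqrt(2) * np.sin(2 * np.pi * t)])
lam = np.array([4.0, 1.0])            # true eigenvalues lambda_1 >= lambda_2

xi = rng.normal(size=(n, 2)) * np.sqrt(lam)            # FPC scores, Var(xi_k) = lambda_k
X = mu + xi @ phi                                       # n x p matrix of sampled curves

# Estimate mean and covariance function on the grid
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
G_hat = Xc.T @ Xc / n                 # G_hat[s, t] ~ Cov(X(s), X(t))

# Eigen-decomposition of the discretized covariance operator;
# multiplying by dt approximates the integral operator G(f) = int G(s,t) f(s) ds.
evals, evecs = np.linalg.eigh(G_hat * dt)
order = np.argsort(evals)[::-1]
lam_hat = evals[order][:2]                              # estimated eigenvalues
phi_hat = (evecs[:, order][:, :2] / np.sqrt(dt)).T      # rescale to L2-normalized eigenfunctions

# Estimated FPC scores by discretizing xi_k = int (X(t) - mu(t)) phi_k(t) dt
xi_hat = Xc @ phi_hat.T * dt

print("estimated eigenvalues:", lam_hat)                # roughly [4, 1]
```

On an equally spaced grid with spacing ''dt'', multiplying the sample covariance matrix by ''dt'' approximates the integral operator, so its eigenvalues approximate the \lambda_k and its eigenvectors, rescaled by 1/\sqrt{dt}, approximate the L²-normalized eigenfunctions (up to sign).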


Interpretation of eigenfunctions

The first eigenfunction \varphi_1 depicts the dominant mode of variation of ''X'',
: \varphi_1 = \underset{\Vert \varphi \Vert = 1}{\operatorname{arg\,max}} \left\{ \operatorname{Var}\left( \int_\mathcal{T} (X(t) - \mu(t)) \varphi(t) \, dt \right) \right\},
where
: \Vert \varphi \Vert = \left( \int_\mathcal{T} \varphi(t)^2 dt \right)^{1/2}.
The ''k''-th eigenfunction \varphi_k is the dominant mode of variation orthogonal to \varphi_1, \varphi_2, \dots , \varphi_{k-1},
: \varphi_k = \underset{\Vert \varphi \Vert = 1, \; \langle \varphi, \varphi_j \rangle = 0}{\operatorname{arg\,max}} \left\{ \operatorname{Var}\left( \int_\mathcal{T} (X(t) - \mu(t)) \varphi(t) \, dt \right) \right\},
where
: \langle \varphi, \varphi_j \rangle = \int_\mathcal{T} \varphi(t)\varphi_j(t) dt, \text{ for } j = 1, \dots, k-1.
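This variational characterization follows from the spectral expansion of the covariance. For any \varphi with \Vert \varphi \Vert = 1,
: \operatorname{Var}\left( \int_\mathcal{T} (X(t) - \mu(t)) \varphi(t) \, dt \right) = \int_\mathcal{T} \int_\mathcal{T} \varphi(s) \, G(s, t) \, \varphi(t) \, ds \, dt = \sum_{k=1}^\infty \lambda_k \langle \varphi, \varphi_k \rangle^2 .
Since \sum_k \langle \varphi, \varphi_k \rangle^2 \leq \Vert \varphi \Vert^2 = 1 and \lambda_1 is the largest eigenvalue, the right-hand side is at most \lambda_1, with equality at \varphi = \varphi_1; repeating the argument subject to \langle \varphi, \varphi_j \rangle = 0 for ''j'' < ''k'' yields \varphi_k with value \lambda_k.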


Estimation

Let ''Y''''ij'' = ''X''''i''(''t''''ij'') + ε''ij'' be the observations made at locations (usually time points) ''t''''ij'', where ''X''''i'' is the ''i''-th realization of the smooth stochastic process that generates the data, and ε''ij'' are independently and identically distributed normal random variables with mean 0 and variance σ2, ''j'' = 1, 2, ..., ''m''''i''. To obtain an estimate of the mean function ''μ''(''t''''ij''), if a dense sample on a regular grid is available, one may take the average at each location ''t''''ij'':
: \hat{\mu}(t_{ij}) = \frac{1}{n} \sum_{i=1}^n Y_{ij}.
If the observations are sparse, one needs to smooth the data pooled from all observations to obtain the mean estimate, using smoothing methods like local linear smoothing or spline smoothing. Then the estimate of the covariance function \hat{G}(s, t) is obtained by averaging (in the dense case) or smoothing (in the sparse case) the raw covariances
: G_i(t_{ij}, t_{il}) = (Y_{ij} - \hat{\mu}(t_{ij})) (Y_{il} - \hat{\mu}(t_{il})), \quad j \neq l, \; i = 1, \dots, n.
Note that the diagonal elements of ''G''''i'' should be removed because they contain measurement error. In practice, \hat{G}(s, t) is discretized to an equally spaced dense grid, and the estimation of eigenvalues ''λ''''k'' and eigenvectors ''v''''k'' is carried out by numerical linear algebra. The eigenfunction estimates \hat{\varphi}_k can then be obtained by interpolating the eigenvectors \hat{v}_k. The fitted covariance should be positive definite and symmetric and is then obtained as
: \tilde{G}(s, t) = \sum_{\hat{\lambda}_k > 0} \hat{\lambda}_k \hat{\varphi}_k(s) \hat{\varphi}_k(t).
Let \hat{V}(t) be a smoothed version of the diagonal elements ''G''''i''(''tij'', ''tij'') of the raw covariance matrices. Then \hat{V}(t) is an estimate of (''G''(''t'', ''t'') + ''σ''2). An estimate of ''σ''2 is obtained by
: \hat{\sigma}^2 = \frac{1}{|\mathcal{T}|} \int_\mathcal{T} \left( \hat{V}(t) - \tilde{G}(t, t) \right) dt,
if \hat{\sigma}^2 > 0; otherwise \hat{\sigma}^2 = 0. If the observations ''X''''ij'', ''j'' = 1, 2, ..., ''mi'' are dense in 𝒯, then the ''k''-th FPC ''ξ''''k'' can be estimated by numerical integration, implementing
: \hat{\xi}_k = \langle X - \hat{\mu}, \hat{\varphi}_k \rangle.
However, if the observations are sparse, this method will not work. Instead, one can use best linear unbiased predictors, yielding
: \hat{\xi}_k = \hat{\lambda}_k \hat{\varphi}_k^T \hat{\Sigma}_{Y_i}^{-1} (Y_i - \hat{\mu}),
where
: \hat{\Sigma}_{Y_i} = \tilde{G} + \hat{\sigma}^2 \mathbf{I}_{m_i} ,
and \tilde{G} is evaluated at the grid points generated by ''t''''ij'', ''j'' = 1, 2, ..., ''m''''i''. The algorithm, PACE, has an available Matlab package and an R package. Asymptotic convergence properties of these estimates have been investigated.
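A dense-design version of these estimation steps can be sketched as follows (a simplified illustration on simulated data; the variable names are ad hoc and the code is not the PACE implementation): compute the mean by cross-sectional averaging, average the raw covariances, replace the error-contaminated diagonal, eigendecompose the discretized covariance surface, estimate σ² from the diagonal gap, and obtain the FPC scores by numerical integration.

```python
import numpy as np

# Hedged sketch of the dense-design estimation steps; the data generation and
# all names (Y, mu_hat, G_hat, ...) are illustrative assumptions.
rng = np.random.default_rng(1)
n_curves, p = 200, 51
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
sigma = 0.2                                          # measurement-error standard deviation

# Simulated noisy observations Y_ij = X_i(t_j) + eps_ij on a common dense grid
mu_true = t + np.sin(2 * np.pi * t)
phi_true = np.sqrt(2) * np.cos(np.pi * t)
X_true = mu_true + rng.normal(size=(n_curves, 1)) * 2.0 * phi_true
Y = X_true + rng.normal(scale=sigma, size=(n_curves, p))

# Step 1: mean function by cross-sectional averaging
mu_hat = Y.mean(axis=0)

# Step 2: raw covariances averaged over curves; the diagonal contains G(t,t) + sigma^2
resid = Y - mu_hat
C_raw = resid.T @ resid / n_curves

# Step 3: drop the error-contaminated diagonal and fill it by interpolating each row
G_hat = C_raw.copy()
for j in range(p):
    keep = np.arange(p) != j
    G_hat[j, j] = np.interp(t[j], t[keep], C_raw[j, keep])
G_hat = (G_hat + G_hat.T) / 2                        # enforce symmetry

# Step 4: eigenvalues / eigenfunctions of the discretized covariance operator
evals, evecs = np.linalg.eigh(G_hat * dt)
order = np.argsort(evals)[::-1]
lam_hat = np.clip(evals[order], 0, None)             # keep nonnegative eigenvalues only
phi_hat = evecs[:, order] / np.sqrt(dt)

# Step 5: error variance from the gap between the raw diagonal and the fitted surface
K = 1                                                # number of retained components (e.g. from a scree plot)
G_fit = (phi_hat[:, :K] * lam_hat[:K]) @ phi_hat[:, :K].T
sigma2_hat = max(np.mean(np.diag(C_raw) - np.diag(G_fit)), 0.0)

# Step 6: FPC scores by numerical integration (dense case)
xi_hat = resid @ phi_hat[:, :K] * dt

print("sigma^2 estimate:", sigma2_hat, "leading eigenvalue:", lam_hat[0])
```

For sparse designs the integration step is replaced by the best linear unbiased predictor given above, which requires inverting the fitted covariance of each subject's observation vector.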


Applications

FPCA can be applied for displaying the modes of functional variation, in scatterplots of FPCs against each other or of responses against FPCs, for modeling sparse longitudinal data, or for functional regression and classification, e.g., functional linear regression. Scree plots and other methods can be used to determine the number of included components. Functional principal component analysis has varied applications in time series analysis. Nowadays, this methodology is being adapted from traditional multivariate techniques to carry out analysis on financial data sets such as stock market indices, the generation of implied volatility graphs, and so on (see ''Functional Data Analysis with Applications in Finance'' by Michal Benko). A notable example of the advantages of the functional approach is the Smoothed FPCA (SPCA), proposed by Silverman (1996) and studied by Pezzulli and Silverman (1993), which enables direct combination of the FPCA analysis with a general smoothing approach, making it possible to use the information stored in some linear differential operators. An important application of FPCA, already known from multivariate PCA, is motivated by the Karhunen–Loève decomposition of a random function into a set of functional parameters: factor functions and corresponding factor loadings (scalar random variables). This application is more important than in standard multivariate PCA, since the distribution of the random function is in general too complex to be analyzed directly, and the Karhunen–Loève decomposition reduces the analysis to the interpretation of the factor functions and the distribution of scalar random variables. Because of its dimensionality reduction as well as its accuracy in representing data, there is wide scope for further development of functional principal component techniques in the financial field.


Connection with principal component analysis

Principal component analysis (PCA) and FPCA can be compared element by element; both methods are used for dimensionality reduction, and in implementations FPCA uses a PCA step. However, PCA and FPCA differ in some critical aspects. First, the order of multivariate data in PCA can be permuted, which has no effect on the analysis, but the order of functional data carries time or space information and cannot be reordered. Second, the spacing of observations in FPCA matters, while there is no spacing issue in PCA. Third, regular PCA does not work for high-dimensional data without regularization, while FPCA has a built-in regularization due to the smoothness of the functional data and the truncation to a finite number of included components.
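The connection can be made concrete on an equally spaced dense grid, where discretized FPCA amounts to a rescaled PCA of the curve matrix, with the grid spacing acting as a quadrature weight. A brief sketch (simulated data and ad-hoc names, for illustration only):

```python
import numpy as np

# Hedged illustration: on an equally spaced dense grid, discretized FPCA is a
# rescaled PCA of the curve matrix, with the grid spacing dt as quadrature weight.
rng = np.random.default_rng(2)
n, p = 300, 80
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = rng.normal(size=(n, 3)) @ np.vstack([np.sin(np.pi * t),
                                         np.sin(2 * np.pi * t),
                                         np.sin(3 * np.pi * t)])
Xc = X - X.mean(axis=0)

# Multivariate PCA: eigenvalues of the sample covariance matrix
cov_matrix = Xc.T @ Xc / n
pca_evals = np.linalg.eigvalsh(cov_matrix)[::-1]

# FPCA on the grid: eigenvalues of the discretized covariance operator
fpca_evals = np.linalg.eigvalsh(cov_matrix * dt)[::-1]

# FPCA eigenvalues are the PCA eigenvalues scaled by the grid spacing
print(np.allclose(fpca_evals, pca_evals * dt))       # True
```

Unequal spacing or sparse, irregular designs break this simple rescaling, which reflects the second difference noted above.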


See also

* Principal component analysis


Notes


References

* James O. Ramsay; B. W. Silverman (2005). ''Functional Data Analysis''. Springer. ISBN 978-0-387-40080-8. https://books.google.com/books?id=mU3dop5wY_4C