A CUR matrix approximation is a set of three
matrices
that, when multiplied together, closely approximate a given matrix.
A CUR approximation can be used in the same way as the
low-rank approximation of the
singular value decomposition
(SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix (rather than left and right singular vectors):
* There are methods to calculate it with lower asymptotic time complexity than the SVD.
* The matrices are more interpretable: the meanings of rows and columns in the decomposed matrix are essentially the same as their meanings in the original matrix.
Formally, a CUR matrix approximation of a matrix ''A'' consists of three matrices ''C'', ''U'', and ''R'' such that ''C'' is made from columns of ''A'', ''R'' is made from rows of ''A'', and the product ''CUR'' closely approximates ''A''. Usually the CUR is selected to be a
rank-''k'' approximation, which means that ''C'' contains ''k'' columns of ''A'', ''R'' contains ''k'' rows of ''A'', and ''U'' is a ''k''-by-''k'' matrix. There are many possible CUR matrix approximations, even for a given rank.
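The definition can be illustrated with a minimal NumPy sketch. It is not a specific published algorithm: the helper name <code>cur_from_indices</code> is hypothetical, the row and column indices are simply given rather than chosen by any selection rule, and the choice ''U'' = ''C''<sup>+</sup>''AR''<sup>+</sup> (where <sup>+</sup> denotes the Moore–Penrose pseudoinverse) is an assumption made for illustration.

```python
import numpy as np

def cur_from_indices(A, row_idx, col_idx):
    """Build C, U, R from given columns and rows of A (illustrative helper)."""
    C = A[:, col_idx]   # actual columns of A
    R = A[row_idx, :]   # actual rows of A
    # Assumption: U = pinv(C) @ A @ pinv(R), which minimizes the
    # Frobenius norm of A - C @ U @ R for fixed C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

rng = np.random.default_rng(0)
k = 3
# An 8-by-6 test matrix of rank exactly k.
A = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))

C, U, R = cur_from_indices(A, row_idx=[0, 1, 2], col_idx=[0, 1, 2])
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```

Here ''C'' is 8-by-3, ''U'' is 3-by-3, and ''R'' is 3-by-6. Because this test matrix has rank exactly 3 and the selected rows and columns span its row and column spaces, the product ''CUR'' reproduces ''A'' up to rounding error; for a general matrix the product is only an approximation.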
The CUR matrix approximation is often used in place of the low-rank approximation of the SVD in
principal component analysis
. The CUR is less accurate, but the columns of the matrix ''C'' are taken from ''A'' and the rows of ''R'' are taken from ''A''. In PCA, each column of ''A'' contains a data sample; thus, the matrix ''C'' is made of a subset of data samples. This is much easier to interpret than the SVD's left singular vectors, which represent the data in a rotated space. Similarly, the matrix ''R'' is made of a subset of variables measured for each data sample. This is easier to comprehend than the SVD's right singular vectors, which are another rotation of the data in space.
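The trade-off between accuracy and interpretability can be sketched numerically. The example below is only an illustration under stated assumptions: the data are synthetic, the rows and columns are picked by a simple largest-norm heuristic (not a method taken from the literature), and ''U'' is again assumed to be ''C''<sup>+</sup>''AR''<sup>+</sup>.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5
# Synthetic data: a rank-5 signal plus small noise; columns play the role of samples.
A = (rng.standard_normal((20, k)) @ rng.standard_normal((k, 15))
     + 0.1 * rng.standard_normal((20, 15)))

# Rank-k truncated SVD, the best rank-k approximation in the Frobenius norm.
Us, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = (Us[:, :k] * s[:k]) @ Vt[:k, :]

# CUR with an assumed heuristic: keep the k columns and k rows of largest norm.
cols = np.argsort(np.linalg.norm(A, axis=0))[-k:]
rows = np.argsort(np.linalg.norm(A, axis=1))[-k:]
C, R = A[:, cols], A[rows, :]
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
A_cur = C @ U @ R

err_svd = np.linalg.norm(A - A_svd)   # Frobenius error of the truncated SVD
err_cur = np.linalg.norm(A - A_cur)   # Frobenius error of the CUR product
```

Since the product ''CUR'' has rank at most ''k'', <code>err_cur</code> cannot be smaller than <code>err_svd</code>. In exchange, every column of <code>C</code> is an actual data sample and every row of <code>R</code> is an actual measured variable, whereas the columns of <code>Us</code> and the rows of <code>Vt</code> are linear combinations of them.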
Matrix CUR
Hamm and Aldroubi et al. describe the following theorem, which outlines a CUR decomposition of a matrix
with rank
:
Theorem: Consider row and column indices