CUR matrix approximation
   HOME

TheInfoList



OR:

A CUR matrix approximation is a set of three
matrices Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** ''The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchis ...
that, when multiplied together, closely approximate a given matrix. A CUR approximation can be used in the same way as the
low-rank approximation In mathematics, low-rank approximation is a mathematical optimization, minimization problem, in which the Loss function, cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subjec ...
of the
singular value decomposition In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any \ m \times n\ matrix. It is re ...
(SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix (rather than left and right singular vectors): * There are methods to calculate it with lower asymptotic time complexity versus the SVD. * The matrices are more interpretable; The meanings of rows and columns in the decomposed matrix are essentially the same as their meanings in the original matrix. Formally, a CUR matrix approximation of a matrix ''A'' is three matrices ''C'', ''U'', and ''R'' such that ''C'' is made from columns of ''A'', ''R'' is made from rows of ''A'', and that the product ''CUR'' closely approximates ''A''. Usually the CUR is selected to be a
rank Rank is the relative position, value, worth, complexity, power, importance, authority, level, etc. of a person or object within a ranking, such as: Level or position in a hierarchical organization * Academic rank * Diplomatic rank * Hierarchy * ...
-''k'' approximation, which means that ''C'' contains ''k'' columns of ''A'', ''R'' contains ''k'' rows of ''A'', and ''U'' is a ''k''-by-''k'' matrix. There are many possible CUR matrix approximations, and many CUR matrix approximations for a given rank. The CUR matrix approximation is often used in place of the low-rank approximation of the SVD in principal component analysis. The CUR is less accurate, but the columns of the matrix ''C'' are taken from ''A'' and the rows of ''R'' are taken from ''A''. In PCA, each column of ''A'' contains a data sample; thus, the matrix ''C'' is made of a subset of data samples. This is much easier to interpret than the SVD's left singular vectors, which represent the data in a rotated space. Similarly, the matrix ''R'' is made of a subset of variables measured for each data sample. This is easier to comprehend than the SVD's right singular vectors, which are another rotations of the data in space.


Mathematical definition

Hamm and Huang gives the following theorem describing the basics of a CUR decomposition of a matrix L with rank r: Theorem: Consider row and column indices I, J \subseteq /math> with , I, , , J, \ge r. Denote submatrices C = L_, U = L_ and R = L_. If rank(U) = rank(L), then L = CU^+R, where (\cdot)^+ denotes the Moore–Penrose pseudoinverse. In other words, if L has low rank, we can take a sub-matrix U = L_ of the same rank, together with some rows R and columns C of L and use them to reconstruct L.


Algorithms

The CUR matrix approximation is not unique and there are multiple algorithms for computing one. One is ALGORITHMCUR. The "Linear Time CUR" algorithm simply picks ''J'' by sampling columns randomly (with replacement) with probability proportional to the squared column norms, \, L_\, _2^2; and similarly sampling ''I'' proportional to the squared row norms, \, L_\, _2^2. The authors show that taking , J, \approx k /\varepsilon^4 and , I, \approx k / \varepsilon^2 where 0 \le \varepsilon, the algorithm achieves Frobenius error bound \, A - CUR\, _F \le \, A - A_k\, _F + \varepsilon \, A\, _F, where A_k is the optimal rank ''k'' approximation.


Tensor

Tensor-CURT decomposition is a generalization of matrix-CUR decomposition. Formally, a CURT tensor approximation of a tensor ''A'' is three matrices and a (core-)tensor ''C'', ''R'', ''T'' and ''U'' such that ''C'' is made from columns of ''A'', ''R'' is made from rows of ''A'', ''T'' is made from tubes of ''A'' and that the product ''U(C,R,T)'' (where the i,j,l-th entry of it is \sum_U_C_R_T_{l,l'} ) closely approximates ''A''. Usually the CURT is selected to be a
rank Rank is the relative position, value, worth, complexity, power, importance, authority, level, etc. of a person or object within a ranking, such as: Level or position in a hierarchical organization * Academic rank * Diplomatic rank * Hierarchy * ...
-''k'' approximation, which means that ''C'' contains ''k'' columns of ''A'', ''R'' contains ''k'' rows of ''A'', ''T'' contains tubes of ''A'' and ''U'' is a ''k''-by-''k''-by-''k'' (core-)tensor.


See also

*
dimensionality reduction Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally ...


References

Matrices