K-SVD
In applied mathematics, K-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. K-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary and updating the atoms in the dictionary to better fit the data. It is structurally related to the expectation-maximization (EM) algorithm. K-SVD is widely used in applications such as image processing, audio processing, biology, and document analysis.


K-SVD algorithm

K-SVD is a kind of generalization of k-means, as follows. The k-means clustering can also be regarded as a method of sparse representation: it finds the best possible codebook to represent the data samples \{y_i\}_{i=1}^N by nearest neighbor, by solving

: \min \limits_{D, X} \{ \|Y - DX\|^2_F \} \qquad \text{subject to } \forall i,\ x_i = e_k \text{ for some } k,

which is equivalent to

: \min \limits_{D, X} \{ \|Y - DX\|^2_F \} \qquad \text{subject to } \forall i,\ \|x_i\|_0 = 1.

Here Y is the matrix whose columns are the samples y_i, X is the matrix of coefficient vectors x_i, e_k is the k-th standard basis vector, \|x_i\|_0 counts the nonzero entries of x_i, and F denotes the Frobenius norm.
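To make the correspondence concrete, the following sketch (an illustration only, assuming numpy; the function and variable names are hypothetical) encodes each sample with exactly one atom, which is precisely the nearest-neighbor assignment of k-means:

    import numpy as np

    def kmeans_as_sparse_coding(Y, D):
        # Y: (n, N) data matrix, one sample per column.
        # D: (n, K) codebook whose columns are the cluster centers.
        # Returns X of shape (K, N) with one-hot columns, i.e. x_i = e_k.
        dists = ((Y[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)  # (K, N) squared distances
        nearest = dists.argmin(axis=0)                              # nearest center per sample
        X = np.zeros((D.shape[1], Y.shape[1]))
        X[nearest, np.arange(Y.shape[1])] = 1.0
        # With this X, ||Y - D X||_F^2 is exactly the k-means distortion.
        return X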
The sparse representation constraint x_i = e_k forces the k-means algorithm to use only one atom (column) of the dictionary D. The K-SVD algorithm relaxes this constraint so that each signal is represented as a linear combination of atoms of D. K-SVD follows the construction flow of the k-means algorithm; however, in contrast to k-means, the sparsity constraint is relaxed so that the number of nonzero entries of each column x_i can be more than 1, but no more than T_0. The objective function thus becomes

: \min \limits_{D, X} \{ \|Y - DX\|^2_F \} \qquad \text{subject to } \forall i,\ \|x_i\|_0 \le T_0,

or, in another objective form,

: \min \limits_{D, X} \sum_i \|x_i\|_0 \qquad \text{subject to } \|Y - DX\|^2_F \le \epsilon.

In the K-SVD algorithm, D is first held fixed and the best coefficient matrix X is found. Since finding the truly optimal X is infeasible, an approximation pursuit method is used: any algorithm such as orthogonal matching pursuit (OMP) can compute the coefficients, as long as it supplies a solution with a fixed, predetermined number of nonzero entries T_0.
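As an illustration of this sparse coding stage, here is a minimal greedy OMP sketch in Python/numpy (a simplified illustration, not a reference implementation; it assumes the columns of D are normalized):

    import numpy as np

    def omp(D, y, T0):
        # Approximate y with at most T0 atoms (columns) of D.
        residual = y.copy()
        support = []
        for _ in range(T0):
            # Pick the atom most correlated with the current residual.
            k = int(np.argmax(np.abs(D.T @ residual)))
            if k not in support:
                support.append(k)
            # Re-fit the coefficients on the chosen support by least squares,
            # which makes the residual orthogonal to the selected atoms.
            coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            residual = y - D[:, support] @ coeffs
        x = np.zeros(D.shape[1])
        x[support] = coeffs
        return x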
After the sparse coding step, the next task is to search for a better dictionary D. Finding the whole dictionary at once is infeasible, so the process updates one column of D at a time while fixing X. The update of the k-th column is done by rewriting the penalty term as

: \|Y - DX\|^2_F = \left\|Y - \sum_{j=1}^K d_j x^T_j\right\|^2_F = \left\|\left(Y - \sum_{j \ne k} d_j x^T_j\right) - d_k x^T_k\right\|^2_F = \|E_k - d_k x^T_k\|^2_F,

where x^T_k denotes the k-th row of X. By decomposing the product DX into a sum of K rank-1 matrices, the other K - 1 terms can be assumed fixed while the k-th remains unknown. The minimization can then be solved by approximating E_k with a rank-1 matrix obtained from the singular value decomposition; by the Eckart-Young theorem, the best rank-1 approximation in Frobenius norm is \sigma_1 u_1 v^T_1, given by the largest singular value and the corresponding singular vectors.
Updating d_k and x^T_k this way, however, would very likely fill in the vector x^T_k, because the sparsity constraint is not enforced. To cure this problem, define \omega_k as

: \omega_k = \{ i \mid 1 \le i \le N,\ x^T_k(i) \ne 0 \},

which points to the examples \{y_i\}_{i=1}^N that use atom d_k (equivalently, the entries of x^T_k that are nonzero). Then define \Omega_k as a matrix of size N \times |\omega_k|, with ones at the (\omega_k(i), i)-th entries and zeros otherwise. The multiplication \tilde{x}^T_k = x^T_k \Omega_k shrinks the row vector x^T_k by discarding the zero entries. Similarly, \tilde{Y}_k = Y \Omega_k is the subset of the examples that currently use the atom d_k, and the same holds for \tilde{E}_k = E_k \Omega_k. So the minimization problem above becomes

: \|E_k \Omega_k - d_k x^T_k \Omega_k\|^2_F = \|\tilde{E}_k - d_k \tilde{x}^T_k\|^2_F

and can be solved directly with the SVD. The SVD decomposes \tilde{E}_k into U \Delta V^T. The solution for d_k is the first column of U, and the coefficient vector \tilde{x}^T_k is the first column of V multiplied by \Delta(1, 1). After updating the whole dictionary, the process repeats: solve for X with the new D, then update the columns of D again, until convergence.
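Putting the two stages together, a compact end-to-end sketch (a simplified illustration that reuses the omp helper above; names and defaults are hypothetical, and practical implementations also re-normalize atoms and replace unused ones) might look like:

    import numpy as np

    def ksvd(Y, K, T0, n_iter=10, seed=0):
        # Y: (n, N) data matrix; K: number of atoms; T0: sparsity level.
        rng = np.random.default_rng(seed)
        n, N = Y.shape
        # Initialize the dictionary with random, normalized columns.
        D = rng.standard_normal((n, K))
        D /= np.linalg.norm(D, axis=0)
        for _ in range(n_iter):
            # Stage 1: sparse coding of every sample with the current dictionary.
            X = np.column_stack([omp(D, Y[:, i], T0) for i in range(N)])
            # Stage 2: update each atom in turn, holding the others fixed.
            for k in range(K):
                omega = np.flatnonzero(X[k, :])   # samples that use atom k
                if omega.size == 0:
                    continue                      # atom unused in this round
                # E_k restricted to the supporting samples (E_k Omega_k).
                E_k = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
                # Best rank-1 approximation of E_k via the SVD.
                U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
                D[:, k] = U[:, 0]                 # new atom, unit norm
                X[k, omega] = S[0] * Vt[0, :]     # new coefficients on the support
        return D, X

Restricting E_k to the columns in \omega_k before the SVD is what preserves the sparsity pattern of X: only the coefficients of samples that already use atom k are rewritten.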


Limitations

Choosing an appropriate "dictionary" for a dataset is a non-convex problem, and K-SVD operates by an iterative update that is not guaranteed to find the global optimum. However, this is common to other algorithms for this purpose, and K-SVD works fairly well in practice.


See also

* Sparse approximation
* Singular value decomposition
* Matrix norm
* k-means clustering
* Low-rank approximation

