Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as astronomy, computer vision, document clustering, missing data imputation, chemometrics, audio signal processing, recommender systems, and bioinformatics.


History

In chemometrics, non-negative matrix factorization has a long history under the name "self modeling curve resolution". In this framework the vectors in the right matrix are continuous curves rather than discrete vectors. Early work on non-negative matrix factorizations was also performed by a Finnish group of researchers in the 1990s under the name ''positive matrix factorization''. It became more widely known as ''non-negative matrix factorization'' after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations.


Background

Let matrix V be the product of the matrices W and H,
:\mathbf{V} = \mathbf{W} \mathbf{H} \,.
Matrix multiplication can be implemented as computing the column vectors of V as linear combinations of the column vectors in W using coefficients supplied by columns of H. That is, each column of V can be computed as follows:
:\mathbf{v}_i = \mathbf{W} \mathbf{h}_i \,,
where v_i is the i-th column vector of the product matrix V and h_i is the i-th column vector of the matrix H.
When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if V is an m × n matrix, W is an m × p matrix, and H is a p × n matrix then p can be significantly less than both m and n.
Here is an example based on a text-mining application:
* Let the input matrix (the matrix to be factored) be V with 10000 rows and 500 columns where words are in rows and documents are in columns. That is, we have 500 documents indexed by 10000 words. It follows that a column vector v in V represents a document.
* Assume we ask the algorithm to find 10 features in order to generate a ''features matrix'' W with 10000 rows and 10 columns and a ''coefficients matrix'' H with 10 rows and 500 columns.
* The product of W and H is a matrix with 10000 rows and 500 columns, the same shape as the input matrix V and, if the factorization worked, it is a reasonable approximation to the input matrix V.
* From the treatment of matrix multiplication above it follows that each column in the product matrix WH is a linear combination of the 10 column vectors in the features matrix W with coefficients supplied by the coefficients matrix H.
This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features.
It is useful to think of each feature (column vector) in the features matrix W as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: the higher a word's cell value, the higher the word's rank in the feature. A column in the coefficients matrix H represents an original document with a cell value defining the document's rank for a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in W) where each feature is weighted by the feature's cell value from the document's column in H.
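The column view of the product can be checked on a toy case. The following is a minimal sketch in plain Python (lists of rows, no external libraries); the 4-word, 3-document, 2-feature sizes and the helper names `matmul`, `column` and `combine_features` are illustrative assumptions, not part of the text above. It rebuilds one document as a weighted sum of feature columns and counts the stored cells for the 10000 × 500 example.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def column(m, j):
    """Return column j of a matrix as a flat list."""
    return [row[j] for row in m]


def combine_features(W, h):
    """Weighted sum of the columns of W, with weights taken from h."""
    return [sum(W[i][k] * h[k] for k in range(len(h))) for i in range(len(W))]


# Hypothetical toy data: 4 words (rows), 3 documents (columns), 2 features.
W = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0],
     [1.0, 1.0]]        # features matrix, 4 x 2
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]   # coefficients matrix, 2 x 3

V = matmul(W, H)        # product matrix, 4 x 3

# Document 2 is rebuilt from the two feature columns of W,
# weighted by column 2 of H.
doc = 2
rebuilt = combine_features(W, column(H, doc))

# Cell counts for the 10000 x 500 example with 10 features.
full_cells = 10000 * 500
factor_cells = 10000 * 10 + 10 * 500
```

Here `rebuilt` equals `column(V, doc)`, and the two factors of the larger example hold 105,000 cells instead of 5,000,000.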


Clustering property

NMF has an inherent clustering property, i.e., it automatically clusters the columns of input data \mathbf{V} = (v_1, \dots, v_n). More specifically, the approximation of \mathbf{V} by \mathbf{V} \simeq \mathbf{W}\mathbf{H} is achieved by finding W and H that minimize the error function (using the Frobenius norm)
: \left\| V - WH \right\|_F, subject to W \geq 0, H \geq 0.
If we furthermore impose an orthogonality constraint on \mathbf{H}, i.e. \mathbf{H}\mathbf{H}^T = I, then the above minimization is mathematically equivalent to the minimization of K-means clustering.
Furthermore, the computed H gives the cluster membership, i.e., if \mathbf{H}_{kj} > \mathbf{H}_{ij} for all ''i'' ≠ ''k'', this suggests that the input data v_j belongs to the k-th cluster. The computed W gives the cluster centroids, i.e., the k-th column gives the cluster centroid of the k-th cluster. This centroid's representation can be significantly enhanced by convex NMF.
When the orthogonality constraint \mathbf{H}\mathbf{H}^T = I is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. Clustering is the main objective of most data mining applications of NMF.
When the error function to be used is the Kullback–Leibler divergence, NMF is identical to probabilistic latent semantic analysis (PLSA), a popular document clustering method.
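The membership rule stated above (column j belongs to the cluster k whose entry H_kj is largest) can be sketched as follows. This is a minimal illustration in plain Python; the function name `assign_clusters` and the example values of H are assumptions chosen for the demonstration, and ties are broken in favour of the lowest index.

```python
def assign_clusters(H):
    """For each column j of H, return the row index k with the largest H[k][j]."""
    n_clusters, n_points = len(H), len(H[0])
    return [max(range(n_clusters), key=lambda k: H[k][j])
            for j in range(n_points)]


# Hypothetical coefficients matrix: 2 clusters (rows), 4 data points (columns).
H = [[0.9, 0.1, 0.8, 0.0],
     [0.1, 0.7, 0.3, 1.2]]

labels = assign_clusters(H)
```

With this H, data points 0 and 2 are assigned to cluster 0 and data points 1 and 3 to cluster 1.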


Types


Approximate non-negative matrix factorization

Usually the number of columns of W and the number of rows of H in NMF are selected so the product WH will become an approximation to V. The full decomposition of V then amounts to the two non-negative matrices W and H as well as a residual U, such that: V = WH + U. The elements of the residual matrix can either be negative or positive. When W and H are smaller than V they become easier to store and manipulate. Another reason for factorizing V into smaller matrices W and H is that if one is able to approximately represent the elements of V by significantly less data, then one has to infer some latent structure in the data.


Convex non-negative matrix factorization

In standard NMF, the matrix factor W is only required to be non-negative, \mathbf{W} \in \mathbb{R}_+^{m \times p}, i.e., W can be anything in that space. Convex NMF (C. Ding, T. Li, M.I. Jordan, "Convex and semi-nonnegative matrix factorizations", IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 45-55, 2010) restricts the columns of W to convex combinations of the input data vectors (v_1, \dots, v_n). This greatly improves the quality of data representation of W. Furthermore, the resulting matrix factor H becomes more sparse and orthogonal.


Nonnegative rank factorization

In case the nonnegative rank of V is equal to its actual rank, V = WH is called a nonnegative rank factorization (NRF). The problem of finding the NRF of V, if it exists, is known to be NP-hard.


Different cost functions and regularizations

There are different types of non-negative matrix factorizations. The different types arise from using different cost functions for measuring the divergence between V and WH and possibly by regularization of the W and/or H matrices.
Two simple divergence functions studied by Lee and Seung are the squared error (or Frobenius norm) and an extension of the Kullback–Leibler divergence to positive matrices (the original Kullback–Leibler divergence is defined on probability distributions). Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules.
The factorization problem in the squared error version of NMF may be stated as: Given a matrix \mathbf{V} find nonnegative matrices W and H that minimize the function
: F(\mathbf{W},\mathbf{H}) = \left\| \mathbf{V} - \mathbf{WH} \right\|^2_F
Another type of NMF for images is based on the total variation norm.
When L1 regularization (akin to Lasso) is added to NMF with the mean squared error cost function, the resulting problem may be called non-negative sparse coding due to the similarity to the sparse coding problem, although it may also still be referred to as NMF.
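The two divergences named above can be written out directly. The sketch below is plain Python with illustrative function names; the generalized Kullback–Leibler form used here, sum over i,j of V_ij log(V_ij / (WH)_ij) - V_ij + (WH)_ij, is the usual extension to non-negative matrices, and the small `eps` guarding the logarithm is an added assumption rather than part of the definition.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def frobenius_cost(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


def generalized_kl_cost(V, W, H, eps=1e-12):
    """Generalized KL divergence D(V || WH) for non-negative matrices."""
    WH = matmul(W, H)
    total = 0.0
    for i in range(len(V)):
        for j in range(len(V[0])):
            v, a = V[i][j], WH[i][j]
            if v > 0:                      # 0 * log(0) is taken as 0
                total += v * math.log(v / (a + eps))
            total += a - v
    return total


# Hypothetical factors and two inputs: one reproduced exactly, one not.
W = [[1.0, 0.0], [0.0, 2.0]]
H = [[1.0, 2.0], [3.0, 1.0]]
V_exact = matmul(W, H)                     # [[1, 2], [6, 2]]
V_off = [[2.0, 2.0], [6.0, 2.0]]           # one entry differs by 1
```

Both costs vanish (up to `eps`) for `V_exact` and are positive for `V_off`, but they penalize the same discrepancy differently, which is why each leads to its own update rules.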


Online NMF

Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in streaming fashion. One such use is for collaborative filtering in recommendation systems, where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.


Algorithms

There are several ways in which W and H may be found. Lee and Seung's multiplicative update rule has been a popular method due to the simplicity of implementation. This algorithm is:
:initialize: W and H non-negative.
:Then update the values in W and H by computing the following, with n as an index of the iteration.
: \mathbf{H}_{[i,j]}^{n+1} \leftarrow \mathbf{H}_{[i,j]}^{n} \frac{((\mathbf{W}^n)^T \mathbf{V})_{[i,j]}}{((\mathbf{W}^n)^T \mathbf{W}^n \mathbf{H}^n)_{[i,j]}}
:and
: \mathbf{W}_{[i,j]}^{n+1} \leftarrow \mathbf{W}_{[i,j]}^{n} \frac{(\mathbf{V} (\mathbf{H}^{n+1})^T)_{[i,j]}}{(\mathbf{W}^n \mathbf{H}^{n+1} (\mathbf{H}^{n+1})^T)_{[i,j]}}
:Until W and H are stable.
Note that the updates are done on an element-by-element basis, not by matrix multiplication. We note that the multiplicative factors for W and H, i.e. the \frac{\mathbf{W}^T \mathbf{V}}{\mathbf{W}^T \mathbf{W} \mathbf{H}} and \frac{\mathbf{V} \mathbf{H}^T}{\mathbf{W} \mathbf{H} \mathbf{H}^T} terms, are matrices of ones when \mathbf{V} = \mathbf{W} \mathbf{H}.
More recently other algorithms have been developed. Some approaches are based on alternating non-negative least squares: in each step of such an algorithm, first H is fixed and W is found by a non-negative least squares solver, then W is fixed and H is found analogously. The procedures used to solve for W and H may be the same or different, as some NMF variants regularize one of W and H. Specific approaches include the projected gradient descent methods, the active set method, the optimal gradient method, and the block principal pivoting method, among several others.
Current algorithms are sub-optimal in that they only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem, which is known to be NP-complete. However, as in many other data mining applications, a local minimum may still prove to be useful.
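The multiplicative update rule can be sketched in a few lines. The version below is an illustrative plain-Python implementation for the squared-error cost, not a reference one: the function names, the random initialization, the fixed iteration count used in place of a stability test, and the small `eps` added to the denominators to avoid division by zero are all assumptions of this sketch.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def squared_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V by WH using element-wise multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start, so no entry is stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element by element.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(m)]
    return W, H


# Hypothetical non-negative input of rank 2 (4 rows, 3 columns).
V = [[1.0, 0.0, 2.0],
     [2.0, 1.0, 5.0],
     [0.0, 3.0, 3.0],
     [1.0, 1.0, 3.0]]
W, H = nmf_multiplicative(V, rank=2)
```

Because every update multiplies a non-negative entry by a non-negative ratio, W and H stay non-negative without any explicit projection. The result depends on the starting point, which reflects the local-minimum caveat above.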


Sequential NMF

The sequential construction of NMF components (W and H) was first used to relate NMF with Principal Component Analysis (PCA) in astronomy. The contributions of the PCA components are ranked by the magnitude of their corresponding eigenvalues; for NMF, the components can be ranked empirically when they are constructed one by one (sequentially), i.e., the (n+1)-th component is learned with the first n components already constructed.
The contribution of the sequential NMF components can be compared with the Karhunen–Loève theorem, an application of PCA, using the plot of eigenvalues. With PCA, the number of components is typically chosen at the "elbow" point of this plot; the flat plateau that follows indicates that PCA is no longer capturing the data efficiently, and a final sudden drop reflects the capture of random noise, i.e., the regime of overfitting. For sequential NMF, the plot of eigenvalues is approximated by the plot of the fractional residual variance curves, where the curves decrease continuously and converge to a higher level than PCA, which indicates less over-fitting by sequential NMF.


Exact NMF

Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix V. A polynomial time algorithm for solving nonnegative rank factorization if V contains a monomial sub matrix of rank equal to its rank was given by Campbell and Poole in 1981. Kalofolias and Gallopoulos (2012) solved the symmetric counterpart of this problem, where V is symmetric and contains a diagonal principal sub matrix of rank r. Their algorithm runs in O(rm^2) time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies a separability condition.


Relation to other techniques

In ''Learning the parts of objects by non-negative matrix factorization'' Lee and Seung proposed NMF mainly for parts-based decomposition of images. The paper compares NMF to vector quantization and principal component analysis, and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.
It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA". When NMF is obtained by minimizing the Kullback–Leibler divergence, it is in fact equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis, trained by maximum likelihood estimation. That method is commonly used for analyzing and clustering textual data and is also related to the latent class model.
NMF with the least-squares objective is equivalent to a relaxed form of K-means clustering: the matrix factor W contains cluster centroids and H contains cluster membership indicators (C. Ding, X. He, H.D. Simon (2005), "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering", Proc. SIAM Int'l Conf. Data Mining, pp. 606-610, May 2005). This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF".
NMF can be seen as a two-layer directed graphical model with one layer of observed random variables and one layer of hidden random variables.
NMF extends beyond matrices to tensors of arbitrary order. This extension may be viewed as a non-negative counterpart to, e.g., the PARAFAC model.
Other extensions of NMF include joint factorization of several data matrices and tensors where some factors are shared. Such models are useful for sensor fusion and relational learning.
NMF is an instance of nonnegative quadratic programming (NQP), just like the support vector machine (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains.


Uniqueness

The factorization is not unique: A matrix B and its inverse can be used to transform the two factorization matrices by, e.g.,
: \mathbf{WH} = \mathbf{WBB}^{-1}\mathbf{H}
If the two new matrices \mathbf{\tilde{W}} = \mathbf{WB} and \mathbf{\tilde{H}} = \mathbf{B}^{-1}\mathbf{H} are non-negative they form another parametrization of the factorization.
The non-negativity of \mathbf{\tilde{W}} and \mathbf{\tilde{H}} applies at least if B is a non-negative monomial matrix. In this simple case it will just correspond to a scaling and a permutation.
More control over the non-uniqueness of NMF is obtained with sparsity constraints.
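The scaling-and-permutation case can be verified numerically. In the following plain-Python sketch the factors W and H, the monomial matrix B and its inverse are hypothetical values chosen so that all arithmetic is exact; the transformed pair differs from the original one but yields the same product.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical non-negative factors.
W = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0],
     [1.0, 1.0]]
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]

# Non-negative monomial matrix: swaps the two components and rescales them.
B = [[0.0, 2.0],
     [4.0, 0.0]]
B_inv = [[0.0, 0.25],
         [0.5, 0.0]]

W_new = matmul(W, B)        # columns of W swapped and scaled
H_new = matmul(B_inv, H)    # rows of H swapped and inversely scaled

same_product = matmul(W_new, H_new) == matmul(W, H)
```

Both `W_new` and `H_new` are non-negative, so they are an equally valid factorization of the same matrix.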


Applications


Astronomy

In astronomy, NMF is a promising method for dimension reduction in the sense that astrophysical signals are non-negative. NMF has been applied to spectroscopic observations and direct imaging observations as a method to study the common properties of astronomical objects and to post-process astronomical observations. The advance in spectroscopic observations by Blanton & Roweis (2007) takes into account the uncertainties of astronomical observations; it was later improved by Zhu (2016), where missing data are also considered and parallel computing is enabled. Their method was then adopted by Ren et al. (2018) for the direct imaging field as one of the methods of detecting exoplanets, especially for the direct imaging of circumstellar disks.
Ren et al. (2018) proved the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the linearity of the NMF modeling process; the linearity property is used to separate the stellar light from the light scattered by exoplanets and circumstellar disks.
In direct imaging, various statistical methods have been adopted to reveal faint exoplanets and circumstellar disks against the bright surrounding stellar light, which has a typical contrast from 10⁵ to 10¹⁰. However, the light from the exoplanets or circumstellar disks is usually over-fitted, so forward modeling has to be adopted to recover the true flux. Forward modeling is currently optimized for point sources but not for extended sources, especially irregularly shaped structures such as circumstellar disks. In this situation, NMF has been an excellent method: it over-fits less because of the non-negativity and sparsity of the NMF modeling coefficients, so forward modeling can be performed with a few scaling factors rather than a computationally intensive data re-reduction on generated models.


Data imputation

To impute missing data in statistics, NMF can accommodate missing entries while minimizing its cost function, rather than treating these missing data as zeros. This makes it a mathematically proven method for data imputation in statistics. By first proving that the missing data are ignored in the cost function, then proving that the impact from missing data can be as small as a second order effect, Ren et al. (2020) studied and applied such an approach for the field of astronomy. Their work focuses on two-dimensional matrices; specifically, it includes mathematical derivation, simulated data imputation, and application to on-sky data.
The data imputation procedure with NMF can be composed of two steps. First, when the NMF components are known, Ren et al. (2020) proved that the impact from missing data during data imputation ("target modeling" in their study) is a second order effect. Second, when the NMF components are unknown, the authors proved that the impact from missing data during component construction is a first-to-second order effect.
Depending on the way that the NMF components are obtained, the former step above can be either independent of or dependent on the latter. In addition, the imputation quality can be increased when more NMF components are used; see Figure 4 of Ren et al. (2020) for an illustration.
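The difference between ignoring missing entries and treating them as zeros can be shown with a masked cost function. The sketch below is plain Python and only illustrates that idea; it is not the procedure of Ren et al. (2020). Marking missing cells with `None`, the function names, and the already-known factors W and H are assumptions of the example.

```python
def reconstruct(W, H, i, j):
    """Entry (i, j) of the product WH."""
    return sum(W[i][k] * H[k][j] for k in range(len(H)))


def masked_squared_error(V, W, H):
    """Squared error summed over observed entries only (None = missing)."""
    return sum((V[i][j] - reconstruct(W, H, i, j)) ** 2
               for i in range(len(V)) for j in range(len(V[0]))
               if V[i][j] is not None)


def impute(V, W, H):
    """Copy V, filling each missing entry with the matching entry of WH."""
    return [[reconstruct(W, H, i, j) if V[i][j] is None else V[i][j]
             for j in range(len(V[0]))] for i in range(len(V))]


# Hypothetical known components and data with one missing cell.
W = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0],
     [1.0, 1.0]]
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
V = [[1.0, 0.0, 2.0],
     [2.0, None, 5.0],
     [0.0, 3.0, 3.0],
     [1.0, 1.0, 3.0]]

cost_ignoring_missing = masked_squared_error(V, W, H)

# Treating the missing cell as a zero would penalize a correct model.
V_zero_filled = [[0.0 if x is None else x for x in row] for row in V]
cost_zero_filled = masked_squared_error(V_zero_filled, W, H)

V_imputed = impute(V, W, H)
```

In this toy case the masked cost is zero, whereas the zero-filled cost is not, and the missing cell is imputed from the product WH.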


Text mining

NMF can be used for text mining applications. In this process, a ''document-term'' matrix is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents. This matrix is factored into a ''term-feature'' and a ''feature-document'' matrix. The features are derived from the contents of the documents, and the feature-document matrix describes data clusters of related documents. One specific application used hierarchical NMF on a small subset of scientific abstracts from PubMed. Another research group clustered parts of the Enron email dataset with 65,033 messages and 91,133 terms into 50 clusters. NMF has also been applied to citations data, with one example clustering English Wikipedia articles and scientific journals based on the outbound scientific citations in English Wikipedia. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) have given polynomial-time algorithms to learn topic models using NMF. The algorithm assumes that the topic matrix satisfies a separability condition that is often found to hold in these settings. Hassani, Iranmanesh and Mansouri (2019) proposed a feature agglomeration method for term-document matrices which operates using NMF. The algorithm reduces the term-document matrix into a smaller matrix more suitable for text clustering.
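The factorization into a term-feature and a feature-document matrix can be sketched on a toy corpus. This is a minimal example under stated assumptions: the four documents, the use of raw counts as weights, the choice of two features, and the plain multiplicative-update routine `nmf` are all illustrative and are not taken from any of the studies cited above.

```python
import numpy as np

# Toy corpus (assumed): two documents on astronomy, two on biology.
docs = [
    "galaxy star telescope star",
    "telescope galaxy orbit",
    "gene protein cell gene",
    "cell protein genome",
]
vocab = sorted({word for doc in docs for word in doc.split()})

# Rows are terms, columns are documents; raw counts stand in for term weights.
V = np.array([[doc.split().count(term) for doc in docs] for term in vocab],
             dtype=float)

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Multiplicative updates for the Frobenius-norm cost ||V - W @ H||^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1   # term-feature matrix
    H = rng.random((rank, V.shape[1])) + 0.1   # feature-document matrix
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(V, rank=2)

# Assign each document to the feature with the largest weight in its column of H.
doc_cluster = H.argmax(axis=0)
for k in range(2):
    top_terms = [vocab[i] for i in W[:, k].argsort()[::-1][:3]]
    print(f"feature {k}: top terms {top_terms}")
print("document clusters:", doc_cluster)
```

On data like this, where the two groups of documents share no vocabulary, one would expect each feature to concentrate on one group, but the result of NMF in general depends on the initialization.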


Spectral data analysis

NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris.


Scalable Internet distance prediction

NMF is applied in scalable Internet distance (round-trip time) prediction. For a network with N hosts, with the help of NMF, the distances of all the N^2 end-to-end links can be predicted after conducting only O(N) measurements. This kind of method was first introduced in the Internet Distance Estimation Service (IDES). Afterwards, the Phoenix network coordinate system was proposed as a fully decentralized approach. It achieves better overall prediction accuracy by introducing the concept of weight.


Non-stationary speech denoising

Speech denoising has been a long-standing problem in audio signal processing. There are many algorithms for denoising if the noise is stationary. For example, the Wiener filter is suitable for additive Gaussian noise. However, if the noise is non-stationary, the classical denoising algorithms usually perform poorly because the statistical information of the non-stationary noise is difficult to estimate. Schmidt et al. use NMF to do speech denoising under non-stationary noise, which is completely different from classical statistical approaches. The key idea is that a clean speech signal can be sparsely represented by a speech dictionary, but non-stationary noise cannot. Similarly, non-stationary noise can be sparsely represented by a noise dictionary, but speech cannot. The algorithm for NMF denoising goes as follows. Two dictionaries, one for speech and one for noise, need to be trained offline. Once a noisy speech signal is given, the magnitude of its short-time Fourier transform is calculated first. Second, this magnitude is separated into two parts via NMF: one that can be sparsely represented by the speech dictionary, and another that can be sparsely represented by the noise dictionary. Third, the part that is represented by the speech dictionary is taken as the estimated clean speech.
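The separation step described above can be sketched as follows. This is a simplified illustration and not the implementation of Schmidt et al.: the dictionaries and the "spectrogram" are random non-negative matrices standing in for trained dictionaries and a real short-time Fourier transform magnitude, and the function name `separate`, the L1 penalty weight `sparsity`, and all sizes are assumptions.

```python
import numpy as np

def separate(V, W_speech, W_noise, sparsity=0.1, n_iter=200, eps=1e-9, seed=0):
    """Split a magnitude spectrogram V using two fixed dictionaries.

    Only the activations H are updated; the dictionaries stay as trained.
    The `sparsity` term is an L1 penalty on H that encourages sparse codes.
    """
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
    k = W_speech.shape[1]
    speech_est = W_speech @ H[:k]   # part explained by the speech dictionary
    noise_est = W_noise @ H[k:]     # part explained by the noise dictionary
    return speech_est, noise_est

# Synthetic stand-ins (assumed): 16 frequency bins, 30 time frames.
rng = np.random.default_rng(2)
W_speech = rng.random((16, 4))      # would be trained offline on clean speech
W_noise = rng.random((16, 3))       # would be trained offline on noise
V = W_speech @ rng.random((4, 30)) + W_noise @ rng.random((3, 30))

speech_est, noise_est = separate(V, W_speech, W_noise)
residual = np.linalg.norm(V - speech_est - noise_est) / np.linalg.norm(V)
print("relative residual of the two-part model:", residual)
```

In a real pipeline, `V` would be the magnitude of the short-time Fourier transform of the noisy recording, and the estimated speech magnitude would still have to be combined with a phase estimate and inverted to obtain a waveform.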


Population genetics

Sparse NMF is used in population genetics for estimating individual admixture coefficients, detecting genetic clusters of individuals in a population sample or evaluating genetic admixture in sampled genomes. In human genetic clustering, NMF algorithms provide estimates similar to those of the computer program STRUCTURE, but the algorithms are more efficient computationally and allow analysis of large population genomic data sets.


Bioinformatics

NMF has been successfully applied in bioinformatics for clustering gene expression and DNA methylation data and finding the genes most representative of the clusters. In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes. NMF techniques can identify sources of variation such as cell types, disease subtypes, population stratification, tissue composition, and tumor clonality. A particular variant of NMF, namely Non-Negative Matrix Tri-Factorization (NMTF), has been used for drug repurposing tasks in order to predict novel protein targets and therapeutic indications for approved drugs and to infer pairs of synergistic anticancer drugs.


Nuclear imaging

NMF, also referred to in this field as factor analysis, has been used since the 1980s to analyze sequences of images in SPECT and PET dynamic medical imaging. Non-uniqueness of NMF was addressed using sparsity constraints.


Current research

Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to:
# Algorithmic: searching for global minima of the factors and factor initialization.
# Scalability: how to factorize million-by-billion matrices, which are commonplace in Web-scale data mining, e.g., see Distributed Nonnegative Matrix Factorization (DNMF), Scalable Nonnegative Matrix Factorization (ScalableNMF), Distributed Stochastic Singular Value Decomposition.
# Online: how to update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC.
# Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. multi-view clustering, see CoNMF and MultiNMF.
# Cohen and Rothblum 1993 problem: whether a rational matrix always has an NMF of minimal inner dimension whose factors are also rational. Recently, this problem has been answered negatively.


See also

* Multilinear algebra
* Multilinear subspace learning
* Tensor
* Tensor decomposition
* Tensor software


Sources and external links


Others

* Andrzej Cichocki, Morten Mørup, et al.: "Advances in Nonnegative Matrix and Tensor Factorization", Hindawi Publishing Corporation, (2008).
* Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan and Shun-ichi Amari: "Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation", Wiley, (2009).
* Andri Mirzal: "Nonnegative Matrix Factorizations for Clustering and LSI: Theory and Programming", LAP LAMBERT Academic Publishing, (2011).
* Yong Xiang: "Blind Source Separation: Dependent Component Analysis", Springer, (2014).
* Ganesh R. Naik (Ed.): "Non-negative Matrix Factorization Techniques: Advances in Theory and Applications", Springer, (2016).
* Julian Becker: "Nonnegative Matrix Factorization with Adaptive Elements for Monaural Audio Source Separation: 1", Shaker Verlag GmbH, Germany, (2016).
* Jen-Tzung Chien: "Source Separation and Machine Learning", Academic Press, (2018).
* Shoji Makino (Ed.): "Audio Source Separation", Springer, (2019).
* Nicolas Gillis: "Nonnegative Matrix Factorization", SIAM, ISBN 978-1-611976-40-3 (2020).