Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. This family of methods became widely known during the Netflix Prize challenge due to its effectiveness, as reported by Simon Funk in his 2006 blog post, where he shared his findings with the research community. The prediction results can be improved by assigning different regularization weights to the latent factors based on items' popularity and users' activeness.

Techniques

The idea behind matrix factorization is to represent users and items in a lower-dimensional latent space. Since the initial work by Funk in 2006, a multitude of matrix factorization approaches have been proposed for recommender systems. Some of the most used and simpler ones are listed in the following sections.

Funk MF

The original algorithm proposed by Simon Funk in his blog post factorized the user-item rating matrix as the product of two lower-dimensional matrices: the first has a row for each user, while the second has a column for each item. The row or column associated with a specific user or item is referred to as its ''latent factors''. Note that in Funk MF no singular value decomposition is applied; it is an SVD-like machine learning model. The predicted ratings can be computed as

\tilde{R} = H W

, where

\tilde{R} \in \mathbb{R}^{\text{users} \times \text{items}}

is the user-item rating matrix,

H \in \mathbb{R}^{\text{users} \times \text{latent factors}}

contains the users' latent factors and

W \in \mathbb{R}^{\text{latent factors} \times \text{items}}

contains the items' latent factors. Specifically, the predicted rating user ''u'' will give to item ''i'' is computed as:

\tilde{r}_{ui} = \sum_{f=1}^{\text{n factors}} H_{u,f} W_{f,i}
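As a minimal sketch of the prediction formula, the snippet below computes a single predicted rating and the full predicted matrix in plain Python. The function names `predict_rating` and `predict_all` and the toy values in `H` and `W` are illustrative assumptions, not part of Funk's original description.

```python
def predict_rating(H, W, u, i):
    """Predicted rating of user u for item i: dot product of row u of H and column i of W."""
    n_factors = len(W)
    return sum(H[u][f] * W[f][i] for f in range(n_factors))


def predict_all(H, W):
    """Full predicted rating matrix H W, with one row per user and one column per item."""
    n_users, n_items = len(H), len(W[0])
    return [[predict_rating(H, W, u, i) for i in range(n_items)]
            for u in range(n_users)]


# Toy example (hypothetical values): 2 users, 2 latent factors, 3 items.
H = [[1.0, 0.5],
     [0.0, 2.0]]
W = [[2.0, 0.0, 1.0],
     [1.0, 3.0, 0.0]]

R_tilde = predict_all(H, W)
print(R_tilde)  # [[2.5, 1.5, 1.0], [2.0, 6.0, 0.0]]
```

Storing `W` with one row per latent factor mirrors the shape used in the text, so column ''i'' of `W` holds the latent factors of item ''i''.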

It is possible to tune the expressive power of the model by changing the number of latent factors. It has been demonstrated that a matrix factorization with one latent factor is equivalent to a ''most popular'' or ''top popular'' recommender (i.e. it recommends the items with the most interactions without any personalization). Increasing the number of latent factors improves personalization, and therefore recommendation quality, until the number of factors becomes too high, at which point the model starts to overfit and the recommendation quality decreases. A common strategy to avoid overfitting is to add regularization terms to the objective function. Funk MF was developed for a ''rating prediction'' problem, and therefore uses explicit numerical ratings as user-item interactions. All things considered, Funk MF minimizes the following objective function:

\underset{H, W}{\operatorname{arg\,min}}\, \|R - \tilde{R}\|_{\rm F} + \alpha\|H\| + \beta\|W\|

where

\|.\|_{\rm F}

is defined to be the Frobenius norm, whereas the other norms might be either Frobenius or another norm depending on the specific recommending problem.
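One common way to minimize an objective of this kind is stochastic gradient descent over the observed ratings, with squared Frobenius (L2) penalties on both factor matrices. The sketch below assumes that choice. The function name `train_funk_mf`, the learning rate, the regularization weights, the initialization and the toy ratings are all illustrative assumptions rather than Funk's exact settings.

```python
import random


def train_funk_mf(ratings, n_users, n_items, n_factors=2,
                  lr=0.01, alpha=0.02, beta=0.02, epochs=500, seed=0):
    """Fit H (users x factors) and W (factors x items) by SGD on observed ratings.

    ratings: list of (user, item, rating) triples.
    alpha, beta: L2 regularization weights for H and W (squared norms assumed).
    """
    rng = random.Random(seed)
    # Small positive random initialization (an assumption of this sketch).
    H = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    W = [[rng.uniform(0.0, 0.1) for _ in range(n_items)] for _ in range(n_factors)]

    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(H[u][f] * W[f][i] for f in range(n_factors))
            err = r - pred
            for f in range(n_factors):
                h, w = H[u][f], W[f][i]
                # Gradient step on the squared error plus the L2 penalties.
                H[u][f] += lr * (err * w - alpha * h)
                W[f][i] += lr * (err * h - beta * w)
    return H, W


def squared_error(ratings, H, W):
    """Sum of squared errors over the observed ratings."""
    return sum((r - sum(H[u][f] * W[f][i] for f in range(len(W)))) ** 2
               for u, i, r in ratings)


# Hypothetical toy data: (user, item, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

H0, W0 = train_funk_mf(ratings, n_users=3, n_items=3, epochs=0)  # untrained factors
H, W = train_funk_mf(ratings, n_users=3, n_items=3)              # trained factors
print(squared_error(ratings, H0, W0), squared_error(ratings, H, W))
```

Larger values of `alpha` and `beta` shrink the factors more strongly, trading training error for less overfitting.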

SVD++

While Funk MF is able to provide very good recommendation quality, its ability to use only explicit numerical ratings as user-item interactions constitutes a limitation. Modern recommender systems should exploit all available interactions, both explicit (e.g. numerical ratings) and implicit (e.g. likes, purchases, skipped, bookmarked). To this end, SVD++ was designed to take implicit interactions into account as well. Compared to Funk MF, SVD++ also takes user and item bias into account. The predicted rating user ''u'' will give to item ''i'' is computed as:

\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=1}^{\text{n factors}} H_{u,f} W_{f,i}

Where

\mu

refers to the overall average rating over all items and

b_i

and

b_u

refer to the observed deviation of the item and the user, respectively, from the average.

SVD++ has some disadvantages, however, the main one being that the method is not ''model-based''. This means that if a new user is added, the algorithm cannot model that user unless the whole model is retrained. Even though the system might have gathered some interactions for the new user, the user's latent factors are not available and therefore no recommendations can be computed. This is an example of a cold-start problem: the recommender cannot deal efficiently with new users or items, and specific strategies should be put in place to handle this disadvantage.

A possible way to address this cold-start problem is to modify SVD++ so that it becomes a ''model-based'' algorithm, which makes new items and new users easy to manage. As previously mentioned, in SVD++ the latent factors of new users are not available, so it is necessary to represent them in a different way. The user's latent factors represent the preference of that user for the corresponding item's latent factors, so they can be estimated from the user's past interactions. If the system is able to gather some interactions for the new user, it is possible to estimate that user's latent factors. Note that this does not entirely solve the cold-start problem, since the recommender still requires some reliable interactions for new users, but at least there is no need to recompute the whole model every time. It has been demonstrated that this formulation is almost equivalent to a SLIM model, which is an item-item model-based recommender. The predicted rating becomes:

\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=1}^{\text{n factors}} \biggl( \sum_{j=1}^{\text{n items}} r_{uj} W^{T}_{j,f} \biggr) W_{f,i}

With this formulation, the equivalent item-item recommender would be

\tilde{R} = R S = R W^{T} W

The similarity matrix is therefore symmetric.
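The item-item view can be sketched as follows: the similarity matrix is built from the item factors alone, so a user who was not present at training time can be scored directly from a rating vector. The function names, the optional bias arguments and the toy values are illustrative assumptions.

```python
def item_similarity(W):
    """Item-item similarity built from the item factors W (factors x items) as W^T W."""
    n_factors, n_items = len(W), len(W[0])
    return [[sum(W[f][j] * W[f][i] for f in range(n_factors))
             for i in range(n_items)]
            for j in range(n_items)]


def predict_new_user(r_u, W, mu=0.0, b_u=0.0, b_items=None):
    """Scores for a user described only by the rating vector r_u (length n_items)."""
    n_items = len(W[0])
    if b_items is None:
        b_items = [0.0] * n_items  # no item bias unless provided
    S = item_similarity(W)
    return [mu + b_items[i] + b_u + sum(r_u[j] * S[j][i] for j in range(n_items))
            for i in range(n_items)]


# Hypothetical item factors: 2 latent factors, 3 items.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]

S = item_similarity(W)
print(S)  # [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 5.0]]

# A new user who rated item 0 with 4 and item 1 with 2, and has not rated item 2.
scores = predict_new_user([4.0, 2.0, 0.0], W)
print(scores)  # [4.0, 2.0, 10.0]
```

Because every entry of `S` is a dot product between two item columns of `W`, swapping the two items gives the same value, which is why the matrix is symmetric.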

Asymmetric SVD

Asymmetric SVD aims to combine the advantages of SVD++ with those of a model-based algorithm, and is therefore able to handle new users with a few ratings without retraining the whole model. As opposed to the model-based SVD, here the user latent factor matrix H is replaced by Q, which learns the user's preferences as a function of their ratings. The predicted rating user ''u'' will give to item ''i'' is computed as:

\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=1}^{\text{n factors}} \sum_{j=1}^{\text{n items}} r_{uj} Q_{j,f} W_{f,i}

With this formulation, the equivalent item-item recommender would be

\tilde{R} = R S = R Q^{T} W

Since matrices Q and W are different, the similarity matrix is asymmetric, hence the name of the model.
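A small sketch can make the asymmetry concrete. It assumes that `Q` is stored with the same shape as `W` (one row per latent factor, one column per item), so that the similarity matrix is the product of the transpose of `Q` with `W`. The function name and the toy values are hypothetical.

```python
def asymmetric_similarity(Q, W):
    """Item-item similarity Q^T W for Q and W both stored as (factors x items)."""
    n_factors, n_items = len(W), len(W[0])
    return [[sum(Q[f][j] * W[f][i] for f in range(n_factors))
             for i in range(n_items)]
            for j in range(n_items)]


# Hypothetical learned matrices: 2 latent factors, 3 items.
Q = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]

S = asymmetric_similarity(Q, W)
print(S)  # [[1.0, 0.0, 2.0], [0.0, 2.0, 2.0], [1.0, 0.0, 2.0]]

is_symmetric = all(S[j][i] == S[i][j] for j in range(3) for i in range(3))
print(is_symmetric)  # False
```

In this toy case the similarity from item 0 to item 2 is 2.0 while the similarity from item 2 to item 0 is 1.0, which cannot happen when the same matrix is used on both sides of the product.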

Group-specific SVD

A group-specific SVD can be an effective approach to the cold-start problem in many scenarios. It clusters users and items based on dependency information and similarities in characteristics. Once a new user or item arrives, it can be assigned a group label, and its latent factors can be approximated by the group effects of the corresponding group. Therefore, although ratings associated with the new user or item are not necessarily available, the group effects provide immediate and effective predictions. The predicted rating user ''u'' will give to item ''i'' is computed as:

\tilde{r}_{ui} = \sum_{f=1}^{\text{n factors}} (H_{u,f}+S_{v_u,f})(W_{f,i}+T_{f,j_i})

Here

v_u

and

j_i

represent the group labels of user ''u'' and item ''i'', respectively, which are identical across members of the same group, and ''S'' and ''T'' are matrices of group effects. For example, for a new user

u_{new}

whose latent factor

H_{u_{new}}

is not available, we can at least identify their group label

v_{u_{new}}

, and predict their ratings as:

\tilde{r}_{u_{new}i} = \sum_{f=1}^{\text{n factors}} S_{v_{u_{new}},f}(W_{f,i}+T_{f,j_i})

This provides a good approximation to the unobserved ratings.
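The two prediction formulas of this section can be sketched with a single function in which the user's own latent factors are optional. The function name `predict_group_svd`, its argument layout and the toy values are illustrative assumptions. How the groups and group effects are learned is outside the scope of this sketch.

```python
def predict_group_svd(S_row, W, T, i, j_i, H_row=None):
    """Predicted rating with group effects.

    S_row: group-effect vector of the user's group (length n_factors).
    W: item factors (factors x items).
    T: item group effects (factors x item groups).
    i, j_i: item index and the group label of that item.
    H_row: the user's own latent factors, or None for a new user.
    """
    n_factors = len(W)
    if H_row is None:
        # New user: only the group effect is available.
        H_row = [0.0] * n_factors
    return sum((H_row[f] + S_row[f]) * (W[f][i] + T[f][j_i])
               for f in range(n_factors))


# Hypothetical values: 2 latent factors, 2 items, 1 item group.
W = [[1.0, 2.0],
     [0.0, 1.0]]
T = [[0.5],
     [0.5]]
S_row = [1.0, 2.0]  # group effects of the user's group

known_user = predict_group_svd(S_row, W, T, i=1, j_i=0, H_row=[0.5, -1.0])
new_user = predict_group_svd(S_row, W, T, i=1, j_i=0)
print(known_user, new_user)  # 5.25 5.5
```

For the new user the prediction depends only on the group effects and the item terms, so it is available before any rating from that user has been observed.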

Hybrid MF

In recent years many other matrix factorization models have been developed to exploit the ever-increasing amount and variety of available interaction data and use cases. Hybrid matrix factorization algorithms are capable of merging explicit and implicit interactions, or both content and collaborative data.

Deep-learning MF

In recent years a number of neural and deep-learning techniques have been proposed, some of which generalize traditional matrix factorization algorithms via a non-linear neural architecture. While deep learning has been applied to many different scenarios (context-aware, sequence-aware, social tagging, etc.), its real effectiveness when used in a simple collaborative filtering scenario has been put into question. Systematic analysis of publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences (SIGIR, KDD, WWW, RecSys, IJCAI), has shown that on average less than 40% of articles are reproducible, with as little as 14% in some conferences. Overall, the studies identified 26 articles; only 12 of them could be reproduced, and 11 of them could be outperformed by much older and simpler properly tuned baselines. The articles also highlight a number of potential problems in today's research scholarship and call for improved scientific practices in that area. Similar issues have also been spotted in sequence-aware recommender systems.
