A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that provide suggestions for items that are most pertinent to a particular user. Typically, the suggestions refer to various decision-making processes, such as what product to purchase, what music to listen to, or what online news to read. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer. Recommender systems are used in a variety of areas, with commonly recognised examples taking the form of

playlist A playlist is a list of video or audio files that can be played back on a media player either sequentially or in a shuffled order. In its most general form, an audio playlist is simply a list of songs, but sometimes a loop. The term has sev ...

generators for video and music services, product recommenders for online stores, or content recommenders for social media platforms and open web content recommenders.Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Bosagh Zade
WTF:The who-to-follow system at Twitter
Proceedings of the 22nd international conference on World Wide Web These systems can operate using a single input, like music, or multiple inputs within and across platforms like news, books and search queries. There are also popular recommender systems for specific topics like restaurants and

online dating Online dating, also known as Internet dating, Virtual dating, or Mobile app dating, is a relatively recent method used by people with a goal of searching for and interacting with potential romantic or sexual partners, via the internet. An onlin ...

. Recommender systems have also been developed to explore research articles and experts,H. Chen, A. G. Ororbia II, C. L. Gile
ExpertSeer: a Keyphrase Based Expert Recommender for Digital Libraries
in arXiv preprint 2015 collaborators,H. Chen, L. Gou, X. Zhang, C. Gile
Collabseer: a search engine for collaboration discovery
in ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2011 and financial services.Alexander Felfernig, Klaus Isak, Kalman Szabo, Peter Zachar
The VITA Financial Services Sales Support Environment
in AAAI/IAAI 2007, pp. 1692-1699, Vancouver, Canada, 2007.

Overview

Recommender systems usually make use of either or both

collaborative filtering Collaborative filtering (CF) is a technique used by recommender systems.Francesco Ricci and Lior Rokach and Bracha ShapiraIntroduction to Recommender Systems Handbook Recommender Systems Handbook, Springer, 2011, pp. 1-35 Collaborative filtering ...

and content-based filtering (also known as the personality-based approach), as well as other systems such as

knowledge-based systems A knowledge-based system (KBS) is a computer program that reasons and uses a knowledge base to solve complex problems. The term is broad and refers to many different kinds of systems. The one common theme that unites all knowledge based systems i ...

. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.Prem Melville and Vikas Sindhwani
Recommender Systems
Encyclopedia of Machine Learning, 2010. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties. We can demonstrate the differences between collaborative and content-based filtering by comparing two early music recommender systems –

Last.fm Last.fm is a music website founded in the United Kingdom in 2002. Using a music recommender system called "Audioscrobbler", Last.fm builds a detailed profile of each user's musical taste by recording details of the tracks the user listens to, e ...

and Pandora Radio. * Last.fm creates a "station" of recommended songs by observing what bands and individual tracks the user has listened to on a regular basis and comparing those against the listening behavior of other users. Last.fm will play tracks that do not appear in the user's library, but are often played by other users with similar interests. As this approach leverages the behavior of users, it is an example of a collaborative filtering technique. * Pandora uses the properties of a song or artist (a subset of the 400 attributes provided by the Music Genome Project) to seed a "station" that plays music with similar properties. User feedback is used to refine the station's results, deemphasizing certain attributes when a user "dislikes" a particular song and emphasizing other attributes when a user "likes" a song. This is an example of a content-based approach. Each type of system has its strengths and weaknesses. In the above example, Last.fm requires a large amount of information about a user to make accurate recommendations. This is an example of the cold start problem, and is common in collaborative filtering systems. Whereas Pandora needs very little information to start, it is far more limited in scope (for example, it can only make recommendations that are similar to the original seed). Recommender systems are a useful alternative to

search algorithm In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the search space of a problem domain, with eith ...

s since they help users discover items they might not have found otherwise. Of note, recommender systems are often implemented using search engines indexing non-traditional data. Recommender systems have been the focus of several granted patents.

History

Elaine Rich Elaine Alice Rich is an American computer scientist, known for her textbooks on artificial intelligence and automata theory and for her research on user modeling. She is retired as a distinguished senior lecturer from the University of Texas at A ...

created 1979 unknowingly the first recommender systems Grundy. She looked for a way to recommend a user a book he might like. Her idea was to create a system that asks the user specific questions and assigns him stereotypes depending on his answers. Depending on a users stereotype, he would then get a recommendation for a book he might like. The first actual mention of recommender System was in a technical report as a "digital bookshelf" in 1990 by

Jussi Karlgren Jussi Karlgren is a Swedish computational linguist, research scientist at Spotify, and co-founder of text analytics company Gavagai AB. He holds a PhD in computational linguistics from Stockholm University, and the title of docent (adjoint prof ...

at Columbia University, and implemented at scale and worked through in technical reports and publications from 1994 onwards by Jussi Karlgren, then at

SICS RISE SICS (previously Swedish Institute of Computer Science) is a leading research institute for applied information and communication technology in Sweden, founded in 1985. It explores the digitalization of products, services and businesses. In ...

,a and research groups led by

Pattie Maes Pattie Maes (born 1961) is a professor in MIT's program in Media Arts and Sciences. She founded and directed the MIT Media Lab's Fluid Interfaces Group. Previously, she founded and ran the Software Agents group. She served for several years as ...

at MIT, Will Hill at Bellcore, and Paul Resnick, also at MIT Resnick, Paul, and Hal R. Varian. "Recommender systems." Communications of the ACM 40, no. 3 (1997): 56-58. whose work with GroupLens was awarded the 2010

ACM Software Systems Award The ACM Software System Award is an annual award that honors people or an organization "for developing a software system that has had a lasting influence, reflected in contributions to concepts, in commercial acceptance, or both". It is awarded b ...

. Montaner provided the first overview of recommender systems from an intelligent agent perspective. Adomavicius provided a new, alternate overview of recommender systems.. Herlocker provides an additional overview of evaluation techniques for recommender systems, and

Beel A beel ( Bengali and Assamese: বিল) is a billabong or a lake-like wetland with static water (as opposed to moving water in rivers and canals - typically called in Bengali, in the Ganges - Brahmaputra flood plains of the Eastern Indian ...

et al. discussed the problems of offline evaluations. Beel et al. have also provided literature surveys on available research paper recommender systems and existing challenges.

Approaches

Collaborative filtering

One approach to the design of recommender systems that has wide use is

. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. By locating peer users/items with a rating history similar to the current user or item, they generate recommendations using this neighborhood. Collaborative filtering methods are classified as memory-based and model-based. A well-known example of memory-based approaches is the user-based algorithm, while that of model-based approaches is Matrix factorization (recommender systems). A key advantage of the collaborative filtering approach is that it does not rely on machine analyzable content and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. Many algorithms have been used in measuring user similarity or item similarity in recommender systems. For example, the

k-nearest neighbor In statistics, the ''k''-nearest neighbors algorithm (''k''-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regressi ...

(k-NN) approach and the Pearson Correlation as first implemented by Allen. When building a model from a user's behavior, a distinction is often made between explicit and

implicit Implicit may refer to: Mathematics * Implicit function * Implicit function theorem * Implicit curve * Implicit surface * Implicit differential equation Other uses * Implicit assumption, in logic * Implicit-association test, in social psycholog ...

forms of

data collection Data collection or data gathering is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. Data collection is a research com ...

. Examples of explicit data collection include the following: * Asking a user to rate an item on a sliding scale. * Asking a user to search. * Asking a user to rank a collection of items from favorite to least favorite. * Presenting two items to a user and asking him/her to choose the better one of them. * Asking a user to create a list of items that he/she likes (see '' Rocchio classification'' or other similar techniques). Examples of implicit data collection include the following: * Observing the items that a user views in an online store. * Analyzing item/user viewing times. * Keeping a record of the items that a user purchases online. * Obtaining a list of items that a user has listened to or watched on his/her computer. * Analyzing the user's social network and discovering similar likes and dislikes. Collaborative filtering approaches often suffer from three problems: cold start, scalability, and sparsity.Sanghack Lee and Jihoon Yang and Sung-Yong Park
Discovery of Hidden Similarity on Collaborative Filtering to Overcome Sparsity Problem
Discovery Science, 2007. * Cold start: For a new user or item, there isn't enough data to make accurate recommendations. Note: one commonly implemented solution to this problem is the Multi-armed bandit algorithm. * Scalability: There are millions of users and products in many of the environments in which these systems make recommendations. Thus, a large amount of computation power is often necessary to calculate recommendations. * Sparsity: The number of items sold on major e-commerce sites is extremely large. The most active users will only have rated a small subset of the overall database. Thus, even the most popular items have very few ratings. One of the most famous examples of collaborative filtering is item-to-item collaborative filtering (people who buy x also buy y), an algorithm popularized by Amazon.com's recommender system.Collaborative Recommendations Using Item-to-Item Similarity Mappings
Many

social networks A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for a ...

originally used collaborative filtering to recommend new friends, groups, and other social connections by examining the network of connections between a user and their friends. Collaborative filtering is still used as part of hybrid systems.

Content-based filtering

Another common approach when designing recommender systems is content-based filtering. Content-based filtering methods are based on a description of the item and a profile of the user's preferences. These methods are best suited to situations where there is known data on an item (name, location, description, etc.), but not on the user. Content-based recommenders treat recommendation as a user-specific classification problem and learn a classifier for the user's likes and dislikes based on an item's features. In this system, keywords are used to describe the items, and a

user profile A user profile is a collection of settings and information associated with a user. It contains critical information that is used to identify an individual, such as their name, age, portrait photograph and individual characteristics such as ...

is built to indicate the type of item this user likes. In other words, these algorithms try to recommend items similar to those that a user liked in the past or is examining in the present. It does not rely on a user sign-in mechanism to generate this often temporary profile. In particular, various candidate items are compared with items previously rated by the user, and the best-matching items are recommended. This approach has its roots in

information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other c ...

and

information filtering An information filtering system is a system that removes redundant or unwanted information from an information stream using (semi)automated or computerized methods prior to presentation to a human user. Its main goal is the management of the inform ...

research. To create a

, the system mostly focuses on two types of information: 1. A model of the user's preference. 2. A history of the user's interaction with the recommender system. Basically, these methods use an item profile (i.e., a set of discrete attributes and features) characterizing the item within the system. To abstract the features of the items in the system, an item presentation algorithm is applied. A widely used algorithm is the

tf–idf In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or c ...

representation (also called vector space representation). The system creates a content-based profile of users based on a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors using a variety of techniques. Simple approaches use the average values of the rated item vector while other sophisticated methods use machine learning techniques such as Bayesian Classifiers,

cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...

decision trees A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains cond ...

, and

artificial neural networks Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units ...

in order to estimate the probability that the user is going to like the item. A key issue with content-based filtering is whether the system can learn user preferences from users' actions regarding one content source and use them across other content types. When the system is limited to recommending content of the same type as the user is already using, the value from the recommendation system is significantly less than when other content types from other services can be recommended. For example, recommending news articles based on news browsing is useful. Still, it would be much more useful when music, videos, products, discussions, etc., from different services, can be recommended based on news browsing. To overcome this, most content-based recommender systems now use some form of the hybrid system. Content-based recommender systems can also include opinion-based recommender systems. In some cases, users are allowed to leave text reviews or feedback on the items. These user-generated texts are implicit data for the recommender system because they are potentially rich resources of both feature/aspects of the item and users' evaluation/sentiment to the item. Features extracted from the user-generated reviews are improved

meta-data Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...

of items, because as they also reflect aspects of the item like

, extracted features are widely concerned by the users. Sentiments extracted from the reviews can be seen as users' rating scores on the corresponding features. Popular approaches of opinion-based recommender system utilize various techniques including

text mining Text mining, also referred to as ''text data mining'', similar to text analytics, is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extract ...

sentiment analysis Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjec ...

(see also Multimodal sentiment analysis) and deep learning.

Hybrid recommendations approaches

Most recommender systems now use a hybrid approach, combining

, content-based filtering, and other approaches. There is no reason why several different techniques of the same type could not be hybridized. Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model (see for a complete review of recommender systems). Several studies that empirically compare the performance of the hybrid with the pure collaborative and content-based methods and demonstrated that the hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommender systems such as cold start and the sparsity problem, as well as the knowledge engineering bottleneck in knowledge-based approaches.

Netflix Netflix, Inc. is an American subscription video on-demand over-the-top streaming service and production company based in Los Gatos, California. Founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California, it offers a ...

is a good example of the use of hybrid recommender systems. The website makes recommendations by comparing the watching and searching habits of similar users (i.e., collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering). Some hybridization techniques include: *Weighted: Combining the score of different recommendation components numerically. *Switching: Choosing among recommendation components and applying the selected one. *Mixed: Recommendations from different recommenders are presented together to give the recommendation. *Feature Combination: Features derived from different knowledge sources are combined together and given to a single recommendation algorithm. *Feature Augmentation: Computing a feature or set of features, which is then part of the input to the next technique. *Cascade: Recommenders are given strict priority, with the lower priority ones breaking ties in the scoring of the higher ones. *Meta-level: One recommendation technique is applied and produces some sort of model, which is then the input used by the next technique.Robin Burke
Hybrid Web Recommender Systems
, pp. 377-408, The Adaptive Web, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Ed.), Lecture Notes in Computer Science, Springer-Verlag, Berlin, Germany, Lecture Notes in Computer Science, Vol. 4321, May 2007, 978-3-540-72078-2.

Technologies

Session-based recommender systems

These recommender systems use the interactions of a user within a session to generate recommendations. Session-based recommender systems are used at Youtube and Amazon. These are particularly useful when history (such as past clicks, purchases) of a user is not available or not relevant in the current user session. Domains, where session-based recommendations are particularly relevant, include video, e-commerce, travel, music and more. Most instances of session-based recommender systems rely on the sequence of recent interactions within a session without requiring any additional details (historical, demographic) of the user. Techniques for session-based recommendations are mainly based on generative sequential models such as Recurrent Neural Networks, Transformers, and other deep learning based approaches

Reinforcement learning for recommender systems

The recommendation problem can be seen as a special instance of a reinforcement learning problem whereby the user is the environment upon which the agent, the recommendation system acts upon in order to receive a reward, for instance, a click or engagement by the user. One aspect of reinforcement learning that is of particular use in the area of recommender systems is the fact that the models or policies can be learned by providing a reward to the recommendation agent. This is in contrast to traditional learning techniques which rely on supervised learning approaches that are less flexible, reinforcement learning recommendation techniques allow to potentially train models that can be optimized directly on metrics of engagement, and user interest.

Multi-criteria recommender systems

Multi-criteria recommender systems (MCRS) can be defined as recommender systems that incorporate preference information upon multiple criteria. Instead of developing recommendation techniques based on a single criterion value, the overall preference of user u for the item i, these systems try to predict a rating for unexplored items of u by exploiting preference information on multiple criteria that affect this overall preference value. Several researchers approach MCRS as a multi-criteria decision making (MCDM) problem, and apply MCDM methods and techniques to implement MCRS systems. See this chapter for an extended introduction.

Risk-aware recommender systems

The majority of existing approaches to recommender systems focus on recommending the most relevant content to users using contextual information, yet do not take into account the risk of disturbing the user with unwanted notifications. It is important to consider the risk of upsetting the user by pushing recommendations in certain circumstances, for instance, during a professional meeting, early morning, or late at night. Therefore, the performance of the recommender system depends in part on the degree to which it has incorporated the risk into the recommendation process. One option to manage this issue is ''DRARS'', a system which models the context-aware recommendation as a

bandit problem In probability theory and machine learning, the multi-armed bandit problem (sometimes called the ''K''- or ''N''-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices ...

. This system combines a content-based technique and a contextual bandit algorithm.

Mobile recommender systems

Mobile recommender systems make use of internet-accessing

smart phones A smartphone is a portable computer device that combines mobile telephone and computing functions into one unit. They are distinguished from feature phones by their stronger hardware capabilities and extensive mobile operating systems, which ...

to offer personalized, context-sensitive recommendations. This is a particularly difficult area of research as mobile data is more complex than data that recommender systems often have to deal with. It is heterogeneous, noisy, requires spatial and temporal auto-correlation, and has validation and generality problems. There are three factors that could affect the mobile recommender systems and the accuracy of prediction results: the context, the recommendation method and privacy. Additionally, mobile recommender systems suffer from a transplantation problem – recommendations may not apply in all regions (for instance, it would be unwise to recommend a recipe in an area where all of the ingredients may not be available). One example of a mobile recommender system are the approaches taken by companies such as

Uber Uber Technologies, Inc. (Uber), based in San Francisco, provides mobility as a service, ride-hailing (allowing users to book a car and driver to transport them in a way similar to a taxi), food delivery ( Uber Eats and Postmates), pa ...

and

Lyft Lyft, Inc. offers mobility as a service, ride-hailing, vehicles for hire, motorized scooters, a bicycle-sharing system, rental cars, and food delivery in the United States and select cities in Canada. Lyft sets fares, which vary using a dyn ...

to generate driving routes for taxi drivers in a city. This system uses GPS data of the routes that taxi drivers take while working, which includes location (latitude and longitude), time stamps, and operational status (with or without passengers). It uses this data to recommend a list of pickup points along a route, with the goal of optimizing occupancy times and profits.

The Netflix Prize

One of the events that energized research in recommender systems was the

Netflix Prize The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users being identified e ...

. From 2006 to 2009, Netflix sponsored a competition, offering a grand prize of $1,000,000 to the team that could take an offered dataset of over 100 million movie ratings and return recommendations that were 10% more accurate than those offered by the company's existing recommender system. This competition energized the search for new and more accurate algorithms. On 21 September 2009, the grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team using tiebreaking rules. The most accurate algorithm in 2007 used an ensemble method of 107 different algorithmic approaches, blended into a single prediction. As stated by the winners, Bell et al.:

Predictive accuracy is substantially improved when blending multiple predictors. ''Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique.'' Consequently, our solution is an ensemble of many methods.

Many benefits accrued to the web due to the Netflix project. Some teams have taken their technology and applied it to other markets. Some members from the team that finished second place founded Gravity R&D, a recommendation engine that's active in the RecSys community. 4-Tell, Inc. created a Netflix project–derived solution for ecommerce websites. A number of privacy issues arose around the dataset offered by Netflix for the

competition. Although the data sets were anonymized in order to preserve customer privacy, in 2007 two researchers from the University of Texas were able to identify individual users by matching the data sets with film ratings on the Internet Movie Database. As a result, in December 2009, an anonymous Netflix user sued Netflix in Doe v. Netflix, alleging that Netflix had violated United States fair trade laws and the Video Privacy Protection Act by releasing the datasets. This, as well as concerns from the

Federal Trade Commission The Federal Trade Commission (FTC) is an independent agency of the United States government whose principal mission is the enforcement of civil (non-criminal) antitrust law and the promotion of consumer protection. The FTC shares jurisdiction o ...

, led to the cancellation of a second Netflix Prize competition in 2010.

Evaluation

Performance measures

Evaluation is important in assessing the effectiveness of recommendation algorithms. To measure the

effectiveness Effectiveness is the capability of producing a desired result or the ability to produce desired output. When something is deemed effective, it means it has an intended or expected outcome, or produces a deep, vivid impression. Etymology The ori ...

of recommender systems, and compare different approaches, three types of

evaluation Evaluation is a systematic determination and assessment of a subject's merit, worth and significance, using criteria governed by a set of standards. It can assist an organization, program, design, project or any other intervention or initiative to ...

s are available: user studies, online evaluations (A/B tests), and offline evaluations. The commonly used metrics are the

mean squared error In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between ...

and root mean squared error, the latter having been used in the Netflix Prize. The information retrieval metrics such as

precision and recall In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space. Precision (also call ...

or DCG are useful to assess the quality of a recommendation method. Diversity, novelty, and coverage are also considered as important aspects in evaluation. However, many of the classic evaluation measures are highly criticized. Evaluating the performance of a recommendation algorithm on a fixed test dataset will always be extremely challenging as it is impossible to accurately predict the reactions of real users to the recommendations. Hence any metric that computes the effectiveness of an algorithm in offline data will be imprecise. User studies are rather a small scale. A few dozens or hundreds of users are presented recommendations created by different recommendation approaches, and then the users judge which recommendations are best. In A/B tests, recommendations are shown to typically thousands of users of a real product, and the recommender system randomly picks at least two different recommendation approaches to generate recommendations. The effectiveness is measured with implicit measures of effectiveness such as

conversion rate In electronic commerce, conversion marketing is marketing with the intention of increasing ''conversions—''that is, site visitors who are paying customers. Measures Conversion marketing attempts to solve low online conversions through opt ...

click-through rate Click-through rate (CTR) is the ratio of users who click on a specific link to the number of total users who view a page, email, or advertisement. It is commonly used to measure the success of an online advertising campaign for a particular we ...

. Offline evaluations are based on historic data, e.g. a dataset that contains information about how users previously rated movies. The effectiveness of recommendation approaches is then measured based on how well a recommendation approach can predict the users' ratings in the dataset. While a rating is an explicit expression of whether a user liked a movie, such information is not available in all domains. For instance, in the domain of citation recommender systems, users typically do not rate a citation or recommended article. In such cases, offline evaluations may use implicit measures of effectiveness. For instance, it may be assumed that a recommender system is effective that is able to recommend as many articles as possible that are contained in a research article's reference list. However, this kind of offline evaluations is seen critical by many researchers. For instance, it has been shown that results of offline evaluations have low correlation with results from user studies or A/B tests. A dataset popular for offline evaluation has been shown to contain duplicate data and thus to lead to wrong conclusions in the evaluation of algorithms. Often, results of so-called offline evaluations do not correlate with actually assessed user-satisfaction. This is probably because offline training is highly biased toward the highly reachable items, and offline testing data is highly influenced by the outputs of the online recommendation module. Researchers have concluded that the results of offline evaluations should be viewed critically.

Beyond accuracy

Typically, research on recommender systems is concerned with finding the most accurate recommendation algorithms. However, there are a number of factors that are also important. *Diversity – Users tend to be more satisfied with recommendations when there is a higher intra-list diversity, e.g. items from different artists. *Recommender persistence – In some situations, it is more effective to re-show recommendations, or let users re-rate items, than showing new items. There are several reasons for this. Users may ignore items when they are shown for the first time, for instance, because they had no time to inspect the recommendations carefully. *Privacy – Recommender systems usually have to deal with privacy concerns because users have to reveal sensitive information. Building user profiles using collaborative filtering can be problematic from a privacy point of view. Many European countries have a strong culture of

data privacy Information privacy is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, contextual information norms, and the legal and political issues surrounding them. It is also known as data pr ...

, and every attempt to introduce any level of user profiling can result in a negative customer response. Much research has been conducted on ongoing privacy issues in this space. The

is particularly notable for the detailed personal information released in its dataset. Ramakrishnan et al. have conducted an extensive overview of the trade-offs between personalization and privacy and found that the combination of weak ties (an unexpected connection that provides serendipitous recommendations) and other data sources can be used to uncover identities of users in an anonymized dataset. *User demographics – Beel et al. found that user demographics may influence how satisfied users are with recommendations. In their paper they show that elderly users tend to be more interested in recommendations than younger users. *Robustness – When users can participate in the recommender system, the issue of fraud must be addressed. *Serendipity –

Serendipity Serendipity is an unplanned fortunate discovery. Serendipity is a common occurrence throughout the history of product invention and scientific discovery. Etymology The first noted use of "serendipity" was by Horace Walpole on 28 January 1754. ...

is a measure of "how surprising the recommendations are". For instance, a recommender system that recommends milk to a customer in a grocery store might be perfectly accurate, but it is not a good recommendation because it is an obvious item for the customer to buy. " erendipityserves two purposes: First, the chance that users lose interest because the choice set is too uniform decreases. Second, these items are needed for algorithms to learn and improve themselves". *Trust – A recommender system is of little value for a user if the user does not trust the system. Trust can be built by a recommender system by explaining how it generates recommendations, and why it recommends an item. *Labelling – User satisfaction with recommendations may be influenced by the labeling of the recommendations. For instance, in the cited study

(CTR) for recommendations labeled as "Sponsored" were lower (CTR=5.93%) than CTR for identical recommendations labeled as "Organic" (CTR=8.86%). Recommendations with no label performed best (CTR=9.87%) in that study.

Reproducibility

Recommender systems are notoriously difficult to evaluate offline, with some researchers claiming that this has led to a

reproducibility crisis The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibi ...

in recommender systems publications. A recent survey of a small number of selected publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences (SIGIR, KDD, WWW, RecSys, IJCAI), has shown that on average less than 40% of articles could be reproduced by the authors of the survey, with as little as 14% in some conferences. Overall the studies identify 26 articles, only 12 of them could be reproduced by the authors and 11 of them could be outperformed by much older and simpler properly tuned baselines on off-line evaluation metrics. The articles considers a number of potential problems in today's research scholarship and suggests improved scientific practices in that area. More recent work on benchmarking a set of the same methods came to qualitatively very different results whereby neural methods were found to be among the best performing methods. Deep learning and neural methods for recommender systems have been used in the winning solutions in several recent recommender system challenges, WSDM, RecSys Challenge. Moreover neural and deep learning methods are widely used in industry where they are extensively tested.Yves Raimond, Justin Basilic
Deep Learning for Recommender Systems
Deep Learning Re-Work SF Summit 2018 The topic of reproducibility is not new in recommender systems. By 2011, Ekstrand, Konstan, et al. criticized that "it is currently difficult to reproduce and extend recommender systems research results,” and that evaluations are “not handled consistently". Konstan and Adomavicius conclude that "the Recommender Systems research community is facing a crisis where a significant number of papers present results that contribute little to collective knowledge ��often because the research lacks the ��evaluation to be properly judged and, hence, to provide meaningful contributions." As a consequence, much research about recommender systems can be considered as not reproducible. Hence, operators of recommender systems find little guidance in the current research for answering the question, which recommendation approaches to use in a recommender systems. Said & Bellogín conducted a study of papers published in the field, as well as benchmarked some of the most popular frameworks for recommendation and found large inconsistencies in results, even when the same algorithms and data sets were used. Some researchers demonstrated that minor variations in the recommendation algorithms or scenarios led to strong changes in the effectiveness of a recommender system. They conclude that seven actions are necessary to improve the current situation: "(1) survey other research fields and learn from them, (2) find a common understanding of reproducibility, (3) identify and understand the determinants that affect reproducibility, (4) conduct more comprehensive experiments (5) modernize publication practices, (6) foster the development and use of recommendation frameworks, and (7) establish best-practice guidelines for recommender-systems research."

References

External links

* * Hangartner, Rick
"What is the Recommender Industry?"
MSearchGroove, December 17, 2007.
ACM Conference on Recommender Systems

Recsys group at Politecnico di Milano

Data Science: Data to Insights from MIT (recommendation systems)
{{DEFAULTSORT:Update System Social information processing