Learning to rank. Slides from Tie-Yan Liu's talk at

WWW The World Wide Web (WWW), commonly known as the Web, is an information system enabling documents and other web resources to be accessed over the Internet. Documents and downloadable media are made available to the network through web se ...

2009 conference ar
available online
or machine-learned ranking (MLR) is the application of

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for

information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...

systems. Training data consists of lists of items with some

partial order In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary ...

specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

Applications

In information retrieval

Ranking is a central part of many

problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them together with relevance degree of each match. It may be prepared manually by human ''assessors'' (or ''raters'', as

Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...

calls them), who check results for some queries and determine

relevance Relevance is the concept of one topic being connected to another topic in a way that makes it useful to consider the second topic when considering the first. The concept of relevance is studied in many different fields, including cognitive sci ...

of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used — only the top few documents, retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing ''clickthrough logs'' (i.e. search results which got clicks from users), ''query chains'', or such search engines' features as Google's (since-replaced)

SearchWiki SearchWiki was a Google Search feature which allowed logged-in users to annotate and re-order search results. The annotations and modified order only applied to the user's searches, but it was possible to view other users' annotation An annot ...

. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well-ranked. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the

vector space model Vector space model or term vector model is an algebraic model for representing text documents (and any objects, in general) as vectors of identifiers (such as index terms). It is used in information filtering, information retrieval, indexing and r ...

boolean model Any kind of logic, function, expression, or theory based on the work of George Boole is considered Boolean. Related to this, "Boolean" may refer to: * Boolean data type, a form of data with only two possible values (usually "true" and "false" ...

, weighted AND, or BM25. This phase is called ''top-

k

document retrieval'' and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes.. Sectio
7.1
In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents.

In other areas

Learning to rank algorithms have been applied in areas other than information retrieval: * In

machine translation Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation or interactive translation), is a sub-field of computational linguistics that investigates t ...

for ranking a set of hypothesized translations; * In

computational biology Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has fo ...

for ranking candidate 3-D structures in protein structure prediction problem. * In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article. * In

software engineering Software engineering is a systematic engineering approach to software development. A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term '' ...

, learning-to-rank methods have been used for fault localization.

Feature vectors

For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called '' feature vectors''. Such an approach is sometimes called ''bag of features'' and is analogous to the

bag of words The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding g ...

model and

used in information retrieval for representation of documents. Components of such vectors are called '' features'', ''factors'' or ''ranking signals''. They may be divided into three groups (features from document retrieval are shown as examples): * ''Query-independent'' or ''static'' features — those features, which depend only on the document, but not on the query. For example, PageRank or document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute document's ''static quality score'' (or ''static rank''), which is often used to speed up search query evaluation. * ''Query-dependent'' or ''dynamic'' features — those features, which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions. * ''

Query-level feature A query-level feature or QLF is a ranking feature utilized in a machine-learned ranking algorithm. Example QLFs: * How many times has this query been run in the last month? * How many words are in the query? * What is the sum/average/min/max/media ...

s'' or ''query features'', which depend only on the query. For example, the number of words in a query. Some examples of features, which were used in the well-know
LETOR
dataset: * TF, TF-IDF, BM25, and

language modeling A language model is a probability distribution over sequences of words. Given any sequence of words of length , a language model assigns a probability P(w_1,\ldots,w_m) to the whole sequence. Language models generate probabilities by training on ...

scores of document's

zone Zone or The Zone may refer to: Places Climate and altitude zones * Death zone (originally the lethal zone), altitudes above a certain point where the amount of oxygen is insufficient to sustain human life for an extended time span * Frigid zone, ...

s (title, body, anchors text, URL) for a given query; * Lengths and

IDF IDF or idf may refer to: Defence forces * Irish Defence Forces * Israel Defense Forces *Iceland Defense Force, of the US Armed Forces, 1951-2006 * Indian Defence Force, a part-time force, 1917 Organizations * Israeli Diving Federation * Interac ...

sums of document's zones; * Document's PageRank,

HITS Hits or H.I.T.S. may refer to: Arts, entertainment, and media Music * ''H.I.T.S.'', 1991 album by New Kids on the Block * ''...Hits'' (Phil Collins album), 1998 * ''Hits'' (compilation series), 1984–2006; 2014 - a British compilation album se ...

ranks and their variants. Selecting and designing good features is an important area in machine learning, which is called

feature engineering Feature engineering or feature extraction or feature discovery is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. The motivation is to use these extra features to improve the qual ...

Evaluation measures

There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures: *

Mean average precision Evaluation measures for an information retrieval (IR) system assess how well an index, search engine or database returns results from a collection of resources that satisfy a user's query. They are therefore fundamental to the success of informatio ...

(MAP); * DCG and NDCG; *

Precision Precision, precise or precisely may refer to: Science, and technology, and mathematics Mathematics and computing (general) * Accuracy and precision, measurement deviation from true value and its scatter * Significant figures, the number of digit ...

@''n'', NDCG@''n'', where "@''n''" denotes that the metrics are evaluated only on top ''n'' documents; *

Mean reciprocal rank The mean reciprocal rank is a statistic measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inve ...

; *

Kendall's tau In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient (after the Greek letter τ, tau), is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non ...

; * Spearman's rho. DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics such as MAP, MRR and precision, are defined only for binary judgments. Recently, there have been proposed several new evaluation metrics which claim to model user's satisfaction with search results better than the DCG metric: * Expected reciprocal rank (ERR); * Yandex's pfound. Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document, than after a less relevant document.

Approaches

Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book ''Learning to Rank for Information Retrieval''. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and

loss function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost ...

s: the pointwise, pairwise, and listwise approach. In practice, listwise approaches often outperform pairwise approaches and pointwise approaches. This statement was further supported by a large scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets. In this section, without further notice,

x

denotes an object to be evaluated, for example, a document or an image,

f(x)

denotes a single-value hypothesis,

h(\cdot)

denotes a bi-variate or multi-variate function and

L(\cdot)

denotes the loss function.

Pointwise approach

In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem — given a single query-document pair, predict its score. Formally speaking, the pointwise approach aims at learning a function

f(x)

predicting the real-value or ordinal score of a document

x

using the loss function

L(f, x_j, y_j)

. A number of existing supervised machine learning algorithms can be readily used for this purpose.

Ordinal regression In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between dif ...

and

classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...

algorithms can also be used in pointwise approach when they are used to predict the score of a single query-document pair, and it takes a small, finite number of values.

Pairwise approach

In this case, the learning-to-rank problem is approximated by a classification problem — learning a

binary classifier Binary classification is the task of classifying the elements of a set into two groups (each called ''class'') on the basis of a classification rule. Typical binary classification problems include: * Medical testing to determine if a patient has c ...

h(x_u, x_v)

that can tell which document is better in a given pair of documents. The classifier shall take two images as its input and the goal is to minimize a loss function

L(h; x_u, x_v, y_)

. The loss function may reflect the average number of inversions in ranking. In many cases, the binary classifier

h(x_u, x_v)

is implemented with a scoring function

f(x)

. As an example, RankNet adapts a probability model and defines

h(x_u, x_v)

as the estimated probability of the document

x_u

has higher quality than

x_v

: :

P_(f) = 
  \text (f(x_u) - f(x_v)),

where

\text(\cdot)

is a

cumulative distribution function In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x. Ev ...

, for example, the standard logistic CDF, i.e. :

\text(x)
  =
  \frac.

Listwise approach

These algorithms try to directly optimize the value of one of the above evaluation measures, averaged over all queries in the training data. This is difficult because most evaluation measures are not continuous functions with respect to ranking model's parameters, and so continuous approximations or bounds on evaluation measures have to be used.

List of methods

A partial list of published learning-to-rank algorithms is shown below with years of first publication of each method: : Note: as most supervised learning algorithms can be applied to pointwise case, only those methods which are specifically designed with ranking in mind are shown above.

History

Norbert Fuhr Norbert Fuhr (born 1956) is a professor of computer science and the leader of the Duisburg Information Engineering Group based at the University of Duisburg-Essen, Germany. Education His first degree is in technical computer science, which he got ...

introduced the general idea of MLR in 1992, describing learning approaches in information retrieval as a generalization of parameter estimation; a specific variant of this approach (using polynomial regression) had been published by him three years earlier. Bill Cooper proposed

logistic regression In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear function (calculus), linear combination of one or more independent var ...

for the same purpose in 1992 and used it with his

Berkeley Berkeley most often refers to: *Berkeley, California, a city in the United States **University of California, Berkeley, a public university in Berkeley, California * George Berkeley (1685–1753), Anglo-Irish philosopher Berkeley may also refer ...

research group to train a successful ranking function for

TREC TREC may refer to: * Techniques de Randonnée Équestre de Compétition or Trec, an equestrian discipline * Text Retrieval Conference, workshops co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of ...

. Manning et al. suggest that these early works achieved limited results in their time due to little available training data and poor machine learning techniques. Several conferences, such as NIPS, SIGIR and

ICML The International Conference on Machine Learning (ICML) is the leading international academic conference in machine learning. Along with NeurIPS and ICLR, it is one of the three primary conferences of high impact in machine learning and artificia ...

had workshops devoted to the learning-to-rank problem since mid-2000s (decade).

Practical usage by search engines

Commercial

web search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...

s began using machine learned ranking systems since the 2000s (decade). One of the first search engines to start using it was AltaVista (later its technology was acquired by

Overture Overture (from French ''ouverture'', "opening") in music was originally the instrumental introduction to a ballet, opera, or oratorio in the 17th century. During the early Romantic era, composers such as Beethoven and Mendelssohn composed overt ...

, and then

Yahoo Yahoo! (, styled yahoo''!'' in its logo) is an American web services provider. It is headquartered in Sunnyvale, California and operated by the namesake company Yahoo! Inc. (2017–present), Yahoo Inc., which is 90% owned by investment funds ma ...

), which launched a

gradient boosting Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees. When a decision t ...

-trained ranking function in April 2003. Bing's search is said to be powered b
RankNet
algorithm, which was invented at

Microsoft Research Microsoft Research (MSR) is the research subsidiary of Microsoft. It was created in 1991 by Richard Rashid, Bill Gates and Nathan Myhrvold with the intent to advance state-of-the-art computing and solve difficult world problems through technologi ...

in 2005. In November 2009 a Russian search engine Yandex announcedYandex corporate blog entry about new ranking model "Snezhinsk"
(in Russian) that it had significantly increased its search quality due to deployment of a new proprietary

MatrixNet MatrixNet is a proprietary machine learning algorithm developed by Yandex and used widely throughout the company products. The algorithm is based on gradient boosting Gradient boosting is a machine learning technique used in regression and class ...

algorithm, a variant of

method which uses oblivious decision trees. Recently they have also sponsored a machine-learned ranking competition "Internet Mathematics 2009" based on their own search engine's production data. Yahoo has announced a similar competition in 2010. As of 2008,

's Peter Norvig denied that their search engine exclusively relies on machine-learned ranking. Cuil's CEO, Tom Costello, suggests that they prefer hand-built models because they can outperform machine-learned models when measured against metrics like click-through rate or time on landing page, which is because machine-learned models "learn what people say they like, not what people actually like". In January 2017 the technology was included in the

open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...

search engine Apache Solr™, thus making machine learned search rank widely accessible also for enterprise search.

Vulnerabilities

Similar to recognition applications in

computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...

, recent neural network based ranking algorithms are also found to be susceptible to covert adversarial attacks, both on the candidates and the queries. With small perturbations imperceptible to human beings, ranking order could be arbitrarily altered. In addition, model-agnostic transferable adversarial examples are found to be possible, which enables black-box adversarial attacks on deep ranking systems without requiring access to their underlying implementations. Conversely, the robustness of such ranking systems can be improved via adversarial defenses such as the Madry defense.

References

{{reflist, 2

External links

; Competitions and public datasets
LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval

Yandex's Internet Mathematics 2009

Yahoo! Learning to Rank Challenge

Microsoft Learning to Rank Datasets
; Open Source code
Parallel C++/MPI implementation of Gradient Boosted Regression Trees for ranking, released September 2011

C++ implementation of Gradient Boosted Regression Trees and Random Forests for ranking

* ttps://github.com/apache/lucene-solr/tree/master/solr/contrib/ltr Java implementation in the Apache Solr search engine Information retrieval techniques Machine learning Ranking functions