HOME

TheInfoList



OR:

Relevance feedback is a feature of some
information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user
feedback Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can then be said to ''feed back'' into itself. The notion of cause-and-effect has to be handled ...
, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.


Explicit feedback

Explicit feedback is obtained from assessors of relevance indicating the relevance of a document retrieved for a query. This type of feedback is defined as explicit only when the assessors (or other users of a system) know that the feedback provided is interpreted as
relevance Relevance is the concept of one topic being connected to another topic in a way that makes it useful to consider the second topic when considering the first. The concept of relevance is studied in many different fields, including cognitive sci ...
judgments. Users may indicate relevance explicitly using a ''binary'' or ''graded'' relevance system. Binary relevance feedback indicates that a document is either relevant or irrelevant for a given query. Graded relevance feedback indicates the relevance of a document to a query on a scale using numbers, letters, or descriptions (such as "not relevant", "somewhat relevant", "relevant", or "very relevant"). Graded relevance may also take the form of a cardinal ordering of documents created by an assessor; that is, the assessor places documents of a result set in order of (usually descending) relevance. An example of this would be the
SearchWiki SearchWiki was a Google Search feature which allowed logged-in users to annotate and re-order search results. The annotations and modified order only applied to the user's searches, but it was possible to view other users' annotation An annot ...
feature implemented by
Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
on their search website. The relevance feedback information needs to be interpolated with the original query to improve retrieval performance, such as the well-known
Rocchio algorithm The Rocchio algorithm is based on a method of relevance feedback found in information retrieval systems which stemmed from the SMART Information Retrieval System developed between 1960 and 1964. Like many other retrieval systems, the Rocchio algori ...
. A
performance metric A performance indicator or key performance indicator (KPI) is a type of performance measurement. KPIs evaluate the success of an organization or of a particular activity (such as projects, programs, products and other initiatives) in which it en ...
which became popular around 2005 to measure the usefulness of a ranking
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
based on the explicit relevance feedback is normalized discounted cumulative gain. Other measures include
precision Precision, precise or precisely may refer to: Science, and technology, and mathematics Mathematics and computing (general) * Accuracy and precision, measurement deviation from true value and its scatter * Significant figures, the number of digit ...
at ''k'' and
mean average precision Evaluation measures for an information retrieval (IR) system assess how well an index, search engine or database returns results from a collection of resources that satisfy a user's query. They are therefore fundamental to the success of informatio ...
.


Implicit feedback

Implicit feedback is inferred from user behavior, such as noting which documents they do and do not select for viewing, the duration of time spent viewing a document, or page browsing or scrolling actions. There are many signals during the search process that one can use for implicit feedback and the types of information to provide in response. The key differences of implicit relevance feedback from that of explicit include: # the user is not assessing relevance for the benefit of the IR system, but only satisfying their own needs and # the user is not necessarily informed that their behavior (selected documents) will be used as relevance feedback An example of this is dwell time, which is a measure of how long a user spends viewing the page linked to in a search result. It is an indicator of how well the search result met the query intent of the user, and is used as a feedback mechanism to improve search results.


Pseudo relevance feedback

Pseudo relevance feedback, also known as blind relevance feedback, provides a method for automatic local analysis. It automates the manual part of relevance feedback, so that the user gets improved retrieval performance without an extended interaction. The method is to do normal retrieval to find an initial set of most relevant documents, to then assume that the top "k" ranked documents are relevant, and finally to do relevance feedback as before under this assumption. The procedure is: # Take the results returned by initial query as relevant results (only top k with k being between 10 and 50 in most experiments). # Select top 20-30 (indicative number) terms from these documents using for instance tf-idf weights. # Do Query Expansion, add these terms to query, and then match the returned documents for this query and finally return the most relevant documents. Some experiments such as results from the Cornell SMART system published in (Buckley et al.1995), show improvement of retrieval systems performances using pseudo-relevance feedback in the context of TREC 4 experiments. This automatic technique mostly works. Evidence suggests that it tends to work better than global analysis. Through a query expansion, some relevant documents missed in the initial round can then be retrieved to improve the overall performance. Clearly, the effect of this method strongly relies on the quality of selected expansion terms. It has been found to improve performance in the TREC ad hoc task . But it is not without the dangers of an automatic process. For example, if the query is about copper mines and the top several documents are all about mines in Chile, then there may be query drift in the direction of documents on Chile. In addition, if the words added to the original query are unrelated to the query topic, the quality of the retrieval is likely to be degraded, especially in Web search, where web documents often cover multiple different topics. To improve the quality of expansion words in pseudo-relevance feedback, a positional relevance feedback for pseudo-relevance feedback has been proposed to select from feedback documents those words that are focused on the query topic based on positions of words in feedback documents.Yuanhua Lv and ChengXiang Zhai
relevance model for pseudo-relevance feedback''
in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR), 2010.
Specifically, the positional relevance model assigns more weights to words occurring closer to query words based on the intuition that words closer to query words are more likely to be related to the query topic. Blind feedback automates the manual part of relevance feedback and has the advantage that assessors are not required.


Using relevance information

Relevance information is utilized by using the contents of the relevant documents to either adjust the weights of terms in the original query, or by using those contents to add words to the query. Relevance feedback is often implemented using the
Rocchio algorithm The Rocchio algorithm is based on a method of relevance feedback found in information retrieval systems which stemmed from the SMART Information Retrieval System developed between 1960 and 1964. Like many other retrieval systems, the Rocchio algori ...
.


References

{{reflist, 30em


Further reading


Relevance feedback lecture notes
- Jimmy Lin's lecture notes, adapted from Doug Oard's

- chapter from ''Modern Information Retrieval'' *Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack
Information Retrieval: Implementing and Evaluating Search Engines
MIT Press, Cambridge, Mass., 2010. Internet search algorithms Information retrieval evaluation zh:相关反馈