
In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.


History

The concern with the problem of finding relevant information dates back at least to the first publication of scientific journals in the 17th century. The formal study of relevance began in the 20th century with the study of what would later be called bibliometrics. In the 1930s and 1940s, S. C. Bradford used the term "relevant" to characterize articles relevant to a subject (cf. Bradford's law). In the 1950s, the first information retrieval systems emerged, and researchers noted the retrieval of irrelevant articles as a significant concern. In 1958, B. C. Vickery made the concept of relevance explicit in an address at the International Conference on Scientific Information. Since then, information scientists have explored and debated definitions of relevance. A particular focus of the debate was the distinction between "relevance to a subject" or "topical relevance" and "user relevance".


Evaluation

The information retrieval community has emphasized the use of test collections and benchmark tasks to measure topical relevance, starting with the Cranfield Experiments of the early 1960s and culminating in the TREC evaluations that continue to this day as the main evaluation framework for information retrieval research. To evaluate how well an information retrieval system retrieves topically relevant results, the relevance of the retrieved results must be quantified. In Cranfield-style evaluations, this typically involves assigning a "relevance level" to each retrieved result, a process known as "relevance assessment". Relevance levels can be binary (a result is either relevant or not relevant) or graded (results match the information need to varying degrees). Once relevance levels have been assigned to the retrieved results, information retrieval performance measures can be used to assess the quality of a retrieval system's output. In contrast to this focus solely on topical relevance, the information science community has emphasized user studies that consider user relevance. These studies often focus on aspects of human-computer interaction (see also human-computer information retrieval).
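As an illustration of how assigned relevance levels feed into performance measures, the following is a minimal sketch of two common ones: precision at rank k, which takes the binary view (any level above 0 counts as relevant), and normalized discounted cumulative gain (nDCG), which uses graded levels directly. The function names, the 0 to 3 grading scale and the judgments are hypothetical. As a simplifying assumption, nDCG here is normalized against the ideal ordering of the retrieved list only; a full evaluation would usually normalize against all judged documents for the query.

```python
import math

def precision_at_k(levels, k):
    """Binary view: fraction of the top-k results with a relevance level above 0."""
    return sum(1 for level in levels[:k] if level > 0) / k

def dcg_at_k(levels, k):
    """Graded view: gain equals the relevance level, discounted by log2(rank + 1)."""
    return sum(level / math.log2(rank + 1)
               for rank, level in enumerate(levels[:k], start=1))

def ndcg_at_k(levels, k):
    """DCG divided by the DCG of the same judgments sorted in ideal (descending) order."""
    ideal = dcg_at_k(sorted(levels, reverse=True), k)
    return dcg_at_k(levels, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments (0 = not relevant, 3 = highly relevant)
# for the top five results returned for one query.
judged = [3, 2, 0, 1, 0]
print(precision_at_k(judged, 5))  # 0.6
print(round(ndcg_at_k(judged, 5), 3))
```

The binary measure cannot distinguish a highly relevant result from a marginal one, while nDCG rewards placing the higher-graded results earlier in the ranking.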


Clustering and relevance

The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are similar to each other have a high likelihood of being relevant to the same information need. With respect to the embedding similarity space, the cluster hypothesis can be interpreted globally or locally. [F. Diaz, Autocorrelation and Regularization of Query-Based Retrieval Scores. PhD thesis, University of Massachusetts Amherst, Amherst, MA, February 2008, Chapter 3.]
The global interpretation assumes that there exists some fixed set of underlying topics derived from inter-document similarity. These global clusters or their representatives can then be used to relate the relevance of two documents (e.g. two documents in the same cluster should both be relevant to the same request). Methods in this spirit include:
* cluster-based information retrieval
* cluster-based document expansion such as latent semantic analysis or its language modeling equivalents [X. Liu and W. B. Croft, "Cluster-based retrieval using language models," in SIGIR ’04: Proceedings of the 27th annual international conference on Research and development in information retrieval, (New York, NY, USA), pp. 186–193, ACM Press, 2004.]
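To make the global interpretation concrete, the following minimal sketch assumes the collection has already been partitioned into clusters by some offline clustering step (not shown) and that documents and queries are bag-of-words term-count vectors. Each document's score interpolates its own cosine similarity to the query with that of its cluster's centroid, so a document that shares a cluster with strongly matching documents is promoted even if it matches the query weakly. The function names, the interpolation weight `lam` and the toy collection are hypothetical; this is one simple vector-space variant, not the language-model formulation of Liu and Croft.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average of term-count vectors; used as the cluster's representative."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts term by term
    return {t: c / len(vectors) for t, c in total.items()}

def cluster_based_rank(query, docs, clusters, lam=0.5):
    """Rank docs by lam * sim(query, doc) + (1 - lam) * sim(query, its cluster centroid).

    Assumes every document belongs to exactly one cluster.
    """
    cluster_score = {cid: cosine(query, centroid([docs[d] for d in members]))
                     for cid, members in clusters.items()}
    doc_cluster = {d: cid for cid, members in clusters.items() for d in members}
    scores = {d: lam * cosine(query, vec) + (1 - lam) * cluster_score[doc_cluster[d]]
              for d, vec in docs.items()}
    # sorted() is stable, so ties keep the collection's original order.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy collection with a precomputed two-cluster partition.
docs = {
    "d1": {"jaguar": 2, "cat": 1},
    "d2": {"feline": 2, "habitat": 1},
    "d3": {"jaguar": 1, "car": 2},
    "d4": {"car": 1, "engine": 2},
}
clusters = {"animals": ["d1", "d2"], "cars": ["d3", "d4"]}
query = {"jaguar": 1, "cat": 1}
print(cluster_based_rank(query, docs, clusters))  # ['d1', 'd2', 'd3', 'd4']
```

Here `d2` shares no terms with the query but ranks above `d3` because its cluster-mate `d1` matches strongly; with `lam=1.0` (no cluster evidence) `d3` would rank above `d2`.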
It is important to ensure that clusters, either in isolation or in combination, successfully model the set of possible relevant documents.

A second interpretation, most notably advanced by Ellen Voorhees [E. M. Voorhees, "The cluster hypothesis revisited," in SIGIR ’85: Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 188–196, ACM Press, 1985.], focuses on the local relationships between documents. The local interpretation avoids having to model the number or size of clusters in the collection and allows relevance at multiple scales. Methods in this spirit include:
* multiple cluster retrieval
* spreading activation [S. Preece, A spreading activation network model for information retrieval. PhD thesis, University of Illinois, Urbana-Champaign, 1981.] and relevance propagation [T. Qin, T.-Y. Liu, X.-D. Zhang, Z. Chen, and W.-Y. Ma, "A study of relevance propagation for web search," in SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 408–415, ACM Press, 2005.] methods
* local document expansion [A. Singhal and F. Pereira, "Document expansion for speech retrieval," in SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 34–41, ACM Press, 1999.]
* score regularization

Local methods require an accurate and appropriate document similarity measure.
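Score regularization, one of the local methods listed, can be illustrated with a simple graph-smoothing iteration. Each document's final score blends its original retrieval score with the similarity-weighted average of its neighbors' current scores, so a document closely resembling high-scoring documents is pulled up. This is a minimal sketch under stated assumptions: the similarity matrix stands in for the document similarity measure that local methods require, and the update rule, `alpha` and the fixed iteration count are illustrative choices, not the exact formulation of any work cited in this section.

```python
def regularize_scores(scores, sim, alpha=0.5, iters=50):
    """Smooth initial retrieval scores over a document-similarity graph.

    scores: initial retrieval score for each document.
    sim: square matrix (list of lists) of non-negative pairwise similarities;
         the diagonal is ignored.
    alpha: weight on the neighbors' scores (assumed 0 <= alpha < 1).
    """
    n = len(scores)
    # Row-normalize similarities so each document averages over its neighbors.
    weights = []
    for i in range(n):
        row = [0.0 if j == i else float(sim[i][j]) for j in range(n)]
        total = sum(row)
        if total > 0:
            weights.append([w / total for w in row])
        else:
            # Isolated document: a self-loop lets it keep its original score.
            weights.append([1.0 if j == i else 0.0 for j in range(n)])
    f = list(scores)
    for _ in range(iters):
        # Blend each original score with the weighted average of neighbors' scores.
        f = [(1 - alpha) * scores[i] + alpha * sum(weights[i][j] * f[j] for j in range(n))
             for i in range(n)]
    return f

# Hypothetical similarities: documents 0 and 1 are near neighbors, 2 is isolated.
sim = [[1.0, 0.9, 0.0],
       [0.9, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
print(regularize_scores([1.0, 0.0, 0.8], sim))
```

In this example, document 1 starts with a score of 0 but ends up with roughly 1/3 because it is similar to the high-scoring document 0 (which settles near 2/3), while the isolated document 2 keeps its score of 0.8. Because `alpha < 1` and each weight row sums to 1, the iteration converges to a unique fixed point.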


Problems and alternatives

The documents that are most relevant are not necessarily those that are most useful to display on the first page of search results. For example, two duplicate documents might each be considered quite relevant, but it is only useful to display one of them. A measure called "maximal marginal relevance" (MMR) has been proposed to overcome this shortcoming. It scores each document by its relevance combined with how much new information it brings given the results already selected. In some cases, a query may have an ambiguous interpretation or a variety of potential responses. Providing a diversity of results can be a consideration when evaluating the utility of a result set.
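MMR is commonly computed greedily. At each step it selects the candidate maximizing `lam * relevance - (1 - lam) * max similarity to any already-selected result`. The following is a minimal sketch of that greedy formulation, assuming query relevance scores and a pairwise document similarity function are already available; the function names, `lam=0.7` and the toy near-duplicate data are hypothetical.

```python
def mmr(relevance, similarity, k, lam=0.7):
    """Greedy maximal marginal relevance selection of up to k documents.

    relevance: dict doc_id -> relevance (similarity) to the query.
    similarity: function (doc_a, doc_b) -> similarity between two documents.
    lam: trade-off; 1.0 ranks purely by relevance, lower values favor novelty.
    """
    selected = []
    candidates = list(relevance)  # a list keeps tie-breaking deterministic

    def marginal_relevance(d):
        # Penalize a candidate by its similarity to the closest already-selected result.
        redundancy = max((similarity(d, s) for s in selected), default=0.0)
        return lam * relevance[d] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_relevance)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical pairwise similarities: "a" and "a_dup" are near-duplicates.
PAIR_SIM = {frozenset(("a", "a_dup")): 1.0}

def doc_sim(x, y):
    return PAIR_SIM.get(frozenset((x, y)), 0.0)

relevance = {"a": 0.9, "a_dup": 0.89, "b": 0.6}
print(mmr(relevance, doc_sim, k=2))  # ['a', 'b']
```

With `lam=1.0` the redundancy penalty vanishes and the function returns `['a', 'a_dup']`, reproducing the duplicate problem described above; lowering `lam` lets the less relevant but novel document `b` take the second slot.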


See also

* Information overload
* Relevance


References


Further reading

* Sperber, Dan; Wilson, Deirdre. Relevance: Communication and Cognition. 2nd ed. Oxford; Cambridge, MA: Blackwell Publishers, 2001.
* Saracevic, Tefko (2007). "Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance." Journal of the American Society for Information Science and Technology, 58(13), 2126–2144. doi:10.1002/asi.20681. Archived copy: https://wayback.archive-it.org/all/20080221223251/http://www.scils.rutgers.edu/~tefko/Saracevic%20relevance%20pt%20III%20JASIST%20'07.pdf
* Saracevic, T. (2007). Relevance in information science. Invited Annual Thomson Scientific Lazerow Memorial Lecture at School of Information Sciences, University of Tennessee. September 19, 2007. (video)
* Introduction to Information Retrieval: Evaluation. Stanford. (presentation in PDF)