The probabilistic relevance model was devised by Stephen E. Robertson and Karen Spärck Jones as a framework for probabilistic models to come. It is a formalism of information retrieval useful for deriving ranking functions used by search engines and web search engines to rank matching documents according to their relevance to a given search query.

The model estimates the probability that a document ''d_j'' is relevant to a query ''q''. It assumes that this probability of relevance depends on the query and document representations. Furthermore, it assumes that there is a portion of all documents that is preferred by the user as the answer set for query ''q''. Such an ideal answer set is called ''R'' and should maximize the overall probability of relevance to that user. The prediction is that documents in this set ''R'' are relevant to the query, while documents not present in the set are non-relevant.

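sim(d_j,q) = \frac{P(R \mid d_j)}{P(\bar{R} \mid d_j)}

Here ''P(R | d_j)'' is the probability that document ''d_j'' belongs to the relevant set ''R'', and ''P(R̄ | d_j)'' is the probability that it belongs to the complement of ''R'', the non-relevant documents.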
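The ranking idea described above, ordering documents by the odds that they are relevant rather than non-relevant, can be illustrated with a minimal Python sketch. It assumes that an estimate of the relevance probability is already available for every document; the model itself does not say how to obtain one. The function names and the example probabilities are hypothetical and serve only as illustration.

```python
from __future__ import annotations


def relevance_odds(p_relevant: float) -> float:
    """Odds P(R | d) / P(not R | d) for a probability strictly between 0 and 1."""
    if not 0.0 < p_relevant < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    # P(not R | d) is the complement of P(R | d).
    return p_relevant / (1.0 - p_relevant)


def rank_by_odds(p_relevant_by_doc: dict[str, float]) -> list[tuple[str, float]]:
    """Return (doc_id, odds) pairs sorted from most to least likely relevant."""
    scored = [(doc, relevance_odds(p)) for doc, p in p_relevant_by_doc.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical estimates of P(R | d_j) for three documents (assumed, not derived).
estimates = {"d1": 0.2, "d2": 0.8, "d3": 0.5}
ranking = rank_by_odds(estimates)
```

Because the odds grow monotonically with the probability, sorting by either quantity gives the same order; the odds form is the one that derived models typically decompose into per-term weights.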

Related models

There are some limitations to this framework that need to be addressed by further development:
* There is no accurate estimate for the first-run probabilities
* Index terms are not weighted
* Terms are assumed mutually independent

To address these and other concerns, other models have been developed from the probabilistic relevance framework, among them the Binary Independence Model from the same author. The best-known derivative of this framework is the Okapi (BM25) weighting scheme, along with BM25F, a modification thereof.
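To give a feel for what a weighting scheme derived from this framework looks like, here is a minimal Python sketch of one commonly used form of BM25 scoring. It is an illustration under stated assumptions, not a reference implementation: documents are assumed to be non-empty, pre-tokenized lists of terms; the smoothed IDF shown is one widespread variant among several; and the defaults `k1=1.5` and `b=0.75` are typical values rather than prescribed ones. The function name `bm25_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in docs against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        # Longer-than-average documents are penalized, controlled by b.
        length_norm = 1.0 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            n_t = doc_freq[term]
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy corpus, invented for illustration.
docs = [
    "probabilistic relevance model".split(),
    "vector space model".split(),
    "boolean retrieval".split(),
]
scores = bm25_scores("probabilistic model".split(), docs)
```

In this toy corpus the first document matches both query terms, the second matches one, and the third matches none, so the scores decrease in that order. Unlike the unweighted index terms listed among the limitations above, each term here contributes according to its frequency in the document and its rarity in the collection.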
