The Extended Boolean model was described in a 1983 Communications of the ACM article by Gerard Salton, Edward A. Fox, and Harry Wu. Its goal is to overcome the drawbacks of the Boolean model used in information retrieval. The Boolean model does not consider term weights in queries, and the result set of a Boolean query is often either too small or too big. The idea of the extended model is to make use of partial matching and term weights, as in the vector space model. It combines the characteristics of the vector space model with the properties of Boolean algebra and ranks the similarity between queries and documents. This way, a document that matches only some of the queried terms may be considered somewhat relevant and returned as a result, whereas the standard Boolean model would not return it.
Thus, the extended Boolean model can be considered a generalization of both the Boolean and vector space models; those two are special cases if suitable settings and definitions are employed. Further, research has shown that effectiveness improves relative to Boolean query processing. Other research has shown that relevance feedback and query expansion can be integrated with extended Boolean query processing.
Definitions
In the Extended Boolean model, a document is represented as a vector (as in the vector space model). Each dimension i corresponds to a separate term associated with the document.
The weight of term x associated with document d_j is measured by its normalized term frequency and can be defined as:

$$w_{x,j} = f_{x,j} \cdot \frac{\mathrm{Idf}_x}{\max_i \mathrm{Idf}_i}$$

where $\mathrm{Idf}_x$ is the inverse document frequency of term x and $f_{x,j}$ is the term frequency for term x in document j.
The weight vector associated with document d_j can be represented as:

$$\mathbf{v}_{d_j} = [w_{1,j}, w_{2,j}, \ldots, w_{i,j}]$$
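As a rough illustration of these definitions, the following Python sketch builds a weight vector for a tokenized document. It assumes the weight is the term frequency multiplied by the term's inverse document frequency, scaled by the largest inverse document frequency in the collection. Two further choices are assumptions made only for this example and are not taken from the text above: the term frequency is normalized by dividing by the count of the document's most frequent term, and the inverse document frequency is computed as log(N / n_x). The function names `inverse_document_frequency` and `weight_vector` are hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Return idf_x = log(N / n_x) for every term x in the collection.

    Assumption: this common log-based idf is used for illustration only.
    """
    n_docs = len(docs)
    # Count in how many documents each term appears (once per document).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weight_vector(doc, idf, vocabulary):
    """Return [w_1j, ..., w_ij] with w_xj = f_xj * idf_x / max_i idf_i.

    Assumption: f_xj is the raw count of x in the document divided by the
    count of the document's most frequent term, so every weight is in [0, 1].
    """
    counts = Counter(doc)
    max_count = max(counts.values()) if counts else 0
    max_idf = max(idf.values()) if idf else 0.0
    if max_count == 0 or max_idf == 0.0:
        # Empty document, or every term occurs in every document.
        return [0.0 for _ in vocabulary]
    return [
        (counts[term] / max_count) * (idf.get(term, 0.0) / max_idf)
        for term in vocabulary
    ]


if __name__ == "__main__":
    docs = [
        ["boolean", "model", "retrieval"],
        ["vector", "model", "model"],
        ["boolean", "query"],
    ]
    idf = inverse_document_frequency(docs)
    vocabulary = sorted(idf)  # one vector dimension per term
    for doc in docs:
        print(weight_vector(doc, idf, vocabulary))
```

Each position in the resulting list corresponds to one term of the sorted vocabulary, so a term absent from the document gets weight 0 and a rare, frequent-in-document term gets a weight close to 1.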