Generalized vector space model

The generalized vector space model (GVSM) is a generalization of the vector space model (VSM) used in information retrieval. Wong ''et al.'' presented an analysis of the problems created by the pairwise orthogonality assumption of the VSM, and from there extended the VSM to the GVSM.


Definitions

GVSM introduces term-to-term correlations, which remove the pairwise orthogonality assumption. More specifically, Wong ''et al.'' considered a new space, where each term vector ''t_i'' is expressed as a linear combination of ''2^n'' vectors ''m_r'', where ''r = 1...2^n'' and ''n'' is the number of index terms. For a document ''d_k'' and a query ''q'', the similarity function becomes:
:sim(d_k,q) = \frac{\sum_{j=1}^n \sum_{i=1}^n w_{i,k} w_{j,q} \, t_i \cdot t_j}{\sqrt{\sum_{i=1}^n w_{i,k}^2} \sqrt{\sum_{i=1}^n w_{i,q}^2}}
where ''t_i'' and ''t_j'' are now vectors in a ''2^n''-dimensional space, ''w_{i,k}'' is the weight of term ''i'' in document ''d_k'', and ''w_{j,q}'' is the weight of term ''j'' in query ''q''. The term correlation t_i \cdot t_j can be implemented in several ways. For example, Wong ''et al.'' take the term occurrence frequency matrix obtained from automatic indexing as input to their algorithm; the output is the term correlation between any pair of index terms.
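
The similarity function can be illustrated with a minimal Python sketch. It assumes the usual form of the GVSM similarity: a double sum of document weight times query weight times term correlation, divided by the product of the two weight norms. The function name `gvsm_similarity` and the toy matrices are hypothetical, and the correlation matrix is taken as given rather than computed by the algorithm of Wong ''et al.''

```python
import math


def gvsm_similarity(doc_weights, query_weights, term_corr):
    """Similarity between one document and one query under GVSM.

    doc_weights[i]   -- weight of term i in the document (w_{i,k})
    query_weights[j] -- weight of term j in the query (w_{j,q})
    term_corr[i][j]  -- assumed precomputed term correlation t_i . t_j
    """
    n = len(doc_weights)
    # Double sum over all term pairs, weighted by their correlation.
    numerator = sum(
        doc_weights[i] * query_weights[j] * term_corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    doc_norm = math.sqrt(sum(w * w for w in doc_weights))
    query_norm = math.sqrt(sum(w * w for w in query_weights))
    if doc_norm == 0.0 or query_norm == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return numerator / (doc_norm * query_norm)


# Hypothetical three-term vocabulary.
# Identity matrix: terms are pairwise orthogonal, as in the plain VSM.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Terms 0 and 1 are assumed to be related with correlation 0.5.
correlated = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]

doc = [1.0, 0.0, 0.0]    # document contains only term 0
query = [0.0, 1.0, 0.0]  # query contains only term 1

vsm_score = gvsm_similarity(doc, query, identity)     # 0.0
gvsm_score = gvsm_similarity(doc, query, correlated)  # 0.5
```

With the identity matrix the function reduces to ordinary cosine similarity, so a document and a query that share no terms score zero. With a nonzero off-diagonal correlation the same pair receives a positive score, which is the effect GVSM is designed to produce.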


Semantic information on GVSM

There are at least two basic directions for embedding term-to-term relatedness, other than exact keyword matching, into a retrieval model:
# compute semantic correlations between terms
# compute frequency co-occurrence statistics from large corpora
Tsatsaronis focused on the first approach, measuring semantic relatedness (''SR'') using a thesaurus (''O'') such as WordNet. The measure considers the path length, captured by compactness (''SCM''), and the path depth, captured by semantic path elaboration (''SPE''). The inner product t_i \cdot t_j is estimated by:
:t_i \cdot t_j = SR((t_i, t_j), (s_i, s_j), O)
where ''s_i'' and ''s_j'' are the senses of terms ''t_i'' and ''t_j'', respectively, that maximize SCM \cdot SPE. Also building on the first approach, Waitelonis ''et al.'' computed semantic relatedness from Linked Open Data resources, including DBpedia and the YAGO taxonomy. They thereby exploit taxonomic relationships among semantic entities in documents and queries after named entity linking.
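
The second direction, co-occurrence statistics, can be sketched in a few lines of Python. This is only an illustrative stand-in, not the method of any of the authors named above: it assumes a small term-by-document frequency matrix and uses the cosine between term rows as the correlation t_i \cdot t_j. The function name `term_correlations` and the toy counts are hypothetical.

```python
import math


def term_correlations(term_doc_freq):
    """Cosine similarity between the rows of a term-by-document matrix.

    term_doc_freq[i][d] is the frequency of term i in document d.
    Returns a symmetric matrix usable as an estimate of t_i . t_j.
    """
    norms = [math.sqrt(sum(f * f for f in row)) for row in term_doc_freq]
    n = len(term_doc_freq)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # a term that never occurs correlates with nothing
            dot = sum(a * b for a, b in zip(term_doc_freq[i], term_doc_freq[j]))
            corr[i][j] = dot / (norms[i] * norms[j])
    return corr


# Hypothetical toy corpus: 3 terms (rows) by 4 documents (columns).
freq = [
    [2, 0, 1, 0],  # "car"
    [1, 0, 2, 0],  # "automobile"
    [0, 3, 0, 1],  # "banana"
]
corr = term_correlations(freq)
# "car" and "automobile" occur in the same documents, so corr[0][1] is high;
# "car" and "banana" never co-occur, so corr[0][2] is zero.
```

Terms that tend to appear in the same documents receive a high correlation even if they never match as keywords, which is what lets a query on one term retrieve documents that contain only the other.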

