computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, a term index is a data structure to facilitate fast lookup of terms and

clauses In language, a clause is a Constituent (linguistics), constituent or Phrase (grammar), phrase that comprises a semantic predicand (expressed or not) and a semantic Predicate (grammar), predicate. A typical clause consists of a subject (grammar), ...

in a

logic program Logic programming is a programming, database and knowledge representation paradigm based on formal logic. A logic program is a set of sentences in logical form, representing knowledge about some problem domain. Computation is performed by applyin ...

deductive database A deductive database is a database system that can make deductions (i.e. conclude additional facts) based on rules and facts stored in its database. Datalog is the language typically used to specify facts, rules and queries in deductive database ...

, or

automated theorem prover Automated theorem proving (also known as ATP or automated deduction) is a subfield of automated reasoning and mathematical logic dealing with proving mathematical theorems by computer programs. Automated reasoning over mathematical proof was a ma ...

Overview

Many operations in automatic theorem provers require search in huge collections of terms and clauses. Such operations typically fall into the following scheme. Given a collection

S

of terms (clauses) and a query term (clause)

q

, find in

S

some/all terms

t

related to

q

according to a certain retrieval condition. Most interesting retrieval conditions are formulated as existence of a substitution that relates in a special way the query and the retrieved objects

t

. Here is a list of retrieval conditions frequently used in provers: * term

q

is unifiable with term

t

, i.e., there exists a substitution

\theta

, such that

q\theta

t\theta

* term

t

is an instance of

q

, i.e., there exists a substitution

\theta

, such that

q\theta

t

* term

t

is a generalisation of

q

, i.e., there exists a substitution

\theta

, such that

q

t\theta

* clause

q

θ-subsumes clause

t

, i.e., there exists a substitution

\theta

, such that

q\theta

is a subset/submultiset of

t

* clause

q

is θ-subsumed by

t

, i.e., there exists a substitution

\theta

, such that

t\theta

is a subset/submultiset of

q

More often than not, we are actually interested in finding the appropriate substitutions explicitly, together with the retrieved terms

t

, rather than just in establishing existence of such substitutions. Very often the sizes of term sets to be searched are large, the retrieval calls are frequent and the retrieval condition test is rather complex. In such situations linear search in

S

, when the retrieval condition is tested on every term from

S

, becomes prohibitively costly. To overcome this problem, special data structures, called ''indexes'', are designed in order to support fast retrieval. Such data structures, together with the accompanying algorithms for index maintenance and retrieval, are called ''term indexing techniques''.

Classic indexing techniques

discrimination tree Discrimination is the process of making unfair or prejudicial distinctions between people based on the groups, classes, or other categories to which they belong or are perceived to belong, such as race, gender, age, class, religion, or sexu ...

s *

substitution tree Substitution may refer to: Arts and media *Substitution (poetry), a variation in poetic scansion * Substitution (theatre), an acting methodology Music *Chord substitution, swapping one chord for a related one within a chord progression *Tritone ...

s *

path indexing A path is a route for physical travel – see Trail. Path or PATH may also refer to: Physical paths of different types * Bicycle path * Bridle path, used by people on horseback * Course (navigation), the intended path of a vehicle * Desire p ...

Substitution trees outperform path indexing, discrimination tree indexing, and abstraction trees. A discrimination tree term index stores its information in a

trie In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...

data structure.

Indexing techniques used in logic programming

First-argument indexing is the most common strategy where the first argument is used as index. It distinguishes atomic values and the principal functor of compound terms. Nonfirst argument indexing is a variation of first-argument indexing that uses the same or similar techniques as first-argument indexing on one or more alternative arguments. For instance, if a predicate call uses variables for the first argument, the system may choose to use the second argument as the index instead. Multiargument indexing creates a combined index over multiple instantiated arguments if there is not a sufficiently selective single argument index. Deep indexing is used when multiple clauses use the same principal functor for some argument. It recursively uses the same or similar indexing techniques on the arguments of the compound terms.

Trie In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...

indexing uses a prefix tree to find applicable clauses.

References

{{reflist

, volume 2, 2001 (recent overview) * W. W. McCune, Experiments with Discrimination-Tree Indexing and Path Indexing for Term Retrieval, Journal of Automated Reasoning, 9(2), 1992 * P. Graf, Substitution Tree Indexing, Proc. of RTA, Lecture Notes in Computer Science 914, 1995 * M. Stickel, The Path Indexing Method for Indexing Terms, Tech. Rep. 473,

Artificial Intelligence Center The Artificial Intelligence Center is a laboratory in the Information and Computing Sciences Division of SRI International. It was founded in 1966 by Charles Rosen and studies artificial intelligence. One of their early projects was Shakey the ...

SRI International SRI International (SRI) is a nonprofit organization, nonprofit scientific research, scientific research institute and organization headquartered in Menlo Park, California, United States. It was established in 1946 by trustees of Stanford Univer ...

, 1989 * S. Schulz, Simple and Efficient Clause Subsumption with Feature Vector Indexing, Proc. of IJCAR-2004 workshop ESFOR, 2004 * A. Riazanov and A. Voronkov, Partially Adaptive Code Trees, Proc. JELIA, Lecture Notes in Artificial Intelligence 1919, 2000 * H. Ganzinger and R. Nieuwenhuis and P. Nivela, Fast Term Indexing with Coded Context Trees, Journal of Automated Reasoning, 32(2), 2004 * A. Riazanov and A. Voronkov, Efficient Instance Retrieval with Standard and Relational Path Indexing, Information and Computation, 199(1–2), 2005 Data structures Logic programming Theorem proving software systems

Overview

Classic indexing techniques

Indexing techniques used in logic programming

References

Further reading