Gist Communications
   HOME

TheInfoList



OR:

In computing, GiST or Generalized Search Tree, is a
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
and
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software Interface (computing), interface, offering a service to other pieces of software. A document or standa ...
that can be used to build a variety of disk-based
search trees Searching or search may refer to: Computing technology * Search algorithm, including keyword search ** :Search algorithms * Search and optimization for problem solving in artificial intelligence * Search engine technology, software for finding ...
. GiST is a generalization of the
B+ tree A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children. A B+ tree can be viewed as a B- ...
, providing a concurrent and recoverable height-balanced search tree infrastructure without making any assumptions about the type of data being stored, or the queries being serviced. GiST can be used to easily implement a range of well-known indexes, including
B+ tree A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children. A B+ tree can be viewed as a B- ...
s,
R-tree R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found sign ...
s, hB-trees, RD-trees, and many others; it also allows for easy development of specialized indexes for new data types. It cannot be used directly to implement non-height-balanced trees such as quad trees or prefix trees (tries), though like prefix trees it does support compression, including
lossy compression In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...
. GiST can be used for any data type that can be naturally ordered into a hierarchy of
superset In mathematics, set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset of ...
s. Not only is it extensible in terms of data type support and tree layout, it allows the extension writer to support any query predicates that they choose. GiST is an example of software
extensibility Extensibility is a software engineering and systems design principle that provides for future growth. Extensibility is a measure of the ability to extend a system and the level of effort required to implement the extension. Extensions can be th ...
in the context of database systems: it allows the easy evolution of a database system to support new tree-based indexes. It achieves this by factoring out its core system infrastructure from a narrow
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software Interface (computing), interface, offering a service to other pieces of software. A document or standa ...
that is sufficient to capture the application-specific aspects of a wide variety of index designs. The GiST infrastructure code manages the layout of the index pages on disk, the algorithms for searching indexes and deleting from indexes, and complex transactional details such as page-level locking for high concurrency and
write-ahead logging In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. A write ahead log is an append-only auxiliary disk-resident structure used for crash ...
for crash recovery. This allows authors of new tree-based indexes to focus on implementing the novel features of the new index type — for example, the way in which subsets of the data should be described for search — without becoming experts in database system internals. Although originally designed for answering Boolean selection queries, GiST can also support nearest-neighbor search, and various forms of statistical
approximation An approximation is anything that is intentionally similar but not exactly equality (mathematics), equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very ...
over large data sets.


Implementations

The most widely used GiST implementation is in the
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
; it was also implemented in the
Informix IBM Informix is a product family within IBM's Information Management division that is centered on several relational database management system (RDBMS) offerings. The Informix products were originally developed by Informix Corporation, whose I ...
Universal Server, and as a standalone library, libgist.


PostgreSQL

The PostgreSQL GiST implementation includes support for variable length keys, composite keys, concurrency control and recovery; these features are inherited by all GiST extensions. There are several contributed modules developed using GiST and distributed with PostgreSQL. For example: * rtree_gist, btree_gist - GiST implementation of R-tree and B-tree * intarray - index support for one-dimensional array of int4's * tsearch2 - a searchable (full text) data type with indexed access * ltree - data types, indexed access methods and queries for data organized as a tree-like structures * hstore - a storage for (key,value) data * cube - data type, representing multidimensional cubes The PostgreSQL GiST implementation provides the indexing support for the
PostGIS PostGIS ( ) is an open source software program that adds support for geographic objects to the PostgreSQL object-relational database. PostGIS follows the Simple Features for SQL specification from the Open Geospatial Consortium (OGC). Technicall ...
(
geographic information system A geographic information system (GIS) is a type of database containing Geographic data and information, geographic data (that is, descriptions of phenomena for which location is relevant), combined with Geographic information system software, sof ...
) and the BioPostgres
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
system.


References

*
Joseph M. Hellerstein Joseph M. Hellerstein (born ) is an American professor of Computer Science at the University of California, Berkeley, where he works on database systems and computer networks. He co-founded Trifacta with Jeffrey Heer and Sean Kandel in 2012, whi ...
,
Jeffrey F. Naughton Jeffrey Naughton is a computer scientist and former professor and department chair of Computer Sciences at the University of Wisconsin–Madison, where he was one of the leaders of thWisconsin Database Group He was lead of Google's Madison offic ...
and Avi Pfeffer
Generalized Search Trees for Database Systems
Proc. 21st Int'l Conf. on Very Large Data Bases, Zürich, September 1995, 562–573. * Marcel Kornacker, C. Mohan and Joseph M. Hellerstein
Concurrency and Recovery in Generalized Search Trees
Proc. ACM SIGMOD Conf. on Management of Data, Tucson, AZ, May 1997, 62–72. * Paul M. Aoki
Generalizing "Search" in Generalized Search Trees
Proc. 14th Int'l Conf. on Data Engineering, Orlando, FL, Feb. 1998, 380–389. * Marcel Kornacker
High-Performance Generalized Search Trees
Proc. 24th Int'l Conf. on Very Large Data Bases, Edinburgh, Scotland, September 1999. * Paul M. Aoki
How to Avoid Building DataBlades That Know the Value of Everything and the Cost of Nothing
Proc. 11th Int'l Conf. on Scientific and Statistical Database Management, Cleveland, OH, July 1999, 122–133.


External links


GiST research project website

PostgreSQL GiST Development


for the GiST support in PostgreSQL

{{in lang, ru
GiST in PostgreSQL wiki

PostGIS

BioPostgres
Trees (data structures) PostgreSQL