Block Range Index
   HOME

TheInfoList



OR:

A Block Range Index or BRIN is a
database index A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without ...
ing technique. They are intended to improve performance with extremely large tables. BRIN indexes provide similar benefits to horizontal partitioning or
sharding A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load. Some data within a database remains present in all shards, but some ...
but without needing to explicitly declare partitions. A BRIN is applicable to an index on a table that is large and where the index key value is easily sorted and evaluated with a MinMax function. BRIN were originally proposed by Alvaro Herrera of 2ndQuadrant in 2013 as 'Minmax indexes'. Implementations thus far are tightly coupled to internal implementation and storage techniques for the database tables. This makes them efficient, but limits them to particular vendors. So far
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
is the only vendor to have announced a live product with this specific feature, in PostgreSQL 9.5. Other vendors have described some similar features, including
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
,
Netezza IBM Netezza (pronounced ne-teez-a) is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, busines ...
'zone maps',
Infobright Infobright is a commercial provider of column-oriented relational database software with a focus in machine-generated data. The company's head office is located in Toronto, Ontario, Canada. Most of its research and development is based in Wars ...
'data packs',
MonetDB MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is designed to provide high performance on complex queries against lar ...
and
Apache Hive Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditi ...
with ORC/Parquet.


Design

BRIN operate by "summarising" large blocks of data into a compact form, which can be efficiently tested to exclude many of them from a database query, early on. These tests exclude a large block of data for each comparison. By reducing the data volume so early on, both by representing large blocks as small tuples, and by eliminating many blocks, BRIN substantially reduce the amount of detailed data that must be examined by the database node on a row-by-row basis. Data storage in large databases is layered and chunked, with the table storage arranged into 'blocks'. Each block contains perhaps 1MB in each chunk and they are retrieved by requesting specific blocks from a disk-based storage layer. BRIN are a lightweight in-memory summary layer above this: each tuple in the index summarises one block as to the range of the data contained therein: its minimum and maximum values, and if the block contains any non-null data for the column(s) of interest. Unlike a traditional index which ''locates'' the regions of the table containing values of interest, BRIN act as "negative indexes", showing the blocks that are definitely ''not'' of interest and thus do not need to be processed further. Some simple benchmarks suggest a five-fold improvement in search performance with an index scan, compared to the unindexed table. Compared to B-trees, they avoid their maintenance overhead. As BRIN are so lightweight, they may be held entirely in memory, thus avoiding disk overhead during the scan. The same may not be true of B-tree: B-tree requires a tree node for every approximately ''N rows'' in the table, where N is the capacity of a single node, thus the index size is large. As BRIN only requires a tuple for each block (of many rows), the index becomes sufficiently small to make the difference between disk and memory. For a 'narrow' table the B-tree index volume approaches that of the table itself; the BRIN may be only 5-15% of it.


Advantages


Search and index scan

A large database index would typically use
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for n ...
algorithms. BRIN is not always a substitute for B-tree, it is an improvement on sequential scanning of an index, with particular (and potentially large) advantages when the index meets particular conditions for being ordered and for the search target to be a narrow set of these values. In the general case, with random data, B-tree may still be superior. A particular advantage of the BRIN technique, shared with Oracle Exadata's Smart Scanning, is in the use of this type of index with
Big Data Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
or
data warehousing In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. DWs are central repositories of integra ...
applications, where it is known that almost all of the table is irrelevant to the range of interest. BRIN allows the table to be queried in such cases by only retrieving blocks that ''may'' contain data of interest and excluding those which are clearly outside the range, or contain no data for this column.


Insert

A regular problem with the processing of large tables is that retrieval requires the use of an index, but maintaining this index slows down the addition of new records. Typical practices have been to group additions together and add them as a single bulk transaction, or to drop the index, add the batch of new records and then recreate the index. Both of these are disruptive to simultaneous read / write operations and may not be possible in some continuously-operating businesses. With BRIN, the slowdown from maintaining the index is much reduced compared to B-tree. Wong reports that B-tree slowed down additions to an unindexed 10GB table by 85%, but a comparable BRIN only had an overhead of 11%.


Index creation

BRIN may be created for extremely large data where B-tree would require horizontal partitioning. Creating the BRIN is also much faster than for a B-tree, by 80%. This would be a useful improvement to
refactoring In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the '' factoring''—without changing its external behavior. Refactoring is intended to improve the design, structure ...
existing database applications that use the drop-add-reindex approach, without requiring code changes.


Implementation


Dependence on table ordering

Multiple BRIN may be defined for different columns on a single table. However, there are restrictions. BRIN are only efficient if the ordering of the key values follows the organisation of blocks in the storage layer. In the simplest case, this could require the physical ordering of the table, which is often the creation order of the rows within it, to match the key's order. Where this key is a creation date, that may be a trivial requirement. If the data is truly random, or if there is much churn of the key values in a 'hot' database, the assumptions underlying BRIN may break down. All blocks contain entries "of interest" and so few may be excluded early on by the BRIN range filter. In most cases, BRIN is restricted to a single index per table. Multiple BRIN may be defined, but only one is likely to have suitable ordering. If two (or more) indexes have similar ordering behaviour, it may be possible and useful to define multiple BRIN on the same table. An obvious example is where both a creation date and a record_id column both increase
monotonic In mathematics, a monotonic function (or monotone function) is a function between ordered sets that preserves or reverses the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order ...
ally with the record creation sequence. In other cases, the key value may not be monotonic, but provided that there is still a strong grouping within the record's physical order, BRIN is effective.


Exadata Storage Indexes

BRIN have some similarities to
Oracle Exadata The Oracle Exadata Database Machine (Exadata) is a computing platform optimized for running Oracle Databases. Exadata is a combined hardware and software platform that includes scale-out Intel x86-64 compute and storage servers, RoCE or InfiniB ...
"
Storage Index A Block Range Index or BRIN is a database indexing technique. They are intended to improve performance with extremely large tables. BRIN indexes provide similar benefits to horizontal partitioning or sharding but without needing to explicitly decl ...
es". Exadata has the strong concept of a 'storage layer' in its architecture stack. Table data is held in blocks or 'storage cells' on the storage servers. These storage cells are opaque to the storage server and are returned to the database engine on request, by their identifier. Previously, the database nodes must request all the storage cells in order to scan them. Storage Indexes provides data pruning at this layer: efficiently indicating sections that are of no further interest. The Storage Index is loaded into memory on the storage server, so that when a request for cells is issued it may be predicated with search values. These are compared to the Storage Index and then only the relevant cells need be returned to the database node. Performance advantages with a Storage Index are most evident when the indexed column contains many
null Null may refer to: Science, technology, and mathematics Computing *Null (SQL) (or NULL), a special marker and keyword in SQL indicating that something has no value *Null character, the zero-valued ASCII character, also designated by , often used ...
s. Massive performance advantages are gained when scanning across sparse data.


Development

Development for PostgreSQL was carried out as part of the
AXLE project An axle or axletree is a central shaft for a rotating wheel or gear. On wheeled vehicles, the axle may be fixed to the wheels, rotating with them, or fixed to the vehicle, with the wheels rotating around the axle. In the former case, bearin ...
(Advanced Analytics for Extremely Large European Databases) This work was partially funded by the European Union's
Seventh Framework Programme The Framework Programmes for Research and Technological Development, also called Framework Programmes or abbreviated FP1 to FP9, are funding programmes created by the European Union/European Commission to support and foster research in the Europea ...
(FP7/2007-2013).


PostgreSQL

Implementation for PostgreSQL was first evident in 2013. BRIN appeared in release 9.5 of
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
at the start of 2016.


See also

*
Sharding A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load. Some data within a database remains present in all shards, but some ...


Notes


References

{{Reflist, 30em Database index techniques PostgreSQL