CrateDB

CrateDB is a distributed SQL database management system that integrates a fully searchable document-oriented data store. It is open-source, written in Java, based on a shared-nothing architecture, and designed for high scalability. CrateDB includes components from Trino, Lucene, Elasticsearch and Netty.


History

The CrateDB project was started by Christian Lutz, Bernd Dorn, and Jodok Batlogg, an open-source contributor who had contributed to the Open Source Initiative Vorarlberg while at Lovely Systems in Dornbirn. The software is an open-source, clustered database used for fast text search and analytics ("CrateDB packs NoSQL flexibility, SQL familiarity", InfoWorld, Dec. 19, 2016).

The company, now called Crate.io, raised its first round of financing in April 2014. In June that year, Crate.io won the judges' choice award at the GigaOm Structure Launchpad competition. In October, Crate.io won the TechCrunch Disrupt Europe competition in London. Crate.io closed a $4M funding round in March 2016. In December, CrateDB 1.0 was released, with the software having passed one million downloads. CrateDB 2.0, the first Enterprise Edition of CrateDB, was released in May 2017 after a $2.5M round from Dawn Capital, Draper Esprit, Speedinvest, and Sunstone Capital. In June 2021, Crate.io announced another $10M funding round. Since September 2020, Crate.io has been led by Eva Schönleitner as CEO.


Overview

Architecture

CrateDB operates in a shared-nothing architecture as a cluster of identically configured servers (nodes). The nodes coordinate to automatically distribute the execution of both write and query operations across the cluster.

Querying

CrateDB's SQL syntax includes JOINs, aggregations, indexes, sub-queries, user-defined functions, and views. It also supports full-text search, geospatial queries, and nested JSON object columns. For query distribution, CrateDB implements memory-resident columnar field caches on each shard. The caches tell the query engine whether there are rows on that shard that meet the query criteria, and where the rows are located. This is performed automatically.

Schemas

CrateDB supports “strict”, “dynamic”, or “ignored” schemas:

* Strict schema: if an INSERT statement includes a column that wasn’t defined in the table, CrateDB enforces the original schema by rejecting the INSERT and throwing an error.
* Dynamic schema: CrateDB automatically updates the schema by indexing the new column.
* Ignored schema: CrateDB doesn’t index the column, but it stores the plain JSON value.

Consistency

CrateDB implements an eventually consistent, non-blocking data insertion model. It includes record versioning, optimistic concurrency control, and a table-level refresh frequency setting, which forces CrateDB data to become consistent every n milliseconds.

CrateDB supports read-after-write consistency: queries retrieving a specific row by its primary key always receive the most recent row. All other queries (search operations) return eventually consistent data.

Search operations are performed on shared IndexReaders, which provide caching and reverse lookup capabilities for shards. An IndexReader is always bound to the Lucene segment from which it was started, meaning it has to be refreshed in order to see new changes. Therefore, a search only sees a change if the associated IndexReader was refreshed after that change occurred. By default, this is done once per second, but it can be reconfigured to occur more or less frequently.

Every replica shard is updated synchronously with its primary, and always carries the same information. Therefore, in terms of consistency, it does not matter whether the primary or a replica shard is accessed. In CrateDB, only the refresh of the IndexReader affects consistency.

Atomicity and durability

CrateDB implements write-ahead logging (WAL):

* Operations on rows (which are internally stored in CrateDB as JSON documents) are atomic.
* Operations on rows are persisted to disk in the translog (the write-ahead log) without having to issue a Lucene commit for every write operation. When the translog gets flushed, all data is written to the persistent index storage of Lucene, and the translog gets cleared.
* In the case of an unclean shutdown of a shard, the transactions in the translog are replayed upon startup, to ensure that all executed operations are permanent.
* The translog is also directly transferred when a newly allocated replica initializes itself from the primary shard.
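
The interplay between primary-key reads, refreshed searches, and the translog described above can be illustrated with a small toy model. The following Python sketch is not CrateDB code and does not use any CrateDB or Lucene API. The class `ToyShard` and all of its attributes and methods are hypothetical names chosen for illustration, and plain dictionaries stand in for Lucene's index storage. It only assumes the behaviour stated in the text: primary-key lookups see the latest row, searches see the snapshot taken at the last refresh, a flush clears the translog, and recovery replays it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of a single shard (hypothetical, not CrateDB's implementation)."""
    translog: list = field(default_factory=list)     # operations logged since the last flush
    committed: dict = field(default_factory=dict)    # stands in for persistent index storage
    live: dict = field(default_factory=dict)         # most recent row per primary key
    reader_view: dict = field(default_factory=dict)  # snapshot that searches read from

    def insert(self, pk, row):
        # Write-ahead: record the operation in the translog, then apply it.
        self.translog.append((pk, row))
        self.live[pk] = row

    def get_by_pk(self, pk):
        # Primary-key lookups always return the most recent row.
        return self.live.get(pk)

    def refresh(self):
        # In the text this runs periodically (once per second by default);
        # here it is called explicitly to keep the model deterministic.
        self.reader_view = dict(self.live)

    def search(self, predicate):
        # Searches only see rows that were present at the last refresh.
        return [row for row in self.reader_view.values() if predicate(row)]

    def flush(self):
        # Write everything to "persistent" storage and clear the translog.
        self.committed = dict(self.live)
        self.translog.clear()

    @classmethod
    def recover(cls, committed, translog):
        # After an unclean shutdown, start from the committed state and
        # replay the translog so that every logged operation is restored.
        shard = cls(committed=dict(committed), translog=list(translog))
        shard.live = dict(committed)
        for pk, row in shard.translog:
            shard.live[pk] = row
        shard.refresh()
        return shard


if __name__ == "__main__":
    shard = ToyShard()
    shard.insert(1, {"name": "sensor-a"})
    print(shard.get_by_pk(1))            # latest row, visible immediately
    print(shard.search(lambda r: True))  # no refresh yet, so the search sees nothing
    shard.refresh()
    print(shard.search(lambda r: True))  # the row is visible after the refresh
```

In this model, `recover` also mirrors the last bullet above in a simplified way: a new copy of the shard can be built from the committed state plus the transferred translog.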

