Log-structured Merge-tree

picture info	Log-structured Merge-tree In computer science, the log-structured merge-tree (also known as LSM tree, or LSMT) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches. One simple version of the LSM tree is a two-level LSM tree. As described by Patrick O'Neil, a two-level LSM tree comprises two tree-like structures, called C0 and C1. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Patrick O'Neil Patrick Eugene O'Neil (1942 – September 20, 2019) was an American computer scientist, an expert on databases, and a professor of computer science at the University of Massachusetts Boston.Curriculum vitae retrieved 2010-11-26. O'Neil did his undergraduate studies at the Massachusetts Institute of Technology , receiving a B.S. in mathematics in 1963. After earning a master's degree at the University of Chicago , he moved to [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Bigtable Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio. History Bigtable development began in 2004.. It is now used by a number of Google applications, such as Google Analytics, web indexing, MapReduce, which is often used for generating and modifying data stored in Bigtable, Google Maps,. Google Books search, "My Search History", Google Earth, Blogger.com, Google Code hosting, YouTube, and Gmail. Google's reasons for developing its own database include scalability and better control of performance characteristics. Google's Spanner RDBMS is layered on an implementation of Bigtable with a Paxos group for two-phase commits to each table. Google F1 was built using Spanner to replace an implementation based on MySQL. Apache HBase and Cassandra are some of the best known open source projects that were modeled after Bigtable. On May 6, 2015, a public version of Bigtabl ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Scylla (database) ScyllaDB is an open-source distributed NoSQL wide-column data store. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughputs and lower latencies. It supports the same protocols as Cassandra ( CQL and Thrift) and the same file formats (SSTable), but is a completely rewritten implementation, using the C++20 language replacing Cassandra's Java, and the Seastar asynchronous programming library replacing classic Linux programming techniques such as threads, shared memory and mapped files. In addition to implementing Cassandra's protocols, ScyllaDB also implements the Amazon DynamoDB API. ScyllaDB uses a sharded design on each node, meaning that each CPU core handles a different subset of data. Cores do not share data, but rather communicate explicitly when they need to. The ScyllaDB authors claim that this design allows ScyllaDB to achieve much better performance on modern NUMA SMP machines, and to scale very well with the number of cor ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	InfluxDB InfluxDB is an open-source time series database (TSDB) developed by the company InfluxData. It is written in the Go programming language for storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics. It also has support for processing data from Graphite. History Y Combinator-backed company Errplane began developing InfluxDB as an open-source project in late 2013 for performance monitoring and alerting. Errplane raised an $8.1M Series A financing led by Mayfield Fund and Trinity Ventures in November 2014. In late 2015, Errplane officially changed its name to InfluxData Inc. InfluxData raised Series B round of funding of $16 million in September 2016. In February 2018, InfluxData closed a $35 million Series C round of funding led by Sapphire Ventures. Another round of $60 million was disclosed in 2019. Technical overview InfluxDB has no external dependencies and provid ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	WiredTiger WiredTiger is a NoSQL, Open Source extensible platform for data management. It is released under version 2 or 3 of the GNU General Public License. WiredTiger uses MultiVersion Concurrency Control ( MVCC) architecture. MongoDB MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Ser ... acquired WiredTiger Inc. on December 16, 2014. The WiredTiger storage engine is the default storage engine starting in MongoDB version 3.2. It provides a document-level concurrency model, checkpointing, and compression, among other features. In MongoDB Enterprise, WiredTiger also supports Encryption At Rest. References Further reading * * Wolpe, Toby (December 16, 2014)"MongoDB snaps up WiredTiger and its storage expert team" ZDNet. Accessed July 8, 2016. * * * * * {{cite web , last=Wolpe , first=Tob ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	RocksDB RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit many CPU cores, and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree (LSM tree) data structure. It is written in C++ and provides official language bindings for C++, C, and Java; alongside many third-party language bindings. RocksDB is open-source software, and was originally released under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license, possibly in response to the Apache Software Foundation's blacklist of the previous BSD+Patents license clause. RocksDB is used in production systems at various web-scale enterprises including Facebook, Yahoo!, and LinkedIn. Features RocksDB, like LevelDB, stores keys and values in arbitrary byte arrays, and data is sorted byte-wise b ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Tarantool Tarantool is an in-memory computing platform with a flexible data schema, best used for creating high-performance applications. Two main parts of it are an in-memory database and a Lua application server. Tarantool maintains data in memory and ensures crash resistance with write-ahead logging and snapshotting. It includes a Lua interpreter and interactive console, but also accepts connections from programs in several other languages. History Mail.Ru, one of the largest Internet companies in Russia, started the project in 2008 as part of the development of Moy Mir (My World) social network. In 2010 for a project head it hired a former technical lead from MySQL. Open-source contributors have been active especially in the area of external-language connectors for C, Perl, PHP, Python, Ruby, and node.js. Tarantool became part of the Mail.Ru backbone, used for dynamic content such as user sessions, unsent instant messages, task queues, and a caching layer for traditional relatio ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	SQLite4 SQLite (, ) is a database engine written in the C programming language. It is not a standalone app; rather, it is a library that software developers embed in their apps. As such, it belongs to the family of embedded databases. It is the most widely deployed database engine, as it is used by several of the top web browsers, operating systems, mobile phones, and other embedded systems. Many programming languages have bindings to the SQLite library. It generally follows PostgreSQL syntax, but does not enforce type checking by default. This means that one can, for example, insert a string into a column defined as an integer. History D. Richard Hipp designed SQLite in the spring of 2000 while working for General Dynamics on contract with the United States Navy. Hipp was designing software used for a damage-control system aboard guided-missile destroyers, which originally used HP-UX with an IBM Informix database back-end. SQLite began as a Tcl extension. In August 2000, ver ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Apache Accumulo Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms. According to DB-Engines ranking, Accumulo is the third most popular NoSQL wide column store behind Apache Cassandra and HBase and the 67th most popular database engine of any type (complete) as of 2018. History Accumulo was created in 2008 by the US National Security Agency and contributed to the Apache Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the Ap ... as an incubator project in September 2011. [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	LevelDB LevelDB is an open-source on-disk key-value store written by Google fellows Jeffrey Dean and Sanjay Ghemawat. Inspired by Bigtable, LevelDB is hosted on GitHub under the New BSD License and has been ported to a variety of Unix-based systems, macOS, Windows, and Android. Features LevelDB stores keys and values in arbitrary byte arrays, and data is sorted by key. It supports batching writes, forward and backward iteration, and compression of the data via Google's Snappy compression library. LevelDB is not an SQL database. Like other NoSQL and dbm stores, it does not have a relational data model and it does not support SQL queries. Also, it has no support for indexes. Applications use LevelDB as a library, as it does not provide a server or command-line interface. MariaDB 10.0 comes with a storage engine which allows users to query LevelDB tables from MariaDB. History LevelDB is based on concepts from Google's Bigtable database system. The table implementation for the Bigtab ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection). HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original Bigtable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST, Avro or Thrift gateway APIs. HBase is a wide-column store and h ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	B+ Tree A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children. A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–value pairs), and to which an additional level is added at the bottom with linked leaves. The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node, typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree. History There is no single paper introducing the B+ tree concept. Instead, the notion of maintaining all data in leaf nodes is repeatedly brought up as an interesting variant. Douglas Comer notes in an ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]