HOME

TheInfoList



OR:

TokuMX is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
distribution of
MongoDB MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Serve ...
which, among other things, replaces the default B-tree
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
found in the basic MongoDB distribution with a
fractal tree index In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Li ...
. It is a drop-in replacement for MongoDB (applications will run "as is") that offers the scalability and performance improvements associated with fractal tree indexing. It also adds support for document-level locking, transaction support with
ACID In computer science, ACID ( atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequ ...
and MVCC, and replication optimization; it does not support full-text search. TokuMX is specifically designed for high performance on write-intensive workloads. It achieves this using a fractal tree index, which replaces 40-year-old B-tree indexing and is based on cache-oblivious algorithms. This approach to building memory-efficient systems was originally jointly developed by researchers at the
Massachusetts Institute of Technology The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the ...
, Rutgers University, and the
State University of New York at Stony Brook Stony Brook University (SBU), officially the State University of New York at Stony Brook, is a public research university in Stony Brook, New York. Along with the University at Buffalo, it is one of the State University of New York system's ...
(SUNY). TokuMX is a
scalable Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
, ACID, and MVCC compliant distribution of MongoDB that provides indexing-based query improvements, offers online
schema The word schema comes from the Greek word ('), which means ''shape'', or more generally, ''plan''. The plural is ('). In English, both ''schemas'' and ''schemata'' are used as plural forms. Schema may refer to: Science and technology * SCHEMA ...
modifications, and reduces
slave Slavery and enslavement are both the state and the condition of being a slave—someone forbidden to quit one's service for an enslaver, and who is treated by the enslaver as property. Slavery typically involves slaves being made to perf ...
lag for both
hard disk drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s and
flash memory Flash memory is an electronic non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for the NOR and NAND logic gates. Both us ...
. It also adds transactions with MVCC and ACID reliability to any MongoDB application, making MongoDB suitable for a much wider range of solutions. Most TokuMX source files are made available under the terms of the
GNU Affero General Public License The GNU Affero General Public License (GNU AGPL) is a free, copyleft license published by the Free Software Foundation in November 2007, and based on the GNU General Public License, version 3 and the Affero General Public License. The Free So ...
(AGPL). The TokuKV Fractal Tree Indexing library is made available under the terms of the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the Four Freedoms (Free software), four freedoms to run, study, share, and modify the software. The license was th ...
(GPL) version 2, with an additional grant of a patent license.


B-trees

Most relational databases use indexes to increase query performance. Databases can leverage indexes to significantly reduce the amount of data they examine while responding to queries. Indexes are commonly implemented with B-trees, a data structure first described in 1970. The B-tree data structure allows for operations like inserting data and sorted order iteration, the primary operation used by an index. Depending on the workload and implementation, B-tree performance can be limited by the random I/O characteristics of disks. In addition, while freshly loaded databases tend to have good sequential behavior, this behavior becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges. With the advent of
big data Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
and the ever-increasing database needs of the 21st century, many niche databases have been created to get around the limitations of 50-year-old B-tree indexing. These include some optimized for reads, some optimized for writes, and a series of other special-purpose databases designed for a narrow problem set.


Fractal tree indexes


Overview

Fractal tree index In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Li ...
ing technology is a new approach to indexing that replaces B-trees. Fractal tree indexes implement the same operations as a B-tree, and thus are a drop-in replacement for B-trees. Fractal tree indexes effectively replaces small, frequent writes with larger, less frequent ones, enabling better compression and insertion performance. Fractal trees also allow for messages to be injected into the tree in such a fashion that schema changes such as adding or dropping a
column A column or pillar in architecture and structural engineering is a structural element that transmits, through compression, the weight of the structure above to other structural elements below. In other words, a column is a compression member. ...
, or adding an index can be done online and in the background. As a result, more indexes can be maintained without a drop in performance. This is because adding data to indexes tends to stress the performance of B-trees, but performs well in fractal tree indexes. Fractal tree index modifications do not cause the database files to
fragment Fragment may refer to: Entertainment Television and film * "Fragments" (''Torchwood''), an episode from the BBC TV series * "Fragments", an episode from the Canadian TV series ''Sanctuary'' * "Fragments" (Steven Universe Future), an episode f ...
, so periodic maintenance to compact files is not needed.


Uses

Fractal tree indexes can be applied to a number of applications characterized by near-real time analysis of streaming data. They can be used as the storage layer of a database or as the storage layer of a file system. When used in a database, they can be used in any setting where a B-tree is used, with improved performance. Examples include: network event management, online advertising networks, web 2.0 and clickstream analytics, and air traffic control management. Other uses include accelerated crawler performance for
search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
s for
social media Social media are interactive media technologies that facilitate the creation and sharing of information, ideas, interests, and other forms of expression through virtual communities and networks. While challenges to the definition of ''social medi ...
sites. It can also be used to create indexes and columns online, enabling query flexibility for ecommerce personalization. It is also suited to improving performance and reducing existing loads on transactional websites. In general it performs well in applications that must simultaneously store
log file In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors or just information on current operations. These events may occur in the operating system or in other software. A message or lo ...
data and execute ''ad hoc'' queries.


See also

*
Database engine A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application ...
*
MongoDB MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Serve ...
* TokuDB


References


External links


TokuMX product page

TokuMX source code on Github

Comparison of MySQL Storage Engines



TokuView
{sndinfo on the official TokuDB blog
DBMS2.com Overview of Tokutek
Database engines Free database management systems NoSQL Database-related software for Linux Software using the GNU AGPL license