Elliptics

Elliptics is a distributed key–value data store with open-source code. By default it is a classic distributed hash table (DHT) with multiple replicas placed in different groups (distributed hashes). Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge numbers of medium and large files (1 KB up to gigabytes in size, thousands to billions of objects).


History

Elliptics was created in 2007, initially as part of POHMELFS, a cache-coherent distributed file system developed by Linux programmer Evgeniy Polyakov. POHMELFS was announced on January 31, 2008, and merged into the staging area of the Linux kernel source tree in version 2.6.30, released June 9, 2009. The filesystem went practically unused and was removed again in February 2012. In 2008 Elliptics separated as an independent project. Polyakov tried different approaches to distributed data storage; some were unsuitable because of their complexity, and others were too far from real-life requirements (BerkeleyDB, LevelDB and Kyoto Cabinet backends for medium and large files, different datacenters in a single DHT ring, non-eventual recovery). Elliptics is an eventually consistent system with multiple replicas that are updated in parallel and may live in physically distributed locations. Elliptics contains multiple layers, from a low-level on-disk store (named Eblob) up to SLRU caches and a dynamic routing protocol. In 2012, Polyakov announced a new version of POHMELFS based on Elliptics. As of 2014, Elliptics is used in Yandex Maps, Disk, Music, Photos, Market and infrastructure, the Sputnik search engine and Coub.


Architecture

By default, Elliptics forms a distributed hash table within a single group (a replica). A group may contain one or many servers, and a physical server can host multiple Elliptics groups (replicas) stored on different backends. Groups can live in different physical locations, which allows client requests to be served when other locations are not accessible. A peer-to-peer (P2P) protocol can be used to access data directly from storage servers without proxying. Elliptics supports server-side scripting in C++, JavaScript and Python based on the Cocaine technology, an SLRU cache, and multiple pluggable backends (eblob is the most popular one and the fastest for medium and large data).

Elliptics clients connect directly to all storage servers, which helps to:
* Execute a lookup in O(1) network requests (a single network request per replica)
* Run write/update commands on multiple replicas in parallel

There are several application programming interfaces (APIs) for data access:
* Asynchronous future-promise C++ library
* Python binding
* Go binding
* HTTP proxy named Rift, with buckets and ACLs, based on TheVoid library (using boost::asio)
* Community-driven Erlang bindings
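
The group and direct-connection model described in this section can be illustrated with a small in-memory sketch. It is not the Elliptics API: the `Group` and `Client` classes, the use of SHA-512 for ring positions, and the dictionary standing in for a storage backend are all assumptions made for illustration. Each group is an independent hash ring, the client computes the target server locally (one request per replica, no metadata server), and writes go to all groups in parallel.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor


def key_id(key):
    """Map a key to a point on the ring (hash choice is illustrative)."""
    return int.from_bytes(hashlib.sha512(key.encode()).digest(), "big")


class Group:
    """Hypothetical replica: one hash ring of servers with in-memory storage."""

    def __init__(self, group_id, servers):
        self.group_id = group_id
        # Each server owns the ring range that starts at its own point.
        self.ring = sorted((key_id(f"{group_id}:{s}"), s) for s in servers)
        self.points = [point for point, _ in self.ring]
        self.storage = {s: {} for s in servers}

    def server_for(self, key):
        # Computed locally by the client; index -1 wraps to the last server.
        i = bisect.bisect_right(self.points, key_id(key)) - 1
        return self.ring[i][1]

    def write(self, key, value):
        server = self.server_for(key)
        self.storage[server][key] = value
        return server

    def read(self, key):
        return self.storage[self.server_for(key)].get(key)


class Client:
    """Hypothetical client that knows every group and talks to it directly."""

    def __init__(self, groups):
        self.groups = groups

    def write(self, key, value):
        # Send the write to every replica (group) in parallel.
        with ThreadPoolExecutor(max_workers=len(self.groups)) as pool:
            futures = {g.group_id: pool.submit(g.write, key, value)
                       for g in self.groups}
            return {gid: f.result() for gid, f in futures.items()}

    def read(self, key):
        # Try groups in order; fall back to the next replica on a miss.
        for group in self.groups:
            value = group.read(key)
            if value is not None:
                return value
        return None
```

With two groups, for example `Client([Group(1, ["a", "b", "c"]), Group(2, ["d", "e"])])`, a single `write` stores the object once in each group, and a `read` still succeeds from the second group if the first one has lost its copy.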


Features

* Distributed hash tables, no metadata servers, true horizontal scaling
* Data replication: replicas can be stored in different physical locations
* Range and bulk requests
* Different I/O storage backends, API to create own low-level storage backends
* Automatic data repartitioning in case of removed or added nodes
* Eventually consistent recovery
* Consistent hashing addressing model
* Cluster statistics
* Frontend: HTTP; bindings: C/C++, Go, Python
* Server-side script execution support (write trigger analog)
* Distributed SLRU cache with TTL
* P2P streaming support (eblob and file backends only; external applications such as the Nginx web server can stream data from eblob object files directly to clients without proxying)
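
The "SLRU cache with TTL" item can be made concrete with a generic segmented-LRU sketch. This is a textbook SLRU, not Elliptics' own implementation, which the text does not describe; the class name `SLRUCache`, the segment sizes and the injectable `clock` parameter are assumptions. New entries enter a probationary segment, a hit promotes an entry to the protected segment, and expired entries are dropped when they are next looked up.

```python
import time
from collections import OrderedDict


class SLRUCache:
    """Illustrative segmented LRU cache with optional per-entry TTL."""

    def __init__(self, probation_size, protected_size, clock=time.monotonic):
        self.probation = OrderedDict()  # key -> (value, expires_at)
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.clock = clock

    def put(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        if key in self.protected:
            # Already hot: update in place and mark as most recently used.
            self.protected[key] = (value, expires_at)
            self.protected.move_to_end(key)
            return
        self.probation[key] = (value, expires_at)
        self.probation.move_to_end(key)
        self._trim_probation()

    def get(self, key):
        for segment in (self.protected, self.probation):
            if key in segment:
                value, expires_at = segment.pop(key)
                if expires_at is not None and self.clock() >= expires_at:
                    return None  # expired entry is dropped on access
                # A hit promotes the entry to the protected segment.
                self.protected[key] = (value, expires_at)
                self._trim_protected()
                return value
        return None

    def _trim_protected(self):
        # Overflow from the protected segment is demoted, not evicted.
        while len(self.protected) > self.protected_size:
            old_key, entry = self.protected.popitem(last=False)
            self.probation[old_key] = entry
        self._trim_probation()

    def _trim_probation(self):
        # Only the probationary segment evicts entries from the cache.
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)
```

The two segments keep a burst of one-off reads from evicting entries that have been requested more than once, which is the usual reason to prefer SLRU over plain LRU.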


Problems and restrictions

* Eventual consistency: because Elliptics is fully distributed, in an emergency a server may return a copy of a file that is older than the current one. Sometimes this is unacceptable. In such cases it is better to use more reliable, though slower, ways of requesting data.
* The network between the client and the servers can become a weak point, as data is written to several servers in parallel.
* The API may be inconvenient for high-level requests. Elliptics does not provide its users with SQL-like data requests.
* Elliptics does not have high-level transaction support, so it is impossible to guarantee that a group of commands will either be fully executed or not executed at all.
* Transactions are only atomic within a group and are locked based on the primary key.
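
The stale-read problem in the first item, and one slower but more reliable way of requesting data, can be sketched as follows. This is a hypothetical illustration, not Elliptics code: the `Record` type, the per-record timestamp and the dictionaries standing in for groups are assumptions. `read_first` returns whatever the first reachable replica holds, while `read_latest` asks every replica, keeps the newest copy and optionally writes it back to replicas that were behind.

```python
from dataclasses import dataclass


@dataclass
class Record:
    data: bytes
    timestamp: float  # write time stored with the data (assumed)


def read_first(replicas, key):
    """Fast path: return the copy from the first replica that has the key."""
    for replica in replicas:
        if key in replica:
            return replica[key]
    return None


def read_latest(replicas, key, repair=False):
    """Slow path: query every replica and return the newest copy."""
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda record: record.timestamp)
    if repair:
        # Bring missing or outdated replicas up to date with the newest copy.
        for replica in replicas:
            current = replica.get(key)
            if current is None or current.timestamp < newest.timestamp:
                replica[key] = newest
    return newest
```

The trade-off is the one the list describes: the slow path costs one request per replica on every read, in exchange for not returning an outdated copy while any up-to-date replica is reachable.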


Documentation

Elliptics and its supporting projects are documented at the community wiki. It contains high-level design docs, a tutorial, low-level details and a knowledge base. Elliptics and related projects are discussed in an open Google group.


See also

* MongoDB
* CouchDB
* Couchbase Server
* Apache Cassandra
* HBase
* Riak
* Elasticsearch
* Memcached
* Redis

