Elliptics

Elliptics is a distributed key–value data store with open source code. By default it is a classic distributed hash table (DHT) with multiple replicas placed in different groups (distributed hashes). Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge numbers of medium and large files (1 KB up to gigabytes in size, thousands to billions of objects).


History

Elliptics was created in 2007, initially as a part of POHMELFS, a cache-coherent distributed file system developed by Linux programmer Evgeniy Polyakov. POHMELFS was announced on January 31, 2008, and merged into the staging area of the Linux kernel source tree in version 2.6.30, released June 9, 2009. The filesystem went practically unused and was removed again in February 2012. In 2008, Elliptics was split off as an independent project. Polyakov tried different approaches to distributed data storage; some were unsuitable because of their complexity and others were too far from real-life use (BerkeleyDB, LevelDB and Kyoto Cabinet backends for medium and large files, different datacenters in a single DHT ring, non-eventual recovery). Elliptics is an eventually consistent system with multiple replicas that are updated in parallel and may live in physically distributed locations. Elliptics consists of multiple layers, from a low-level on-disk store (named Eblob) up to SLRU caches and a dynamic routing protocol. In 2012, Polyakov announced a new version of POHMELFS based on Elliptics. As of 2014, Elliptics is used in Yandex Maps, Disk, Music, Photos, Market and infrastructure, as well as in the Sputnik search engine and Coub.


Architecture

By default, Elliptics forms a distributed hash table within a single group (a replica). A group may contain one or many servers, and a physical server may host multiple Elliptics groups (replicas) stored on different backends. Groups can live in different physical locations, which allows client requests to be served when other locations are inaccessible. A peer-to-peer (P2P) protocol can be used to access data directly from storage servers without proxying. Elliptics supports server-side scripting in C++, JavaScript and Python, based on the Cocaine technology, as well as an SLRU cache and multiple pluggable backends (eblob is the most popular one and the fastest for medium and large data).

Elliptics clients connect directly to all storage servers, which helps to:
* Execute a lookup in O(1) network requests (a single network request per replica)
* Run write/update commands on multiple replicas in parallel

There are several application programming interfaces (APIs) for data access:
* Asynchronous future-promise C++ library
* Python binding
* Go binding
* HTTP proxy named Rift, with buckets and ACLs, based on TheVoid library (using boost::asio)
* Community-driven Erlang bindings
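
The routing idea described in this section (one hash ring per group, with the client resolving the target server locally) can be illustrated with a minimal Python sketch. This is not the Elliptics client API: the names `key_id`, `Group` and `route`, the use of SHA-512 and the number of ring points per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def key_id(data: bytes) -> int:
    """Map bytes to a point on the ring (SHA-512 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


class Group:
    """Hypothetical model of one group (replica): a hash ring over its servers."""

    def __init__(self, group_id, servers, points_per_server=64):
        self.group_id = group_id
        ring = []
        for server in servers:
            # Several ring points per server spread the key space more evenly.
            for i in range(points_per_server):
                ring.append((key_id(f"{server}#{i}".encode()), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._servers = [server for _, server in ring]

    def server_for(self, key: bytes) -> str:
        """Return the server owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, key_id(key)) % len(self._points)
        return self._servers[idx]


def route(key: bytes, groups):
    """Resolve the key locally: one target server per group, no metadata server."""
    return {group.group_id: group.server_for(key) for group in groups}
```

With two groups, `route(b"photo.jpg", [Group(1, ["dc1-a", "dc1-b"]), Group(2, ["dc2-a", "dc2-b", "dc2-c"])])` yields one server name per group, so a read needs a single request to one of them and a write can be sent to all of them at once. Removing a server from a ring only remaps the keys that server owned, which is the property that keeps repartitioning limited when nodes are added or removed.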


Features

* Distributed hash tables, no metadata servers, true horizontal scaling
* Data replication: replicas can be stored in different physical locations
* Range and bulk requests
* Different I/O storage backends, with an API for creating custom low-level storage backends
* Automatic data repartitioning when nodes are removed or added
* Eventually consistent recovery
* Consistent hashing addressing model
* Cluster statistics
* Frontend: HTTP; bindings: C/C++, Go, Python
* Server-side script execution support (analogous to write triggers)
* Distributed SLRU cache with TTL
* P2P streaming support (eblob and file backends only; external applications such as the Nginx web server can stream data from eblob object files directly to clients without proxying)
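
To make the "SLRU cache with TTL" item concrete, here is a small single-process Python sketch of a segmented LRU cache whose entries expire. It is not the Elliptics implementation, which is distributed; the class name `SLRUCache`, the two segment sizes and the injectable `clock` parameter are assumptions chosen for illustration.

```python
import time
from collections import OrderedDict


class SLRUCache:
    """Segmented LRU: new keys enter a probationary segment; a hit promotes
    the key to a protected segment. Every entry carries an expiry time."""

    def __init__(self, probation_size, protected_size, clock=time.monotonic):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.clock = clock
        self.probation = OrderedDict()  # key -> (value, expires_at)
        self.protected = OrderedDict()  # key -> (value, expires_at)

    def put(self, key, value, ttl):
        entry = (value, self.clock() + ttl)
        if key in self.protected:
            # Update in place and mark as most recently used.
            self.protected[key] = entry
            self.protected.move_to_end(key)
            return
        self.probation[key] = entry
        self.probation.move_to_end(key)
        self._trim_probation()

    def get(self, key):
        for segment in (self.protected, self.probation):
            if key in segment:
                value, expires_at = segment[key]
                if expires_at <= self.clock():
                    del segment[key]  # expired: drop the entry and report a miss
                    return None
                self._promote(key, segment.pop(key))
                return value
        return None

    def _promote(self, key, entry):
        self.protected[key] = entry  # lands at the most recently used end
        while len(self.protected) > self.protected_size:
            # Demote the least recently used protected entry to probation.
            old_key, old_entry = self.protected.popitem(last=False)
            self.probation[old_key] = old_entry
        self._trim_probation()

    def _trim_probation(self):
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict the oldest entry
```

A key that is read at least once survives a burst of one-off writes, because those writes only churn the probationary segment. This is the usual motivation for choosing SLRU over a plain LRU.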


Problems and restrictions

* Eventual consistency: because Elliptics is fully distributed, in an emergency a server may return a copy of a file that is older than the current one. Sometimes this is unacceptable. In such cases it is better to use more reliable ways of requesting data, despite the extra time they take.
* The network between the client and the servers can become a weak point, because data is written to several servers in parallel.
* The API may be inconvenient for high-level requests. Elliptics does not provide its users with SQL-like data requests.
* Elliptics does not support high-level transactions, so it is impossible to guarantee that a group of commands will either be fully executed or not executed at all.
* Transactions are only atomic within a group and are locked based on the primary key.
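
The stale-read problem and the slower but more reliable alternative can be illustrated with a toy Python model. It is only a sketch: `Replica`, `write_all` and `read_latest` are hypothetical names, the replicas are in-memory dictionaries, and picking the newest copy by timestamp is an assumption of this example rather than a description of how Elliptics resolves conflicts.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """In-memory stand-in for one group; `online` simulates an outage."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.store = {}  # key -> (timestamp, data)

    def write(self, key, timestamp, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        current = self.store.get(key)
        if current is None or current[0] < timestamp:
            self.store[key] = (timestamp, data)

    def read(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.store.get(key)


def write_all(replicas, key, timestamp, data):
    """Send the write to every replica in parallel; return the names that succeeded."""

    def attempt(replica):
        try:
            replica.write(key, timestamp, data)
            return replica.name
        except ConnectionError:
            return None

    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        return [name for name in pool.map(attempt, replicas) if name]


def read_latest(replicas, key):
    """Ask every reachable replica and keep the copy with the newest timestamp."""
    newest = None
    for replica in replicas:
        try:
            found = replica.read(key)
        except ConnectionError:
            continue
        if found is not None and (newest is None or found[0] > newest[0]):
            newest = found
    return newest
```

If the second replica is offline during an update, `write_all` reports success for the first one only. Nothing rolls the write back, which mirrors the lack of cross-group transactions. Once the second replica is back, a read served by it alone returns the old copy until recovery runs, whereas `read_latest` returns the new one at the cost of one request per replica.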


Documentation

Elliptics and its supporting projects are documented on the community wiki. It contains high-level design documents, tutorials, low-level details and a knowledge base. Elliptics and related projects are discussed in an open Google group.


See also

* MongoDB
* CouchDB
* Couchbase Server
* Apache Cassandra
* HBase
* Riak
* Elasticsearch
* Memcached
* Redis

