Distributed data store
   HOME

TheInfoList



OR:

A distributed data store is a
computer network A computer network is a set of computers sharing resources located on or provided by network nodes. The computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are ...
where information is stored on more than one
node In general, a node is a localized swelling (a " knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph * Vertex (geometry), a point where two or more curves, line ...
, often in a replicated fashion. It is usually specifically used to refer to either a
distributed database A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data centre); or maybe dispersed over a network of interconnect ...
where users store information on a ''number of nodes'', or a
computer network A computer network is a set of computers sharing resources located on or provided by network nodes. The computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are ...
in which users store information on a ''number of peer network nodes''.


Distributed databases

Distributed database A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data centre); or maybe dispersed over a network of interconnect ...
s are usually non-relational databases that enable a quick access to data over a large number of nodes. Some distributed databases expose rich query abilities while others are limited to a key-value store semantics. Examples of limited distributed databases are
Google Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...
's
Bigtable Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads as part of the Google Cloud portfolio. History Bigtable development began in 2004.. It is now used by a number of Googl ...
, which is much more than a
distributed file system A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage fo ...
or a peer-to-peer network,
Amazon Amazon most often refers to: * Amazons, a tribe of female warriors in Greek mythology * Amazon rainforest, a rainforest covering most of the Amazon basin * Amazon River, in South America * Amazon (company), an American multinational technolog ...
's
Dynamo "Dynamo Electric Machine" (end view, partly section, ) A dynamo is an electrical generator that creates direct current using a commutator. Dynamos were the first electrical generators capable of delivering power for industry, and the foundati ...
and Microsoft Azure Storage. As the ability of arbitrary querying is not as important as the
availability In reliability engineering, the term availability has the following meanings: * The degree to which a system, subsystem or equipment is in a specified operable and committable state at the start of a mission, when the mission is called for at ...
, designers of distributed data stores have increased the latter at an expense of consistency. But the high-speed read/write access results in reduced consistency, as it is not possible to guarantee both
consistency In classical deductive logic, a consistent theory is one that does not lead to a logical contradiction. The lack of contradiction can be defined in either semantic or syntactic terms. The semantic definition states that a theory is consistent ...
and availability on a partitioned network, as stated by the
CAP theorem In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that any distributed data store can provide only two of the following three guarantees:Seth Gilbert and Nancy Lynch"Brewe ...
.


Peer network node data stores

In peer network data stores, the user can usually reciprocate and allow other users to use their computer as a storage node as well. Information may or may not be accessible to other users depending on the design of the network. Most
peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer ...
networks do not have distributed data stores in that the user's data is only available when their node is on the network. However, this distinction is somewhat blurred in a system such as BitTorrent, where it is possible for the originating node to go offline but the content to continue to be served. Still, this is only the case for individual files requested by the redistributors, as contrasted with networks such as
Freenet Freenet is a peer-to-peer platform for censorship-resistant, anonymous communication. It uses a decentralized distributed data store to keep and deliver information, and has a suite of free software for publishing and communicating on the Web ...
, Winny, Share and Perfect Dark where any node may be storing any part of the files on the network. Distributed data stores typically use an
error detection and correction In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable commu ...
technique. Some distributed data stores (such as
Parchive Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operatio ...
over NNTP) use
forward error correction In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea i ...
techniques to recover the original file when parts of that file are damaged or unavailable. Others try again to download that file from a different mirror.


Examples


Distributed non-relational databases


Peer network node data stores

* BitTorrent *
Blockchain (database) A blockchain is a type of distributed ledger technology (DLT) that consists of growing lists of records, called ''blocks'', that are securely linked together using cryptography. Each block contains a cryptographic hash of the previous block, ...
* Chord project *
Freenet Freenet is a peer-to-peer platform for censorship-resistant, anonymous communication. It uses a decentralized distributed data store to keep and deliver information, and has a suite of free software for publishing and communicating on the Web ...
* GNUnet *
IPFS The InterPlanetary File System (IPFS) is a protocol, hypermedia and file sharing peer-to-peer network for storing and sharing data in a distributed file system. IPFS uses content-addressing to uniquely identify each file in a global namespace ...
* Mnet *
Napster Napster was a peer-to-peer file sharing application. It originally launched on June 1, 1999, with an emphasis on digital audio file distribution. Audio songs shared on the service were typically encoded in the MP3 format. It was founded by Sha ...
*
NNTP The Network News Transfer Protocol (NNTP) is an application protocol used for transporting Usenet news articles (''netnews'') between news servers, and for reading/posting articles by the end user client applications. Brian Kantor of the Univers ...
(the distributed data storage protocol used for
Usenet Usenet () is a worldwide distributed discussion system available on computers. It was developed from the general-purpose Unix-to-Unix Copy (UUCP) dial-up network architecture. Tom Truscott and Jim Ellis conceived the idea in 1979, and it wa ...
news) * Unity, of the software Perfect Dark * Share * Siacoin * DeNet *
Storage@home Storage@home was a distributed data store project designed to store massive amounts of scientific data across a large number of volunteer machines. The project was developed by some of the Folding@home team at Stanford University, from about 2007 ...
* STORJ *
Tahoe-LAFS Tahoe-LAFS (Tahoe Least-Authority File Store) is a free and open, secure, decentralized, fault-tolerant, distributed data store and distributed file system. It can be used as an online backup system, or to serve as a file or Web host similar t ...
* Winny *
ZeroNet ZeroNet is a decentralized web-like network of peer-to-peer users, created by Tamas Kocsis in 2015, programming for the network was based in Budapest, Hungary; is built in Python; and is fully open source. Instead of having an IP address, si ...


See also

*
Cooperative storage cloud A cooperative storage cloud is a decentralized model of networked online storage where data is stored on multiple computers ( nodes), hosted by the participants cooperating in the cloud. For the cooperative scheme to be viable, the total storage ...
* Data store *
Distributed file system A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage fo ...
* Keyspace, the DDS
schema The word schema comes from the Greek word ('), which means ''shape'', or more generally, ''plan''. The plural is ('). In English, both ''schemas'' and ''schemata'' are used as plural forms. Schema may refer to: Science and technology * SCHEMA ...
*
Peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer ...
*
Distributed hash table A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The ...
*
Distributed cache In computing, a distributed cache is an extension of the traditional concept of cache used in a single locale. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. It is mainly used to store app ...
* Cyber Resilience


References

{{Reflist Data management ja:分散ファイルシステム#分散データストア