HOME

TheInfoList



OR:

In
distributed computing A distributed system is a system whose components are located on different computer network, networked computers, which communicate and coordinate their actions by message passing, passing messages to one another from any system. Distributed com ...
, a conflict-free replicated data type (CRDT) is a
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
that is replicated across multiple computers in a
network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
, with the following features: # The application can update any replica independently, concurrently and without coordinating with other replicas. # An algorithm (itself part of the data type) automatically resolves any inconsistencies that might occur. # Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge. The CRDT concept was formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski. Development was initially motivated by collaborative text editing and
mobile computing Mobile computing is human–computer interaction in which a computer is expected to be transported during normal usage, which allows for the transmission of data, voice, and video. Mobile computing involves mobile communication, mobile hardware ...
. CRDTs have also been used in
online chat Online chat may refer to any kind of communication over the Internet that offers a real-time text, real-time transmission of text-based, text messages from sender to receiver. Chat messages are generally short in order to enable other participa ...
systems,
online gambling Online gambling is any kind of gambling conducted on the internet. This includes virtual poker, casinos and sports betting. The first online gambling venue opened to the general public was ticketing for the Liechtenstein International Lottery in ...
, and in the
SoundCloud SoundCloud is an online audio distribution platform and music sharing website that enables its users to upload, promote, and share audio. Founded in 2007 by Alexander Ljung and Eric Wahlforss, SoundCloud is one of the largest music streaming se ...
audio distribution platform. The
NoSQL A NoSQL (originally referring to "non- SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed ...
distributed databases
Redis Redis (; Remote Dictionary Server) is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, su ...
,
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store based on Amazon's Dynamo paper, including its "tunable AP" approach, that is tunable consistency, to the tradeoffs imposed by the CAP Theorem. Riak offers high availability, ...
and
Cosmos DB Azure Cosmos DB is Microsoft's proprietary globally distributed, multi-model database service "for managing data at planet-scale" launched in May 2017. It is schema-agnostic, horizontally scalable, and generally classified as a NoSQL database. ...
have CRDT data types.


Background

Concurrent updates to multiple replicas of the same data, without coordination between the computers hosting the replicas, can result in inconsistencies between the replicas, which in the general case may not be resolvable. Restoring consistency and data integrity when there are conflicts between updates may require some or all of the updates to be entirely or partially dropped. Accordingly, much of distributed computing focuses on the problem of how to prevent concurrent updates to replicated data. But another possible approach is
optimistic replication Optimistic replication, also known as lazy replication, is a strategy for replication, in which replicas are allowed to diverge. Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identi ...
, where all concurrent updates are allowed to go through, with inconsistencies possibly created, and the results are merged or "resolved" later. In this approach, consistency between the replicas is eventually re-established via "merges" of differing replicas. While optimistic replication might not work in the general case, there is a significant and practically useful class of data structures, CRDTs, where it does work — where it is always possible to merge or resolve concurrent updates on different replicas of the data structure without conflicts. This makes CRDTs ideal for optimistic replication. As an example, a one-way Boolean event flag is a trivial CRDT: one bit, with a value of true or false. True means some particular event has occurred at least once. False means the event has not occurred. Once set to true, the flag cannot be set back to false (an event having occurred cannot un-occur). The resolution method is "true wins": when merging a replica where the flag is true (that replica has observed the event), and another one where the flag is false (that replica hasn't observed the event), the resolved result is true — the event has been observed.


Types of CRDTs

There are two approaches to CRDTs, both of which can provide strong
eventual consistency Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last upd ...
: operation-based CRDTs and state-based CRDTs. The two alternatives are theoretically equivalent, as each can emulate the other. However, there are practical differences. State-based CRDTs are often simpler to design and to implement; their only requirement from the communication substrate is some kind of
gossip protocol A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members o ...
. Their drawback is that the entire state of every CRDT must be transmitted eventually to every other replica, which may be costly. In contrast, operation-based CRDTs transmit only the update operations, which are typically small. However, operation-based CRDTs require guarantees from the communication middleware; that the operations are not dropped or duplicated when transmitted to the other replicas, and that they are delivered in causal order.


Operation-based CRDTs

Operation-based CRDTs are also called commutative replicated data types, or CmRDTs. CmRDT replicas propagate state by transmitting only the update operation. For example, a CmRDT of a single integer might broadcast the operations (+10) or (−20). Replicas receive the updates and apply them locally. The operations are
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name o ...
. However, they are not necessarily
idempotent Idempotence (, ) is the property of certain operation (mathematics), operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence ...
. The communications infrastructure must therefore ensure that all operations on a replica are delivered to the other replicas, without duplication, but in any order. ''Pure'' operation-based CRDTs are a variant of operation-based CRDTs that reduces the metadata size.


State-based CRDTs

State-based CRDTs are called convergent replicated data types, or CvRDTs. In contrast to CmRDTs, CvRDTs send their full local state to other replicas, where the states are merged by a function which must be
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name o ...
,
associative In mathematics, the associative property is a property of some binary operations, which means that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement f ...
, and
idempotent Idempotence (, ) is the property of certain operation (mathematics), operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence ...
. The merge function provides a
join Join may refer to: * Join (law), to include additional counts or additional defendants on an indictment *In mathematics: ** Join (mathematics), a least upper bound of sets orders in lattice theory ** Join (topology), an operation combining two topo ...
for any pair of replica states, so the set of all states forms a
semilattice In mathematics, a join-semilattice (or upper semilattice) is a partially ordered set that has a join (a least upper bound) for any nonempty finite subset. Dually, a meet-semilattice (or lower semilattice) is a partially ordered set which has a mee ...
. The update function must monotonically increase the internal state, according to the same
partial order In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary ...
rules as the semilattice. ''Delta state'' CRDTs (or simply Delta CRDTs) are optimized state-based CRDTs where only recently applied changes to a state are disseminated instead of the entire state.


Comparison

While CmRDTs place more requirements on the protocol for transmitting operations between replicas, they use less bandwidth than CvRDTs when the number of transactions is small in comparison to the size of internal state. However, since the CvRDT merge function is associative, merging with the state of some replica yields all previous updates to that replica.
Gossip protocol A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members o ...
s work well for propagating CvRDT state to other replicas while reducing network use and handling topology changes. Some lower bounds on the storage complexity of state-based CRDTs are known.


Known CRDTs


G-Counter (Grow-only Counter)

payload integer P initial ,0,...,0 update ''increment''() let g = ''myId''() P := P + 1 query ''value''() : integer v let v = Σ P compare (X, Y) : boolean b let b = (∀i ∈ , n - 1: X.P ≤ Y.P merge (X, Y) : payload Z let ∀i ∈ , n - 1: Z.P = ''max''(X.P Y.P This CvRDT implements a counter for a cluster of ''n'' nodes. Each node in the cluster is assigned an ID from 0 to ''n'' - 1, which is retrieved with a call to ''myId''(). Thus each node is assigned its own slot in the array ''P'', which it increments locally. Updates are propagated in the background, and merged by taking the ''max''() of every element in ''P''. The compare function is included to illustrate a partial order on the states. The merge function is commutative, associative, and idempotent. The update function monotonically increases the internal state according to the compare function. This is thus a correctly-defined CvRDT and will provide strong eventual consistency. The CmRDT equivalent broadcasts increment operations as they are received.


PN-Counter (Positive-Negative Counter)

payload integer P, integer N initial ,0,...,0 ,0,...,0 update ''increment''() let g = ''myId''() P := P + 1 update ''decrement''() let g = ''myId''() N := N + 1 query ''value''() : integer v let v = Σ P - Σ N compare (X, Y) : boolean b let b = (∀i ∈ , n - 1: X.P ≤ Y.P ∧ ∀i ∈ , n - 1: X.N ≤ Y.N merge (X, Y) : payload Z let ∀i ∈ , n - 1: Z.P = ''max''(X.P Y.P let ∀i ∈ , n - 1: Z.N = ''max''(X.N Y.N A common strategy in CRDT development is to combine multiple CRDTs to make a more complex CRDT. In this case, two G-Counters are combined to create a data type supporting both increment and decrement operations. The "P" G-Counter counts increments; and the "N" G-Counter counts decrements. The value of the PN-Counter is the value of the P counter minus the value of the N counter. Merge is handled by letting the merged P counter be the merge of the two P G-Counters, and similarly for N counters. Note that the CRDT's internal state must increase monotonically, even though its external state as exposed through ''query'' can return to previous values.


G-Set (Grow-only Set)

payload set A initial ∅ update ''add''(element e) A := A ∪ query ''lookup''(element e) : boolean b let b = (e ∈ A) compare (S, T) : boolean b let b = (S.A ⊆ T.A) merge (S, T) : payload U let U.A = S.A ∪ T.A The G-Set (grow-only set) is a set which only allows adds. An element, once added, cannot be removed. The merger of two G-Sets is their union.


2P-Set (Two-Phase Set)

payload set A, set R initial ∅, ∅ query ''lookup''(element e) : boolean b let b = (e ∈ A ∧ e ∉ R) update ''add''(element e) A := A ∪ update ''remove''(element e) ''pre'' ''lookup''(e) R := R ∪ compare (S, T) : boolean b let b = (S.A ⊆ T.A ∧ S.R ⊆ T.R) merge (S, T) : payload U let U.A = S.A ∪ T.A let U.R = S.R ∪ T.R Two G-Sets (grow-only sets) are combined to create the 2P-set. With the addition of a remove set (called the "tombstone" set), elements can be added and also removed. Once removed, an element cannot be re-added; that is, once an element ''e'' is in the tombstone set, query will never again return True for that element. The 2P-set uses "remove-wins" semantics, so ''remove''(''e'') takes precedence over ''add''(''e'').


LWW-Element-Set (Last-Write-Wins-Element-Set)

LWW-Element-Set is similar to 2P-Set in that it consists of an "add set" and a "remove set", with a timestamp for each element. Elements are added to an LWW-Element-Set by inserting the element into the add set, with a timestamp. Elements are removed from the LWW-Element-Set by being added to the remove set, again with a timestamp. An element is a member of the LWW-Element-Set if it is in the add set, and either not in the remove set, or in the remove set but with an earlier timestamp than the latest timestamp in the add set. Merging two replicas of the LWW-Element-Set consists of taking the union of the add sets and the union of the remove sets. When timestamps are equal, the "bias" of the LWW-Element-Set comes into play. A LWW-Element-Set can be biased towards adds or removals. The advantage of LWW-Element-Set over 2P-Set is that, unlike 2P-Set, LWW-Element-Set allows an element to be reinserted after having been removed.


OR-Set (Observed-Remove Set)

OR-Set resembles LWW-Element-Set, but using unique tags instead of timestamps. For each element in the set, a list of add-tags and a list of remove-tags are maintained. An element is inserted into the OR-Set by having a new unique tag generated and added to the add-tag list for the element. Elements are removed from the OR-Set by having all the tags in the element's add-tag list added to the element's remove-tag (tombstone) list. To merge two OR-Sets, for each element, let its add-tag list be the union of the two add-tag lists, and likewise for the two remove-tag lists. An element is a member of the set if and only if the add-tag list less the remove-tag list is nonempty. An optimization that eliminates the need for maintaining a tombstone set is possible; this avoids the potentially unbounded growth of the tombstone set. The optimization is achieved by maintaining a vector of timestamps for each replica.


Sequence CRDTs

A sequence, list, or
ordered set In mathematics, especially order theory, a partially ordered set (also poset) formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary r ...
CRDT can be used to build a
Collaborative real-time editor A collaborative real-time editor is a type of collaborative software or web application which enables real-time collaborative editing, simultaneous editing, or live editing of the same digital document, computer file or cloud-stored data – suc ...
, as an alternative to
Operational transformation Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced collaborative software systems. OT was originally invented for consistency maintenance and concurrency control in collaborative edit ...
(OT). Some known Sequence CRDTs are Treedoc, RGA, Woot, Logoot, and LSEQ. CRATE is a decentralized real-time editor built on top of LSEQSplit (an extension of LSEQ) and runnable on a network of browsers using
WebRTC WebRTC (Web Real-Time Communication) is a free and open-source project providing web browsers and mobile applications with real-time communication (RTC) via application programming interfaces (APIs). It allows audio and video communication to wor ...
. LogootSplit was proposed as an extension of Logoot in order to reduce the metadata for sequence CRDTs. MUTE is an online web-based peer-to-peer real-time collaborative editor relying on LogootSplit algorithm. Industrial sequence CRDTs, including open-source ones, are known to out-perform academic implementations due to optimizations and a more realistic testing methodology. The main popular example is Yjs CRDT, a pioneer in using a plain list instead of a tree (ala Kleppmann's ''automerge'').


Industry use

* Fluid Framework is an open-source collaborative platform built by
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
that provides both server reference implementations and client-side SDKs for creating modern realtime web applications using CRDTs. *
Nimbus Note Nimbus Note is a note-taking app designed by Nimbus Web company based in Cleveland, Ohio. The app is cross-platform, for Android, iOS, macOS, and Microsoft Windows. Technical overview The app allows users to create notes with document or ...
is a collaborative note-taking application that uses the Yjs CRDT for collaborative editing. *
Redis Redis (; Remote Dictionary Server) is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, su ...
is a distributed, highly available and scalable in-memory database that uses CRDTs for implementing globally distributed databases based on and fully compatible with Redis open source. *
SoundCloud SoundCloud is an online audio distribution platform and music sharing website that enables its users to upload, promote, and share audio. Founded in 2007 by Alexander Ljung and Eric Wahlforss, SoundCloud is one of the largest music streaming se ...
open-source
Roshi
a LWW-element-set CRDT for the SoundCloud stream implemented on top of
Redis Redis (; Remote Dictionary Server) is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, su ...
. *
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store based on Amazon's Dynamo paper, including its "tunable AP" approach, that is tunable consistency, to the tradeoffs imposed by the CAP Theorem. Riak offers high availability, ...
is a distributed NoSQL key-value data store based on CRDTs.
League of Legends ''League of Legends'' (''LoL''), commonly referred to as ''League'', is a 2009 multiplayer online battle arena video game developed and published by Riot Games. Inspired by ''Defense of the Ancients'', a Mod (video games), custom map for War ...
uses the Riak CRDT implementation for its in-game chat system, which handles 7.5 million concurrent users and 11,000 messages per second.
Bet365 Bet365 Group Ltd (commonly known and stylized as bet365 and spoken as "bet three-six-five") is a leading British online gambling company based in the United Kingdom. It was founded by Denise Coates, who remains the majority shareholder and joint ...
, stores hundreds of megabytes of data in the
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store based on Amazon's Dynamo paper, including its "tunable AP" approach, that is tunable consistency, to the tradeoffs imposed by the CAP Theorem. Riak offers high availability, ...
implementation of OR-Set. *
TomTom TomTom N.V. is a Dutch multinational developer and creator of location technology and consumer electronics. Founded in 1991 and headquartered in Amsterdam, TomTom released its first generation of satellite navigation devices to market in 2004. ...
employs CRDTs to synchronize navigation data between the devices of a user. *
Phoenix Phoenix most often refers to: * Phoenix (mythology), a legendary bird from ancient Greek folklore * Phoenix, Arizona, a city in the United States Phoenix may also refer to: Mythology Greek mythological figures * Phoenix (son of Amyntor), a ...
, a web framework written in
Elixir ELIXIR (the European life-sciences Infrastructure for biological Information) is an initiative that will allow life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring t ...
, uses CRDTs to support real time multi-node information sharing in version 1.2. *
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin M ...
implements CRDTs in their Apollo low-latency "consistency at scale" database. *
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin M ...
uses CRDTs in their FlightTracker system for managing the Facebook graph internally. * Teletype for
Atom Every atom is composed of a nucleus and one or more electrons bound to the nucleus. The nucleus is made of one or more protons and a number of neutrons. Only the most common variety of hydrogen has no neutrons. Every solid, liquid, gas, and ...
employs CRDTs to enable developers share their workspace with team members and collaborate on code in real time. * Haja Networks' OrbitDB uses operation-based CRDTs in its core data structure, IPFS-Log. *
Apple An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple fruit tree, trees are agriculture, cultivated worldwide and are the most widely grown species in the genus ''Malus''. The tree originated in Central Asia, wh ...
implements CRDTs in the Notes app for syncing offline edits between multiple devices.
Swim
is a platform for running distributed real-time
streaming Streaming media is multimedia that is delivered and consumed in a continuous manner from a source, with little or no intermediate storage in network elements. ''Streaming'' refers to the delivery method of content, rather than the content it ...
applications that deliver continuous intelligence. It uses streaming actors that stream pure op-based CRDT state updates to other actors in a DAG that implements a streaming data pipeline.
RxDB
is a client side NoSQL database for distributed real-time
streaming Streaming media is multimedia that is delivered and consumed in a continuous manner from a source, with little or no intermediate storage in network elements. ''Streaming'' refers to the delivery method of content, rather than the content it ...
applications. It has
CRDT plugin
that enables to update document by storing NoSQL based CRDT deltas and replicate these with other clients or a backend server.


References

{{cite web , url = https://pages.lip6.fr/Marc.Shapiro/papers/RR-7687.pdf , title = Conflict-free Replicated Data Types , date = July 19, 2011 , publisher = inria.fr


External links


A collection of resources and papers on CRDTs

"Strong Eventual Consistency and Conflict-free Replicated Data Types" (A talk on CRDTs)
by Marc Shapiro

by Christopher Meiklejohn
CAP theorem and CRDTs: CAP 12 years later. How the rules have changed
by Eric Brewer Distributed data structures Distributed algorithms Fault-tolerant computer systems