A version vector is a mechanism for tracking changes to data in a
distributed system
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sci ...
, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (
happened-before In computer science, the happened-before relation (denoted: \to \;) is a relation between the result of two events, such that if one event should happen before another event, the result must reflect that, even if those events are in reality execute ...
), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable
causality
Causality (also referred to as causation, or cause and effect) is influence by which one event, process, state, or object (''a'' ''cause'') contributes to the production of another event, process, state, or object (an ''effect'') where the ca ...
tracking among data replicas and are a basic mechanism for
optimistic replication
Optimistic replication, also known as lazy replication, is a strategy for replication, in which replicas are allowed to diverge.
Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identic ...
. In mathematical terms, the version vector generates a
preorder
In mathematics, especially in order theory, a preorder or quasiorder is a binary relation that is reflexive and transitive. Preorders are more general than equivalence relations and (non-strict) partial orders, both of which are special c ...
that tracks the events that precede, and may therefore influence, later updates.
Version vectors maintain state identical to that in a
vector clock
A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process's lo ...
, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:
* Initially all vector counters are zero.
* Each time a replica experiences a local update event, it increments its own counter in the vector by one.
* Each time two replicas and synchronize, they both set the elements in their copy of the vector to the maximum of the element across both counters:
. After synchronization, the two replicas have identical version vectors.
Pairs of replicas, , , can be compared by inspecting their version vectors and determined to be either: identical (
), concurrent (
), or ordered (
or
). The ordered relation is defined as: Vector
if and only if every element of
is less than or equal to its corresponding element in
, and at least one of the elements is strictly less than. If neither
or
, but the vectors are not identical, then the two vectors must be concurrent.
Version vectors or variants are used to track updates in many distributed file systems, such as
Coda (file system)
Coda is a distributed file system developed as a research project at Carnegie Mellon University since 1987 under the direction of Mahadev Satyanarayanan. It descended directly from an older version of Andrew File System (AFS-2) and offers many s ...
and Ficus, and are the main data structure behind optimistic replication.
Other mechanisms
* Hash Histories avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However this mechanism can only give probabilistic guarantees.
* Concise Version Vectors allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
* Version Stamps allow tracking of a variable number of replicas and do not resort to counters. This mechanism can depict scalability problems in some settings, but can be replaced by Interval Tree Clocks.
* Interval Tree Clocks generalize version vectors and vector clocks and allows dynamic numbers of replicas/processes.
* Bounded Version Vectors allow a bounded implementation, with bounded size counters, as long as replica pairs can be atomically synchronized.
* Dotted Version Vectors
[Nuno Preguiça, Carlos Baquero, Paulo Almeida, Victor Fonte and Ricardo Gonçalves. Brief Announcement: Efficient Causality Tracking in Distributed Storage Systems With Dotted Version Vectors. ACM PODC, pp. 335-336, 2012.] address scalability with a small set of servers mediating replica access by a large number of concurrent clients.
References
{{reflist
External links
Why Logical Clocks are Easy (Compares Causal Histories, Vector Clocks and Version Vectors)
Data synchronization
Logical clock algorithms
Distributed computing problems
Causality