A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process's logical clock. A vector clock of a system of $N$ processes is an array/vector of $N$ logical clocks, one clock per process. Each process keeps a local copy of this clock array, holding the largest value it has learned of for every process's clock. Denoting by $VC_i$ the vector clock maintained by process $i$, the clock updates proceed as follows:
* Initially all clocks are zero.
* Each time a process experiences an internal event, it increments its own logical clock in the vector by one. For instance, upon an event at process $i$, it updates $VC_i[i] \leftarrow VC_i[i] + 1$.
* Each time a process sends a message, it increments its own logical clock in the vector by one (as in the bullet above, but not twice for the same event), and the message then piggybacks a copy of its own vector.
* Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector carried by the received message. For example, if process $P_j$ receives a message $m$ from $P_i$ carrying the vector $VC_m$, it sets $VC_j[j] \leftarrow VC_j[j] + 1$ and then $VC_j[k] \leftarrow \max(VC_j[k], VC_m[k])$ for all $k$.
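The update rules above can be sketched in Python as follows. This is a minimal illustration that assumes a fixed, known set of $N$ processes numbered 0 to $N-1$; the class and method names (`VectorClock`, `internal_event`, `send`, `receive`) are hypothetical and not taken from any particular library.

```python
from typing import List


class VectorClock:
    """Vector clock held by process `pid` in a fixed system of `n` processes."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.clock: List[int] = [0] * n  # initially all clocks are zero

    def internal_event(self) -> None:
        # Internal event: increment only this process's own entry.
        self.clock[self.pid] += 1

    def send(self) -> List[int]:
        # A send is one event: increment once, then piggyback a copy.
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, msg_clock: List[int]) -> None:
        if len(msg_clock) != len(self.clock):
            raise ValueError("vector clocks must have the same length")
        # Receive: increment own entry, then take the element-wise maximum
        # with the vector carried by the message.
        self.clock[self.pid] += 1
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]


p0 = VectorClock(0, 3)
p1 = VectorClock(1, 3)
p0.internal_event()        # p0: [1, 0, 0]
m = p0.send()              # p0: [2, 0, 0]; the message carries [2, 0, 0]
p1.internal_event()        # p1: [0, 1, 0]
p1.receive(m)              # p1: [0, 2, 0], then max with m -> [2, 2, 0]
print(p0.clock, p1.clock)  # [2, 0, 0] [2, 2, 0]
```

Because `send` returns a copy, later events at the sender do not alter the vector already attached to an outgoing message.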


History

Without using the specific name "vector clock", the concept of a vector clock was first mentioned in a 1986 paper by Rivka Ladin and Barbara Liskov, where they use the term "multipart timestamp". To quote from page 31 of the Liskov/Ladin paper:
We solve this problem by using multipart timestamps, where there is one part for each replica. Thus, if there are n replicas, a timestamp t is $t = \langle t_1, \ldots, t_n \rangle$ where each part is a positive integer. Since there will typically be a small number of replicas (e.g., 3 to 7), using such a timestamp is practical.
The term "vector clock" was first used independently by Colin Fidge and Friedemann Mattern in 1988.


Partial ordering property

Vector clocks allow for the partial causal ordering of events. Define the following:
* $VC(x)$ denotes the vector clock of event $x$, and $VC(x)_z$ denotes the component of that clock for process $z$.
* $VC(x) < VC(y) \iff \forall z\, [VC(x)_z \le VC(y)_z] \land \exists z'\, [VC(x)_{z'} < VC(y)_{z'}]$
** In English: $VC(x)$ is less than $VC(y)$ if and only if $VC(x)_z$ is less than or equal to $VC(y)_z$ for all process indices $z$, and at least one of those relationships is strictly smaller (that is, $VC(x)_{z'} < VC(y)_{z'}$).
* $x \to y$ denotes that event $x$ happened before event $y$. With the update rules above, vector clocks characterize this relation exactly: $x \to y$ if and only if $VC(x) < VC(y)$.
Properties:
* Antisymmetry: if $VC(a) < VC(b)$, then $\neg(VC(b) < VC(a))$.
* Transitivity: if $VC(a) < VC(b)$ and $VC(b) < VC(c)$, then $VC(a) < VC(c)$; or, if $a \to b$ and $b \to c$, then $a \to c$.
Relation with other orders:
* Let $RT(x)$ be the real time when event $x$ occurs. If $VC(a) < VC(b)$, then $RT(a) < RT(b)$.
* Let $C(x)$ be the Lamport timestamp of event $x$. If $VC(a) < VC(b)$, then $C(a) < C(b)$.
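Checking the ordering defined above takes a single pass over the two vectors. Because vector clocks satisfy $x \to y$ exactly when $VC(x) < VC(y)$, two vectors that are incomparable in both directions mark concurrent events. The sketch below assumes both vectors have the same length and were produced by the update rules in the introduction; the helper names `vc_less` and `relation` are hypothetical.

```python
from typing import Sequence


def vc_less(x: Sequence[int], y: Sequence[int]) -> bool:
    """VC(x) < VC(y): every component <= and at least one strictly <."""
    if len(x) != len(y):
        raise ValueError("vector clocks must have the same length")
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def relation(x: Sequence[int], y: Sequence[int]) -> str:
    """Classify two event timestamps produced by the vector-clock rules."""
    if vc_less(x, y):
        return "x -> y"      # x happened before y
    if vc_less(y, x):
        return "y -> x"      # y happened before x
    if list(x) == list(y):
        return "equal"       # identical timestamps: under these rules, the same event
    return "concurrent"      # incomparable: neither happened before the other


# Timestamps computed by hand with the update rules for three processes:
a = [2, 0, 0]  # P0 sends a message (after one internal event)
c = [0, 1, 0]  # P1 internal event, before the message arrives
b = [2, 2, 0]  # P1 receives the message from P0

print(relation(a, b))  # x -> y
print(relation(a, c))  # concurrent
print(relation(b, c))  # y -> x
```

Here the send at P0 and the earlier internal event at P1 are reported as concurrent, even though a real-time observer might have seen one before the other.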


Other mechanisms

* In 1999, Torres-Rojas and Ahamad developed Plausible Clocks, a mechanism that takes less space than vector clocks but that, in some cases, will totally order events that are causally concurrent.
* In 2005, Agarwal and Garg created Chain Clocks, a system that tracks dependencies using vectors smaller than the number of processes and that adapts automatically to systems with a dynamic number of processes.
* In 2008, Almeida et al. introduced Interval Tree Clocks. This mechanism generalizes vector clocks and allows operation in dynamic environments where the identities and number of processes in the computation are not known in advance.
* In 2019, Lum Ramabaja developed Bloom Clocks, a probabilistic data structure whose space complexity does not depend on the number of nodes in a system. If two clocks are not comparable, the Bloom clock can always deduce it, i.e. false negatives are not possible. If two clocks are comparable, the Bloom clock can calculate the confidence of that statement, i.e. it can compute the false positive rate between comparable pairs of clocks.
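To illustrate how a fixed-size structure can still rule out causal order with certainty, the toy sketch below assumes a Bloom clock is a counting Bloom filter: each event increments $k$ cells chosen by hashing, and clocks are merged by element-wise maximum, as with vector clocks. The cell count `m=16`, `k=3`, the SHA-256-based hashing, and all names are illustrative assumptions, and the false-positive estimation described above is omitted.

```python
import hashlib
from typing import List, Sequence


class BloomClock:
    """Toy Bloom-clock sketch (assumed design): a counting filter of m cells."""

    def __init__(self, m: int = 16, k: int = 3) -> None:
        self.m, self.k = m, k
        self.cells: List[int] = [0] * m

    def _positions(self, event_id: str) -> List[int]:
        # Derive k cell indices from salted SHA-256 digests of the event id.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{event_id}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def record(self, event_id: str) -> None:
        # Each event increments the k cells it hashes to.
        for p in self._positions(event_id):
            self.cells[p] += 1

    def merge(self, other: Sequence[int]) -> None:
        # On receiving a message, keep the element-wise maximum.
        self.cells = [max(a, b) for a, b in zip(self.cells, other)]


def maybe_before(a: Sequence[int], b: Sequence[int]) -> bool:
    # a <= b in every cell: consistent with a happening before b,
    # but hash collisions mean this can be a false positive.
    return all(x <= y for x, y in zip(a, b))


def definitely_concurrent(a: Sequence[int], b: Sequence[int]) -> bool:
    # If neither clock dominates the other, no causal order is possible.
    return not maybe_before(a, b) and not maybe_before(b, a)


p, q = BloomClock(), BloomClock()
p.record("p:1")            # p's send event
snapshot = list(p.cells)   # clock carried by the message from p
q.merge(snapshot)
q.record("q:1")            # q's receive event
print(maybe_before(snapshot, q.cells))  # True
```

The asymmetry matches the description above: an incomparable pair is always genuinely concurrent, while a comparable pair only suggests causal order.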


See also

* Lamport timestamps
* Matrix clocks
* Version vector


External links


Why Logical Clocks are Easy (Compares Causal Histories, Vector Clocks and Version Vectors)

Explanation of Vector clocks

Timestamp-based vector clock implementation in Erlang

Vector clock implementation in Objective-C

Vector clock implementation in Erlang

Why Vector Clocks are Hard

Why Cassandra doesn’t need vector clocks