Double compare-and-swap (DCAS or CAS2) is an
atomic primitive
In concurrent programming, an operation (or set of operations) is linearizable if it consists of an ordered list of invocation and response events (event), that may be extended by adding response events such that:
# The extended list can be re-e ...
proposed to support certain
concurrent programming
Concurrent means happening at the same time. Concurrency, concurrent, or concurrence may refer to:
Law
* Concurrence, in jurisprudence, the need to prove both ''actus reus'' and ''mens rea''
* Concurring opinion (also called a "concurrence"), a ...
techniques. DCAS takes two not necessarily contiguous memory locations and writes new values into them only if they match pre-supplied "expected" values; as such, it is an extension of the much more popular
compare-and-swap
In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of tha ...
(CAS) operation.
DCAS is sometimes confused with the double-width compare-and-swap (DWCAS) implemented by instructions such as x86 CMPXCHG16B. DCAS, as discussed here, handles two discontiguous memory locations, typically of pointer size, whereas DWCAS handles two adjacent pointer-sized memory locations.
In his doctoral thesis, Michael Greenwald recommended adding DCAS to modern hardware, showing it could be used to create easy-to-apply yet efficient
software transactional memory In computer science, software transactional memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It is an alternative to lock-based synchronization. STM ...
(STM). Greenwald points out that an advantage of DCAS vs CAS is that higher-order (multiple item) CAS''n'' can be implemented in O(''n'') with DCAS, but requires O(''n'' log ''p'') time with unary CAS, where ''p'' is the number of contending processes.
One of the advantages of DCAS is the ability to implement atomic
deque
In computer science, a double-ended queue (abbreviated to deque, pronounced ''deck'', like "cheque") is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). ...
s (i.e.
doubly linked list
In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked record (computer science), records called node (computer science), nodes. Each node contains three field (computer science), fields: ...
s) with relative ease.
More recently, however, it has been shown that an STM can be implemented with comparable properties using only CAS.
[Keir Fraser (2004), "Practical lock-freedom]
UCAM-CL-TR-579.pdf
/ref> In general however, DCAS is not a silver bullet
In folklore, a bullet cast from silver is often one of the few weapons that are effective against a werewolf or witch. The term ''silver bullet'' is also a metaphor for a simple, seemingly magical, solution to a difficult problem: for example, pe ...
: implementing lock-free and wait-free algorithms
In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking im ...
using it is typically just as complex and error-prone as for CAS.
Motorola at one point included DCAS in the instruction set for its 68k series; however, the slowness of DCAS relative to other primitives (apparently due to cache handling issues) led to its avoidance in practical contexts. , DCAS is not natively supported by any widespread CPUs in production.
The generalization of DCAS to more than two addresses is sometimes called MCAS (multi-word CAS); MCAS can be implemented by a nestable LL/SC
In computer science, load-linked/store-conditional (LL/SC), sometimes known as load-reserved/store-conditional (LR/SC), are a pair of instructions used in multithreading to achieve synchronization. Load-link returns the current value of a memory ...
, but such a primitive is not directly available in hardware. MCAS can be implemented in software in terms of DCAS, in various ways. In 2013, Trevor Brown, Faith Ellen
Faith Ellen (formerly known as Faith E. Fich) is a professor of computer science at the University of Toronto who studies distributed data structures and the theory of distributed computing.
She earned her bachelor's degree and masters from the U ...
, and Eric Ruppert have implemented in software a multi-address LL/SC extension (which they call LLX/SCX) that while being more restrictive than MCAS enabled them, via some automated code generation, to implement one of the best performing concurrent binary search tree
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and ...
(actually a chromatic tree
Diatonic and chromatic are terms in music theory that are most often used to characterize scales, and are also applied to musical instruments, intervals, chords, notes, musical styles, and kinds of harmony. They are very often used as a pai ...
), slightly beating the JDK
The Java Development Kit (JDK) is a distribution of Java Technology by Oracle Corporation. It implements the Java Language Specification (JLS) and the Java Virtual Machine Specification (JVMS) and provides the Standard Edition (SE) of the Java ...
CAS-based skip list
In computer science, a skip list (or skiplist) is a probabilistic data structure that allows \mathcal(\log n) average complexity for search as well as \mathcal(\log n) average complexity for insertion within an ordered sequence of n elements. ...
implementation.
In general, DCAS can be provided by a more expressive hardware transactional memory In computer science and engineering, transactional memory attempts to simplify concurrent programming by allowing a group of load and store instructions to execute in an atomic way. It is a concurrency control mechanism analogous to database transa ...
.[Dave Dice, Yossi Lev, Mark Moir, Dan Nussbaum, and Marek Olszewski. (2009) "Early experience with a commercial hardware transactional memory implementation." Sun Microsystems technical report (60 pp.) SMLI TR-2009-180. A short version appeared at ASPLOS’09 {{doi, 10.1145/1508244.1508263. The full-length report discusses how to implement DCAS using HTM in section 5.] IBM POWER8
POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, which is the first time for s ...
and Intel Intel TSX
Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speedin ...
provide working implementations of transactional memory. Sun
The Sun is the star at the center of the Solar System. It is a nearly perfect ball of hot plasma, heated to incandescence by nuclear fusion reactions in its core. The Sun radiates this energy mainly as light, ultraviolet, and infrared radi ...
's cancelled Rock processor
Rock (or ROCK) was a multithreading, multicore, SPARC microprocessor under development at Sun Microsystems. Canceled in 2010, it was a separate project from the SPARC T-Series (CoolThreads/Niagara) family of processors.
Rock aimed at higher per ...
would have supported it as well.
References
External links
US Patent 4584640 ''Method and apparatus for a compare and swap instruction''
Concurrency control