One-copy Serializability

In concurrency control of databases (Bernstein, Hadzilacos and Goodman 1987; Weikum and Vossen 2001), transaction processing (transaction management), and various transactional applications (e.g., transactional memory (Herlihy and Moss, ''Transactional memory: architectural support for lock-free data structures'', Proceedings of the 20th Annual International Symposium on Computer Architecture, ISCA '93, May 1993) and software transactional memory), both centralized and distributed, a transaction schedule is serializable if its outcome (e.g., the resulting database state) is equal to the outcome of its transactions executed serially, i.e., without overlapping in time. Transactions are normally executed concurrently (they overlap), since this is the most efficient way. Serializability is the major correctness criterion for concurrent transactions' executions. It is considered the highest level of isolation between transactions, and it plays an essential role in concurrency control. As such it is supported in all general-purpose database systems. ''Strong strict two-phase locking'' (SS2PL) is a popular serializability mechanism utilized in most database systems (in various variants) since their early days in the 1970s.

Serializability theory provides the formal framework to reason about and analyze serializability and its techniques. Though it is mathematical in nature, its fundamentals are introduced informally (without mathematical notation) below.


Correctness


Serializability

Serializability is used to keep the data in the database in a consistent state. Serializability is a property of a transaction schedule (history). It relates to the ''isolation'' property of a database transaction.

:Serializability of a schedule means equivalence (in the outcome, the database state, data values) to a ''serial schedule'' (i.e., sequential with no transaction overlap in time) with the same transactions. It is the major criterion for the correctness of concurrent transactions' schedules, and thus it is supported in all general-purpose database systems.

The rationale behind serializability is the following: if each transaction is correct by itself, i.e., meets certain integrity conditions, then a schedule that comprises any ''serial'' execution of these transactions is correct (its transactions still meet their conditions). "Serial" means that transactions do not overlap in time and cannot interfere with each other, i.e., complete ''isolation'' between them exists. Any order of the transactions is legitimate, if no dependencies among them exist, which is assumed (see the comment below). As a result, a schedule that comprises any execution (not necessarily serial) that is equivalent (in its outcome) to any serial execution of these transactions is correct.

Schedules that are not serializable are likely to generate erroneous outcomes. Well-known examples involve transactions that debit and credit accounts with money: if the related schedules are not serializable, then the total sum of money may not be preserved. Money could disappear, or be generated from nowhere. This, and violations of other invariants that may need to be preserved, are caused by one transaction writing, and "stepping on" and erasing, what has been written by another transaction before it has become permanent in the database. This does not happen if serializability is maintained (a small code illustration follows below).

If any specific order between some transactions is requested by an application, then it is enforced independently of the underlying serializability mechanisms. These mechanisms are typically indifferent to any specific order, and generate some unpredictable partial order that is typically compatible with multiple serial orders of these transactions. This partial order results from the scheduling orders of concurrent transactions' data access operations, which depend on many factors.

A major characteristic of a database transaction is ''atomicity'', which means that it either ''commits'', i.e., all its operations' results take effect in the database, or ''aborts'' (is rolled back), i.e., none of its operations' results have any effect on the database ("all or nothing" semantics of a transaction). In all real systems transactions can abort for many reasons, and serializability by itself is not sufficient for correctness. Schedules also need to possess the ''recoverability'' (from abort) property. Recoverability means that committed transactions have not read data written by aborted transactions (whose effects do not exist in the resulting database states). While serializability is currently compromised on purpose in many applications for better performance (only in cases when the application's correctness is not harmed), compromising recoverability would quickly violate the database's integrity, as well as that of transactions' results external to the database. A schedule with the recoverability property (a ''recoverable'' schedule) "recovers" from aborts by itself, i.e., aborts do not harm the integrity of its committed transactions and the resulting database. This is false without recoverability, where the likely integrity violations (resulting in incorrect database data) need special, typically manual, corrective actions in the database.

Implementing recoverability in its general form may result in ''cascading aborts'': aborting one transaction may result in a need to abort a second transaction, and then a third, and so on. This results in a waste of already partially executed transactions, and may also result in a performance penalty. ''Avoiding cascading aborts'' (ACA, or cascadelessness) is a special case of recoverability that exactly prevents such phenomena. Often in practice a special case of ACA is utilized: ''strictness''. Strictness allows efficient database recovery from failure. Note that the ''recoverability'' property is needed even if no database failure occurs and no database ''recovery'' from failure is needed. It is, rather, needed to correctly and automatically handle aborts, which may be unrelated to database failure and recovery from failure.
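The money example above can be made concrete with a small sketch. The following fragment (illustrative only; the accounts and helper functions are invented for this example and are not part of any DBMS API) interleaves two transfer-style transactions in a non-serializable way and shows the "total sum of money" invariant being violated, while any serial order of the same two transactions preserves it.

```python
# A minimal sketch (not from the article) of a non-serializable interleaving
# of two transactions that both debit account A and credit account B.
accounts = {"A": 100, "B": 100}          # total = 200

def read(acct):
    return accounts[acct]

def write(acct, value):
    accounts[acct] = value

# T1 moves 10 from A to B; T2 moves 20 from A to B.
# Non-serializable interleaving: both read A before either writes it,
# so T2's write of A "steps on" and erases T1's debit (a lost update).
a1 = read("A")                           # T1 reads A = 100
a2 = read("A")                           # T2 reads A = 100
write("A", a1 - 10)                      # T1 writes A = 90
write("B", read("B") + 10)               # T1 credits B (B = 110)
write("A", a2 - 20)                      # T2 writes A = 80, erasing T1's debit
write("B", read("B") + 20)               # T2 credits B (B = 130)

print(sum(accounts.values()))            # 210: money was generated from nowhere
# Executing T1 then T2 (or T2 then T1) serially always yields a total of 200.
```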


Relaxing serializability

In many applications, unlike with finances, absolute correctness is not needed. For example, when retrieving a list of products according to some specification, in most cases it does not matter much if a product whose data was updated a short time ago does not appear in the list, even if it meets the specification. It will typically appear in such a list when the retrieval is tried again a short time later. Commercial databases provide concurrency control with a whole range of isolation levels which are in fact (controlled) serializability violations, in order to achieve higher performance. Higher performance means a better transaction execution rate and a shorter average transaction response time (transaction duration). ''Snapshot isolation'' is an example of a popular, widely utilized, efficient relaxed-serializability method with many characteristics of full serializability, but still short of some, and unfit in many situations (see the write-skew illustration below).

Another common reason nowadays for distributed serializability relaxation (see below) is the requirement of availability of internet products and services. This requirement is typically answered by large-scale data replication. The straightforward solution for synchronizing replicas' updates of the same database object is to include all these updates in a single atomic distributed transaction. However, with many replicas such a transaction is very large, and may span enough computers and networks that some of them are likely to be unavailable. Thus such a transaction is likely to end with an abort and miss its purpose. Consequently, optimistic replication (lazy replication) is often utilized (e.g., in many products and services by Google, Amazon, Yahoo, and the like), while serializability is relaxed and compromised for eventual consistency. Again, in this case, relaxation is done only for applications that are not expected to be harmed by this technique.

Classes of schedules defined by ''relaxed serializability'' properties either contain the serializability class, or are incomparable with it.
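The way snapshot isolation falls short of serializability is commonly illustrated with the classic "write skew" anomaly. The sketch below is an assumed scenario built for this illustration (the on-call doctors setting and all names are invented): two transactions read the same snapshot, each sees a shared invariant still holding, and each writes a different item, so a first-committer-wins check on overlapping write sets detects nothing and both commit, an outcome no serial execution allows.

```python
# Write skew under snapshot isolation (assumed scenario, illustrative only).
# Invariant the application wants: at least one doctor remains on call.
snapshot = {"alice_on_call": True, "bob_on_call": True}   # both transactions read this snapshot

# T1 (Alice goes off call): sees Bob still on call in its snapshot, so it proceeds.
t1_writes = {"alice_on_call": False} if snapshot["bob_on_call"] else {}

# T2 (Bob goes off call): sees Alice still on call in the *same* snapshot, so it proceeds too.
t2_writes = {"bob_on_call": False} if snapshot["alice_on_call"] else {}

# Snapshot isolation only aborts a transaction whose *write set* overlaps that of a
# concurrent committed transaction; here the write sets are disjoint, so both commit.
database = dict(snapshot)
database.update(t1_writes)
database.update(t2_writes)
print(database)   # {'alice_on_call': False, 'bob_on_call': False}: the invariant is broken
```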


View and conflict serializability

Mechanisms that enforce serializability need to execute in real time, or almost in real time, while transactions are running at high rates. In order to meet this requirement, special cases of serializability, sufficient conditions for serializability which can be enforced effectively, are utilized.

Two major types of serializability exist: ''view-serializability'' and ''conflict-serializability''. View-serializability matches the general definition of serializability given above. Conflict-serializability is a broad special case, i.e., any schedule that is conflict-serializable is also view-serializable, but not necessarily the opposite. Conflict-serializability is widely utilized because it is easier to determine and covers a substantial portion of the view-serializable schedules. Determining view-serializability of a schedule is an NP-complete problem (a class of problems with only difficult-to-compute, excessively time-consuming known solutions).

:View-serializability of a schedule is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that respective transactions in the two schedules read and write the same data values ("view" the same data values).

:Conflict-serializability of a schedule is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that both schedules have the same sets of respective chronologically ordered pairs of conflicting operations (same precedence relations of respective conflicting operations).

Operations upon data are ''read'' or ''write'' (a write: ''insert'', ''update'', or ''delete''). Two operations are ''conflicting'' if they are of different transactions, upon the same datum (data item), and at least one of them is ''write''. Each such pair of conflicting operations has a ''conflict type'': it is either a ''read–write'', ''write–read'', or ''write–write'' conflict. The transaction of the second operation in the pair is said to be ''in conflict'' with the transaction of the first operation. A more general definition of conflicting operations (also for complex operations, which may each consist of several "simple" read/write operations) requires that they are noncommutative (changing their order also changes their combined result). Each such operation needs to be atomic by itself (using proper system support) in order to be considered an operation for a commutativity check. For example, read–read operations are commutative (unlike read–write and the other possibilities) and thus read–read is not a conflict. Another, more complex example: the operations ''increment'' and ''decrement'' of a ''counter'' are both ''write'' operations (both modify the counter), but do not need to be considered conflicting (a write–write conflict type) since they are commutative (thus increment–decrement is not a conflict; this, for example, has already been supported in IBM's old IMS "fast path").

Only precedence (time order) in pairs of conflicting (non-commutative) operations is important when checking equivalence to a serial schedule, since different schedules consisting of the same transactions can be transformed from one to another by changing orders between different transactions' operations (different transactions' interleaving), and changing orders of commutative (non-conflicting) operations does not change an overall operation sequence result, i.e., a schedule outcome (the outcome is preserved through order change between non-conflicting operations, but typically not when conflicting operations change order). This means that if a schedule can be transformed to any serial schedule without changing orders of conflicting operations (but changing orders of non-conflicting ones, while preserving operation order inside each transaction), then the outcome of both schedules is the same, and the schedule is conflict-serializable by definition.

Conflicts are the reason for blocking transactions and delays (non-materialized conflicts), or for aborting transactions due to serializability violation prevention. Both possibilities reduce performance. Thus reducing the number of conflicts, e.g., by commutativity (when possible), is a way to increase performance. A transaction can issue/request a conflicting operation and be ''in conflict'' with another transaction while its conflicting operation is delayed and not executed (e.g., blocked by a lock). Only executed (''materialized'') conflicting operations are relevant to ''conflict serializability'' (see more below).
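As a concrete illustration of these definitions, the following sketch lists the chronologically ordered pairs of conflicting operations of a schedule, together with their conflict types. The encoding of a schedule as a list of (transaction, operation, data item) triples is an assumption made for this example, not a standard representation.

```python
def conflicting_pairs(schedule):
    """Return the chronologically ordered pairs of conflicting operations.

    Two operations conflict if they belong to different transactions,
    access the same data item, and at least one of them is a write.
    """
    pairs = []
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                pairs.append(((t1, op1, x1), (t2, op2, x2), op1 + "-" + op2))
    return pairs

# Example schedule: r1(x) w2(x) r2(y) w1(y)
schedule = [("T1", "r", "x"), ("T2", "w", "x"),
            ("T2", "r", "y"), ("T1", "w", "y")]
for first, second, kind in conflicting_pairs(schedule):
    print(first, "->", second, kind, "conflict")
# ('T1','r','x') -> ('T2','w','x') r-w conflict: T2 is in conflict with T1
# ('T2','r','y') -> ('T1','w','y') r-w conflict: T1 is in conflict with T2
```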


Enforcing conflict serializability


Testing conflict serializability

Schedule compliance with conflict serializability can be tested with the ''precedence graph'' (''serializability graph'', ''serialization graph'', ''conflict graph'') for committed transactions of the schedule. It is the directed graph representing precedence of transactions in the schedule, as reflected by precedence of conflicting operations in the transactions.

:In the precedence graph, transactions are nodes and precedence relations are directed edges. There exists an edge from a first transaction to a second transaction if the second transaction is ''in conflict'' with the first (see Conflict serializability above), and the conflict is materialized (i.e., if the requested conflicting operation is actually executed: in many cases a conflicting operation requested/issued by a transaction is delayed and even never executed, typically because of a lock on the operation's object held by another transaction, or when the transaction writes to a temporary private workspace and materializes, i.e., copies to the database itself, upon commit; as long as a requested/issued conflicting operation is not executed upon the database itself, the conflict is non-materialized; non-materialized conflicts are not represented by an edge in the precedence graph).

:Comment: In many textbooks only ''committed transactions'' are included in the precedence graph. Here all transactions are included for convenience in later discussions.

The following observation is a key characterization of conflict serializability:

:A schedule is ''conflict-serializable'' if and only if its precedence graph of ''committed transactions'' (when only ''committed'' transactions are considered) is ''acyclic''.

This means that a cycle consisting of committed transactions only is generated in the (general) precedence graph if and only if conflict-serializability is violated. Cycles of committed transactions can be prevented by aborting an ''undecided'' (neither committed nor aborted) transaction on each cycle in the precedence graph of all the transactions, which can otherwise turn into a cycle of committed transactions (and a committed transaction cannot be aborted). One transaction aborted per cycle is both required and sufficient in number to break and eliminate the cycle (more aborts are possible, and can happen under some mechanisms, but are unnecessary for serializability). The probability of cycle generation is typically low, but, nevertheless, such a situation is carefully handled, typically with a considerable amount of overhead, since correctness is involved. Transactions aborted due to serializability violation prevention are ''restarted'' and executed again immediately. Serializability-enforcing mechanisms typically do not maintain a precedence graph as a data structure, but rather prevent or break cycles implicitly (e.g., SS2PL below).
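This characterization can be turned into a small test. The sketch below reuses the hypothetical (transaction, operation, data item) schedule encoding from the earlier example: it adds an edge from Ti to Tj for every materialized conflict in which Tj's operation follows Ti's, and reports the schedule as conflict-serializable exactly when the resulting precedence graph is acyclic (checked here with a simple Kahn-style elimination).

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj for every pair of conflicting operations where Tj's follows Ti's."""
    edges = set()
    for i, (ti, opi, xi) in enumerate(schedule):
        for tj, opj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (opi, opj):
                edges.add((ti, tj))           # Tj is in conflict with Ti
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:                              # repeatedly remove nodes without predecessors
        n = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return removed == len(nodes)              # all nodes removed <=> graph is acyclic

# r1(x) w2(x) r2(y) w1(y): edges T1->T2 and T2->T1 form a cycle -> not serializable.
print(is_conflict_serializable(
    [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "r", "y"), ("T1", "w", "y")]))  # False
# r1(x) w1(x) r2(x): the only edge is T1->T2 -> conflict-serializable.
print(is_conflict_serializable(
    [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x")]))                    # True
```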


Common mechanism — SS2PL

''Strong strict two-phase locking'' (SS2PL) is a common mechanism utilized in database systems since their early days in the 1970s (the "SS" in the name SS2PL is newer, though) to enforce both conflict serializability and ''strictness'' (a special case of recoverability which allows effective database recovery from failure) of a schedule. Under this mechanism, each datum is locked by a transaction before accessing it (in any read or write operation): the item is marked by and associated with a ''lock'' of a certain type depending on the operation being performed (and the specific transaction implementation; various models with different lock types exist; in some models, locks may change type during the transaction's life). As a result, access by another transaction may be blocked, typically upon a conflict (the lock delays or completely prevents the conflict from being materialized and reflected in the precedence graph, by blocking the conflicting operation), depending on the lock type and the other transaction's access operation type. Employing an SS2PL mechanism means that all locks on data on behalf of a transaction are released only after the transaction has ended (either committed or aborted). SS2PL is the name of the resulting schedule property as well, which is also called ''rigorousness''. SS2PL is a special case (proper subset) of two-phase locking (2PL).

Mutual blocking between transactions results in a ''deadlock'', where execution of these transactions is stalled and no completion can be reached. Thus deadlocks need to be resolved to complete these transactions' execution and release related computing resources. A deadlock is a reflection of a potential cycle in the precedence graph that would occur without the blocking when conflicts are materialized. A deadlock is resolved by aborting a transaction involved with such a potential cycle and breaking the cycle. It is often detected using a ''wait-for graph'' (a graph of conflicts blocked by locks from being materialized; it can also be defined as the graph of non-materialized conflicts; conflicts not materialized are not reflected in the precedence graph and do not affect serializability), which indicates which transaction is "waiting for" the release of one or more locks by which other transaction or transactions; a cycle in this graph means a deadlock. Aborting one transaction per cycle is sufficient to break the cycle. Transactions aborted due to deadlock resolution are ''restarted'' and executed again immediately.
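The SS2PL discipline can be sketched as follows. The classes and the exception-based "blocking" below are simplifications invented for this illustration (a real lock manager supports shared/exclusive lock modes and makes the requester wait rather than fail); the essential points shown are that every access first acquires a lock and that all locks are released only when the transaction ends.

```python
# Toy SS2PL sketch (hypothetical classes, not a real DBMS API): lock before
# every access, hold all locks until the transaction ends.
class LockManager:
    def __init__(self):
        self.owners = {}                          # datum -> owning transaction (exclusive only)

    def acquire(self, tx, datum):
        owner = self.owners.get(datum)
        if owner is not None and owner is not tx:
            # A real system would block (wait) here; we raise to keep the sketch short.
            raise RuntimeError(f"{tx.name} blocks: '{datum}' is locked by {owner.name}")
        self.owners[datum] = tx

    def release_all(self, tx):
        for datum in [d for d, o in self.owners.items() if o is tx]:
            del self.owners[datum]

class Transaction:
    def __init__(self, name, lock_manager, db):
        self.name, self.lm, self.db = name, lock_manager, db

    def read(self, datum):
        self.lm.acquire(self, datum)              # lock before every access
        return self.db[datum]

    def write(self, datum, value):
        self.lm.acquire(self, datum)
        self.db[datum] = value

    def commit(self):
        self.lm.release_all(self)                 # SS2PL: release only at transaction end

db, lm = {"x": 0}, LockManager()
t1, t2 = Transaction("T1", lm, db), Transaction("T2", lm, db)
t1.write("x", t1.read("x") + 1)                   # T1 locks and updates x
try:
    t2.read("x")                                  # T2 "blocks": x is still locked by T1
except RuntimeError as err:
    print(err)
t1.commit()                                       # T1 ends; its locks are released
print(t2.read("x"))                               # now T2 can read x (prints 1)
```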


Other enforcing techniques

Other known mechanisms include:
* Precedence graph (or serializability graph, conflict graph) cycle elimination
* Two-phase locking (2PL)
* Timestamp ordering (TO; a small sketch follows below)
* Serializable snapshot isolation (SerializableSI; Michael J. Cahill, Uwe Röhm, Alan D. Fekete (2008), "Serializable isolation for snapshot databases", ''Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data'', pp. 729-738, Vancouver, Canada, June 2008; SIGMOD 2008 best paper award)

The above (conflict) serializability techniques in their general form do not provide recoverability. Special enhancements are needed for adding recoverability.
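As referenced in the list above, here is a hedged sketch of basic timestamp ordering; the scheduler class and its encoding are invented for this illustration. Each transaction carries a start timestamp, each datum remembers the largest read and write timestamps that have accessed it, and an operation that arrives "too late" causes its transaction to be aborted (and restarted with a new timestamp).

```python
# Basic timestamp ordering (TO), illustrative only: reject operations that
# would contradict the timestamp order of transactions.
class TOScheduler:
    def __init__(self):
        self.read_ts, self.write_ts = {}, {}      # datum -> largest timestamp seen

    def read(self, ts, datum):
        if ts < self.write_ts.get(datum, 0):
            return False                          # a younger tx already wrote -> abort reader
        self.read_ts[datum] = max(self.read_ts.get(datum, 0), ts)
        return True

    def write(self, ts, datum):
        if ts < self.read_ts.get(datum, 0) or ts < self.write_ts.get(datum, 0):
            return False                          # write arrives too late -> abort writer
        self.write_ts[datum] = ts
        return True

s = TOScheduler()
print(s.read(1, "x"))    # True:  T1 (ts=1) reads x
print(s.write(2, "x"))   # True:  T2 (ts=2) writes x
print(s.write(1, "x"))   # False: T1's write arrives after a younger write -> T1 aborts/restarts
```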


Optimistic versus pessimistic techniques

Concurrency control techniques are of three major types:
# ''Pessimistic'': In pessimistic concurrency control, a transaction blocks the data access operations of other transactions upon conflicts, and conflicts are ''non-materialized'' until blocking is removed. This is done to ensure that operations that may violate serializability (and in practice also recoverability) do not occur.
# ''Optimistic'': In optimistic concurrency control, the data access operations of other transactions are not blocked upon conflicts, and conflicts are immediately ''materialized''. When the transaction reaches the ''ready'' state, i.e., its ''running'' state has been completed, possible serializability (and in practice also recoverability) violation by the transaction's operations (relative to other running transactions) is checked: if a violation has occurred, the transaction is typically ''aborted'' (sometimes aborting ''another'' transaction to handle the serializability violation is preferred); otherwise, it is ''committed''. A minimal validation sketch follows below.
# ''Semi-optimistic'': Mechanisms that mix blocking in certain situations with not blocking in other situations, and employ both materialized and non-materialized conflicts.

The main differences between the technique types are the conflict types that are generated by them. A pessimistic method blocks a transaction operation upon conflict and generates a non-materialized conflict, while an optimistic method does not block and generates a materialized conflict. A semi-optimistic method generates both conflict types. Both conflict types are generated by the chronological orders in which transaction operations are invoked, independently of the type of conflict. A cycle of committed transactions (with materialized conflicts) in the ''precedence graph'' (conflict graph) represents a serializability violation, and should be avoided for maintaining serializability. A cycle of (non-materialized) conflicts in the ''wait-for graph'' represents a deadlock situation, which should be resolved by breaking the cycle. Both cycle types result from conflicts, and both should be broken. Under any technique type, conflicts should be detected and considered, with similar overhead for both materialized and non-materialized conflicts (typically by using mechanisms like locking, while either blocking for locks or not blocking but recording a conflict for materialized conflicts). In a blocking method, a context switch typically occurs upon conflict, with (additional) incurred overhead. Otherwise, blocked transactions' related computing resources remain idle, unutilized, which may be a worse alternative. When conflicts do not occur frequently, optimistic methods typically have an advantage. With different transaction loads (mixes of transaction types) one technique type (i.e., either optimistic or pessimistic) may provide better performance than the other. Unless schedule classes are ''inherently blocking'' (i.e., they cannot be implemented without data-access operation blocking; e.g., 2PL, SS2PL, and SCO), they can also be implemented using optimistic techniques (e.g., serializability, recoverability).
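As referenced in the optimistic item above, the following is a minimal validation-based sketch (the class and variable names are illustrative only; it is a simplified backward validation): transactions run without blocking while recording their read and write sets, and at commit time a transaction is aborted if some transaction that committed in the meantime wrote an item it has read.

```python
# Optimistic concurrency control sketch (illustrative, not a library API).
class OccTransaction:
    def __init__(self, db, committed_log):
        self.db = db
        self.log = committed_log                  # write sets of transactions committed so far
        self.start = len(committed_log)           # remember where we started
        self.reads, self.writes = set(), {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.db[key])

    def write(self, key, value):
        self.writes[key] = value                  # buffered; not yet materialized in db

    def commit(self):
        # Validation at the "ready" state: did any transaction that committed
        # after we started write something we read?
        for write_set in self.log[self.start:]:
            if write_set & self.reads:
                return False                      # abort; the caller would restart the transaction
        self.db.update(self.writes)               # install (materialize) our writes
        self.log.append(set(self.writes))
        return True

db, log = {"x": 1}, []
t1, t2 = OccTransaction(db, log), OccTransaction(db, log)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 10)
print(t1.commit())   # True:  nothing committed since T1 started
print(t2.commit())   # False: T2 read x before T1's committed write -> abort and restart
```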


Serializable multi-version concurrency control

: See also: multiversion concurrency control (partial coverage) and serializable snapshot isolation in snapshot isolation.

Multi-version concurrency control (MVCC) is a common way today to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations to use one of the several last relevant versions (of each object), depending on the scheduling method. MVCC can be combined with all the serializability techniques listed above (except SerializableSI, which is originally MVCC-based). It is utilized in most general-purpose DBMS products. MVCC is especially popular nowadays through the ''relaxed serializability'' (see above) method ''snapshot isolation'' (SI), which provides better performance than most known serializability mechanisms (at the cost of possible serializability violation in certain cases). SerializableSI, an efficient enhancement of SI to make it serializable, is intended to provide an efficient serializable solution. SerializableSI has been analyzed via a general theory of MVCC (Alan Fekete (2009), "Snapshot Isolation and Serializable Execution", presentation, page 4, The University of Sydney, Australia; retrieved 16 September 2009).
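A minimal sketch of the multi-versioning idea follows. The class and the single global counter (standing in for a real timestamping mechanism) are assumptions made for this illustration: each write appends a new version of an object, and a read returns the latest version visible to the reader's snapshot, so readers are never blocked by later writers.

```python
# Toy multi-version store (illustrative only).
import bisect

class VersionedStore:
    def __init__(self):
        self.versions = {}                        # key -> sorted list of (commit_ts, value)
        self.clock = 0

    def snapshot(self):
        return self.clock                         # a reader sees commits with ts <= this

    def read(self, key, snapshot_ts):
        history = self.versions.get(key, [])
        i = bisect.bisect_right(history, (snapshot_ts, float("inf")))
        return history[i - 1][1] if i else None   # latest version visible to the snapshot

    def write(self, key, value):
        self.clock += 1                           # commit timestamp of this write
        self.versions.setdefault(key, []).append((self.clock, value))

store = VersionedStore()
store.write("x", "v1")
snap = store.snapshot()                           # snapshot taken after v1
store.write("x", "v2")                            # a later write creates a new version
print(store.read("x", snap))                      # "v1": the old snapshot does not see v2
print(store.read("x", store.snapshot()))          # "v2": a fresh snapshot sees the new version
```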


Distributed serializability


Overview

Distributed serializability is the serializability of a schedule of a transactional distributed system (e.g., a distributed database system). Such a system is characterized by ''distributed transactions'' (also called ''global transactions''), i.e., transactions that span computer processes (a process abstraction in a general sense, depending on the computing environment; e.g., an operating system's thread) and possibly network nodes. A distributed transaction comprises more than one ''local sub-transaction'', each of which has states as described above for a database transaction. A local sub-transaction comprises a single process, or more processes that typically fail together (e.g., in a single processor core). Distributed transactions imply a need for an atomic commit protocol to reach consensus among their local sub-transactions on whether to commit or abort. Such protocols can vary from a simple (one-phase) handshake among processes that fail together to more sophisticated protocols, like two-phase commit, to handle more complicated cases of failure (e.g., process, node, or communication failure); a bare-bones two-phase commit sketch follows at the end of this section. Distributed serializability is a major goal of distributed concurrency control for correctness.

With the proliferation of the Internet, cloud computing, grid computing, and small, portable, powerful computing devices (e.g., smartphones), the need for effective distributed serializability techniques to ensure correctness in and among distributed applications seems to increase. Distributed serializability is achieved by implementing distributed versions of the known centralized techniques. Typically, all such distributed versions require utilizing conflict information (of either materialized or non-materialized conflicts, or, equivalently, transaction precedence or blocking information; conflict serializability is usually utilized) that is not generated locally, but rather in different processes and remote locations. Thus information distribution is needed (e.g., precedence relations, lock information, timestamps, or tickets). When the distributed system is of a relatively small scale and message delays across the system are small, the centralized concurrency control methods can be used unchanged, while certain processes or nodes in the system manage the related algorithms. However, in a large-scale system (e.g., ''grid'' and ''cloud''), due to the distribution of such information, a substantial performance penalty is typically incurred, even when distributed versions of the methods (vs. the centralized ones) are used, primarily due to computer and communication latency. Also, when such information is distributed, related techniques typically do not scale well. A well-known example with respect to scalability problems is a distributed lock manager, which distributes lock (non-materialized conflict) information across the distributed system to implement locking techniques.
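A bare-bones sketch of the two-phase commit idea mentioned above (the classes are invented for this illustration; a real protocol also involves force-logging, timeouts, and recovery, which are omitted here): the coordinator collects votes in a prepare phase and then broadcasts either a unanimous commit decision or an abort.

```python
# Two-phase commit sketch (illustrative only).
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit

    def prepare(self):
        # Phase 1: vote. A participant voting "yes" must remain able to commit
        # later (a real participant would force-log its prepared state here).
        return self.can_commit

    def commit(self):
        print(self.name, "commits")

    def abort(self):
        print(self.name, "aborts")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: prepare / collect votes
    decision = all(votes)                         # unanimous "yes" => commit
    for p in participants:                        # phase 2: broadcast the decision
        p.commit() if decision else p.abort()
    return decision

print(two_phase_commit([Participant("node-1"), Participant("node-2")]))                      # True
print(two_phase_commit([Participant("node-1"), Participant("node-2", can_commit=False)]))    # False
```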


See also

* Strong strict two-phase locking (SS2PL or rigorousness)
* Making snapshot isolation serializable, in snapshot isolation
* Global serializability, where the ''global serializability problem'' and its proposed solutions are described
* Linearizability, a more general concept in concurrent computing


References

* Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987). ''Concurrency Control and Recovery in Database Systems'' (free PDF download). Addison Wesley Publishing Company.
* Gerhard Weikum, Gottfried Vossen (2001). ''Transactional Information Systems''. Elsevier. ISBN 1-55860-508-8.