Multi-master Replication
   HOME

TheInfoList



OR:

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members. Multi-master replication can be contrasted with primary-replica replication, in which a single member of the group is designated as the "master" for a given piece of data and is the only node allowed to modify that data item. Other members wishing to modify the data item must first contact the master node. Allowing only a single master makes it easier to achieve consistency among the members of the group, but is less flexible than multi-master replication. Multi-master replication can also be contrasted with failover clustering where passive replica servers are replicating the master data in order to prepare for takeover in the event that the master stops functioning. The master is the only server active for client interaction. Often, communication and replication in Multi-master systems are handled via a type of
Consensus algorithm A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating processes to reach consensus, or agree on some data va ...
, but can also be implemented via custom or proprietary algorithms specific to the software. The primary purposes of multi-master replication are increased availability and faster server response time.


Advantages

* Availability: If one master fails, other masters continue to update the
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
. * Distributed Access: Masters can be located in several physical sites, i.e. distributed across the network.


Disadvantages

* Consistency: Most multi-master replication systems are only loosely consistent, i.e. lazy and asynchronous, violating
ACID In computer science, ACID ( atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequ ...
properties. * Performance: Eager replication systems are complex and increase communication latency. * Integrity: Issues such as conflict resolution can become intractable as the number of nodes involved rises and latency increases.


Implementations


Directory services

Many directory servers are based on
Lightweight Directory Access Protocol The Lightweight Directory Access Protocol (LDAP ) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory servi ...
(LDAP) and implement multi-master replication.


Active Directory

One of the more prevalent multi-master replication implementations in directory servers is
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
's
Active Directory Active Directory (AD) is a directory service developed by Microsoft for Windows domain networks. It is included in most Windows Server operating systems as a set of processes and services. Initially, Active Directory was used only for centralize ...
. Within Active Directory, objects that are updated on one
Domain Controller A domain controller (DC) is a server computer that responds to security authentication requests within a computer network domain. It is a network server that is responsible for allowing host access to domain resources. It authenticates users, store ...
are then replicated to other domain controllers through multi-master replication. It is not required for all domain controllers to replicate with each other as this would cause excessive network traffic in large Active Directory deployments. Instead, domain controllers have a complex update pattern that ensures that all servers are updated in a timely fashion without excessive replication traffic. Some Active Directory needs are however better served by
Flexible single master operation Flexible Single Master Operations (FSMO, F is sometimes "floating"; pronounced Fiz-mo), or just single master operation or operations master, is a feature of Microsoft's Active Directory (AD). As of 2005, the term FSMO has been deprecated in favour ...
.


CA Directory

CA Directory supports multi-master replication.


OpenDS/OpenDJ

OpenDS OpenDJ is a directory server which implements a wide range of Lightweight Directory Access Protocol and related standards, including full compliance with LDAPv3 but also support for Directory Service Markup Language (DSMLv2). Written in Java (prog ...
(and its successor product
OpenDJ OpenDJ is a directory server which implements a wide range of Lightweight Directory Access Protocol and related standards, including full compliance with LDAPv3 but also support for Directory Service Markup Language (DSMLv2). Written in Java, Open ...
) implemented multi-master since version 1.0. The OpenDS/OpenDJ multi-master replication is asynchronous, it uses a log with a publish-subscribe mechanism that allows scaling to a large number of nodes. OpenDS/OpenDJ replication does conflict resolution at the entry and attribute level. OpenDS/OpenDJ replication can be used over a
wide area network A wide area network (WAN) is a telecommunications network that extends over a large geographic area. Wide area networks are often established with leased telecommunication circuits. Businesses, as well as schools and government entities, us ...
.


OpenLDAP

OpenLDAP OpenLDAP is a free, open-source implementation of the Lightweight Directory Access Protocol (LDAP) developed by the OpenLDAP Project. It is released under its own BSD-style license called the OpenLDAP Public License. LDAP is a platform-independe ...
, the widely used
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
LDAP server, implements multi-master replication since version 2.4 (October 2007


Database Management Systems


Amazon Aurora

Amazon Aurora Amazon Aurora is a relational database service developed and offered by Amazon Web Services beginning in October 2014. Aurora is available as part of the Amazon Relational Database Service (RDS). History Aurora offered MySQL compatible servic ...
is composed of writer nodes, which replicate redo records, and 6 storage nodes. The writer node sends change to each storage node, each of which checks for conflicts then reports confirmation or rejection of the change.


Apache CouchDB

Apache CouchDB uses a simple, HTTP-based multi-master replication system built from its use of an append-only data-store and use of Multiversion Concurrency Control (MVCC). Each document contains a revision ID, so every record stores the evolutionary timeline of all previous revision IDs leading up to itself—which provides the foundation of CouchDB's MVCC system. Additionally, it keeps a by-sequence index for the entire database. "The replication process only copies the last revision of a document, so all previous revisions that were only on the source database are not copied to the destination database." The CouchDB replicator acts as a simple HTTP client acting on both a and database. It compares current sequence IDs for the database, calculates revision differences, and makes the necessary changes to the based on what it found in the history of the database. Bi-directional replication is the result of merely doing another replication with the and values swapped.


ArangoDB

ArangoDB ArangoDB is a free and open-source native graph database system developed by ArangoDB Inc. ArangoDB is a multi-model database system since it supports three data models (graphs, JSON documents, key/value) with one database core and a unified q ...
is a native multi-model database system using multi-master replication. Clusters in ArangoDB use the CP master/master model with no single point of failure. When a cluster encounters a network partition, ArangoDB prefers to maintain its internal consistency over availability. Clients experience the same view of the database regardless of which node they connect to. And, the cluster continues to serve requests even when one machine fails.


Cloudant

Cloudant Cloudant is an IBM software product, which is primarily delivered as a cloud-based service. Cloudant is a non-relational, distributed database service of the same name. Cloudant is based on the Apache-backed CouchDB project and the open source B ...
, a distributed database system, uses largely the same HTTP API as Apache CouchDB, and exposes the same ability to replicate using Multiversion Concurrency Control (MVCC). Cloudant databases can replicate between each other, but internally, nodes within Cloudant clusters use multi-master replication to stay in sync with each other and provide high availability to API consumers.


eXtremeDB Cluster

eXtremeDB Cluster is the clustering sub-system for McObject's
eXtremeDB eXtremeDB is a high performance, low-latency, ACID-compliant embedded database management system using an in-memory database system (IMDS) architecture and designed to be linked into C/ C++ based programs. It works on Windows, Linux, and other ...
embedded database product family. It maintains database consistency across multiple hardware nodes by replicating transactions in a synchronous manner (two-phase commit). An important characteristic of eXtremeDB Cluster is transaction replication, in contrast to log file-based, SQL statement-based, or other replication schemes that may or may not guarantee the success or failure of entire transactions. Accordingly, eXtremeDB Cluster is an
ACID In computer science, ACID ( atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequ ...
compliant system (not BASE or
eventual consistency Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last upd ...
); a query executed on any cluster node will return the same result as if executed on any other cluster node.


Oracle

Database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
clusters implement multi-master replication using one of two methods.
Asynchronous Asynchrony is the state of not being in synchronization. Asynchrony or asynchronous may refer to: Electronics and computing * Asynchrony (computer programming), the occurrence of events independent of the main program flow, and ways to deal with ...
multi-master replication commits data changes to a
deferred transaction queue Defer may refer to: * Defer Elementary School, a Michigan State Historic Site * Deference, the acknowledgement of the legitimacy of the power of one's superior or superiors * Deferral, the delaying of the realization of an asset or liability until ...
which is periodically processed on all databases in the cluster.
Synchronous Synchronization is the coordination of events to operate a system in unison. For example, the conductor of an orchestra keeps the orchestra synchronized or ''in time''. Systems that operate with all parts in synchrony are said to be synchronou ...
multi-master replication uses Oracle's two-phase commit functionality to ensure that all databases with the cluster have a consistent
dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
.


Microsoft SQL

Microsoft SQL provides multi-master replication through peer-to-peer replication. It provides a scale-out and high-availability solution by maintaining copies of data across multiple nodes. Built on the foundation of transactional replication, peer-to-peer replication propagates transactionally consistent changes in near real-time.


MySQL / MariaDB

At a basic level, it is possible to achieve a multi-master replication scheme beginning with MySQL version 3.23 with circular replication. Departing from that,
MariaDB MariaDB is a community-developed, commercially supported fork of the MySQL relational database management system (RDBMS), intended to remain free and open-source software under the GNU General Public License. Development is led by some of the ori ...
and
MySQL MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database o ...
ship with some replication support, each of them with different nuances. In terms of direct support we have: MariaDB: natively supports multi-master replication since version 10.0, but conflict resolution is not supported, so each master must contain different databases. On MySQL, this is named multi-source available since versio
5.7.6
MySQL: MySQL Group Replication, a plugin for virtual synchronous multi-master with conflict handling and distributed recovery was released wit

Cluster Projects:
MySQL Cluster MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear sca ...
supports conflict detection and resolution between multiple masters since version 6.3 for true multi-master capability for the MySQL Server. There is also an external project, Galera Cluster created b
codership
, that provides true multi-master capability, based on a fork of the InnoDB storage engine and custom replication plug-ins. Replication is synchronous, so no conflict is possible.
Percona Percona is an American company based in Durham, North Carolina and the developer of a number of open source software projects for MySQL, MariaDB, PostgreSQL, MongoDB and RocksDB users. The company’s revenue of around $25 million a year is de ...
XtraDB Cluster also is a combination of Galera replication library and MySQL supporting multi-master.


PostgreSQL

Various options for synchronous multi-master replication exist.
Postgres-XL Postgres-XL is a distributed relational database management system (RDBMS) software based on PostgreSQL. It aims to provide feature parity with PostgreSQL while distributing the workload over a cluster. The name "Postgres-XL" stands for "eXtensibl ...
which is available under the Mozilla Public License, an
PostgresXC
(now known a
Postgres-X2
which is available under the same license as PostgreSQL itself are examples. Note that th
PgCluster
() project was abandoned in 2007. The replication documentation for
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
categorises the different types of replication available. Various options exist for distributed multi-master, includin
Bucardo

rubyrep
and BD
Bi-Directional Replication


=PostgreSQL BDR

= BDR is aimed at eventual inclusion in PostgreSQL core and has been benchmarked as demonstrating significantly enhanced performanceBDR Performance
Petr Jelinek, 2ndQuadrant. Retrieved 2014-07-10 over earlier options. BDR includes replication of data writes (DML), as well as changes to data definition (DDL) and global sequences. BDR nodes may be upgraded online from version 0.9 onwards. 2ndQuadrant has developed BDR continuously since 2012, with the system used in production since 2014. The latest version BDR 3.6 provides column-level conflict detection, CRDTs, eager replication, multi-node query consistency, and many other features.


Ingres

Within
Ingres Jean-Auguste-Dominique Ingres ( , ; 29 August 1780 – 14 January 1867) was a French Neoclassicism, Neoclassical Painting, painter. Ingres was profoundly influenced by past artistic traditions and aspired to become the guardian of academic ...
Replicator, objects that are updated on one Ingres server can then be replicated to other servers whether local or remote through multi-master replication. If one server fails, client connections can be re-directed to another server. It is not required for all Ingres servers in an environment to replicate with each other as this could cause excessive network traffic in large implementations. Instead, Ingres Replicator allows the appropriate data to be replicated to the appropriate servers without excessive replication traffic. This means that some servers in the environment can serve as failover candidates while other servers can meet other requirements such as managing a subset of columns or tables for a departmental solution, a subset of rows for a geographical region or one-way replication for a reporting server. In the event of a source, target, or network failure, data integrity is enforced through this
two-phase commit protocol In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed a ...
by ensuring that either the whole transaction is replicated, or none of it is. In addition, Ingres Replicator can operate over RDBMS’s from multiple vendors to connect them.


See also

*
Flexible single master operation Flexible Single Master Operations (FSMO, F is sometimes "floating"; pronounced Fiz-mo), or just single master operation or operations master, is a feature of Microsoft's Active Directory (AD). As of 2005, the term FSMO has been deprecated in favour ...
*
Active Directory Active Directory (AD) is a directory service developed by Microsoft for Windows domain networks. It is included in most Windows Server operating systems as a set of processes and services. Initially, Active Directory was used only for centralize ...
*
Distributed database management system In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...
*
DNS zone transfer DNS zone transfer, also sometimes known by the inducing DNS query type AXFR, is a type of DNS transaction. It is one of the many mechanisms available for administrators to replicate DNS databases across a set of DNS servers. A zone transfer us ...
*
Optimistic Replication Optimistic replication, also known as lazy replication, is a strategy for replication, in which replicas are allowed to diverge. Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identi ...


References

{{reflist


External links


Active Directory Replication Model

Terms and Definitions for Database Replication

SymmetricDS
is database independent,
data synchronization Data synchronization is the process of establishing consistency between source and target data stores, and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization and m ...
software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage. It supports
MySQL MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database o ...
,
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
, SQL Server,
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
,
IBM Db2 Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON a ...
,
Firebird Firebird and fire bird may refer to: Mythical birds * Phoenix (mythology), sacred firebird found in the mythologies of many cultures * Bennu, Egyptian firebird * Huma bird, Persian firebird * Firebird (Slavic folklore) Bird species ''Various sp ...
,
Interbase InterBase is a relational database management system (RDBMS) currently developed and marketed by Embarcadero Technologies. InterBase is distinguished from other RDBMSs by its small footprint, close to zero administration requirements, and multi-g ...
,
HSQLDB HSQLDB (''Hyper SQL Database'') is a relational database management system written in Java. It has a JDBC driver and supports a large subset of SQL-92, SQL:2008, SQL:2011, and SQL:2016 standards. It offers a fast, small (around 1300 kilobytes ...
, H2,
Apache Derby Apache Derby (previously distributed as IBM Cloudscape) is a relational database management system (RDBMS) developed by the Apache Software Foundation that can be embedded in Java programs and used for online transaction processing. It has a 3.5 ...
,
Informix IBM Informix is a product family within IBM's Information Management division that is centered on several relational database management system (RDBMS) offerings. The Informix products were originally developed by Informix Corporation, whose I ...
,
Greenplum Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired ...
,
SQLite SQLite (, ) is a database engine written in the C programming language. It is not a standalone app; rather, it is a library that software developers embed in their apps. As such, it belongs to the family of embedded databases. It is the most ...
, Sybase ASE, and Sybase ASA. Licensed under both open source (
GPL The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general u ...
) and commercial licenses.
Daffodil Replicator
is a
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
tool for
data synchronization Data synchronization is the process of establishing consistency between source and target data stores, and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization and m ...
,
data migration Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. Additionally, the validation of migrated data for completeness and the decommis ...
, and data backup between various database servers. Daffodil Replicator works over standard
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. I ...
driver and supports replication across heterogeneous databases. At present, it supports following databases:
Microsoft SQL Server Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which ma ...
,
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
, Daffodil database,
IBM Db2 Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON a ...
,
Apache Derby Apache Derby (previously distributed as IBM Cloudscape) is a relational database management system (RDBMS) developed by the Apache Software Foundation that can be embedded in Java programs and used for online transaction processing. It has a 3.5 ...
,
MySQL MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database o ...
, and
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
. Daffodil Replicator is available in both enterprise (commercial) and
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
(
GPL The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general u ...
-licensed) versions.
DMOZ Open Directory Project - Database Replication Page
Identity management Distributed computing architecture