MySQL Cluster
   HOME

TheInfoList



OR:

MySQL Cluster , also known as MySQL Ndb Cluster is a technology providing shared-nothing clustering and auto-sharding for the
MySQL MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
database management system In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and an ...
. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the
NDB NDB may refer to: Finance * National development bank, set up by a country's government to improve that country's economy * New Development Bank, a development bank jointly operated by the BRICS nations * NDB Bank, Sri Lankan commercial bank Pol ...
or NDBCLUSTER storage engine for MySQL ("NDB" stands for Network Database).


Architecture

MySQL Cluster is designed around a distributed, multi-master
ACID An acid is a molecule or ion capable of either donating a proton (i.e. Hydron, hydrogen cation, H+), known as a Brønsted–Lowry acid–base theory, Brønsted–Lowry acid, or forming a covalent bond with an electron pair, known as a Lewis ...
compliant architecture with no
single point of failure A single point of failure (SPOF) is a part of a system that would Cascading failure, stop the entire system from working if it were to fail. The term single point of failure implies that there is not a backup or redundant option that would enab ...
. MySQL Cluster uses automatic
sharding A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard may be held on a separate database server instance, to spread load. Some data in a database remains present in all shards, but so ...
(partitioning) to scale out read and write operations on commodity hardware and can be accessed via SQL and Non-SQL (NoSQL)
API An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
s.


Replication

Internally MySQL Cluster uses synchronous replication through a
two-phase commit Two-phase may refer to: * Two-phase electric power * Two-phase commit protocol * Two-phase flow * Two-phase locking * Binary phase, chemical compounds composed of two elements {{Disambig ...
mechanism in order to guarantee that data is written to multiple nodes upon committing the data. Two copies (known as ''replicas'') of the data are required to guarantee availability. MySQL Cluster automatically creates “node groups” from the number of replicas and data nodes specified by the user. Updates are synchronously replicated between members of the node group to protect against data loss and support fast failover between nodes. Cluster replication differs from "MySQL Replication", which is instead . It is also possible to replicate asynchronously between clusters; this is sometimes referred to as "MySQL Cluster Replication" or "geographical replication". This is typically used to replicate clusters between data centers for
IT disaster recovery IT disaster recovery (also, simply disaster recovery (DR)) is the process of maintaining or reestablishing vital infrastructure and systems following a natural or human-induced disaster, such as a storm or battle. DR employs policies, tools, an ...
or to reduce the effects of network latency by locating data physically closer to a set of users. Unlike standard MySQL replication, MySQL Cluster's geographic replication uses
optimistic concurrency control Optimistic concurrency control (OCC), also known as optimistic locking, is a non-locking concurrency control method applied to transactional systems such as relational database management systems and software transactional memory. OCC assumes that ...
and the concept of Epochs to provide a mechanism for conflict detection and resolution, enabling active/active clustering between data centers. Starting with MySQL Cluster 7.2, support for synchronous replication between data centers was supported with the Multi-Site Clustering feature.


Horizontal data partitioning (auto-sharding)

MySQL Cluster is implemented as a fully distributed multi-master database ensuring updates made by any application or SQL node are instantly available to all of the other nodes accessing the cluster, and each data node can accept write operations. Data within MySQL Cluster (NDB) tables is automatically partitioned across all of the data nodes in the system. This is done based on a hashing algorithm based on the
primary key In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combinati ...
on the
table Table may refer to: * Table (database), how the table data arrangement is used within the databases * Table (furniture), a piece of furniture with a flat surface and one or more legs * Table (information), a data arrangement with rows and column ...
, and is transparent to the end application. Clients can connect to any node in the cluster and have queries automatically access the correct shards needed to satisfy a query or commit a transaction. MySQL Cluster is able to support cross-shard queries and transactions. Users can define their own partitioning schemes. This allows developers to add “distribution awareness” to applications by partitioning based on a sub-key that is common to all rows being accessed by high running transactions. This ensures that data used to complete transactions is localized on the same shard, thereby reducing network hops.


Hybrid storage

MySQL Cluster allows datasets larger than the capacity of a single machine to be stored and accessed across multiple machines. MySQL Cluster maintains all indexed columns in distributed memory. Non-indexed columns can also be maintained in distributed memory or can be maintained on disk with an in-memory
page cache In computing, a page cache, sometimes also called disk cache, is a transparent cache for the pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page ca ...
. Storing non-indexed columns on disk allows MySQL Cluster to store datasets larger than the aggregate memory of the clustered machines. MySQL Cluster writes
Redo Undo is an interaction technique which is implemented in many computer programs. It erases the last change done to the document, reverting it to an older state. In some more advanced programs, such as Graphics software, graphic processing, undo w ...
logs to disk for all data changes as well as check pointing data to disk regularly. This allows the cluster to consistently recover from disk after a full cluster outage. As the Redo logs are written
asynchronous Asynchrony is any dynamic far from synchronization. If and as parts of an asynchronous system become more synchronized, those parts or even the whole system can be said to be in sync. Asynchrony or asynchronous may refer to: Electronics and com ...
ly with respect to transaction commit, some small number of transactions can be lost if the full cluster fails, however this can be mitigated by using geographic replication or multi-site cluster discussed above. The current default asynchronous write delay is 2 seconds, and is configurable. Normal single point of failure scenarios do not result in any data loss due to the synchronous data replication within the cluster. When a MySQL Cluster table is maintained in memory, the cluster will only access disk storage to write Redo records and checkpoints. As these writes are sequential and limited random access patterns are involved, MySQL Cluster can achieve higher write throughput rates with limited disk hardware compared to a traditional disk-based caching RDBMS. This checkpointing to disk of in-memory table data can be disabled (on a per-table basis) if disk-based persistence isn't needed.


Shared nothing

MySQL Cluster is designed to have no
single point of failure A single point of failure (SPOF) is a part of a system that would Cascading failure, stop the entire system from working if it were to fail. The term single point of failure implies that there is not a backup or redundant option that would enab ...
. Provided that the cluster is set up correctly, any single node, system, or piece of hardware can fail without the entire cluster failing. Shared disk ( SAN) is not required. The interconnects between nodes can be standard
Ethernet Ethernet ( ) is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 198 ...
,
Gigabit Ethernet In computer networking, Gigabit Ethernet (GbE or 1 GigE) is the term applied to transmitting Ethernet frames at a rate of a gigabit per second. The most popular variant, 1000BASE-T, is defined by the IEEE 802.3ab standard. It came into use in ...
,
InfiniBand InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used ...
, or SCI interconnects.


SQL and NoSQL APIs

As MySQL Cluster stores tables in data nodes, rather than in the MySQL Server, there are multiple interfaces available to access the database: * SQL access via the MySQL Server * NoSQL APIs where MySQL Cluster libraries can be embedded into an application to provide direct access to the data nodes without passing through a SQL layer. These include: *
Memcached
*
Node.js / JavaScript
*

*
HTTP / REST
*


MySQL Cluster Manager

Part of the commercial MySQL Cluster CGE, MySQL Cluster Manager is a tool designed to simplify the creation and administration of the MySQL Cluster CGE database by automating common management tasks, including on-line scaling, upgrades, backup/restore and reconfiguration. MySQL Cluster Manager also monitors and automatically recovers MySQL Server application nodes and management nodes, as well as the MySQL Cluster data nodes.


MySQL Ndb Operator

The open sourc
MySQL Ndb Operator
simplifies the deployment and operation of MySQL Cluster on a
Kubernetes Kubernetes (), also known as K8s is an open-source software, open-source OS-level virtualization, container orchestration (computing), orchestration system for automating software deployment, scaling, and management. Originally designed by Googl ...
cluster. Ndb Operator deploys containerized MySQL Cluster Data, Management and SQL nodes in a number of StatefulSets with data stored in Persistent Volumes. Kubernetes mechanisms extend the high availability features of MySQL Cluster, for example automatically restoring HA redundancy after hardware failures by migrating pods to new hardware. Operating MySQL Cluster on Kubernetes allows a full stack of cloud native software to be operated in the same way on private or public clouds.


NDB Cluster

NDB Cluster is the distributed
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
system underlying MySQL Cluster. It can be used independently of a
MySQL MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
Server with users accessing the Cluster via the NDB API (C++). "NDB" stands for Network Database. From the MySQL Server perspective the NDB Cluster is a
Storage engine A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application ...
for storing tables of rows. From the NDB Cluster perspective, a MySQL Server instance is an
API An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
process connected to the Cluster. NDB Cluster can concurrently support access from other types of API processes including Memcached, JavaScript/Node.JS, Java, JPA and HTTP/REST. All API processes can operate on the same tables and data stored in the NDB Cluster. MySQL Cluster uses the
MySQL MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
Server to provide the following capabilities on top of Ndb Cluster: *
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
parsing / optimising / execution capability ** Connectors to applications via
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the Java (programming language), Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java ...
,
ODBC In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An ...
etc. * Cross-table
join Join may refer to: * Join (law), to include additional counts or additional defendants on an indictment *In mathematics: ** Join (mathematics), a least upper bound of sets orders in lattice theory ** Join (topology), an operation combining two topo ...
mechanism * User authentication and authorisation * Asynchronous data replication to other systems All API processes including the MySQL Server use the NDBAPI
The MySQL Cluster API Developer Guide
C++ client library to connect to the NDB Cluster and perform operations.


Implementation

MySQL Cluster uses three different types of nodes (processes) : * Data node (ndbd/ndbmtd process): These nodes store the data. Tables are automatically sharded across the data nodes which also transparently handle load balancing, replication, failover and self-healing. * Management node (ndb_mgmd process): Used for configuration and monitoring of the cluster. They are required only to start or restart a cluster node. They can also be configured as arbitrators, but this is not mandatory (MySQL Servers can be configured as arbitrators instead).Jon Stephens, Mike Kruckenberg, Roland Bouman (2007)
"MySQL 5.1 Cluster DBA Certification Study Guide"
pp. 86
* Application node or SQL node (mysqld process): A MySQL server (mysqld) that connects to all of the data nodes in order to perform data storage and retrieval. This node type is optional; it is possible to query data nodes directly via the NDB API, either natively using the C++ API or one of the additional NoSQL APIs described above. Generally, it is expected that each node will run on a separate physical host, VM or cloud instance (although it is very common to co-locate Management Nodes with MySQL Servers). For best practice, it is recommended not to co-locate nodes within the same node group on a single physical host (as that would represent a single point of failure).


Versions

From the 8.0 release onwards, MySQL Cluster is based directly on the corresponding releases of the MySQL Server. Previously, MySQL Cluster version numbers were decoupled from those of MySQL Server - for example MySQL Cluster 7.6 was based on/contained the server component from MySQL 5.7. Higher versions of MySQL Cluster include all of the features of lower versions, plus some new features. Currently available versions: * MySQL Cluster 9.X Innovation Release series * MySQL Cluster 8.4 LTS based on MySQL 8.4 LTS release :Add support for TLS on cluster internal connections * MySQL Cluster 8.0 based on MySQL 8.0 :Increase in max row size to 30kB, Support for up to 144 data nodes, Improved distributed filtering and joining, Support for parallel outer joins and semi joins, Improved schema and ACL handling, Online column rename, Simplified configuration, Multithreaded parallel backup and restore, Disk data performance improvements, Enhanced support for 3 and 4 replica configurations, Support for multi socket mesh networking, Support for restore transformations, Improved Blob write performance, Backup encryption, Support for IPv6, Threading autoconfiguration, Improved recovery performance, Improved query multithreading. * MySQL Cluster 7.6 based on MySQL 5.7 :Improved restart and recovery times, reduced disk space usage, improved join performance, new import tool, shared memory communication, improved topology awareness for cloud. Older versions (no longer in development): * MySQL Cluster 8.1 based on MySQL 8.1 Innovation release * MySQL Cluster 8.2 based on MySQL 8.2 Innovation release * MySQL Cluster 8.3 based on MySQL 8.3 Innovation release * MySQL Cluster 7.5 based on MySQL 5.7 :Includes support for bigger datasets(more than 128TB per node), improved read scalability through read optimized tables, improved SQL support. * MySQL Cluster 7.4 based on MySQL 5.6 :Includes enhanced conflict detection and resolution, improved node restart times, new Event API. * MySQL Cluster 7.3 based on MySQL 5.6 :Includes support for
foreign key A foreign key is a set of attributes in a table that refers to the primary key of another table, linking these two tables. In the context of relational databases, a foreign key is subject to an inclusion dependency constraint that the tuples ...
constraints, Node.js /
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
API and an auto-installer. * MySQL Cluster 7.2 based on MySQL 5.5 :Includes Adaptive Query Localization (pushes JOIN operations down to the data nodes), Memcached API, simplified Active/Active Geographic replication, multi-site clustering, data node scalability enhancements, consolidated user privileges. * MySQL Cluster 7.1 based on MySQL 5.1.D :Includes ClusterJ and ClusterJPA connectors * MySQL Cluster 7.0 based on MySQL 5.1.C :Includes multi-threaded data nodes (ndbmtd), Transactional DDL, Windows support. * MySQL Cluster 6.3 based on MySQL 5.1.B :Includes compressed backup + LCP, circular replication support, conflict detection/resolution, table optimization etc. * MySQL Cluster 6.2 based on MySQL 5.1.A :First 'telco' or 'carrier grade edition' release. Supports 255 nodes, online table alter, replication latency and throughput enhancements etc. * Ndb included in MySQL 5.1.X source tree


Requirements

For evaluation purposes, it is possible to run MySQL Cluster on a single physical server. For production deployments, the minimum system requirements are for 3 x instances / hosts: * 2 × Data Nodes * 1 × Application / Management Node or * 2 × Data Node + Application * 1 × Management Node Configurations as follows: * OS:
Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
,
Solaris Solaris is the Latin word for sun. It may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Sol ...
,
Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
.
macOS macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...
(for development only) * CPU: Intel/AMD x86/x86-64, UltraSPARC * Memory: 1GB * HDD: 3GB * Network: 1+ nodes (Standard Ethernet - TCP/IP) Tips and recommendations on deploying highly performant, production grade clusters can be found in th
MySQL Cluster Evaluation Guide
and th
Guide to Optimizing Performance of the MySQL Cluster Database


History

MySQL AB MySQL AB was a Swedish software company founded in 1995. It was acquired by Sun Microsystems in 2008, Sun was in turn acquired by Oracle Corporation in 2010. MySQL AB is the creator of MySQL, a relational database management system, as well a ...
acquired the technology behind MySQL Cluster from Alzato, a small venture company started by
Ericsson (), commonly known as Ericsson (), is a Swedish multinational networking and telecommunications company headquartered in Stockholm, Sweden. Ericsson has been a major contributor to the development of the telecommunications industry and is one ...
.
NDB NDB may refer to: Finance * National development bank, set up by a country's government to improve that country's economy * New Development Bank, a development bank jointly operated by the BRICS nations * NDB Bank, Sri Lankan commercial bank Pol ...
was originally designed for the telecom market, with its
high availability High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. There is now more dependence on these systems as a result of modernization ...
and high performance requirements. MySQL Cluster based on the NDB storage engine has since been integrated into the
MySQL MySQL () is an Open-source software, open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A rel ...
product, with its first release being in MySQL 4.1.


Books

MySQL Cluster 7.5 inside and out. Book written by Mikael Ronström, the founder of the NDB technology. Pro MySQL NDB Cluster. Book written by Jesper Wisborg Krogh and Mikiya Okuno, support engineers of MySQL.


Support

MySQL Cluster is licensed under the GPLv2 license. Commercial support is available as part of MySQL Cluster CGE, which also includes non-open source addons such as MySQL Cluster Manager, MySQL Enterprise Monitor, in addition to MySQL Enterprise Security and MySQL Enterprise Audit.


See also

* Galera Cluster, a generic synchronous multi-master replication library for transactional databases, for MySQL and MariaDB

*
Percona Percona is an American company based in Durham, North Carolina and the developer of a number of open source software projects for MySQL, MariaDB, PostgreSQL, MongoDB and RocksDB users. The company’s revenue of around $25 million a year is der ...
XtraDB Cluster, also is a combination of Galera replication library and MySQL supporting multi master. * RonDB A fork of MySQL Cluster maintained by Hopsworks.


References


External links


MySQL


Official MySQL Cluster DocumentationMySQL Cluster DemonstrationMySQL Cluster DatasheetMySQL Cluster Auto-Installer TutorialMySQL Cluster Whitepapers and GuidesMySQL Cluster Community Forum


Other


Design and Modelling of a Parallel Data Server for Telecom Applications (1997)
Original MySQL Cluster design motivation. {{MySQL MySQL Cluster computing