A distributed SQL database is a single relational database that replicates data across multiple servers. Distributed SQL databases are strongly consistent, and most support consistency across racks, data centers, and wide area networks, including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or Raft algorithms to achieve consensus across multiple nodes.
Sometimes distributed SQL databases are referred to as NewSQL, but NewSQL is a more inclusive term that includes databases that are not distributed databases.
History
Google's Spanner popularized the modern distributed SQL database concept. Google described the database and its architecture in a 2012 whitepaper called "Spanner: Google's Globally-Distributed Database." The paper described Spanner as having evolved from a Bigtable-like key-value store into a temporal multi-version database where data is stored in "schematized semi-relational tables." [https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41344.pdf]
Spanner uses atomic clocks together with the Paxos algorithm to reach consensus on state distributed between servers. In 2010, an earlier implementation, ClustrixDB (now MariaDB Xpand), moved from a hardware appliance to a Paxos-based software database; it was later acquired by MariaDB and added to a SaaS cloud offering called SkySQL. In 2015, a group of former Google engineers founded Cockroach Labs to build CockroachDB, which achieves similar results using the Raft algorithm without atomic clocks or custom hardware.
Spanner is primarily used for transactional and time-series use cases. Google extended this research with a follow-on paper about Google F1, which it describes as a hybrid transactional/analytical processing database built on Spanner.
Architecture
Distributed SQL databases have the following general characteristics:
* synchronous replication
* strong transactional consistency across at least availability zones (i.e. ACID compliance)
* relational database front-end structure, meaning data is represented as tables with rows and columns, similar to any other RDBMS
* automatically sharded data storage
* underlying key-value storage
* native SQL implementation
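As a rough illustration of how a relational front end can sit on top of sharded key-value storage, the sketch below flattens table rows into key-value pairs and routes each key to a shard by key range. It is a minimal Python sketch under stated assumptions: the key layout `(table, primary key, column)`, the names `encode_row` and `RangeShardedStore`, and the fixed split points are all hypothetical and do not describe any particular product's encoding or rebalancing logic.

```python
from bisect import bisect_right


def encode_row(table, primary_key, row):
    """Flatten one row into key-value pairs keyed by (table, primary key, column)."""
    return {(table, primary_key, column): value for column, value in row.items()}


class RangeShardedStore:
    """Toy key-value store split into contiguous key ranges (shards)."""

    def __init__(self, split_points):
        # Keys below the first split point go to shard 0, keys at or above it
        # go to shard 1, and so on. Real systems move split points
        # automatically; here they are fixed (an assumption of this sketch).
        self.split_points = sorted(split_points)
        self.shards = [dict() for _ in range(len(self.split_points) + 1)]

    def shard_for(self, key):
        """Return the index of the shard whose key range contains `key`."""
        return bisect_right(self.split_points, key)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


# Usage: rows of a hypothetical "users" table, split at primary key 100.
store = RangeShardedStore(split_points=[("users", 100, "")])
for pk, row in [(42, {"name": "Ada", "city": "London"}),
                (250, {"name": "Lin", "city": "Taipei"})]:
    for key, value in encode_row("users", pk, row).items():
        store.put(key, value)
```

Because keys sort by table and then primary key, rows that are adjacent in the table tend to land in the same shard, which is one common motivation for range-based rather than hash-based sharding.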
In terms of the CAP theorem, distributed SQL databases are "CP", or consistent and partition-tolerant. Algorithmically, they sacrifice availability in that a failure of a primary node can make the database unavailable for writes. In practice, however, availability is achieved through greater software and hardware reliability, the election of new primaries, and heuristic recovery methods.
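The "CP" trade-off can be made concrete with the majority-quorum rule that both Paxos and Raft rely on: a write is accepted only when more than half of the replicas acknowledge it. The Python sketch below is a simplified model, not an implementation of either algorithm; the function names are hypothetical, and leader election and log replication are left out entirely.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def can_commit(replica_count, reachable_replicas):
    """A write commits only if a majority of replicas can acknowledge it."""
    return reachable_replicas >= majority(replica_count)


def tolerated_failures(replica_count):
    """How many replicas may fail while writes remain possible."""
    return replica_count - majority(replica_count)


# Usage: a 3-replica group survives one failure but not two.
three_way_ok = can_commit(replica_count=3, reachable_replicas=2)
three_way_blocked = can_commit(replica_count=3, reachable_replicas=1)
```

On the minority side of a network partition, `can_commit` returns `False`, so that side refuses writes rather than risk diverging from the majority. This is the sense in which such a system gives up availability in order to stay consistent.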
All distributed SQL implementations require some kind of temporal synchronization to guarantee consistency. Spanner relies on custom hardware that provides atomic clocks, which lets it synchronize writes with temporal guarantees; most other implementations do not use custom hardware. Implementations without custom hardware require servers to compare clock offsets and potentially retry reads.
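The two approaches can be sketched in a few lines of Python. The first function models the idea of waiting out clock uncertainty before acknowledging a commit, as the Spanner paper describes. The second models a read that must be retried because a value's timestamp falls within the assumed maximum clock offset. This is a loose sketch under stated assumptions: all names, the integer millisecond timestamps, and the fixed error bounds are hypothetical and are not taken from any real system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockInterval:
    """Bounds on the true time, in milliseconds, as seen by one server."""
    earliest: int
    latest: int


def now_interval(local_time_ms, max_error_ms):
    """True time is assumed to lie within max_error_ms of the local clock."""
    return ClockInterval(local_time_ms - max_error_ms, local_time_ms + max_error_ms)


def commit_wait_remaining(commit_ts, interval):
    """Milliseconds still to wait (at least) before commit_ts is certainly in the past."""
    return max(0, commit_ts - interval.earliest)


def read_is_uncertain(read_ts, value_ts, max_offset_ms):
    """A value stamped just after read_ts may really precede it, so the read is retried."""
    return read_ts < value_ts <= read_ts + max_offset_ms


# Usage: a commit stamped at the latest possible "now" with a 5 ms error bound.
interval = now_interval(local_time_ms=1000, max_error_ms=5)
wait_ms = commit_wait_remaining(commit_ts=interval.latest, interval=interval)
```

With tightly bounded clock error the commit wait stays short, which is the benefit of dedicated clock hardware. With looser bounds, the uncertainty window in `read_is_uncertain` grows and reads are retried more often.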
Distributed SQL Implementations
Compared to NewSQL
CockroachDB, YugabyteDB and others have at times referred to themselves as NewSQL databases. Some NewSQL databases, such as Citus and Vitess, have fundamentally different architectures but were cited as examples of NewSQL by Matthew Aslett, who coined the term. In essence, distributed SQL databases are built from the ground up, whereas NewSQL databases include replication and sharding technologies added to existing client-server relational databases like PostgreSQL. Some experts define distributed SQL databases as a more specific subset of NewSQL databases.
[Prabagaren, Gokul (October 30, 2019). "NewSQL — The Next Evolution in Databases". Medium. https://medium.com/capital-one-tech/newsql-the-next-evolution-in-databases-19109973ee53]