A distributed SQL database is a single relational database which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and wide area networks, including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or Raft algorithms to achieve consensus across multiple nodes. Sometimes distributed SQL databases are referred to as NewSQL, but NewSQL is a more inclusive term that includes databases that are not distributed databases.
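
The consensus step mentioned above can be pictured with the majority-acknowledgement rule that both Paxos and Raft rely on: a write counts as committed only once more than half of the replicas have accepted it. The following is a minimal sketch of that rule alone, not an implementation of either algorithm; leader election, terms, and log repair are omitted, and the `Replica`, `quorum_size`, and `replicate` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica; `up` simulates node availability."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, entry: str) -> bool:
        # A replica that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.log.append(entry)
        return True


def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def replicate(entry: str, replicas: list) -> bool:
    """Send the entry to every replica; report whether a majority acknowledged.

    Simplification: replicas that accepted an entry keep it even when the
    quorum is missed. A real protocol would later overwrite or discard it.
    """
    acks = sum(1 for r in replicas if r.append(entry))
    return acks >= quorum_size(len(replicas))


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c", up=False)]
    print(replicate("x=1", nodes))  # 2 of 3 replicas acknowledge
```

With three replicas the sketch tolerates one failed node; with two of the three down, `replicate` returns `False` and the write is not considered committed.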


History

Google's Spanner popularized the modern distributed SQL database concept. Google described the database and its architecture in a 2012 whitepaper called "Spanner: Google's Globally-Distributed Database." The paper described Spanner as having evolved from a Bigtable-like key-value store into a temporal multi-version database where data is stored in "schematized semi-relational tables." (https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41344.pdf) Spanner uses atomic clocks with the Paxos algorithm to reach consensus on state distributed between servers. In 2010, an earlier implementation, ClustrixDB (now MariaDB Xpand), moved from a hardware appliance to a Paxos-based software database; it was later acquired by MariaDB and added to a SaaS cloud offering called SkySQL. In 2015, two Google engineers left the company to create CockroachDB, which achieves similar results using the Raft algorithm without atomic clocks or custom hardware. Spanner is primarily used for transactional and time-series use cases. However, Google furthered this research with a follow-on paper about Google F1, which it describes as a hybrid transactional/analytical processing (HTAP) database built on Spanner.


Architecture

Distributed SQL databases have the following general characteristics:

* synchronous replication
* strong transactional consistency across at least availability zones (i.e. ACID compliance)
* relational database front-end structure, meaning data is represented as tables with rows and columns, similar to any other RDBMS
* automatically sharded data storage
* underlying key–value storage
* native SQL implementation

Following the CAP theorem, distributed SQL databases are "CP", or consistent and partition-tolerant. Algorithmically, they sacrifice availability in that a failure of a primary node can make the database unavailable for writes. However, in practice availability is maintained through greater software and hardware reliability, the election of new primaries, and heuristic recovery methods. All distributed SQL implementations require some kind of temporal synchronization to guarantee consistency. With the exception of Spanner, most do not use custom hardware to provide atomic clocks. Spanner is able to synchronize writes with temporal guarantees. Implementations without custom hardware require servers to compare clock offsets and potentially retry reads.
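
One way to picture the read-retry behaviour described above is an uncertainty window. If servers' clocks may disagree by at most some offset, a reader cannot tell whether a value stamped slightly after its own read timestamp was really written later, so it retries the read at that newer timestamp. The sketch below assumes integer tick timestamps and a known maximum offset; `Version`, `UncertainRead`, `read_at`, and `read_with_retry` are hypothetical names, not the API of any particular database.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    timestamp: int  # write timestamp, in arbitrary clock ticks
    value: str


class UncertainRead(Exception):
    """Raised when a version falls inside the reader's uncertainty window."""

    def __init__(self, retry_ts: int):
        super().__init__(f"retry read at timestamp {retry_ts}")
        self.retry_ts = retry_ts


def read_at(versions, read_ts: int, limit: int) -> Optional[str]:
    """Return the newest value at or before read_ts.

    Versions stamped in (read_ts, limit] might have been written before the
    read began on a server with a faster clock, so they force a retry.
    """
    uncertain = [v.timestamp for v in versions if read_ts < v.timestamp <= limit]
    if uncertain:
        raise UncertainRead(max(uncertain))
    visible = [v for v in versions if v.timestamp <= read_ts]
    if not visible:
        return None
    return max(visible, key=lambda v: v.timestamp).value


def read_with_retry(versions, read_ts: int, max_offset: int) -> Optional[str]:
    """Read, moving the timestamp forward whenever the result is uncertain."""
    limit = read_ts + max_offset  # upper bound stays fixed across retries
    while True:
        try:
            return read_at(versions, read_ts, limit)
        except UncertainRead as err:
            read_ts = err.retry_ts


if __name__ == "__main__":
    history = [Version(10, "a"), Version(103, "b")]
    print(read_with_retry(history, read_ts=100, max_offset=5))
    print(read_with_retry(history, read_ts=100, max_offset=2))
```

In the first call the version at tick 103 lies inside the window (100, 105], so the read is retried at 103 and returns `b`; in the second call the window ends at 102, the version at 103 is treated as a later write, and the read returns `a`. Because the upper bound is fixed, the loop retries at most once per read in this sketch.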


Distributed SQL Implementations


Compared to NewSQL

CockroachDB, YugabyteDB and others have at times referred to themselves as NewSQL databases. Some NewSQL databases, such as Citus and Vitess, have fundamentally different architectures but were cited as examples of NewSQL by Matthew Aslett, who coined the term. In essence, distributed SQL databases are built from the ground up, while the NewSQL category also includes replication and sharding technologies added to existing client-server relational databases like PostgreSQL. Some experts define distributed SQL databases as a more specific subset of NewSQL databases. (Gokul Prabagaren, "NewSQL — The Next Evolution in Databases", Medium, October 30, 2019, https://medium.com/capital-one-tech/newsql-the-next-evolution-in-databases-19109973ee53)

