HOME

TheInfoList



OR:

A distributed SQL database is a single
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and
wide area network A wide area network (WAN) is a telecommunications network that extends over a large geographic area. Wide area networks are often established with leased telecommunication circuits. Businesses, as well as schools and government entities, use ...
s including cloud availability zones and cloud geographic zones. Distributed SQL databases typically use the Paxos or
Raft A raft is any flat structure for support or transportation over water. It is usually of basic design, characterized by the absence of a hull. Rafts are usually kept afloat by using any combination of buoyant materials such as wood, sealed barre ...
algorithms to achieve consensus across multiple nodes. Sometimes distributed SQL databases are referred to as
NewSQL NewSQL is a class of relational database management system, relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a t ...
but NewSQL is a more inclusive term that includes databases that are not distributed databases.


History

Google Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
's
Spanner A wrench or spanner is a tool used to provide grip and mechanical advantage in applying torque to turn objects—usually rotary fasteners, such as Nut (hardware), nuts and screw, bolts—or keep them from turning. In the United Kingdom, UK, ...
popularized the modern distributed SQL database concept. Google described the database and its architecture in a 2012 whitepaper called "Spanner: Google's Globally-Distributed Database." The paper described Spanner as having evolved from a Big Table-like key value store into a temporal multi-version database where data is stored in "schematized semi-relational tables."https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41344.pdf Spanner uses atomic clocks with the Paxos algorithm to accomplish consensus with regards to state distributed between servers. In 2010, and earlier implementation, ClustrixDB (now
MariaDB MariaDB is a community-developed, commercially supported Fork (software development), fork of the MySQL relational database management system (RDBMS), intended to remain free and open-source software under the GNU General Public License. Developm ...
Xpand) moved from a hardware appliance to a Paxos-based software database and was later acquired by
MariaDB MariaDB is a community-developed, commercially supported Fork (software development), fork of the MySQL relational database management system (RDBMS), intended to remain free and open-source software under the GNU General Public License. Developm ...
and added to a
SaaS Software as a service (SaaS ) is a cloud computing service model where the provider offers use of application software to a client and manages all needed physical and software resources. SaaS is usually accessed via a web application. Unlike oth ...
cloud offering called SkySQL. In 2015, two Google engineers left the company to create Cockroach DB which achieves similar results using the Raft algorithm without atomic clocks or custom hardware. Spanner is primarily used for transactional and time-series use cases. However, Google furthered this research with a follow on paper about Google F1 which it describes as a Hybrid transactional/analytical processing database built on Spanner.


Architecture

Distributed SQL databases have the following general characteristics: * synchronous replication * strong transactional consistency across at least availability zones (i.e.
ACID An acid is a molecule or ion capable of either donating a proton (i.e. Hydron, hydrogen cation, H+), known as a Brønsted–Lowry acid–base theory, Brønsted–Lowry acid, or forming a covalent bond with an electron pair, known as a Lewis ...
compliance) * relational database front end structure meaning data represented as tables with rows and columns similar to any other
RDBMS A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured forma ...
* automatically sharded data storage * underlying key–value storage * native SQL implementation Following the
CAP Theorem In database theory, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer (scientist), Eric Brewer, states that any distributed data store can provide at most Inconsistent triad, two of the following three guarantees: ; ...
, distributed SQL databases are "CP" or consistent and partition-tolerant. Algorithmically they sacrifice availability in that a failure of a primary node can make the database unavailable for writes. All distributed SQL implementations require some kind of temporal synchronization to guarantee consistency. With the exception of Spanner, most do not use custom hardware to provide atomic clocks. Spanner is able to synchronize writes with temporal guarantees. Implementations without custom hardware require servers to compare clock offsets and potentially retry reads.


Distributed SQL implementations


Compared to NewSQL

CockroachDB, YugabyteDB and others have at times referred to themselves as
NewSQL NewSQL is a class of relational database management system, relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a t ...
databases. Some of the NewSQL databases like Citus and Vitess have fundamentally different architectures, but were cited as examples of NewSQL by Matthew Aslett who coined the term. In essence, distributed SQL databases are built from the ground-up and NewSQL databases include replication and sharding technologies added to existing client-server relational databases like
PostgreSQL PostgreSQL ( ) also known as Postgres, is a free and open-source software, free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transaction processing, transactions ...
. Some experts define DistributedSQL databases as a more specific subset of NewSQL databases.{{Cite web, url=https://medium.com/capital-one-tech/newsql-the-next-evolution-in-databases-19109973ee53, title=NewSQL — The Next Evolution in Databases, first=Gokul, last=Prabagaren, date=October 30, 2019, website=Medium


References

SQL