# Distributed SQL

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Distributed_SQL
> Markdown URL: https://mediated.wiki/source/Distributed_SQL.md
> Source: https://en.wikipedia.org/wiki/Distributed_SQL
> Source revision: 1353574763
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

{{short description|Relational database which stores data across multiple servers}}

A '''distributed SQL''' database is a single [relational database](/source/relational_database) which replicates data across multiple servers. Distributed SQL databases are strongly consistent and most support consistency across racks, data centers, and [wide area network](/source/wide_area_network)s including cloud [availability zones](/source/availability_zone_(computer_science)) and cloud [geographic zones](/source/geographic_zone_(computer_science)). Distributed SQL databases typically use the [Paxos](/source/Paxos_(computer_science)) or [Raft](/source/Raft_(algorithm)) algorithms to achieve [consensus](/source/Consensus_decision-making) across multiple nodes.

Distributed SQL databases are sometimes referred to as [NewSQL](/source/NewSQL), but NewSQL is a broader term that also covers databases that are not [distributed databases](/source/distributed_databases).

== History ==

[Google](/source/Google)'s [Spanner](/source/Spanner_(database)) popularized the modern distributed SQL database concept. Google described the database and its architecture in a 2012 whitepaper called "Spanner: Google's Globally-Distributed Database." The paper described Spanner as having evolved from a [Bigtable](/source/Big_Table)-like [key-value](/source/key_value) store into a temporal multi-version database where data is stored in "schematized semi-relational tables."<ref name=autogenerated1>{{Cite conference |title=F1: A Distributed SQL Database That Scales |conference=The 39th International Conference on Very Large Data Bases, August 26th- 30th 2013, Riva del Garda, Trento, Italy |last=Shute |first=Jeff |url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41344.pdf |url-status=live |volume=6 |last2=Whipkey |first2=Chad |last3=Vingralek |first3=Radek |last4=Rollins |first4=Eric |last5=Samwel |first5=Bart |last6=Handy |first6=Ben |last7=Oancea |first7=Mircea |last8=Littlefield |first8=Kyle |last9=Menestrina |first9=David |display-authors=3 |year=2013 |language=en |publisher=VLDB Endowment |issue=11 |archive-url=https://web.archive.org/web/20260310191603/https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41344.pdf |archive-date=2026-03-10 |access-date=2026-05-10 |conference-url=https://vldb.org/2013/}}</ref>

Spanner uses atomic clocks together with the Paxos algorithm to reach consensus on state distributed between servers. In 2010, an earlier implementation, [ClustrixDB](/source/Clustrix) (now [MariaDB](/source/MariaDB) Xpand), moved from a hardware appliance to a Paxos-based software database.<ref>{{Cite web|url=https://gigaom.com/2010/05/03/clustrix-builds-the-webscale-holy-grail-a-database-that-scales/|title=Clustrix Builds the Webscale Holy Grail: A Database That Scales|first=Stacey|last=Higginbotham|date=May 3, 2010|website=gigaom.com}}{{dead link|date=June 2025|bot=medic}}{{cbignore|bot=medic}}</ref> It was later acquired by [MariaDB](/source/MariaDB)<ref>{{Cite web|url=https://techcrunch.com/2018/09/20/mariadb-acquires-clusterix/|title=MariaDB acquires Clustrix|date=20 September 2018 }}</ref> and added to a [SaaS](/source/SaaS) cloud offering called [SkySQL](/source/MariaDB).<ref>{{Cite web|url=https://www.zdnet.com/article/for-mariadb-its-time-to-put-the-pieces-together/|title=For MariaDB, it's time to put the pieces together|first=Tony|last=Baer (dbInsight)|website=ZDNet}}</ref> In 2015, former Google engineers founded Cockroach Labs to create [CockroachDB](/source/Cockroach_Labs), which achieves similar results using the Raft algorithm without atomic clocks or custom hardware.<ref>{{Cite web|url=https://www.nextplatform.com/2017/02/22/google-spanner-inspires-cockroachdb-outrun/|title=Google Spanner Inspires CockroachDB To Outrun It|first=Timothy Prickett|last=Morgan|date=February 22, 2017|website=The Next Platform}}</ref>

Spanner is primarily used for transactional and time-series use cases. However, Google extended this research with a follow-on paper about Google F1, which it describes as a [hybrid transactional/analytical processing](/source/Hybrid_transactional%2Fanalytical_processing) database built on Spanner.<ref name=autogenerated1 />

== Architecture ==

Distributed SQL databases have the following general characteristics:

* synchronous replication
* strong transactional consistency (i.e. [ACID](/source/ACID) compliance) that holds at least across availability zones<ref>{{Citation |title=The future of databases: distributed SQL & MariaDB ® |url=https://www.youtube.com/watch?v=3igIRQqmYc4 |language=en |access-date=2022-12-21}}</ref>
* relational front-end structure{{snd}}data is represented as tables with rows and columns, as in any other [RDBMS](/source/RDBMS)
* automatically [sharded](/source/Shard_(database_architecture)) data storage
* underlying key–value storage<ref>{{Cite web|url=https://www.youtube.com/watch?v=avOgswXxayA|title=The Architecture of a Distributed SQL Database|date=23 September 2020 |via=www.youtube.com}}</ref><ref name=autogenerated1 />
* native SQL implementation
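
As a rough illustration of how a relational front end can sit on top of sharded key–value storage, the following Python sketch flattens a table row into ordered key–value pairs and assigns each key to a shard by range. The key format, the function names and the `RangeSharder` class are hypothetical and chosen only for this example; real implementations use their own binary encodings and split ranges automatically.

```python
from bisect import bisect_right


def encode_key(table: str, primary_key: int) -> str:
    """Build an ordered string key for a row (hypothetical format)."""
    # Zero-padding keeps lexicographic order aligned with numeric order.
    return f"/{table}/{primary_key:010d}"


def row_to_kv(table: str, primary_key: int, row: dict) -> dict:
    """Flatten one relational row into one key-value pair per column."""
    base = encode_key(table, primary_key)
    return {f"{base}/{column}": value for column, value in row.items()}


class RangeSharder:
    """Assign keys to shards using sorted split points (range sharding)."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)

    def shard_for(self, key: str) -> int:
        # Shard i holds keys in [split_points[i-1], split_points[i]).
        return bisect_right(self.split_points, key)


if __name__ == "__main__":
    pairs = row_to_kv("users", 42, {"name": "Ada", "email": "ada@example.com"})
    sharder = RangeSharder([encode_key("users", 100), encode_key("users", 200)])
    for key, value in pairs.items():
        print(sharder.shard_for(key), key, value)
```

Because all columns of a row share the same key prefix, they land on the same shard, and neighbouring primary keys stay close together, which is one reason range-based schemes suit ordered scans.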

In terms of the [CAP theorem](/source/CAP_Theorem), distributed SQL databases are "CP": consistent and partition-tolerant. By design, they give up some availability, in that the failure of a primary node can make the database unavailable for writes.
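
The trade-off can be sketched with a toy majority-quorum check in Python. This is a simplified model, not Paxos or Raft themselves: the `ReplicaGroup` class and its methods are hypothetical, and the only rule it encodes is that a write is accepted when a majority of replicas is reachable and refused otherwise.

```python
class ReplicaGroup:
    """Toy model of a replica group that requires a majority to accept writes."""

    def __init__(self, replica_count: int):
        self.replica_count = replica_count
        self.reachable = set(range(replica_count))
        self.log = []

    def quorum(self) -> int:
        # A majority of replicas: more than half of the group.
        return self.replica_count // 2 + 1

    def partition(self, unreachable) -> None:
        """Mark the given replica ids as cut off by a network partition."""
        self.reachable -= set(unreachable)

    def write(self, value) -> bool:
        # Refuse the write rather than risk divergent state (the "CP" choice).
        if len(self.reachable) < self.quorum():
            return False
        self.log.append(value)
        return True


if __name__ == "__main__":
    group = ReplicaGroup(3)
    print(group.write("a"))   # all three replicas reachable
    group.partition([2])
    print(group.write("b"))   # two of three still form a majority
    group.partition([1])
    print(group.write("c"))   # one of three is not a majority
```

With three replicas the group tolerates the loss of one; once two are unreachable, writes are rejected instead of being accepted inconsistently.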

All distributed SQL implementations require some kind of temporal synchronization to guarantee consistency. Spanner relies on custom hardware that provides atomic clocks, which allows it to synchronize writes with temporal guarantees; most other implementations do not use such hardware. Implementations without custom hardware require servers to compare clock offsets and potentially retry reads.<ref>{{Cite web|url=https://www.cockroachlabs.com/blog/living-without-atomic-clocks/|title=Living Without Atomic Clocks|date=April 21, 2020|website=Cockroach Labs}}</ref>
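
A read retry of this kind can be illustrated with a simplified Python sketch. It assumes a configured maximum clock offset between servers: if a read at timestamp `T` encounters a version stamped within `(T, T + max_offset]`, the reader cannot tell whether that write really happened before the read began, so it retries at the later timestamp. The names `Version` and `read_with_uncertainty` and the millisecond values are hypothetical, and the logic is a teaching model rather than any vendor's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Version:
    timestamp_ms: int
    value: str


def read_with_uncertainty(versions, read_ts_ms: int, max_offset_ms: int):
    """Return (value, final_read_timestamp_ms, retried) for one key."""
    # Versions stamped inside this window may come from a server whose
    # clock runs ahead, so their real-time order relative to us is unknown.
    upper_bound = read_ts_ms + max_offset_ms
    uncertain = [
        v.timestamp_ms for v in versions
        if read_ts_ms < v.timestamp_ms <= upper_bound
    ]
    retried = bool(uncertain)
    if retried:
        # Retry the read at the newest uncertain timestamp so it is observed.
        read_ts_ms = max(uncertain)
    visible = [v for v in versions if v.timestamp_ms <= read_ts_ms]
    latest = max(visible, key=lambda v: v.timestamp_ms, default=None)
    return (latest.value if latest else None, read_ts_ms, retried)


if __name__ == "__main__":
    history = [Version(1000, "old"), Version(1200, "new")]
    print(read_with_uncertainty(history, read_ts_ms=1100, max_offset_ms=250))
    print(read_with_uncertainty(history, read_ts_ms=1100, max_offset_ms=50))
```

A smaller assumed offset shrinks the uncertainty window and so reduces retries, but it is only safe if the servers' clocks really stay within that bound.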

== Distributed SQL implementations ==

{| class="wikitable sortable"
|-
! Vendor !! API 
!License model
|-
| [Amazon Aurora](/source/Amazon_Aurora) || [PostgreSQL](/source/PostgreSQL) & [MySQL](/source/MySQL) 
|Proprietary
|-
|[CockroachDB](/source/CockroachDB)
|[PostgreSQL](/source/PostgreSQL)-like 
|Proprietary
|-
| [Google Spanner](/source/Spanner_(database)) || Proprietary SQL-like 
|Proprietary
|-
| [MySQL Cluster](/source/MySQL_Cluster)  || [MySQL](/source/MySQL)
|Open Source (GPLv2)
|-
| [NuoDB](/source/NuoDB) || Proprietary SQL
|Proprietary
|-
| [YugabyteDB](/source/YugabyteDB) || [PostgreSQL](/source/PostgreSQL) & [Cassandra](/source/Apache_Cassandra) CQL-like 
|Open Source (Apache 2.0)
|-
|[TiDB](/source/TiDB)
|[MySQL](/source/MySQL)-like
|Open Source (Apache 2.0)
|-
|[MariaDB Xpand](/source/MariaDB)
|[MariaDB](/source/MariaDB)
|Proprietary
|-
| [Teradata](/source/Teradata) || Proprietary SQL-like
|Proprietary
|-
| [YDB](/source/YDB_(database))<ref>{{Cite web|url=https://ydb.tech|title= YDB is an open-source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions |website=ydb.tech}}</ref> || Proprietary SQL-like, [PostgreSQL](/source/PostgreSQL)-like
|Open Source (Apache 2.0)
|}

== Compared to NewSQL ==

CockroachDB, YugabyteDB and others have at times referred to themselves as [NewSQL](/source/NewSQL) databases. Some NewSQL databases have fundamentally different architectures, but were cited as examples of NewSQL by Matthew Aslett, who coined the term.<ref>{{Cite web|url=https://blogs.451research.com/information_management/2011/04/06/what-we-talk-about-when-we-talk-about-newsql/|title=What we talk about when we talk about NewSQL — Too much information|access-date=2021-01-26|archive-date=2020-06-14|archive-url=https://web.archive.org/web/20200614231037/https://blogs.451research.com/information_management/2011/04/06/what-we-talk-about-when-we-talk-about-newsql/|url-status=dead}}</ref> In essence, distributed SQL databases are built from the ground up, whereas NewSQL databases include those that add replication and sharding technologies to existing client-server relational databases like [PostgreSQL](/source/PostgreSQL).<ref>{{Cite web|url=https://www.ibm.com/cloud/blog/sql-vs-nosql|title=SQL vs. NoSQL Databases: What's the Difference?|website=www.ibm.com|date=12 June 2022 }}</ref> Some experts define distributed SQL databases as a more specific subset of NewSQL databases.<ref>{{Cite web|url=https://medium.com/capital-one-tech/newsql-the-next-evolution-in-databases-19109973ee53|title=NewSQL — The Next Evolution in Databases|first=Gokul|last=Prabagaren|date=October 30, 2019|website=Medium}}</ref>

== References ==
<references />

---
Adapted from the Wikipedia article [Distributed SQL](https://en.wikipedia.org/wiki/Distributed_SQL) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Distributed_SQL?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
