Shard (database architecture)

{{short description|Horizontal partition of data in a database or search engine}}A '''database shard''', or simply a '''shard''', is a horizontal partition of data within a database or search engine. Each shard may be held on a separate database server instance in order to spread across multiple servers.

Some data in a database may remain present in all shards,{{efn|Typically supporting data such as dimension tables.}} while other data is stored in only one shard. In such cases, each shard acts as the single source for its subset of data.<ref>{{cite book |title= NoSQL Distilled |first1=Pramod J. |last1=Sadalage |author-link2 = Martin Fowler (software engineer) |first2=Martin |last2=Fowler |chapter=4: Distribution Models |isbn= 978-0321826626 |year= 2012 |publisher=Pearson Education }}</ref>

== Database architecture == '''Horizontal partitioning''' is a database design principle whereby ''rows'' of a database table are held separately, rather than being split into columns (as in normalization and vertical partitioning, to varying degrees ). Each partition forms part of a shard, which may in turn be located on a separate database server or in a separate physical location.

There are numerous advantages to the horizontal partitioning of data. Since tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables the distribution of a database across a large number of machines, which can significantly improve performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers v. American customers) it may be possible to infer the appropriate shard membership easily and automatically, and to query only the relevant shard.<ref name="Rahul Roy, Shard" >{{cite web |url=http://technoroy.blogspot.com/2008/07/shard-database-design.html |title=Shard - A Database Design |author=Rahul Roy |date=July 28, 2008 }} </ref>

In practice, sharding is complex. Although it has long been implemented through manual coding (especially where rows have an obvious grouping, as in the customer region example above), this approach is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it, and for identifying candidates to be sharded separately. Consistent hashing is a technique used in sharding to distribute large loads across multiple smaller services and servers.<ref>{{cite web|last=Ries|first=Eric|title=Sharding for Startups|url=http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html}}</ref>

Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a sharding approach may also be useful. In the 2010s, sharding of execution capacity, as well as the more traditional sharding of data, emerged as a potential approach to address performance and scalability challenges in blockchains.<ref name=acm20191021>{{cite book |chapter-url=https://dl.acm.org/doi/abs/10.1145/3318041.3355457 |date=21 October 2019 |pages=41–61 |last1=Wang|first1=Gang |last2=Shi|first2=Zhijie Jerry |last3=Nixon|first3=Mark |last4=Han|first4=Song |title=Proceedings of the 1st ACM Conference on Advances in Financial Technologies |chapter=SoK |doi=10.1145/3318041.3355457 |isbn=9781450367325 |s2cid=204749727 }}</ref><ref name=FCDS20200718 >{{cite book |url=https://dl.acm.org/doi/abs/10.1145/3318041.3355457 |date=18 July 2020 |pages=114–134 |last1=Yu|first1=Mingchao |last2=Sahraei|first2=Saeid |last3=Nixon|first3=Mark |last4=Han|first4=Song |title=Proceedings of the 1st ACM Conference on Advances in Financial Technologies |chapter=SoK: Sharding on Blockchain |doi=10.1145/3318041.3355457 |isbn=9781450367325 |s2cid=204749727 }}</ref>

Recent academic work has proposed protocols such as Cerberus to address cross-shard atomicity by braiding consensus across multiple shards, allowing transactions to affect multiple partitions simultaneously without requiring a global lock.<ref name="VLDB2021">{{cite journal |last1=Hellings |first1=Jelle |last2=Sadoghi |first2=Mohammad |title=Cerberus: Minimalistic Multi-shard Byzantine-resilient Transaction Processing |journal=Proceedings of the VLDB Endowment |volume=14 |issue=11 |pages=2230–2243 |year=2021 |url=http://www.vldb.org/pvldb/vol14/p2230-hellings.pdf |doi=10.14778/3476249.3476274}}</ref>

== Compared to horizontal partitioning == Horizontal partitioning splits one or more tables by row, usually within a ''single'' instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort), provided there is an obvious, robust, and implicit way to identify in which partition a particular row will be found, without having to first search the index; for example, the classic case of the '<code>CustomersEast</code>' and '<code>CustomersWest</code>' tables, where a ZIP code already indicates where a row will be found.

Sharding extends this approach. It partitions the relevant table or tables in the same way, but does so across potentially ''multiple'' instances of the schema. An advantage is that the search load for the large partitioned table can be distributed across multiple servers (logical or physical), rather than only across multiple indexes on a same logical server.

Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The expected gains in efficiency would be reduced if querying the database required ''multiple'' instances to be accessed just to retrieve a simple dimension table. Beyond partitioning, sharding therefore involves distributing large, partitionable tables across servers, while smaller tables are replicated in full on each server.<ref>{{Cite web |title=Database Sharding: Concepts & Examples |url=https://www.mongodb.com/resources/products/capabilities/database-sharding-explained |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref>

This is also why sharding is related to a shared-nothing architecture—once sharded, each shard can reside in a separate logical schema instance, physical database server, data center, or geographic region. Sharding is intended to minimize the need for cross-shard access by partitioning data across independent shards.<ref>{{cite web|title=Understanding Database Sharding|url=https://www.digitalocean.com/community/tutorials/understanding-database-sharding|website=DigitalOcean Community Tutorials|date=2022-03-16|access-date=2025-10-09|quote=Database shards exemplify a shared-nothing architecture. This means that the shards are autonomous; they don't share any of the same data or resources.}}</ref>

This makes replication across multiple servers easier (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers might otherwise become a bottleneck.<ref>{{Cite web |title=A Guide To Horizontal Vs Vertical Scaling |url=https://www.mongodb.com/resources/basics/horizontal-vs-vertical-scaling |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref>

There is also a requirement for some notification and replication mechanism between schema instances, so that unpartitioned tables remain as closely synchronized as the application requires. This is a complex architectural choice in sharded systems: approaches range from making these tables effectively read-only (with updates that are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding), and many options in between.<ref>{{Cite web |title=Sharding - Database Manual - MongoDB Docs |url=https://www.mongodb.com/docs/manual/sharding/ |access-date=2026-03-20 |website=www.mongodb.com |language=en}}</ref>

== Implementations == * Altibase provides a combined (client-side and server-side) sharding architecture transparent to client applications. * Apache HBase supports automatic sharding.<ref>{{Cite web|url=http://hbase.apache.org/|title=Apache HBase – Apache HBase™ Home|website=hbase.apache.org}}</ref> * [https://azure.microsoft.com/en-us/products/category/databases Azure SQL Database Elastic Database] tools support support sharding to enable scaling out and in of an application’s data tier.<ref>{{Cite web|url=https://azure.microsoft.com/en-us/blog/introducing-elastic-scale-preview-for-azure-sql-database/|title=Introducing Elastic Scale preview for Azure SQL Database|website=azure.microsoft.com|date=2 October 2014 }}</ref> * ClickHouse, an open-source OLAP database management system, supports sharding. * Couchbase supports automatic and transparent sharding. * CUBRID has supported sharding since version 9.0. * [https://help.sap.com/docs/DB6/4c49a344277943ad91358094fdaf9765/c289a552d161224fe10000000a445394.html Db2 Data Partitioning Feature (MPP)], a shared-nothing database partitioning feature, runs on separate nodes. * [https://www.alibabacloud.com/blog/what-are-the-differences-between-polardb-x-and-drds_601253 DRDS (Distributed Relational Database Service)] of Alibaba Cloud supports database and table sharding,<ref>{{Cite web|url=https://www.alibabacloud.com/help/doc-detail/29659.htm?spm=a2c63.l28256.a3.1.4eb21d9a8lUMTW|title=Alibaba Cloud Help Center - Cloud Definition and Explanation of Cloud Based Services - Alibaba Cloud|website=www.alibabacloud.com}}</ref> and has been used for large-scale events such as Singles' Day.<ref>{{Cite web|url=https://www.alibabacloud.com/product/drds|title=Focuses on Large-Scale Online Databases - Alibaba Cloud|website=www.alibabacloud.com}}</ref> * Elasticsearch, an enterprise search server, supports sharding.<ref name="Elasticsearch Shard" >{{Cite web|url=https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html|title=Index Shard Allocation | Elasticsearch Guide [7.13] | Elastic|website=www.elastic.co}}</ref> * eXtreme Scale is a cross-process in-memory key/value data store (a NoSQL data store) that uses sharding to achieve scalability across processes for both data and MapReduce-style parallel processing.<ref>{{Cite web|url=http://publib.boulder.ibm.com/infocenter/wxsinfo/v7r1/index.jsp?topic=%2Fcom.ibm.websphere.extremescale.over.doc%2|title=IBM Docs}}</ref> * Hibernate supports sharding, but has seen little development since 2007.<ref name="Hibernate Shards" >{{cite web |title=Hibernate Shards |url=http://shards.hibernate.org/ |date=2007-02-08 }}</ref><ref name="Hibernate Shards documentation" >{{Cite web|url=http://www.hibernate.org/hib_docs/shards/reference/en/html/|title=Hibernate Shards|access-date=2011-03-30|archive-date=2008-12-16|archive-url=https://web.archive.org/web/20081216005922/http://www.hibernate.org/hib_docs/shards/reference/en/html/|url-status=dead}}</ref> * IBM Informix has supported sharding since version 12.1 xC1 as part of the MACH11 technology. Informix 12.10 xC2 added full compatibility with MongoDB drivers, allowing a mix of regular relational tables and NoSQL collections while retaining sharding, fail-over, and ACID properties.<ref name="Informix Grid Queries" >{{cite web |title=New Grid queries for Informix |url=http://ibmdatamag.com/2013/04/informix-12-10-new-grid-queries/ |access-date=2013-10-07 |archive-date=2015-06-10 |archive-url=https://web.archive.org/web/20150610221958/http://ibmdatamag.com/2013/04/informix-12-10-new-grid-queries/ |url-status=dead }}</ref><ref name="NoSQL support in Informix" >{{Cite web|url=https://fr.slideshare.net/journalofinformix/informix-no-sql-sept-2013|title=NoSQL support in Informix (JSON storage, Mongo DB API)|date=September 24, 2013}}</ref> * Kdb+ has supported sharding since version 2.0. * MariaDB Spider, a storage engine, supports table federation, sharding, XA transactions, and ODBC data sources. It has been included in MariaDB server since version 10.0.4.<ref>{{Cite web |title=Spider |url=https://mariadb.com/kb/en/spider/ |access-date=2022-12-20 |website=MariaDB KnowledgeBase}}</ref> * MonetDB, an open-source column-store, introduced read-only sharding in its July 2015 release.<ref>{{cite web | url=https://www.monetdb.org/blog/monetdb-jul2015-released | title= MonetDB July2015 Released | date=31 August 2015}}</ref> * MongoDB has supported sharding since version 1.6.<ref>{{Cite web |title=MongoDB Sharding |url=https://www.mongodb.com/resources/products/capabilities/sharding |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref> * MySQL Cluster supports automatic and transparent sharding across commodity nodes, allowing scaling of read and write queries, without requiring application changes.<ref name="MySQL Cluster Features & Benefits" >{{cite web |title=MySQL Cluster Features & Benefits |url=http://www.mysql.com/products/cluster/features.html |date=2012-11-23 }}</ref> * MySQL Fabric (part of MySQL utilities) supports sharding.<ref name="MySQL Fabric sharding quick start guide">{{Cite web|url=http://dev.mysql.com/doc/mysql-utilities/1.5/en/fabric-quick-start-sharding.html|title=MySQL Fabric sharding quick start guide}}</ref> * Oracle Database shards since 12c Release 2 and in one liner: Combination of sharding advantages with well-known capabilities of enterprise ready multi-model Oracle Database.<ref name="Oracle 2018">{{cite web |title=Oracle Sharding |website=Oracle |date=2018-05-24 |url=https://www.oracle.com/database/technologies/high-availability/sharding.html |access-date=2021-07-10}}</ref> * Oracle NoSQL Database supports automatic sharding and elastic, online expansion of clusters. * OrientDB has supported sharding since version 1.7. * Solr, an enterprise search platform, supports sharding.<ref name="SorlShard" >{{Cite web|url=https://cwiki.apache.org/confluence/display/solr/DistributedSearch|title=DistributedSearch - SOLR - Apache Software Foundation|website=cwiki.apache.org}}</ref> * ScyllaDB uses per-core sharding within a server and across all nodes in a cluster. * Spanner, a distributed database developed by Google, shards across multiple Paxos state machines to scale to large numbers of machines, data centers, and rows.<ref name="Spanner" >{{cite web | first1 = James C | last1 = Corbett | first2 = Jeffrey | last2 = Dean | first3 = Michael | last3 = Epstein | first4 = Andrew | last4 = Fikes | first5 = Christopher | last5 = Frost | first6 = JJ | last6 = Furman | first7 = Sanjay | last7 = Ghemawat | first8 = Andrey | last8 = Gubarev | first9 = Christopher | last9 = Heiser | first10 = Peter | last10 = Hochschild | first11 = Wilson | last11 = Hsieh | first12 = Sebastian | last12 = Kanthak | first13 = Eugene | last13 = Kogan | first14 = Hongyi | last14 = Li | first15 = Alexander | last15 = Lloyd | first16 = Sergey | last16 = Melnik | first17 = David | last17 = Mwaura | first18 = David | last18 = Nagle | first19 = Sean | last19 = Quinlan | first20 = Rajesh | last20 = Rao | first21 = Lindsay | last21 = Rolig | first22 = Yasushi | last22 = Saito | first23 = Michal | last23 = Szymaniak | first24 = Christopher | last24 = Taylor | first25 = Ruth | last25 = Wang | first26 = Dale | last26 = Woodford | title = Spanner: Google's Globally-Distributed Database | url = http://research.google.com/archive/spanner-osdi2012.pdf | work = Proceedings of OSDI 2012 | access-date =24 February 2014}}</ref> * SQLAlchemy ORM, a data-mapper for the Python programming language shards.<ref name="SQLAlchemy" >{{Cite web|url=https://github.com/sqlalchemy/sqlalchemy|title=sqlalchemy/sqlalchemy|date=July 9, 2021|via=GitHub}}</ref> * SQL Server has supported sharding since SQL Server 2005 with the use of 3rd party tools.<ref name="SQLServer" >{{Cite web|url=https://www.infoq.com/news/2011/02/SQL-Sharding/|title=Partitioning and Sharding Options for SQL Server and SQL Azure|website=infoq.com}}</ref> * Teradata markets a massive parallel database management system as a data warehouse. *[https://www.ndss-symposium.org/ndss-paper/vault-fast-bootstrapping-for-the-algorand-cryptocurrency/ Vault], a cryptocurrency design, uses sharding to reduce the data required to join the network and verify transactions, improving scalability.<ref>{{Cite web|url=https://news.mit.edu/2019/vault-faster-more-efficient-cryptocurrency-0124|title=A faster, more efficient cryptocurrency|website=MIT News|date=24 January 2019 |access-date=2019-01-30}}</ref> *Vitess, an open-source database clustering system, supports sharding for MySQL and is a Cloud Native Computing Foundation project.<ref name="Vitess">{{Cite web|url=https://vitess.io/|title=Vitess|website=vitess.io}}</ref> *[https://shardingsphere.apache.org ShardingSphere] is a database clustering system that provides data sharding, distributed transactions, and distributed database management, and is an Apache Software Foundation (ASF) project.<ref name="ShardingSphere">{{Cite web|url=https://shardingsphere.apache.org/|title=ShardingSphere|website=shardingsphere.apache.org}}</ref>

== Disadvantages == Sharding a database table before it has been optimized locally can introduce unnecessary complexity. Sharding is generally recommended when other optimization strategies have proven insufficient.<ref>{{Cite book |last=Kleppmann |first=Martin |title=Designing Data-Intensive Applications |year=2017 |isbn=9781449373320}}</ref> The added complexity of database sharding can result in several potential challenges.<ref>{{Cite web |title=Database Sharding: Concepts & Examples |url=https://www.mongodb.com/resources/products/capabilities/database-sharding-explained |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref> * SQL complexity: Developers may need to write more complex SQL queries to handle sharding logic * Additional software requirements: Software that partitions, balances, coordinates, and maintains data integrity can fail or introduce errors. * Single point of failure: Corruption or failure of one shard due to network, hardware, or system issues can affect the integrity of the entire dataset. * Fail-over server complexity: Fail-over servers must maintain copies of all database shards. * Backups complexity: Database backups of the individual shards must be coordinated with the backups of the other shards. * Operational complexity: Tasks such as adding or removing indexes, modifying columns, or altering the schema become more difficult in a sharded environment.

== Etymology == In a database context, the term "shard" is believed to derive from one of two sources: Computer Corporation of America's "A System for Highly Available Replicated Data,"<ref>Sarin, DeWitt & Rosenberg, ''Overview of SHARD: A System for Highly Available Replicated Data'', Technical Report CCA-88-01, Computer Corporation of America, May 1988</ref> which used redundant hardware to facilitate data replication rather than horizontal partitioning, or the 1997 MMORPG ''Ultima Online''.<ref name="koster">{{cite web|url=http://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/|title=Database "sharding" came from UO?|last1=Koster|first1=Raph|date=2009-01-08|website=Raph Koster's Website|access-date=2015-01-17}}</ref><ref name="garriott" />

Richard Garriott, creator of ''Ultima Online'', recalled that the term originated during the production of the game, specifically in creating a self-regulating, virtual ecology system. Players were able to interact and harvest in-game resources via the internet, which disrupted the balance of the system.<ref name="garriott">{{Cite web|url=https://www.youtube.com/watch?v=KFNxJVTJleE |title=Ultima Online: The Virtual Ecology | War Stories|website=Ars Technica Videos|date=21 December 2017 }}</ref> To address this, the development team separated the global player base into multiple sessions and introduced part of ''Ultima Online''{{'s}} fictional connection to the end of ''Ultima I: The First Age of Darkness'', where the defeat of its antagonist Mondain also led to the creation of multiverse "shards." This modification provided Garriott's team with the fictional basis needed to justify creating copies of the virtual environment. The feature was later removed after several months of testing.<ref name="garriott" />

== See also == * Block Range Index * Shared-nothing architecture

== Notes == {{Notelist}}

== References == {{Reflist}}

== External links == * [https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.json.doc/ids_json_011.htm Informix JSON data sharding]

Category:Data partitioning Category:Database management systems Category:Software design patterns

de:Denormalisierung#Fragmentierung