Shard (database architecture)

{{short description|Horizontal partition of data in a database or search engine}}A '''database shard''', or simply a '''shard''', is a [[Partition (database)#Horizontal partitioning|horizontal partition]] of data within a [[DBMS|database]] or [[search engine]]. Each shard may be held on a separate [[database server]] instance in order to spread across multiple servers.

Some data in a database may remain present in all shards,{{efn|Typically supporting data such as [[dimension table]]s.}} while other data is stored in only one shard. In such cases, each shard acts as the single source for its subset of data.<ref>{{cite book |title= NoSQL Distilled |first1=Pramod J. |last1=Sadalage |author-link2 = Martin Fowler (software engineer) |first2=Martin |last2=Fowler |chapter=4: Distribution Models |isbn= 978-0321826626 |year= 2012 |publisher=Pearson Education }}</ref>

== Database architecture == '''Horizontal partitioning''' is a database design principle whereby ''[[row (database)|rows]]'' of a database table are held separately, rather than being split into [[column (database)|columns]] (as in [[database normalization|normalization]] and [[Partition (database)|vertical partitioning]], to varying degrees ). Each partition forms part of a shard, which may in turn be located on a separate database server or in a separate physical location.

There are numerous advantages to the horizontal partitioning of data. Since tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces [[Index (database)|index]] size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables the distribution of a database across a large number of machines, which can significantly improve performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers v. American customers) it may be possible to infer the appropriate shard membership easily and automatically, and to query only the relevant shard.<ref name="Rahul Roy, Shard" >{{cite web |url=http://technoroy.blogspot.com/2008/07/shard-database-design.html |title=Shard - A Database Design |author=Rahul Roy |date=July 28, 2008 }} </ref>

In practice, sharding is complex. Although it has long been implemented through manual coding (especially where rows have an obvious grouping, as in the customer region example above), this approach is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it, and for identifying candidates to be sharded separately. [[Consistent hashing]] is a technique used in sharding to distribute large loads across multiple smaller services and servers.<ref>{{cite web|last=Ries|first=Eric|title=Sharding for Startups|url=http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html}}</ref>

Where [[distributed computing]] is used to separate load between multiple servers (either for performance or reliability reasons), a sharding approach may also be useful. In the 2010s, sharding of [[Execution (computing)|execution]] capacity, as well as the more traditional sharding of [[data availability (cryptography)|data]], emerged as a potential approach to address performance and scalability challenges in [[blockchain]]s.<ref name=acm20191021>{{cite book |chapter-url=https://dl.acm.org/doi/abs/10.1145/3318041.3355457 |date=21 October 2019 |pages=41–61 |last1=Wang|first1=Gang |last2=Shi|first2=Zhijie Jerry |last3=Nixon|first3=Mark |last4=Han|first4=Song |title=Proceedings of the 1st ACM Conference on Advances in Financial Technologies |chapter=SoK |doi=10.1145/3318041.3355457 |isbn=9781450367325 |s2cid=204749727 }}</ref><ref name=FCDS20200718 >{{cite book |url=https://dl.acm.org/doi/abs/10.1145/3318041.3355457 |date=18 July 2020 |pages=114–134 |last1=Yu|first1=Mingchao |last2=Sahraei|first2=Saeid |last3=Nixon|first3=Mark |last4=Han|first4=Song |title=Proceedings of the 1st ACM Conference on Advances in Financial Technologies |chapter=SoK: Sharding on Blockchain |doi=10.1145/3318041.3355457 |isbn=9781450367325 |s2cid=204749727 }}</ref>

Recent academic work has proposed protocols such as Cerberus to address cross-shard [[Atomicity (database systems)|atomicity]] by braiding consensus across multiple shards, allowing transactions to affect multiple partitions simultaneously without requiring a global lock.<ref name="VLDB2021">{{cite journal |last1=Hellings |first1=Jelle |last2=Sadoghi |first2=Mohammad |title=Cerberus: Minimalistic Multi-shard Byzantine-resilient Transaction Processing |journal=Proceedings of the VLDB Endowment |volume=14 |issue=11 |pages=2230–2243 |year=2021 |url=http://www.vldb.org/pvldb/vol14/p2230-hellings.pdf |doi=10.14778/3476249.3476274}}</ref>

== Compared to horizontal partitioning == [[Partition (database)#Partitioning methods|Horizontal partitioning]] splits one or more tables by row, usually within a ''single'' instance of a [[Database schema|schema]] and a database server. It may offer an advantage by reducing index size (and thus search effort), provided there is an obvious, robust, and implicit way to identify in which partition a particular row will be found, without having to first search the index; for example, the classic case of the '<code>CustomersEast</code>' and '<code>CustomersWest</code>' tables, where a [[ZIP code]] already indicates where a row will be found.

Sharding extends this approach. It partitions the relevant table or tables in the same way, but does so across potentially ''multiple'' instances of the schema. An advantage is that the search load for the large partitioned table can be distributed across multiple servers (logical or physical), rather than only across multiple indexes on a same logical server.

Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The expected gains in efficiency would be reduced if querying the database required ''multiple'' instances to be accessed just to retrieve a simple [[dimension table]]. Beyond partitioning, sharding therefore involves distributing large, partitionable tables across servers, while smaller tables are replicated in full on each server.<ref>{{Cite web |title=Database Sharding: Concepts & Examples |url=https://www.mongodb.com/resources/products/capabilities/database-sharding-explained |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref>

This is also why sharding is related to a [[shared-nothing architecture]]—once sharded, each shard can reside in a separate logical schema instance, physical database server, [[data center]], or geographic region. Sharding is intended to minimize the need for cross-shard access by partitioning data across independent shards.<ref>{{cite web|title=Understanding Database Sharding|url=https://www.digitalocean.com/community/tutorials/understanding-database-sharding|website=DigitalOcean Community Tutorials|date=2022-03-16|access-date=2025-10-09|quote=Database shards exemplify a shared-nothing architecture. This means that the shards are autonomous; they don't share any of the same data or resources.}}</ref>

This makes replication across multiple servers easier (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers might otherwise become a bottleneck.<ref>{{Cite web |title=A Guide To Horizontal Vs Vertical Scaling |url=https://www.mongodb.com/resources/basics/horizontal-vs-vertical-scaling |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref>

There is also a requirement for some notification and replication mechanism between schema instances, so that unpartitioned tables remain as closely synchronized as the application requires. This is a complex architectural choice in sharded systems: approaches range from making these tables effectively read-only (with updates that are rare and batched), to dynamically [[replication (computer science)|replicated]] tables (at the cost of reducing some of the distribution benefits of sharding), and many options in between.<ref>{{Cite web |title=Sharding - Database Manual - MongoDB Docs |url=https://www.mongodb.com/docs/manual/sharding/ |access-date=2026-03-20 |website=www.mongodb.com |language=en}}</ref>

== Implementations == * [[Altibase]] provides a combined (client-side and server-side) sharding architecture transparent to client applications. * [[Apache HBase]] supports automatic sharding.<ref>{{Cite web|url=http://hbase.apache.org/|title=Apache HBase – Apache HBase™ Home|website=hbase.apache.org}}</ref> * [https://azure.microsoft.com/en-us/products/category/databases Azure SQL Database Elastic Database] tools support support sharding to enable scaling out and in of an application’s data tier.<ref>{{Cite web|url=https://azure.microsoft.com/en-us/blog/introducing-elastic-scale-preview-for-azure-sql-database/|title=Introducing Elastic Scale preview for Azure SQL Database|website=azure.microsoft.com|date=2 October 2014 }}</ref> * [[ClickHouse]], an open-source OLAP database management system, supports sharding. * [[Couchbase]] supports automatic and transparent sharding. * [[CUBRID]] has supported sharding since version 9.0. * [https://help.sap.com/docs/DB6/4c49a344277943ad91358094fdaf9765/c289a552d161224fe10000000a445394.html Db2 Data Partitioning Feature (MPP)], a shared-nothing database partitioning feature, runs on separate nodes. * [https://www.alibabacloud.com/blog/what-are-the-differences-between-polardb-x-and-drds_601253 DRDS (Distributed Relational Database Service)] of [[Alibaba Cloud]] supports database and table sharding,<ref>{{Cite web|url=https://www.alibabacloud.com/help/doc-detail/29659.htm?spm=a2c63.l28256.a3.1.4eb21d9a8lUMTW|title=Alibaba Cloud Help Center - Cloud Definition and Explanation of Cloud Based Services - Alibaba Cloud|website=www.alibabacloud.com}}</ref> and has been used for large-scale events such as [[Singles' Day]].<ref>{{Cite web|url=https://www.alibabacloud.com/product/drds|title=Focuses on Large-Scale Online Databases - Alibaba Cloud|website=www.alibabacloud.com}}</ref> * [[Elasticsearch]], an enterprise search server, supports sharding.<ref name="Elasticsearch Shard" >{{Cite web|url=https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html|title=Index Shard Allocation | Elasticsearch Guide [7.13] | Elastic|website=www.elastic.co}}</ref> * [[IBM WebSphere eXtreme Scale|eXtreme Scale]] is a cross-process in-memory key/value data store (a [[NoSQL]] data store) that uses sharding to achieve scalability across processes for both data and [[MapReduce]]-style parallel processing.<ref>{{Cite web|url=http://publib.boulder.ibm.com/infocenter/wxsinfo/v7r1/index.jsp?topic=%2Fcom.ibm.websphere.extremescale.over.doc%2|title=IBM Docs}}</ref> * [[Hibernate (Java)|Hibernate]] supports sharding, but has seen little development since 2007.<ref name="Hibernate Shards" >{{cite web |title=Hibernate Shards |url=http://shards.hibernate.org/ |date=2007-02-08 }}</ref><ref name="Hibernate Shards documentation" >{{Cite web|url=http://www.hibernate.org/hib_docs/shards/reference/en/html/|title=Hibernate Shards|access-date=2011-03-30|archive-date=2008-12-16|archive-url=https://web.archive.org/web/20081216005922/http://www.hibernate.org/hib_docs/shards/reference/en/html/|url-status=dead}}</ref> * [[Informix|IBM Informix]] has supported sharding since version 12.1 xC1 as part of the MACH11 technology. Informix 12.10 xC2 added full compatibility with MongoDB drivers, allowing a mix of regular relational tables and NoSQL collections while retaining sharding, fail-over, and ACID properties.<ref name="Informix Grid Queries" >{{cite web |title=New Grid queries for Informix |url=http://ibmdatamag.com/2013/04/informix-12-10-new-grid-queries/ |access-date=2013-10-07 |archive-date=2015-06-10 |archive-url=https://web.archive.org/web/20150610221958/http://ibmdatamag.com/2013/04/informix-12-10-new-grid-queries/ |url-status=dead }}</ref><ref name="NoSQL support in Informix" >{{Cite web|url=https://fr.slideshare.net/journalofinformix/informix-no-sql-sept-2013|title=NoSQL support in Informix (JSON storage, Mongo DB API)|date=September 24, 2013}}</ref> * [[Kdb+]] has supported sharding since version 2.0. * [[MariaDB]] Spider, a storage engine, supports table federation, sharding, XA transactions, and ODBC data sources. It has been included in MariaDB server since version 10.0.4.<ref>{{Cite web |title=Spider |url=https://mariadb.com/kb/en/spider/ |access-date=2022-12-20 |website=MariaDB KnowledgeBase}}</ref> * [[MonetDB]], an open-source [[Column-oriented DBMS|column-store]], introduced read-only sharding in its July 2015 release.<ref>{{cite web | url=https://www.monetdb.org/blog/monetdb-jul2015-released | title= MonetDB July2015 Released | date=31 August 2015}}</ref> * [[MongoDB]] has supported sharding since version 1.6.<ref>{{Cite web |title=MongoDB Sharding |url=https://www.mongodb.com/resources/products/capabilities/sharding |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref> * [[MySQL Cluster]] supports automatic and transparent sharding across commodity nodes, allowing scaling of read and write queries, without requiring application changes.<ref name="MySQL Cluster Features & Benefits" >{{cite web |title=MySQL Cluster Features & Benefits |url=http://www.mysql.com/products/cluster/features.html |date=2012-11-23 }}</ref> * [[MySQL]] Fabric (part of MySQL utilities) supports sharding.<ref name="MySQL Fabric sharding quick start guide">{{Cite web|url=http://dev.mysql.com/doc/mysql-utilities/1.5/en/fabric-quick-start-sharding.html|title=MySQL Fabric sharding quick start guide}}</ref> * [[Oracle Database]] shards since 12c Release 2 and in one liner: Combination of sharding advantages with well-known capabilities of enterprise ready multi-model Oracle Database.<ref name="Oracle 2018">{{cite web |title=Oracle Sharding |website=Oracle |date=2018-05-24 |url=https://www.oracle.com/database/technologies/high-availability/sharding.html |access-date=2021-07-10}}</ref> * [[Oracle NoSQL Database]] supports automatic sharding and elastic, online expansion of clusters. * [[OrientDB]] has supported sharding since version 1.7. * [[Solr]], an enterprise search platform, supports sharding.<ref name="SorlShard" >{{Cite web|url=https://cwiki.apache.org/confluence/display/solr/DistributedSearch|title=DistributedSearch - SOLR - Apache Software Foundation|website=cwiki.apache.org}}</ref> * [[ScyllaDB]] uses per-core sharding within a server and across all nodes in a cluster. * [[Spanner (database)|Spanner]], a distributed database developed by Google, shards across multiple [[Paxos (computer science)|Paxos]] state machines to scale to large numbers of machines, data centers, and rows.<ref name="Spanner" >{{cite web | first1 = James C | last1 = Corbett | first2 = Jeffrey | last2 = Dean | first3 = Michael | last3 = Epstein | first4 = Andrew | last4 = Fikes | first5 = Christopher | last5 = Frost | first6 = JJ | last6 = Furman | first7 = Sanjay | last7 = Ghemawat | first8 = Andrey | last8 = Gubarev | first9 = Christopher | last9 = Heiser | first10 = Peter | last10 = Hochschild | first11 = Wilson | last11 = Hsieh | first12 = Sebastian | last12 = Kanthak | first13 = Eugene | last13 = Kogan | first14 = Hongyi | last14 = Li | first15 = Alexander | last15 = Lloyd | first16 = Sergey | last16 = Melnik | first17 = David | last17 = Mwaura | first18 = David | last18 = Nagle | first19 = Sean | last19 = Quinlan | first20 = Rajesh | last20 = Rao | first21 = Lindsay | last21 = Rolig | first22 = Yasushi | last22 = Saito | first23 = Michal | last23 = Szymaniak | first24 = Christopher | last24 = Taylor | first25 = Ruth | last25 = Wang | first26 = Dale | last26 = Woodford | title = Spanner: Google's Globally-Distributed Database | url = http://research.google.com/archive/spanner-osdi2012.pdf | work = Proceedings of OSDI 2012 | access-date =24 February 2014}}</ref> * [[SQLAlchemy|SQLAlchemy ORM]], a data-mapper for the [[Python (programming language)|Python programming language]] shards.<ref name="SQLAlchemy" >{{Cite web|url=https://github.com/sqlalchemy/sqlalchemy|title=sqlalchemy/sqlalchemy|date=July 9, 2021|via=GitHub}}</ref> * [[SQL Server (disambiguation)|SQL Server]] has supported sharding since SQL Server 2005 with the use of 3rd party tools.<ref name="SQLServer" >{{Cite web|url=https://www.infoq.com/news/2011/02/SQL-Sharding/|title=Partitioning and Sharding Options for SQL Server and SQL Azure|website=infoq.com}}</ref> * [[Teradata]] markets a massive parallel database management system as a [[data warehouse]]. *[https://www.ndss-symposium.org/ndss-paper/vault-fast-bootstrapping-for-the-algorand-cryptocurrency/ Vault], a [[cryptocurrency]] design, uses sharding to reduce the data required to join the network and verify transactions, improving scalability.<ref>{{Cite web|url=https://news.mit.edu/2019/vault-faster-more-efficient-cryptocurrency-0124|title=A faster, more efficient cryptocurrency|website=MIT News|date=24 January 2019 |access-date=2019-01-30}}</ref> *[[Cloud Native Computing Foundation#Vitess|Vitess]], an open-source database clustering system, supports sharding for MySQL and is a [[Cloud Native Computing Foundation]] project.<ref name="Vitess">{{Cite web|url=https://vitess.io/|title=Vitess|website=vitess.io}}</ref> *[https://shardingsphere.apache.org ShardingSphere] is a database clustering system that provides data sharding, distributed transactions, and distributed database management, and is an [[Apache Software Foundation]] (ASF) project.<ref name="ShardingSphere">{{Cite web|url=https://shardingsphere.apache.org/|title=ShardingSphere|website=shardingsphere.apache.org}}</ref>

== Disadvantages == Sharding a database table before it has been optimized locally can introduce unnecessary complexity. Sharding is generally recommended when other optimization strategies have proven insufficient.<ref>{{Cite book |last=Kleppmann |first=Martin |title=Designing Data-Intensive Applications |year=2017 |isbn=9781449373320}}</ref> The added complexity of database sharding can result in several potential challenges.<ref>{{Cite web |title=Database Sharding: Concepts & Examples |url=https://www.mongodb.com/resources/products/capabilities/database-sharding-explained |access-date=2026-03-20 |website=MongoDB |language=en-us}}</ref> * SQL complexity: Developers may need to write more complex SQL queries to handle sharding logic * Additional software requirements: Software that partitions, balances, coordinates, and maintains data integrity can fail or introduce errors. * [[Single point of failure]]: Corruption or failure of one shard due to network, hardware, or system issues can affect the integrity of the entire dataset. * [[Fail-over]] server complexity: Fail-over servers must maintain copies of all database shards. * [[Backup]]s complexity: Database backups of the individual shards must be coordinated with the backups of the other shards. * Operational complexity: Tasks such as adding or removing indexes, modifying columns, or altering the schema become more difficult in a sharded environment.

== Etymology == In a database context, the term "shard" is believed to derive from one of two sources: [[Computer Corporation of America]]'s "A System for Highly Available Replicated Data,"<ref>Sarin, DeWitt & Rosenberg, ''Overview of SHARD: A System for Highly Available Replicated Data'', Technical Report CCA-88-01, Computer Corporation of America, May 1988</ref> which used redundant hardware to facilitate data replication rather than horizontal partitioning, or the 1997 [[MMORPG]] ''[[Ultima Online]]''.<ref name="koster">{{cite web|url=http://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/|title=Database "sharding" came from UO?|last1=Koster|first1=Raph|date=2009-01-08|website=Raph Koster's Website|access-date=2015-01-17}}</ref><ref name="garriott" />

[[Richard Garriott]], creator of ''Ultima Online'', recalled that the term originated during the production of the game, specifically in creating a self-regulating, virtual ecology system. Players were able to interact and harvest in-game resources via the internet, which disrupted the balance of the system.<ref name="garriott">{{Cite web|url=https://www.youtube.com/watch?v=KFNxJVTJleE |title=Ultima Online: The Virtual Ecology | War Stories|website=Ars Technica Videos|date=21 December 2017 }}</ref> To address this, the development team separated the global player base into multiple sessions and introduced part of ''Ultima Online''{{'s}} fictional connection to the end of ''[[Ultima I: The First Age of Darkness]]'', where the defeat of its antagonist [[Mondain]] also led to the creation of [[multiverse]] "shards." This modification provided Garriott's team with the fictional basis needed to justify creating copies of the virtual environment. The feature was later removed after several months of testing.<ref name="garriott" />

== See also == * [[Block Range Index]] * [[Shared-nothing architecture]]

== Notes == {{Notelist}}

== References == {{Reflist}}

== External links == * [https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.json.doc/ids_json_011.htm Informix JSON data sharding]

[[Category:Data partitioning]] [[Category:Database management systems]] [[Category:Software design patterns]]

[[de:Denormalisierung#Fragmentierung]]