Database Sharding: Horizontal Partitioning Guide
After this topic, you will be able to:
- Compare horizontal vs vertical sharding strategies and their use cases
- Evaluate shard key selection criteria for balanced data distribution
- Assess consistent hashing benefits for dynamic shard rebalancing
TL;DR
Sharding horizontally partitions data across multiple database instances, with each shard holding a subset of the total dataset. Unlike replication where every node has all data, sharding distributes data so no single database becomes a bottleneck. This enables linear scalability for both storage and throughput, making it the primary strategy for systems that have outgrown vertical scaling limits.
Cheat Sheet: Horizontal sharding splits rows across databases using a shard key (user_id % N, hash-based, range-based). Vertical sharding splits tables by domain (users DB, orders DB). Consistent hashing enables dynamic resharding without full data migration. Cross-shard queries require application-level joins or scatter-gather patterns.
The Problem It Solves
When your database reaches the limits of vertical scaling—maxed-out CPU, memory, or IOPS on the largest available instance—you face a fundamental constraint: a single database server can only handle so much load. Instagram hit this wall at 25 million users when their primary PostgreSQL instance couldn’t keep up with write traffic despite aggressive caching and read replicas. Adding more read replicas doesn’t help write bottlenecks, and under heavy write load replication lag grows until replicas serve unacceptably stale reads.
The core problem is that traditional databases are designed around single-node ACID guarantees, which creates an architectural ceiling. You can’t simply add another database and expect your application to magically distribute load—someone needs to decide which data lives where, how to route queries, and what happens when you need to add capacity. Without sharding, your only options are to denormalize aggressively (sacrificing data integrity), cache everything (creating consistency nightmares), or accept that your system has a hard growth limit. For companies experiencing hypergrowth, this isn’t acceptable.
Sharding solves this by breaking the single-database assumption. Instead of one massive database doing everything, you distribute data across multiple independent databases (shards), each responsible for a subset of your data. This transforms your scaling problem from “how do I make this database bigger?” to “how do I add more databases?”—a problem with a clear solution path.
Solution Overview
Sharding partitions your dataset across multiple database instances using a deterministic routing strategy. Each shard is a fully functional database that stores only a portion of the total data, typically determined by a shard key—a field in your data model that determines which shard owns each record.
The fundamental insight is that most queries in real-world applications are scoped to a specific entity (a user, a tenant, a geographic region). If you partition data by that entity, most queries hit only one shard, preserving the performance characteristics of a single database while distributing the total load. A user viewing their Instagram feed only needs to query the shard containing their data, not all 1000+ shards in Instagram’s infrastructure.
Sharding operates at the application or middleware layer, not within the database itself. Your application code (or a proxy like Vitess or Citus) calculates which shard to query based on the shard key, routes the request to the appropriate database, and returns results. This means sharding is transparent to the database engine—each shard is just a normal PostgreSQL or MySQL instance that happens to store a subset of data. The complexity lives in the routing logic, shard key selection, and handling edge cases like cross-shard queries and resharding operations.
Unlike federation (see Federation for functional partitioning), which splits databases by business domain, sharding splits within a domain by data volume. You might have a single “users” table that’s sharded across 100 databases, with each database storing 1% of users.
Sharding Architecture: Hash-Based Distribution
graph LR
Client["Client Application"]
Router["Shard Router<br/><i>Calculates: hash(user_id) % 4</i>"]
subgraph Shard 0
DB0[("PostgreSQL<br/>Shard 0<br/><i>user_id % 4 = 0</i>")]
end
subgraph Shard 1
DB1[("PostgreSQL<br/>Shard 1<br/><i>user_id % 4 = 1</i>")]
end
subgraph Shard 2
DB2[("PostgreSQL<br/>Shard 2<br/><i>user_id % 4 = 2</i>")]
end
subgraph Shard 3
DB3[("PostgreSQL<br/>Shard 3<br/><i>user_id % 4 = 3</i>")]
end
Client --"1. Query: user_id=12345"--> Router
Router --"2. Route to Shard 1<br/>(12345 % 4 = 1)"--> DB1
Router -."Other users".-> DB0
Router -."Other users".-> DB2
Router -."Other users".-> DB3
Hash-based sharding distributes data across multiple independent databases using a deterministic function. Each shard stores only records where hash(shard_key) % num_shards equals the shard number, ensuring even distribution and consistent routing.
How It Works
Step 1: Choose a Shard Key
The shard key determines data distribution and query routing. For Instagram, user_id is the natural choice—every photo, like, and comment is associated with a user, so partitioning by user keeps related data together. The shard key must appear in most queries (otherwise you’ll need cross-shard queries) and should distribute data evenly (otherwise you get hot shards).
Step 2: Implement a Sharding Function
The sharding function maps shard keys to physical shards. The simplest approach is modulo sharding: shard_id = hash(user_id) % num_shards. If you have 100 shards, user_id 12345 might hash to shard 67. This is deterministic—the same user_id always routes to the same shard—but inflexible. Adding shards requires rehashing most data.
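A minimal sketch of modulo sharding in Python. MD5 is used here only because it is a stable hash across processes and machines (Python’s built-in `hash()` is salted per run and must not be used for routing):

```python
import hashlib

def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user_id to a shard deterministically via a stable hash."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Deterministic: the same key always lands on the same shard.
assert shard_for(12345, 100) == shard_for(12345, 100)
```

The weakness is visible in the formula itself: changing `num_shards` changes the result for most keys, which is exactly why plain modulo makes resharding painful.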
Consistent hashing solves this by mapping both shard keys and shards onto a hash ring (0 to 2^32). Each shard owns a range of the ring, and a key is assigned to the first shard clockwise from its hash position. Adding a new shard only requires moving data from adjacent shards, not a full reshuffle. Virtual nodes (multiple ring positions per physical shard) improve distribution balance.
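A minimal consistent-hash ring with virtual nodes might look like the following sketch (not a production implementation; vnode count and hash choice are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: shards get many virtual nodes on a 2^32 ring."""

    def __init__(self, shards, vnodes_per_shard=100):
        points = []
        for shard in shards:
            for i in range(vnodes_per_shard):
                # Each virtual node gets its own ring position.
                points.append((self._hash(f"{shard}#vnode{i}"), shard))
        points.sort()
        self._positions = [pos for pos, _ in points]
        self._owners = [shard for _, shard in points]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def shard_for(self, key) -> str:
        """Assign key to the first virtual node clockwise from its position."""
        pos = self._hash(str(key))
        idx = bisect.bisect_left(self._positions, pos)
        if idx == len(self._positions):
            idx = 0  # wrap around the ring
        return self._owners[idx]
```

Adding a shard to this ring only claims slices adjacent to its new virtual nodes, so most keys keep their old owner—the property that makes resharding cheap.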
Step 3: Route Queries
When a query arrives, extract the shard key and calculate the target shard. For SELECT * FROM posts WHERE user_id = 12345, hash 12345 to determine shard 67, then execute the query against that shard’s database. This requires application-level routing logic or a sharding proxy. Uber built Schemaless, a middleware layer that handles routing transparently, so application code queries a logical “users” table without knowing about physical shards.
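A minimal routing sketch, assuming a static shard map held in application config; the `db-shard-N.internal` DSNs are made-up placeholders, and a real router would return a pooled connection rather than a string:

```python
import hashlib

# Hypothetical shard map: shard id -> connection string.
SHARD_DSNS = {
    0: "postgresql://db-shard-0.internal/app",
    1: "postgresql://db-shard-1.internal/app",
    2: "postgresql://db-shard-2.internal/app",
    3: "postgresql://db-shard-3.internal/app",
}

def route(user_id: int) -> str:
    """Return the DSN of the shard that owns this user's rows."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]
```

Application code would then run `SELECT * FROM posts WHERE user_id = %s` against the connection for `route(user_id)`, never touching the other shards.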
Step 4: Handle Cross-Shard Queries
Queries without a shard key (e.g., SELECT * FROM posts WHERE created_at > '2024-01-01') require scatter-gather: send the query to all shards, collect results, and merge in the application layer. This is expensive—100 shards means 100 database queries—so you optimize by avoiding these queries in hot paths, using secondary indexes (like Elasticsearch) for search, or denormalizing data.
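The scatter-gather flow can be sketched as below; the per-shard callables stand in for real SQL execution against each shard’s connection:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, run_query, sort_key):
    """Fan a shard-key-less query out to every shard, then merge in memory."""
    # Scatter: hit all shards in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(run_query, shards))
    # Gather: flatten partial result sets and sort in the application layer.
    merged = [row for part in partials for row in part]
    merged.sort(key=sort_key)
    return merged
```

Note that the sort happens in application memory—with 100 shards each returning thousands of rows, this merge step itself becomes a scaling concern.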
Step 5: Manage Shard Rebalancing
As data grows unevenly, some shards fill up faster than others. Rebalancing moves data between shards to restore balance. This is complex because you can’t afford downtime. Pinterest uses a dual-write strategy during resharding: write to both old and new shards, gradually migrate reads, then cut over once data is synchronized. See the Resharding Strategies section for details.
Consistent Hashing Ring with Virtual Nodes
graph TB
subgraph Hash Ring: 0 to 2^32
Ring["<b>Consistent Hash Ring</b><br/>Keys and shards mapped to ring positions"]
subgraph Physical Shard A
VA1["Virtual Node A1<br/><i>Position: 100M</i>"]
VA2["Virtual Node A2<br/><i>Position: 1.2B</i>"]
VA3["Virtual Node A3<br/><i>Position: 3.8B</i>"]
end
subgraph Physical Shard B
VB1["Virtual Node B1<br/><i>Position: 500M</i>"]
VB2["Virtual Node B2<br/><i>Position: 2.1B</i>"]
VB3["Virtual Node B3<br/><i>Position: 3.2B</i>"]
end
subgraph Physical Shard C
VC1["Virtual Node C1<br/><i>Position: 800M</i>"]
VC2["Virtual Node C2<br/><i>Position: 1.9B</i>"]
VC3["Virtual Node C3<br/><i>Position: 2.7B</i>"]
end
Key1["Key: user_id=12345<br/>Hash: 450M"]
Key2["Key: user_id=67890<br/>Hash: 2.5B"]
end
Key1 --"Clockwise to next node"--> VB1
Key2 --"Clockwise to next node"--> VC3
VB1 -."Belongs to".-> Physical Shard B
VC3 -."Belongs to".-> Physical Shard C
Note["<b>Adding Shard D:</b><br/>Only moves data from<br/>adjacent virtual nodes,<br/>not entire resharding"]
Consistent hashing maps both keys and shards onto a ring (0 to 2^32). Each physical shard has multiple virtual nodes for better distribution. Keys are assigned to the first virtual node clockwise from their hash position, minimizing data movement when adding or removing shards.
Cross-Shard Query: Scatter-Gather Pattern
sequenceDiagram
participant Client
participant Router
participant Shard0
participant Shard1
participant Shard2
participant Shard3
Client->>Router: SELECT * FROM posts<br/>WHERE created_at > '2024-01-01'<br/>(No shard key!)
Note over Router: Query lacks shard key<br/>Must query ALL shards
par Scatter to all shards
Router->>Shard0: Execute query
Router->>Shard1: Execute query
Router->>Shard2: Execute query
Router->>Shard3: Execute query
end
par Gather results
Shard0-->>Router: 250 rows
Shard1-->>Router: 180 rows
Shard2-->>Router: 320 rows
Shard3-->>Router: 210 rows
end
Note over Router: Merge & sort results<br/>in application memory
Router-->>Client: 960 total rows<br/>(sorted by created_at)
Note over Client,Shard3: Cost: 4 database queries + application merge<br/>vs 1 query in non-sharded system
Cross-shard queries without a shard key require scatter-gather: the router sends the query to all shards in parallel, collects results, and merges them in application memory. This is 10-100x more expensive than single-shard queries and should be avoided in hot paths.
Resharding Strategies
Resharding—adding or removing shards and redistributing data—is the hardest operational challenge in sharded systems. You can’t simply stop the database, move data, and restart; production systems require zero-downtime migrations.
Dual-Write Migration
The safest approach is to write to both old and new shard configurations simultaneously during migration. When Pinterest added shards to their MySQL cluster, they: (1) Started writing new data to both old and new shards, (2) Backfilled historical data from old to new shards in the background, (3) Verified data consistency between old and new, (4) Switched reads to the new configuration, (5) Stopped writing to old shards and decommissioned them. This took weeks but ensured zero data loss and no downtime.
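A toy sketch of the phase-1 dual-write wrapper, with dict-backed stand-ins for real shard clusters; class and method names are illustrative, not Pinterest’s actual code:

```python
class DictCluster:
    """In-memory stand-in for a shard cluster (sketch only)."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
    def get(self, key):
        return self.data.get(key)

class DualWriteRepository:
    """Phase 1 of resharding: write to both configs, read from the old one."""

    def __init__(self, old_cluster, new_cluster):
        self.old = old_cluster   # current source of truth
        self.new = new_cluster   # being backfilled and validated

    def write(self, key, value):
        self.old.put(key, value)  # old cluster stays authoritative
        try:
            self.new.put(key, value)
        except Exception:
            pass  # a real system would enqueue the key for later repair

    def read(self, key):
        return self.old.get(key)  # reads cut over only in phase 4
```

The asymmetry is deliberate: a failed write to the new cluster is repaired later rather than failing the user’s request, because the old cluster remains the source of truth until cutover.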
Shadow Traffic
Before cutting over to a new shard configuration, send a copy of production traffic to the new shards without serving results to users. This validates that the new configuration handles load correctly and catches bugs before they impact users. Uber used shadow traffic extensively when migrating from a single MySQL instance to a sharded architecture, running both systems in parallel for months.
Range-Based Resharding
If you use range-based sharding (e.g., users A-M on shard 1, N-Z on shard 2), resharding is simpler—you can split ranges without moving data from unaffected ranges. Adding a shard for users A-F only requires moving data from the old A-M shard, not touching N-Z. However, range-based sharding often creates hot spots (everyone wants usernames starting with ‘A’), so it’s less common than hash-based sharding.
Virtual Shards
Allocate far more logical shards than physical databases (e.g., 10,000 logical shards mapped to 100 physical databases). Each physical database hosts 100 logical shards. When you add a physical database, move some logical shards to it without changing the shard key mapping. This makes resharding a configuration change rather than a data migration. Instagram uses this approach to rebalance load without rehashing user IDs.
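A sketch of the two-level mapping, assuming MD5 as the stable hash; the modulo placement of logical shards onto physical databases is a simplification—real systems hold this mapping in mutable configuration so individual logical shards can be moved:

```python
import hashlib

NUM_LOGICAL_SHARDS = 10_000  # fixed forever; baked into the key mapping

def logical_shard_for(user_id: int) -> int:
    """Immutable mapping: key -> logical shard. Never changes after launch."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_LOGICAL_SHARDS

def physical_db_for(logical_shard: int, num_physical: int) -> int:
    """Mutable mapping: logical shard -> physical database.

    Rebalancing changes this placement (and copies the affected logical
    shards) without ever rehashing a single key.
    """
    return logical_shard % num_physical  # simplistic placement for the sketch
```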
Zero-Downtime Resharding with Dual-Write
graph TB
subgraph Phase 1: Dual Write
App1["Application"]
Old1[("Old Shard Config<br/><i>4 shards</i>")]
New1[("New Shard Config<br/><i>8 shards</i>")]
App1 --"1. Write to both"--> Old1
App1 --"1. Write to both"--> New1
App1 --"2. Read from old"--> Old1
end
subgraph Phase 2: Backfill
Worker["Background Worker"]
Old2[("Old Shards<br/><i>Historical data</i>")]
New2[("New Shards<br/><i>Receiving backfill</i>")]
Worker --"Copy historical data"--> Old2
Worker --"Write to new"--> New2
end
subgraph Phase 3: Validation
Validator["Consistency Checker"]
Old3[("Old Shards")]
New3[("New Shards")]
Validator --"Compare data"--> Old3
Validator --"Compare data"--> New3
Validator --"Verify: 100% match"--> Result["✓ Data Consistent"]
end
subgraph Phase 4: Cutover
App4["Application"]
Old4[("Old Shards<br/><i>Deprecated</i>")]
New4[("New Shards<br/><i>Primary</i>")]
App4 --"3. Switch reads to new"--> New4
App4 -."4. Stop writing to old".-> Old4
end
Phase 1 --> Phase 2
Phase 2 --> Phase 3
Phase 3 --> Phase 4
Note1["Duration: Weeks to months<br/>Zero downtime, zero data loss"]
Zero-downtime resharding uses a four-phase dual-write strategy: (1) write to both old and new configurations, (2) backfill historical data in background, (3) validate consistency, (4) switch reads to new configuration and decommission old shards. This process takes weeks but ensures no data loss or service interruption.
Variants
Hash-Based Sharding
Use a hash function (such as MD5 or MurmurHash), optionally combined with consistent hashing, to map shard keys to shards. This distributes data uniformly and prevents hot spots, but makes range queries impossible (you can’t query “all users with IDs between 1000 and 2000” without hitting all shards). Use when data distribution is more important than range query support. Instagram uses hash-based sharding on user_id because they rarely need user ID range queries.
Range-Based Sharding
Assign each shard a contiguous range of shard key values (shard 1: IDs 1 to 1M, shard 2: 1M+1 to 2M). This enables efficient range queries and makes resharding easier (split ranges without moving unaffected data), but risks hot spots if data isn’t uniformly distributed. Use when range queries are common and you can predict data distribution. Time-series databases often use range-based sharding on timestamps because recent data is queried more frequently.
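Range lookup reduces to a binary search over sorted range boundaries; a sketch using the 1M-wide ranges from the comparison diagram:

```python
import bisect

# Upper bound of each shard's range (inclusive), in ascending order.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000, 4_000_000]

def range_shard_for(user_id: int) -> int:
    """Return the index of the shard whose range contains user_id."""
    idx = bisect.bisect_left(RANGE_BOUNDS, user_id)
    if idx == len(RANGE_BOUNDS):
        raise KeyError(f"user_id {user_id} falls outside all shard ranges")
    return idx
```

Because the boundaries are just data, splitting a hot shard means inserting a new bound and migrating only that one range.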
Geographic Sharding
Partition data by geographic region (US-East shard, EU shard, Asia shard). This reduces latency by keeping data close to users and satisfies data residency regulations (GDPR requires EU user data stay in EU). However, it creates uneven load distribution (US shard might be 10x larger than others) and complicates cross-region features. Use when latency or compliance requirements outweigh operational complexity. Uber shards by city because most rides are local.
Entity-Based Sharding (Vertical Sharding)
Split different tables or table groups into separate databases. Users and profiles go to one database, orders and payments to another. This is technically federation (see Federation), but the term “vertical sharding” is sometimes used. Unlike horizontal sharding, this doesn’t help when a single table grows too large—you still need horizontal sharding within each vertical partition.
Sharding Variants Comparison
graph TB
subgraph Hash-Based Sharding
H_Client["Query: user_id=12345"]
H_Router["hash(12345) % 4 = 1"]
H_S0[("Shard 0")]
H_S1[("Shard 1<br/><b>Target</b>")]
H_S2[("Shard 2")]
H_S3[("Shard 3")]
H_Client --> H_Router
H_Router --> H_S1
H_Router -.-> H_S0
H_Router -.-> H_S2
H_Router -.-> H_S3
H_Note["✓ Even distribution<br/>✗ No range queries<br/>✗ Hard to reshard"]
end
subgraph Range-Based Sharding
R_Client["Query: user_id=1,500,000"]
R_Router["Find range: 1M-2M"]
R_S0[("Shard 0<br/><i>1-1M</i>")]
R_S1[("Shard 1<br/><i>1M-2M</i><br/><b>Target</b>")]
R_S2[("Shard 2<br/><i>2M-3M</i>")]
R_S3[("Shard 3<br/><i>3M-4M</i>")]
R_Client --> R_Router
R_Router --> R_S1
R_Router -.-> R_S0
R_Router -.-> R_S2
R_Router -.-> R_S3
R_Note["✓ Range queries<br/>✓ Easy to reshard<br/>✗ Hot spots"]
end
subgraph Geographic Sharding
G_Client["Query: user in NYC"]
G_Router["Route by location"]
G_US[("US Shard<br/><i>New York</i><br/><b>Target</b>")]
G_EU[("EU Shard<br/><i>London</i>")]
G_ASIA[("Asia Shard<br/><i>Tokyo</i>")]
G_Client --> G_Router
G_Router --> G_US
G_Router -.-> G_EU
G_Router -.-> G_ASIA
G_Note["✓ Low latency<br/>✓ Data residency<br/>✗ Uneven load"]
end
Hash-Based Sharding -."Use case: Social media".-> H_Note
Range-Based Sharding -."Use case: Time-series".-> R_Note
Geographic Sharding -."Use case: Ride-sharing".-> G_Note
Three main sharding variants: Hash-based provides even distribution but no range queries; Range-based enables range queries but risks hot spots; Geographic reduces latency and satisfies compliance but creates uneven load. Choose based on query patterns and business requirements.
Trade-offs
Scalability vs Complexity
Sharding enables near-linear horizontal scalability—Instagram scaled from 1 to 1000+ shards over several years. However, you trade operational simplicity for this scalability. Every operational task (backups, schema migrations, monitoring) now applies to N shards instead of one database. Uber’s database team maintains custom tooling to deploy schema changes across thousands of shards without causing downtime.
Single-Shard Performance vs Cross-Shard Queries
Queries scoped to a single shard perform identically to a non-sharded database. But cross-shard queries (joins across shards, aggregations without a shard key) become application-level operations that are 10-100x slower. Choose your shard key carefully to minimize cross-shard queries. If 90% of queries are single-shard, sharding works well. If 50% require cross-shard operations, consider alternatives.
Consistent Hashing vs Simple Modulo
Consistent hashing minimizes data movement during resharding—adding a shard only affects adjacent ranges on the hash ring. Simple modulo (shard = key % N) is easier to implement but requires rehashing most data when N changes. Use consistent hashing if you expect to add shards frequently (hypergrowth startups). Use modulo if your shard count is stable and you want simpler code.
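The difference is easy to measure. With plain modulo, going from 4 to 5 shards remaps every key whose hash has a different residue mod 4 than mod 5—that is 16 of every 20 hash values, so roughly 80% of keys move, versus roughly 1/5 under consistent hashing. A sketch that measures the modulo case:

```python
import hashlib

def stable_hash(key) -> int:
    """Process-stable hash (Python's built-in hash() is salted per run)."""
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

def moved_by_modulo(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when going from old_n to new_n."""
    moved = sum(1 for k in keys
                if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(list(keys))
```

Running `moved_by_modulo(range(10_000), 4, 5)` yields a fraction close to 0.8—most of the dataset would have to migrate.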
Shard Key Immutability vs Flexibility
Once chosen, the shard key is nearly impossible to change—it would require migrating all data. Instagram can’t switch from user_id to geographic sharding without a multi-year migration. Choose a shard key that will remain relevant as your product evolves. Avoid business-logic fields that might change (user’s country can change; user_id cannot).
When to Use (and When Not To)
Use Sharding When:
Your database has outgrown vertical scaling and you need to distribute write load across multiple databases. If you’re hitting CPU, memory, or IOPS limits on the largest available instance, and read replicas don’t help (because writes are the bottleneck), sharding is the solution. Instagram sharded at 25M users; Twitter sharded earlier due to write-heavy workloads.
Your data model has a natural partition key that appears in most queries. If 90% of queries include user_id, tenant_id, or another high-cardinality field, sharding on that field keeps most queries single-shard. Multi-tenant SaaS applications are ideal for sharding by tenant_id.
You have engineering resources to build and maintain sharding infrastructure. Sharding isn’t a library you import—it’s a fundamental architectural change requiring custom routing logic, operational tooling, and ongoing maintenance. Startups with 5-person teams should avoid sharding until absolutely necessary.
Avoid Sharding When:
Your dataset fits comfortably on a single large instance. Modern databases can handle terabytes of data and hundreds of thousands of queries per second on a single node. Don’t shard prematurely—vertical scaling plus read replicas (see Replication (RDBMS)) can take you surprisingly far.
Your queries frequently need to join across what would become different shards. If your application requires complex joins across users, orders, and products, and these entities would live on different shards, you’ll spend all your time doing expensive cross-shard queries. Consider denormalization or a different data model first.
You need strong ACID guarantees across all data. Sharding breaks single-database transactions—you can’t have an ACID transaction that spans multiple shards without distributed transaction protocols (which are slow and complex). If your business logic requires atomic updates across entities that would live on different shards, sharding creates more problems than it solves.
Real-World Examples
Instagram (User-Based Sharding)
Instagram shards PostgreSQL by user_id using consistent hashing with 10,000 logical shards mapped to hundreds of physical databases. Each user’s photos, likes, and comments live on the same shard, so viewing a profile requires only one database query. When they need to add capacity, they move logical shards between physical databases without changing the user_id → logical shard mapping. Interesting detail: They use a custom ID generation scheme that embeds the shard ID in the photo ID, so they can route to the correct shard without a lookup table.
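A sketch of an ID scheme in the spirit of the one Instagram described: pack the shard identifier into the ID itself so routing needs no lookup. The bit layout (41 bits of time, 13 bits of shard, 10 bits of sequence) follows their published write-up, but the epoch constant here is illustrative, not their actual value:

```python
SHARD_BITS = 13   # up to 8192 logical shards
SEQ_BITS = 10     # up to 1024 IDs per shard per millisecond
CUSTOM_EPOCH_MS = 1_300_000_000_000  # illustrative epoch

def make_photo_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack milliseconds-since-epoch, shard id, and a sequence into one int."""
    return ((now_ms - CUSTOM_EPOCH_MS) << (SHARD_BITS + SEQ_BITS)) \
        | (shard_id << SEQ_BITS) \
        | (seq % (1 << SEQ_BITS))

def shard_of(photo_id: int) -> int:
    """Recover the owning shard from the ID alone—no lookup table needed."""
    return (photo_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```

A nice side effect of putting the timestamp in the high bits is that IDs sort roughly by creation time.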
Uber (Geographic Sharding)
Uber shards by city because most queries are geographically scoped—riders and drivers in San Francisco don’t interact with those in New York. Each city’s data lives on a dedicated shard (or set of shards for large cities). This reduces latency (shards are in the same region as users) and provides fault isolation (if the New York shard fails, San Francisco keeps working). Cross-city features like viewing ride history while traveling require cross-shard queries, which they handle with a scatter-gather pattern.
Pinterest (Resharding with Dual-Write)
Pinterest started with a single MySQL instance, then sharded by user_id as they grew. When individual shards became overloaded, they resharded by splitting each shard into multiple smaller shards. During resharding, they wrote to both old and new shards simultaneously, backfilled historical data in the background, and gradually migrated reads once data was consistent. The entire process took months and required careful coordination to avoid data loss or inconsistency. They documented this as one of their hardest engineering challenges.
Interview Essentials
Mid-Level
Explain the difference between horizontal and vertical sharding with examples. Describe how a hash-based sharding function works (shard = hash(key) % N) and why it distributes data evenly. Walk through routing a query in a sharded system: extract shard key, calculate shard, execute query. Discuss the challenge of cross-shard queries and why they’re expensive (scatter-gather across all shards). Demonstrate understanding that sharding is a last resort after exhausting vertical scaling and read replicas.
Senior
Compare hash-based vs range-based vs consistent hashing for shard key distribution, including trade-offs (uniform distribution vs range queries vs resharding complexity). Explain shard key selection criteria: high cardinality, appears in most queries, immutable, evenly distributed. Discuss hot shard problems (celebrity users, seasonal data) and mitigation strategies (virtual nodes, splitting hot shards). Describe a resharding strategy with zero downtime (dual-write, shadow traffic, gradual cutover). Explain when NOT to shard (premature optimization, complex joins, strong ACID requirements).
Staff+
Design a complete sharding strategy for a specific system (e.g., social network, e-commerce platform), including shard key selection, number of shards, routing architecture, and growth plan. Discuss the operational implications: schema migration across shards, backup/restore strategies, monitoring and alerting per shard, handling shard failures. Explain how to handle cross-shard transactions (avoid them, use saga pattern, or accept eventual consistency). Compare building custom sharding vs using a sharding-aware database (Vitess, Citus, CockroachDB). Discuss the organizational impact: how sharding affects team structure, on-call burden, and development velocity.
Common Interview Questions
How do you choose a shard key? (Look for: high cardinality, appears in most queries, immutable, even distribution. Red flag: choosing a low-cardinality field like country or a mutable field like user’s current city.)
What happens when you need to add more shards? (Look for: consistent hashing or virtual shards to minimize data movement, dual-write strategy during migration, discussion of downtime vs complexity trade-offs.)
How do you handle queries that don’t include the shard key? (Look for: scatter-gather pattern, secondary indexes in a separate system like Elasticsearch, denormalization to include shard key, or acknowledgment that these queries are expensive and should be avoided in hot paths.)
What’s the difference between sharding and replication? (Look for: sharding distributes different data across nodes for scalability; replication copies the same data for availability and read scaling. They’re complementary—you typically replicate within each shard.)
Red Flags to Avoid
Suggesting sharding as a first solution without considering simpler alternatives (vertical scaling, read replicas, caching)
Choosing a shard key without considering query patterns (e.g., sharding by timestamp when most queries are by user)
Not discussing cross-shard query challenges or assuming joins across shards are easy
Ignoring operational complexity (how do you deploy schema changes? how do you back up 1000 shards?)
Confusing sharding with replication or federation—these are different patterns with different purposes
Key Takeaways
Sharding horizontally partitions data across multiple databases using a shard key, enabling linear scalability for both storage and throughput. Unlike replication (which copies data for availability), sharding distributes different data to different nodes.
Shard key selection is the most critical decision: choose a high-cardinality, immutable field that appears in most queries and distributes data evenly. Poor shard key choices (low cardinality, mutable, uneven distribution) create hot shards and expensive cross-shard queries.
Consistent hashing minimizes data movement during resharding by mapping keys and shards onto a hash ring. Adding a shard only requires moving data from adjacent ranges, not rehashing the entire dataset. Virtual nodes improve load distribution.
Cross-shard queries (joins, aggregations without shard key) require scatter-gather across all shards and are 10-100x slower than single-shard queries. Design your data model and query patterns to minimize cross-shard operations, or use secondary indexes for search.
Sharding is operationally complex and should be a last resort after exhausting vertical scaling, read replicas, and caching. It requires custom routing logic, careful resharding strategies (dual-write, shadow traffic), and significantly increases operational burden (schema migrations, backups, monitoring across N shards).