Database Sharding Manager
Overview
Implement and manage horizontal database sharding strategies across PostgreSQL, MySQL, and MongoDB. This skill covers shard key selection, data distribution analysis, cross-shard query routing, and rebalancing operations for databases that have outgrown single-node capacity.
Prerequisites
- Database admin credentials with CREATE DATABASE, CREATE TABLE, and replication permissions
psql, mysql, or mongosh CLI tools installed and configured
- Network connectivity between all shard nodes
- Current table sizes and growth rate data (query
pgtotalrelationsize or informationschema.TABLES)
- Application query patterns documented or access to slow query logs
- Enough disk and memory on target shard nodes to handle redistributed data
Instructions
- Analyze the current database size and identify tables exceeding single-node capacity thresholds (typically >500GB or >1B rows). Run
SELECT pgsizepretty(pgtotalrelationsize('tablename')) for PostgreSQL or SELECT datalength + indexlength FROM information_schema.TABLES for MySQL.
- Evaluate candidate shard keys by examining query WHERE clauses, JOIN patterns, and data distribution. A good shard key has high cardinality, even distribution, and appears in most queries. Run
SELECT shardkeycolumn, COUNT() FROM table GROUP BY shardkeycolumn ORDER BY COUNT() DESC LIMIT 20 to check distribution.
- Choose a sharding strategy based on workload patterns:
- Hash-based: Even distribution, best for key-value lookups. Use
hash(shardkey) % numshards.
- Range-based: Good for time-series or sequential data. Partition by date ranges or ID ranges.
- Directory-based: Maximum flexibility with a lookup table mapping keys to shards.
- Geographic: Route by region for data residency or latency requirements.
- Design the shard topology by determining the number of shards, replication factor, and placement. For PostgreSQL, use Citus extension or manual foreign data wrappers. For MySQL, configure vitess or ProxySQL routing. For MongoDB, enable sharding on the cluster with
sh.enableSharding() and sh.shardCollection().
- Create the shard schema on all target nodes, ensuring identical table definitions, indexes, and constraints across every shard. Generate DDL scripts and verify with checksums.
- Implement the routing layer that directs queries to the correct shard. This can be application-level (connection selection based on shard key), middleware (ProxySQL, PgBouncer with routing), or