Database scaling strategies for growing applications
The database is often the first bottleneck as applications grow. According to PlanetScale's Database Report, database-related issues account for 35% of production incidents, making database scaling critical for reliability and growth.
Database scaling challenges
According to Percona's Database Survey, 62% of organizations have experienced database performance issues that impacted their business.
Vertical vs horizontal scaling
Vertical vs Horizontal Scaling
| Feature | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Implementation Simplicity | ✓ | ✗ |
| Unlimited Scaling | ✗ | ✓ |
| Application Changes Required | ✗ | ✓ |
| Cost Efficiency at Scale | ✗ | ✓ |
| High Availability | ✗ | ✓ |
| Data Locality | ✓ | ✗ |
Start Vertical: For most applications, vertical scaling (bigger machine) is the right first step. It's simpler, requires no application changes, and modern cloud instances are very powerful. Move to horizontal scaling when you hit limits.
Scaling decision tree
Optimize Queries
Index optimization, query rewriting, connection pooling. Often provides 10x improvement.
Vertical Scaling
Upgrade to larger instance. Simple, no code changes, works until hardware limits.
Read Replicas
Scale reads with replicas. Works for read-heavy workloads.
Caching Layer
Add Redis/Memcached for hot data. Reduces database load significantly.
Sharding
Distribute data across multiple databases. Complex but unlimited scale.
Query optimization first
Profile Queries
Identify slow queries with EXPLAIN, query logs
Add Indexes
Create indexes for WHERE, JOIN, ORDER BY clauses
Optimize Joins
Reduce joins, denormalize where appropriate
Limit Data
Select only needed columns, use pagination
Connection Pooling
PgBouncer, ProxySQL to manage connections
Monitor Continuously
Track query performance over time
Typical Performance Improvement by Optimization Type (%)
Read replicas
Primary Database
Handles all writes and critical reads
Replication
Changes replicated to read replicas
Read Replicas
Handle read queries, scale horizontally
Load Balancer
Distribute reads across replicas
Application Logic
Route reads to replicas, writes to primary
Lag Monitoring
Track replication lag, handle stale reads
Caching strategies
Cache-Aside (Lazy Loading)
Application checks cache first, loads from DB on miss, populates cache.
Write-Through
Application writes to cache and DB simultaneously. Cache always current.
Write-Behind
Write to cache immediately, async write to DB. Fast but complex.
Read-Through
Cache handles DB reads transparently. Simpler application code.
Common Caching Use Cases
Cache Invalidation: "There are only two hard things in computer science: cache invalidation and naming things." Design your invalidation strategy carefully—stale data causes subtle bugs.
Database sharding
Sharding Strategies
| Feature | Hash-Based | Range-Based | Directory-Based | Geographic |
|---|---|---|---|---|
| Even Distribution | ✓ | ✗ | ✓ | ✗ |
| Query Efficiency | ✓ | ✓ | ✗ | ✓ |
| Implementation Complexity | ✓ | ✓ | ✗ | ✗ |
| Resharding Ease | ✗ | ✓ | ✓ | ✗ |
| Cross-Shard Queries | ✗ | ✓ | ✗ | ✗ |
| Data Locality | ✗ | ✓ | ✗ | ✓ |
Choose Shard Key
Select key that distributes data evenly
Design Schema
Ensure queries can be routed to single shard
Implement Routing
Build logic to route queries to correct shard
Handle Cross-Shard
Design for queries spanning multiple shards
Plan Resharding
Strategy for adding shards as you grow
Monitor Balance
Track data distribution across shards
Modern database options
Database Popularity Among Developers (%)
Managed PostgreSQL/MySQL
RDS, Cloud SQL, Azure Database. Familiar SQL with managed operations.
NewSQL (CockroachDB, TiDB)
Distributed SQL with horizontal scaling. ACID at scale.
Serverless (PlanetScale, Neon)
Auto-scaling, branching, serverless pricing.
NoSQL (MongoDB, DynamoDB)
Schema flexibility, built-in horizontal scaling.
High availability patterns
Database Availability by Configuration
Performance monitoring
Query Performance
Track slow queries, execution plans, query counts
Connection Metrics
Active connections, wait times, pool utilization
Resource Usage
CPU, memory, disk I/O, storage growth
Replication Lag
Monitor lag between primary and replicas
Lock Contention
Track blocking queries and deadlocks
Index Efficiency
Unused indexes, missing indexes, bloat
FAQ
Q: When should we move from a single database to read replicas? A: When read load significantly exceeds write load (typically 10:1 or higher) and you're seeing CPU or connection limits on the primary. Read replicas are simple to implement and don't require application changes beyond routing.
Q: How do we choose a shard key? A: Choose a key that: distributes data evenly, is present in most queries, rarely changes, and groups related data together. Common choices: customer ID, tenant ID, geographic region.
Q: Should we use a managed database service? A: For most teams, yes. Managed services handle backups, patching, failover, and scaling. Self-managing is only worth it for very large scale or specific requirements.
Q: When does sharding make sense? A: When you've exhausted vertical scaling and read replicas, and your data volume or write load exceeds what a single node can handle. Typically hundreds of millions of rows or thousands of writes per second.
Sources and further reading
- PlanetScale Database Report
- Percona State of Databases
- Designing Data-Intensive Applications
- High Performance MySQL
- PostgreSQL Performance
Scale Your Database: Database scaling requires careful planning and execution. Our team helps organizations design and implement database architectures that scale with their business. Contact us to discuss your database strategy.
Ready to scale your database? Connect with our data engineering experts to develop a tailored scaling strategy.



