Caching Strategies
- system-design
- backend
- caching
- architecture
Caching is one of the highest-impact optimizations you can make in any system. A cache is the short-term memory that makes everything fast.
The Story: The Sticky Note on Your Desk
You work in a large office. Every time you need a colleague’s phone number, you walk to HR, wait, find the file, and come back — 5 minutes. After repeating this a few times, you write it on a sticky note.
👉 That sticky note is your cache.
Cache = storing the result of an expensive operation nearby so you don’t repeat it
Why Caching Exists
Every database read has a cost:
- Disk I/O
- Network latency
- Query execution
Without cache:
[User] → [App] → [DB] → [App] → [User] (~10–100ms)
With cache:
[User] → [App] → [Cache] → [App] → [User] (<1ms)
👉 Massive latency reduction
👉 Massive database load reduction
The Three Laws of Caching
- Cache hit = fast, miss = expensive
- Cache stores only hot data
- Invalidation is the hardest problem
Cache Hit Rate — The Most Important Metric
Cache hit rate = hits / (hits + misses)
| Hit Rate | Meaning |
|---|---|
| > 99% | Excellent |
| 90–99% | Good |
| 70–90% | Needs improvement |
| < 70% | Cache is ineffective |
👉 Even 1% miss at scale = huge DB load
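To make that concrete with illustrative numbers: at 10,000 requests per second, a 99% hit rate still sends 100 queries per second to the database; at 90%, that becomes 1,000 per second, ten times the load from a nine-point drop in hit rate.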
Cache Layers
Browser Cache ← HTML/CSS/JS, images (no server needed)
↓
CDN Cache ← Static assets and API responses at edge
↓
Load Balancer Cache ← Simple request deduplication
↓
Application Cache ← In-process memory (HashMap, LRU cache)
↓
Distributed Cache ← Redis/Memcached (shared across app servers)
↓
Database Buffer Pool ← DB caches its own pages in RAM
↓
Disk Cache (OS) ← OS caches disk reads in memory
In-process vs Distributed Cache
| | In-process (local) | Distributed (Redis) |
|---|---|---|
| Speed | Fastest (nanoseconds) | Fast (microseconds, network) |
| Shared? | No — each server has its own | Yes — all servers share one cache |
| Survives restart? | No | Yes (with persistence) |
| Memory limit | Single server’s RAM | Clustered RAM (terabytes possible) |
| Use when | Static data, tiny datasets | Session data, shared state, horizontal scaling |
Rule:
- Single server → in-process cache (e.g., LRU cache in app memory)
- Multiple servers → use Redis (because per-server in-process caches drift out of sync)
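For the single-server case, an in-process cache can be as small as Python's functools.lru_cache decorator. A minimal sketch, assuming the same illustrative db handle used in the examples below:

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)   # keeps up to 10k entries, evicting the least recently used
def get_country(code: str):
    # First call per code hits the DB; repeat calls are served from this process's memory.
    return db.query("SELECT * FROM countries WHERE code = ?", code)
```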
Caching Strategies
Cache-Aside (Lazy Loading) — The Most Common
Story: You (the app) check your sticky note first. If it’s there, done. If not, you go to the records room (DB), get the data, and write it on a new sticky note for next time.
READ:
1. App checks cache for key
2. HIT → return cached value ✓
MISS → query DB → store result in cache → return result
WRITE:
1. Update DB
2. Invalidate (delete) the cache key ← next read will repopulate
```python
def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss — hit DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Populate cache with TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user


def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
    redis.delete(f"user:{user_id}")  # Invalidate cache
```
Pros:
- Only requested data gets cached (no wasted memory)
- Cache failures don’t break the app — just slower
Cons:
- First request always slow (cache cold start)
- Potential for stale data between write and invalidation
Use when: General-purpose, read-heavy workloads. Default choice.
Read-Through — The Cache Manages Itself
Story: You ask the cache for data. The cache itself goes to the DB on a miss — you never talk to the DB directly.
[App] → [Cache] → (hit) → returns data
↓ (miss)
[DB]
↓
[Cache] populates and returns
Difference from cache-aside: The cache library/service handles the miss logic, not your application code.
Tools: some caching client libraries support this pattern, as do managed services like DAX (DynamoDB Accelerator) for DynamoDB.
Pros:
- Cleaner application code
- Cache miss handling is abstracted
Cons:
- First request is always slow
- Less control over what gets cached
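There is no single standard API for read-through, but the idea can be sketched as a small wrapper; the class name and the loader callback below are illustrative, not any specific library's interface:

```python
import json

class ReadThroughCache:
    def __init__(self, redis_client, loader, ttl=3600):
        self.redis = redis_client
        self.loader = loader      # knows how to fetch from the DB on a miss
        self.ttl = ttl

    def get(self, key):
        cached = self.redis.get(key)
        if cached is not None:
            return json.loads(cached)
        value = self.loader(key)                            # the cache, not the app, talks to the DB
        self.redis.setex(key, self.ttl, json.dumps(value))
        return value

# The app only ever talks to the cache:
# users = ReadThroughCache(redis, loader=lambda key: db.query("SELECT * FROM users WHERE id = ?", key))
# user = users.get("user:42")
```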
Write-Through — Always Stay In Sync
Story: Every time you update a record, you update BOTH the DB and the cache at the same time, so the cache is never out of date.
WRITE:
1. App writes to CACHE first
2. Cache synchronously writes to DB
3. Returns success only after both writes complete
READ:
Always hits cache → always fresh data
Pros:
- Cache is always consistent with DB
- Reads are always fast (no cold start problem)
Cons:
- Every write is slower (two writes instead of one)
- Cache fills with data that may never be read (write-once, never-read data wastes memory)
Use when: Read-heavy systems where stale data is unacceptable.
Example: user’s own profile page.
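A minimal write-through sketch, reusing the illustrative redis and db handles from the cache-aside example (the function name and rollback behavior are assumptions):

```python
import json

def save_user(user_id, data):
    # 1. Write the cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    # 2. Write the DB synchronously; if it fails, drop the cache entry so the two can't diverge
    try:
        db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
    except Exception:
        redis.delete(f"user:{user_id}")
        raise
    # 3. Only now report success: both copies agree
    return True
```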
Write-Behind (Write-Back) — High-Speed Writes
Story: You scribble on the sticky note instantly. At the end of the day, someone updates the official records. Your writes are fast, but there’s a delay before the official record is updated.
WRITE:
1. App writes to cache → returns SUCCESS immediately
2. Cache asynchronously writes to DB (buffered, batched)
READ:
From cache → always fast
Pros:
- Extremely fast writes (no DB latency for the user)
- Batch DB writes = fewer DB round-trips
Cons:
- Risk of data loss if cache crashes before async write completes
- Complex recovery logic needed
Use when: High-throughput write scenarios where occasional data loss is tolerable.
Example: social media like counts, view counters, gaming leaderboards.
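One way to sketch write-behind for a like counter; the in-process queue, flush interval, and function names are assumptions made for illustration:

```python
import queue
import threading
import time

pending = queue.Queue()   # posts whose counts the DB doesn't have yet

def like_post(post_id):
    count = redis.incr(f"likes:{post_id}")   # fast: the user sees success immediately
    pending.put(post_id)                     # remember that the DB is behind
    return count

def flush_worker(interval=5):
    while True:
        time.sleep(interval)
        dirty = set()
        while not pending.empty():
            dirty.add(pending.get())         # collapse duplicates: one DB write per post per flush
        for post_id in dirty:
            count = int(redis.get(f"likes:{post_id}") or 0)
            db.update("UPDATE posts SET likes = ? WHERE id = ?", count, post_id)

threading.Thread(target=flush_worker, daemon=True).start()
```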
Refresh-Ahead — The Proactive Cache
Story: Before your sticky note expires, someone proactively fetches the fresh data so you never experience a cold miss.
Cache detects that key "user:42" TTL expires in 30s
→ Proactively fetches fresh data from DB
→ Repopulates before TTL expires
→ User never sees a cache miss
Pros: No latency spikes from cold misses on popular keys
Cons: May refresh data that’s no longer needed (wasted DB calls)
Use when: Highly predictable access patterns (dashboards, popular product pages)
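A minimal refresh-ahead loop might look like this; the hot-key registry, 30-second threshold, and 5-second polling interval are assumptions:

```python
import json
import time

# key → a function that knows how to fetch fresh data for it (illustrative)
HOT_KEYS = {
    "product:1001": lambda: db.query("SELECT * FROM products WHERE id = ?", 1001),
}

def refresh_ahead(threshold=30, ttl=3600):
    while True:
        for key, fetch in HOT_KEYS.items():
            if redis.ttl(key) < threshold:                  # close to expiry, missing, or no TTL set
                redis.setex(key, ttl, json.dumps(fetch()))  # repopulate before anyone sees a miss
        time.sleep(5)
```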
Which Strategy Should You Use?
- Default: Cache-aside
- Strict consistency: Write-through
- High write load: Write-behind
- Predictable reads: Refresh-ahead
👉 Most real systems = Cache-aside + TTL + invalidation
Cache Invalidation: The Hardest Problem
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
Invalidation = figuring out when cached data has become stale and needs to be removed/updated.
TTL (Time to Live)
Assign an expiry to every cache entry. After TTL, the key expires and the next read goes to DB.
redis.setex("product:1001", 3600, data) # expires in 1 hour
| TTL too short | TTL too long |
|---|---|
| Many cache misses → DB load spikes | Stale data served to users |
Choosing TTL:
- User sessions: 24–72 hours
- Product catalog: 10–60 minutes
- Live sports scores: 10–30 seconds
- User’s own profile: 5 minutes or event-driven invalidation
Event-Driven Invalidation
Delete the cache key the moment underlying data changes.
```python
# When order status changes:
def update_order_status(order_id, new_status):
    order = db.query("SELECT user_id FROM orders WHERE id = ?", order_id)   # needed for the related key
    db.update("UPDATE orders SET status=? WHERE id=?", new_status, order_id)
    redis.delete(f"order:{order_id}")              # direct key
    redis.delete(f"user_orders:{order.user_id}")   # related collection
```
Pro: Cache is never stale
Con: You must know all cache keys affected by every write — this gets complex
Versioned Cache Keys
Instead of invalidating, use a new key. Old key becomes orphaned and expires naturally.
```python
# Store version in DB or separate Redis key
version = int(redis.get("user:42:version") or 1)
cache_key = f"user:42:v{version}"

# On update: increment version
def update_user(user_id):
    db.update(...)
    redis.incr(f"user:{user_id}:version")
    # Old versioned key will expire via TTL
```
Pro: Simple, atomic, no cache stampede
Con: Old keys waste memory until TTL expires
Cache Eviction Policies
When the cache is full, what gets kicked out?
| Policy | How it works | Best for |
|---|---|---|
| LRU (Least Recently Used) | Evict the key not accessed for the longest time | General purpose — default choice |
| LFU (Least Frequently Used) | Evict the key accessed the fewest times | Long-term hot data retention |
| FIFO (First In, First Out) | Evict oldest-inserted key | Simple queues |
| Random | Evict a random key | Low overhead, unpredictable but cheap |
| TTL-based | Evict expired keys first | When TTLs are well-calibrated |
Redis eviction policies: allkeys-lru (most common), volatile-lru (evicts only keys that have a TTL), allkeys-lfu, noeviction (writes fail with an error once memory is full)
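With redis-py, the memory cap and policy can also be set at runtime via CONFIG SET (a sketch; assumes the CONFIG command is allowed on your instance):

```python
redis.config_set("maxmemory", "2gb")                 # cap Redis memory usage
redis.config_set("maxmemory-policy", "allkeys-lru")  # when full, evict the least recently used keys
```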
Cache Anti-Patterns
The Cache Stampede (Thundering Herd)
Problem: Popular key expires. Simultaneously, 1,000 requests miss the cache, all query the DB at the same time. DB melts.
T=3600s: "product:1001" TTL expires
T=3600s + 1ms: 1000 concurrent requests all get MISS
1000 DB queries fire simultaneously
DB falls over
Solutions:
- Mutex lock: First miss acquires a lock, fetches from DB, populates cache. Others wait.
```python
import time

def get_product_1001():
    lock = redis.set("lock:product:1001", 1, nx=True, ex=5)   # 5s lock; only one caller wins it
    if lock:
        data = db.fetch(...)                                   # the winner does the expensive DB read
        redis.set("product:1001", data, ex=3600)
        redis.delete("lock:product:1001")
        return data
    else:
        time.sleep(0.05)                                       # everyone else waits briefly...
        return get_from_cache("product:1001")                  # ...then retries the cache
```
- Probabilistic early expiration: Before TTL hits, probabilistically refresh. High-traffic keys refresh earlier (see the sketch after this list).
- Stale-while-revalidate: Serve the stale value while asynchronously refreshing it.
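A simplified sketch of probabilistic early expiration (the 60-second window and the linear probability curve are assumptions; production versions such as the "XFetch" algorithm also weight by recompute cost):

```python
import json
import random

def get_with_early_refresh(key, fetch, ttl=3600, early_window=60):
    cached = redis.get(key)
    remaining = redis.ttl(key)
    refresh = cached is None or (
        0 <= remaining < early_window                        # inside the "almost expired" window...
        and random.random() < 1 - remaining / early_window   # ...refresh with growing probability
    )
    if refresh:
        value = fetch()                            # one early reader takes the DB hit, the rest don't
        redis.setex(key, ttl, json.dumps(value))
        return value
    return json.loads(cached)
```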
Cache Penetration — The Ghost Key Attack
Problem: Attacker (or bug) queries keys that will never exist (e.g., user:-1, product:99999999). Every request misses cache and hits DB.
Solution: Cache null values
```python
def get_user(user_id):
    result = db.query(user_id)
    if result is None:
        redis.setex(f"user:{user_id}", 60, "NULL")  # cache the miss too, with a short TTL
        return None
    return result
```
Or use a Bloom filter — a probabilistic structure that tells you “definitely not in DB” before even querying.
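A Bloom filter can even live inside Redis as a bitmap; a minimal sketch (the bit count, hash count, and key name are illustrative, not tuned values):

```python
import hashlib

BITS = 2 ** 24    # ~16M bits ≈ 2 MB bitmap
HASHES = 3        # number of hash positions per key

def _offsets(key: str):
    for i in range(HASHES):
        digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
        yield int(digest, 16) % BITS

def bloom_add(key: str):
    for off in _offsets(key):
        redis.setbit("bloom:user_ids", off, 1)   # set this key's bits when the row is created

def bloom_might_exist(key: str) -> bool:
    # False → definitely not in the DB (skip the query); True → possibly there, check for real
    return all(redis.getbit("bloom:user_ids", off) for off in _offsets(key))
```

Check bloom_might_exist() before querying: a False answer means the key cannot exist, so the request never reaches the database.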
Cache Avalanche
Problem: Many keys expire at the same time (e.g., cache seeded in bulk → all expire in 1 hour). Massive DB spike.
Solution: Add jitter to TTLs.
```python
import random
ttl = 3600 + random.randint(-300, 300)  # 3600s ± 5 minutes
redis.setex(key, ttl, value)
```
Redis: The Industry Standard
Redis (Remote Dictionary Server) is not just a cache — it’s an in-memory data structure store.
Data structures
| Type | Commands | Use case |
|---|---|---|
| String | GET, SET, INCR, EXPIRE | Cache, counters, rate limiting |
| Hash | HGET, HSET, HMGET | User objects, shopping carts |
| List | LPUSH, RPUSH, LRANGE | Queues, activity feeds |
| Set | SADD, SMEMBERS, SINTER | Unique visitors, tags |
| Sorted Set | ZADD, ZRANGE, ZRANGEBYSCORE | Leaderboards, priority queues |
| Pub/Sub | PUBLISH, SUBSCRIBE | Real-time messaging |
| Streams | XADD, XREAD | Event logs, Kafka-lite |
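For example, a leaderboard built on a sorted set (key and member names are illustrative):

```python
redis.zadd("leaderboard", {"alice": 3120, "bob": 2890})         # add or update scores
redis.zincrby("leaderboard", 50, "alice")                       # alice gains 50 points
top10 = redis.zrevrange("leaderboard", 0, 9, withscores=True)   # top 10, highest score first
rank = redis.zrevrank("leaderboard", "alice")                   # alice's 0-based rank
```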
Redis for rate limiting
```python
def is_rate_limited(user_id, limit=100, window=60):
    key = f"rate:{user_id}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, window)  # set expiry on first request in the window
    return count > limit
```
Redis Cluster
Horizontal scaling for Redis: data is automatically sharded across nodes using 16,384 hash slots (each key maps to a slot, and slots are assigned to nodes). Supports replica nodes per shard for HA.
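The slot for a key is CRC16 of the key modulo 16,384; a self-contained sketch of that mapping (CRC16-CCITT/XMODEM, the variant Redis Cluster uses):

```python
def crc16(data: bytes) -> int:
    # Bitwise CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16(key.encode()) % 16384

print(hash_slot("user:42"))   # always the same slot, hence always the same shard
```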
Caching in Practice: A Real Example
Scenario: E-commerce product page receiving 50,000 requests/minute.
Product page request flow:
1. Browser checks its own cache (Cache-Control header)
HIT → serve from browser in 0ms
2. CDN edge (Cloudflare) checks its cache
HIT → serve from CDN in 5ms
3. App server checks Redis
HIT → return in 1ms
MISS
↓
4. Query PostgreSQL with read replica
→ takes 15–80ms
5. Store in Redis with TTL=300s (5 min)
6. Return response + set Cache-Control header for CDN/browser
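Step 6 in code, sketched with Flask (the framework choice and the get_product helper are assumptions; any framework that lets you set response headers works the same way):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/products/<int:product_id>")
def product_page(product_id):
    data = get_product(product_id)   # the cache-aside lookup from steps 3–5
    resp = jsonify(data)
    # Tell the layers above how long they may reuse this response:
    #   max-age=60   → browsers may serve it from their own cache for 60 seconds
    #   s-maxage=300 → shared caches (the CDN edge) may serve it for 5 minutes
    resp.headers["Cache-Control"] = "public, max-age=60, s-maxage=300"
    return resp
```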
Cache strategy per data type:
| Data | Cache location | TTL | Invalidation |
|---|---|---|---|
| Product details | Redis + CDN | 5–30 min | On product update |
| User session | Redis | 24 hours | On logout |
| Homepage recommendations | Redis | 10 min | TTL only |
| User’s own cart | Redis | 72 hours | On cart update |
| Static assets (JS/CSS) | CDN | 1 year (versioned URL) | Deploy new version with new URL |
How would you add caching to a system?
Step 1: Identify the hot path.
“What are the most frequently accessed data pieces? Product listings, user sessions, search results?”
Step 2: Choose strategy.
“I’d use cache-aside with Redis. The app checks Redis first; on miss, queries the DB and populates Redis with a 5-minute TTL.”
Step 3: Address invalidation.
“On product update, we delete the Redis key. On next read, fresh data populates the cache.”
Step 4: Address failure.
“If Redis goes down, the app falls through to the DB — degraded performance but not an outage. Redis persistence is configured so warm restart restores the cache quickly.”
Step 5: Address stampede.
“For very high traffic keys, we use a mutex lock on cache miss to prevent thundering herd.”
Flashcards
Q: What is cache-aside (lazy loading)?
App checks cache first; on miss, fetches from DB and populates cache. Most common pattern.
Q: What is write-through caching?
Every write goes to both cache and DB synchronously. Cache is always consistent; writes are slower.
Q: What is write-behind (write-back) caching?
Write to cache immediately; async write to DB later. Fastest writes, risk of data loss.
Q: What is a cache stampede?
When a popular cached key expires and many requests simultaneously miss the cache and overload the DB.
Q: What is cache penetration?
Requests for keys that don’t exist in DB bypass the cache repeatedly. Solution: cache null values or use Bloom filter.
Q: What is LRU eviction?
When cache is full, evict the key not accessed for the longest time. Default choice for most systems.
Q: What is cache avalanche?
Many cache keys expire simultaneously, causing a traffic spike to the DB. Solution: add random jitter to TTLs.
Series · System Design
Part 3 of 4 · Apr 2026