No-Caching Anti-Pattern: When Missing Cache Hurts

intermediate 11 min read Updated 2026-02-11

After this topic, you will be able to:

  • Identify opportunities for caching at different system layers
  • Evaluate cache-aside, write-through, and write-behind strategies
  • Recommend appropriate caching strategies based on read/write patterns
  • Calculate cache hit rate impact on system performance

TL;DR

The No Caching antipattern occurs when systems repeatedly fetch or compute the same data without storing results for reuse, forcing every request to hit slow backend resources. This creates unnecessary database load, increases latency, and wastes compute resources. The solution is implementing multi-layer caching from browser to database, choosing appropriate cache strategies (cache-aside, write-through, write-behind) based on read/write patterns, and monitoring cache hit rates to ensure effectiveness.

Cheat Sheet:

  • Problem: Repeated expensive operations (DB queries, API calls, computations) on every request
  • Impact: 10-100x latency increase, database overload, poor scalability
  • Solution: Multi-layer caching (browser → CDN → application → database)
  • Key Metric: Cache hit rate (target 80-95% for most workloads)
  • Common Mistake: Caching everything without considering invalidation complexity

The Problem It Solves

When Dropbox first launched, every file metadata request hit their PostgreSQL database directly. As they scaled to millions of users, database CPU spiked to 90%, queries slowed from 10ms to 500ms, and the system couldn’t handle peak traffic. This is the No Caching antipattern in action: treating every request as unique when most are actually requesting the same data.

The core problem manifests in three ways. First, repeated expensive operations where the same database query runs thousands of times per second, the same API call to a payment processor happens for identical requests, or the same recommendation algorithm recalculates results that haven’t changed. Second, resource exhaustion as databases become bottlenecks under read load, network bandwidth gets consumed by redundant data transfers, and CPU cycles are wasted on duplicate computations. Third, poor user experience with high latency (users waiting 2 seconds for data that could be served in 20ms), inconsistent performance during traffic spikes, and cascading failures when backend services slow down.

The antipattern is particularly insidious because it often works fine in development with low traffic, only revealing itself in production under real load. A system that feels snappy with 10 concurrent users becomes unusable at 1,000 users, not because the architecture is fundamentally broken, but because it’s doing 100x more work than necessary.

Solution Overview

The solution is implementing strategic caching at multiple layers of your system architecture, each serving a specific purpose and time scale. The key insight is that different types of data have different access patterns and freshness requirements, so you need different caching strategies.

At the browser layer, HTTP caching stores static assets (JavaScript, CSS, images) and API responses directly on the user’s device, eliminating network requests entirely. At the CDN layer, edge servers cache content geographically close to users, reducing latency from 200ms to 20ms for global traffic. At the application layer, in-memory caches like Redis or Memcached store frequently accessed data (user sessions, product catalogs, computed results) with microsecond access times. At the database layer, query result caches and materialized views reduce expensive joins and aggregations.

The effectiveness of caching is measured by cache hit rate: the percentage of requests served from cache versus going to the origin. A system with a 95% hit rate means only 5% of requests hit the database. If your database can handle 1,000 queries per second, caching at 95% hit rate lets you serve 20,000 requests per second—a 20x improvement. The math is simple but powerful: effective_capacity = origin_capacity / (1 - hit_rate). At 90% hit rate, you get 10x capacity. At 99% hit rate, you get 100x capacity.
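The capacity formula can be checked in a few lines (a minimal sketch using the numbers from this section):

```python
def effective_capacity(origin_qps: float, hit_rate: float) -> float:
    """Effective request capacity when only cache misses reach the origin."""
    if not 0 <= hit_rate < 1:
        raise ValueError("hit_rate must be in [0, 1)")
    return origin_qps / (1 - hit_rate)

print(effective_capacity(1000, 0.90))  # ≈ 10,000 QPS (10x)
print(effective_capacity(1000, 0.95))  # ≈ 20,000 QPS (20x)
print(effective_capacity(1000, 0.99))  # ≈ 100,000 QPS (100x)
```

Note the nonlinearity: each halving of the miss rate doubles effective capacity, which is why small hit-rate improvements matter so much near the top of the range.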

The challenge isn’t just adding caching—it’s choosing the right cache strategy (cache-aside, write-through, write-behind), setting appropriate TTLs (time-to-live), handling cache invalidation when data changes, and monitoring cache effectiveness. A poorly implemented cache can be worse than no cache at all if it serves stale data or adds complexity without performance gains.

Multi-Layer Caching Architecture

graph LR
    User["👤 User<br/><i>Browser</i>"]
    Browser["Browser Cache<br/><i>HTTP Cache</i><br/>TTL: 1 year static<br/>60s dynamic"]
    CDN["CDN Edge<br/><i>CloudFlare</i><br/>TTL: 1-24 hours<br/>Hit Rate: 85-95%"]
    LB["Load Balancer"]
    
    subgraph Application Tier
        App1["App Server 1"]
        App2["App Server 2"]
        Redis[("Redis Cache<br/><i>In-Memory</i><br/>TTL: 5min-24hr<br/>Hit Rate: 80-95%")]
    end
    
    subgraph Database Tier
        Primary[("Primary DB<br/><i>PostgreSQL</i><br/>Query Cache")]
        Replica[("Read Replica<br/><i>Page Cache</i>")]
    end
    
    User --"1. Request"--> Browser
    Browser --"2. Cache Miss"--> CDN
    CDN --"3. Cache Miss"--> LB
    LB --"4. Route"--> App1
    App1 --"5. Check Cache"--> Redis
    Redis --"6. Cache Miss"--> App1
    App1 --"7. Query"--> Replica
    Replica --"8. Return Data"--> App1
    App1 --"9. Store in Cache"--> Redis
    App1 --"10. Return"--> CDN
    CDN --"11. Cache & Return"--> Browser
    Browser --"12. Cache & Display"--> User

Multi-layer caching architecture showing the request flow from browser to database. Each layer reduces load on downstream systems: 95% of requests served from browser/CDN, 4% from application cache, only 1% hit the database. This creates a 100x reduction in database load.

Cache Hit Rate Impact on System Capacity

graph TB
    subgraph "Without Cache"
        DB1[("Database<br/>1,000 QPS<br/>Max Capacity")]
        Load1["Incoming Load<br/>1,000 requests/sec"]
        Load1 --"100% requests"--> DB1
        Result1["❌ At Capacity Limit<br/>Cannot scale further"]
        DB1 -.-> Result1
    end
    
    subgraph "90% Cache Hit Rate"
        Cache2["Cache Layer<br/>900 req/sec"]
        DB2[("Database<br/>1,000 QPS<br/>Max Capacity")]
        Load2["Incoming Load<br/>10,000 requests/sec"]
        Load2 --"90% hits"--> Cache2
        Load2 --"10% miss<br/>1,000 req/sec"--> DB2
        Result2["✓ 10x Effective Capacity<br/>10,000 QPS supported"]
        Cache2 -.-> Result2
        DB2 -.-> Result2
    end
    
    subgraph "95% Cache Hit Rate"
        Cache3["Cache Layer<br/>19,000 req/sec"]
        DB3[("Database<br/>1,000 QPS<br/>Max Capacity")]
        Load3["Incoming Load<br/>20,000 requests/sec"]
        Load3 --"95% hits"--> Cache3
        Load3 --"5% miss<br/>1,000 req/sec"--> DB3
        Result3["✓ 20x Effective Capacity<br/>20,000 QPS supported"]
        Cache3 -.-> Result3
        DB3 -.-> Result3
    end
    
    Formula["📊 Formula: Effective Capacity = DB Capacity / (1 - Hit Rate)"]

Cache hit rate dramatically impacts system capacity. A database handling 1,000 QPS becomes effectively 10,000 QPS at 90% hit rate and 20,000 QPS at 95% hit rate. This is why even a 5% improvement in hit rate (90% → 95%) doubles your effective capacity.

Multi-Layer Caching Strategy

Understanding where to cache and what strategy to use at each layer is critical for interview discussions and production systems.

Browser Layer (HTTP Cache): The browser automatically caches responses based on HTTP headers (Cache-Control, ETag, Last-Modified). Static assets like JavaScript bundles should have long TTLs (1 year) with cache-busting via versioned filenames (app.v123.js). API responses can use short TTLs (60 seconds) or conditional requests with ETags. Expected hit rate: 90-95% for static assets, 60-80% for API responses. The key is setting Cache-Control: public, max-age=31536000, immutable for versioned assets and Cache-Control: private, max-age=60 for user-specific data.
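The header rules above can be sketched as a small helper. This is illustrative only: the function name, the extension list, and the path conventions are assumptions, not a real framework API.

```python
def cache_control_for(path: str, user_specific: bool = False) -> str:
    """Pick a Cache-Control value following the rules above (illustrative)."""
    # Versioned static assets (e.g. app.v123.js): cache for a year, never revalidate.
    if path.endswith((".js", ".css", ".png", ".woff2")):
        return "public, max-age=31536000, immutable"
    # User-specific API responses: short TTL, browser-only (no shared caches).
    if user_specific:
        return "private, max-age=60"
    # Shareable API responses: short TTL, cacheable by CDNs too.
    return "public, max-age=60"

print(cache_control_for("/static/app.v123.js"))
```

The `immutable` directive only makes sense because the filename changes on every deploy; without cache-busting filenames, a one-year TTL would pin users to stale assets.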

CDN Layer (Edge Caching): CDNs like CloudFlare or Akamai cache content at edge locations worldwide. Best for static assets, images, videos, and cacheable API responses. TTLs typically range from 1 hour to 1 day depending on update frequency. Expected hit rate: 85-95% for popular content. The CDN reduces origin load and improves global latency—a user in Singapore accessing a US-based service sees 20ms CDN latency instead of 200ms origin latency. Configure cache keys carefully: include query parameters that affect responses but exclude tracking parameters.

Application Layer (In-Memory Cache): Redis or Memcached store serialized objects in RAM with microsecond access times. This is where you cache database query results, computed values, session data, and API responses from external services. TTLs vary widely: 5 minutes for product prices, 1 hour for user profiles, 24 hours for recommendation results. Expected hit rate: 80-95% depending on workload. Use cache-aside pattern (check cache, miss → fetch from DB, store in cache) for read-heavy workloads. The application cache is your primary defense against database overload.

Database Layer (Query Cache): Modern databases have built-in caches and materialized views. PostgreSQL caches query plans and frequently accessed pages in shared buffers. MySQL's query result cache was deprecated in 5.7 and removed in 8.0 because of its invalidation overhead. Materialized views precompute expensive joins and aggregations, refreshed on a schedule. Expected hit rate: 60-80% for page cache. This layer is automatic but understanding it helps you write cache-friendly queries (avoid random access patterns, use covering indexes).

How It Works

Let’s walk through implementing caching for a product catalog API that’s currently hitting the database on every request.

Step 1: Identify the Problem. You notice the /api/products endpoint takes 150ms average, with 80% of time spent on a database query that joins products, categories, and inventory tables. The query runs 10,000 times per minute, but product data only changes a few times per hour. You’re doing the same expensive work repeatedly.

Step 2: Choose Cache Strategy. For read-heavy data with infrequent updates, cache-aside (lazy loading) is appropriate. The flow: check cache → if hit, return cached data → if miss, query database, store in cache, return data. This is simpler than write-through (update cache on every write) and doesn’t require coordinating writes.

Step 3: Implement Application Cache. Add Redis with a 5-minute TTL. The code becomes: cache_key = 'products:all', check redis.get(cache_key), if null then query database and redis.setex(cache_key, 300, json_data). First request takes 150ms (cache miss), subsequent requests take 5ms (cache hit). With 10,000 requests/minute, you go from 10,000 database queries per minute to roughly one every 5 minutes (the single miss that repopulates the cache when the TTL expires).
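A minimal sketch of Step 3. An in-memory dict stands in for Redis so the snippet runs anywhere; with redis-py the calls would be `get` and `setex`, and `query_products_from_db` is a hypothetical placeholder for the real three-table join.

```python
from __future__ import annotations

import json
import time

# Stand-in for Redis: key -> (expires_at, payload). The cache-aside logic is
# identical with a real Redis client.
_cache: dict[str, tuple[float, str]] = {}

def cache_get(key: str) -> str | None:
    entry = _cache.get(key)
    if entry is None or entry[0] < time.time():
        return None  # missing or expired
    return entry[1]

def cache_setex(key: str, ttl_seconds: int, value: str) -> None:
    _cache[key] = (time.time() + ttl_seconds, value)

def query_products_from_db() -> list[dict]:
    # Placeholder for the expensive products/categories/inventory join (~150ms).
    return [{"id": 1, "name": "Widget", "price": 9.99}]

def get_products() -> list[dict]:
    """Cache-aside: check cache; on miss, query the DB and populate the cache."""
    key = "products:all"
    cached = cache_get(key)
    if cached is not None:
        return json.loads(cached)           # hit: fast path
    products = query_products_from_db()     # miss: slow path
    cache_setex(key, 300, json.dumps(products))  # 5-minute TTL
    return products
```

Serializing through JSON mirrors what happens with Redis, where values are bytes, not live objects.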

Step 4: Add Cache Invalidation. When products are updated, you need to invalidate the cache. Options: (a) delete the cache key on update (simple but causes cache miss), (b) update the cache directly (complex, requires same query logic), or (c) use shorter TTL and accept brief staleness (pragmatic for most cases). For a product catalog, option (c) with 5-minute TTL is usually acceptable.

Step 5: Monitor Cache Effectiveness. Track cache hit rate: hits / (hits + misses). If you’re seeing 60% hit rate, investigate: Is TTL too short? Are cache keys too specific (including user-specific data)? Are you caching the right data? Aim for 80-95% hit rate for read-heavy workloads. Also monitor cache memory usage and eviction rate—if you’re evicting frequently, you need more cache capacity or better key selection.
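Hit-rate tracking from Step 5 can be as simple as a pair of counters. This is a minimal sketch; in production you would typically pull these numbers from Redis (keyspace hits/misses in its INFO stats) or your metrics pipeline rather than count in application code.

```python
class CacheStats:
    """Track hits and misses to compute the hit rate described above."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
for outcome in [True] * 95 + [False] * 5:
    stats.record(outcome)
print(f"hit rate: {stats.hit_rate:.0%}")  # prints "hit rate: 95%"
```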

Step 6: Layer Additional Caches. Add CDN caching for the API response with Cache-Control: public, max-age=300 (5 minutes). Now most requests never reach your application servers. Add HTTP caching in the browser for individual product pages. The result: 95% of requests served from browser/CDN, 4% from application cache, 1% from database. Your database load drops 99%.

Cache-Aside Pattern Implementation Flow

sequenceDiagram
    participant Client
    participant App as Application Server
    participant Cache as Redis Cache
    participant DB as Database
    
    Note over Client,DB: Scenario 1: Cache Hit (95% of requests)
    Client->>App: GET /api/products
    App->>Cache: GET products:all
    Cache-->>App: ✓ Data found (5ms)
    App-->>Client: Return cached data
    
    Note over Client,DB: Scenario 2: Cache Miss (5% of requests)
    Client->>App: GET /api/products
    App->>Cache: GET products:all
    Cache-->>App: ✗ Cache miss
    App->>DB: SELECT * FROM products...
    DB-->>App: Query result (150ms)
    App->>Cache: SETEX products:all 300 data
    Cache-->>App: ✓ Stored
    App-->>Client: Return fresh data
    
    Note over Client,DB: Scenario 3: Cache Invalidation on Write
    Client->>App: PUT /api/products/123
    App->>DB: UPDATE products SET...
    DB-->>App: ✓ Updated
    App->>Cache: DEL products:all
    Cache-->>App: ✓ Invalidated
    App-->>Client: Success

Cache-aside pattern showing three scenarios: cache hit (fast path), cache miss (populate cache), and cache invalidation on write. First request takes 150ms, subsequent requests take 5ms—a 30x improvement. With 10,000 requests/minute and a 5-minute TTL on a single cache key, database queries drop from 10,000 per minute to roughly one every 5 minutes.

Variants

Cache-Aside (Lazy Loading): Application code explicitly manages cache. On read: check cache, if miss then fetch from database and populate cache. On write: update database, optionally invalidate cache. When to use: Read-heavy workloads, data that doesn’t change frequently, when you want simple cache logic. Pros: Only caches data that’s actually requested, simple to implement, cache failures don’t break writes. Cons: Cache misses cause latency spikes, potential for stale data, requires cache invalidation logic.

Write-Through: Every write goes to cache and database synchronously. Cache is always consistent with database. When to use: Read-heavy workloads where reads must see the latest data, when consistency is critical (user sessions, shopping carts). Pros: Cache is always fresh, no stale data, simpler read logic. Cons: Write latency increases (two operations), wasted cache space for data that’s never read, more complex write path.

Write-Behind (Write-Back): Writes go to cache immediately, asynchronously written to database later. When to use: Write-heavy workloads where eventual consistency is acceptable, when you need to batch database writes for efficiency. Pros: Lowest write latency, can batch/coalesce writes, reduces database load. Cons: Risk of data loss if cache fails before write-back, complex failure handling, eventual consistency can confuse users.

Read-Through: Cache sits between application and database, automatically fetching on miss. When to use: When you want cache logic abstracted from application code, using cache proxies like Varnish. Pros: Cleaner application code, centralized cache logic. Cons: Less control over cache behavior, harder to debug, potential single point of failure.
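The write-path difference between the first two variants can be sketched side by side (plain dicts stand in for the cache and the database; this is a single-process illustration, not a distributed implementation):

```python
db: dict[str, str] = {}
cache: dict[str, str] = {}

def write_cache_aside(key: str, value: str) -> None:
    """Cache-aside write: update the database, then invalidate the cached copy."""
    db[key] = value
    cache.pop(key, None)  # next read repopulates the cache (a deliberate miss)

def write_through(key: str, value: str) -> None:
    """Write-through: update cache and database together; cache stays fresh."""
    cache[key] = value
    db[key] = value

write_through("user:1", "alice")
assert cache["user:1"] == "alice" and db["user:1"] == "alice"

write_cache_aside("user:1", "bob")
assert db["user:1"] == "bob" and "user:1" not in cache
```

The asserts show the trade-off concretely: after a write-through write the cache is already warm; after a cache-aside write the next reader pays the miss.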

Cache Strategy Comparison: Cache-Aside vs Write-Through vs Write-Behind

graph TB
    subgraph "Cache-Aside (Lazy Loading)"
        CA_Read["Read Request"]
        CA_Cache1{"Check Cache"}
        CA_Hit["✓ Cache Hit<br/>Return immediately"]
        CA_Miss["✗ Cache Miss"]
        CA_DB["Query Database"]
        CA_Store["Store in Cache<br/>TTL: 5 min"]
        CA_Return["Return Data"]
        
        CA_Read --> CA_Cache1
        CA_Cache1 --"Hit"--> CA_Hit
        CA_Cache1 --"Miss"--> CA_Miss
        CA_Miss --> CA_DB
        CA_DB --> CA_Store
        CA_Store --> CA_Return
        
        CA_Write["Write Request"]
        CA_UpdateDB["Update Database"]
        CA_Invalidate["Invalidate Cache<br/>(Optional)"]
        CA_Write --> CA_UpdateDB
        CA_UpdateDB --> CA_Invalidate
    end
    
    subgraph "Write-Through"
        WT_Write["Write Request"]
        WT_Cache["Update Cache"]
        WT_DB["Update Database<br/>(Synchronous)"]
        WT_Return["Return Success"]
        
        WT_Write --> WT_Cache
        WT_Cache --> WT_DB
        WT_DB --> WT_Return
        
        WT_Read["Read Request"]
        WT_CacheRead["Read from Cache<br/>(Always fresh)"]
        WT_Read --> WT_CacheRead
    end
    
    subgraph "Write-Behind (Write-Back)"
        WB_Write["Write Request"]
        WB_Cache["Update Cache<br/>Immediately"]
        WB_Return["Return Success<br/>(Fast)"]
        WB_Async["Async Worker"]
        WB_DB["Batch Write to DB<br/>(Later)"]
        
        WB_Write --> WB_Cache
        WB_Cache --> WB_Return
        WB_Cache -."Queue".-> WB_Async
        WB_Async --> WB_DB
    end
    
    Comparison["Use Cache-Aside for: Read-heavy workloads<br/>Use Write-Through for: Consistency-critical data<br/>Use Write-Behind for: Write-heavy workloads"]

Three cache strategies with different trade-offs. Cache-Aside (lazy loading) is simplest for read-heavy workloads but has cache miss latency. Write-Through ensures consistency but adds write latency. Write-Behind optimizes writes but risks data loss. Choose based on read/write ratio and consistency requirements.

Trade-offs

Latency vs Consistency: Caching reduces latency (5ms cache hit vs 150ms database query) but introduces staleness. With a 5-minute TTL, users might see data that’s 5 minutes old. Decision criteria: For product catalogs, news feeds, or recommendation results, 5-minute staleness is acceptable. For bank balances, inventory counts, or real-time bidding, you need shorter TTLs or cache invalidation on writes. Ask: What’s the business impact of showing stale data?

Complexity vs Performance: Adding caching improves performance (10-100x throughput increase) but adds operational complexity: cache invalidation bugs, cache stampede problems, memory management, monitoring. Decision criteria: Start with simple cache-aside for read-heavy endpoints. Only add write-through or write-behind when write performance becomes a bottleneck. Avoid premature optimization—measure first, cache second.

Memory Cost vs Database Load: Caching requires RAM (Redis cluster costs money) but reduces database load (fewer expensive instances). Decision criteria: Calculate cost savings: If caching lets you downsize from 10 database replicas to 2, the Redis cost is justified. If your database is already underutilized, caching might not be worth the complexity. The break-even point is typically when database CPU exceeds 60% during normal traffic.

Cache Size vs Hit Rate: Larger caches have higher hit rates but cost more. Decision criteria: Use the 80/20 rule: 20% of your data gets 80% of requests. Size your cache to hold that hot 20%. Monitor eviction rate—if you’re evicting frequently accessed keys, increase cache size. If eviction rate is low, you’re over-provisioned.

When to Use (and When Not To)

Implement caching when: (1) You have read-heavy workloads where the same data is requested repeatedly (product catalogs, user profiles, configuration data). (2) Database CPU exceeds 60% during normal traffic or query latency is increasing. (3) You’re making repeated expensive computations (recommendation algorithms, report generation, image processing). (4) External API calls are slow or rate-limited (payment processors, third-party data providers). (5) You’re serving global traffic and need to reduce latency across regions.

Avoid caching when: (1) Data changes frequently and staleness is unacceptable (real-time stock prices, live sports scores). (2) Each request is unique with no repeated access patterns (user-generated content that’s viewed once). (3) Your database is already underutilized and latency is acceptable. (4) Cache invalidation logic would be more complex than the performance gain justifies. (5) You’re caching large objects that don’t fit efficiently in memory (multi-GB datasets).

Red flags you have the No Caching antipattern: (1) Database CPU is consistently high (>70%) but queries are simple and fast. (2) The same queries appear thousands of times in slow query logs. (3) Application servers spend most time waiting on I/O (database, external APIs). (4) Performance degrades linearly with traffic—doubling users doubles database load. (5) You’re scaling by adding more database replicas instead of reducing read load.

Real-World Examples

Dropbox (File Metadata Caching): Dropbox initially hit PostgreSQL for every file metadata request (filename, size, modified date). At scale, this created millions of queries per second. They implemented a multi-layer cache: Memcached for hot metadata (95% hit rate, 1-hour TTL), edge caches for file listings (90% hit rate), and database query cache for complex folder queries. The result: 99% of metadata requests served from cache, database load reduced by 100x, enabling them to scale from millions to billions of files without proportional database growth. The interesting detail: they use cache warming during off-peak hours to preload popular folders, preventing cache stampede during morning traffic spikes.

Twitter (Timeline Caching): Twitter’s home timeline is expensive to compute: fetch tweets from followed users, rank by algorithm, filter blocked content. Computing this on every page load would require thousands of database queries. Instead, they precompute timelines and cache them in Redis with 5-minute TTL. When you tweet, they push the tweet into your followers’ cached timelines (fan-out on write). For celebrities with millions of followers, they use a hybrid approach: cache the celebrity’s tweets separately and merge them into follower timelines on read (fan-out on read). This reduces cache invalidation from millions of operations to thousands. The system serves 500,000 timeline requests per second with 95% cache hit rate.

Stripe (API Response Caching): Stripe’s API serves payment data that rarely changes (completed transactions are immutable). They use aggressive HTTP caching with ETags: clients send If-None-Match headers, Stripe returns 304 Not Modified if data hasn’t changed (saving bandwidth and processing). For list endpoints (transactions, customers), they cache responses in Redis with 60-second TTL and use cache keys that include pagination parameters. The result: 85% of API requests return cached data, reducing database load and improving API latency from 200ms to 20ms. The interesting detail: they use cache versioning (cache keys include schema version) to handle API updates without invalidating all caches.

Twitter Timeline Caching: Fan-Out Strategy

graph TB
    subgraph "Celebrity Tweet (1M followers)"
        Celebrity["@celebrity<br/>Posts Tweet"]
        TweetCache[("Tweet Cache<br/><i>Redis</i><br/>Store tweet separately")]
        Celebrity --"1. Store tweet"--> TweetCache
        
        Follower1["Follower 1<br/>Requests timeline"]
        Follower2["Follower 2<br/>Requests timeline"]
        FollowerN["Follower N<br/>...(1M followers)"]        
        TimelineService["Timeline Service"]
        
        Follower1 & Follower2 & FollowerN --"2. Request"--> TimelineService
        TimelineService --"3. Fetch celebrity tweets"--> TweetCache
        TimelineService --"4. Merge on read<br/>(Fan-out on read)"--> Follower1
        TimelineService --> Follower2
        TimelineService --> FollowerN
        
        Note1["✓ Avoids 1M cache invalidations<br/>✓ Merge happens at read time<br/>✓ Scales for high-follower accounts"]
        TweetCache -.-> Note1
    end
    
    subgraph "Regular User Tweet (500 followers)"
        RegularUser["@user<br/>Posts Tweet"]
        FollowerTimelines[("Follower Timeline Caches<br/><i>Redis</i><br/>500 separate caches")]
        RegularUser --"1. Fan-out on write<br/>Update 500 caches"--> FollowerTimelines
        
        RegFollower["Follower<br/>Requests timeline"]
        RegFollower --"2. Read from cache<br/>(Already computed)"--> FollowerTimelines
        
        Note2["✓ Pre-computed timelines<br/>✓ Fast reads (5ms)<br/>✓ Works for normal users"]
        FollowerTimelines -.-> Note2
    end
    
    Result["Hybrid Strategy:<br/>Fan-out on write for regular users<br/>Fan-out on read for celebrities<br/>Result: 95% cache hit rate, 500K timeline req/sec"]

Twitter’s hybrid caching strategy for timelines. Regular users use fan-out on write (update follower caches immediately), while celebrities use fan-out on read (merge tweets at request time). This avoids millions of cache invalidations for high-follower accounts while maintaining fast reads for most users.


Interview Essentials

Mid-Level

Explain the No Caching antipattern and its impact on system performance. Describe cache-aside pattern with code-level implementation. Calculate cache hit rate impact: if a database query takes 100ms and cache lookup takes 5ms, what’s the average latency at 80% hit rate? (Answer: 0.8 × 5ms + 0.2 × 100ms = 24ms vs 100ms without cache). Discuss where to add caching in a typical web application (browser, CDN, application, database). Explain cache invalidation strategies: TTL-based expiration vs explicit invalidation on writes.
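The expected-latency calculation generalizes to a one-line weighted average (a sketch using the numbers from the question):

```python
def avg_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Expected latency: weighted average of the hit path and the miss path."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

print(avg_latency_ms(0.80, 5, 100))  # 0.8 * 5 + 0.2 * 100 ≈ 24 ms
```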

Senior

Design a multi-layer caching strategy for a specific system (e.g., e-commerce product catalog). Justify cache strategy choice (cache-aside vs write-through) based on read/write patterns. Calculate capacity improvements: if your database handles 1,000 QPS and you achieve 90% cache hit rate, what’s your effective capacity? (Answer: 1,000 / (1 - 0.9) = 10,000 QPS). Discuss cache stampede problem: when cache expires, multiple requests simultaneously hit the database. Solutions: probabilistic early expiration, request coalescing, or cache warming. Explain how to monitor cache effectiveness: hit rate, eviction rate, memory usage, latency percentiles (p50, p99).
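Request coalescing, one of the stampede mitigations mentioned above, can be sketched with a lock and a re-check. This is a simplified single-process version; a single global lock for all keys is a deliberate simplification (real implementations use per-key locks or distributed locks), and the cache is an in-memory dict standing in for Redis.

```python
import threading
import time

_cache: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)
_refill_lock = threading.Lock()

def get_with_single_refill(key: str, ttl: float, loader):
    """Only one caller refills an expired key; the rest reuse the fresh value."""
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                       # fresh: fast path, no lock
    with _refill_lock:
        entry = _cache.get(key)               # re-check: another thread may have
        if entry and entry[0] > time.time():  # refilled while we waited
            return entry[1]
        value = loader()                      # exactly one expensive call
        _cache[key] = (time.time() + ttl, value)
        return value
```

If eight threads hit an expired key at once, the first acquires the lock and calls the loader; the other seven block, then find the refilled entry on re-check—the database sees one query instead of eight.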

Staff+

Architect a caching strategy for a system with complex consistency requirements (e.g., inventory system where overselling is unacceptable). Discuss trade-offs between consistency and performance: when to use cache with short TTL vs database read replicas vs event-driven cache invalidation. Design cache invalidation for a distributed system: how do you ensure all cache nodes are invalidated when data changes? Solutions: pub/sub invalidation, cache versioning, or accepting eventual consistency. Calculate cost-benefit: if Redis costs $500/month but lets you downsize from 5 database replicas ($2,000/month) to 2 replicas ($800/month), what’s the ROI? Discuss cache warming strategies for preventing cold start problems after deployment or cache failures.
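The ROI arithmetic from the question, worked through with the figures as given:

```python
redis_cost = 500         # $/month for the cache tier
replicas_before = 2000   # $/month: 5 database replicas
replicas_after = 800     # $/month: 2 database replicas

monthly_saving = replicas_before - replicas_after  # 1200
net_benefit = monthly_saving - redis_cost          # 700
roi = net_benefit / redis_cost                     # 1.4, i.e. 140% monthly ROI

print(net_benefit, roi)
```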

Common Interview Questions

How do you decide what to cache and what not to cache? (Answer: Cache data with high read-to-write ratio, expensive to compute/fetch, and acceptable staleness. Don’t cache rapidly changing data, user-specific data with low reuse, or data where staleness causes business problems.)

What’s the difference between cache-aside and write-through? When would you use each? (Answer: Cache-aside is application-managed, good for read-heavy workloads. Write-through updates cache on every write, good for consistency-critical data. Choose based on read/write ratio and staleness tolerance.)

How do you handle cache invalidation when data changes? (Answer: Options include TTL-based expiration, explicit invalidation on writes, event-driven invalidation via pub/sub, or cache versioning. Choice depends on consistency requirements and system complexity.)

What’s cache stampede and how do you prevent it? (Answer: When cache expires, multiple requests simultaneously hit the database. Prevent with probabilistic early expiration, request coalescing, or cache warming. The key is ensuring only one request refills the cache.)

How do you monitor cache effectiveness? (Answer: Track hit rate (target 80-95%), miss rate, eviction rate, memory usage, and latency improvements. Use percentile metrics (p50, p99) to catch outliers. Alert on hit rate drops or eviction spikes.)

Red Flags to Avoid

Saying ‘just add Redis’ without discussing cache strategy, TTL, or invalidation logic

Not considering cache stampede or cold start problems

Caching everything without analyzing access patterns or cost-benefit

Ignoring cache consistency issues (serving stale data when it matters)

Not monitoring cache effectiveness or having no plan to measure success

Claiming caching solves all performance problems without discussing trade-offs


Key Takeaways

The No Caching antipattern occurs when systems repeatedly fetch or compute the same data, causing 10-100x unnecessary load on databases and external services. It’s often invisible in development but crippling in production.

Multi-layer caching (browser → CDN → application → database) provides defense in depth. Each layer serves different time scales and data types. A 95% cache hit rate means 20x effective capacity increase.

Cache-aside (lazy loading) is the simplest and most common pattern for read-heavy workloads. Write-through ensures consistency but adds write latency. Write-behind optimizes writes but risks data loss. Choose based on read/write ratio and consistency requirements.

Cache effectiveness is measured by hit rate (target 80-95% for most workloads). Monitor hit rate, eviction rate, and latency improvements. A poorly implemented cache (low hit rate, frequent evictions) can be worse than no cache.

Cache invalidation is the hardest problem. Options include TTL-based expiration (simple but serves stale data), explicit invalidation on writes (complex but consistent), or event-driven invalidation (scalable but requires infrastructure). Choose based on staleness tolerance and system complexity.