CDN Caching: Edge Caching Strategy for Low Latency

intermediate 11 min read Updated 2026-02-10

TL;DR

CDN caching distributes static content across geographically dispersed edge servers, serving users from the nearest location to minimize latency. Edge locations cache content based on cache keys (URL + headers), while origin shield acts as a secondary cache layer to reduce origin load. Critical for global applications serving static assets like images, videos, and JavaScript bundles.

Cheat Sheet: Edge locations cache content near users → Origin shield protects origin server → Cache-Control headers define TTL → Purge API for invalidation → Reduces global latency by 70-90%.

The Analogy

Think of CDN caching like a franchise restaurant chain. Instead of everyone traveling to the original restaurant (origin server) in New York, the company opens identical locations (edge servers) in every major city. Each franchise keeps the same menu items (cached content) and serves local customers instantly. When the headquarters updates the menu (origin content changes), they send instructions to all franchises to update their copies (cache invalidation). The regional distribution center (origin shield) handles bulk orders from multiple franchises, preventing the headquarters kitchen from being overwhelmed.

Why This Matters in Interviews

CDN caching appears in almost every system design interview for consumer-facing applications. Interviewers expect you to know when to use a CDN (static content, global users), how to configure cache headers properly, and how to handle cache invalidation. Senior candidates must discuss origin shield architecture, cache key design for personalized content, and cost-performance tradeoffs. This is a litmus test for whether you’ve built production systems that serve millions of users globally—mentioning CDN early signals experience with scale.


Core Concept

CDN caching is a distributed caching strategy that places content on edge servers located in multiple geographic regions, typically within internet exchange points (IXPs) or colocation facilities. When a user in Tokyo requests an image from a US-based application, the CDN serves it from a Tokyo edge location in 20ms instead of 200ms from the origin server. Modern CDNs like Cloudflare, Fastly, and AWS CloudFront operate hundreds of edge locations worldwide, creating a global cache layer that sits between users and your origin infrastructure.

The fundamental value proposition is latency reduction through geographic proximity. A CDN transforms a single-region application into a globally distributed system without replicating your entire backend. Netflix serves 250+ million users globally by caching video chunks on CDN edge servers—over 95% of requests never touch Netflix’s origin servers. This architecture pattern is essential for any application with users across multiple continents or serving large static assets.

Cache Control Header Strategy

sequenceDiagram
    participant Browser
    participant CDN as CDN Edge
    participant Origin
    
    Note over Origin: Set Cache-Control headers<br/>for different content types
    
    Browser->>CDN: GET /profile.jpg
    CDN->>Origin: Cache MISS - fetch image
    Origin->>CDN: 200 OK<br/>Cache-Control: public, max-age=300, s-maxage=3600<br/>Vary: Accept-Encoding
    Note over CDN: Cache for 1 hour<br/>(s-maxage=3600)
    CDN->>Browser: Return image
    Note over Browser: Cache for 5 min<br/>(max-age=300)
    
    Browser->>CDN: GET /profile.jpg (5 min later)
    Note over CDN: Still cached at edge<br/>(within 1 hour)
    CDN->>Browser: Return from edge cache
    
    Browser->>CDN: GET /api/user/settings
    CDN->>Origin: Always fetch (Cache-Control: private)
    Origin->>CDN: 200 OK<br/>Cache-Control: private, no-store
    Note over CDN: Do NOT cache<br/>(private data)
    CDN->>Browser: Return fresh data

Different Cache-Control directives optimize caching at each layer. Use s-maxage for longer CDN caching while keeping browser cache shorter. Mark personalized content as private to prevent CDN caching.
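The header strategy above can be sketched as a small policy table. This is a minimal illustration, assuming a framework where response headers are plain dicts; the content categories and TTL values are examples, not a real API:

```python
# Per-content-type Cache-Control policies (values are illustrative).
CACHE_POLICIES = {
    # Immutable, versioned static assets: cache aggressively everywhere.
    "static_asset": "public, max-age=31536000, immutable",
    # Shared images: browsers keep 5 min, CDN edges keep 1 hour.
    "image": "public, max-age=300, s-maxage=3600",
    # Semi-dynamic API data: short shared TTL, serve stale while refreshing.
    "api_feed": "public, max-age=0, s-maxage=60, stale-while-revalidate=300",
    # Personalized data: never cache at the CDN or any shared proxy.
    "user_private": "private, no-store",
}

def cache_headers(content_type: str) -> dict:
    """Return response headers for a given content category."""
    headers = {"Cache-Control": CACHE_POLICIES[content_type]}
    if content_type in ("image", "api_feed"):
        # Tell the CDN that compression affects the cached representation.
        headers["Vary"] = "Accept-Encoding"
    return headers
```

Calling `cache_headers("image")` yields the split browser/CDN TTLs from the diagram, while `cache_headers("user_private")` marks the response uncacheable at the edge.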

Cache Stampede Problem and Origin Shield Solution

graph TB
    subgraph Without Origin Shield
        Purge1["⚡ Cache Purge<br/>Popular content"] --> Edge1A["Edge Tokyo"]
        Purge1 --> Edge1B["Edge London"]
        Purge1 --> Edge1C["Edge NYC"]
        Purge1 --> Edge1D["Edge Sydney"]
        Purge1 --> Edge1E["Edge Mumbai"]
        
        Edge1A --"Simultaneous<br/>requests"--> Origin1["😱 Origin<br/>Overwhelmed<br/>500 errors"]
        Edge1B --"Simultaneous<br/>requests"--> Origin1
        Edge1C --"Simultaneous<br/>requests"--> Origin1
        Edge1D --"Simultaneous<br/>requests"--> Origin1
        Edge1E --"Simultaneous<br/>requests"--> Origin1
    end
    
    subgraph With Origin Shield
        Purge2["⚡ Cache Purge<br/>Popular content"] --> Edge2A["Edge Tokyo"]
        Purge2 --> Edge2B["Edge London"]
        Purge2 --> Edge2C["Edge NYC"]
        Purge2 --> Edge2D["Edge Sydney"]
        Purge2 --> Edge2E["Edge Mumbai"]
        
        Edge2A --"Request"--> Shield["🛡️ Origin Shield<br/>Request Collapsing"]
        Edge2B --"Request"--> Shield
        Edge2C --"Request"--> Shield
        Edge2D --"Request"--> Shield
        Edge2E --"Request"--> Shield
        
        Shield --"Single request<br/>to origin"--> Origin2["✅ Origin<br/>Healthy<br/>1 request"]
        Origin2 --"Response"--> Shield
        Shield --"Cached response"--> Edge2A
        Shield --"Cached response"--> Edge2B
        Shield --"Cached response"--> Edge2C
        Shield --"Cached response"--> Edge2D
        Shield --"Cached response"--> Edge2E
    end

Without origin shield, cache purges cause all edge locations to simultaneously request content from origin, creating a stampede. Origin shield collapses these requests into a single origin fetch, protecting backend infrastructure.
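The request collapsing shown above is essentially a single-flight pattern: concurrent misses for the same key wait on one origin fetch. A minimal threaded sketch, where `fetch_origin` is a stand-in for the real origin request:

```python
import threading

class OriginShield:
    """Single-flight sketch of origin-shield request collapsing:
    concurrent cache misses for one key trigger exactly one origin fetch."""

    def __init__(self, fetch_origin):
        self._fetch = fetch_origin      # stand-in for the real origin call
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}             # key -> Event set when fetch finishes

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]             # shield cache hit
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                event = self._inflight[key] = threading.Event()
        if leader:
            value = self._fetch(key)                # the single origin request
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            event.set()                             # wake the waiting edges
            return value
        event.wait()                                # follower: reuse the result
        with self._lock:
            return self._cache[key]
```

Five edges requesting `/image.jpg` simultaneously would produce one call to `fetch_origin`; the other four block briefly and reuse the cached result.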

Netflix Content Distribution Architecture

graph LR
    subgraph User Devices
        User1["👤 User<br/>Watching Stranger Things"]
    end
    
    subgraph ISP Network
        OCA["Open Connect<br/>Appliance<br/>(Edge Cache)<br/>Inside ISP"]
    end
    
    subgraph Netflix Infrastructure
        API["Netflix API<br/>(Metadata, UI)"]
        Control["Control Plane<br/>(Cache Management)"]
        Origin[("Origin Storage<br/>(Master Content)")]
    end
    
    User1 --"1. Request video chunk<br/>episode_01_chunk_042.mp4"--> OCA
    OCA --"2. Cache HIT (95%)<br/>Serve in <10ms"--> User1
    
    OCA -."3. Cache MISS (5%)<br/>Fetch from origin".-> Origin
    Origin -."4. Return chunk<br/>Cache forever (immutable)".-> OCA
    
    Control --"Pre-populate popular<br/>content during off-peak"--> OCA
    API --"Metadata requests<br/>(not cached)"--> User1
    
    Note1["📊 95% of traffic<br/>served from edge<br/>5% origin requests"]
    Note2["🔑 Content-addressed URLs<br/>Infinite TTL<br/>No purging needed"]

Netflix deploys Open Connect appliances inside ISP networks to cache video chunks near viewers. Content-addressed URLs with infinite TTLs eliminate purging complexity, while pre-population ensures high cache hit rates for new releases.

How It Works

When a user requests content, DNS resolution routes them to the nearest CDN edge location based on geographic proximity and network conditions. The edge server checks its local cache using a cache key—typically the full URL plus specific headers like Accept-Encoding or Cookie values. If the content exists and hasn’t expired (based on Cache-Control headers), the edge serves it immediately. On a cache miss, the edge server fetches content from either the origin shield (if configured) or directly from the origin server, caches it locally, and serves it to the user.

Origin shield is a critical architectural component that acts as an additional cache layer between edge locations and your origin. When multiple edge servers experience cache misses simultaneously (common after a purge or for newly published content), origin shield collapses these requests into a single fetch from origin. This prevents cache stampedes that could overwhelm your origin infrastructure. Fastly’s origin shield reduced Shopify’s origin traffic by 60% during flash sales by absorbing the initial request burst.

Cache keys determine what variations of content get cached separately. A simple cache key might be just the URL path, but real-world scenarios require more sophistication. If your API returns different responses based on the Accept-Language header, that header must be part of the cache key—otherwise, a French user might receive cached English content. The Vary response header tells CDNs which request headers affect the response, automatically incorporating them into cache keys. Instagram uses cache keys that include device type (mobile vs desktop) to serve appropriately sized images without maintaining separate URLs.
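The Vary-driven cache key composition described above can be sketched in a few lines; the header names and separator format here are illustrative, not a specific CDN's implementation:

```python
def build_cache_key(url: str, request_headers: dict, vary: str) -> str:
    """Combine the URL with each header listed in Vary, in a stable order,
    so each distinct combination becomes a separate cache entry."""
    parts = [url]
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={request_headers.get(name, '')}")
    return "|".join(parts)

key = build_cache_key(
    "/api/products",
    {"accept-language": "en-US", "accept-encoding": "gzip"},
    vary="Accept-Language, Accept-Encoding",
)
# Different Accept-Language values now produce different cache entries,
# so a French user can never receive a cached English response.
```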

CDN Request Flow with Origin Shield

graph LR
    User["👤 User<br/>(Tokyo)"]
    DNS["DNS<br/>Resolver"]
    Edge["Edge Server<br/>(Tokyo PoP)"]
    Shield["Origin Shield<br/>(Regional Cache)"]
    Origin["Origin Server<br/>(US-East)"]
    
    User --"1. Request image.jpg"--> DNS
    DNS --"2. Return Tokyo<br/>edge IP"--> User
    User --"3. GET /image.jpg"--> Edge
    
    Edge --"4. Cache MISS<br/>Check shield"--> Shield
    Shield --"5. Cache MISS<br/>Fetch from origin"--> Origin
    Origin --"6. Return image<br/>+ Cache-Control"--> Shield
    Shield --"7. Cache & forward"--> Edge
    Edge --"8. Cache & serve<br/>(20ms latency)"--> User
    
    Edge -."Future requests<br/>Cache HIT (local)".-> User

When a user requests content, DNS routes them to the nearest edge location. On cache miss, the edge checks origin shield before hitting the origin server. Subsequent requests are served directly from edge cache with minimal latency.

Cache Key Composition for Personalized Content

graph TB
    Request["HTTP Request<br/>GET /api/products"]
    
    subgraph Cache Key Components
        URL["URL Path<br/>/api/products"]
        Headers["Request Headers"]
        Cookies["Cookie Values"]
    end
    
    subgraph Header Details
        Lang["Accept-Language<br/>en-US"]
        Encoding["Accept-Encoding<br/>gzip"]
        Device["User-Agent<br/>Mobile"]
    end
    
    subgraph Cookie Details
        Session["session_id<br/>abc123"]
        Currency["currency<br/>USD"]
    end
    
    Request --> URL
    Request --> Headers
    Request --> Cookies
    
    Headers --> Lang
    Headers --> Encoding
    Headers --> Device
    
    Cookies --> Session
    Cookies --> Currency
    
    URL --> CacheKey["Final Cache Key<br/>/api/products|en-US|gzip|Mobile|USD"]
    Lang --> CacheKey
    Encoding --> CacheKey
    Device --> CacheKey
    Currency --> CacheKey
    
    CacheKey --> Storage[("Edge Cache<br/>Storage")]

Cache keys combine URL path with relevant headers and cookies to ensure different user contexts receive appropriate cached responses. Each unique combination creates a separate cache entry.

Key Principles

Principle: Cache at the Edge, Protect the Origin. The primary goal is maximizing cache hit rates at edge locations to minimize origin requests. Every request that reaches your origin costs money, adds latency, and consumes capacity. Design your caching strategy to achieve 90%+ cache hit rates for static content. Example: Spotify caches album artwork, audio chunks, and API responses at the edge. During peak hours, their origin servers handle only 5% of total traffic—the rest is served from CDN cache. This allows them to run a much smaller origin infrastructure and survive traffic spikes during new album releases.

Principle: Explicit Cache Control Over Implicit Defaults. Always set explicit Cache-Control headers rather than relying on CDN defaults. Use s-maxage for CDN-specific TTLs separate from browser cache TTLs (max-age). For dynamic content that shouldn’t be cached, use Cache-Control: private or no-store. Example: Twitter sets Cache-Control: public, max-age=300, s-maxage=3600 on user profile images. Browsers cache for 5 minutes, but CDN edges cache for 1 hour. This balances freshness for individual users while maximizing CDN efficiency across millions of users viewing the same profiles.

Principle: Design for Invalidation from Day One. Cache invalidation is the hardest problem in CDN caching. Build your system to handle stale content gracefully and implement purge mechanisms before you need them. Use versioned URLs (e.g., /assets/app.v123.js) for assets that must update immediately. Example: GitHub uses content-addressed URLs for all static assets—the filename includes a hash of the content. When they deploy new JavaScript, it gets a new URL, so there’s no need to purge old versions. Old URLs remain cached until they naturally expire, while new deployments are instantly live.
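The content-addressed naming pattern above is easy to sketch: embed a hash of the file's bytes in its name, so a new build gets a new URL and old cached copies never need purging. The path format and hash length here are illustrative:

```python
import hashlib

def versioned_asset_name(path: str, content: bytes) -> str:
    """Return a content-addressed filename: assets/app.js -> assets/app.<hash>.js.
    Changing the content changes the hash, and therefore the URL."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, _, ext = path.rpartition(".")
    return f"{stem}.{digest}.{ext}"

name_v1 = versioned_asset_name("assets/app.js", b"console.log('v1');")
name_v2 = versioned_asset_name("assets/app.js", b"console.log('v2');")
# name_v1 != name_v2: each deploy is instantly live at a fresh URL,
# while the old URL stays validly cached until it expires naturally.
```

Assets named this way can safely carry `Cache-Control: public, max-age=31536000, immutable`.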


Deep Dive

Types / Variants

Edge Caching is the standard CDN caching layer where content is cached at hundreds of geographically distributed points of presence (PoPs). Each edge location operates independently with its own cache storage, typically using LRU eviction when storage fills up. Edge caching handles the vast majority of user requests and provides the primary latency benefit.

Origin Shield adds a regional cache layer between edge locations and your origin. Instead of 200 edge locations all potentially requesting the same content from your origin, they first check origin shield. This dramatically reduces origin load and provides better cache efficiency for less popular content that might not be cached at every edge. AWS CloudFront’s origin shield reduced origin requests by 70% for a major e-commerce site during Black Friday.

Tiered Caching extends this concept further with multiple cache layers. Cloudflare uses a two-tier architecture where smaller edge locations pull from larger regional caches before going to origin. This improves cache hit rates for the long tail of less popular content while maintaining low latency for popular content.

Dynamic Content Caching applies CDN caching to API responses and HTML pages, not just static assets. This requires sophisticated cache key design to handle personalization. Etsy caches product listing pages at the edge with cache keys that include user location and currency, achieving 60% cache hit rates on dynamic content that would traditionally bypass CDN entirely.

Trade-offs

Latency vs Freshness: Longer cache TTLs improve performance but increase staleness risk. Short TTLs (under 60 seconds) provide near-real-time updates but generate more origin traffic and reduce cache efficiency. The decision depends on your consistency requirements—news sites might use 30-second TTLs while product catalogs can cache for hours. Stripe uses 5-minute TTLs for API documentation, balancing freshness with CDN efficiency.

Cache Hit Rate vs Storage Costs: CDNs charge for both bandwidth and storage. Caching everything maximizes hit rates but increases storage costs, especially for large video files. Selective caching based on popularity (cache only content requested more than N times) optimizes costs but reduces hit rates for long-tail content. YouTube caches only videos that have been watched multiple times in a region, letting unpopular videos always fetch from origin.

Global Consistency vs Performance: Purging content globally takes time—typically 5-30 seconds for purge requests to propagate to all edge locations. During this window, different users see different versions. You can either accept eventual consistency or use versioned URLs that don’t require purging. Reddit uses versioned asset URLs for critical JavaScript but accepts 30-second staleness for cached HTML pages.

Origin Shield Cost vs Origin Protection: Origin shield adds an extra network hop (5-20ms latency) and additional CDN costs, but dramatically reduces origin load. For applications with expensive origin infrastructure (databases, complex APIs), origin shield pays for itself by allowing smaller origin capacity. For simple static file serving, the extra hop might not be worth it.

Common Pitfalls

Pitfall: Caching Personalized Content Without Proper Cache Keys. Why it happens: Developers enable CDN caching on API endpoints that return user-specific data, forgetting that the default cache key is just the URL. User A’s data gets cached and served to User B, causing data leaks. How to avoid: Always include user identifiers in cache keys for personalized content, or mark responses as Cache-Control: private to prevent CDN caching entirely. Better yet, separate personalized and non-personalized data into different endpoints—cache the non-personalized parts aggressively.

Pitfall: Ignoring Vary Header for Content Negotiation. Why it happens: APIs that return different formats (JSON vs XML) or compressed responses (gzip vs brotli) based on Accept headers don’t set Vary headers. The CDN caches the first response and serves it to all subsequent users regardless of their Accept headers. How to avoid: Set Vary: Accept-Encoding, Accept-Language for any response that changes based on request headers. Be cautious—each Vary header multiplies cache storage requirements. Vary: User-Agent creates a separate cache entry for every browser type, fragmenting your cache.

Pitfall: Cache Stampede After Purge. Why it happens: Purging popular content causes all edge locations to simultaneously request fresh content from origin, overwhelming it. This is especially dangerous for high-traffic items like homepage HTML or popular product pages. How to avoid: Use origin shield to collapse simultaneous requests. Implement stale-while-revalidate to serve stale content while fetching fresh content in the background. Consider soft purges that mark content as stale but allow serving it if origin is slow or down.
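The stale-while-revalidate behavior mentioned above boils down to a three-way freshness decision at the edge. A minimal sketch, assuming each cache entry records its age alongside the max-age and stale-while-revalidate windows from its Cache-Control header:

```python
def cache_decision(age: float, max_age: float, swr: float) -> str:
    """Decide how the edge handles a cached entry of the given age (seconds).

    Matches Cache-Control: max-age=<max_age>, stale-while-revalidate=<swr>.
    """
    if age <= max_age:
        return "fresh"                    # serve directly from cache
    if age <= max_age + swr:
        return "stale-serve-revalidate"   # serve stale now, refresh in background
    return "miss"                         # too stale: fetch before responding
```

With `max-age=60, stale-while-revalidate=600`, a 2-minute-old entry is still served instantly while a background fetch refreshes it, so users never wait on the origin during the revalidation window.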

Pitfall: Over-Caching Dynamic Content. Why it happens: Teams cache API responses with long TTLs to improve performance, then struggle with stale data when underlying data changes. Purging becomes a complex distributed systems problem. How to avoid: Cache dynamic content conservatively with short TTLs (60-300 seconds). Use cache tags to group related content for targeted purging—when a product changes, purge all cached responses tagged with that product ID. Implement cache warming to pre-populate caches after purges.
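The cache-tag pattern above can be sketched with a tag-to-URL index; the tag and URL names below are illustrative, not a specific CDN's purge API:

```python
from collections import defaultdict

class TaggedCache:
    """Sketch of tag-based purging: cached URLs are grouped under tags
    (e.g. a product ID) so one change invalidates every related entry."""

    def __init__(self):
        self._entries = {}               # url -> cached body
        self._by_tag = defaultdict(set)  # tag -> set of urls

    def put(self, url, body, tags=()):
        self._entries[url] = body
        for tag in tags:
            self._by_tag[tag].add(url)

    def get(self, url):
        return self._entries.get(url)

    def purge_tag(self, tag):
        """Evict every cached response that was tagged with `tag`."""
        for url in self._by_tag.pop(tag, set()):
            self._entries.pop(url, None)
```

When a product changes, `purge_tag("product:42")` evicts both its detail page and any listing pages that were tagged with it, without a full-cache purge.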


Real-World Examples

Netflix (Video Streaming Platform): Netflix operates Open Connect, their custom CDN with thousands of edge servers inside ISP networks. Video chunks are cached at the edge with infinite TTLs—content never expires because video files are immutable and identified by content hash. During peak hours (8-11 PM), 95% of traffic is served from edge caches within the viewer’s ISP network, reducing latency to under 10ms. Origin servers only handle new content uploads and cache misses for unpopular titles. This architecture allows Netflix to stream 4K video to 250 million subscribers while keeping origin infrastructure costs manageable. They pre-populate edge caches with predicted popular content during off-peak hours, ensuring cache hits even for newly released shows.

Instagram (Photo Sharing Platform): Instagram uses Facebook’s CDN to cache profile pictures, photos, and videos with cache keys that include image size parameters. When you view someone’s profile, the CDN serves a 150x150 thumbnail from the nearest edge location. The same photo URL with different size parameters (?size=large) creates separate cache entries. They set Cache-Control: public, max-age=31536000 (1 year) because photos are immutable—if a user changes their profile picture, it gets a new URL. This aggressive caching strategy achieves 98% cache hit rates for images. For the Instagram feed API, they use shorter TTLs (60 seconds) with cache keys that include user ID, allowing personalized feeds while still benefiting from CDN caching for users who refresh frequently.

Shopify (E-commerce Platform): Shopify caches merchant storefronts at the edge with sophisticated cache key design. Product pages include cache keys based on customer location (for currency/language), cart state (empty vs items), and customer login status. They use Fastly’s origin shield to protect merchant servers during flash sales—when a popular product drops, thousands of edge locations might miss cache simultaneously, but origin shield collapses these into a single origin request. Product images use 1-year TTLs with versioned URLs, while product availability data uses 30-second TTLs to balance freshness with performance. During Black Friday, their CDN serves 80% of traffic from edge cache, allowing Shopify’s infrastructure to handle 10x normal traffic without scaling origin capacity proportionally.


Interview Expectations

Mid-Level

Mid-level candidates should explain the basic CDN architecture: edge locations cache content near users to reduce latency. Mention cache headers (Cache-Control, max-age) and understand that CDNs are primarily for static content like images, CSS, and JavaScript. Explain cache misses (edge fetches from origin) and cache hits (edge serves directly). Know that cache invalidation is challenging and mention purge APIs. When designing a system, suggest CDN for serving static assets and explain the latency benefit (200ms → 20ms). Recognize that CDNs cost money and aren’t free infrastructure.

Senior

Senior candidates must discuss origin shield architecture and explain why it’s critical for protecting origin servers from cache stampedes. Design cache keys for personalized content (include user ID or session token) and explain the Vary header for content negotiation. Discuss cache TTL tradeoffs: longer TTLs improve hit rates but increase staleness. Mention specific cache headers: s-maxage for CDN-specific TTLs, stale-while-revalidate for graceful updates. Calculate cache hit rate impact on origin load (90% hit rate = 10x reduction in origin traffic). Discuss versioned URLs vs purging for handling content updates. Explain how CDNs provide DDoS protection by absorbing attack traffic at the edge. Reference real systems like Netflix or Cloudflare in your design.
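The hit-rate arithmetic mentioned above is worth being able to do on the spot: at hit rate h, the origin sees only the (1 - h) fraction of requests, a 1 / (1 - h) reduction factor. A back-of-envelope sketch:

```python
def origin_reduction(hit_rate: float) -> float:
    """Factor by which origin traffic shrinks at a given cache hit rate.
    E.g. a 90% hit rate means the origin sees 1/10th of total requests."""
    return 1 / (1 - hit_rate)

print(origin_reduction(0.90))  # ~10x reduction
print(origin_reduction(0.99))  # ~100x reduction
```

Note the nonlinearity: improving hit rate from 90% to 99% cuts origin traffic another 10x, which is why squeezing out the last few percent of hit rate matters so much at scale.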

Staff+

Staff+ candidates should architect multi-tier caching strategies with edge caching, origin shield, and application caching working together. Discuss cache warming strategies for predictable traffic patterns (pre-populate caches before product launches). Design cache key hierarchies for complex personalization (location + currency + customer tier). Explain cache sharding strategies for very large content catalogs. Discuss cost optimization: selective caching based on popularity, compression tradeoffs, and bandwidth vs storage costs. Address cache consistency challenges in distributed systems: how to handle eventual consistency during purges, designing for graceful degradation when caches are stale. Discuss CDN selection criteria: PoP coverage, origin shield support, purge API capabilities, and cost structure. Explain how to monitor and optimize cache performance: hit rate metrics, origin offload percentage, and P99 latency by region.

Common Interview Questions

How would you design caching for a global video streaming service like YouTube?

Your CDN cache hit rate dropped from 90% to 60% overnight. How do you debug this?

How do you handle cache invalidation when a product price changes in an e-commerce system?

What’s the difference between max-age and s-maxage in Cache-Control headers?

How would you cache API responses that include personalized recommendations?

Explain how origin shield reduces origin server load. When would you not use it?


Key Takeaways

CDN caching distributes content to edge locations near users, reducing latency from 200ms to 20ms for global applications. Origin shield adds a secondary cache layer that collapses requests and protects origin servers from traffic spikes.

Cache keys determine what gets cached separately—include URL, relevant headers (Accept-Encoding, Accept-Language), and user identifiers for personalized content. Use Vary headers to automatically incorporate content negotiation headers into cache keys.

Set explicit Cache-Control headers: use s-maxage for CDN TTLs separate from browser TTLs (max-age). Longer TTLs improve cache hit rates but increase staleness risk. Design for invalidation with versioned URLs or purge APIs.

Origin shield is critical for high-traffic applications—it reduces origin load by 60-80% by collapsing simultaneous cache misses from multiple edge locations into single origin requests. Essential during traffic spikes and after cache purges.

CDN caching achieves 90%+ hit rates for static content, reducing origin traffic by 10x and enabling global scale without proportionally scaling origin infrastructure. Netflix serves 95% of traffic from edge cache, Instagram achieves 98% hit rates for images.

Prerequisites

Client Caching - Understanding browser cache headers and HTTP caching fundamentals

Cache Invalidation - Strategies for keeping cached content fresh and handling purges

HTTP Protocol - HTTP headers and status codes that control caching behavior

Next Steps

Application Caching - Server-side caching strategies that complement CDN caching

Load Balancing - How CDNs distribute traffic across edge locations

Content Delivery - Broader content delivery strategies beyond caching

DNS - How DNS routing directs users to nearest CDN edge location

API Gateway - Caching API responses at the gateway layer

Static Asset Management - Organizing and versioning assets for CDN delivery