Pull CDNs: On-Demand Edge Caching Explained
After this topic, you will be able to:
- Describe the pull CDN workflow from initial cache miss to subsequent cache hits
- Justify when to use pull CDNs based on content volatility and traffic unpredictability
- Evaluate trade-offs between pull CDN bandwidth costs and storage efficiency
- Analyze origin-pull strategies and their impact on origin server load
TL;DR
Pull CDNs fetch content from your origin server on-demand when users first request it, then cache it at edge locations for subsequent requests. Unlike push CDNs where you proactively upload content, pull CDNs use lazy loading—content only reaches the CDN when someone actually needs it. This makes pull CDNs ideal for sites with unpredictable traffic patterns, frequently changing content, or large catalogs where most items are rarely accessed. Cheat Sheet: Pull = lazy loading, best for dynamic content and unpredictable traffic; Push = proactive distribution, best for static content with known demand.
The Analogy
Think of a pull CDN like a library branch that doesn’t stock every book upfront. When you request a book they don’t have, the librarian orders it from the main library, puts it on their shelf, and lends it to you. The next person who wants that book gets it instantly from the branch. Books sit on the branch shelf for a set time (TTL), then get returned to make room for more popular titles. This is far more efficient than trying to stock every book at every branch (push CDN), especially when you don’t know which books will be popular.
Why This Matters in Interviews
Pull CDNs come up in nearly every system design interview involving content delivery—whether you’re designing YouTube, Netflix, or an e-commerce site. Interviewers want to see if you understand the fundamental trade-off between bandwidth costs and storage efficiency, and whether you can justify architectural decisions based on traffic patterns. The key differentiator for strong candidates is explaining when to use pull versus push, not just how pull works. Senior engineers are expected to discuss cache warming strategies, origin shielding, and the cold-start problem. This topic directly connects to caching strategies, which appear in 80% of system design interviews at FAANG companies.
Core Concept
Pull CDNs operate on a lazy-loading principle: content remains on your origin servers until the first user requests it through the CDN. When a request hits an edge location that doesn’t have the content cached (a cache miss), the edge server pulls the content from your origin, caches it locally, and serves it to the user. Subsequent requests for the same content hit the edge cache directly, avoiding origin server load. This architecture fundamentally differs from push CDNs where you proactively upload content to edge locations before users request it. The pull model trades first-request latency for operational simplicity and storage efficiency. You don’t need to manage content distribution—the CDN handles it automatically based on actual demand. This makes pull CDNs particularly attractive for large content catalogs, frequently updated content, or unpredictable traffic patterns where you can’t anticipate which content will be popular.
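The lazy-loading behavior described above can be sketched in a few lines of Python. This is a toy in-memory cache, and `fetch_from_origin` is a stand-in for an HTTP request to your origin server:

```python
import time

class EdgeCache:
    """Minimal pull-CDN edge cache: populated lazily on first request."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> bytes
        self.ttl = ttl_seconds
        self.store = {}                             # url -> (content, expires_at)

    def get(self, url):
        entry = self.store.get(url)
        if entry and entry[1] > time.time():
            return entry[0], "HIT"                  # served from edge, no origin involvement
        content = self.fetch_from_origin(url)       # cache miss: pull from origin
        self.store[url] = (content, time.time() + self.ttl)
        return content, "MISS"

origin_calls = []
def origin(url):
    origin_calls.append(url)
    return b"image-bytes"

cache = EdgeCache(origin, ttl_seconds=60)
print(cache.get("/image.jpg")[1])  # MISS: first request pulls from origin
print(cache.get("/image.jpg")[1])  # HIT: subsequent request served from edge
print(len(origin_calls))           # 1: origin was contacted exactly once
```

The key property to notice: the origin is contacted once per TTL window per object, regardless of how many users request it through that edge.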
Multi-Tier Pull CDN with Origin Shielding
```mermaid
graph TB
subgraph Users
U1["User Tokyo"]
U2["User Sydney"]
U3["User Mumbai"]
U4["User Singapore"]
end
subgraph "Edge Layer"
E1["Tokyo Edge<br/><i>Cache Miss</i>"]
E2["Sydney Edge<br/><i>Cache Miss</i>"]
E3["Mumbai Edge<br/><i>Cache Miss</i>"]
E4["Singapore Edge<br/><i>Cache Miss</i>"]
end
subgraph "Shield Layer"
Shield["Regional Shield Cache<br/><i>Asia-Pacific</i><br/>Consolidates requests"]
end
Origin["Origin Server<br/><i>us-east-1</i><br/>Receives 1 request"]
U1 --> E1
U2 --> E2
U3 --> E3
U4 --> E4
E1 --"Request 1"--> Shield
E2 --"Request 2"--> Shield
E3 --"Request 3"--> Shield
E4 --"Request 4"--> Shield
Shield --"Single consolidated<br/>request"--> Origin
Origin --"Response<br/>propagates back"--> Shield
Shield --> E1 & E2 & E3 & E4
```
Origin shielding prevents the thundering herd problem by adding a regional aggregation layer. When multiple edge locations simultaneously experience cache misses for viral content, the shield consolidates requests so only one reaches the origin server, preventing origin overload while still serving all edges efficiently.
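Request consolidation at the shield is essentially the "single-flight" pattern: concurrent misses for the same key share one in-flight origin fetch. A minimal thread-based sketch (not production code; a real shield also handles fetch failures and timeouts):

```python
import threading
import time

class ShieldCache:
    """Shield-tier request coalescing ('single-flight'): concurrent cache
    misses for the same key share a single origin fetch."""

    def __init__(self, fetch_from_origin):
        self.fetch = fetch_from_origin
        self.cache = {}
        self.inflight = {}          # key -> Event set when the fetch completes
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
            event = self.inflight.get(key)
            leader = event is None
            if leader:              # first miss becomes the leader
                event = self.inflight[key] = threading.Event()
        if leader:
            self.cache[key] = self.fetch(key)   # exactly one origin request
            with self.lock:
                del self.inflight[key]
            event.set()
        else:
            event.wait()            # followers wait for the leader's result
        return self.cache[key]

origin_hits = []
def slow_origin(key):
    time.sleep(0.05)                # simulate origin round-trip latency
    origin_hits.append(key)
    return f"content-for-{key}"

shield = ShieldCache(slow_origin)
results = []
threads = [threading.Thread(target=lambda: results.append(shield.get("/viral.mp4")))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_hits))             # 1: four concurrent misses, one origin fetch
```

The same idea scales up in the diagram above: 200 edges miss simultaneously, the shield coalesces them, and the origin sees one request.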
Pull vs Push CDN Decision Tree
```mermaid
flowchart TB
Start["Content Delivery<br/>Strategy Decision"]
Start --> Q1{"Content update<br/>frequency?"}
Q1 -->|"Frequent<br/>(hourly/daily)"| Q2{"Traffic pattern<br/>predictable?"}
Q1 -->|"Rare<br/>(weekly/never)"| Q3{"Content catalog<br/>size?"}
Q2 -->|"Unpredictable<br/>spiky traffic"| Pull1["✓ Pull CDN<br/><i>Lazy loading handles<br/>dynamic updates</i>"]
Q2 -->|"Predictable<br/>steady traffic"| Q4{"Cache hit rate<br/>expectations?"}
Q3 -->|"Large<br/>(millions of assets)"| Q5{"Access pattern?"}
Q3 -->|"Small<br/>(thousands)"| Push1["✓ Push CDN<br/><i>Pre-distribute all<br/>static assets</i>"]
Q4 -->|"Can tolerate<br/>90-95%"| Pull2["✓ Pull CDN<br/><i>Good enough hit rate<br/>with lower cost</i>"]
Q4 -->|"Need 99%+"| Push2["✓ Push CDN<br/><i>Guaranteed availability<br/>at all edges</i>"]
Q5 -->|"Long-tail<br/>(80% rarely accessed)"| Pull3["✓ Pull CDN<br/><i>Only cache what's<br/>actually requested</i>"]
Q5 -->|"Uniform<br/>(all content popular)"| Hybrid["✓ Hybrid<br/><i>Push hot content<br/>Pull long-tail</i>"]
```
Decision tree for choosing between pull and push CDN strategies based on content characteristics and traffic patterns. Pull CDNs excel with frequent updates, unpredictable traffic, and large catalogs with long-tail access patterns, while push CDNs suit static content with predictable demand and requirements for guaranteed availability.
Cache Expiration Thundering Herd Problem
```mermaid
sequenceDiagram
participant Origin as Origin Server<br/>(Capacity: 1000 req/s)
participant Edge1 as Edge Location 1
participant Edge2 as Edge Location 2
participant Edge3 as Edge Location 3
participant EdgeN as Edge Location N<br/>(200 total edges)
Note over Edge1,EdgeN: All edges cached content at T=0<br/>TTL = 3600s (1 hour)
Note over Edge1,EdgeN: T=3600s: TTL expires simultaneously
rect rgb(255, 200, 200)
Note over Origin,EdgeN: ❌ Without TTL Jitter
Edge1->>Origin: Revalidate request
Edge2->>Origin: Revalidate request
Edge3->>Origin: Revalidate request
EdgeN->>Origin: Revalidate request
Note over Origin: 200 concurrent requests<br/>Origin overloaded! 🔥
Origin--xEdge1: Timeout/Error
Origin--xEdge2: Timeout/Error
end
Note over Edge1,EdgeN: T=7200s: Next expiration cycle
rect rgb(200, 255, 200)
Note over Origin,EdgeN: ✓ With TTL Jitter (±300s)
Note over Edge1: Expires at T=7150s
Edge1->>Origin: Revalidate
Origin-->>Edge1: 304 Not Modified
Note over Edge2: Expires at T=7230s
Edge2->>Origin: Revalidate
Origin-->>Edge2: 304 Not Modified
Note over Edge3: Expires at T=7180s
Edge3->>Origin: Revalidate
Origin-->>Edge3: 304 Not Modified
Note over Origin: Requests spread over 10 minutes<br/>Origin handles load smoothly ✓
end
```
The thundering herd problem occurs when content expires simultaneously across all edge locations, overwhelming the origin with concurrent revalidation requests. TTL jitter (adding random variance to expiration times) spreads the load over time, preventing origin overload while maintaining cache freshness. Stale-while-revalidate further mitigates this by serving cached content during background revalidation.
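TTL jitter is a one-line change wherever TTLs are assigned. A sketch of the 3600 ± 300 second scheme from the diagram:

```python
import random

def jittered_ttl(base_ttl: int, jitter: int) -> int:
    """Add random variance so cached objects expire at different times
    across edges instead of all at once (e.g. 3600 ± 300 seconds)."""
    return base_ttl + random.randint(-jitter, jitter)

ttls = [jittered_ttl(3600, 300) for _ in range(1000)]
print(min(ttls) >= 3300 and max(ttls) <= 3900)  # True: bounded variance
print(len(set(ttls)) > 1)                       # True: expirations desynchronized
```

With 200 edges drawing independent jitter, revalidations spread across a 10-minute window instead of landing in the same second.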
How It Works
The pull CDN workflow follows a predictable pattern. First, you configure your CDN provider (like Cloudflare or Amazon CloudFront) with your origin server URL and rewrite your application URLs to point to the CDN domain. When a user in Tokyo requests an image, their browser hits the Tokyo edge location. If this is the first request for that image (a cache miss), the Tokyo edge server makes an HTTP request to your origin server, retrieves the image, stores it in its local cache with a Time-To-Live (TTL), and returns it to the user. A second user in Tokyo requesting the same image gets it directly from the edge cache (a cache hit), with no origin involvement.

The TTL determines how long content stays cached before the edge considers it stale. When content expires, the next request triggers a revalidation check with the origin using conditional GET requests (If-Modified-Since or If-None-Match/ETag headers). If the origin confirms the content hasn’t changed (responding 304 Not Modified), the edge refreshes the TTL without re-downloading the file. If the content has changed, the edge pulls the new version. This validation mechanism balances freshness with bandwidth efficiency.
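The revalidation step can be illustrated from the origin’s side. A simplified sketch of the conditional-GET decision, using only the If-Modified-Since validator (real servers also honor ETag/If-None-Match):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def revalidate(request_headers, resource_last_modified):
    """Origin-side conditional GET: 304 if the edge's copy is still current,
    otherwise 200 with a fresh body."""
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        cached_at = parsedate_to_datetime(ims)
        if resource_last_modified <= cached_at:
            return 304, None        # edge refreshes its TTL, no re-download
    return 200, b"fresh-body"       # changed (or unconditional): full response

last_modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
status, body = revalidate(
    {"If-Modified-Since": format_datetime(datetime(2024, 6, 1, tzinfo=timezone.utc))},
    last_modified,
)
print(status)  # 304: content unchanged since the edge cached it
```

The 304 path is what makes long-lived caches cheap: the edge pays one small round-trip instead of re-transferring the whole object.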
Pull CDN Request Flow: Cache Miss and Cache Hit
```mermaid
graph LR
User["User in Tokyo<br/><i>Browser</i>"]
Edge["Tokyo Edge Server<br/><i>CDN POP</i>"]
Origin["Origin Server<br/><i>us-east-1</i>"]
Cache[("Edge Cache<br/><i>TTL: 24h</i>")]
User --"1. GET /image.jpg<br/>(First Request)"--> Edge
Edge --"2. Cache Miss<br/>Check local storage"--> Cache
Edge --"3. Pull from origin<br/>HTTP GET"--> Origin
Origin --"4. Return image<br/>+ Cache-Control"--> Edge
Edge --"5. Store with TTL"--> Cache
Edge --"6. Serve to user<br/>(~500ms total)"--> User
User2["User 2 in Tokyo<br/><i>Browser</i>"]
User2 --"7. GET /image.jpg<br/>(Subsequent Request)"--> Edge
Edge --"8. Cache Hit<br/>Serve from local"--> Cache
Edge --"9. Serve to user<br/>(~50ms total)"--> User2
```
The pull CDN workflow shows two scenarios: the first request (cache miss) requires fetching from origin with higher latency, while subsequent requests (cache hit) serve directly from edge cache with dramatically reduced latency. The TTL determines how long content remains cached before requiring revalidation.
Key Principles
On-Demand Population: Content only reaches edge locations when users actually request it, not when you publish it. This means your first user in each geographic region experiences slower response times (cold start), but you avoid wasting storage and bandwidth on content nobody accesses. Example: An e-commerce site with 10 million product images doesn’t need all images at all edges. Only the 1000 most popular products in each region get cached, saving 99.99% of storage costs while still serving 95% of requests from cache.
TTL-Based Expiration: Every cached object has a TTL that determines how long it remains valid. Short TTLs (minutes) ensure freshness for dynamic content but increase origin load. Long TTLs (days/weeks) reduce origin traffic but risk serving stale content. The optimal TTL balances your content update frequency against cache hit rates. Example: News sites use 5-minute TTLs for article pages to show breaking updates quickly, but 24-hour TTLs for article images that never change. This gives 99% cache hit rates on images while keeping text fresh.
Origin Shielding: When multiple edge locations simultaneously experience cache misses for the same content, they can overwhelm your origin with duplicate requests. Origin shielding adds a mid-tier cache layer that consolidates requests, so only one request reaches your origin even if 50 edges need the content. Example: When a viral video gets posted, 200 edge locations might all miss cache simultaneously. Without shielding, your origin gets 200 concurrent requests. With shielding, the shield cache gets 200 requests but only sends 1 to your origin, preventing origin overload.
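A per-content-type TTL policy like the news-site example above can be expressed as a small prefix lookup. Paths and values here are illustrative, not any provider’s defaults:

```python
# Prefix -> TTL (seconds). First match wins; order from most to least specific.
TTL_POLICY = [
    ("/articles/", 300),     # 5 min: surface breaking updates quickly
    ("/images/",   86400),   # 24 h: images never change once published
    ("/static/",   604800),  # 7 d: versioned assets, effectively immutable
]

def ttl_for(path: str, default: int = 3600) -> int:
    """Pick the TTL for a request path from the policy table."""
    for prefix, ttl in TTL_POLICY:
        if path.startswith(prefix):
            return ttl
    return default

print(ttl_for("/articles/breaking-news"))  # 300
print(ttl_for("/images/hero.jpg"))         # 86400
print(ttl_for("/api/session"))             # 3600 (default)
```

In practice the same table would be written as CDN cache behaviors or `Cache-Control: max-age` headers emitted by the origin per path.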
Deep Dive
Types / Variants
Pull CDNs come in several architectural flavors.

Single-tier pull is the simplest: edge locations pull directly from your origin. This works for small-scale deployments but creates origin load proportional to the number of edge locations.

Multi-tier pull with origin shields adds regional aggregation caches between edges and origin. Edges pull from shields, shields pull from origin. This dramatically reduces origin load; Netflix uses this pattern with regional caches serving hundreds of edge locations.

Hybrid pull-push lets you proactively push critical content while pulling everything else on-demand. This combines the best of both worlds: guaranteed availability for important assets with storage efficiency for long-tail content. Amazon CloudFront supports this through Lambda@Edge functions that can pre-populate cache for predicted hot content.

Hierarchical caching extends multi-tier further with parent-child relationships where smaller edge POPs pull from larger regional POPs, which pull from origin. This creates a cache hierarchy that naturally distributes load and improves hit rates through cache sharing.
Pull CDN Architecture Variants
```mermaid
graph TB
subgraph "Single-Tier Pull"
ST_Edge1["Edge 1"]
ST_Edge2["Edge 2"]
ST_Edge3["Edge 3"]
ST_Origin["Origin<br/><i>High load</i>"]
ST_Edge1 & ST_Edge2 & ST_Edge3 --> ST_Origin
end
subgraph "Multi-Tier with Shields"
MT_Edge1["Edge 1"]
MT_Edge2["Edge 2"]
MT_Edge3["Edge 3"]
MT_Shield["Regional Shield"]
MT_Origin["Origin<br/><i>Low load</i>"]
MT_Edge1 & MT_Edge2 & MT_Edge3 --> MT_Shield
MT_Shield --> MT_Origin
end
subgraph "Hybrid Pull-Push"
HP_Edge1["Edge 1"]
HP_Edge2["Edge 2"]
HP_Critical[("Critical Assets<br/><i>Pre-pushed</i>")]
HP_OnDemand[("Long-tail Content<br/><i>Pulled on-demand</i>")]
HP_Origin["Origin"]
HP_Origin -."Push critical<br/>assets".-> HP_Critical
HP_Critical --> HP_Edge1 & HP_Edge2
HP_Edge1 & HP_Edge2 --"Pull when<br/>needed"--> HP_OnDemand
HP_OnDemand --> HP_Origin
end
```
Three common pull CDN architectures: single-tier where edges pull directly from origin (simple but high origin load), multi-tier with regional shields (reduces origin load by 90%+), and hybrid pull-push that combines proactive distribution of critical assets with on-demand pulling of long-tail content for optimal cost-performance balance.
Trade-offs
Bandwidth vs. Storage
Pull CDNs optimize for storage efficiency at the cost of bandwidth. You only store content that’s actually requested, but you pay bandwidth costs every time content is pulled from origin. Push CDNs invert this: higher storage costs (you pay to store everything everywhere) but lower bandwidth costs (one-time upload). For a site with 1TB of content where only 10GB is frequently accessed, pull saves 99% on storage but incurs repeated origin-fetch bandwidth for that 10GB. The crossover point depends on your content access patterns and your CDN’s pricing; in practice, pull is economical for most catalogs because repeatedly fetching a small hot set costs less than replicating the entire catalog to every edge.
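The crossover can be estimated with back-of-envelope arithmetic. All prices and refetch rates below are illustrative assumptions, not any provider’s actual rates:

```python
# Illustrative unit costs (assumptions, not real pricing).
STORAGE_PER_GB_MONTH = 0.02   # $/GB-month at each edge (push stores everywhere)
EGRESS_PER_GB = 0.05          # $/GB of origin egress (pull pays per origin fetch)
EDGE_LOCATIONS = 200

catalog_gb = 1000             # 1 TB total catalog
hot_gb = 10                   # only 10 GB is ever actually requested
refetches_per_month = 30      # each hot object re-pulled roughly daily as TTLs expire

push_cost = catalog_gb * EDGE_LOCATIONS * STORAGE_PER_GB_MONTH
pull_cost = hot_gb * EDGE_LOCATIONS * refetches_per_month * EGRESS_PER_GB

print(f"push: ${push_cost:,.0f}/month")  # every edge stores the full catalog
print(f"pull: ${pull_cost:,.0f}/month")  # only hot content moves, repeatedly
```

Note the sensitivity to `hot_gb` and `refetches_per_month`: as the hot set grows or TTLs shrink, pull’s bandwidth bill climbs toward push’s storage bill, and an origin shield (which cuts edge-to-origin refetches) shifts the crossover back in pull’s favor.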
First Request Latency
The cold-start problem is pull CDN’s Achilles heel. The first user to request content from each edge location experiences full origin latency plus CDN overhead—potentially 500ms+ for international requests. Subsequent users get sub-50ms edge responses. Push CDNs eliminate this by pre-populating all edges, guaranteeing consistent performance. The mitigation is cache warming: proactively requesting content through the CDN before real traffic arrives. Netflix warms caches overnight by having edge servers pull predicted popular content. This gives you push-like performance with pull-like efficiency.
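Cache warming at its simplest is a batch of requests through the CDN hostname before real traffic arrives. In this sketch, `edge_get` is a hypothetical stand-in for an HTTP GET routed through the CDN; a note on the main caveat follows:

```python
from concurrent.futures import ThreadPoolExecutor

warmed = []

def edge_get(url):
    # Stand-in for: requests.get(f"https://cdn.example.com{url}") -- the real
    # request must go through the CDN hostname so the edge populates its cache.
    warmed.append(url)
    return 200

# Predicted-hot assets for an upcoming launch (illustrative paths).
predicted_hot = ["/products/123.jpg", "/products/456.jpg", "/launch/hero.mp4"]

with ThreadPoolExecutor(max_workers=8) as pool:
    statuses = list(pool.map(edge_get, predicted_hot))

print(statuses)                                  # [200, 200, 200]
print(sorted(warmed) == sorted(predicted_hot))   # True: hot set requested once each
```

The caveat: a warming request only populates the POP it gets routed to, so warming globally means issuing these requests from (or to) each region, which is why providers expose region-aware warming APIs or edge functions for this.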
Origin Load Patterns
Pull CDNs create spiky, unpredictable origin load. When cache expires or new content gets popular, you see sudden traffic bursts. Push CDNs create predictable, controlled origin load—you upload on your schedule. This matters for origin capacity planning. With pull, you need to over-provision origin servers for worst-case cache miss scenarios. With push, you can right-size for steady-state upload traffic. Origin shielding mitigates this by smoothing request patterns, but you still need headroom for cache invalidation storms.
Common Pitfalls
Thundering Herd on Cache Expiration. Why it happens: When popular content expires simultaneously across all edges, every edge tries to revalidate at once, creating a request storm that can overwhelm your origin. This happens when you set the same TTL for all content or when you invalidate cache globally. How to avoid: Use TTL jitter: add random variance to TTLs so content expires at different times across edges. Set TTLs to 3600 ± 300 seconds instead of exactly 3600. Implement stale-while-revalidate: serve slightly stale content while asynchronously fetching fresh content in the background. CloudFront supports this through cache behaviors.
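Stale-while-revalidate reduces to a freshness decision with three outcomes. A sketch of the edge-side logic, with the TTL and staleness window as illustrative parameters:

```python
def serve(entry, now, ttl=3600, stale_window=600):
    """entry: (content, cached_at). Returns (content, action).

    fresh              -> serve from cache, nothing else to do
    stale+revalidate   -> serve the stale copy now, refresh in the background
    blocking-revalidate-> too stale: fetch from origin before responding
    """
    content, cached_at = entry
    age = now - cached_at
    if age <= ttl:
        return content, "fresh"
    if age <= ttl + stale_window:
        return content, "stale+revalidate"
    return None, "blocking-revalidate"

entry = (b"page", 0)
print(serve(entry, now=1800)[1])   # fresh
print(serve(entry, now=3900)[1])   # stale+revalidate
print(serve(entry, now=5000)[1])   # blocking-revalidate
```

Because only one background refresh is needed per object, the revalidation storm collapses to a trickle while users keep getting sub-50ms cached responses.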
Ignoring Cache Key Design. Why it happens: By default, CDNs use the full URL including query parameters as the cache key. This means example.com/image.jpg?v=1 and example.com/image.jpg?v=2 are separate cache entries, even if the image is identical. Poor cache key design leads to cache fragmentation and low hit rates. How to avoid: Explicitly configure cache keys to include only meaningful parameters. For images, ignore tracking parameters like utm_source. Use normalized URLs (lowercase, sorted parameters). Consider cache key policies that hash content instead of URLs for truly immutable assets.
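The normalization steps above (lowercase, drop tracking parameters, sort the rest) can be sketched directly. The tracking-parameter list is illustrative:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def cache_key(url: str) -> str:
    """Normalize a URL into a cache key: lowercase host and path, drop
    tracking parameters, sort the remainder so parameter order and
    analytics noise can't fragment the cache."""
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query)
                    if k not in TRACKING_PARAMS)
    key = parts.netloc.lower() + parts.path.lower()
    return key + ("?" + urlencode(params) if params else "")

a = cache_key("https://Example.com/Image.jpg?utm_source=mail&v=2")
b = cache_key("https://example.com/image.jpg?v=2&utm_campaign=x")
print(a)        # example.com/image.jpg?v=2
print(a == b)   # True: both URLs map to one cache entry
```

Real CDNs express the same logic declaratively (e.g. cache key policies listing which query strings, headers, and cookies participate in the key), but the effect on hit rate is identical.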
Under-Provisioning Origin for Cache Misses. Why it happens: Teams size origin servers for steady-state traffic assuming 95%+ cache hit rates, then get overwhelmed during cache invalidation events or traffic spikes to new content. The origin can’t handle the sudden 20x load increase. How to avoid: Provision origin capacity for at least 20% of total traffic, not 5%. Implement rate limiting at the CDN level to protect origin during cache miss storms. Use origin shields to consolidate requests. Have auto-scaling policies that trigger on origin CPU, not just traffic volume.
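The sizing gap is simple arithmetic worth making explicit; the traffic figures here are illustrative:

```python
# Origin sizing for cache-miss bursts (illustrative numbers).
peak_rps = 50_000        # total peak request rate absorbed by the CDN
steady_miss_pct = 5      # normal operation: ~95% cache hit rate
storm_miss_pct = 20      # invalidation storm or sudden traffic to new content

steady_origin_rps = peak_rps * steady_miss_pct // 100
storm_origin_rps = peak_rps * storm_miss_pct // 100

print(steady_origin_rps)   # 2500: what teams often (wrongly) size the origin for
print(storm_origin_rps)    # 10000: the 4x burst the origin must actually absorb
```

An origin sized for the steady-state 2,500 req/s fails exactly when the CDN needs it most; sizing for the storm figure (or auto-scaling toward it) is the headroom the pitfall describes.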
Real-World Examples
Akamai (Global Content Delivery Network): Akamai pioneered pull CDN architecture in the late 1990s and still uses it for the majority of customer content. Their network of 300,000+ edge servers uses a three-tier pull model: edge servers pull from regional aggregation points, which pull from customer origins. They’ve optimized the cold-start problem through predictive cache warming—machine learning models analyze traffic patterns to pre-fetch content likely to be requested. For a major e-commerce client during Black Friday, Akamai’s system automatically warmed caches with predicted hot products 2 hours before the sale started, achieving 98% cache hit rates from the first second of traffic. Their origin shield layer reduced origin requests by 95%, handling 50 million requests per second with only 2.5 million hitting customer origins.
Amazon CloudFront (AWS Content Delivery Network): CloudFront uses pull CDN architecture with sophisticated origin shielding called Regional Edge Caches. When an edge location in Singapore experiences a cache miss, it doesn’t pull directly from your S3 bucket in us-east-1. Instead, it pulls from the Asia-Pacific regional cache in Singapore, which then pulls from S3 if needed. This two-tier system reduces cross-region bandwidth costs and origin load. For a media streaming customer, CloudFront’s pull model handled a viral video that went from 0 to 10 million views in 2 hours. The first request to each of 200 edge locations triggered origin pulls, but the regional cache layer meant only 12 requests actually hit the origin S3 bucket. Subsequent requests achieved 99.8% cache hit rates. CloudFront’s Lambda@Edge feature lets customers implement custom cache warming logic, pre-fetching predicted hot content during off-peak hours to eliminate cold-start latency.
Interview Expectations
Mid-Level
Mid-level candidates should explain the basic pull CDN workflow: cache miss triggers origin fetch, cache hit serves from edge, TTL controls expiration. You should be able to compare pull versus push CDNs and identify that pull is better for unpredictable traffic and frequently changing content. Mention the cold-start problem and that the first request to each edge is slower. Discuss TTL configuration as a trade-off between freshness and origin load. If asked about a specific system like an image hosting service, recommend pull CDN with 24-hour TTLs for images and explain why (images rarely change, don’t need freshness guarantees).
Senior
Senior candidates must go deeper on origin shielding and multi-tier caching architectures. Explain how shields prevent origin overload during cache miss storms. Discuss cache warming strategies and when you’d implement them (before predictable traffic spikes like product launches). Analyze the bandwidth versus storage cost trade-off with actual numbers: ‘For 1TB of content with 10% hot data, pull CDN costs $X in bandwidth versus $Y in storage for push.’ Address the thundering herd problem and mitigation strategies like TTL jitter and stale-while-revalidate. When designing a system, justify your pull CDN choice with specific reasoning: ‘We have 10 million products but 80% get zero traffic monthly, so pull CDN saves 80% on storage costs while maintaining 95% cache hit rates for the 20% that matter.’
Staff+
Staff+ candidates should discuss pull CDN architecture in the context of global system design and cost optimization. Explain how you’d instrument and monitor cache hit rates, origin load, and edge performance to make data-driven TTL decisions. Discuss advanced topics like cache key normalization, content-based hashing for immutable assets, and using CDN logs to identify cache optimization opportunities. Address operational concerns: how do you safely invalidate cache across 200 edge locations without creating an origin overload? How do you implement gradual rollouts of new content versions through cache? Discuss hybrid strategies: ‘We use pull CDN for user-generated content (unpredictable access patterns) but push CDN for our core application assets (known demand, need guaranteed availability).’ Explain how you’d design a cache warming system using ML to predict hot content based on historical patterns, social media signals, and business events.
Common Interview Questions
When would you choose pull CDN over push CDN?
How do you handle the cold-start problem in pull CDNs?
What happens when content expires in the cache?
How do you prevent origin overload during cache miss storms?
How would you optimize cache hit rates for a pull CDN?
Red Flags to Avoid
Not understanding the cache miss workflow or thinking content is automatically distributed
Ignoring the first-request latency problem or not knowing about cache warming
Claiming pull CDNs are always better (or always worse) than push without considering use case
Not mentioning TTL configuration or thinking all content should have the same TTL
Overlooking origin capacity planning and assuming CDN eliminates all origin load
Key Takeaways
Pull CDNs use lazy loading: content only reaches edge locations when users request it, making them ideal for large catalogs with unpredictable access patterns where most content is rarely accessed.
The fundamental trade-off is bandwidth versus storage: pull CDNs minimize storage costs but increase bandwidth costs for origin fetches. Push CDNs invert this relationship.
TTL configuration is critical: short TTLs ensure freshness but increase origin load, long TTLs improve cache hit rates but risk stale content. Use different TTLs for different content types based on update frequency.
Origin shielding is essential for production systems: multi-tier caching with regional aggregation points prevents origin overload during cache miss storms and reduces cross-region bandwidth costs.
Cache warming eliminates the cold-start problem: proactively fetch predicted hot content before traffic arrives to achieve push-like performance with pull-like efficiency.
Related Topics
Prerequisites
CDN Overview - Understanding general CDN architecture and edge locations
HTTP Caching - Cache-Control headers and conditional requests used by pull CDNs
Next Steps
Push CDNs - Compare proactive content distribution approach
Cache Invalidation - Strategies for updating cached content
Load Balancing - How CDNs distribute requests across edge servers
Related
Content Delivery Networks - Parent topic covering CDN fundamentals
DNS Resolution - How users get routed to nearest CDN edge