Back-of-the-Envelope Estimation for System Design

beginner 10 min read Updated 2026-02-11

After this topic, you will be able to:

  • Calculate QPS, storage, and bandwidth requirements for given user scenarios
  • Apply power-of-two approximations to simplify complex calculations under time pressure
  • Demonstrate availability calculations using the nines methodology (99% to 99.999%)
  • Use latency numbers to justify architectural decisions in system design

TL;DR

Back-of-the-envelope estimation is the art of making quick, reasonable calculations to validate system designs under interview pressure. Using power-of-two approximations, standard latency numbers, and simple formulas, you can estimate QPS, storage, bandwidth, and availability requirements in minutes. Master this skill to confidently justify architectural decisions with numbers, not just hand-waving.

Cheat Sheet: 1 million requests/day ≈ 12 QPS average, 120 QPS peak (10x rule). Memory is ~1,000x faster than SSD and ~100,000x faster than a network round trip. Storage = daily_data_volume × retention_days × replication_factor. Bandwidth = data_per_request × QPS × 8 (bits).
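The cheat-sheet rules are easy to sanity-check in code. A minimal sketch (the helper names are mine, not any standard library):

```python
# Quick sanity-check of the cheat-sheet rules of thumb.

SECONDS_PER_DAY = 86_400  # ~100K for mental math

def avg_qps(requests_per_day: float) -> float:
    """Average queries per second."""
    return requests_per_day / SECONDS_PER_DAY

def peak_qps(requests_per_day: float, peak_multiplier: float = 10) -> float:
    """Peak QPS using the 10x rule of thumb."""
    return avg_qps(requests_per_day) * peak_multiplier

def bandwidth_bps(qps: float, bytes_per_request: float) -> float:
    """Bandwidth in bits per second (the x8 converts bytes to bits)."""
    return qps * bytes_per_request * 8

# 1M requests/day ≈ 12 QPS average, ~116 QPS peak (call it ~120)
print(round(avg_qps(1_000_000)))   # 12
print(round(peak_qps(1_000_000)))  # 116
print(bandwidth_bps(3_800, 500))   # 15,200,000 bps = 15.2 Mbps
```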

The Analogy

Think of back-of-the-envelope estimation like a chef tasting a dish before serving. You’re not using laboratory equipment to measure exact sodium levels—you’re using experience and rough measurements to know if you need more salt. Similarly, in system design interviews, you don’t need precise calculations down to the byte. You need quick math that tells you whether your design will handle 10 servers or 10,000 servers, whether you need caching or can query the database directly, and whether your budget is $1,000/month or $100,000/month. The goal is directional correctness that informs design decisions, not scientific precision.

Why This Matters in Interviews

Interviewers use estimation questions to evaluate your practical engineering judgment and ability to make data-driven decisions under time pressure. When you say “we’ll cache this data,” they want to see you calculate cache size requirements. When you propose sharding, they expect you to estimate QPS per shard. Candidates who skip estimation often design over-engineered systems for small-scale problems or under-engineered systems that collapse under realistic load. Strong estimation skills separate engineers who’ve built production systems from those who’ve only read about them. At companies like Google, Netflix, and Uber, every architectural decision starts with a napkin calculation to validate feasibility before writing code.


Core Concept

Back-of-the-envelope estimation transforms vague requirements like “design Twitter” into concrete numbers: 500 million daily active users generating 200 million tweets per day means 2,300 writes per second average, 23,000 writes per second at peak, roughly 22 TB of new tweet data per year, and roughly 70 Gbps of peak bandwidth for the timeline service. These calculations aren’t guesses—they follow systematic formulas using standard assumptions about user behavior, system performance, and infrastructure costs. The technique relies on three foundations: power-of-two approximations for quick mental math, memorized latency numbers for common operations, and standard formulas for capacity metrics. Jeff Dean, Google Senior Fellow, describes it as “estimates you create using thought experiments and common performance numbers to get a good feel for which designs will meet your requirements.”

Power-of-Two Approximations Reference

graph TB
    subgraph Memory Units
        KB["2^10 ≈ 1 Thousand<br/>(1 KB = 1,024 bytes)"]
        MB["2^20 ≈ 1 Million<br/>(1 MB = 1,048,576 bytes)"]
        GB["2^30 ≈ 1 Billion<br/>(1 GB ≈ 1 billion bytes)"]
        TB["2^40 ≈ 1 Trillion<br/>(1 TB ≈ 1 trillion bytes)"]
        PB["2^50 ≈ 1 Quadrillion<br/>(1 PB ≈ 1,000 TB)"]
    end
    
    subgraph Quick Math Examples
        Ex1["500M users × 1KB = 500GB<br/><i>Not 488.28GB</i>"]
        Ex2["200M tweets × 300B = 60GB/day<br/><i>3% error acceptable</i>"]
        Ex3["1B requests/day ÷ 100K seconds<br/>= 10K QPS average"]
    end
    
    KB -."×1,024".-> MB
    MB -."×1,024".-> GB
    GB -."×1,024".-> TB
    TB -."×1,024".-> PB

Power-of-two approximations enable instant mental math by treating 1,024 as 1,000. The resulting error of a few percent never changes an architectural decision in a system design interview, and the shortcut saves critical thinking time.
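To see why the shortcut is safe, you can compute the actual error at each unit (a quick sketch, not something an interviewer expects you to run):

```python
# How much error does treating 1,024 as 1,000 actually introduce?
# Compare binary units to their decimal shortcuts at each level.

units = {"KB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5}

for name, power in units.items():
    exact = 1024 ** power   # binary definition
    approx = 1000 ** power  # mental-math shortcut
    error_pct = (exact - approx) / exact * 100
    print(f"{name}: 2^{10 * power} vs 10^{3 * power} -> {error_pct:.1f}% error")

# KB: ~2.3% error, compounding to ~11% at PB scale — still never
# enough to change a 10-servers-vs-100-servers decision.
```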

Latency Numbers Every Engineer Should Know

graph LR
    L1["L1 Cache<br/>0.5 ns"]
    L2["L2 Cache<br/>7 ns"]
    RAM["Main Memory<br/>100 ns"]
    SSD["SSD Read<br/>150 μs<br/>(150,000 ns)"]
    Disk["Disk Seek<br/>10 ms<br/>(10,000,000 ns)"]
    DC["Network (same DC)<br/>0.5 ms<br/>(500,000 ns)"]
    Internet["Cross-continent<br/>150 ms<br/>(150,000,000 ns)"]
    
    L1 --"14x slower"--> L2
    L2 --"14x slower"--> RAM
    RAM --"1,500x slower"--> SSD
    SSD --"67x slower"--> Disk
    Disk --"20x faster"--> DC
    DC --"300x slower"--> Internet
    
    subgraph Design Implications
        Imp1["RAM is 1,000x faster than SSD<br/><i>→ Cache hot data in memory</i>"]
        Imp2["Disk seek is 100x slower than SSD<br/><i>→ Use SSDs for databases</i>"]
        Imp3["Network is 100,000x slower than RAM<br/><i>→ Minimize remote calls</i>"]
    end

Latency hierarchy shows that memory is ~1,000x faster than SSD and ~100,000x faster than network calls. These ratios drive every caching, database, and API design decision in distributed systems.
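The table's numbers are worth encoding once so the ratios fall out mechanically. A sketch using the diagram's rounded constants (the dictionary and helper are mine):

```python
# The latency table as data; ratios like these justify caching decisions.

LATENCY_NS = {
    "l1_cache": 0.5,
    "l2_cache": 7,
    "ram": 100,
    "ssd_read": 150_000,             # 150 us
    "network_same_dc": 500_000,      # 0.5 ms
    "disk_seek": 10_000_000,         # 10 ms
    "cross_continent": 150_000_000,  # 150 ms
}

def ratio(slow: str, fast: str) -> float:
    """How many times slower the first operation is than the second."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

print(ratio("ssd_read", "ram"))        # 1500.0 -> cache hot data in RAM
print(ratio("disk_seek", "ssd_read"))  # ~67    -> use SSDs for databases
print(ratio("network_same_dc", "ram")) # 5000.0 -> minimize remote calls
```

Note that with these rounded constants the same-datacenter hop works out to 5,000x slower than RAM; the oft-quoted ~100,000x figure assumes a slower network path.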

Complete URL Shortener Estimation Example

graph TB
    subgraph Input Requirements
        Req1["100M URLs created/month<br/>Read:Write = 100:1<br/>500 bytes per URL"]
    end
    
    subgraph QPS Calculations
        Write["Write QPS<br/>100M/month ÷ 30 ÷ 86,400<br/>= 38 avg, 380 peak"]
        Read["Read QPS<br/>38 × 100 ratio<br/>= 3,800 avg, 38,000 peak"]
    end
    
    subgraph Storage Calculations
        Monthly["Monthly Storage<br/>100M × 500B = 50 GB/month"]
        Total["10-Year Total<br/>50GB × 12 × 10 × 3 replication<br/>= 18 TB"]
    end
    
    subgraph Bandwidth Calculations
        Egress["Egress Bandwidth<br/>3,800 QPS × 500B × 8 bits<br/>= 15.2 Mbps avg, 152 Mbps peak"]
    end
    
    subgraph Architecture Decisions
        DB["Database Choice<br/>38K read QPS > 10K single DB limit<br/>→ Need caching or replicas"]
        Cache["Cache Sizing<br/>80/20 rule: 6TB raw × 0.2<br/>= 1.2 TB cache for 80% hits"]
    end
    
    Req1 --> Write
    Req1 --> Read
    Write --> Monthly
    Monthly --> Total
    Read --> Egress
    Read --> DB
    Total --> Cache

URL shortener estimation demonstrates how calculations cascade from requirements to architecture decisions. The 38K peak read QPS exceeds single database capacity, forcing the introduction of caching—a decision justified by numbers, not intuition.

Latency Budget Validation Example

sequenceDiagram
    participant Client
    participant API as API Gateway
    participant Cache as Redis Cache
    participant DB as Database
    participant Service as Backend Service
    
    Note over Client,Service: Target SLA: 100ms total latency
    
    Client->>API: 1. HTTP Request
    Note right of API: Network: 10ms
    
    API->>Cache: 2. Check cache
    Note right of Cache: Cache lookup: 1ms
    Cache-->>API: Cache miss
    
    API->>DB: 3. Query user data
    Note right of DB: DB query: 10ms
    DB-->>API: User data
    
    API->>DB: 4. Query posts (sequential)
    Note right of DB: DB query: 10ms
    DB-->>API: Posts data
    
    API->>DB: 5. Query followers (sequential)
    Note right of DB: DB query: 10ms
    DB-->>API: Followers data
    
    API->>Service: 6. Process & rank
    Note right of Service: Business logic: 20ms
    Service-->>API: Ranked results
    
    API->>Client: 7. HTTP Response
    Note right of API: Network: 10ms
    
    Note over Client,Service: Total: 10+1+10+10+10+20+10 = 71ms<br/>✓ Within 100ms budget<br/><br/>If we add 2 more sequential DB calls:<br/>71ms + 20ms = 91ms (still OK)<br/>71ms + 30ms = 101ms (FAILED)<br/><br/>Solution: Parallelize DB queries or cache

Latency budget validation reveals whether a design is feasible before implementation. Sequential database calls quickly consume the budget—three 10ms queries plus network and processing leaves only 29ms buffer in a 100ms SLA.
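The sequence diagram's budget check reduces to summing per-hop latencies against the SLA. A sketch with the example's numbers (the dictionary keys are mine):

```python
# Latency budget check: sum the per-hop latencies and compare to the SLA.

SLA_MS = 100

hops_ms = {
    "network_in": 10,
    "cache_lookup": 1,
    "db_user": 10,
    "db_posts": 10,
    "db_followers": 10,
    "business_logic": 20,
    "network_out": 10,
}

total = sum(hops_ms.values())
print(f"total={total}ms, buffer={SLA_MS - total}ms")  # total=71ms, buffer=29ms

# Two more sequential 10ms DB calls fit (91ms); three do not (101ms):
assert total + 20 <= SLA_MS
assert total + 30 > SLA_MS
```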

How It Works

The estimation process follows a predictable workflow. First, clarify scale assumptions with your interviewer: how many users, how active are they, what’s the read-to-write ratio? Second, calculate requests per second (QPS) by dividing daily operations by 86,400 seconds, then applying a 10x multiplier for peak load. Third, estimate storage by multiplying data per operation by operations per day by retention period by replication factor. Fourth, calculate bandwidth by multiplying average response size by QPS by 8 to convert bytes to bits. Finally, validate these numbers against known latency constraints and availability requirements. For example, if your design requires 100ms database queries but you’re estimating 50,000 QPS, you immediately know you need caching or read replicas because a single database can’t handle 5,000 concurrent queries (50,000 QPS × 0.1s query time).
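The closing arithmetic (50,000 QPS × 0.1 s = 5,000 in-flight queries) is Little's Law. A minimal sketch with a helper name of my choosing:

```python
# Little's Law: concurrent requests = arrival rate x time in system.

def concurrent_load(qps: float, latency_s: float) -> float:
    """Average number of in-flight queries a backend must sustain."""
    return qps * latency_s

# 50,000 QPS with 100ms queries means 5,000 simultaneous queries —
# far beyond a single database's connection capacity, so the design
# needs caching or read replicas before it needs anything else.
print(concurrent_load(50_000, 0.1))  # 5000.0
```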

Back-of-the-Envelope Estimation Workflow

graph TB
    Start(["System Design Problem<br/><i>e.g., Design Twitter</i>"])
    Clarify["1. Clarify Scale Assumptions<br/>• DAU: 500M users<br/>• Actions: 10 timeline loads/day<br/>• Read:Write ratio: 100:1"]
    QPS["2. Calculate QPS<br/>• Daily requests: 500M × 10 = 5B<br/>• Average: 5B ÷ 86,400 = 57,870 QPS<br/>• Peak (10x): 578,700 QPS"]
    Storage["3. Estimate Storage<br/>• Data/request: 15 KB timeline<br/>• Daily writes: 200M tweets × 300B<br/>• Retention: 5 years × 3 replication<br/>• Total: 328 TB"]
    Bandwidth["4. Calculate Bandwidth<br/>• Response size: 15 KB<br/>• Peak QPS: 578,700<br/>• Bandwidth: 578,700 × 15KB × 8<br/>= 69.4 Gbps"]
    Validate["5. Validate Against Constraints<br/>• Latency budget: 200ms total<br/>• DB query: 50ms (within budget)<br/>• Cache hit rate: 80% required<br/>• Availability: 99.9% (3 nines)"]
    Decision{"Design Feasible?"}
    Adjust["Adjust Architecture<br/>• Add caching layer<br/>• Implement sharding<br/>• Use read replicas"]
    Final(["Validated Design with Numbers"])
    
    Start --> Clarify
    Clarify --> QPS
    QPS --> Storage
    Storage --> Bandwidth
    Bandwidth --> Validate
    Validate --> Decision
    Decision --"No"--> Adjust
    Adjust --> Validate
    Decision --"Yes"--> Final

The estimation workflow transforms vague requirements into concrete numbers through five systematic steps. Each calculation informs the next, and validation against constraints often reveals the need for caching, sharding, or replication.

Key Principles

Power-of-Two Approximations

Use 2^10 = 1,024 ≈ 1,000 as your mental shortcut. This means 2^20 ≈ 1 million, 2^30 ≈ 1 billion, 2^40 ≈ 1 trillion. When someone says “1 GB,” think “1 billion bytes” for quick math. When calculating storage for 500 million users with 1 KB profiles, think 500 million × 1,000 bytes = 500 billion bytes = 500 GB, not 488.28 GB. The 3% error doesn’t matter in system design—the difference between 500 GB and 488 GB won’t change your architecture.

Example: Twitter stores 200 million tweets/day at 300 bytes each. Quick math: 200M × 300 = 60,000M bytes = 60 GB/day raw data. With 3x replication and 5 years retention, that’s 60 GB × 3 × 365 × 5 ≈ 328 TB total storage needed.

Standard Latency Numbers

Memorize the latency hierarchy: L1 cache (0.5 ns), L2 cache (7 ns), RAM (100 ns), SSD read (150 μs), disk seek (10 ms), network within datacenter (0.5 ms), cross-continent network (150 ms). These numbers inform architectural decisions. If your design requires 10 disk seeks per request and you need 10ms response time, you’ve already failed—10 seeks × 10ms = 100ms just for disk I/O.

Example: Netflix’s video recommendation service targets 50ms response time. They can’t query a disk-based database (10ms seek + 1ms transfer = 11ms per query) for every recommendation. This latency constraint forces them to use in-memory caching (100ns RAM access) for hot recommendations, explaining their investment in EVCache.

Peak-to-Average Ratio

Average QPS is misleading because traffic isn’t uniform. Use a 2x daily peak multiplier (lunch hour, evening) and a 10x instantaneous peak multiplier (viral events, breaking news) for capacity planning. If your average is 1,000 QPS, design for 10,000 QPS to handle spikes without degradation. This principle prevents the classic mistake of provisioning for average load and watching your system collapse during peak hours.

Example: Uber’s ride requests average 10,000 QPS globally but spike to 100,000 QPS during New Year’s Eve. Their architecture must handle 10x peak, which is why they use auto-scaling with pre-warmed capacity and circuit breakers to shed load gracefully when approaching limits.


Deep Dive

Types / Variants

QPS Estimation

Start with daily active users (DAU) and actions per user. Formula: QPS = (DAU × actions_per_user) / 86,400 seconds. For read-heavy systems like Twitter, multiply by read-to-write ratio (typically 100:1 for social media). Example: 500M DAU, each user reads 50 tweets/day = 25 billion reads/day = 289,000 read QPS average, 2.89M read QPS at peak. For write QPS, if 10% of users tweet once daily: 50M writes/day = 580 write QPS average, 5,800 write QPS peak.
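The paragraph's read and write figures, replayed as a sketch (variable and function names are mine; assumptions mirror the text):

```python
# QPS estimation from DAU and per-user actions.

SECONDS_PER_DAY = 86_400

def qps(ops_per_day: float) -> float:
    """Average operations per second over a full day."""
    return ops_per_day / SECONDS_PER_DAY

reads_per_day = 500e6 * 50         # 500M DAU, 50 tweet reads each
writes_per_day = 500e6 * 0.10 * 1  # 10% of users tweet once daily

print(round(qps(reads_per_day)))        # ~289,352 avg read QPS
print(round(qps(writes_per_day)))       # ~579 avg write QPS
print(round(qps(reads_per_day)) * 10)   # ~2.89M read QPS at 10x peak
```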

Storage Estimation

Formula: Storage = records × size_per_record × retention_days × replication_factor. Always account for metadata overhead (add 20-30% to raw data size) and growth projections. Example: Instagram stores 100M photos/day at 2 MB each. Raw: 100M × 2 MB = 200 TB/day. With metadata (30% overhead), 3x replication, and 10-year retention: 200 TB × 1.3 × 3 × 365 × 10 ≈ 2.85 EB (roughly 2,850 PB) total. This calculation determines whether you need object storage (S3) vs. building your own distributed file system.
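The storage formula with the Instagram numbers, as a sketch (the helper and its defaults are mine; note the product lands in exabytes, not petabytes):

```python
# Storage = daily raw data x metadata overhead x replication x retention.

def total_storage_tb(daily_tb: float, overhead: float = 1.3,
                     replication: int = 3, days: int = 3650) -> float:
    """Provisioned storage in TB over the retention window."""
    return daily_tb * overhead * replication * days

daily_tb = 100e6 * 2 / 1e6   # 100M photos x 2 MB = 200 TB/day
total = total_storage_tb(daily_tb)
print(f"{total / 1e6:.1f} EB")  # ~2.8 EB over 10 years
```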

Bandwidth Estimation

Formula: Bandwidth_bps = QPS × avg_response_size_bytes × 8. Remember to calculate both ingress (uploads) and egress (downloads) separately. Example: YouTube serves 1 billion video views/day at 5 MB average per view. QPS: 1B / 86,400 = 11,574 views/second. Bandwidth: 11,574 × 5 MB × 8 bits = 463 Gbps egress. At $0.08/GB for CDN bandwidth, that’s 11,574 × 5 MB × 86,400 seconds = 5 PB/day = $400,000/day in bandwidth costs alone, explaining why YouTube invests heavily in video compression and CDN optimization.
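The YouTube bandwidth and cost figures, reproduced as a sketch (variable names and the $0.08/GB rate are the text's assumptions):

```python
# Bandwidth_bps = QPS x avg_response_size_bytes x 8, plus the CDN cost.

SECONDS_PER_DAY = 86_400

views_per_day = 1e9
mb_per_view = 5
cdn_cost_per_gb = 0.08

view_qps = views_per_day / SECONDS_PER_DAY             # ~11,574 views/s
egress_gbps = view_qps * mb_per_view * 1e6 * 8 / 1e9   # bytes -> bits
daily_gb = views_per_day * mb_per_view / 1e3           # 5 PB/day
daily_cost = daily_gb * cdn_cost_per_gb

print(f"{egress_gbps:.0f} Gbps egress, ${daily_cost:,.0f}/day")
# ~463 Gbps and ~$400,000/day — which is why compression and
# CDN optimization dominate the economics of video serving.
```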

Trade-offs

Precision vs. Speed

Option A: precise calculations using exact numbers (1,024 instead of 1,000, actual SLA percentages). Option B: rough approximations using round numbers (power-of-two shortcuts, 10x rules). Decision framework: in interviews, always choose speed. The difference between 1,000 and 1,024 won’t change whether you need 10 servers or 100 servers. Interviewers care about order-of-magnitude correctness and your ability to make quick decisions. Save precision for production capacity planning spreadsheets.

Average vs. Peak Provisioning

Option A: provision for average load with auto-scaling (cost-efficient but risky during spikes). Option B: provision for peak load with static capacity (expensive but predictable). Decision framework: use average + auto-scaling for predictable traffic patterns (e-commerce with known daily peaks). Use peak provisioning for unpredictable spikes (social media during breaking news) or when cold-start time exceeds spike duration. Calculate the cost difference: if peak is 10x average and occurs 1% of the time, over-provisioning costs 10x more than auto-scaling with occasional degradation.

Common Pitfalls

Forgetting Replication and Overhead

Why it happens: candidates calculate raw data size but forget that production systems replicate data (typically 3x) and add metadata overhead (20-30%). This causes 3-4x underestimation of storage costs. How to avoid: always multiply storage by 3 for replication and add 30% for metadata as your default. State this assumption explicitly: “With 3x replication and 30% overhead, our 100 GB raw data becomes 390 GB provisioned storage.”

Using Average QPS for Capacity Planning

Why it happens: candidates divide daily requests by 86,400 and design for that number, ignoring traffic spikes. Systems designed for 1,000 average QPS collapse at 5,000 QPS during peak hours. How to avoid: always apply the 10x peak multiplier and state it: “Our average is 1,000 QPS, so we’ll design for 10,000 QPS peak capacity.” This single habit prevents most interview capacity planning mistakes.

Ignoring Latency Budget Constraints

Why it happens: candidates propose architectures without validating whether latency numbers add up. They suggest 5 sequential database queries for a 100ms SLA, not realizing 5 × 10ms = 50ms just for database time, leaving only 50ms for network, application logic, and rendering. How to avoid: create a latency budget breakdown: “100ms total budget = 10ms database + 20ms application logic + 20ms network + 50ms buffer for variance.” If your design exceeds the budget, you must parallelize queries or add caching.


Math & Calculations

Availability Calculation

Formula

Availability % = (total_time - downtime) / total_time × 100. Nines notation: 99% = 2 nines, 99.9% = 3 nines, 99.99% = 4 nines, 99.999% = 5 nines.

Variables

Total time = 365 days × 24 hours × 60 minutes = 525,600 minutes/year. Downtime allowed = total_time × (1 - availability_percentage).

Worked Example

Twitter promises 99.9% availability (3 nines). Allowed downtime: 525,600 minutes × 0.001 = 525.6 minutes/year = 8.76 hours/year = 43.8 minutes/month. If a single database has 99.9% availability and you need 99.99% (4 nines), you need redundancy: 1 - (1 - 0.999)^2 = 99.9999% with two independent databases. This math justifies multi-region deployments for high-availability requirements.
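The nines arithmetic above, as a sketch (function names are mine):

```python
# Nines-to-downtime conversion and availability of redundant replicas.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year at a given availability fraction."""
    return MINUTES_PER_YEAR * (1 - availability)

def combined_availability(avail: float, replicas: int) -> float:
    """Availability of N independent replicas where any one suffices."""
    return 1 - (1 - avail) ** replicas

print(round(downtime_minutes_per_year(0.999), 1))  # 525.6 min/year (3 nines)
print(round(combined_availability(0.999, 2), 6))   # 0.999999 — six nines
```

Two independent 3-nines databases combine to six nines, comfortably exceeding the 4-nines target in the example.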

Complete System Estimation

Scenario

Design a URL shortener like bit.ly serving 100 million URLs created per month, with 100:1 read-to-write ratio.

Step By Step

  1. Write QPS: 100M URLs/month ≈ 3.3M/day ≈ 38 writes/second average, 380 writes/second peak (10x).
  2. Read QPS: 38 × 100 = 3,800 reads/second average, 38,000 reads/second peak.
  3. Storage: 500 bytes per URL (original URL + short code + metadata). Monthly: 100M × 500 bytes = 50 GB/month raw. Over 10 years: 50 GB × 12 × 10 = 6 TB raw; with 3x replication, 18 TB provisioned.
  4. Bandwidth: average read response = 500 bytes. Egress: 3,800 QPS × 500 bytes × 8 bits = 15.2 Mbps average, 152 Mbps peak.
  5. Database choice: 38,000 read QPS peak exceeds single database capacity (~10,000 QPS), so we need caching (Redis) or read replicas.
  6. Cache sizing: if 20% of URLs account for 80% of traffic (Pareto principle), cache size = 6 TB raw × 0.2 = 1.2 TB to serve 80% of reads from memory.
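The steps above can be replayed as one short script (variable names and the 10x peak constant mirror the example's assumptions):

```python
# URL shortener estimation end to end.

SECONDS_PER_DAY = 86_400
PEAK_MULTIPLIER = 10

urls_per_month = 100e6
read_write_ratio = 100
bytes_per_url = 500

write_qps = urls_per_month / 30 / SECONDS_PER_DAY    # ~38
read_qps = write_qps * read_write_ratio              # ~3,800
peak_read_qps = read_qps * PEAK_MULTIPLIER           # ~38,000

raw_tb = urls_per_month * bytes_per_url * 12 * 10 / 1e12  # 6 TB raw
provisioned_tb = raw_tb * 3                               # 18 TB replicated

egress_mbps = read_qps * bytes_per_url * 8 / 1e6          # ~15.4 Mbps

print(round(write_qps), round(peak_read_qps),
      provisioned_tb, round(egress_mbps, 1))

# peak_read_qps > ~10,000 QPS single-DB capacity -> caching or replicas.
assert peak_read_qps > 10_000
```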

Real-World Examples

Netflix: Video Streaming Infrastructure

Netflix streams 250 million hours of video daily to 230 million subscribers. Estimation: 250M hours/day ÷ 24 hours ≈ 10.4M concurrent streams on average. At 5 Mbps per stream, egress bandwidth ≈ 10.4M × 5 Mbps ≈ 52 Tbps average, with evening peaks several times higher. Storage: Netflix stores each title in 120 different formats (resolutions, bitrates, device profiles). A 2-hour movie at 5 GB average per format = 600 GB per title. With 5,000 titles, that’s 3 PB of video data. These calculations drove Netflix’s decision to build Open Connect CDN with 17,000+ servers in ISP data centers worldwide, reducing backbone bandwidth costs from $1B+/year to near-zero.

Twitter: Timeline Generation Service

Twitter’s 500M DAU each load their timeline 10 times/day. Read QPS: 500M × 10 ÷ 86,400 = 57,870 reads/second average, 578,700 reads/second peak. Each timeline shows 50 tweets at 300 bytes each = 15 KB per request. Bandwidth: 578,700 QPS × 15 KB × 8 bits = 69.4 Gbps peak. Latency budget: 200ms total = 50ms fan-out query + 50ms ranking + 50ms rendering + 50ms network. The fan-out query must fetch tweets from 1,000 average followees in 50ms, which is impossible with sequential database queries (1,000 queries × 10ms = 10 seconds). This calculation forced Twitter’s architecture toward pre-computed timelines stored in Redis, where 1,000 timeline entries can be fetched in 5ms (5 μs per key × 1,000 keys).

Uber: Real-time Location Tracking

Uber tracks 5 million active drivers updating location every 4 seconds. Write QPS: 5M drivers ÷ 4 seconds = 1.25M location updates/second. Each update: 50 bytes (driver_id, lat, lon, timestamp, status). Storage: 1.25M updates/second × 50 bytes × 86,400 seconds/day = 5.4 TB/day raw data. With 3x replication and 30-day retention: 5.4 TB × 3 × 30 = 486 TB. Database choice: 1.25M writes/second exceeds any single database (PostgreSQL maxes at ~50K writes/second), requiring sharding across 25+ database nodes. This calculation justified Uber’s investment in Schemaless, their custom sharded MySQL solution, and later migration to Cassandra for better write scalability.


Interview Expectations

Mid-Level

Expected to perform basic QPS and storage calculations with guidance. Should know power-of-two approximations and calculate average QPS from daily users. Common question: “This system has 10 million users who each upload 5 photos per day. How much storage do we need per year?” Red flag: Not knowing that 1 GB = 1 billion bytes or forgetting to account for replication.

Senior

Expected to independently estimate QPS, storage, bandwidth, and validate designs against latency budgets. Should apply peak multipliers without prompting and identify when calculations invalidate proposed architectures. Common question: “Your design proposes 3 sequential database calls. Calculate the latency budget and tell me if this meets our 100ms SLA.” Red flag: Proposing caching without calculating cache size or hit rate requirements.

Staff+

Expected to perform multi-dimensional capacity planning including cost analysis, availability calculations, and growth projections. Should identify non-obvious constraints like network bandwidth limits or connection pool exhaustion. Common question: “Estimate the total infrastructure cost for this system at 100M users, then at 1B users. What changes in the architecture?” Red flag: Not considering cost implications or failing to identify when linear scaling becomes economically infeasible.

Common Interview Questions

How many servers do we need to handle this load?

What’s the total storage requirement for 5 years of data?

Calculate the bandwidth cost for this video streaming service.

If we cache 20% of data, what’s the cache size and hit rate?

How many database shards do we need for this write throughput?

Red Flags to Avoid

Refusing to make assumptions or asking for exact numbers instead of estimating

Calculating storage without accounting for replication or metadata overhead

Designing for average load without considering peak traffic multipliers

Proposing architectures without validating against latency or throughput constraints

Using precise numbers (1,024 instead of 1,000) that slow down mental math without adding value


Key Takeaways

Master power-of-two approximations (2^10 ≈ 1K, 2^20 ≈ 1M, 2^30 ≈ 1B) for instant mental math. The 3% error between 1000 and 1024 never changes architectural decisions in interviews.

Always apply the 10x peak multiplier to average QPS. Systems designed for average load fail during peak hours, which is exactly when users notice and complain.

Memorize the latency hierarchy: RAM (100ns) is 1,000x faster than SSD (100μs), which is 100x faster than network (10ms). These ratios inform every caching and database decision.

Storage estimation formula: raw_data × replication_factor (3x) × overhead (1.3x) × retention_period. Missing any multiplier causes 3-10x underestimation of infrastructure costs.

Validate every design against latency budgets. If your architecture requires 5 sequential 10ms database queries for a 50ms SLA, you’ve already failed before writing any code.

Prerequisites

What is System Design? - Understand why capacity planning drives architectural decisions

How to Approach System Design? - Learn when to perform estimations in the interview flow

Next Steps

Scalability - Apply estimation skills to design systems that scale horizontally

Caching Strategies - Use cache size calculations to justify caching layers

Database Sharding - Estimate shard count based on throughput requirements

Load Balancing - Calculate requests per server to determine load balancer configuration

CDN - Estimate bandwidth savings from edge caching