HTTP in System Design: Methods, Status & Headers

Intermediate · 14 min read · Updated 2026-02-11

After this topic, you will be able to:

  • Explain the evolution from HTTP/1.1 to HTTP/2 to HTTP/3 and the problems each version solves
  • Evaluate when to use different HTTP methods and status codes in API design
  • Assess the performance implications of HTTP/2 multiplexing and HTTP/3’s QUIC protocol for high-scale systems

TL;DR

HTTP (Hypertext Transfer Protocol) is the foundational application-layer protocol for data exchange on the web, operating as a stateless request-response protocol between clients and servers. It has evolved from HTTP/1.1’s sequential processing through HTTP/2’s multiplexing to HTTP/3’s QUIC-based transport, each solving critical performance bottlenecks. Understanding HTTP versions, methods, status codes, and headers is essential for designing scalable APIs and web services.

Cheat Sheet: HTTP/1.1 uses persistent connections but processes requests sequentially (head-of-line blocking). HTTP/2 introduces binary framing and multiplexing over a single TCP connection, enabling parallel request handling. HTTP/3 replaces TCP with QUIC over UDP, eliminating head-of-line blocking at the transport layer and improving connection establishment from 2-3 RTTs to 0-1 RTT.

Background

HTTP was born in 1991 at CERN when Tim Berners-Lee needed a simple protocol to retrieve hypertext documents. The original HTTP/0.9 supported only GET requests and had no headers or status codes—just raw HTML responses. HTTP/1.0 (1996) added methods, headers, and status codes, but created a new TCP connection for every request, causing massive overhead.

HTTP/1.1 (1997) introduced persistent connections (keep-alive) and pipelining, allowing multiple requests over a single TCP connection. This became the workhorse protocol for two decades, powering the explosive growth of the web. However, as websites evolved from simple documents to complex applications loading hundreds of resources, HTTP/1.1’s sequential processing became a bottleneck. Browsers opened 6-8 parallel TCP connections per domain to work around this limitation, but each connection required a TCP handshake and TLS negotiation—expensive operations that added 100-200ms of latency.

Google’s SPDY experiment (announced in 2009) proved that multiplexing multiple requests over a single connection could dramatically improve performance. This research led to HTTP/2 (2015), which introduced binary framing, header compression, and true multiplexing. Yet HTTP/2 still suffered from TCP’s head-of-line blocking: a single lost packet blocked all streams on that connection. HTTP/3 (deployed as a draft from around 2020 and standardized as RFC 9114 in 2022) solved this by replacing TCP with QUIC, a UDP-based protocol that provides independent streams, built-in encryption, and faster connection establishment. Today, over 25% of websites support HTTP/3, including Google, Facebook, and Cloudflare.

Architecture

HTTP operates in a client-server model where clients (browsers, mobile apps, services) send requests and servers return responses. The protocol is stateless—each request is independent, carrying all necessary context through headers. This statelessness enables horizontal scaling and caching but requires mechanisms like cookies or tokens for maintaining session state.

An HTTP transaction pairs a request with a response. The request carries a request line (method, URI, version), headers (metadata), and an optional body (payload); the response carries a status line (version, status code, reason phrase), headers, and an optional body. The request line specifies the action (GET, POST, PUT, DELETE, PATCH) and target resource. Headers provide context: Content-Type describes the payload format, Authorization carries credentials, Cache-Control directs caching behavior, and Accept specifies desired response formats.

HTTP/1.1 uses text-based framing where each request and response is human-readable ASCII. A typical request looks like: GET /api/users/123 HTTP/1.1\r\nHost: api.example.com\r\nAuthorization: Bearer token\r\n\r\n. The server parses this text, processes the request, and returns a response with a status code (200 OK, 404 Not Found, 500 Internal Server Error) and optional body.
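This wire format is simple enough to build and parse by hand. A minimal Python sketch (the helper names are illustrative, not from any library; the host and token come from the example above):

```python
# Compose a raw HTTP/1.1 request exactly as it travels on the wire:
# request line, headers, and a blank line, all delimited by \r\n.
def build_request(method: str, path: str, host: str, headers: dict) -> bytes:
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

# Parse the status line of a raw response into (version, code, reason).
def parse_status_line(raw: bytes) -> tuple:
    status_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

req = build_request("GET", "/api/users/123", "api.example.com",
                    {"Authorization": "Bearer token", "Accept": "application/json"})
assert req.startswith(b"GET /api/users/123 HTTP/1.1\r\nHost: api.example.com\r\n")

print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n{}"))
# ('HTTP/1.1', 200, 'OK')
```

Because the framing is plain ASCII, the same bytes can be written into a TCP socket directly, which is why curl and telnet debugging work so well against HTTP/1.1 servers.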

HTTP/2 fundamentally changes this architecture by introducing a binary framing layer. Instead of text, it uses binary frames (HEADERS, DATA, SETTINGS, PING) that are multiplexed over a single TCP connection. Each request becomes a stream with a unique ID, and frames from different streams are interleaved. The server can prioritize streams and even push resources proactively (server push). Header compression using HPACK reduces overhead by maintaining a dynamic table of previously seen headers.

HTTP/3 replaces the entire transport stack. Instead of TCP, it uses QUIC—a multiplexed, encrypted protocol built on UDP. QUIC provides connection migration (surviving IP address changes), 0-RTT connection establishment for repeat connections, and independent stream delivery. Each QUIC stream has its own flow control and reliability, eliminating head-of-line blocking at the transport layer.

HTTP Request-Response Lifecycle with Headers and Status Codes

sequenceDiagram
    participant Client as Client<br/>(Browser/App)
    participant Server as Web Server
    participant App as Application
    participant DB as Database

    Client->>+Server: 1. GET /api/users/123 HTTP/1.1<br/>Host: api.example.com<br/>Authorization: Bearer token<br/>Accept: application/json
    Note over Server: Parse request line<br/>and headers
    Server->>+App: 2. Route to handler<br/>Extract user ID: 123
    App->>+DB: 3. SELECT * FROM users<br/>WHERE id = 123
    DB-->>-App: 4. User data
    App-->>-Server: 5. Response object<br/>{"id": 123, "name": "Alice"}
    Server-->>-Client: 6. HTTP/1.1 200 OK<br/>Content-Type: application/json<br/>Cache-Control: max-age=3600<br/><br/>{"id": 123, "name": "Alice"}
    Note over Client,Server: Stateless: each request<br/>carries full context

HTTP operates as a stateless request-response protocol where clients send requests with methods, URIs, and headers, and servers return responses with status codes and optional bodies. Each request is independent, carrying all necessary context through headers like Authorization and Accept.

Internals

HTTP/1.1 operates over TCP connections established through a three-way handshake. With HTTPS, a TLS handshake follows, adding another 1-2 round trips. Once connected, the client sends a text-based request. The server parses the request line and headers using a state machine that looks for \r\n delimiters. If the request includes a body, the server reads Content-Length bytes or processes chunked transfer encoding. After processing, the server writes the response back through the same TCP socket.
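The two body-framing rules mentioned above (Content-Length versus chunked transfer encoding) can be sketched in a few lines of Python. This is an illustrative simplification of what real servers do, not a robust parser:

```python
# Sketch of HTTP/1.1 body framing: a body is delimited either by an explicit
# Content-Length, or by chunked transfer encoding (hex size line, chunk data,
# terminated by a zero-length chunk).
def decode_chunked(data: bytes) -> bytes:
    body = b""
    while True:
        size_line, data = data.split(b"\r\n", 1)
        size = int(size_line, 16)           # chunk size is hexadecimal
        if size == 0:                       # zero-size chunk terminates the body
            return body
        body += data[:size]
        data = data[size + 2:]              # skip chunk payload and trailing \r\n

def read_body(headers: dict, payload: bytes) -> bytes:
    if headers.get("Transfer-Encoding") == "chunked":
        return decode_chunked(payload)
    return payload[: int(headers.get("Content-Length", 0))]

chunked = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(read_body({"Transfer-Encoding": "chunked"}, chunked))  # b'Wikipedia'
print(read_body({"Content-Length": "5"}, b"hello world"))    # b'hello'
```

Chunked encoding exists so servers can start streaming a response before knowing its total length, at the cost of the extra per-chunk framing shown here.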

Pipelining in HTTP/1.1 allows clients to send multiple requests without waiting for responses, but servers must respond in order. If the first request takes 500ms and the second takes 10ms, the client still waits 500ms for the second response. This head-of-line blocking, combined with the inability to prioritize requests, made pipelining rarely used in practice. Browsers instead opened multiple TCP connections (typically 6 per domain), each requiring separate handshakes and TLS negotiations.

HTTP/2’s binary framing layer divides messages into frames: HEADERS frames carry request/response headers, DATA frames carry payloads, and control frames (SETTINGS, WINDOW_UPDATE, PING) manage the connection. Each frame has a 9-byte header specifying length, type, flags, and stream ID. The multiplexer interleaves frames from different streams, allowing the server to send DATA frames for stream 5 while still sending HEADERS for stream 3. HPACK compression uses Huffman encoding and a dynamic table to compress headers. For example, the header name content-type sits at index 31 of HPACK’s static table, and once the full pair content-type: application/json enters the dynamic table, later occurrences can be encoded as a single indexed byte.
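The 9-byte frame header described above has a fixed layout (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit plus a 31-bit stream ID) that can be packed and unpacked with stdlib Python. A sketch of the layout, not a full HTTP/2 implementation:

```python
import struct

# HTTP/2 frame header (RFC 7540 §4.1): 24-bit length, 8-bit type, 8-bit flags,
# 1 reserved bit + 31-bit stream identifier = 9 bytes in total.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x6: "PING"}

def pack_frame_header(length: int, ftype: int, flags: int, stream_id: int) -> bytes:
    # Take the low 3 bytes of a big-endian 32-bit int for the 24-bit length.
    return (struct.pack(">I", length)[1:]
            + bytes([ftype, flags])
            + struct.pack(">I", stream_id & 0x7FFFFFFF))

def unpack_frame_header(header: bytes) -> tuple:
    length = int.from_bytes(header[0:3], "big")
    ftype, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # clear reserved bit
    return length, FRAME_TYPES.get(ftype, "UNKNOWN"), flags, stream_id

hdr = pack_frame_header(1024, 0x0, 0x1, 5)   # DATA frame, END_STREAM flag, stream 5
assert len(hdr) == 9
print(unpack_frame_header(hdr))              # (1024, 'DATA', 1, 5)
```

The fixed-size header is what makes HTTP/2 cheap to parse compared to scanning HTTP/1.1 text for \r\n delimiters: a receiver always knows exactly where the next frame begins.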

Stream prioritization in HTTP/2 uses a dependency tree where streams can depend on others with weights. A browser might prioritize CSS (weight 256) over images (weight 16), ensuring critical rendering resources arrive first. Server push allows the server to send resources before the client requests them—when serving /index.html, the server can push /style.css and /app.js immediately. However, push proved problematic in practice because servers couldn’t know what the client already had cached, often wasting bandwidth.

HTTP/3’s QUIC protocol implements reliability, flow control, and congestion control at the application layer. Each QUIC packet contains one or more frames, and packets are encrypted by default using TLS 1.3 integrated into the handshake. Connection establishment combines the transport handshake with TLS negotiation, reducing round trips from 2-3 (TCP + TLS) to 1 (QUIC + TLS). For repeat connections, QUIC supports 0-RTT by using cached parameters, allowing the client to send application data in the first packet. Stream multiplexing in QUIC is independent—if stream 5 loses a packet, only stream 5 is blocked while streams 3, 7, and 11 continue delivering data. This eliminates TCP’s head-of-line blocking where a single lost packet stalls all streams.
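The difference between TCP-level and stream-level loss recovery can be made concrete with a toy model. The numbers are illustrative only (this is not a simulation of real TCP or QUIC behavior): each stream finishes at a base time, and a lost packet costs one retransmission round trip to recover.

```python
# Toy head-of-line-blocking model: under TCP, in-order delivery means one lost
# packet stalls every multiplexed stream; under QUIC, retransmission is
# per-stream, so only the affected stream waits.
def completion_times(n_streams: int, lost_stream: int, rtt_ms: int,
                     base_ms: int, transport: str) -> list:
    times = []
    for stream in range(n_streams):
        if transport == "tcp":
            times.append(base_ms + rtt_ms)   # everyone pays the retransmission
        else:
            times.append(base_ms + rtt_ms if stream == lost_stream else base_ms)
    return times

tcp = completion_times(n_streams=4, lost_stream=1, rtt_ms=100, base_ms=50, transport="tcp")
quic = completion_times(n_streams=4, lost_stream=1, rtt_ms=100, base_ms=50, transport="quic")
print(tcp)   # [150, 150, 150, 150] -> all streams delayed by the lost packet
print(quic)  # [50, 150, 50, 50]    -> only stream 1 delayed
```

Even in this crude model the mean completion time diverges sharply as stream count grows, which is why HTTP/3's advantage is largest on lossy, high-RTT mobile networks.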

HTTP/2 Binary Framing and Stream Multiplexing

graph LR
    subgraph Client Side
        Req1["Request 1<br/>GET /api/users"]
        Req2["Request 2<br/>POST /api/orders"]
        Req3["Request 3<br/>GET /images/logo.png"]
    end
    
    subgraph Binary Framing Layer
        Frame1["HEADERS Frame<br/>Stream ID: 1<br/>Length: 45 bytes"]
        Frame2["HEADERS Frame<br/>Stream ID: 3<br/>Length: 52 bytes"]
        Frame3["DATA Frame<br/>Stream ID: 3<br/>Length: 128 bytes"]
        Frame4["HEADERS Frame<br/>Stream ID: 5<br/>Length: 38 bytes"]
        Frame5["DATA Frame<br/>Stream ID: 1<br/>Length: 1024 bytes"]
    end
    
    subgraph Single TCP Connection
        TCP["TCP Stream<br/><i>Interleaved frames</i><br/>Frame order: 1→3→3→5→1"]
    end
    
    subgraph Server Side
        Handler1["Stream 1 Handler<br/><i>Process GET</i>"]
        Handler2["Stream 3 Handler<br/><i>Process POST</i>"]
        Handler3["Stream 5 Handler<br/><i>Process GET</i>"]
    end
    
    Req1 --> Frame1
    Req2 --> Frame2
    Req2 --> Frame3
    Req3 --> Frame4
    Req1 --> Frame5
    
    Frame1 & Frame2 & Frame3 & Frame4 & Frame5 --> TCP
    
    TCP --> Handler1
    TCP --> Handler2
    TCP --> Handler3
    
    Note["Each frame: 9-byte header<br/>(length, type, flags, stream ID)<br/>+ payload data"]

HTTP/2 divides messages into binary frames (HEADERS, DATA, control frames) that are multiplexed over a single TCP connection. Each frame carries a stream ID, allowing the server to interleave responses from different streams and process requests in parallel rather than sequentially.

HTTP Version Comparison

HTTP/1.1 processes requests sequentially over persistent TCP connections. While keep-alive eliminates the overhead of creating new connections for each request, pipelining is rarely used due to head-of-line blocking. Browsers work around this by opening 6-8 parallel connections per domain, but each connection requires a full TCP handshake (1 RTT) and TLS negotiation (2 RTTs), adding 150-300ms of latency on typical networks. Text-based headers are verbose—a typical request might send 800-1500 bytes of headers, much of it repetitive (User-Agent, Accept, Cookie). For a page loading 100 resources, that’s 80-150KB of header overhead.

HTTP/2 multiplexes all requests over a single TCP connection, eliminating the need for multiple connections. Binary framing reduces parsing overhead, and HPACK compression typically reduces header size by 85-90%. A 1200-byte header set might compress to 150 bytes. Stream prioritization allows browsers to request critical resources first, and server push can proactively send resources. However, HTTP/2 still suffers from TCP head-of-line blocking: if a single packet is lost (1% packet loss is common on mobile networks), all streams stall while TCP retransmits. On a 100ms RTT network with 1% packet loss, this adds 100ms of delay to every affected request.

HTTP/3 with QUIC eliminates transport-layer head-of-line blocking by giving each stream independent delivery. A lost packet only affects its own stream. Connection establishment is faster: 1-RTT for new connections, 0-RTT for repeat connections (compared to 2 RTTs for HTTP/2 with TLS 1.3). QUIC’s connection migration allows connections to survive network changes—when a mobile user switches from WiFi to cellular, the QUIC connection continues seamlessly using a connection ID rather than the IP address tuple. However, HTTP/3 adoption faces challenges: UDP is sometimes blocked by corporate firewalls, and QUIC’s user-space implementation can be CPU-intensive compared to kernel-optimized TCP.

Migration considerations: HTTP/2 is widely supported (97% of browsers) and provides significant benefits over HTTP/1.1 with minimal risk. Enabling HTTP/2 typically requires only server configuration changes. HTTP/3 requires more infrastructure changes: load balancers must support QUIC, and applications must handle the UDP-based protocol. The standard approach is to advertise HTTP/3 support via the Alt-Svc header, allowing clients to upgrade opportunistically while falling back to HTTP/2 if needed. Google reports 75% of Chrome traffic to Google services uses HTTP/3, with 5-10% latency improvements on mobile networks.
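The Alt-Svc mechanism mentioned above is just a response header the client parses to learn it may upgrade. A minimal Python sketch of parsing a typical value (RFC 7838 format; the special "clear" value and full parameter handling are omitted):

```python
# Parse an Alt-Svc response header to discover advertised protocols, e.g.
#   Alt-Svc: h3=":443"; ma=86400, h2=":443"
# meaning: HTTP/3 is available on port 443, cacheable for 86400 seconds.
def parse_alt_svc(value: str) -> dict:
    services = {}
    for entry in value.split(","):
        params = [p.strip() for p in entry.split(";")]
        proto, authority = params[0].split("=", 1)
        services[proto] = {"authority": authority.strip('"')}
        for param in params[1:]:
            key, _, val = param.partition("=")
            services[proto][key] = val
    return services

advertised = parse_alt_svc('h3=":443"; ma=86400, h2=":443"')
print(advertised["h3"])       # {'authority': ':443', 'ma': '86400'}
print("h3" in advertised)     # True -> client may attempt an HTTP/3 connection
```

Because the advertisement arrives over an existing HTTP/1.1 or HTTP/2 connection, clients that cannot reach the server over UDP simply never upgrade, which is what makes the rollout safe.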

HTTP/1.1 vs HTTP/2 Connection Multiplexing

graph TB
    subgraph "HTTP/1.1: Multiple Connections"
        Client1["Client"] 
        TCP1["TCP Conn 1<br/><i>150ms setup</i>"]
        TCP2["TCP Conn 2<br/><i>150ms setup</i>"]
        TCP3["TCP Conn 3<br/><i>150ms setup</i>"]
        Server1["Server"]
        
        Client1 --"1. GET /style.css"--> TCP1
        TCP1 --"Sequential"--> Server1
        Client1 --"2. GET /app.js"--> TCP2
        TCP2 --"Sequential"--> Server1
        Client1 --"3. GET /logo.png"--> TCP3
        TCP3 --"Sequential"--> Server1
        
        Note1["❌ 6-8 parallel connections<br/>❌ Each requires handshake<br/>❌ Head-of-line blocking<br/>per connection"]
    end
    
    subgraph "HTTP/2: Single Connection Multiplexing"
        Client2["Client"]
        TCP4["Single TCP Connection<br/><i>Binary frames</i>"]
        Server2["Server"]
        
        Client2 --"Stream 1: GET /style.css<br/>Stream 3: GET /app.js<br/>Stream 5: GET /logo.png"--> TCP4
        TCP4 --"Interleaved frames<br/>HEADERS + DATA"--> Server2
        Server2 --"Parallel responses<br/>Frame: S1-DATA, S3-DATA, S5-DATA"--> TCP4
        TCP4 --> Client2
        
        Note2["✅ Single connection<br/>✅ Parallel request handling<br/>✅ Header compression (HPACK)<br/>⚠️ TCP head-of-line blocking"]
    end

HTTP/1.1 opens 6-8 parallel TCP connections to work around sequential processing, with each connection requiring expensive handshakes. HTTP/2 multiplexes all requests over a single connection using binary frames, eliminating connection overhead and enabling parallel request handling with header compression.

HTTP/3 QUIC Stack vs HTTP/2 TCP+TLS Stack

graph TB
    subgraph HTTP/2 over TCP+TLS
        App1["Application Layer<br/><i>HTTP/2 Binary Frames</i>"]
        TLS1["TLS 1.3 Layer<br/><i>Encryption</i>"]
        TCP1["TCP Layer<br/><i>Reliability, Flow Control</i>"]
        IP1["IP Layer<br/><i>Packet Routing</i>"]
        
        App1 --> TLS1
        TLS1 --> TCP1
        TCP1 --> IP1
        
        Issue1["❌ 2-3 RTT connection setup<br/>❌ TCP head-of-line blocking<br/>❌ Lost packet blocks all streams<br/>❌ No connection migration"]
    end
    
    subgraph HTTP/3 over QUIC
        App2["Application Layer<br/><i>HTTP/3 Binary Frames</i>"]
        QUIC["QUIC Layer<br/><i>Integrated TLS 1.3</i><br/><i>Reliability per stream</i><br/><i>Flow control per stream</i><br/><i>Connection migration</i>"]
        UDP["UDP Layer<br/><i>Unreliable transport</i>"]
        IP2["IP Layer<br/><i>Packet Routing</i>"]
        
        App2 --> QUIC
        QUIC --> UDP
        UDP --> IP2
        
        Benefit["✅ 0-1 RTT connection setup<br/>✅ Independent stream delivery<br/>✅ Lost packet affects only 1 stream<br/>✅ Survives IP address changes"]
    end
    
    Handshake1["Connection Setup:<br/>TCP: 1 RTT<br/>TLS: 1-2 RTT<br/>Total: 2-3 RTT"] -.-> TCP1
    Handshake2["Connection Setup:<br/>QUIC+TLS: 1 RTT (new)<br/>QUIC+TLS: 0 RTT (repeat)<br/>Total: 0-1 RTT"] -.-> QUIC

HTTP/2 uses separate TCP and TLS layers, requiring 2-3 RTTs for connection setup and suffering from TCP head-of-line blocking. HTTP/3’s QUIC protocol integrates encryption and provides independent stream delivery over UDP, reducing connection setup to 0-1 RTT and eliminating transport-layer head-of-line blocking.

Performance Characteristics

HTTP/1.1 performance is dominated by connection overhead and head-of-line blocking. On a 50ms RTT network, establishing a new HTTPS connection requires 150ms (TCP handshake + TLS negotiation). Loading a page with 100 resources over 6 parallel connections means 17 resources per connection, with sequential processing adding significant latency. Typical page load times range from 2-5 seconds on desktop, 5-10 seconds on mobile. Header overhead averages 800-1500 bytes per request, consuming 80-150KB for 100 resources.

HTTP/2 reduces connection establishment overhead by reusing a single connection. Multiplexing allows all 100 resources to be requested simultaneously, with the server interleaving responses. HPACK compression reduces header overhead to 10-15% of HTTP/1.1 sizes. Real-world measurements show HTTP/2 reduces page load times by 20-40% compared to HTTP/1.1. Google reported that switching to HTTP/2 reduced latency by 7% on desktop and 23% on mobile. However, on networks with 2%+ packet loss, HTTP/2 can perform worse than HTTP/1.1 due to head-of-line blocking affecting all streams.

HTTP/3 excels on lossy networks. With 1% packet loss and 100ms RTT, HTTP/2 experiences 100ms delays affecting all streams, while HTTP/3 only delays the affected stream. Facebook reported that HTTP/3 reduced request errors by 3.6% and tail latency by 12.4%. Connection establishment is 100-200ms faster than HTTP/2 (0-1 RTT vs 2-3 RTT). Cloudflare measured 12% faster page loads with HTTP/3 on mobile networks. However, HTTP/3’s user-space implementation can consume 2-3x more CPU than kernel-optimized TCP, impacting server costs at scale.
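The connection-setup arithmetic used throughout this section reduces to multiplying a per-version RTT count by the network round-trip time. A back-of-envelope helper (the version labels are ad hoc, chosen here for illustration):

```python
# Round trips needed before the first request byte can be sent:
# HTTP/1.1 over TLS 1.2 = 1 (TCP) + 2 (TLS) = 3; HTTP/2 over TLS 1.3 = 1 + 1 = 2;
# HTTP/3 = 1 combined QUIC+TLS handshake, or 0 with cached 0-RTT parameters.
SETUP_RTTS = {"h1-tls1.2": 3, "h2-tls1.3": 2, "h3-new": 1, "h3-0rtt": 0}

def setup_ms(version: str, rtt_ms: float) -> float:
    return SETUP_RTTS[version] * rtt_ms

for version in SETUP_RTTS:
    print(f"{version}: {setup_ms(version, 50):.0f} ms on a 50ms RTT network")
# h1-tls1.2 costs 150 ms; h3-0rtt costs 0 ms before the first request byte
```

On a 100ms mobile RTT the same table yields a 300ms gap between the worst and best case, which is roughly the 100-200ms advantage cited for HTTP/3 above.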

Throughput characteristics: HTTP/1.1 achieves 10-50 requests/second per connection depending on latency. HTTP/2 can handle 100-1000 concurrent streams per connection, limited by flow control windows and server resources. HTTP/3 provides similar throughput to HTTP/2 but with better tail latency. For high-throughput scenarios like video streaming, HTTP/2 and HTTP/3 provide similar performance, but HTTP/3’s connection migration prevents interruptions during network changes—critical for mobile video streaming where users frequently switch between WiFi and cellular.

HTTP Version Performance Under Packet Loss

graph TB
    subgraph "Scenario: 1% Packet Loss, 100ms RTT"
        Network["Network Conditions<br/><i>1% packet loss</i><br/><i>100ms round-trip time</i>"]
    end
    
    subgraph "HTTP/1.1: 6 Parallel Connections"
        H1_Conn1["Connection 1<br/>Streams: 1,2,3"]
        H1_Conn2["Connection 2<br/>Streams: 4,5,6"]
        H1_Lost["❌ Packet lost on Conn 1<br/>Blocks streams 1,2,3<br/>+100ms delay"]
        H1_OK["✅ Conn 2 unaffected<br/>Streams 4,5,6 continue"]
        H1_Result["Result:<br/>Partial blocking<br/>~17 resources per connection"]
        
        Network --> H1_Conn1
        Network --> H1_Conn2
        H1_Conn1 --> H1_Lost
        H1_Conn2 --> H1_OK
        H1_Lost --> H1_Result
        H1_OK --> H1_Result
    end
    
    subgraph "HTTP/2: Single TCP Connection"
        H2_Conn["Single TCP Connection<br/>All 100 streams multiplexed"]
        H2_Lost["❌ Single packet lost<br/>TCP retransmits<br/>ALL streams blocked<br/>+100ms delay"]
        H2_Result["Result:<br/>Complete blocking<br/>Worse than HTTP/1.1<br/>on lossy networks"]
        
        Network --> H2_Conn
        H2_Conn --> H2_Lost
        H2_Lost --> H2_Result
    end
    
    subgraph "HTTP/3: QUIC Independent Streams"
        H3_Conn["QUIC Connection<br/>100 independent streams"]
        H3_Lost["❌ Packet lost on Stream 5<br/>Only Stream 5 blocked<br/>+100ms delay"]
        H3_OK["✅ Streams 1-4, 6-100<br/>continue unaffected"]
        H3_Result["Result:<br/>Minimal impact<br/>10-20% faster than HTTP/2<br/>on mobile networks"]
        
        Network --> H3_Conn
        H3_Conn --> H3_Lost
        H3_Conn --> H3_OK
        H3_Lost --> H3_Result
        H3_OK --> H3_Result
    end

On networks with packet loss, HTTP/2 can perform worse than HTTP/1.1 because TCP head-of-line blocking affects all multiplexed streams. HTTP/3 eliminates this problem with QUIC’s independent stream delivery, where a lost packet only blocks its own stream while others continue, providing 10-20% latency improvements on lossy mobile networks.

Trade-offs

HTTP’s primary strength is universality—it’s supported everywhere, from browsers to IoT devices to backend services. The text-based nature of HTTP/1.1 makes debugging trivial with tools like curl or browser DevTools. HTTP’s stateless design enables horizontal scaling and caching at every layer (browser, CDN, reverse proxy). The request-response model is simple to reason about and implement.

HTTP/1.1’s weaknesses are well-documented: head-of-line blocking, connection overhead, and verbose headers. It’s inefficient for modern web applications loading hundreds of resources. The workaround of opening multiple connections creates connection management complexity and wastes server resources. Pipelining is rarely used because it’s fragile and provides limited benefits.

HTTP/2 solves HTTP/1.1’s performance problems but introduces complexity. Binary framing makes debugging harder—you can’t simply read packets with tcpdump. Server push seemed promising but proved problematic because servers can’t know what clients have cached, often pushing resources unnecessarily. HTTP/2’s single connection means one slow request can impact others if the server processes requests sequentially. Most critically, TCP head-of-line blocking limits HTTP/2’s benefits on lossy networks, making it sometimes slower than HTTP/1.1 on poor mobile connections.

HTTP/3 eliminates transport-layer head-of-line blocking and provides faster connection establishment, but UDP-based protocols face deployment challenges. Some corporate networks block UDP traffic. QUIC’s user-space implementation is CPU-intensive—Netflix reported 2x CPU usage compared to TCP. The protocol is still evolving, with less mature tooling and debugging support. Connection migration is powerful for mobile but adds complexity in tracking connection state.

For API design, HTTP’s methods and status codes provide clear semantics, but the protocol is chattier than binary alternatives like gRPC. Each request carries full headers, making HTTP inefficient for high-frequency, low-latency communication between microservices. The text-based nature of HTTP/1.1 makes it 2-3x slower to parse than binary protocols. However, HTTP’s ubiquity means it works through firewalls, proxies, and CDNs without special configuration—a critical advantage for public APIs.

When to Use (and When Not To)

Use HTTP/1.1 when you need maximum compatibility with legacy systems or when debugging simplicity is paramount. It’s appropriate for internal tools, admin interfaces, or systems where request volume is low (< 10 requests/second). If your infrastructure doesn’t support HTTP/2 or you’re working with embedded systems with limited resources, HTTP/1.1 remains viable. However, for any modern web application or public API, HTTP/2 should be the baseline.

HTTP/2 is the right choice for most web applications and APIs today. It provides significant performance improvements with minimal migration risk. Use HTTP/2 when serving web pages with many resources, building REST APIs with moderate request rates (100-10,000 requests/second), or when clients are primarily browsers or modern HTTP clients. The single-connection model simplifies server resource management compared to HTTP/1.1’s multiple connections. Enable HTTP/2 if you’re using HTTPS (required by browsers) and your infrastructure supports it—most modern web servers (nginx, Apache, Caddy) and cloud load balancers support HTTP/2 with simple configuration changes.

HTTP/3 is ideal for mobile applications, video streaming services, and global applications where users experience variable network conditions. If your users frequently switch networks (WiFi to cellular), HTTP/3’s connection migration prevents interruptions. For applications serving users on lossy networks (mobile, satellite, developing regions), HTTP/3’s independent stream delivery provides 10-20% latency improvements. However, adopt HTTP/3 only if you can handle the infrastructure complexity and CPU overhead. Implement it as an opt-in upgrade using the Alt-Svc header, maintaining HTTP/2 as a fallback.

Consider alternatives when HTTP’s overhead becomes problematic. For high-frequency, low-latency microservice communication, gRPC over HTTP/2 provides better performance with binary serialization and streaming support. For real-time bidirectional communication, WebSockets (which upgrade from HTTP) are more efficient than polling. For IoT devices with severe bandwidth constraints, MQTT or CoAP are lighter-weight alternatives. For internal services where you control both client and server, custom binary protocols over TCP can be more efficient, though you lose HTTP’s ecosystem of tools, caching, and load balancing.

Real-World Examples

Google Search: Google was an early HTTP/2 adopter, deploying it across all services in 2015. They measured 7% latency reduction on desktop and 23% on mobile. Google’s search results page loads 90-120 resources (HTML, CSS, JavaScript, images, fonts), making multiplexing critical. They use HTTP/2 server push to send critical CSS and JavaScript before the browser parses the HTML. In 2020, Google began deploying HTTP/3, reporting 3-5% additional latency improvements. Their QUIC implementation handles over 50% of Google traffic, with connection migration allowing seamless transitions as mobile users move between networks. Google’s scale required custom QUIC implementations in their edge servers to handle millions of concurrent connections efficiently.

Netflix Video Streaming: Netflix serves 250 million subscribers streaming billions of hours monthly. They use HTTP/1.1 for video delivery because video streaming is sequential—there’s no benefit to multiplexing. However, their API layer uses HTTP/2 to reduce latency for metadata requests (browsing, search, recommendations). Each Netflix client makes 50-100 API requests when browsing, and HTTP/2’s multiplexing reduces load times by 30%. Netflix experimented with HTTP/3 but found the CPU overhead (2x compared to TCP) didn’t justify the benefits for their use case. They continue using HTTP/1.1 for video with careful tuning: persistent connections, optimal TCP buffer sizes, and CDN placement to minimize RTT. This shows that newer isn’t always better—HTTP/1.1 remains optimal for large sequential transfers.

Stripe Payments API: Stripe’s API handles millions of payment requests daily with strict latency requirements (p99 < 500ms). They use HTTP/2 for all API traffic, with HPACK compression reducing header overhead by 85%. Stripe’s API clients often make multiple related requests (create customer, create payment method, create charge), and HTTP/2’s multiplexing allows these to execute in parallel over a single connection. They carefully tune HTTP/2 settings: 100 concurrent streams per connection, 1MB flow control windows, and disabled server push (which they found wasteful). Stripe uses HTTP status codes semantically: 200 for success, 402 for payment failures, 429 for rate limiting, and detailed error codes in the response body. Their API design demonstrates HTTP’s strength for RESTful services: clear semantics, excellent tooling support, and universal client compatibility.


Interview Essentials

Mid-Level

Mid-level candidates should explain HTTP’s request-response model and the purpose of common methods (GET, POST, PUT, DELETE, PATCH). You should know that GET, PUT, and DELETE are idempotent while POST is not, and explain why this matters for retry logic. Understand HTTP status code categories: 2xx (success), 3xx (redirection), 4xx (client error), 5xx (server error). Explain the difference between 401 (Unauthorized) and 403 (Forbidden), or between 500 (Internal Server Error) and 503 (Service Unavailable). Know that HTTP/1.1 uses persistent connections but processes requests sequentially, while HTTP/2 multiplexes requests over a single connection. Explain common headers: Content-Type, Authorization, Cache-Control, Accept. Understand that HTTPS adds TLS encryption, requiring a certificate and adding latency for the TLS handshake. Be able to design a simple REST API using appropriate HTTP methods and status codes.
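The retry-logic point above is often probed in interviews. A minimal sketch of an idempotency-aware retry policy (the helper and status set are illustrative, not from any framework):

```python
# Retry policy keyed on method idempotency: idempotent requests are safe to
# retry automatically; POST is not, because repeating it could duplicate a
# side effect (e.g., charging a customer twice).
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}
RETRYABLE_STATUS = {502, 503, 504}   # transient server-side failures

def should_retry(method: str, status: int) -> bool:
    return method.upper() in IDEMPOTENT_METHODS and status in RETRYABLE_STATUS

print(should_retry("GET", 503))    # True  -> safe to retry automatically
print(should_retry("POST", 503))   # False -> retrying may repeat the side effect
print(should_retry("GET", 404))    # False -> 4xx is a client error, not transient
```

Production APIs that must retry POSTs (payments being the classic case) layer an idempotency key header on top, so the server can deduplicate repeated submissions.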

Senior

Senior candidates should explain HTTP/2’s binary framing layer and how multiplexing works at the frame level. Understand HPACK header compression and why it’s necessary (header overhead in HTTP/1.1). Explain HTTP/2’s stream prioritization and why server push largely failed in practice. Know the head-of-line blocking problem: HTTP/1.1 has application-layer HOL blocking, HTTP/2 has transport-layer HOL blocking, HTTP/3 eliminates it with QUIC. Explain the performance implications of each version: when HTTP/2 is slower than HTTP/1.1 (high packet loss), when HTTP/3 provides the most benefit (mobile networks, lossy connections). Understand connection establishment costs: 3 RTTs for HTTP/1.1 with TLS, 2 RTTs for HTTP/2 with TLS 1.3, 1 RTT for HTTP/3 new connections, 0 RTT for HTTP/3 repeat connections. Design an API with proper use of HTTP methods (idempotency), status codes (semantic meaning), and headers (caching, authentication). Explain how CDNs cache HTTP responses using Cache-Control headers and how to invalidate cached content.

Staff+

Staff+ candidates should discuss HTTP/3’s QUIC protocol in detail: how it implements reliability over UDP, how connection migration works using connection IDs, and why 0-RTT has security implications (replay attacks). Explain the trade-offs of HTTP/3 adoption: CPU overhead (2-3x), UDP blocking by firewalls, and immature tooling versus benefits on lossy networks. Discuss real-world migration strategies: using Alt-Svc headers for opportunistic upgrades, maintaining HTTP/2 fallback, and measuring performance improvements. Understand HTTP/2’s flow control at both stream and connection levels, and how misconfigured flow control windows cause performance problems. Explain why HTTP/2 server push failed: cache invalidation problems, bandwidth waste, and difficulty predicting what clients need. Design a high-scale API considering HTTP version selection, connection pooling, header compression, and caching strategies. Discuss when HTTP is the wrong choice: high-frequency microservice communication (use gRPC), real-time bidirectional communication (use WebSockets), or bandwidth-constrained IoT (use MQTT). Explain how companies like Google and Facebook measure HTTP performance: synthetic monitoring, real user monitoring (RUM), and A/B testing protocol versions.

Common Interview Questions

Explain the difference between HTTP/1.1, HTTP/2, and HTTP/3. When would you choose each?

Why is HTTP/2 sometimes slower than HTTP/1.1? (Answer: TCP head-of-line blocking on lossy networks)

How does HTTP/2 multiplexing work? What problem does it solve?

Design a REST API for a social media platform. What HTTP methods and status codes would you use?

Explain HTTP/3’s QUIC protocol. How does it eliminate head-of-line blocking?

What are the trade-offs of HTTP/2 server push? Why did it fail in practice?

How would you optimize HTTP performance for a mobile application?

Explain idempotency in HTTP methods. Why does it matter for distributed systems?

What’s the difference between 401 and 403 status codes? Between 500 and 503?

How does HPACK compression work in HTTP/2? What problem does it solve?

Red Flags to Avoid

Confusing HTTP methods (using POST for idempotent operations, using GET with request bodies)

Not understanding the difference between HTTP versions or claiming HTTP/2 is always faster

Misusing HTTP status codes (returning 200 with error messages in the body)

Not knowing that HTTPS requires TLS or that HTTP/3 uses UDP instead of TCP

Claiming HTTP/2 server push is a major benefit (it’s largely deprecated)

Not understanding head-of-line blocking or why it matters for performance

Designing APIs without considering idempotency, caching, or proper status codes

Not knowing common headers like Cache-Control, Authorization, or Content-Type

Confusing HTTP (the protocol) with REST (an architectural style)

Not understanding when HTTP is the wrong choice (high-frequency RPC, real-time bidirectional communication)


Key Takeaways

HTTP evolved from HTTP/1.1’s sequential processing through HTTP/2’s multiplexing to HTTP/3’s QUIC-based transport, each solving critical performance bottlenecks. HTTP/1.1 suffers from head-of-line blocking and connection overhead; HTTP/2 eliminates application-layer HOL blocking but still has transport-layer HOL blocking; HTTP/3 eliminates all HOL blocking with independent QUIC streams.

HTTP/2 provides 20-40% latency improvements over HTTP/1.1 through multiplexing, binary framing, and HPACK header compression. However, on networks with 2%+ packet loss, HTTP/2 can be slower due to TCP head-of-line blocking. HTTP/3 excels on lossy mobile networks with 10-20% additional improvements but requires 2-3x more CPU.

HTTP methods have semantic meaning: GET, PUT, and DELETE are idempotent (safe to retry); POST is not. Status codes communicate outcomes: 2xx (success), 4xx (client error), 5xx (server error). Proper use of methods, status codes, and headers (Cache-Control, Authorization) is critical for scalable API design.

Connection establishment costs dominate latency: HTTP/1.1 with TLS requires 3 RTTs (150-300ms on typical networks), HTTP/2 with TLS 1.3 requires 2 RTTs, HTTP/3 requires 1 RTT for new connections and 0 RTT for repeat connections. This makes HTTP/3 ideal for mobile applications with frequent network changes.

HTTP is not always the right choice. For high-frequency microservice communication, gRPC provides better performance. For real-time bidirectional communication, WebSockets are more efficient. For bandwidth-constrained IoT, MQTT or CoAP are lighter-weight. Choose HTTP for its universality, tooling, and caching ecosystem, but understand its limitations.