Layer 4 Load Balancing: TCP/UDP Traffic Routing

Intermediate | 14 min read | Updated 2026-02-11

After completing this topic, you will be able to:

  • Explain how Layer 4 load balancing operates at the transport layer using TCP/UDP information
  • Describe the performance advantages of L4 LB (low latency, high throughput) and their causes
  • Apply L4 load balancing to scenarios requiring maximum performance and protocol-agnostic routing
  • Distinguish between connection-based and packet-based L4 load balancing

TL;DR

Layer 4 load balancers operate at the OSI transport layer, routing traffic based on TCP/UDP headers (IP addresses, ports, protocol) without inspecting application payloads. They deliver microsecond-level latency and millions of connections per second by performing minimal packet processing—typically just Network Address Translation (NAT) or Direct Server Return (DSR). Use L4 when you need maximum throughput, protocol-agnostic routing, or sub-millisecond overhead.

Cheat Sheet: L4 = transport layer (TCP/UDP) | Inspects: IP + port only | Performance: <1ms latency, 10M+ packets/s | Use for: high-throughput services, non-HTTP protocols, gaming, streaming | Trade-off: no content-based routing.

Background

When Netflix streams video to 200 million subscribers simultaneously, or when a multiplayer game server handles 50,000 concurrent TCP connections, the load balancer can’t afford to spend milliseconds parsing HTTP headers or inspecting JSON payloads. This is where Layer 4 load balancing shines. Born in the late 1990s alongside the explosion of internet traffic, L4 load balancers were designed to solve a fundamental problem: how do you distribute millions of network connections across server pools without becoming the bottleneck yourself?

The key insight was to work at the transport layer of the OSI model—the layer where TCP and UDP live. At this layer, a load balancer only needs to examine the packet header (source IP, destination IP, source port, destination port, protocol type) to make routing decisions. No payload inspection, no SSL termination, no HTTP parsing. This minimalism translates directly into speed: modern L4 load balancers can process packets in under 50 microseconds and handle 10+ million concurrent connections on commodity hardware.

Layer 4 load balancing became the backbone of high-performance infrastructure at companies like Google (for internal RPC traffic), Cloudflare (for DDoS mitigation), and game companies like Riot Games (for League of Legends matchmaking). The technology evolved from simple round-robin packet forwarding to sophisticated connection tracking with stateful failover, but the core principle remains: make routing decisions using only transport-layer information, and do it blazingly fast.

OSI Model Context

The OSI (Open Systems Interconnection) model defines seven layers of network communication, from physical cables (Layer 1) to application protocols like HTTP (Layer 7). Layer 4—the transport layer—sits in the middle, responsible for end-to-end communication, flow control, and reliability. This is where TCP and UDP operate.

At Layer 4, a load balancer sees the 5-tuple: source IP address, source port, destination IP address, destination port, and protocol (TCP or UDP). It does NOT see anything above this layer—no HTTP methods, no URLs, no cookies, no TLS-encrypted payloads. A Layer 4 load balancer treats an HTTP request, a database query, a video stream, and a gaming packet identically: they’re all just TCP or UDP flows defined by IP addresses and ports.
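As a concrete illustration, here is a minimal Python sketch of extracting the 5-tuple from a raw IPv4 packet (field offsets follow the standard IPv4 and TCP/UDP header layouts; the synthetic packet below is purely for demonstration). Note that the payload is never touched:

```python
import struct
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # 6 = TCP, 17 = UDP

def extract_five_tuple(packet: bytes) -> FiveTuple:
    """Parse the IPv4 + TCP/UDP headers of a raw packet.

    Only the first ~24 bytes are examined; the payload is never inspected.
    """
    ihl = (packet[0] & 0x0F) * 4                  # IPv4 header length in bytes
    protocol = packet[9]                          # protocol field at offset 9
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    # TCP and UDP headers both begin with source port, dest port (2 bytes each)
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return FiveTuple(src_ip, src_port, dst_ip, dst_port, protocol)

# Build a synthetic IPv4/TCP packet: 20-byte IP header + the two TCP port fields
ip_header = struct.pack("!BBHHHBBH4s4s",
                        0x45, 0, 40, 0, 0, 64, 6, 0,   # version/IHL ... TTL, proto=6 (TCP)
                        bytes([192, 168, 1, 10]),       # source IP (client)
                        bytes([203, 0, 113, 10]))       # destination IP (the VIP)
tcp_ports = struct.pack("!HH", 5000, 443)
print(extract_five_tuple(ip_header + tcp_ports))
# → FiveTuple(src_ip='192.168.1.10', src_port=5000, dst_ip='203.0.113.10', dst_port=443, protocol=6)
```

Everything an L4 balancer needs for a routing decision is in those few header bytes, which is exactly why it can ignore HTTP, TLS, and every other application protocol.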

This contrasts sharply with Layer 7 (application layer), where load balancers parse application protocols like HTTP/HTTPS, inspect headers, read cookies, and route based on URL paths or request content. Layer 7 requires deserializing and understanding the payload, which adds latency (typically 1-5ms) and CPU overhead. Layer 4’s advantage is speed: by ignoring everything above the transport layer, it can forward packets with minimal processing—often just a routing table lookup and a NAT operation.

Understanding this distinction is critical for system design interviews. When an interviewer asks “L4 or L7?”, they’re really asking: “Do you need content-based routing (L7) or maximum performance (L4)?” The OSI layer determines what information is available for routing decisions, which in turn determines what use cases each load balancer type can handle.

OSI Model Layers: L4 vs L7 Load Balancing

graph TB
    subgraph OSI Model
        L7["Layer 7: Application<br/><i>HTTP, HTTPS, FTP, SMTP</i>"]
        L6["Layer 6: Presentation<br/><i>SSL/TLS, Encryption</i>"]
        L5["Layer 5: Session<br/><i>Session Management</i>"]
        L4["Layer 4: Transport<br/><i>TCP, UDP</i>"]
        L3["Layer 3: Network<br/><i>IP Routing</i>"]
        L2["Layer 2: Data Link<br/><i>MAC Addresses</i>"]
        L1["Layer 1: Physical<br/><i>Cables, Signals</i>"]
    end
    
    L7LB["L7 Load Balancer<br/>Inspects: URLs, Headers, Cookies<br/>Latency: 1-5ms"] -.->|"Operates at"| L7
    L4LB["L4 Load Balancer<br/>Inspects: IP + Port + Protocol<br/>Latency: 50-500μs"] -.->|"Operates at"| L4

Layer 4 load balancers operate at the transport layer, inspecting only TCP/UDP headers (5-tuple: source IP, dest IP, source port, dest port, protocol). Layer 7 load balancers operate at the application layer, parsing HTTP/HTTPS payloads. The layer determines what information is available for routing decisions and directly impacts latency.

Architecture

A Layer 4 load balancer sits between clients and backend servers, intercepting TCP or UDP connections and forwarding them to healthy servers based on transport-layer information. The architecture consists of three core components:

1. Connection Table (State Tracking): The load balancer maintains a connection table mapping each client connection (identified by the 5-tuple) to a backend server. For TCP, this tracks the full connection lifecycle: SYN, established, FIN/RST. For UDP (stateless protocol), the LB creates pseudo-connections based on recent packet activity, typically with a 30-60 second timeout. This table is the heart of L4 load balancing—it ensures all packets from a single client connection reach the same backend server, maintaining session consistency.

2. Packet Processor: When a packet arrives, the processor extracts the 5-tuple from the IP and TCP/UDP headers. For new connections, it selects a backend server using a load balancing algorithm (see Load Balancing Algorithms for distribution strategies), creates a connection table entry, and forwards the packet. For existing connections, it performs a table lookup and forwards to the previously selected server. The processor operates in the kernel or on specialized hardware (ASIC/FPGA) for maximum speed.

3. NAT Engine (Network Address Translation): The load balancer rewrites packet headers to route traffic correctly. In DNAT (Destination NAT) mode, it changes the destination IP from the load balancer’s VIP (Virtual IP) to the backend server’s IP. In SNAT (Source NAT) mode, it also changes the source IP to its own, forcing return traffic back through the load balancer. Alternatively, Direct Server Return (DSR) mode skips SNAT—the load balancer only modifies the destination MAC address (Layer 2), and servers respond directly to clients, bypassing the load balancer for return traffic. DSR is critical for high-bandwidth scenarios like video streaming, where response payloads are much larger than requests.
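The interaction of these three components can be sketched as a toy Python model (the least-connections policy, backend IPs, and method names here are illustrative, not any real product's API):

```python
import time

class L4Balancer:
    """Toy sketch of the three components above: connection table,
    packet processor, and a DNAT-style rewrite."""

    UDP_TIMEOUT = 60  # seconds before a UDP pseudo-connection expires

    def __init__(self, backends):
        self.backends = list(backends)              # backend server IPs
        self.conn_table = {}                        # 5-tuple -> (backend, last_seen)
        self.active = {b: 0 for b in backends}      # flow counts for least-connections

    def _select_backend(self):
        # Least-connections: pick the backend with the fewest tracked flows
        return min(self.backends, key=lambda b: self.active[b])

    def process(self, five_tuple, proto="tcp"):
        now = time.monotonic()
        entry = self.conn_table.get(five_tuple)
        if entry and (proto == "tcp" or now - entry[1] < self.UDP_TIMEOUT):
            backend = entry[0]                      # existing flow: use cached mapping
        else:
            backend = self._select_backend()        # new flow: run the LB algorithm
            self.active[backend] += 1
        self.conn_table[five_tuple] = (backend, now)
        # DNAT: rewrite the destination from the VIP to the chosen backend
        src_ip, src_port, _vip, dst_port, protocol = five_tuple
        return (src_ip, src_port, backend, dst_port, protocol)

lb = L4Balancer(["10.0.1.10", "10.0.1.11", "10.0.1.12"])
flow = ("192.168.1.10", 5000, "203.0.113.10", 443, 6)
print(lb.process(flow))   # new flow: a backend is selected and recorded
print(lb.process(flow))   # same flow: same backend (session consistency)
```

Real implementations differ in important ways (lock-free tables, kernel or hardware data paths, state expiry for TCP too), but the lookup-or-select-then-rewrite flow is the essence.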

The load balancer typically runs in active-passive or active-active HA pairs, with connection table state replicated between nodes for failover. Google’s Maglev, for example, uses consistent hashing to distribute connections across multiple L4 load balancers, achieving both horizontal scalability and fault tolerance without stateful replication.

L4 Load Balancer Architecture with Connection Tracking

graph LR
    Client1["Client 1<br/>192.168.1.10:5000"]
    Client2["Client 2<br/>192.168.1.20:5001"]
    
    subgraph L4 Load Balancer
        PacketProc["Packet Processor<br/><i>Extract 5-tuple</i>"]
        ConnTable["Connection Table<br/><i>Hash Table</i><br/>192.168.1.10:5000 → Server1<br/>192.168.1.20:5001 → Server2"]
        NAT["NAT Engine<br/><i>DNAT/SNAT</i>"]
        Algorithm["LB Algorithm<br/><i>Least Connections</i>"]
    end
    
    Server1["Server 1<br/>10.0.1.10"]
    Server2["Server 2<br/>10.0.1.11"]
    Server3["Server 3<br/>10.0.1.12"]
    
    Client1 --"1. SYN packet<br/>5-tuple: 192.168.1.10:5000<br/>→ LB_VIP:443"--> PacketProc
    Client2 --"1. SYN packet<br/>5-tuple: 192.168.1.20:5001<br/>→ LB_VIP:443"--> PacketProc
    
    PacketProc --"2. Hash 5-tuple<br/>Lookup in table"--> ConnTable
    ConnTable --"3. New connection<br/>Select backend"--> Algorithm
    Algorithm --"4. Choose server"--> NAT
    ConnTable --"3. Existing connection<br/>Return cached server"--> NAT
    
    NAT --"5. DNAT: Change dest IP<br/>LB_VIP → 10.0.1.10"--> Server1
    NAT --"5. DNAT: Change dest IP<br/>LB_VIP → 10.0.1.11"--> Server2
    NAT -.->|"Available"| Server3

L4 load balancer architecture showing the three core components: Packet Processor extracts the 5-tuple from TCP/UDP headers, Connection Table maintains a hash table mapping client connections to backend servers, and NAT Engine rewrites packet headers. For new connections, the LB algorithm selects a backend; for existing connections, the cached mapping is used to ensure session consistency.

Internals

Under the hood, Layer 4 load balancers are optimized for packet processing speed. Here’s how they achieve microsecond-level latency:

Connection Tracking with Hash Tables: The connection table is implemented as a hash table keyed by the 5-tuple. Modern implementations use lock-free data structures or per-CPU hash tables to avoid contention. When a SYN packet arrives, the load balancer hashes the 5-tuple, selects a backend server (often using consistent hashing or least-connections), and inserts an entry. Subsequent packets hit the hash table in O(1) time. For TCP, the LB tracks connection state (SYN_SENT, ESTABLISHED, FIN_WAIT) to handle graceful shutdowns and detect half-open connections.
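The TCP state tracking described above can be sketched as a small transition table (simplified; a real tracker also handles simultaneous close, retransmissions, and per-state timeouts):

```python
# Minimal sketch of per-connection TCP state tracking as an L4 balancer
# might maintain it. State names follow the prose above; the transition
# table is deliberately simplified.
TRANSITIONS = {
    ("NEW", "SYN"): "SYN_SENT",
    ("SYN_SENT", "SYN_ACK"): "SYN_RECEIVED",
    ("SYN_RECEIVED", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "FIN_WAIT",
    ("FIN_WAIT", "ACK"): "CLOSED",
}

def track(state: str, tcp_flag: str) -> str:
    """Advance a connection's tracked state; RST tears it down from anywhere."""
    if tcp_flag == "RST":
        return "CLOSED"
    return TRANSITIONS.get((state, tcp_flag), state)  # unknown flags: no change

state = "NEW"
for flag in ["SYN", "SYN_ACK", "ACK", "FIN", "ACK"]:
    state = track(state, flag)
print(state)  # → CLOSED
```

Tracking state like this is what lets the balancer reap entries for half-open or abandoned connections instead of leaking table memory.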

Kernel Bypass and Zero-Copy: High-performance L4 load balancers use DPDK (Data Plane Development Kit) or XDP (eXpress Data Path) to bypass the Linux kernel networking stack. Instead of packets traversing the kernel’s TCP/IP stack (which adds 10-50μs), they’re processed directly in userspace or in eBPF programs running in the kernel. This eliminates context switches and memory copies. Cloudflare’s Unimog, for example, uses XDP to process packets at line rate, sustaining hundreds of millions of packets per second across the fleet during DDoS attacks.

Direct Server Return (DSR) Mechanics: In DSR mode, the load balancer receives the SYN packet, selects a backend, and rewrites only the destination MAC address to the backend’s MAC (Layer 2 rewrite). The destination IP remains the load balancer’s VIP. The backend server must be configured with the VIP as a loopback interface so it accepts packets destined for that IP. When the server responds, it sets the source IP to the VIP and sends packets directly to the client, bypassing the load balancer. This asymmetric routing is why DSR scales to 100+ Gbps: the load balancer only handles incoming traffic (small requests), while outgoing traffic (large responses) flows directly from servers to clients.
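A toy model of the DSR forwarding step, using a dict in place of raw frame bytes (field names are illustrative; a real implementation edits the Ethernet header in place):

```python
# Toy illustration of DSR forwarding: only the Layer 2 destination MAC
# changes; the IP header, including the VIP as destination, is untouched.
def dsr_forward(frame: dict, backend_mac: str) -> dict:
    out = dict(frame)
    out["dst_mac"] = backend_mac   # Layer 2 rewrite only
    return out                     # dst_ip stays the VIP

frame = {
    "dst_mac": "aa:bb:cc:00:00:01",   # load balancer's MAC
    "src_ip": "192.168.1.10",          # client IP (preserved: backend sees it)
    "dst_ip": "203.0.113.10",          # the VIP, configured on the backend's loopback
    "dst_port": 443,
}
forwarded = dsr_forward(frame, backend_mac="aa:bb:cc:00:00:42")
print(forwarded["dst_ip"], forwarded["dst_mac"])
# → 203.0.113.10 aa:bb:cc:00:00:42
```

Because the IP header is untouched, the backend (with the VIP on its loopback) accepts the packet and replies straight to the client, which is the whole point of DSR.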

Connection Persistence: L4 load balancers support connection persistence (sticky sessions) using source IP hashing. All connections from a client IP are routed to the same backend server for a configurable duration (e.g., 5 minutes). This is critical for stateful protocols like FTP or database connections where session state lives on the server. However, source IP persistence has limitations: clients behind NAT (e.g., corporate networks) appear as a single IP, causing uneven load distribution. More sophisticated L4 LBs use the full 5-tuple for persistence, providing finer granularity.
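The NAT skew problem is easy to demonstrate with a hash-based sketch (the hashing scheme here is illustrative, not any particular product's):

```python
import hashlib

backends = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

def pick(key: str) -> str:
    # Stable hash of the persistence key -> a backend index
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

# Ten distinct clients behind one NAT gateway share a single source IP
nat_clients = [("203.0.113.50", port) for port in range(5000, 5010)]

# Source-IP persistence: one key for everyone, so one backend carries all ten
by_src_ip = {pick(ip) for ip, _ in nat_clients}
# 5-tuple-style persistence: the source port varies, so flows can spread out
by_tuple = {pick(f"{ip}:{port}") for ip, port in nat_clients}
print(len(by_src_ip), len(by_tuple))
```

With source-IP hashing the set of chosen backends collapses to one; keying on the port as well lets the same ten flows distribute across the pool.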

Health Checking: L4 load balancers perform TCP or UDP health checks—simple socket connections to verify servers are reachable. Unlike Layer 7 health checks (which might test /health endpoints), L4 checks only verify transport-layer connectivity. A server passing L4 health checks might still be unhealthy at the application layer (e.g., database connection pool exhausted), which is why critical systems often combine L4 and L7 load balancers in a two-tier architecture.
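An L4-style health check is essentially a TCP connect test; a minimal sketch in Python:

```python
import socket

def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """L4-style health check: can we complete a TCP handshake? Nothing more.
    A server can pass this while its application layer is failing."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A listening socket passes; a closed port fails. Neither result says
# anything about application-level health.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))           # OS picks a free port
probe.listen(1)
port = probe.getsockname()[1]
print(tcp_health_check("127.0.0.1", port))   # → True
probe.close()
print(tcp_health_check("127.0.0.1", port))   # → False
```

Contrast this with an L7 check, which would issue an HTTP request and validate the status code or body; the socket-level check above cannot distinguish a healthy server from one returning 500s.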

Direct Server Return (DSR) Flow - Asymmetric Routing

sequenceDiagram
    participant Client
    participant L4_LB as L4 Load Balancer<br/>(VIP: 203.0.113.10)
    participant Server as Backend Server<br/>(Real IP: 10.0.1.10)<br/>(Loopback: 203.0.113.10)
    
    Note over Client,Server: Request Path (through Load Balancer)
    Client->>L4_LB: 1. SYN to 203.0.113.10:443<br/>Source: 192.168.1.10:5000
    Note over L4_LB: 2. Select backend server<br/>Rewrite ONLY dest MAC address<br/>(Layer 2 rewrite)<br/>Dest IP stays 203.0.113.10
    L4_LB->>Server: 3. Forward packet<br/>Dest IP: 203.0.113.10 (VIP)<br/>Dest MAC: Server's MAC
    Note over Server: 4. Accept packet<br/>(VIP configured on loopback)
    
    Note over Client,Server: Response Path (bypasses Load Balancer)
    Server-->>Client: 5. SYN-ACK directly to client<br/>Source: 203.0.113.10:443<br/>Dest: 192.168.1.10:5000
    Client->>Server: 6. ACK + Data (direct)
    Server-->>Client: 7. Response payload (direct)<br/>Large video stream data
    
    Note over L4_LB: Load balancer only handles<br/>incoming traffic (small requests)<br/>Outgoing traffic (large responses)<br/>bypasses LB entirely

Direct Server Return (DSR) enables asymmetric routing where the load balancer only handles incoming traffic. The LB rewrites only the destination MAC address (Layer 2), keeping the destination IP as the VIP. Servers must have the VIP configured on a loopback interface. Response traffic flows directly from servers to clients, bypassing the load balancer—critical for high-bandwidth scenarios like video streaming where responses are 10-100x larger than requests.

Performance Characteristics

Layer 4 load balancers are the speed demons of load balancing, optimized for raw throughput and minimal latency:

Latency: Modern L4 load balancers add 50-500 microseconds of latency per request. HAProxy in L4 mode adds ~100μs on average. Google’s Maglev, running on custom hardware, achieves sub-50μs p99 latency. This is 10-100x faster than Layer 7 load balancers, which add 1-5ms due to SSL termination and HTTP parsing. For latency-sensitive applications like gaming (where 10ms feels laggy) or high-frequency trading (where microseconds matter), L4 is the only viable option.

Throughput: A single L4 load balancer instance can handle 10-20 million packets per second (Mpps) and sustain 100+ Gbps of throughput. Cloudflare’s L4 load balancers, built on XDP, process 300+ Mpps during DDoS attacks. By comparison, L7 load balancers top out at 1-2 Mpps due to payload inspection overhead. The performance gap widens with connection count: L4 LBs can maintain 10+ million concurrent TCP connections (limited by memory for connection table entries), while L7 LBs struggle beyond 100k-1M connections due to per-connection SSL state.

Scalability: L4 load balancers scale horizontally using ECMP (Equal-Cost Multi-Path) routing or consistent hashing. In ECMP, routers distribute incoming packets across multiple L4 LB instances using a hash of the 5-tuple. Each LB independently processes its share of connections. Google’s Maglev uses consistent hashing to minimize connection disruption when LBs are added or removed—only 1/N connections are rehashed when the cluster size changes. This architecture scales to hundreds of load balancer instances handling terabits per second.
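A simplified sketch of a Maglev-style lookup table follows; the real algorithm differs in details such as the hash functions, permutation construction, and table sizing, so treat this as an illustration of the idea, not Google's implementation:

```python
import hashlib

def _h(s: str, salt: str) -> int:
    return int(hashlib.sha256(f"{salt}:{s}".encode()).hexdigest(), 16)

def maglev_table(backends, size=251):
    """Build a lookup table (size should be prime). Each backend walks its
    own pseudo-random permutation of slots, claiming the next free one, so
    load stays near-uniform and membership changes disturb few slots."""
    offsets = {b: _h(b, "offset") % size for b in backends}
    skips = {b: _h(b, "skip") % (size - 1) + 1 for b in backends}
    table, filled, next_idx = [None] * size, 0, {b: 0 for b in backends}
    while filled < size:
        for b in backends:
            while True:  # advance along b's permutation to a free slot
                slot = (offsets[b] + next_idx[b] * skips[b]) % size
                next_idx[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == size:
                break
    return table

def route(table, five_tuple):
    return table[_h(str(five_tuple), "route") % len(table)]

before = maglev_table(["10.0.1.10", "10.0.1.11", "10.0.1.12"])
after = maglev_table(["10.0.1.10", "10.0.1.11", "10.0.1.12", "10.0.1.13"])
moved = sum(a != b for a, b in zip(before, after))
print(f"{moved}/{len(before)} slots remapped after adding one backend")
```

Adding a fourth backend remaps only a fraction of slots; a naive `hash % N` scheme would remap roughly three quarters of them, disrupting most in-flight connections.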

Resource Efficiency: L4 load balancers consume minimal CPU and memory. A connection table entry is ~200 bytes (5-tuple + backend IP + state), so 10M connections require only 2GB of RAM. CPU usage is dominated by packet processing (hashing, table lookups, NAT rewrites), which modern CPUs handle efficiently—a single core can process 1-2 Mpps. This efficiency means L4 LBs rarely become the bottleneck; backend servers usually saturate first.
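A quick back-of-envelope check of the memory figure above:

```python
# Connection-table memory at ~200 bytes per entry, as stated in the text
ENTRY_BYTES = 200
connections = 10_000_000
gb = connections * ENTRY_BYTES / 1e9   # decimal gigabytes
print(f"{gb:.1f} GB for {connections:,} connections")  # → 2.0 GB for 10,000,000 connections
```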

L4 vs L7 Performance Comparison

graph TB
    subgraph Latency Comparison
        L4Lat["L4 Load Balancer<br/>50-500 microseconds<br/><i>Packet forwarding only</i>"]
        L7Lat["L7 Load Balancer<br/>1-5 milliseconds<br/><i>SSL + HTTP parsing</i>"]
    end
    
    subgraph Throughput Comparison
        L4Thr["L4 Throughput<br/>10-20M packets/sec<br/>100+ Gbps<br/><i>Minimal processing</i>"]
        L7Thr["L7 Throughput<br/>1-2M packets/sec<br/>10-20 Gbps<br/><i>Payload inspection</i>"]
    end
    
    subgraph Connection Capacity
        L4Conn["L4 Connections<br/>10M+ concurrent<br/><i>~200 bytes per entry</i>"]
        L7Conn["L7 Connections<br/>100K-1M concurrent<br/><i>SSL state overhead</i>"]
    end
    
    subgraph Processing Pipeline
        L4Pipe["L4 Pipeline<br/>1. Extract 5-tuple<br/>2. Hash table lookup<br/>3. NAT rewrite<br/>4. Forward"]
        L7Pipe["L7 Pipeline<br/>1. TCP handshake<br/>2. SSL handshake<br/>3. HTTP parsing<br/>4. Route decision<br/>5. Backend connection<br/>6. Response processing"]
    end
    
    L4Lat -.->|"10-100x faster"| L7Lat
    L4Thr -.->|"10x higher"| L7Thr
    L4Conn -.->|"10-100x more"| L7Conn

Performance comparison showing L4’s advantages: 10-100x lower latency (microseconds vs milliseconds), 10x higher throughput (20M vs 2M packets/sec), and 10-100x more concurrent connections (10M+ vs 100K-1M). L4’s speed comes from minimal processing—just 5-tuple extraction, hash lookup, and NAT rewrite—while L7 must perform SSL termination, HTTP parsing, and payload inspection.

Trade-offs

Layer 4 load balancing excels at speed and simplicity but sacrifices application-layer intelligence:

Advantages:

  • Blazing Fast: Sub-millisecond latency and throughput of tens of millions of packets per second make L4 the only choice for high-performance scenarios.
  • Protocol Agnostic: Works with any TCP/UDP protocol—HTTP, HTTPS, WebSockets, MQTT, database protocols (MySQL, PostgreSQL), gaming protocols, video streaming (RTMP, RTP). No need to configure protocol-specific parsing.
  • Low Resource Usage: Minimal CPU and memory footprint. A single L4 LB can replace dozens of L7 LBs for simple routing.
  • Transparent to Applications: Backends see the client’s real IP (with DSR or proxy protocol), simplifying logging and security.

Limitations:

  • No Content-Based Routing: Cannot route based on HTTP headers, URLs, cookies, or request body. A request to /api/users and /api/orders looks identical at L4—both are just TCP connections to port 443. For microservices architectures requiring path-based routing, you need Layer 7 (see Layer 7 Load Balancing).
  • No SSL Termination: L4 LBs cannot decrypt TLS traffic, so they can’t inspect encrypted payloads or offload SSL from backends. You must terminate SSL on backend servers (increasing their CPU load) or use a separate L7 LB for SSL termination.
  • Limited Health Checks: TCP/UDP health checks only verify connectivity, not application health. A server might pass L4 health checks but return HTTP 500 errors.
  • Session Persistence Challenges: Source IP-based persistence works poorly for clients behind NAT. The full 5-tuple provides better granularity but doesn’t survive client IP changes (e.g., mobile networks).

When L4 Isn’t Enough: Many production systems use a two-tier architecture: L4 load balancers at the edge for raw speed and DDoS protection, with L7 load balancers behind them for intelligent routing. Netflix, for example, uses L4 LBs to distribute traffic across AWS regions, then L7 LBs within each region to route requests to microservices based on URL paths.

When to Use (and When Not To)

Choose Layer 4 load balancing when performance trumps application-layer intelligence:

Ideal Use Cases:

  • High-Throughput Services: Video streaming (Netflix, YouTube), large file downloads, CDN origin servers—scenarios where response payloads are 10-100x larger than requests. DSR mode lets servers send responses directly to clients, bypassing the load balancer bottleneck.
  • Non-HTTP Protocols: Database connection pooling (MySQL, PostgreSQL), message queues (MQTT, AMQP), gaming servers (UDP-based), VoIP (SIP, RTP), DNS, NTP. L4 is protocol-agnostic, so it works out of the box.
  • Latency-Sensitive Applications: Real-time gaming, financial trading systems, IoT telemetry—any system where 1-5ms of L7 overhead is unacceptable.
  • DDoS Mitigation: L4 LBs can absorb massive packet floods (300+ Mpps) and drop malicious traffic before it reaches application servers. Cloudflare’s L4 LBs are the first line of defense against volumetric attacks.
  • TCP/UDP Services with Simple Routing: If you’re just distributing connections across a homogeneous server pool (e.g., a stateless API with no path-based routing), L4 is simpler and faster than L7.

When to Use Layer 7 Instead:

  • You need content-based routing (path, headers, cookies)—e.g., routing /api/users to one service and /api/orders to another.
  • You need SSL termination to offload encryption from backends or inspect encrypted traffic.
  • You need advanced health checks (HTTP status codes, response body validation).
  • You need request/response modification (header injection, URL rewriting).

Hybrid Approach: Use L4 at the edge for speed and DDoS protection, with L7 behind it for intelligent routing. This is how most large-scale systems (AWS ELB, Google Cloud Load Balancing, Cloudflare) are architected.
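The division of labor in the hybrid approach can be sketched as a pair of routing tables (service names and ports below are illustrative):

```python
# Tier 1 (L4): dispatch purely on destination port/protocol; no payload inspection.
# Only HTTP(S) flows ever reach tier 2 (L7), which routes on URL paths.
L4_ROUTES = {443: "l7-tier", 5432: "postgres-pool", 1883: "mqtt-gateway"}
L7_ROUTES = {"/api/users": "user-service", "/api/orders": "order-service"}

def l4_dispatch(dst_port: int) -> str:
    return L4_ROUTES.get(dst_port, "drop")      # transport-layer info only

def l7_dispatch(path: str) -> str:
    for prefix, svc in L7_ROUTES.items():       # content-based routing
        if path.startswith(prefix):
            return svc
    return "default-service"

tier1 = l4_dispatch(443)
print(tier1)                                    # → l7-tier
if tier1 == "l7-tier":
    print(l7_dispatch("/api/orders/42"))        # → order-service
print(l4_dispatch(5432))                        # → postgres-pool (never touches L7)
```

Database and MQTT traffic is routed directly at tier 1, so only the traffic that actually needs application-layer intelligence pays the L7 latency cost.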

Two-Tier Load Balancing Architecture

graph TB
    Internet["Internet Traffic<br/><i>Mixed protocols</i>"]
    
    subgraph Edge Layer - L4 Load Balancers
        L4_1["L4 LB 1<br/><i>ECMP/Anycast</i>"]
        L4_2["L4 LB 2<br/><i>ECMP/Anycast</i>"]
        L4_3["L4 LB 3<br/><i>ECMP/Anycast</i>"]
    end
    
    subgraph Application Layer - L7 Load Balancers
        L7_1["L7 LB 1<br/><i>SSL Termination</i>"]
        L7_2["L7 LB 2<br/><i>SSL Termination</i>"]
    end
    
    subgraph Microservices
        Users["User Service<br/><i>/api/users/*</i>"]
        Orders["Order Service<br/><i>/api/orders/*</i>"]
        Auth["Auth Service<br/><i>/api/auth/*</i>"]
    end
    
    subgraph Non-HTTP Services
        DB["Database Pool<br/><i>PostgreSQL:5432</i>"]
        MQTT["IoT Gateway<br/><i>MQTT:1883</i>"]
        Stream["Video Stream<br/><i>RTMP:1935</i>"]
    end
    
    Internet --"1. Raw throughput<br/>DDoS protection<br/>Protocol-agnostic"--> L4_1 & L4_2 & L4_3
    
    L4_1 & L4_2 & L4_3 --"2. HTTP/HTTPS traffic<br/>Sub-ms latency"--> L7_1 & L7_2
    L4_1 & L4_2 & L4_3 --"2. Non-HTTP traffic<br/>Direct routing"--> DB & MQTT & Stream
    
    L7_1 & L7_2 --"3. Path-based routing<br/>Header inspection<br/>Cookie-based routing"--> Users & Orders & Auth
    
    Note1["L4 Layer:<br/>• 100+ Gbps throughput<br/>• Sub-ms latency<br/>• All protocols<br/>• DDoS mitigation"]
    Note2["L7 Layer:<br/>• Content-based routing<br/>• SSL termination<br/>• Advanced health checks<br/>• Request modification"]

Production two-tier architecture combining L4 and L7 load balancers. L4 load balancers at the edge handle raw throughput (100+ Gbps), DDoS protection, and protocol-agnostic routing for all traffic. HTTP/HTTPS traffic flows to L7 load balancers for intelligent routing to microservices based on URL paths and headers. Non-HTTP protocols (databases, MQTT, video streaming) are routed directly from L4 to backend services. This architecture is used by AWS (NLB + ALB), Google Cloud, and Netflix.

Real-World Examples

Company: Google
System: Maglev
Implementation: Google’s Maglev is a software-based L4 load balancer handling all traffic entering Google’s datacenters—billions of requests per second. It uses consistent hashing to distribute connections across a cluster of load balancer instances, with each instance processing packets in userspace via kernel bypass (no Linux networking stack). Maglev achieves sub-50μs p99 latency and scales horizontally to hundreds of instances. When a new LB is added, consistent hashing ensures only 1/N connections are rehashed, minimizing disruption. Maglev also integrates with Google’s SDN (Software-Defined Networking) to perform ECMP routing at the network layer, distributing packets across LBs before they even reach a single machine.
Interesting detail: Maglev’s connection table uses a custom lock-free hash table optimized for x86 cache lines. Each entry is exactly 64 bytes (one cache line), minimizing cache misses. This micro-optimization contributes significantly to its sub-50μs latency.

Company: Cloudflare
System: Unimog (L4 Load Balancer)
Implementation: Cloudflare’s Unimog is an XDP-based L4 load balancer deployed at the edge of their network, handling 30+ million HTTP requests per second and absorbing 300+ Mpps during DDoS attacks. Unimog uses eBPF programs running in the Linux kernel to process packets without context switches. It performs stateful connection tracking for TCP and stateless load balancing for UDP. During attacks, Unimog drops malicious packets at line rate (100 Gbps per server) before they consume backend resources. It also integrates with Cloudflare’s Anycast network, allowing traffic to be routed to the nearest datacenter and then load-balanced across servers.
Interesting detail: Unimog uses a technique called ‘SYN cookies’ to defend against SYN flood attacks. Instead of storing connection state for every SYN packet (which would exhaust memory), it encodes state in the TCP sequence number, allowing stateless SYN-ACK responses. Only after the client completes the handshake does Unimog allocate connection table space.
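The SYN-cookie idea is worth seeing in miniature: handshake state is encoded in the sequence number and only verified, never stored, until the final ACK arrives. This sketch is simplified (real implementations also pack MSS bits into the cookie and rotate their secrets):

```python
import hashlib, time

SECRET = b"rotate-me"   # illustrative; real implementations rotate secrets

def syn_cookie(src_ip, src_port, dst_ip, dst_port, t=None):
    """Encode handshake state in a 32-bit sequence number instead of
    allocating a connection-table entry."""
    t = int(time.time() // 64) if t is None else t   # coarse 64s timestamp window
    msg = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{t}".encode()
    mac = int(hashlib.sha256(SECRET + msg).hexdigest(), 16) & 0x00FFFFFF
    return ((t & 0xFF) << 24) | mac                  # 8 bits of time + 24-bit MAC

def validate_ack(src_ip, src_port, dst_ip, dst_port, ack_seq):
    """On the final ACK, recompute the cookie; only on a match is
    connection-table space finally allocated."""
    t_now = int(time.time() // 64)
    for t in (t_now, t_now - 1):                     # tolerate one window of skew
        if syn_cookie(src_ip, src_port, dst_ip, dst_port, t) == ack_seq:
            return True
    return False

cookie = syn_cookie("192.168.1.10", 5000, "203.0.113.10", 443)
print(validate_ack("192.168.1.10", 5000, "203.0.113.10", 443, cookie))  # → True
print(validate_ack("10.9.9.9", 6000, "203.0.113.10", 443, cookie))      # → False
```

A SYN flood thus costs the balancer only hash computations, not memory: spoofed clients never complete the handshake, so no state is ever allocated for them.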

Company: HAProxy
System: HAProxy in TCP Mode
Implementation: HAProxy, one of the most popular open-source load balancers, supports both L4 (TCP mode) and L7 (HTTP mode). In TCP mode, HAProxy performs L4 load balancing with sub-millisecond latency, handling 1M+ concurrent connections on a single instance. It’s widely used for database connection pooling (e.g., load balancing PostgreSQL read replicas) and as a frontend for microservices. HAProxy’s L4 mode supports advanced features like connection persistence (source IP hashing), health checks (TCP connect tests), and seamless reloads (zero-downtime configuration changes). Companies like Reddit and Stack Overflow use HAProxy in L4 mode to distribute traffic across application servers.
Interesting detail: HAProxy’s ‘seamless reload’ feature allows configuration changes without dropping connections. When you reload HAProxy, the old process keeps handling existing connections while the new process accepts new ones. Once all old connections close, the old process exits. This is critical for production systems that can’t afford downtime during config updates.


Interview Essentials

Mid-Level

Explain L4 vs L7 load balancing: Clearly articulate that L4 operates at the transport layer (TCP/UDP), routing based on IP addresses and ports, while L7 operates at the application layer, routing based on HTTP headers, URLs, and payloads. Emphasize the performance trade-off: L4 is 10-100x faster but can’t do content-based routing.

Describe how L4 load balancers maintain session consistency: Explain connection tables (hash tables keyed by 5-tuple) that map client connections to backend servers. For TCP, the LB tracks connection state; for UDP, it creates pseudo-connections with timeouts. This ensures all packets from a single connection reach the same server.

Explain NAT and DSR: Describe DNAT (rewriting destination IP) and SNAT (rewriting source IP). Explain DSR (Direct Server Return) as an optimization where the load balancer only modifies the destination MAC address, allowing servers to respond directly to clients. Mention that DSR is critical for high-bandwidth scenarios like video streaming.

Senior

Design a two-tier load balancing architecture: Propose using L4 load balancers at the edge for speed and DDoS protection, with L7 load balancers behind them for intelligent routing to microservices. Justify the trade-offs: L4 handles raw throughput, L7 provides application-layer intelligence. Mention real-world examples like AWS ELB (L4 NLB + L7 ALB).

Discuss connection persistence strategies: Explain source IP hashing (simple but breaks with NAT), 5-tuple hashing (better granularity), and cookie-based persistence (requires L7). Discuss the trade-off between load distribution and session stickiness. Mention that L4 persistence doesn’t survive client IP changes (e.g., mobile networks switching towers).

Explain how L4 LBs scale horizontally: Describe ECMP routing (routers distribute packets across LB instances using 5-tuple hashing) and consistent hashing (minimizes connection disruption when LBs are added/removed). Mention Google’s Maglev as an example of consistent hashing at scale. Discuss the challenge of connection table synchronization in active-active HA setups.

Staff+

Architect a global L4 load balancing system: Design a multi-region architecture using Anycast (same IP announced from multiple datacenters, routing traffic to the nearest one) with L4 load balancers in each region. Discuss challenges: connection draining during datacenter failover, handling stateful protocols (FTP, database connections) across regions, and minimizing latency for global users. Propose solutions like connection migration (moving connection state between LBs) or client-side retry logic.

Optimize L4 load balancer performance: Discuss kernel bypass (DPDK, XDP) to eliminate kernel networking stack overhead, lock-free data structures for connection tables, and NUMA-aware memory allocation. Explain how to profile packet processing (perf, eBPF tracing) to identify bottlenecks. Mention hardware acceleration (ASIC/FPGA-based LBs) for 100+ Gbps throughput.

Design for DDoS resilience: Explain how L4 LBs defend against SYN floods (SYN cookies, rate limiting), UDP amplification attacks (source validation), and connection exhaustion (connection limits per source IP). Discuss integration with upstream DDoS mitigation services (Cloudflare, AWS Shield). Propose a defense-in-depth strategy: L4 LB drops volumetric attacks, L7 LB detects application-layer attacks (slowloris, HTTP floods).

Common Interview Questions

When would you choose L4 over L7 load balancing? Answer: When you need maximum performance (sub-millisecond latency, tens of millions of packets per second), protocol-agnostic routing (non-HTTP protocols like databases, gaming, streaming), or when you don’t need content-based routing. Use L7 when you need intelligent routing (path, headers, cookies) or SSL termination.

How does Direct Server Return (DSR) work, and when would you use it? Answer: DSR rewrites only the destination MAC address (Layer 2), leaving the destination IP as the load balancer’s VIP. Servers respond directly to clients, bypassing the LB for return traffic. Use DSR for high-bandwidth scenarios (video streaming, large file downloads) where response payloads are much larger than requests.

What are the limitations of L4 load balancing? Answer: No content-based routing (can’t route by URL or headers), no SSL termination (can’t inspect encrypted traffic), limited health checks (TCP/UDP connectivity only, not application health), and session persistence challenges (source IP hashing breaks with NAT).

Red Flags to Avoid

Confusing L4 and L7: Saying L4 can route based on HTTP headers or URLs. This is a Layer 7 capability. L4 only sees IP addresses, ports, and protocol.

Not understanding DSR trade-offs: Claiming DSR works for all scenarios. DSR requires servers to have the VIP configured on a loopback interface and only works within a single Layer 2 network (same subnet). And because return traffic bypasses the load balancer entirely, the LB cannot observe full connection state or perform any response-side processing.

Ignoring connection persistence: Proposing L4 load balancing for stateful protocols (FTP, database connections) without discussing session stickiness. Without persistence, connections would be routed to different servers, breaking session state.

Overestimating L4 capabilities: Suggesting L4 LBs can perform advanced health checks (HTTP status codes, response body validation) or request modification (header injection). These require Layer 7 inspection.


Key Takeaways

Layer 4 load balancers operate at the OSI transport layer (TCP/UDP), routing traffic based solely on IP addresses, ports, and protocol—no payload inspection. This minimalism delivers sub-millisecond latency and throughput of tens of millions of packets per second, making L4 the fastest load balancing option.

Direct Server Return (DSR) is critical for high-bandwidth scenarios. By rewriting only the destination MAC address, DSR allows servers to respond directly to clients, bypassing the load balancer for return traffic. This scales to 100+ Gbps for video streaming and large file downloads.

L4 excels at protocol-agnostic routing and raw performance but lacks application-layer intelligence. Use L4 for non-HTTP protocols (databases, gaming, streaming) or when speed is paramount. Use Layer 7 when you need content-based routing (URL paths, headers) or SSL termination.

Connection persistence (session stickiness) at L4 uses source IP or 5-tuple hashing. This ensures all packets from a client connection reach the same backend server, critical for stateful protocols. However, source IP persistence breaks with NAT, and neither survives client IP changes.

Production systems often use two-tier architectures: L4 at the edge for speed and DDoS protection, L7 behind it for intelligent routing. This combines the best of both worlds—L4’s throughput with L7’s application-layer capabilities. Examples: AWS NLB + ALB, Google Cloud Load Balancing.