Load Balancers Explained: Types & Use Cases
After this topic, you will be able to:
- Explain the role of load balancers in distributed systems and their impact on availability
- Identify the key components of load balancing architecture including health checks and failover
- Describe how load balancers fit into the overall system design landscape
- Compare hardware vs software load balancers and their trade-offs
TL;DR
Load balancers distribute incoming traffic across multiple servers to prevent overload, eliminate single points of failure, and improve availability. They perform health checks to route traffic only to healthy backends, handle SSL termination to offload encryption work, and can maintain session persistence. Understanding load balancers is foundational for designing scalable systems—they’re the traffic cops that keep your application responsive under load.
Cheat Sheet: Load balancer = traffic distributor + health monitor. Prevents server overload, enables horizontal scaling, provides failover. Key features: health checks, SSL termination, session persistence. Trade-off: adds complexity and potential bottleneck. Examples: AWS ELB/ALB, Nginx, HAProxy.
Why This Matters
Every system design interview eventually asks: “How do you handle millions of users?” The answer almost always involves load balancers. They’re the critical component that transforms a single-server application into a distributed system capable of handling massive scale. Without load balancers, you’re stuck with vertical scaling—buying bigger servers until you hit physical limits. With them, you unlock horizontal scaling—adding more servers to handle more traffic.
In production systems at companies like Netflix and AWS, load balancers are everywhere. Netflix uses Zuul (their custom API gateway and Layer 7 load balancer) to route billions of API requests daily across thousands of microservices. AWS built Elastic Load Balancing to handle trillions of requests per year. These aren’t optional components—they’re fundamental infrastructure that enables high availability, fault tolerance, and the ability to deploy code without downtime.
For interviews, load balancers demonstrate your understanding of distributed systems fundamentals. They touch on availability (health checks and failover), performance (traffic distribution), scalability (horizontal scaling), and operational concerns (SSL termination, session management). A candidate who can articulate why load balancers matter and how they work shows they understand the building blocks of modern architecture. This topic sets the foundation for the entire module—you can’t discuss scaling strategies, caching layers, or microservices without understanding how traffic gets distributed in the first place.
Single Server vs Load Balanced Architecture
graph LR
subgraph Single Server Problem
C1["Client 1"] --> S["Single Server<br/><i>Overloaded</i>"]
C2["Client 2"] --> S
C3["Client 3"] --> S
C4["Client N..."] --> S
S -."SPOF".-x F["❌ Failure = Total Outage"]
end
subgraph Load Balanced Solution
LC1["Client 1"] --> LB["Load Balancer<br/><i>Traffic Distributor</i>"]
LC2["Client 2"] --> LB
LC3["Client 3"] --> LB
LC4["Client N..."] --> LB
LB --"Distributed Load"--> LS1["Server 1"]
LB --"Distributed Load"--> LS2["Server 2"]
LB --"Distributed Load"--> LS3["Server 3"]
LS2 -."Failure Isolated".-x LF["✓ Other Servers Handle Traffic"]
end
Load balancers transform a single point of failure into a distributed system. When one server fails in a load-balanced architecture, traffic automatically routes to healthy servers, enabling horizontal scaling and fault tolerance.
The Landscape
The load balancing landscape spans hardware appliances, software solutions, and cloud-managed services, each with distinct trade-offs. Hardware load balancers like F5 BIG-IP and Citrix NetScaler offer dedicated processing power and can handle millions of connections per second, but they’re expensive (often $20,000-$100,000+) and require specialized expertise. They’re still common in enterprise data centers and financial services where performance and reliability justify the cost.
Software load balancers have democratized traffic distribution. HAProxy and Nginx are the dominant open-source options, running on commodity hardware and offering flexibility through configuration. HAProxy excels at high-performance TCP/HTTP load balancing and powers infrastructure at Reddit, Stack Overflow, and GitHub. Nginx started as a web server but evolved into a powerful reverse proxy and load balancer, used by Dropbox, Netflix, and WordPress.com. The software itself is free, but running it well requires operational expertise to configure, monitor, and scale.
Cloud providers have abstracted load balancing into managed services. AWS offers Elastic Load Balancing (ELB) with three types: Classic Load Balancer (legacy), Application Load Balancer (Layer 7, HTTP/HTTPS), and Network Load Balancer (Layer 4, TCP/UDP). Google Cloud has Cloud Load Balancing with global and regional options. Azure provides Azure Load Balancer and Application Gateway. These services handle scaling, health checks, and failover automatically, but you pay per hour and per GB processed—costs that can surprise teams at scale.
The landscape also includes service mesh solutions like Envoy (used in Istio and AWS App Mesh) that provide load balancing within containerized environments. Envoy runs as a sidecar proxy alongside each service, enabling sophisticated routing and observability. This represents the modern evolution of load balancing for microservices architectures, where traditional centralized load balancers give way to distributed, application-aware routing.
Choosing between these options depends on your constraints. Startups typically start with cloud-managed load balancers for simplicity, then migrate to self-hosted Nginx or HAProxy as costs scale. Enterprises with existing data centers might use hardware appliances for critical traffic and software load balancers for internal services. The key insight: load balancing is a solved problem with many solutions—the challenge is picking the right one for your context.
Load Balancing Solution Landscape
graph TB
subgraph Hardware Appliances
HW1["F5 BIG-IP<br/><i>$20K-$100K+</i>"]
HW2["Citrix NetScaler<br/><i>High Performance</i>"]
HW_PRO["✓ Dedicated hardware<br/>✓ Millions conn/sec<br/>✓ Enterprise support"]
HW_CON["✗ Expensive<br/>✗ Specialized expertise<br/>✗ Vendor lock-in"]
end
subgraph Software Solutions
SW1["HAProxy<br/><i>Open Source</i>"]
SW2["Nginx<br/><i>Open Source</i>"]
SW_PRO["✓ Free software<br/>✓ Flexible configuration<br/>✓ Commodity hardware"]
SW_CON["✗ Operational overhead<br/>✗ Manual scaling<br/>✗ Self-managed HA"]
end
subgraph Cloud Managed
CL1["AWS ALB/NLB<br/><i>Pay per use</i>"]
CL2["GCP Load Balancing<br/><i>Global/Regional</i>"]
CL3["Azure Load Balancer<br/><i>Managed Service</i>"]
CL_PRO["✓ Auto-scaling<br/>✓ Managed HA<br/>✓ No ops overhead"]
CL_CON["✗ Cost at scale<br/>✗ Less control<br/>✗ Vendor lock-in"]
end
subgraph Service Mesh
SM1["Envoy/Istio<br/><i>Sidecar Proxy</i>"]
SM2["Linkerd<br/><i>Kubernetes Native</i>"]
SM_PRO["✓ App-aware routing<br/>✓ Observability<br/>✓ Microservices"]
SM_CON["✗ High complexity<br/>✗ Resource overhead<br/>✗ Learning curve"]
end
STARTUP["Startup<br/><i>Simple, Fast</i>"] --> CL1
MIDSIZE["Mid-size<br/><i>Cost Optimization</i>"] --> SW1
ENTERPRISE["Enterprise<br/><i>Critical Traffic</i>"] --> HW1
MICROSERVICES["Microservices<br/><i>Container Native</i>"] --> SM1
The load balancing landscape spans hardware appliances (expensive, high-performance), software solutions (flexible, self-managed), cloud services (managed, auto-scaling), and service meshes (microservices-focused). Companies typically start with cloud-managed solutions and migrate to self-hosted as costs scale.
Key Areas
Traffic Distribution
The core function of any load balancer is distributing incoming requests across a pool of backend servers. This prevents any single server from becoming overwhelmed while others sit idle. Distribution strategies range from simple round-robin (each server gets the next request in sequence) to sophisticated algorithms that consider server load, response times, and geographic proximity. See Load Balancing Algorithms for detailed coverage of specific strategies. The distribution mechanism directly impacts system performance—poor distribution leads to hot spots where some servers are overloaded while others are underutilized. In interviews, demonstrating awareness of different distribution strategies shows you understand that load balancing isn’t just “spread traffic evenly”—it’s about intelligent routing based on system state.
Health Monitoring and Failover
Load balancers continuously monitor backend servers to detect failures and automatically route traffic away from unhealthy instances. This transforms a system with multiple single points of failure into a fault-tolerant architecture. Health checks can be active (the load balancer sends periodic probes) or passive (monitoring actual request success rates). When a server fails health checks, the load balancer removes it from the rotation until it recovers. This capability is what enables zero-downtime deployments—you can take servers offline for updates while the load balancer routes traffic to healthy instances. The health check configuration (interval, timeout, failure threshold) directly impacts both availability and recovery time. Set checks too aggressively and you’ll mark healthy servers as failed during temporary slowdowns; set them too leniently and you’ll route traffic to failing servers.
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different layers of the OSI model, with fundamentally different capabilities. Layer 4 load balancers work at the transport layer (TCP/UDP), making routing decisions based on IP addresses and ports without inspecting packet contents. They’re fast and protocol-agnostic but limited in routing intelligence. Layer 7 load balancers operate at the application layer (HTTP/HTTPS), parsing request contents to make sophisticated routing decisions based on URLs, headers, cookies, or request methods. This enables content-based routing (send /api requests to API servers, /images to CDN origins) and advanced features like SSL termination and request rewriting. The trade-off: Layer 7 processing adds latency and CPU overhead. See Layer 4 Load Balancing and Layer 7 Load Balancing for technical deep dives.
SSL Termination and Session Management
Load balancers often handle SSL/TLS encryption and decryption, offloading this CPU-intensive work from backend servers. This is called SSL termination—the load balancer decrypts incoming HTTPS requests, routes them to backends over plain HTTP, then encrypts responses before returning them to clients. This simplifies certificate management (one certificate on the load balancer instead of N certificates on N servers) and reduces backend server load. However, it means traffic between the load balancer and backends is unencrypted, which may violate security requirements in some environments. Session persistence (also called sticky sessions) ensures that requests from the same client always reach the same backend server, which is necessary for stateful applications that store session data locally. The load balancer typically uses cookies or source IP hashing to maintain this affinity.
High Availability and Redundancy
A single load balancer creates a new single point of failure—if it goes down, your entire system becomes unavailable even if all backend servers are healthy. Production systems address this with redundant load balancers in active-passive or active-active configurations. Active-passive uses a primary load balancer with a standby that takes over if the primary fails (typically using VRRP or similar protocols). Active-active distributes traffic across multiple load balancers simultaneously, providing both redundancy and additional capacity. Cloud providers handle this automatically—AWS ALB runs across multiple availability zones by default. For self-hosted solutions, you need to configure failover mechanisms, health checks between load balancers, and shared state synchronization. The complexity increases significantly, but so does availability—properly configured redundant load balancers can achieve 99.99% uptime or better.
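The two most common distribution strategies above can be sketched in a few lines. This is a minimal illustration of round-robin versus least-connections selection; the class and method names are our own, not from any particular load balancer:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand each request to the next backend in sequence.
    Works well when servers and request costs are roughly uniform."""
    def __init__(self, backends):
        self._next = cycle(backends)

    def pick(self):
        return next(self._next)

class LeastConnectionsBalancer:
    """Route to the backend with the fewest open connections.
    Better when request processing times vary widely."""
    def __init__(self, backends):
        self.open = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.open, key=self.open.get)
        self.open[backend] += 1
        return backend

    def release(self, backend):
        self.open[backend] -= 1
```

Round-robin needs no feedback from the backends; least-connections requires the balancer to track connection lifetimes (the `release` call), which is exactly the extra state that makes it better suited to heterogeneous request costs.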
Health Checks & Failover
Health checks are the mechanism by which load balancers detect and respond to backend server failures, transforming a collection of independent servers into a fault-tolerant system. Understanding health check mechanics is critical because they directly impact both availability and user experience.
Active health checks involve the load balancer periodically sending probe requests to each backend server. These probes can be simple TCP connection attempts (“can I establish a connection to port 80?”), HTTP requests to a specific endpoint (“does GET /health return 200 OK?”), or application-specific checks (“does the database connection pool have available connections?”). Health checks are tuned through several parameters: check interval (how often to probe, typically 5-30 seconds), timeout (how long to wait for a response, typically 2-5 seconds), and failure threshold (how many consecutive failures before marking the server unhealthy, typically 2-3). For example, AWS ALB by default checks every 30 seconds with a 5-second timeout and marks servers unhealthy after 2 consecutive failures.
The health check endpoint design matters enormously. A naive implementation might just return “200 OK” unconditionally, which tells you the web server is running but nothing about whether the application can actually serve requests. A better health check verifies critical dependencies: can the application connect to its database? Is the cache reachable? Are downstream services responding? However, checks that are too comprehensive can create cascading failures—if your health check calls 10 downstream services and any one is slow, your server gets marked unhealthy even though it could serve most requests fine.
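A sketch of the endpoint logic described above, distinguishing critical from optional dependencies so one flaky downstream service can't pull the instance out of rotation. The function name and the shape of the `checks` argument are our illustration, not any framework's API:

```python
def health_response(checks):
    """Aggregate dependency probes into one health verdict.
    `checks` maps a dependency name to (probe_fn, critical); only failures
    of *critical* dependencies return 503, so a failing optional dependency
    is reported but does not mark the instance unhealthy."""
    results, healthy = {}, True
    for name, (probe, critical) in checks.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False          # a crashing probe counts as a failure
        results[name] = ok
        if critical and not ok:
            healthy = False
    return (200 if healthy else 503), results
```

For example, a database probe would be marked critical while a metrics-pipeline probe would not; the instance keeps serving traffic when only the latter fails.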
Passive health checks (also called connection-based health monitoring) observe actual request traffic rather than sending synthetic probes. The load balancer monitors error rates, response times, and connection failures for real user requests. If a server starts returning 500 errors or timing out, the load balancer reduces traffic to it or removes it from rotation. This approach catches problems that active checks might miss (like the server responding to health checks but failing real requests) but responds more slowly since it requires observing multiple failed requests.
When a server fails health checks, the failover process begins. The load balancer immediately stops routing new requests to the failed server. Existing connections might be handled differently depending on configuration: some load balancers immediately terminate them (forcing clients to retry), while others allow in-flight requests to complete. The failed server enters a “draining” state where it finishes existing work but receives no new requests. Meanwhile, the load balancer redistributes traffic across remaining healthy servers.
The recovery process is equally important. Once a failed server starts passing health checks again, the load balancer doesn’t immediately flood it with traffic. Most implementations use a “success threshold” (typically 2-3 consecutive successful checks) before marking the server healthy again. Some load balancers implement gradual traffic ramping, slowly increasing the percentage of requests sent to the recovered server to avoid overwhelming it if it’s still struggling.
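The ejection-and-readmission logic above amounts to a small state machine over consecutive failures and successes. A minimal sketch, using the 2-out / 2-in thresholds mentioned earlier (similar to AWS ALB's unhealthy-threshold default):

```python
class BackendHealth:
    """Consecutive-failure / consecutive-success health state machine.
    A healthy backend is ejected after `fail_threshold` consecutive failed
    checks; an unhealthy one is readmitted only after `success_threshold`
    consecutive passing checks, which prevents flapping."""
    def __init__(self, fail_threshold=2, success_threshold=2):
        self.fail_threshold = fail_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self._fails = self._successes = 0

    def record(self, passed):
        if passed:
            self._fails, self._successes = 0, self._successes + 1
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._successes, self._fails = 0, self._fails + 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Note that any success resets the failure counter and vice versa, so a server alternating pass/fail never crosses either threshold; this is the behavior the sequence diagram below walks through.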
Graceful degradation strategies help maintain availability during partial failures. If half your servers fail health checks, the load balancer doesn’t just route all traffic to the remaining servers (which might overload them). Instead, it can implement circuit breaker patterns, rate limiting, or even route some traffic to the “unhealthy” servers if the alternative is complete system failure. Netflix’s Hystrix library implements these patterns, allowing degraded service rather than complete outages.
The health check configuration creates a trade-off between availability and responsiveness. Aggressive checks (short intervals, low thresholds) detect failures quickly but risk false positives during temporary slowdowns. Conservative checks (long intervals, high thresholds) avoid false positives but leave failed servers in rotation longer. In interviews, discussing this trade-off and how you’d tune it based on SLA requirements demonstrates operational maturity. For a system with a 99.9% availability target (43 minutes downtime per month), you might accept 30-second detection times. For 99.99% (4 minutes per month), you need sub-10-second detection and automated failover.
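The tuning trade-off above can be made concrete with back-of-envelope arithmetic. This sketch approximates worst-case failure detection time and the monthly downtime budget for a given availability target (the detection formula is a rough model, not any vendor's guarantee):

```python
def worst_case_detection_s(interval, timeout, fail_threshold):
    """Rough worst case: the server dies just after passing a probe, so
    detection needs `fail_threshold` more probes spaced `interval` apart,
    the last of which takes up to `timeout` seconds to time out."""
    return fail_threshold * interval + timeout

def monthly_downtime_minutes(availability):
    """Downtime budget for a 30-day month at a given availability target."""
    return (1 - availability) * 30 * 24 * 60

# Conservative ALB-style checks (30s interval, 5s timeout, 2 failures):
# ~65s to detect a failure, acceptable against a ~43 min/month budget (99.9%).
conservative = worst_case_detection_s(interval=30, timeout=5, fail_threshold=2)
# Aggressive checks (5s interval, 2s timeout, 2 failures): ~12s to detect,
# needed once the budget shrinks to ~4.3 min/month (99.99%).
aggressive = worst_case_detection_s(interval=5, timeout=2, fail_threshold=2)
```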
Health Check and Failover Process
sequenceDiagram
participant LB as Load Balancer
participant S1 as Server 1<br/>(Healthy)
participant S2 as Server 2<br/>(Will Fail)
participant S3 as Server 3<br/>(Healthy)
Note over LB,S3: Normal Operation
LB->>S1: Health Check (GET /health)
S1-->>LB: 200 OK
LB->>S2: Health Check (GET /health)
S2-->>LB: 200 OK
LB->>S3: Health Check (GET /health)
S3-->>LB: 200 OK
Note over LB,S3: Server 2 Fails
LB->>S2: Health Check (GET /health)
S2--xLB: Timeout (5s)
Note over LB: Failure 1/2
LB->>S2: Health Check (GET /health)
S2--xLB: Timeout (5s)
Note over LB: Failure 2/2<br/>Mark S2 Unhealthy
Note over LB,S3: Traffic Redistribution
LB->>S1: Route requests
LB->>S3: Route requests
Note over S2: No new traffic<br/>Draining connections
Note over LB,S3: Recovery Detection
LB->>S2: Health Check (GET /health)
S2-->>LB: 200 OK
Note over LB: Success 1/2
LB->>S2: Health Check (GET /health)
S2-->>LB: 200 OK
Note over LB: Success 2/2<br/>Mark S2 Healthy
LB->>S2: Gradual traffic ramp-up
Health checks continuously monitor backend servers. After consecutive failures (typically 2-3), the load balancer marks a server unhealthy and stops routing traffic. Recovery requires consecutive successful checks before traffic resumes, preventing flapping during intermittent issues.
How Things Connect
Load balancers sit at the intersection of multiple system design concerns, connecting availability, scalability, and performance into a cohesive architecture. Understanding these connections helps you see the bigger picture in interviews and design discussions.
The relationship between load balancers and horizontal scaling is foundational. See Horizontal Scaling for the full scaling strategy, but the key insight is that load balancers enable you to add capacity by adding servers rather than upgrading existing ones. Without load balancers, horizontal scaling is impossible—you have no mechanism to distribute traffic across multiple servers. This connection explains why load balancers appear in virtually every scalable system design.
Load balancing algorithms determine how traffic gets distributed, and the choice of algorithm impacts both performance and complexity. See Load Balancing Algorithms for detailed coverage, but understand that the algorithm must match your workload characteristics. Round-robin works for homogeneous servers with similar request costs. Least-connections works better when request processing times vary significantly. Consistent hashing enables cache-friendly routing. The load balancer’s distribution strategy and your caching strategy must work together—poor coordination leads to cache thrashing and degraded performance.
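The cache-friendly routing mentioned above comes from consistent hashing. A minimal ring sketch with virtual nodes (the class name and vnode count are illustrative choices, and MD5 is used only for its even key distribution, not for security):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: removing one server only
    remaps the keys that hashed to it, so the other servers' caches stay warm."""
    def __init__(self, servers, vnodes=100):
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((self._hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def pick(self, request_key):
        # first ring point clockwise of the key's hash, wrapping at the end
        i = bisect.bisect(self._hashes, self._hash(request_key)) % len(self._hashes)
        return self._servers[i]
```

With naive modulo hashing (`hash(key) % n_servers`), dropping one server remaps almost every key and thrashes every cache; on the ring, only keys whose clockwise successor belonged to the removed server move.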
The Layer 4 vs Layer 7 distinction connects to your system’s architectural needs. Layer 4 load balancers provide simple, fast traffic distribution and work with any protocol (HTTP, WebSocket, gRPC, custom TCP protocols). Layer 7 load balancers enable sophisticated routing based on request content, which is essential for microservices architectures where different URLs route to different services. The choice impacts your entire architecture: Layer 7 enables patterns like API gateways and content-based routing, while Layer 4 keeps things simple and fast. See Layer 4 Load Balancing and Layer 7 Load Balancing for the technical details.
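The content-based routing that Layer 7 enables is essentially prefix matching over a rule table. A toy sketch of longest-prefix routing, the way an ALB listener rule or an Nginx `location` block picks a backend pool; the rule table and pool names here are invented for illustration:

```python
def route_by_path(path, rules, default_pool):
    """Pick the backend pool whose URL prefix is the longest match for
    `path`; fall back to `default_pool` when no rule matches."""
    best, best_len = default_pool, -1
    for prefix, pool in rules.items():
        if path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = pool, len(prefix)
    return best
```

Longest-match semantics matter: with rules for both /api and /api/admin, admin traffic must hit the more specific pool even though both prefixes match. A Layer 4 balancer cannot do any of this, because it never sees the URL.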
Load balancers and reverse proxies are closely related but serve different primary purposes. See Load Balancer vs Reverse Proxy for the distinction, but the key is that many tools (Nginx, HAProxy) serve both roles. In practice, your “load balancer” often handles caching, SSL termination, request routing, and security filtering—functions traditionally associated with reverse proxies. Understanding this overlap helps you design more efficient architectures rather than deploying separate components for each function.
The connection to availability and fault tolerance is direct: load balancers with health checks transform a system where any server failure causes partial outages into one where failures are automatically detected and routed around. This enables zero-downtime deployments, rolling updates, and resilience to hardware failures. However, the load balancer itself becomes a critical component—hence the need for redundant load balancers in production systems.
Finally, load balancers connect to operational concerns like monitoring, logging, and debugging. They’re natural points for collecting metrics (requests per second, error rates, latency percentiles) and implementing cross-cutting concerns like rate limiting, authentication, and request logging. In microservices architectures, load balancers often inject tracing headers that enable distributed tracing across services. Understanding these operational connections helps you design systems that are not just scalable but also observable and debuggable in production.
Real-World Context
Understanding how companies actually deploy and operate load balancers provides crucial context for system design interviews. Real-world usage reveals patterns, trade-offs, and lessons learned that aren’t obvious from textbook descriptions.
Netflix operates one of the world’s largest load balancing infrastructures, handling billions of requests daily across thousands of microservices. They built Zuul, a custom Layer 7 load balancer and API gateway, because existing solutions couldn’t meet their needs for dynamic routing, real-time traffic shaping, and operational visibility. Zuul runs on AWS and routes traffic based on sophisticated rules: A/B test assignments, canary deployments, regional failover, and user-specific routing. Netflix’s experience demonstrates that at sufficient scale, you often need custom solutions—but they built Zuul on top of standard components (Netty, RxJava) rather than reinventing everything. Their architecture uses multiple load balancing tiers: AWS ELB for initial traffic distribution, Zuul for application-layer routing, and Ribbon (client-side load balancing) for service-to-service communication within their microservices mesh.
AWS Elastic Load Balancing processes trillions of requests per year and provides insight into cloud-native load balancing. AWS offers three types because different workloads need different capabilities. Application Load Balancer (Layer 7) handles HTTP/HTTPS with content-based routing, WebSocket support, and integration with AWS services like Auto Scaling and ECS. Network Load Balancer (Layer 4) provides ultra-low latency and handles millions of requests per second for TCP/UDP traffic. Classic Load Balancer is the legacy option that predates the Layer 4/7 split. AWS’s architecture is fully managed and automatically scales—you don’t configure server capacity, you just pay for usage. This shifts the operational burden from capacity planning to cost optimization. Companies like Airbnb and Slack use ALB extensively, with Slack running hundreds of load balancers to route traffic across their microservices.
The cost implications are significant. AWS ALB charges $0.0225 per hour per load balancer plus $0.008 per LCU-hour (Load Balancer Capacity Unit, a composite measure of connections, bandwidth, and rule evaluations). For a high-traffic service processing 1000 requests/second with 1KB average response size, you might pay $200-300/month per load balancer, depending heavily on how many of those requests open new connections. Companies with dozens of microservices can spend thousands monthly on load balancing alone. This drives many companies to self-host Nginx or HAProxy once they reach scale—the operational complexity becomes cheaper than the AWS bill.
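A back-of-envelope estimator makes the LCU model concrete. ALB bills on the maximum of several usage dimensions each hour; the divisors below are approximations of the published dimension sizes and should be treated as illustrative, not an official quote:

```python
def alb_monthly_cost(new_conns_per_s, active_conns, gb_per_hour,
                     rule_evals_per_s, hours=730,
                     hourly_rate=0.0225, lcu_rate=0.008):
    """Rough ALB cost model: AWS bills the *maximum* LCU dimension each
    hour, plus a flat hourly charge per load balancer."""
    lcus = max(
        new_conns_per_s / 25,     # ~25 new connections/sec per LCU
        active_conns / 3000,      # ~3,000 active connections per LCU
        gb_per_hour / 1.0,        # ~1 GB/hour processed per LCU
        rule_evals_per_s / 1000,  # ~1,000 rule evaluations/sec per LCU
    )
    return hours * (hourly_rate + lcus * lcu_rate)
```

Under these assumptions, 1000 requests/second with good keep-alive reuse is dominated by the processed-bytes dimension and lands around $40/month, while the same traffic with a new connection per request is dominated by the new-connections dimension (1000/25 = 40 LCUs) and lands around $250/month, which is why connection reuse matters so much for ALB bills.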
Cloudflare provides a different model: load balancing as part of their global CDN and DDoS protection service. Their load balancers run in 200+ data centers worldwide, routing traffic based on geographic proximity, server health, and real-time performance metrics. This distributed approach provides both load balancing and DDoS mitigation—malicious traffic gets filtered at the edge before reaching your infrastructure. Companies like Discord use Cloudflare’s load balancing to handle massive traffic spikes (like when a popular streamer goes live) without provisioning excess capacity in their own data centers.
The operational reality is that load balancers require ongoing tuning and monitoring. Health check configurations need adjustment as traffic patterns change. SSL certificate renewals must be automated (Let’s Encrypt integration is common). Connection pool sizes, timeout values, and retry policies all impact system behavior under load. Companies typically start with conservative defaults, then tune based on observed behavior. The monitoring setup is critical: you need visibility into request rates, error rates, latency percentiles, and backend server health. Tools like Prometheus, Grafana, and Datadog are standard for load balancer observability.
In interviews, referencing these real-world patterns demonstrates practical knowledge. Mentioning that Netflix built Zuul for dynamic routing or that AWS offers three load balancer types for different use cases shows you understand that load balancing isn’t one-size-fits-all—it’s a set of trade-offs that depend on scale, budget, and architectural requirements.
Multi-Tier Load Balancing Architecture (Netflix-Style)
graph TB
subgraph Internet
USERS["Users<br/><i>Billions of requests/day</i>"]
end
subgraph Edge Layer - AWS
DNS["Route 53<br/><i>DNS Routing</i>"]
ELB1["AWS ELB<br/><i>us-east-1</i>"]
ELB2["AWS ELB<br/><i>us-west-2</i>"]
ELB3["AWS ELB<br/><i>eu-west-1</i>"]
end
subgraph Application Layer - Zuul
ZUUL1["Zuul Gateway 1<br/><i>Dynamic Routing</i>"]
ZUUL2["Zuul Gateway 2<br/><i>A/B Testing</i>"]
ZUUL3["Zuul Gateway 3<br/><i>Canary Deploys</i>"]
end
subgraph Microservices - Client-Side LB
USER_SVC["User Service<br/><i>Ribbon LB</i>"]
VIDEO_SVC["Video Service<br/><i>Ribbon LB</i>"]
REC_SVC["Recommendation<br/><i>Ribbon LB</i>"]
end
subgraph Backend Instances
USER_POOL["User Service<br/>Instance Pool"]
VIDEO_POOL["Video Service<br/>Instance Pool"]
REC_POOL["Recommendation<br/>Instance Pool"]
end
USERS --"1. DNS Query"--> DNS
DNS --"2. Geographic Routing"--> ELB1
DNS --> ELB2
DNS --> ELB3
ELB1 --"3. Distribute"--> ZUUL1
ELB1 --> ZUUL2
ELB1 --> ZUUL3
ZUUL1 --"4. Route /api/users"--> USER_SVC
ZUUL1 --"4. Route /api/videos"--> VIDEO_SVC
ZUUL1 --"4. Route /api/recommendations"--> REC_SVC
USER_SVC --"5. Client-side LB"--> USER_POOL
VIDEO_SVC --"5. Client-side LB"--> VIDEO_POOL
REC_SVC --"5. Client-side LB"--> REC_POOL
Netflix uses multiple load balancing tiers: AWS ELB for geographic distribution, Zuul for application-layer routing (A/B tests, canary deployments), and Ribbon for client-side load balancing between microservices. This multi-tier approach provides redundancy, intelligent routing, and resilience at scale.
Interview Essentials
Mid-Level
At the mid-level, interviewers expect you to explain load balancers as a solution to the “single server” problem. You should articulate that load balancers distribute traffic across multiple servers to prevent overload and enable horizontal scaling. Explain the basic architecture: clients connect to the load balancer, which forwards requests to backend servers based on some distribution algorithm. Mention health checks as the mechanism for detecting failed servers and automatically routing traffic away from them.
You should be able to discuss the benefits: preventing server overload, eliminating single points of failure (though the load balancer itself becomes one), enabling zero-downtime deployments, and providing a single entry point for clients. Acknowledge the trade-offs: load balancers add complexity, can become bottlenecks if not properly sized, and introduce an additional failure point. Mention common solutions like Nginx, HAProxy, and AWS ELB/ALB.
When designing a system, place the load balancer between clients and application servers in your architecture diagram. Explain that you’d configure health checks to detect server failures and use an appropriate algorithm (round-robin for simple cases, least-connections if request costs vary). If asked about SSL, mention that load balancers can handle SSL termination to offload encryption work from backend servers. The key is demonstrating that you understand load balancers as a fundamental building block for scalable systems, not just a box you draw in diagrams.
Senior
Senior engineers should demonstrate deep understanding of load balancer mechanics and trade-offs. Discuss the Layer 4 vs Layer 7 distinction and when to use each: Layer 4 for high-performance, protocol-agnostic load balancing; Layer 7 for content-based routing and HTTP-specific features. Explain how this choice impacts your architecture—Layer 7 enables microservices routing patterns but adds latency and CPU overhead.
Dive into health check design: active vs passive checks, the trade-off between detection speed and false positives, and how to design health check endpoints that verify critical dependencies without creating cascading failures. Discuss failure scenarios: what happens when a server fails mid-request? How do you handle gradual degradation (server slowing down but not fully failed)? Explain connection draining and graceful shutdown patterns.
Address the load balancer redundancy problem: a single load balancer is a single point of failure, so production systems need redundant load balancers in active-passive or active-active configurations. Discuss how this is typically implemented (VRRP, DNS-based failover, cloud provider managed solutions). Explain the trade-offs between hardware and software load balancers, and when you’d choose cloud-managed services vs self-hosted solutions.
For capacity planning, discuss how to size load balancers based on expected traffic. Mention that load balancers have limits (connections per second, bandwidth, SSL handshakes per second) and how these limits impact system design. If designing a high-traffic system, explain how you’d use multiple tiers of load balancers or geographic distribution to handle scale. Reference real-world examples: “Netflix uses Zuul for application-layer routing” or “AWS ALB automatically scales but charges per LCU, which impacts cost at scale.”
Layer 4 vs Layer 7 Load Balancing
```mermaid
graph TB
    subgraph "Layer 4 - Transport Layer"
        L4_CLIENT["Client"] -- "TCP/UDP Packet" --> L4_LB["Layer 4 LB<br/><i>IP:Port Only</i>"]
        L4_LB -- "Fast Forwarding<br/>Protocol Agnostic" --> L4_S1["Backend 1"]
        L4_LB --> L4_S2["Backend 2"]
        L4_LB --> L4_S3["Backend 3"]
        L4_FEATURES["Features:<br/>• TCP/UDP load balancing<br/>• IP/Port-based routing<br/>• Protocol agnostic<br/>• Ultra-low latency<br/>• High throughput"]
        L4_USE["Use Cases:<br/>• Non-HTTP protocols<br/>• Database connections<br/>• WebSocket/gRPC<br/>• Maximum performance"]
    end
    subgraph "Layer 7 - Application Layer"
        L7_CLIENT["Client"] -- "HTTP Request<br/>GET /api/users" --> L7_LB["Layer 7 LB<br/><i>Content Inspection</i>"]
        L7_LB -- "/api/* requests" --> L7_API["API Servers"]
        L7_LB -- "/images/* requests" --> L7_CDN["CDN Origin"]
        L7_LB -- "/admin/* requests" --> L7_ADMIN["Admin Servers"]
        L7_FEATURES["Features:<br/>• HTTP header inspection<br/>• URL-based routing<br/>• SSL termination<br/>• Request rewriting<br/>• Cookie/session handling"]
        L7_USE["Use Cases:<br/>• Microservices routing<br/>• Content-based routing<br/>• A/B testing<br/>• API gateway patterns"]
    end
    COMPARE["Trade-off:<br/>L4 = Speed & Simplicity<br/>L7 = Intelligence & Features"]
```
Layer 4 load balancers operate at the transport layer, routing based on IP and port without inspecting packet contents—fast but limited. Layer 7 load balancers parse HTTP requests to enable content-based routing and advanced features—powerful but adds latency. Choose based on whether you need protocol flexibility (L4) or intelligent routing (L7).
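The distinction can be made concrete with a toy sketch (illustrative only, not a production data path; the function names, backend addresses, and route table below are hypothetical). An L4 balancer picks a backend from the connection tuple alone, never touching the payload, while an L7 balancer parses the HTTP request line and routes on the path:

```python
import hashlib

# Hypothetical backend pool for the L4 example.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def pick_backend_l4(client_ip: str, client_port: int) -> str:
    """L4: route on the connection tuple only -- the payload is never inspected."""
    key = f"{client_ip}:{client_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return BACKENDS[digest % len(BACKENDS)]

# Hypothetical pools keyed by URL prefix, mirroring the diagram above.
L7_ROUTES = {"/api/": "api-pool", "/images/": "cdn-origin", "/admin/": "admin-pool"}

def pick_pool_l7(request_line: str) -> str:
    """L7: parse the HTTP request line and route on the path."""
    method, path, _version = request_line.split(" ")
    for prefix, pool in L7_ROUTES.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(pick_pool_l7("GET /api/users HTTP/1.1"))  # -> api-pool
```

The extra work in the L7 path (parsing the request before choosing a destination) is exactly the latency/CPU cost the trade-off above refers to.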
Staff+
Staff+ engineers should demonstrate strategic thinking about load balancing in the context of overall system architecture and organizational constraints. Discuss how load balancing strategy evolves with company growth: startups use cloud-managed load balancers for simplicity, mid-size companies might self-host to control costs, and large companies often build custom solutions for specific needs (like Netflix’s Zuul).
Address the service mesh evolution: how client-side load balancing (like Netflix Ribbon) and sidecar proxies (like Envoy) represent a shift from centralized load balancers to distributed, application-aware routing. Explain when this complexity is justified (microservices at scale) vs when it’s premature optimization. Discuss the operational trade-offs: service meshes provide better observability and fine-grained control but require significant operational expertise.
Explain how load balancing intersects with other architectural concerns: caching strategies (consistent hashing for cache-friendly routing), security (load balancers as enforcement points for rate limiting and authentication), and observability (load balancers as natural points for metrics collection and distributed tracing). Discuss how to design for multi-region deployments: global load balancing with geographic routing, handling cross-region failover, and managing data consistency across regions.
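The cache-friendly routing idea above can be sketched with a minimal consistent-hash ring (a toy version under simplifying assumptions; production implementations such as ketama tune the hash function and replica count):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy hash ring: each backend owns arcs of the ring, so adding or
    removing one backend only remaps the keys on its arcs, keeping the
    other backends' caches warm."""

    def __init__(self, backends, replicas=100):
        self._ring = []  # sorted list of (ring position, backend)
        for b in backends:
            for i in range(replicas):  # virtual nodes smooth the distribution
                self._ring.append((self._hash(f"{b}#{i}"), b))
        self._ring.sort()
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def lookup(self, key: str) -> str:
        # First ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
assert ring.lookup("user:42") == ring.lookup("user:42")  # stable routing
```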
Address the cost vs control trade-off at scale. Explain how to evaluate whether to use cloud-managed load balancers (simple but expensive at scale) vs self-hosted solutions (complex but cost-effective). Discuss the hidden costs: operational expertise, monitoring infrastructure, and incident response capabilities. Reference specific cost models: “AWS ALB charges per LCU, which includes processed bytes, connections, and rule evaluations—at 10,000 requests/second, this can cost $X/month, making self-hosted Nginx attractive despite operational overhead.”
For complex scenarios, discuss advanced patterns: canary deployments with traffic splitting, A/B testing infrastructure, gradual rollouts with automated rollback, and chaos engineering approaches to validate failover mechanisms. Explain how to design load balancing infrastructure that supports rapid iteration and safe deployment practices. The expectation is that you can architect load balancing solutions that weigh technical requirements, operational constraints, and business needs while considering the full lifecycle from initial design through long-term operation and evolution.
Common Interview Questions
How would you design a load balancer for a high-traffic web application? Start with requirements: expected traffic (requests/second), availability target (99.9% vs 99.99%), and budget constraints. For cloud deployment, recommend AWS ALB or similar managed service for simplicity unless cost is prohibitive. Explain you’d configure health checks (HTTP GET to /health endpoint, 10-second interval, 2-failure threshold), use least-connections or round-robin algorithm depending on request uniformity, and enable SSL termination. Discuss redundancy: managed load balancers typically run across multiple availability zones automatically. For self-hosted, you’d deploy redundant HAProxy or Nginx instances with VRRP for failover. Mention monitoring: track request rate, error rate, latency (p50, p95, p99), and backend server health. Scale the load balancer tier as traffic grows—either by upgrading instance size or adding more load balancers behind DNS.
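The two selection algorithms mentioned above can be sketched in a few lines (a simplified, single-threaded illustration; the class and pool names are hypothetical, and a real balancer would handle concurrency and backend health):

```python
import itertools

class RoundRobin:
    """Cycle through backends in order -- a good fit when requests
    are roughly uniform in cost."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the backend with the fewest active connections -- better
    when request durations vary widely."""
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # caller holds a connection until release()
        return backend

    def release(self, backend):
        self.active[backend] -= 1

rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # -> ['a', 'b', 'c', 'a']

lc = LeastConnections(["a", "b"])
first = lc.pick()   # ties broken by insertion order
second = lc.pick()  # the other backend, now that one connection is open
```

Round-robin needs no shared state about in-flight work, which is why it's the default in many balancers; least-connections pays a small bookkeeping cost for better behavior under skewed request durations.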
What happens when a backend server fails? Walk me through the failure detection and recovery process. Explain the health check mechanism: the load balancer sends periodic probes (e.g., HTTP GET /health every 10 seconds). If a server fails to respond within the timeout (e.g., 5 seconds) for consecutive checks (e.g., 2 failures), it’s marked unhealthy. The load balancer immediately stops routing new requests to it. Existing connections might be terminated or allowed to complete depending on configuration. Traffic redistributes across remaining healthy servers. The failed server continues receiving health checks. Once it passes consecutive checks (e.g., 2 successes), it’s marked healthy and gradually receives traffic again. Discuss the trade-off: aggressive health checks detect failures quickly but risk false positives during temporary slowdowns; conservative checks avoid false positives but leave failed servers in rotation longer.
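The threshold logic described above amounts to a small state machine per backend. A minimal sketch (illustrative; the class name and default thresholds are assumptions, mirroring the example values in the answer):

```python
class HealthTracker:
    """Mark a backend unhealthy after N consecutive failed probes and
    healthy again after M consecutive successes (N = M = 2 here,
    matching the example thresholds above)."""

    def __init__(self, fail_threshold=2, rise_threshold=2):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def record_probe(self, ok: bool) -> bool:
        if ok:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise_threshold:
                self.healthy = True  # gradually re-enter rotation
        else:
            self._successes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False  # stop routing new requests here
        return self.healthy

t = HealthTracker()
t.record_probe(False)          # 1st failure -- still healthy
print(t.record_probe(False))   # 2nd consecutive failure -> False
```

Requiring *consecutive* failures is what guards against the false-positive problem: a single slow response resets nothing permanently, but sustained failures take the server out of rotation within one check interval of the threshold.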
How do you handle session persistence with load balancers? Explain that stateful applications storing session data locally need requests from the same client to reach the same server. Describe three approaches: (1) Cookie-based persistence—the load balancer sets a cookie identifying the backend server, then uses that cookie to route subsequent requests. (2) Source IP hashing—the load balancer hashes the client’s IP address to consistently route to the same server. (3) Application-level session management—store sessions in a shared data store (Redis, DynamoDB) so any server can handle any request. Recommend approach #3 for new systems because it’s more resilient (server failures don’t lose sessions) and enables better load distribution. Approaches #1 and #2 are useful for legacy applications that can’t be refactored. Mention the trade-off: sticky sessions can create uneven load distribution if some users generate more traffic than others.
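Approach #2 (source IP hashing) is simple enough to sketch directly (a toy illustration; the function name and backend labels are hypothetical):

```python
import hashlib

def sticky_backend(client_ip: str, backends: list) -> str:
    """Hash the client IP so a given client consistently lands on the
    same backend -- as long as the backend list doesn't change."""
    digest = int.from_bytes(hashlib.sha256(client_ip.encode()).digest()[:8], "big")
    return backends[digest % len(backends)]

backends = ["app-1", "app-2", "app-3"]
a = sticky_backend("203.0.113.7", backends)
b = sticky_backend("203.0.113.7", backends)
assert a == b  # same client IP -> same backend every time
```

Note the fragility this illustrates: with plain modulo hashing, adding or removing a backend remaps most clients to different servers, dropping their local sessions. That is one more reason to prefer approach #3 (shared session store) for new systems.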
What’s the difference between Layer 4 and Layer 7 load balancing, and when would you use each? Layer 4 operates at the transport layer (TCP/UDP), making routing decisions based on IP addresses and ports without inspecting packet contents. It’s fast, protocol-agnostic, and works with any TCP/UDP traffic (HTTP, WebSocket, gRPC, databases). Layer 7 operates at the application layer (HTTP/HTTPS), parsing request contents to route based on URLs, headers, cookies, or methods. It enables content-based routing (send /api to API servers, /images to CDN), SSL termination, and request rewriting. Use Layer 4 when you need maximum performance, protocol flexibility, or are load balancing non-HTTP traffic. Use Layer 7 when you need intelligent routing for microservices, want SSL termination, or need to route based on request content. Mention the trade-off: Layer 7 adds latency (parsing HTTP) and CPU overhead but provides much more routing flexibility.
How would you prevent the load balancer itself from becoming a single point of failure? Explain that a single load balancer creates a new SPOF even if backend servers are redundant. Describe redundancy approaches: (1) Active-passive—deploy two load balancers with VRRP or similar protocol. The primary handles all traffic; the secondary monitors the primary and takes over if it fails. (2) Active-active—deploy multiple load balancers with DNS round-robin or anycast routing. All handle traffic simultaneously, providing both redundancy and additional capacity. (3) Cloud-managed—services like AWS ALB automatically run across multiple availability zones with built-in redundancy. Mention that you’d also need to consider the DNS layer—if your DNS provider fails, clients can’t resolve your domain even if load balancers are healthy. Recommend using a reliable DNS provider with their own redundancy (Route53, Cloudflare). The key insight: achieving high availability requires redundancy at every layer, and each layer of redundancy adds operational complexity.
Red Flags to Avoid
Not understanding that load balancers enable horizontal scaling. If you describe load balancers as “just for high availability” without connecting them to scaling strategy, it suggests you don’t understand their primary purpose. Load balancers are what make horizontal scaling possible—without them, you can’t distribute traffic across multiple servers.
Ignoring the load balancer as a potential bottleneck or single point of failure. Candidates who draw a single load balancer in their architecture without discussing its capacity limits or redundancy show incomplete thinking. Production systems need redundant load balancers and capacity planning to ensure the load balancer tier doesn’t become the bottleneck.
Confusing load balancers with reverse proxies or not understanding the relationship. While the terms are related and many tools serve both roles, they have different primary purposes. See Load Balancer vs Reverse Proxy for the distinction. Conflating them or not knowing the difference suggests surface-level understanding.
Not considering health checks or explaining how failures are detected. Load balancers without health checks are just traffic distributors—they’ll route requests to failed servers until manually reconfigured. If you don’t mention health checks when discussing load balancers, it shows you don’t understand how they provide fault tolerance.
Proposing Layer 7 load balancing for everything without considering the performance trade-off. Layer 7 provides powerful routing capabilities but adds latency and CPU overhead. Candidates who default to Layer 7 without discussing the trade-off or considering whether Layer 4 would suffice show they’re pattern-matching rather than thinking critically about requirements.
Not understanding SSL termination or where encryption/decryption happens. SSL termination is a common load balancer feature with security and performance implications. Not knowing whether traffic between the load balancer and backends is encrypted, or why you might terminate SSL at the load balancer, suggests lack of production experience.
Key Takeaways
Load balancers are the foundational component that enables horizontal scaling by distributing traffic across multiple servers. Without them, you’re limited to vertical scaling (bigger servers) which has hard limits. With them, you can scale to billions of requests by adding more servers.
Health checks transform load balancers from simple traffic distributors into fault-tolerant systems. By continuously monitoring backend servers and automatically routing traffic away from failures, load balancers enable high availability and zero-downtime deployments. The health check configuration (interval, timeout, threshold) directly impacts both availability and false positive rates.
The Layer 4 vs Layer 7 distinction is fundamental: Layer 4 provides fast, protocol-agnostic load balancing based on IP/port, while Layer 7 enables intelligent routing based on HTTP request content. Choose Layer 4 for performance and simplicity, Layer 7 for microservices routing and application-aware features. Many production systems use both—Layer 4 for initial distribution, Layer 7 for application routing.
Load balancers themselves can become single points of failure and performance bottlenecks. Production systems require redundant load balancers (active-passive or active-active) and capacity planning to ensure the load balancer tier scales with traffic. Cloud-managed services handle this automatically but at a cost; self-hosted solutions require operational expertise.
Real-world load balancing involves trade-offs between simplicity (cloud-managed), cost (self-hosted), and capabilities (hardware vs software). Companies typically start with managed services for simplicity, then evaluate self-hosting as costs scale. At sufficient scale, custom solutions (like Netflix’s Zuul) become justified. The right choice depends on traffic volume, budget, and operational maturity.