API Gateway Pattern: Rate Limiting, Auth & Routing

Intermediate · 13 min read · Updated 2026-02-11

By the end of this topic, you will be able to:

  • Evaluate the trade-offs between API gateway patterns and direct service-to-service communication
  • Design API gateway routing strategies for different client types (web, mobile, third-party)
  • Assess when API gateway becomes a bottleneck and recommend mitigation strategies
  • Justify the placement of cross-cutting concerns (auth, rate limiting, logging) in the gateway layer

TL;DR

An API Gateway is a single entry point that sits between clients and backend microservices, handling cross-cutting concerns like authentication, rate limiting, request routing, and protocol translation. It solves the problem of clients needing to know about and communicate with dozens of microservices directly, instead providing a unified interface that simplifies client logic and centralizes operational concerns. Think of it as a smart reverse proxy that understands your application’s business logic, not just network routing.

The Problem It Solves

When you decompose a monolith into microservices, you create a new problem: clients now need to communicate with dozens of services instead of one. A mobile app that previously made one API call to /api/user-profile might now need to orchestrate calls to user-service, preferences-service, subscription-service, and recommendation-service, then stitch the responses together. Each service might use different protocols (REST, gRPC, GraphQL), require separate authentication, and have different rate limits. The client becomes a distributed systems expert overnight.

Worse, cross-cutting concerns like authentication, logging, and rate limiting get duplicated across every service. When you need to rotate API keys or change your rate limiting strategy, you’re updating 50 services instead of one place. Security becomes a nightmare because every service needs to validate tokens, check permissions, and log access attempts. Netflix faced this exact problem in 2012 when they had hundreds of microservices and clients (web, mobile, smart TVs, game consoles) that each needed different data shapes and performance characteristics.

The final pain point is operational complexity. Without a gateway, you expose internal service topology to clients. When you refactor services or change deployment strategies, clients break. When a service goes down, clients get cryptic errors instead of graceful degradation. You’ve traded monolith complexity for distributed chaos.

Client Complexity Without API Gateway

graph TB
    subgraph "Without API Gateway"
        Mobile["Mobile App"]
        Web["Web Browser"]
        
        Mobile -->|"1. Auth + /users/123"| UserSvc["User Service<br/><i>REST</i>"]
        Mobile -->|"2. Auth + /preferences"| PrefSvc["Preferences Service<br/><i>gRPC</i>"]
        Mobile -->|"3. Auth + /subscription"| SubSvc["Subscription Service<br/><i>GraphQL</i>"]
        Mobile -->|"4. Auth + /recommendations"| RecSvc["Recommendation Service<br/><i>REST</i>"]
        Mobile -.->|"5. Stitch responses"| Mobile
        
        Web -->|"Auth + Multiple calls"| UserSvc
        Web -->|"Auth + Multiple calls"| PrefSvc
        Web -->|"Auth + Multiple calls"| SubSvc
        Web -->|"Auth + Multiple calls"| RecSvc
    end

Without an API Gateway, clients must orchestrate calls to multiple services with different protocols, handle authentication for each service, and aggregate responses themselves. This creates tight coupling between clients and service topology.

Solution Overview

An API Gateway acts as a reverse proxy with application-layer intelligence, providing a single entry point for all client requests. Instead of clients calling services directly, they call the gateway, which handles routing, composition, and transformation. The gateway owns cross-cutting concerns: it authenticates requests once, enforces rate limits globally, logs every transaction, and translates between protocols.

The gateway pattern centralizes three critical functions. First, request routing and composition: the gateway knows which backend services to call and can aggregate multiple service responses into a single client response. When a mobile app requests /api/home-feed, the gateway might call content-service, ads-service, and personalization-service in parallel, then merge the results. Second, protocol translation: the gateway speaks HTTP/REST to clients but can call backend services using gRPC, GraphQL, or message queues. Third, policy enforcement: authentication, authorization, rate limiting, and request validation happen once at the gateway, not in every service.
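The routing half of this can be sketched as longest-prefix matching over a route table. The table below is invented for illustration; real gateways load routes from configuration and support richer matching (methods, headers, weights):

```python
from typing import Optional

# Hypothetical route table: path prefix -> owning backend service.
ROUTES = {
    "/api/v2/user": "user-service",
    "/api/v2/user/subscription": "subscription-service",
    "/api/v2/content": "content-service",
}

def resolve_route(path: str) -> Optional[str]:
    """Return the backend service owning the longest matching route prefix."""
    best_prefix, best_service = "", None
    for prefix, service in ROUTES.items():
        exact_or_child = path == prefix or path.startswith(prefix + "/")
        if exact_or_child and len(prefix) > len(best_prefix):
            best_prefix, best_service = prefix, service
    return best_service
```

Longest-prefix wins, so `/api/v2/user/subscription/plan` routes to `subscription-service` even though it also matches the shorter `/api/v2/user` prefix.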

The gateway becomes your API’s public face. Internal services can evolve independently—you can split a service, change its API contract, or migrate to a new technology—without breaking clients. The gateway adapts the internal changes to maintain a stable external contract. This is why companies like Stripe can maintain API backwards compatibility for years while completely rewriting their backend architecture.
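The stable-contract idea can be sketched as a response adapter at the gateway. The field names below are hypothetical; the point is that internal renames and internal-only fields never leak to clients:

```python
# Hypothetical mapping: internal field names -> the stable external contract.
FIELD_MAP = {"display_name": "name", "plan_tier": "plan"}
# Hypothetical fields that exist internally but must never reach clients.
INTERNAL_ONLY = {"shard_id", "internal_flags"}

def to_external(internal: dict) -> dict:
    """Adapt an internal service response to the stable external contract."""
    external = {}
    for key, value in internal.items():
        if key in INTERNAL_ONLY:
            continue  # strip internal-only fields
        external[FIELD_MAP.get(key, key)] = value
    return external
```

Backend teams can rename `display_name` or add `shard_id` freely; only the adapter changes, and the external contract stays fixed.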

API Gateway as Unified Entry Point

graph LR
    subgraph "Client Layer"
        Mobile["Mobile App"]
        Web["Web Browser"]
        Partner["Partner API"]
    end
    
    subgraph "API Gateway"
        Gateway["API Gateway<br/><i>Single Entry Point</i>"]
        Auth["Authentication<br/>& Authorization"]
        RateLimit["Rate Limiting"]
        Router["Request Router<br/>& Aggregator"]
        
        Gateway --> Auth
        Auth --> RateLimit
        RateLimit --> Router
    end
    
    subgraph "Backend Services"
        UserSvc["User Service<br/><i>gRPC</i>"]
        PrefSvc["Preferences<br/><i>gRPC</i>"]
        SubSvc["Subscription<br/><i>gRPC</i>"]
        RecSvc["Recommendations<br/><i>gRPC</i>"]
    end
    
    Mobile -->|"HTTPS/REST"| Gateway
    Web -->|"HTTPS/REST"| Gateway
    Partner -->|"HTTPS/REST"| Gateway
    
    Router -->|"gRPC"| UserSvc
    Router -->|"gRPC"| PrefSvc
    Router -->|"gRPC"| SubSvc
    Router -->|"gRPC"| RecSvc

The API Gateway provides a single entry point that handles cross-cutting concerns (authentication, rate limiting) and routes requests to backend services. Clients use simple REST APIs while the gateway handles protocol translation and service orchestration.

How It Works

Let’s walk through a real request flow to understand how API gateways operate in practice. When a client makes a request, the gateway processes it through several stages:

Stage 1: Request Reception and Protocol Handling. The gateway receives an HTTPS request from a mobile app: GET /api/v2/user/profile. It terminates TLS, parses the HTTP request, and extracts headers like Authorization: Bearer <token> and X-Client-Version: iOS-2.3.1. This is where protocol translation begins—the gateway might receive REST but will call backend services using gRPC for efficiency.

Stage 2: Authentication and Authorization. The gateway validates the JWT token in the Authorization header. Instead of every service implementing JWT validation, the gateway does it once, verifying the signature against a public key and checking expiration. If valid, it extracts the user ID (say, user_id: 12345) and adds it to the request context. For authorization, the gateway checks if this user has permission to access profile data—perhaps consulting a policy service or checking cached permissions. This prevents unauthorized requests from ever reaching backend services.
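Stage 2 can be sketched as follows. The text above describes verifying a signature against a public key (RS256); to keep the sketch self-contained it uses an HMAC shared secret (HS256) and only Python's standard library:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(seg: str) -> bytes:
    # JWT segments are base64url without padding; restore padding before decoding.
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def verify_jwt(token: str, secret: bytes):
    """Return the claims dict if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None  # malformed token
    signed = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signed, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        return None  # bad signature
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        return None  # expired token (a missing exp is treated as expired)
    return claims
```

On success the gateway would pull `user_id` out of the returned claims and attach it to the request context, so no backend service ever re-validates the token.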

Stage 3: Rate Limiting and Throttling. Before routing the request, the gateway checks rate limits. It might enforce “100 requests per minute per user” by checking a Redis counter keyed on user:12345. If the limit is exceeded, it returns 429 Too Many Requests immediately, protecting backend services from overload. See Rate Limiting for detailed algorithm implementations.
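Stage 3's counter check might look like the sketch below. A local dict stands in for Redis; in production you would use an atomic INCR plus EXPIRE so that all gateway instances share one counter per user:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window rate limiter. The in-memory dict stands in for Redis
    INCR + EXPIRE on a key like 'ratelimit:user:12345:<window-number>'."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)

    def allow(self, user_id: str, now=None) -> bool:
        now = time.time() if now is None else now
        key = (user_id, int(now // self.window))  # current window bucket
        self.counters[key] += 1
        return self.counters[key] <= self.limit
```

A request that returns False would get an immediate `429 Too Many Requests`; note the fixed window allows brief bursts at window boundaries, which is why sliding-window variants exist.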

Stage 4: Request Routing and Service Discovery. The gateway consults its routing table to determine which backend services to call. For /api/v2/user/profile, it might need data from three services: user-service (basic profile), subscription-service (plan details), and preferences-service (settings). The gateway uses service discovery to find healthy instances of each service—see Service Discovery for how this works. It then makes parallel requests: gRPC call to user-service.GetUser(12345), gRPC call to subscription-service.GetSubscription(12345), and gRPC call to preferences-service.GetPreferences(12345).

Stage 5: Response Aggregation and Transformation. The gateway receives three gRPC responses and aggregates them into a single JSON response for the client. It might apply transformations: filtering sensitive fields (like internal IDs), renaming fields for API compatibility, or adding computed fields. If subscription-service times out, the gateway can return a partial response with a degraded experience rather than failing the entire request. This is graceful degradation in action.
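Stages 4 and 5 together amount to a parallel fan-out with graceful degradation. The sketch below uses thread-pool futures standing in for async gRPC calls; the backend names and payloads are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_profile(user_id, backends, timeout=0.2):
    """Call every backend in parallel; a failed or slow backend degrades its
    section to None instead of failing the whole response. `backends` maps a
    section name to a callable taking user_id (a stand-in for a gRPC stub)."""
    result, degraded = {}, []
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(call, user_id) for name, call in backends.items()}
        for name, future in futures.items():
            try:
                result[name] = future.result(timeout=timeout)
            except Exception:
                result[name] = None  # partial response: degrade this section
                degraded.append(name)
    result["degraded"] = degraded
    return result
```

The client always gets a well-formed response; a `degraded` list tells it which sections to render with fallbacks rather than failing the whole screen.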

Stage 6: Caching and Response. Before returning the response, the gateway might cache it in Redis with a key like profile:12345:v2 and a 60-second TTL. Subsequent requests for the same profile can be served from cache without hitting backend services. Finally, it adds response headers (like X-RateLimit-Remaining: 87), logs the transaction with latency metrics, and returns the response to the client.
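The cache step can be sketched as a TTL map; once more a local dict stands in for Redis SETEX/GET, and expiry is checked lazily on read:

```python
import time

class TTLCache:
    """Response cache with per-entry TTL, standing in for Redis SETEX/GET."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self.store[key] = (value, now + ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.store[key]  # lazy eviction on read
            return None
        return value
```

The gateway would check this cache back in Stage 4 before fanning out, and populate it here with a key like `profile:12345:v2` and a 60-second TTL.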

This entire flow happens in milliseconds. Netflix's Zuul gateway handles billions of requests per day, routing to thousands of backend services while keeping its added latency under roughly 10ms.

Request Flow Through API Gateway

sequenceDiagram
    participant Client as Mobile App
    participant Gateway as API Gateway
    participant Cache as Redis Cache
    participant UserSvc as User Service
    participant SubSvc as Subscription Service
    participant PrefSvc as Preferences Service
    
    Client->>Gateway: 1. GET /api/v2/user/profile<br/>Authorization: Bearer <token>
    
    Gateway->>Gateway: 2. Validate JWT token<br/>Extract user_id: 12345
    
    Gateway->>Cache: 3. Check rate limit<br/>key: user:12345
    Cache-->>Gateway: 87/100 requests used
    
    Gateway->>Cache: 4. Check cache<br/>key: profile:12345:v2
    Cache-->>Gateway: Cache miss
    
    par Parallel Service Calls
        Gateway->>UserSvc: 5a. gRPC GetUser(12345)
        Gateway->>SubSvc: 5b. gRPC GetSubscription(12345)
        Gateway->>PrefSvc: 5c. gRPC GetPreferences(12345)
    end
    
    UserSvc-->>Gateway: User data
    SubSvc-->>Gateway: Subscription data
    PrefSvc-->>Gateway: Preferences data
    
    Gateway->>Gateway: 6. Aggregate & transform<br/>Filter sensitive fields
    
    Gateway->>Cache: 7. Cache response<br/>TTL: 60s
    
    Gateway->>Client: 8. JSON response<br/>X-RateLimit-Remaining: 87

A complete request flow showing the six stages: authentication, rate limiting, cache check, parallel service calls, response aggregation, and caching. The gateway adds 5-20ms overhead but eliminates client-side orchestration complexity.

Variants

API Gateway patterns come in several flavors, each optimized for different architectural needs:

Single Gateway (Monolithic Gateway). One gateway instance handles all client types and routes to all backend services. This is the simplest approach: deploy Kong, AWS API Gateway, or a custom Node.js/Go service that knows about your entire service topology. When to use: Small to medium systems with fewer than 20 microservices and similar client needs. Pros: Simple to operate, easy to reason about, centralized configuration. Cons: Becomes a bottleneck as you scale, requires redeployment for any routing change, and couples all teams to a single gateway codebase.

Backends-for-Frontends (BFF). Instead of one gateway, you deploy separate gateways for each client type: web-gateway, mobile-gateway, partner-gateway. Each BFF is optimized for its client’s needs. The mobile BFF might aggregate more aggressively to reduce round trips over cellular networks, while the web BFF returns richer data assuming high bandwidth. When to use: When different clients need significantly different data shapes or have different performance requirements. Netflix uses this pattern extensively—their TV gateway returns pre-rendered UI components, while their mobile gateway returns raw data. Pros: Each BFF can evolve independently, optimized for its client. Cons: More operational overhead, potential code duplication across BFFs.

Micro-Gateway (Sidecar Pattern). Instead of a centralized gateway, each service gets its own lightweight gateway deployed as a sidecar container. This is the bridge between API Gateway and service mesh patterns. When to use: When you need gateway functionality but want to avoid a central bottleneck, or when transitioning to a service mesh. Pros: No single point of failure, scales naturally with services. Cons: More complex to configure consistently, harder to enforce global policies.

GraphQL Gateway. The gateway exposes a GraphQL API to clients, allowing them to request exactly the data they need. The gateway resolves GraphQL queries by calling multiple REST or gRPC services. When to use: When clients need flexible data fetching and you want to avoid over-fetching. Shopify uses GraphQL gateways to let third-party apps query exactly the store data they need. Pros: Eliminates over-fetching, reduces API versioning burden. Cons: Adds query complexity, requires careful performance tuning to avoid N+1 queries.

Backends-for-Frontends (BFF) Pattern

graph TB
    subgraph "Client Layer"
        Mobile["Mobile App<br/><i>iOS/Android</i>"]
        Web["Web Browser<br/><i>React SPA</i>"]
        TV["Smart TV<br/><i>Roku/Fire TV</i>"]
        Partner["Partner API<br/><i>Third-party</i>"]
    end
    
    subgraph "BFF Layer"
        MobileGW["Mobile Gateway<br/><i>Aggressive aggregation</i><br/><i>Optimized for cellular</i>"]
        WebGW["Web Gateway<br/><i>Rich data responses</i><br/><i>Real-time updates</i>"]
        TVGW["TV Gateway<br/><i>Pre-rendered UI</i><br/><i>Large payloads</i>"]
        PartnerGW["Partner Gateway<br/><i>Strict rate limits</i><br/><i>API versioning</i>"]
    end
    
    subgraph "Backend Services"
        UserSvc["User Service"]
        ContentSvc["Content Service"]
        AdsSvc["Ads Service"]
        RecSvc["Recommendations"]
    end
    
    Mobile --> MobileGW
    Web --> WebGW
    TV --> TVGW
    Partner --> PartnerGW
    
    MobileGW --> UserSvc
    MobileGW --> ContentSvc
    MobileGW --> RecSvc
    
    WebGW --> UserSvc
    WebGW --> ContentSvc
    WebGW --> AdsSvc
    WebGW --> RecSvc
    
    TVGW --> ContentSvc
    TVGW --> AdsSvc
    TVGW --> RecSvc
    
    PartnerGW --> UserSvc
    PartnerGW --> ContentSvc

The BFF pattern deploys separate gateways for each client type, allowing each to optimize for its specific needs. Mobile gateways aggregate aggressively to reduce cellular round trips, while TV gateways return pre-rendered UI components.

API Gateway vs Service Mesh

API Gateway and Service Mesh are often confused because both handle traffic management, but they solve different problems and operate at different layers. Understanding the distinction is crucial for system design interviews.

API Gateway handles north-south traffic: requests from external clients (browsers, mobile apps, third-party APIs) into your system. It’s your public API surface, focused on client-facing concerns like authentication, rate limiting, API versioning, and request transformation. The gateway sits at the edge of your infrastructure, often in a DMZ, and makes decisions based on business logic: “Mobile clients get aggregated responses, web clients get full data.”

Service Mesh handles east-west traffic: service-to-service communication within your cluster. It’s infrastructure-level, focused on reliability, observability, and security between microservices. A service mesh like Istio or Linkerd uses sidecar proxies to handle retries, circuit breaking, mutual TLS, and distributed tracing between services. It makes decisions based on operational concerns: “Retry failed requests 3 times with exponential backoff.”

The key difference is who the client is. API Gateway serves external clients who you don’t control—they might be on slow mobile networks, using old app versions, or making malicious requests. Service Mesh serves internal services that you do control—they’re on fast networks, use consistent protocols, and are trusted.

When to use each: Use an API Gateway when you need to expose APIs to external clients, aggregate data from multiple services, or enforce business-level policies. Use a Service Mesh when you have complex microservices communication patterns and need consistent observability and reliability across all service-to-service calls. Many large systems use both: Uber has API Gateways at the edge for client requests and a service mesh (built on Envoy) for internal service communication. The gateway handles “Is this user authenticated?” while the mesh handles “Did the payment service call succeed or should we retry?”

API Gateway vs Service Mesh Traffic Patterns

graph TB
    subgraph "External Clients"
        Mobile["Mobile App"]
        Web["Web Browser"]
    end
    
    subgraph "Edge Layer - North-South Traffic"
        Gateway["API Gateway<br/><i>Authentication</i><br/><i>Rate Limiting</i><br/><i>API Versioning</i>"]
    end
    
    subgraph "Service Mesh - East-West Traffic"
        subgraph "Service A Pod"
            SvcA["Service A"]
            ProxyA["Sidecar Proxy<br/><i>mTLS</i><br/><i>Retries</i><br/><i>Circuit Breaking</i>"]
        end
        
        subgraph "Service B Pod"
            SvcB["Service B"]
            ProxyB["Sidecar Proxy<br/><i>mTLS</i><br/><i>Retries</i><br/><i>Circuit Breaking</i>"]
        end
        
        subgraph "Service C Pod"
            SvcC["Service C"]
            ProxyC["Sidecar Proxy<br/><i>mTLS</i><br/><i>Retries</i><br/><i>Circuit Breaking</i>"]
        end
    end
    
    Mobile -->|"HTTPS<br/>Untrusted"| Gateway
    Web -->|"HTTPS<br/>Untrusted"| Gateway
    
    Gateway -->|"Business Logic<br/>Routing"| ProxyA
    
    ProxyA <-->|"mTLS<br/>Trusted"| SvcA
    ProxyB <-->|"mTLS<br/>Trusted"| SvcB
    ProxyC <-->|"mTLS<br/>Trusted"| SvcC
    
    ProxyA <-->|"Infrastructure<br/>Concerns"| ProxyB
    ProxyB <-->|"Infrastructure<br/>Concerns"| ProxyC

API Gateway handles north-south traffic from external clients with business logic (authentication, API versioning). Service Mesh handles east-west traffic between internal services with infrastructure concerns (mTLS, retries, circuit breaking). Large systems use both.

Trade-offs

API Gateway introduces significant trade-offs that you must evaluate based on your system’s needs:

Latency vs Simplicity. Every request through the gateway adds latency—typically 5-20ms for routing, authentication, and logging. This overhead is paid once per client request, regardless of how many backend calls the gateway fans out to. Decision criteria: If you’re building a high-frequency trading system where every millisecond counts, a gateway might be too slow. If you’re building a mobile app where network latency is already 100-500ms, the gateway overhead is negligible. Netflix accepts gateway latency because the simplification of client logic and the operational benefits far outweigh roughly 10ms of overhead.

Single Point of Failure vs Operational Simplicity. A centralized gateway is a single point of failure. If it goes down, your entire API surface is unavailable, even if backend services are healthy. Mitigation strategies: Deploy gateways in multiple availability zones with automatic failover, use health checks and circuit breakers, and implement aggressive caching so the gateway can serve stale data during outages. Decision criteria: For systems requiring 99.99% uptime, you need redundant gateways with sophisticated failover. For internal tools, a single gateway with manual failover might suffice.

Flexibility vs Consistency. Gateways centralize cross-cutting concerns, which is great for consistency but reduces flexibility. If one team needs a custom authentication flow, they’re constrained by the gateway’s capabilities. Decision criteria: Use a single gateway when consistency is paramount (e.g., all APIs must have identical rate limiting). Use the BFF pattern when teams need flexibility (e.g., mobile team needs aggressive caching, web team needs real-time data).

Gateway Logic vs Service Logic. Where do you put business logic? If the gateway does too much (complex aggregation, business rules), it becomes a distributed monolith. If it does too little (just routing), you lose the benefits of centralization. Decision criteria: Keep the gateway focused on cross-cutting concerns and simple aggregation. Complex business logic belongs in services. Stripe’s gateway handles authentication and rate limiting but delegates payment processing logic to backend services.

Vendor Lock-in vs Time to Market. Managed gateways (AWS API Gateway, Google Cloud Endpoints) are fast to deploy but lock you into a vendor’s ecosystem. Open-source gateways (Kong, Tyk) give you control but require operational expertise. Decision criteria: For startups, managed gateways accelerate time to market. For large enterprises with specific requirements, open-source gateways provide necessary flexibility.

When to Use (and When Not To)

API Gateway is the right choice when you have multiple microservices and need to provide a unified API to external clients. Specifically, use an API Gateway when:

You have more than 5-10 microservices that clients need to interact with. Below this threshold, clients can call services directly without much complexity. Above it, the orchestration burden becomes significant. If your mobile app needs to make 15 API calls on startup to render the home screen, you need a gateway to aggregate those calls.

You have multiple client types (web, mobile, IoT, third-party) with different needs. The BFF pattern shines here. If your iOS app needs different data than your Android app, and both differ from your web app, separate BFFs let each client get exactly what it needs without over-fetching.

You need to enforce consistent policies across all APIs. If every service must check authentication, enforce rate limits, and log requests, a gateway prevents duplication. Without it, you’re implementing the same logic in 50 services, and when you need to change your auth strategy, you’re updating 50 codebases.

You’re migrating from monolith to microservices. The gateway provides a stable API contract while you decompose services behind it. Clients continue calling /api/users while you split the user service into user-profile, user-auth, and user-preferences services. The gateway hides the internal refactoring.

Anti-patterns to avoid: Don’t use an API Gateway if you have a simple monolith with no plans to decompose—you’re adding complexity for no benefit. Don’t put heavy business logic in the gateway; it should orchestrate, not compute. Don’t use a gateway for internal service-to-service communication; that’s what service meshes are for (see Microservices for internal communication patterns). And don’t use a single gateway for systems with vastly different SLA requirements—your high-availability payment API shouldn’t share a gateway with your best-effort analytics API.

Real-World Examples

Netflix: Zuul API Gateway

How they use it: Netflix built Zuul as their edge service to handle billions of requests daily from 800+ device types (smart TVs, game consoles, mobile apps, web browsers). Zuul routes requests to thousands of backend microservices, handles authentication, performs dynamic request routing based on device capabilities, and implements sophisticated resilience patterns like adaptive retries and request hedging. When a backend service degrades, Zuul automatically routes traffic to healthy instances or returns cached responses.

Interesting detail: Netflix’s Zuul gateway performs real-time A/B testing by routing a percentage of traffic to experimental service versions. They can deploy a new recommendation algorithm to 1% of users through gateway configuration without touching backend services. Zuul also implements chaos engineering at the gateway level, randomly injecting latency or failures to test system resilience. This gateway-level experimentation capability lets Netflix iterate rapidly while maintaining stability for the majority of users.

Stripe: API Gateway for Payment Processing

How they use it: Stripe’s API Gateway provides a stable REST API to millions of developers while the backend has evolved through multiple architectural generations. The gateway handles API versioning (Stripe maintains backwards compatibility for years), rate limiting per API key, request validation, and idempotency key management. It routes payment requests to different backend services based on payment method, geography, and merchant configuration. The gateway also implements sophisticated retry logic for transient failures, ensuring payment requests succeed even when individual services hiccup.

Interesting detail: Stripe’s gateway implements API versioning through transformation rather than maintaining multiple backend versions. When a developer uses an old API version (say, 2019-05-16), the gateway transforms their request to the current internal format, calls modern backend services, then transforms the response back to the old format. This means Stripe can evolve their backend freely while maintaining a stable contract with millions of API clients. The gateway also enforces idempotency: if a client retries a payment request with the same idempotency key, the gateway returns the cached result instead of charging the customer twice.
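The idempotency behaviour described in the Stripe example can be sketched as a small wrapper around a payment handler. The handler and key names are hypothetical, and a real gateway would keep results in a shared store with a TTL rather than a local dict:

```python
class IdempotencyGateway:
    """Replay-protection sketch: the first request with a given idempotency
    key executes the handler and caches its result; any retry with the same
    key returns the cached result instead of charging the customer twice."""

    def __init__(self, handler):
        self.handler = handler  # e.g. the charge-creation backend call
        self.results = {}       # stand-in for a shared store with TTL

    def post(self, idempotency_key: str, payload: dict):
        if idempotency_key in self.results:
            return self.results[idempotency_key]  # replay: no second charge
        response = self.handler(payload)
        self.results[idempotency_key] = response
        return response
```

A production version would also need to handle concurrent retries (lock or mark the key as in-flight) and reject reuse of a key with a different payload.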


Interview Essentials

Mid-Level

At the mid-level, you should be able to explain what an API Gateway is and why it’s useful. Focus on the basics: it’s a single entry point that handles authentication, routing, and aggregation. Be ready to draw a simple diagram showing clients calling the gateway, which then routes to multiple backend services. Explain one or two cross-cutting concerns (like authentication or rate limiting) and why centralizing them in the gateway is better than implementing them in every service. You should understand the latency trade-off: the gateway adds overhead but simplifies client logic. If asked to design a system, propose a gateway when you have multiple microservices and external clients, and explain that it handles auth and routing.

Senior

Senior engineers must justify gateway design decisions with trade-off analysis. When would you use a single gateway vs BFF pattern? Explain that BFF makes sense when clients have significantly different needs—mobile needs aggressive aggregation, web needs real-time data. Discuss the single point of failure concern and mitigation strategies: deploy in multiple AZs, use health checks, implement circuit breakers. Be ready to discuss where to put business logic: simple aggregation in the gateway, complex logic in services. Explain how the gateway enables API versioning and backwards compatibility. You should also understand when NOT to use a gateway: internal service communication should use service mesh patterns, not gateway routing. Discuss performance implications: a gateway adds 5-20ms latency, which matters for high-frequency systems but is negligible for mobile apps. Be prepared to discuss caching strategies at the gateway level and how to handle cache invalidation.

Staff+

Staff+ engineers must design gateway architectures for specific organizational and technical constraints. Discuss the evolution from monolithic gateway to BFF to micro-gateway patterns and when each transition makes sense. Explain how gateway design impacts team autonomy: a centralized gateway can become a bottleneck if every team needs to modify it, while BFFs give teams independence at the cost of operational overhead. Discuss the gateway vs service mesh distinction deeply: gateways handle north-south traffic with business logic, meshes handle east-west traffic with infrastructure concerns, and large systems need both. Be ready to discuss advanced patterns like request hedging (sending duplicate requests to reduce tail latency), adaptive rate limiting (adjusting limits based on backend health), and gateway-level experimentation (A/B testing through routing rules). Explain how to design for multi-region deployments: should each region have its own gateway, or do you route globally? Discuss the organizational implications: who owns the gateway, how do teams deploy routing changes, and how do you prevent the gateway from becoming a distributed monolith? Finally, be ready to discuss observability: the gateway is the perfect place to implement distributed tracing, but you need to ensure trace context propagates to all backend services.

Common Interview Questions

How does an API Gateway differ from a load balancer? (A load balancer distributes traffic across instances based on network-level properties at Layer 4 or simple HTTP attributes at Layer 7; a gateway adds application-layer logic on top, handling authentication, aggregation, and transformation.)

Where do you put authentication: gateway or services? (Gateway authenticates to verify identity, services authorize to check permissions. Gateway validates JWT tokens, services check if the user can access specific resources.)

How do you handle gateway failures? (Deploy in multiple AZs with automatic failover, use health checks, implement aggressive caching so the gateway can serve stale data, and use circuit breakers to prevent cascading failures.)

What’s the difference between API Gateway and Service Mesh? (Gateway handles external client traffic with business logic; service mesh handles internal service-to-service traffic with infrastructure concerns. Use both in large systems.)

How do you prevent the gateway from becoming a bottleneck? (Scale horizontally with multiple gateway instances behind a load balancer, use caching aggressively, implement request coalescing for duplicate requests, and push complex logic to backend services.)

Red Flags to Avoid

Putting heavy business logic in the gateway instead of keeping it focused on cross-cutting concerns and simple aggregation

Not discussing the single point of failure concern and mitigation strategies when proposing a centralized gateway

Confusing API Gateway with load balancer or reverse proxy—gateways have application-layer intelligence

Proposing a gateway for internal service-to-service communication instead of using service mesh patterns

Not considering the latency overhead of the gateway or discussing when that overhead is acceptable vs problematic

Ignoring API versioning and backwards compatibility concerns when designing the gateway’s API contract


Key Takeaways

API Gateway is a single entry point that handles cross-cutting concerns (authentication, rate limiting, logging) and routes requests to backend microservices, simplifying client logic and centralizing operational concerns.

The gateway adds 5-20ms latency overhead but provides massive operational benefits: consistent policy enforcement, simplified client code, and the ability to evolve backend services without breaking clients.

Use the Backends-for-Frontends (BFF) pattern when different client types (web, mobile, third-party) have significantly different needs; use a single gateway when consistency is more important than flexibility.

API Gateway handles north-south traffic (external clients to services) with business logic; Service Mesh handles east-west traffic (service-to-service) with infrastructure concerns. Large systems need both.

The gateway is a single point of failure that must be mitigated through multi-AZ deployment, health checks, aggressive caching, and circuit breakers. Don’t put heavy business logic in the gateway—it should orchestrate, not compute.