Gateway Routing Pattern: Route Requests to Services

intermediate 27 min read Updated 2026-02-11

TL;DR

Gateway Routing consolidates multiple backend services behind a single entry point, intelligently directing requests based on URL paths, headers, or other criteria. Think of it as a smart receptionist who knows exactly which department to send each visitor to. Essential for microservices architectures where you need to expose dozens of services through one unified API endpoint.

Cheat Sheet: Single endpoint → Multiple backends | Route by path/header/method | Simplifies client logic | Enables service versioning | Routing (directs requests to different services) ≠ load balancing (distributes requests across instances of the same service)

The Analogy

Imagine a large hospital with one main entrance. When you arrive, the receptionist asks about your needs and directs you to the right department: cardiology on floor 3, radiology in building B, emergency room through the red doors. You don’t need to know the hospital’s internal layout or which building houses which department. The receptionist (gateway) handles all that complexity, giving you simple directions based on your needs. If the hospital reorganizes departments or adds new wings, you still go to the same main entrance—the receptionist’s routing logic just updates behind the scenes.

Why This Matters in Interviews

Gateway Routing appears in virtually every microservices design discussion. Interviewers use it to assess whether you understand the difference between routing (directing to different services) and load balancing (distributing across instances of the same service). It’s a litmus test for microservices maturity: junior engineers often confuse it with a simple reverse proxy, while senior engineers discuss routing strategies, version migration patterns, and failure isolation. Expect this topic when designing APIs, discussing service mesh architectures, or explaining how clients interact with distributed systems. The depth of your answer reveals whether you’ve actually built production microservices or just read about them.


Core Concept

Gateway Routing is a design pattern where a single gateway component receives all client requests and routes them to appropriate backend services based on request attributes. Unlike a load balancer that distributes requests across multiple instances of the same service, a routing gateway directs requests to entirely different services based on URL paths, HTTP headers, query parameters, or request content.

This pattern emerged as a response to microservices proliferation. When Netflix transitioned from a monolithic architecture to hundreds of microservices, clients couldn’t reasonably maintain connections to every service. The API Gateway pattern (with routing as a core capability) solved this by providing a unified entry point. Clients make one connection to api.netflix.com, and the gateway routes /recommendations to the recommendation service, /playback to the streaming service, and /billing to the payment service.

The routing gateway sits at the edge of your system, acting as the single point of entry for all external traffic. It maintains a routing table that maps request patterns to backend service endpoints. When a request arrives, the gateway evaluates routing rules in order, finds the first match, and forwards the request to the corresponding service. This decouples clients from your internal service topology, allowing you to reorganize, scale, or replace backend services without changing client code.

How It Works

Step 1: Client Request Arrives

A client sends an HTTP request to the gateway’s public endpoint (e.g., https://api.company.com/users/123/orders). The gateway receives this request on its edge network, typically behind a load balancer that distributes traffic across multiple gateway instances for high availability. At this point, the request hasn’t touched any backend service yet.

Step 2: Route Evaluation

The gateway evaluates its routing rules against the incoming request. Rules are typically checked in priority order. For our example, the gateway might have rules like: (1) /users/{id}/orders → orders-service, (2) /users/{id}/profile → user-service, (3) /users/** → user-service (catch-all). The gateway matches the request path against these patterns, extracting path variables like the user ID.
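A minimal sketch of this matching logic in Python — the route table, first-match semantics, and {id}/** syntax mirror the example rules above; real gateways compile patterns ahead of time and support much richer matchers:

```python
# Hypothetical routing table: rules are evaluated in order, first match wins.
ROUTES = [
    ("/users/{id}/orders", "orders-service"),
    ("/users/{id}/profile", "user-service"),
    ("/users/**", "user-service"),  # catch-all
]

def match(pattern, path):
    """Match a route pattern against a path; return extracted vars or None."""
    p_segs = pattern.strip("/").split("/")
    u_segs = path.strip("/").split("/")
    params = {}
    for i, seg in enumerate(p_segs):
        if seg == "**":                  # catch-all: accept the rest of the path
            return params
        if i >= len(u_segs):
            return None
        if seg.startswith("{") and seg.endswith("}"):
            params[seg[1:-1]] = u_segs[i]   # extract path variable
        elif seg != u_segs[i]:
            return None
    return params if len(p_segs) == len(u_segs) else None

def route(path):
    """First-match route resolution, as in Step 2 above."""
    for pattern, service in ROUTES:
        params = match(pattern, path)
        if params is not None:
            return service, params
    return None, {}
```

Rule ordering matters: the /users/** catch-all must come last, or it would shadow the more specific rules above it.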

Step 3: Service Discovery

Once the gateway identifies the target service (orders-service), it needs the actual network address. In dynamic environments, services don’t have fixed IPs. The gateway queries a service registry (like Consul, Eureka, or Kubernetes DNS) to get current healthy instances of orders-service. The registry returns something like [orders-service-1:8080, orders-service-2:8080, orders-service-3:8080].
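Sketched with an in-memory stand-in for the registry — a real gateway would query Consul, Eureka, or Kubernetes DNS over the network, and the instance names here are illustrative:

```python
import random

# Hypothetical registry contents; in practice populated by health checks.
REGISTRY = {
    "orders-service": [
        {"addr": "orders-service-1:8080", "healthy": True},
        {"addr": "orders-service-2:8080", "healthy": False},  # failed health check
        {"addr": "orders-service-3:8080", "healthy": True},
    ],
}

def discover(service):
    """Return addresses of currently healthy instances of a service."""
    return [i["addr"] for i in REGISTRY.get(service, []) if i["healthy"]]

def pick_instance(service):
    """Gateway-side load balancing across the healthy pool."""
    instances = discover(service)
    if not instances:
        raise LookupError(f"no healthy instances of {service}")
    return random.choice(instances)
```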

Step 4: Request Transformation

Before forwarding, the gateway may transform the request. It might add authentication headers, strip sensitive information, rewrite URLs (changing /users/123/orders to /orders?userId=123 if the backend expects a different format), or inject tracing headers for observability. This transformation layer is crucial for maintaining backward compatibility when backend APIs evolve.
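A toy transformation step covering the URL rewrite described above, plus a hypothetical X-Request-Id tracing header and a made-up internal header that gets stripped before forwarding:

```python
import uuid

def transform(request):
    """Rewrite an inbound request into the shape the backend expects.

    Assumed example: the public API exposes /users/{id}/orders, while the
    orders backend expects /orders?userId={id}.
    """
    req = dict(request)
    parts = req["path"].strip("/").split("/")
    if len(parts) == 3 and parts[0] == "users" and parts[2] == "orders":
        req["path"] = "/orders"
        req["query"] = {"userId": parts[1]}   # URL rewrite for the backend
    headers = dict(req.get("headers", {}))
    headers.setdefault("X-Request-Id", str(uuid.uuid4()))  # tracing header
    headers.pop("X-Internal-Secret", None)                 # strip sensitive header
    req["headers"] = headers
    return req
```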

Step 5: Backend Invocation

The gateway forwards the transformed request to one of the healthy orders-service instances (using load balancing to choose which instance). It establishes a connection, sends the request, and waits for a response. The gateway typically sets timeouts here—if the backend doesn’t respond within, say, 5 seconds, the gateway can fail fast rather than hanging indefinitely.
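The fail-fast timeout can be sketched with a thread-pool future and a deadline. Production gateways use non-blocking I/O rather than a thread per request, but the semantics are the same: a backend that blows the deadline becomes a 504, not a hung connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def forward(call_backend, timeout_s=5.0):
    """Invoke the backend and fail fast rather than hanging indefinitely."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_backend)
        try:
            return 200, future.result(timeout=timeout_s)
        except FutureTimeout:
            # Map the timeout to a 504 so the client isn't left waiting.
            return 504, "upstream timed out"

def fast_backend():
    return "order list"

def slow_backend():
    time.sleep(0.2)   # simulates a backend stuck past the deadline
    return "too late"
```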

Step 6: Response Processing

When the backend responds, the gateway receives the response and may transform it before returning to the client. This might include adding CORS headers, removing internal fields, or aggregating responses from multiple services if the gateway implements the Backend for Frontend pattern. The gateway then sends the final response back to the client.

Step 7: Observability and Logging

Throughout this process, the gateway logs metrics: request count, latency, error rates, and which routes are being used. This data is critical for understanding traffic patterns and debugging issues. The gateway also participates in distributed tracing, propagating trace IDs so you can follow a request’s journey through your entire system.

Gateway Routing Request Flow

graph LR
    Client["Client<br/><i>Mobile/Web App</i>"]
    Gateway["API Gateway<br/><i>Routing Layer</i>"]
    Registry["Service Registry<br/><i>Consul/Eureka</i>"]
    UserSvc["User Service<br/><i>Port 8080</i>"]
    OrderSvc["Order Service<br/><i>Port 8081</i>"]
    
    Client --"1. GET /users/123/orders"--> Gateway
    Gateway --"2. Match route pattern"--> Gateway
    Gateway --"3. Query for orders-service"--> Registry
    Registry --"4. Return healthy instances"--> Gateway
    Gateway --"5. Transform & forward request"--> OrderSvc
    OrderSvc --"6. Process & respond"--> Gateway
    Gateway --"7. Transform & return response"--> Client

Complete request lifecycle through a gateway: from client request to route matching, service discovery, request transformation, backend invocation, and response processing. The gateway adds 2-10ms of latency but decouples clients from backend topology.

Key Principles

Principle 1: Single Entry Point

All external traffic flows through one logical endpoint, even if physically distributed across multiple gateway instances. This means clients only need to know one hostname and don’t manage connections to individual services. When Uber’s mobile app makes API calls, it connects to api.uber.com, not to separate endpoints for rides, payments, and maps. This simplification is massive: the mobile app doesn’t need service discovery logic, doesn’t need to handle different authentication schemes per service, and doesn’t break when Uber reorganizes their backend services. The gateway handles all that complexity.

Principle 2: Route by Intent, Not Implementation

Routing rules should reflect business capabilities, not technical implementation details. A route like /orders is better than /order-service-v2-prod-cluster-a. This abstraction lets you change implementations without breaking clients. Stripe’s API routes by resource type (/charges, /customers, /subscriptions), not by which internal service handles each resource. When they refactored their billing system, splitting one service into three, the API routes stayed the same—only the gateway’s routing table changed.

Principle 3: Fail Fast with Fallbacks

When a backend service is unavailable, the gateway should detect this quickly and respond appropriately rather than hanging. Netflix’s API Gateway (Zuul) uses circuit breakers: after a threshold of failures to a service, it stops trying and immediately returns cached data or degraded responses. This prevents cascading failures where slow backends cause gateway threads to exhaust, bringing down the entire system. The gateway monitors backend health and removes unhealthy instances from rotation within seconds.
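The circuit-breaker behavior can be sketched as follows. This is a deliberately minimal version; Hystrix-style breakers add half-open probing, rolling error windows, and per-route statistics.

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, stop calling the backend and
    serve the fallback until `cooldown_s` elapses."""

    def __init__(self, threshold=3, cooldown_s=30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def call(self, backend, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback()        # circuit open: fail fast, skip backend
            self.opened_at = None        # cooldown over: allow a retry
            self.failures = 0
        try:
            result = backend()
            self.failures = 0            # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback()
```

The key property: once the breaker is open, a failing backend costs the gateway nothing—no held connections, no exhausted threads.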

Principle 4: Version Coexistence

The gateway enables running multiple API versions simultaneously by routing based on version indicators. Clients can specify versions via URL paths (/v1/users vs /v2/users), headers (Accept: application/vnd.company.v2+json), or query parameters. Shopify’s API Gateway routes to different backend implementations based on API version, allowing merchants to migrate at their own pace. The gateway might route v1 requests to legacy services while v2 requests go to new microservices, all transparent to clients.
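At its core, header-based version routing reduces to a lookup like this. The cutover date and service names are hypothetical (Stripe's actual version mapping is far richer), but the shape—date-stamped version header in, backend name out—is the same:

```python
# Hypothetical cutover: requests pinned to versions on/after this date go to
# the refactored stack; older pins stay on the legacy implementation.
CUTOVER = "2024-01-01"

def route_by_version(headers):
    version = headers.get("Stripe-Version")
    if version is None:
        return "default-service"        # unpinned clients get latest stable
    # Date-stamped versions compare correctly as ISO-8601 strings.
    return "modern-service" if version >= CUTOVER else "legacy-service"
```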

Principle 5: Observability as a First-Class Concern

Every request through the gateway generates telemetry: logs, metrics, and traces. This isn’t optional—it’s how you understand your system. The gateway is uniquely positioned to provide a complete view of API usage because it sees all traffic. Twitter’s API Gateway logs every request with timing breakdowns (time in gateway, time in backend, time in network), allowing them to identify whether slowness is from their services or network issues. This data drives capacity planning, identifies hot paths, and helps debug production incidents.

API Versioning Through Gateway Routing

graph LR
    Client1["Client A<br/><i>Version: 2023-01-15</i>"]
    Client2["Client B<br/><i>Version: 2024-06-01</i>"]
    Client3["Client C<br/><i>No version header</i>"]
    
    Gateway["API Gateway<br/><i>Version Router</i>"]
    
    LegacySvc["Legacy Service<br/><i>v1 Implementation</i>"]
    ModernSvc["Modern Service<br/><i>v2 Implementation</i>"]
    DefaultSvc["Default Service<br/><i>Latest Stable</i>"]
    
    Client1 --"1. GET /charges<br/>Stripe-Version: 2023-01-15"--> Gateway
    Client2 --"2. GET /charges<br/>Stripe-Version: 2024-06-01"--> Gateway
    Client3 --"3. GET /charges<br/>(no version)"--> Gateway
    
    Gateway --"Route to v1<br/>(old API contract)"--> LegacySvc
    Gateway --"Route to v2<br/>(new API contract)"--> ModernSvc
    Gateway --"Route to latest<br/>(default behavior)"--> DefaultSvc

Header-based routing enables API versioning without breaking existing clients. The gateway routes to different backend implementations based on version headers, allowing Stripe to maintain compatibility with API versions going back years while evolving their platform.


Deep Dive

Types / Variants

Path-Based Routing

The most common routing strategy matches URL paths to services. A rule like /api/users/** → user-service sends all user-related requests to one service. This is straightforward and RESTful, making it easy for developers to understand. Amazon API Gateway uses this extensively: different path prefixes map to different Lambda functions or backend services. The challenge comes with overlapping paths—you need careful rule ordering. Use this when your API naturally divides by resource type and you want simple, predictable routing. Pros: intuitive, easy to debug, works with HTTP caching. Cons: can lead to overly granular services if you create a service for every path prefix, doesn’t handle cross-cutting concerns well.

Header-Based Routing

Routing decisions based on HTTP headers enable sophisticated traffic management. You might route based on X-API-Version: 2 headers, User-Agent for mobile vs web, or custom headers like X-Tenant-ID for multi-tenant systems. Salesforce uses header-based routing to direct enterprise customers to dedicated infrastructure while routing smaller customers to shared infrastructure, all through the same API endpoint. This is powerful for A/B testing: route 5% of traffic (identified by a header) to a new service version. Use this when you need routing logic that isn’t visible in the URL. Pros: flexible, enables gradual rollouts, supports multi-tenancy. Cons: harder to debug (routing logic isn’t in the URL), requires client cooperation to send correct headers.
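A sketch of header-based routing combining a version header with a multi-tenant split, in the spirit of the Salesforce example above. Tenant IDs and service names are made up:

```python
# Hypothetical set of tenants entitled to dedicated infrastructure.
ENTERPRISE_TENANTS = {"acme-corp", "globex"}

def route_by_headers(headers):
    # An explicit API-version header takes precedence over tenant routing.
    if headers.get("X-API-Version") == "2":
        return "service-v2"
    # Multi-tenant split: enterprise tenants get a dedicated cluster.
    if headers.get("X-Tenant-ID") in ENTERPRISE_TENANTS:
        return "dedicated-cluster"
    return "shared-cluster"
```

Note the debugging cost called out above: nothing in the URL reveals which branch a request took, so the gateway should log the matched rule with every request.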

Content-Based Routing

The gateway inspects request bodies or query parameters to make routing decisions. A payment gateway might route credit card transactions to one processor and ACH transactions to another based on the payment_method field in the request body. GraphQL gateways use this: they parse the GraphQL query and route to different services based on which fields are requested. Shopify’s GraphQL gateway routes queries touching product data to the catalog service, while queries touching order data go to the fulfillment service. Use this when routing depends on what the client is asking for, not just where they’re asking. Pros: enables fine-grained routing, works well with GraphQL. Cons: requires parsing request bodies (slower), more complex error handling, harder to cache.
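The payment-method example above reduces to a sketch like this (processor names hypothetical). It also makes the main cost visible: the gateway has to buffer and parse the whole body before it can route at all.

```python
import json

def route_payment(raw_body: bytes) -> str:
    """Pick a processor from the payment_method field in the request body."""
    body = json.loads(raw_body)   # content-based routing must parse the payload
    targets = {"card": "card-processor", "ach": "ach-processor"}
    method = body.get("payment_method")
    if method not in targets:
        raise ValueError(f"unroutable payment_method: {method!r}")
    return targets[method]
```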

Weighted Routing (Canary Releases)

Distribute traffic between service versions based on percentages. Route 95% of traffic to the stable version and 5% to the canary version. This isn’t random load balancing—it’s deliberately sending a subset of production traffic to a new version to validate it before full rollout. Google uses this extensively: when deploying a new version of a service, they gradually shift traffic from 1% to 5% to 25% to 100% over hours or days, monitoring error rates at each step. Use this for risk mitigation during deployments. Pros: safe deployments, easy rollback, real production validation. Cons: requires sophisticated monitoring to detect issues in small traffic percentages, can complicate debugging when two versions are live.
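A sticky weighted-routing sketch: hashing the user ID into a fixed bucket means the same user hits the same version on every request, rather than flipping randomly between stable and canary mid-session.

```python
import hashlib

def weighted_route(user_id: str, canary_percent: int = 5) -> str:
    """Assign a user to 'canary' or 'stable' deterministically.

    SHA-256 spreads IDs roughly uniformly over buckets 0-99, so
    canary_percent of users land on the canary—consistently.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Rollback is then just setting canary_percent to 0; ramping up is raising it, with existing canary users staying on the canary as the bucket range grows.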

Geographic Routing

Route requests to different backend regions based on client location. A request from Europe goes to EU data centers, while a request from Asia goes to Asian data centers. Cloudflare’s Workers and AWS API Gateway support this natively. Netflix uses geographic routing to direct users to the nearest CDN and streaming service, reducing latency. Use this for compliance (GDPR data residency), performance (lower latency), or cost (avoiding cross-region data transfer). Pros: lower latency, regulatory compliance, cost optimization. Cons: requires multi-region infrastructure, complicates data consistency, harder to debug cross-region issues.
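Structurally, geographic routing is a region-to-origin lookup with a fallback. Real gateways resolve the client's region from GeoIP or the edge PoP that received the request, not from a parameter; the hostnames below are hypothetical:

```python
# Hypothetical region → origin mapping.
REGION_BACKENDS = {
    "EU": "https://eu.api.example.internal",
    "APAC": "https://ap.api.example.internal",
    "US": "https://us.api.example.internal",
}

def route_by_region(client_region: str) -> str:
    # Unknown or unmapped regions fall back to the US origin.
    return REGION_BACKENDS.get(client_region, REGION_BACKENDS["US"])
```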

Canary Release with Weighted Routing

graph TB
    subgraph Traffic Distribution
        Gateway["API Gateway<br/><i>Weighted Router</i>"]
    end
    
    subgraph Stable Version - 95%
        Stable1["Instance 1<br/><i>v1.5.0</i>"]
        Stable2["Instance 2<br/><i>v1.5.0</i>"]
        Stable3["Instance 3<br/><i>v1.5.0</i>"]
        StableLB["Load Balancer<br/><i>Stable Pool</i>"]
    end
    
    subgraph Canary Version - 5%
        Canary1["Instance 1<br/><i>v1.6.0-canary</i>"]
        CanaryLB["Load Balancer<br/><i>Canary Pool</i>"]
    end
    
    subgraph Monitoring
        Metrics["Metrics Dashboard<br/><i>Error Rate: 0.3%<br/>P99 Latency: 145ms</i>"]
    end
    
    Gateway --"95% of traffic<br/>(hash-based routing)"--> StableLB
    Gateway --"5% of traffic<br/>(same users consistently)"--> CanaryLB
    
    StableLB --> Stable1
    StableLB --> Stable2
    StableLB --> Stable3
    
    CanaryLB --> Canary1
    
    Stable1 & Stable2 & Stable3 -."emit metrics".-> Metrics
    Canary1 -."emit metrics".-> Metrics

Weighted routing enables safe canary releases by directing a small percentage of production traffic to new versions. The gateway uses consistent hashing to ensure the same users always hit the same version, preventing confusion. If canary metrics degrade, instantly roll back by routing 100% to stable.

Trade-offs

Centralized vs Distributed Routing

Centralized routing uses a dedicated gateway cluster (like Kong, Apigee, or AWS API Gateway) that all traffic flows through. This provides a single control point for policies, monitoring, and routing logic. Distributed routing embeds routing logic in each service using a service mesh (like Istio or Linkerd), where sidecars handle routing decisions locally. Centralized is simpler to reason about and easier to secure (one place to enforce authentication), but creates a single point of failure and potential bottleneck. Distributed routing scales better and eliminates the gateway bottleneck, but makes debugging harder (routing logic is scattered) and requires more sophisticated infrastructure. Choose centralized when you’re starting out or have moderate traffic (< 100K RPS). Choose distributed when you need extreme scale, have deep Kubernetes expertise, or want fine-grained per-service routing control. Spotify migrated from centralized (Nginx) to distributed (Envoy service mesh) as they scaled to millions of requests per second.

Static vs Dynamic Routing Rules

Static routing rules are configured at deployment time (in config files or infrastructure as code) and require redeployment to change. Dynamic routing rules can be updated at runtime through an admin API or control plane. Static rules are simpler, more predictable, and easier to version control—you know exactly what’s running. Dynamic rules enable rapid experimentation and emergency routing changes without deployment. LinkedIn uses static routing for core API paths but dynamic routing for A/B tests, allowing product teams to shift traffic without involving infrastructure teams. Choose static for stable, well-understood routes. Choose dynamic when you need frequent routing changes, run many experiments, or want self-service routing for product teams. The risk with dynamic routing is accidental misconfiguration taking down production—implement strong validation and rollback mechanisms.

Synchronous vs Asynchronous Routing

Synchronous routing waits for the backend service to respond before returning to the client. This is standard HTTP request-response. Asynchronous routing accepts the request, immediately returns an acknowledgment, and processes the request in the background. The gateway might publish the request to a message queue and return a job ID. Synchronous is simpler and matches how most APIs work, but ties up gateway resources while waiting for slow backends. Asynchronous decouples the gateway from backend latency, enabling better throughput, but complicates client code (clients must poll for results) and makes error handling harder. Stripe uses synchronous routing for most API calls but asynchronous for long-running operations like generating reports—the API returns a report ID immediately, and clients poll a separate endpoint for completion. Choose synchronous for fast operations (< 1 second) where clients expect immediate results. Choose asynchronous for slow operations, high-throughput batch processing, or when backend availability is unreliable.

Smart Gateway vs Dumb Gateway

A smart gateway implements business logic: request validation, response aggregation, data transformation, and caching. It’s a Backend for Frontend (BFF) that tailors responses to client needs. A dumb gateway is a thin routing layer that forwards requests with minimal processing. Smart gateways reduce client complexity and backend load (by caching and aggregating), but become a bottleneck for changes—every new feature requires gateway updates. Dumb gateways are simpler, more maintainable, and push logic to where it belongs (in services), but require more sophisticated clients. Amazon’s API Gateway is relatively dumb—it routes and enforces policies but doesn’t aggregate or transform much. Netflix’s Zuul is smarter—it aggregates data from multiple services into single responses for mobile clients. Choose smart gateways when you control the clients (mobile apps) and need to optimize for their constraints. Choose dumb gateways for public APIs where clients are diverse and you want to keep the gateway simple.

Centralized Gateway vs Service Mesh Routing

graph TB
    subgraph Centralized Gateway Architecture
        Client1["Client"]
        CentralGW["Central Gateway<br/><i>Single routing point</i>"]
        Svc1["Service A"]
        Svc2["Service B"]
        Svc3["Service C"]
        
        Client1 --> CentralGW
        CentralGW --> Svc1
        CentralGW --> Svc2
        CentralGW --> Svc3
        Svc1 -."direct call".-> Svc2
    end
    
    subgraph Service Mesh Architecture
        Client2["Client"]
        EdgeProxy["Edge Proxy<br/><i>Ingress</i>"]
        
        subgraph Service A Pod
            SvcA["Service A"]
            ProxyA["Sidecar<br/><i>Envoy</i>"]
        end
        
        subgraph Service B Pod
            SvcB["Service B"]
            ProxyB["Sidecar<br/><i>Envoy</i>"]
        end
        
        subgraph Service C Pod
            SvcC["Service C"]
            ProxyC["Sidecar<br/><i>Envoy</i>"]
        end
        
        Client2 --> EdgeProxy
        EdgeProxy --> ProxyA
        ProxyA --> SvcA
        SvcA --> ProxyA
        ProxyA --> ProxyB
        ProxyB --> SvcB
    end

Centralized gateways provide a single control point but can become bottlenecks. Service mesh distributes routing to sidecars, eliminating the central bottleneck and enabling fine-grained per-service routing. Choose centralized for simplicity (< 100K RPS), distributed for extreme scale and when you have Kubernetes expertise.


Math & Calculations

Capacity Planning for Gateway Routing

Gateway capacity depends on request rate, routing complexity, and backend latency. Let’s calculate requirements for a realistic scenario.

Given:

  • Target: 50,000 requests per second (RPS) peak traffic
  • Average routing decision time: 2ms (includes rule evaluation and service discovery)
  • Average backend response time: 100ms
  • Gateway timeout: 5 seconds
  • Desired CPU utilization: 70% (leaving headroom for spikes)

Step 1: Calculate Concurrent Connections

Each request occupies a connection for the duration of the backend call:

Concurrent connections = RPS × (routing_time + backend_time)
Concurrent connections = 50,000 × (0.002 + 0.100)
Concurrent connections = 50,000 × 0.102 = 5,100 connections

With a 5-second timeout as the worst case:

Max concurrent (if all requests timeout) = 50,000 × 5 = 250,000 connections

You need to provision for the worst case to avoid connection exhaustion.

Step 2: Calculate Gateway Instances

Assuming each gateway instance can handle 10,000 concurrent connections and process 5,000 RPS at 70% CPU:

Instances needed (for RPS) = 50,000 / 5,000 = 10 instances
Instances needed (for connections) = 250,000 / 10,000 = 25 instances

You need 25 instances to handle worst-case connection load. Add 30% overhead for failures and deployments:

Total instances = 25 × 1.3 = 33 instances

Step 3: Calculate Network Bandwidth

Assuming average request size of 2KB and response size of 10KB:

Ingress bandwidth = 50,000 RPS × 2KB = 100 MB/s = 800 Mbps
Egress bandwidth = 50,000 RPS × 10KB = 500 MB/s = 4 Gbps

You need 4 Gbps egress capacity. With 33 instances:

Per-instance egress = 4 Gbps / 33 = 121 Mbps per instance

This is well within typical instance network limits (1-10 Gbps).

Step 4: Calculate Latency Impact

The gateway adds latency to every request. With 2ms routing time:

Total latency = routing_time + backend_time + network_time
Total latency = 2ms + 100ms + 5ms (network) = 107ms

The gateway adds ~2% overhead. At 99th percentile, if routing takes 10ms:

P99 latency = 10ms + 200ms (backend P99) + 5ms = 215ms
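The four steps above can be reproduced in a few lines, which makes it easy to re-run the plan with different inputs (traffic doubles, timeout drops to 2 seconds, and so on):

```python
import math

# Inputs from the scenario above.
rps = 50_000
routing_s, backend_s = 0.002, 0.100
timeout_s = 5.0
per_instance_rps = 5_000        # per instance, at 70% CPU
per_instance_conns = 10_000
resp_kb = 10

# Step 1: concurrency (steady state vs worst case at the timeout).
steady_conns = rps * (routing_s + backend_s)               # ≈ 5,100
worst_conns = rps * timeout_s                              # 250,000

# Step 2: instance count, sized for the binding constraint plus 30% headroom.
by_rps = math.ceil(rps / per_instance_rps)                 # 10
by_conns = math.ceil(worst_conns / per_instance_conns)     # 25
instances = math.ceil(max(by_rps, by_conns) * 1.3)         # 33

# Step 3: bandwidth (KB/s × 8 → kbps; divide by 1e6 → Gbps).
egress_gbps = rps * resp_kb * 8 / 1_000_000                # 4.0
per_instance_mbps = egress_gbps * 1000 / instances         # ≈ 121

# Step 4: gateway routing time stacks on top of backend + network latency.
total_latency_ms = 2 + 100 + 5                             # 107
```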

Real-World Example: Uber’s API Gateway handles ~1M RPS at peak. With similar assumptions (100ms backend latency), steady-state concurrency is roughly 1M × 0.102s ≈ 100K connections, and provisioning for the worst case against a 5-second timeout pushes that toward ~5M. They run thousands of gateway instances across multiple regions, with each instance handling ~50K connections. Their routing logic is highly optimized (< 1ms) because even small increases multiply across millions of requests.


Real-World Examples

Netflix: Zuul API Gateway

Netflix built Zuul, one of the first large-scale API gateways, to handle routing for their microservices architecture. When a Netflix client (web, mobile, TV app) makes a request, it hits Zuul, which routes to over 500 backend services. Zuul uses path-based routing (/api/recommendations → recommendation service, /api/playback → streaming service) combined with dynamic filters that can modify requests and responses. The interesting detail: Zuul implements dynamic routing rules that can be updated without redeployment. During incidents, Netflix engineers can instantly reroute traffic away from failing services or redirect to cached responses. Zuul also handles A/B testing—it can route a percentage of users to experimental versions of services based on user IDs. At peak, Zuul handles millions of requests per second across multiple AWS regions. Netflix open-sourced Zuul, and it became the foundation for Spring Cloud Gateway. The key lesson: Netflix treats the gateway as critical infrastructure, running it with the same reliability standards as their streaming service itself.

Stripe: API Versioning Through Routing

Stripe’s API Gateway enables seamless API versioning, allowing them to evolve their API without breaking existing integrations. When a request arrives at api.stripe.com, the gateway examines the Stripe-Version header (e.g., 2023-10-16) and routes to the appropriate backend implementation. Older versions might route to legacy services, while newer versions route to refactored microservices. This is more sophisticated than simple path-based versioning—the same endpoint /v1/charges can route to different backends based on the version header. Stripe maintains compatibility with API versions going back years, supporting thousands of merchants who haven’t upgraded. The gateway handles request/response transformations to maintain backward compatibility: it might translate new field names to old ones for legacy clients. The interesting detail: Stripe’s gateway logs which API versions are being used, helping them identify when it’s safe to deprecate old versions. They’ve publicly stated that less than 1% of traffic uses versions older than 2 years, informing their deprecation policy. This demonstrates how gateway routing enables API evolution at scale.

Shopify: GraphQL Gateway with Schema Stitching

Shopify’s API Gateway routes GraphQL queries to multiple backend services based on the fields requested. When a merchant’s app queries for product and order data in a single GraphQL request, the gateway parses the query, identifies that it needs data from both the catalog service and fulfillment service, routes sub-queries to each service, and stitches the responses together. This is content-based routing at its most sophisticated. The gateway maintains a unified GraphQL schema that’s composed from individual service schemas. When a new service is deployed with new fields, the gateway automatically updates its routing logic. The interesting detail: Shopify’s gateway implements intelligent batching—if multiple clients request the same product data within a 10ms window, it batches those requests into a single backend call, dramatically reducing load. They’ve published that this batching reduces backend queries by 60% during peak traffic. The gateway also implements field-level caching: frequently requested fields (like product titles) are cached at the gateway layer, while dynamic fields (like inventory counts) always hit the backend. This hybrid approach balances freshness with performance, and it’s only possible because the gateway understands the semantic meaning of each field in the GraphQL schema.

Shopify GraphQL Gateway with Schema Stitching

graph LR
    Client["Merchant App<br/><i>GraphQL Client</i>"]
    Gateway["GraphQL Gateway<br/><i>Schema Stitching</i>"]
    
    subgraph Query Parsing
        Parser["Query Parser<br/><i>Extract fields</i>"]
    end
    
    subgraph Backend Services
        CatalogSvc["Catalog Service<br/><i>Product schema</i>"]
        FulfillSvc["Fulfillment Service<br/><i>Order schema</i>"]
        InventorySvc["Inventory Service<br/><i>Stock schema</i>"]
    end
    
    subgraph Response Processing
        Stitcher["Response Stitcher<br/><i>Merge results</i>"]
        Cache["Field Cache<br/><i>Product titles cached</i>"]
    end
    
    Client --"1. GraphQL Query<br/>{product, orders, inventory}"--> Gateway
    Gateway --> Parser
    Parser --"2. Identify required services"--> Gateway
    
    Gateway --"3a. Query products"--> CatalogSvc
    Gateway --"3b. Query orders"--> FulfillSvc
    Gateway --"3c. Query stock"--> InventorySvc
    
    CatalogSvc --"4a. Product data"--> Stitcher
    FulfillSvc --"4b. Order data"--> Stitcher
    InventorySvc --"4c. Stock data"--> Stitcher
    
    Cache -."Cache hit for<br/>frequently requested fields".-> Stitcher
    
    Stitcher --"5. Unified response"--> Client

Shopify’s GraphQL gateway demonstrates sophisticated content-based routing: it parses queries, routes sub-queries to appropriate services, batches concurrent requests, and stitches responses. Field-level caching reduces backend load by 60% during peak traffic while maintaining data freshness for dynamic fields.


Interview Expectations

Mid-Level

What You Should Know: Explain the difference between routing and load balancing clearly. Describe path-based routing with concrete examples (e.g., /users goes to user-service, /orders goes to order-service). Discuss why a gateway is useful: it decouples clients from backend topology, provides a single entry point, and simplifies client code. Understand basic routing strategies (path, header, query parameter) and when to use each. Be able to draw a simple architecture diagram showing clients, gateway, and multiple backend services. Discuss common gateway features beyond routing: authentication, rate limiting, and logging.

Bonus Points: Mention service discovery and how the gateway finds backend services dynamically. Discuss failure handling: what happens when a backend service is down? Describe circuit breakers at a high level. Talk about API versioning through routing (e.g., /v1/users vs /v2/users). Mention real-world gateways like Kong, Nginx, or AWS API Gateway. Discuss the gateway as a potential single point of failure and how to mitigate it (multiple instances, health checks).

Senior

What You Should Know: Design a complete gateway routing system with multiple routing strategies. Discuss tradeoffs between centralized gateways and service mesh approaches. Explain how to handle routing during deployments (blue-green, canary releases) and how the gateway enables these patterns. Discuss performance implications: gateway latency, connection pooling, and when the gateway becomes a bottleneck. Describe how to implement weighted routing for gradual rollouts. Explain the relationship between gateway routing and other patterns: Backend for Frontend, API Composition, and Strangler Fig (for migrating from monoliths). Discuss observability: what metrics matter (latency, error rates per route), how to implement distributed tracing through the gateway.

Bonus Points: Discuss dynamic routing rules and the tradeoffs vs static configuration. Explain how to implement request transformation and response aggregation at the gateway layer. Describe multi-region routing strategies and how to handle data residency requirements. Talk about security implications: the gateway as a security boundary, how to prevent routing-based attacks. Discuss real production incidents related to gateway routing (e.g., misconfigured routes taking down production) and how to prevent them. Mention advanced patterns like GraphQL federation or gRPC routing. Discuss capacity planning: how many gateway instances do you need for X requests per second?

Staff+

What You Should Know: Architect an enterprise-grade gateway routing system that handles millions of requests per second across multiple regions. Discuss the evolution from centralized gateways to service mesh and when to make that transition. Design routing strategies for complex scenarios: multi-tenant systems, gradual migrations from monoliths, hybrid cloud deployments. Explain how to build routing logic that’s both flexible (for rapid experimentation) and safe (preventing misconfigurations). Discuss the organizational implications: who owns routing rules, how to enable self-service routing for product teams while maintaining safety. Design observability systems that provide actionable insights from gateway telemetry.

Distinguishing Signals: Discuss the economics of gateway routing: cost of running gateway infrastructure vs cost of client complexity. Explain how gateway routing decisions impact system evolvability—how do routing patterns make it easier or harder to refactor services? Describe how you’d build a routing system that supports both synchronous and asynchronous patterns. Discuss failure modes unique to gateways: thundering herd problems, routing loops, and cascading failures. Explain how to implement progressive delivery (feature flags, A/B tests, canary releases) through routing. Describe how you’ve debugged production issues involving gateway routing—what tools and techniques are essential? Discuss the future: how does gateway routing evolve with technologies like WebAssembly, edge computing, and serverless? Share specific metrics from production systems: what percentage of latency comes from the gateway, how routing decisions impact cache hit rates, how to measure the ROI of gateway investments.

Common Interview Questions

Question 1: How would you design a gateway routing system for a company transitioning from a monolith to microservices?

60-second answer: Use the Strangler Fig pattern. Deploy a gateway in front of the monolith. Initially, route all traffic to the monolith. As you extract services, add routing rules to direct specific paths to new services (e.g., /api/users → new user-service) while everything else still goes to the monolith. This allows gradual migration without a big-bang rewrite. The gateway provides a stable API contract to clients while the backend evolves.

2-minute answer: Start with a reverse proxy (Nginx or AWS ALB) in front of the monolith—this establishes the routing layer without changing functionality. Define your target microservices architecture and identify which services to extract first (choose low-risk, well-bounded domains). For each extracted service, add a routing rule: path-based routing works well here (/api/orders/** → orders-service). Use weighted routing to gradually shift traffic: start with 5% of traffic to the new service while 95% still goes to the monolith, monitoring error rates and latency. If metrics look good, increase to 25%, then 50%, then 100%. The gateway handles request transformation—the new service might have a different API contract than the monolith, so the gateway translates requests/responses to maintain backward compatibility. Implement feature flags at the gateway level so you can instantly roll back to the monolith if issues arise. As you extract more services, the monolith shrinks until eventually, all routes point to microservices and you can decommission the monolith. Key success factors: comprehensive monitoring (compare monolith vs microservice metrics), automated rollback mechanisms, and clear ownership of routing rules.
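The migration logic above—extracted prefixes go to new services with a gradually increasing traffic weight, everything else falls through to the monolith—can be sketched like this. All service names and weights are illustrative; the stable hash keeps each user on the same backend between requests.

```python
import hashlib

# Strangler-fig routing sketch: extracted path prefixes go to new
# services, everything else falls through to the monolith. The weight
# (0-100) shifts traffic gradually.
RULES = [
    ("/api/users", "user-service", 100),   # migration complete
    ("/api/orders", "orders-service", 5),  # 5% canary traffic
]
MONOLITH = "monolith"

def bucket(user_id: str) -> int:
    """Deterministic 0-99 bucket for a user (stable across instances)."""
    return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100

def route(path: str, user_id: str) -> str:
    for prefix, service, weight in RULES:
        if path.startswith(prefix):
            return service if bucket(user_id) < weight else MONOLITH
    return MONOLITH
```

Raising a rule's weight from 5 toward 100 shifts traffic without touching clients, and setting it back to 0 is an instant rollback to the monolith.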

Red flags: Saying you’d rewrite everything at once (big-bang migrations rarely work). Not mentioning backward compatibility or how to handle API differences. Ignoring the need for gradual rollout and monitoring. Suggesting you’d change client code to point directly to new services (defeats the purpose of the gateway).

Question 2: Your gateway is becoming a bottleneck at 100K RPS. How do you scale it?

60-second answer: First, identify the bottleneck: is it CPU (routing logic), memory (connection state), or network bandwidth? Profile the gateway to find hot paths. Optimize routing logic—cache service discovery results, use efficient data structures for route matching. Scale horizontally: add more gateway instances behind a load balancer. Consider moving to a service mesh (Istio, Linkerd) where routing happens at the sidecar level, eliminating the centralized bottleneck. Implement caching at the gateway for frequently requested data.

2-minute answer: Start with observability: instrument the gateway to measure CPU, memory, network I/O, and latency per route. Identify which routes consume the most resources. Common bottlenecks: (1) Service discovery—if the gateway queries a service registry for every request, cache those results with a short TTL (5-10 seconds). (2) Route matching—if you have thousands of routes, optimize the matching algorithm (use trie data structures instead of linear regex matching). (3) Connection management—ensure connection pooling to backend services; establishing new connections for every request is expensive. (4) Logging—if you’re logging every request synchronously, switch to asynchronous logging or sampling (log 1% of requests). For horizontal scaling, deploy multiple gateway instances (10-50 depending on instance size) behind a Layer 4 load balancer. Ensure session affinity isn’t required—the gateway should be stateless. If you’re still bottlenecked, consider architectural changes: move to a service mesh where each service has a sidecar proxy handling routing, eliminating the centralized gateway. Alternatively, implement edge routing: deploy lightweight gateways in multiple regions close to users, reducing latency and distributing load. For extreme scale (> 1M RPS), consider managed edge platforms such as AWS API Gateway or Cloudflare Workers that run on globally distributed infrastructure. Finally, question whether all traffic needs to go through the gateway—can some service-to-service traffic bypass it? Internal services might communicate directly, reserving the gateway for external traffic only.

Red flags: Immediately suggesting a service mesh without understanding the actual bottleneck (premature optimization). Not mentioning profiling or metrics—you can’t fix what you don’t measure. Suggesting vertical scaling (bigger instances) as the primary solution—gateways need horizontal scalability. Ignoring the possibility that backend services, not the gateway, are the real bottleneck.

Question 3: How do you handle routing for API versioning?

60-second answer: Three main approaches: (1) Path-based: /v1/users vs /v2/users—simple but clutters URLs. (2) Header-based: clients send Accept: application/vnd.company.v2+json—cleaner URLs but requires client sophistication. (3) Query parameter: /users?version=2—easy for testing but not RESTful. I prefer header-based for public APIs (follows REST principles) and path-based for internal APIs (easier to debug). The gateway routes based on the version indicator to different backend implementations, allowing multiple versions to coexist.

2-minute answer: API versioning through routing enables backward compatibility while evolving your API. Path-based versioning (/v1/resource vs /v2/resource) is the most common because it’s explicit and easy to understand. The gateway routes each version to potentially different backend services—v1 might go to a legacy service while v2 goes to a refactored microservice. Header-based versioning uses custom headers (Stripe-Version: 2023-10-16, which is how Stripe versions its API) or content negotiation (Accept: application/vnd.github.v3+json, GitHub’s long-standing approach). This keeps URLs clean and follows REST principles, but requires clients to send correct headers. The gateway inspects headers and routes accordingly. Query parameter versioning (/users?api_version=2) is the least common but easiest for ad-hoc testing. Beyond routing, the gateway often handles request/response transformation to maintain compatibility. When a v1 client calls a v2 backend, the gateway translates field names, adds default values for new required fields, and removes fields that didn’t exist in v1. This transformation logic can get complex—document it carefully and test thoroughly. Implement version sunset policies: the gateway logs which versions are being used, helping you identify when it’s safe to deprecate old versions. Set deprecation headers (Deprecation: true, plus a Sunset header carrying the planned removal date as an HTTP date) in responses for old versions. For gradual migration, use weighted routing: route 90% of v1 traffic to the old implementation and 10% to the new implementation (with transformation), validating that transformation logic works before forcing all v1 clients to the new backend. Key principle: never break existing clients—the gateway should make API evolution transparent to clients who don’t want to upgrade.
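The three version-extraction strategies—path, header, and query parameter—can be sketched together. Backend names are illustrative, and the header formats follow the examples in this answer.

```python
# Version routing sketch: try path, then Accept header, then query
# parameter, falling back to the oldest supported version.
BACKENDS = {1: "legacy-service", 2: "users-v2-service"}

def extract_version(path, headers, query):
    # 1. Path-based: /v2/users
    parts = path.strip("/").split("/")
    if parts and parts[0].startswith("v") and parts[0][1:].isdigit():
        return int(parts[0][1:])
    # 2. Header-based content negotiation: application/vnd.company.v2+json
    accept = headers.get("Accept", "")
    if ".v" in accept:
        ver = accept.split(".v")[-1].split("+")[0]
        if ver.isdigit():
            return int(ver)
    # 3. Query parameter: /users?api_version=2
    if query.get("api_version", "").isdigit():
        return int(query["api_version"])
    return 1  # default: oldest supported version

def route_version(path, headers, query):
    return BACKENDS.get(extract_version(path, headers, query), "legacy-service")
```

A production gateway would also reject unknown versions explicitly rather than silently falling back, and would attach the resolved version to the request for logging and sunset analysis.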

Red flags: Saying you’d force all clients to upgrade simultaneously (breaks backward compatibility). Not mentioning transformation logic—different versions often need different data formats. Suggesting you’d maintain separate codebases for each version indefinitely (technical debt accumulates). Not discussing deprecation strategy or how to sunset old versions.

Question 4: What happens when a backend service is down? How should the gateway handle it?

60-second answer: The gateway should fail fast rather than hanging. Implement health checks: the gateway periodically pings backend services and removes unhealthy instances from rotation. Use circuit breakers: after a threshold of failures (e.g., 5 consecutive errors), stop sending traffic to that service for a cooldown period (30-60 seconds), then try again. Return meaningful errors to clients: 503 Service Unavailable with a retry-after header. For critical paths, implement fallbacks: return cached data or degraded responses rather than hard failures.

2-minute answer: Gateway failure handling is critical for system reliability. First layer: health checks. The gateway actively probes backend services (HTTP GET to /health endpoint) every 5-10 seconds. If a service fails health checks, remove it from the routing pool immediately—don’t wait for requests to fail. Second layer: timeouts. Set aggressive timeouts (1-5 seconds depending on the operation) so the gateway doesn’t wait indefinitely for unresponsive services. When a timeout occurs, return an error to the client quickly rather than tying up gateway resources. Third layer: circuit breakers. Track error rates per backend service. If errors exceed a threshold (e.g., 50% error rate over 10 seconds), open the circuit—stop sending traffic to that service entirely for a cooldown period. This prevents cascading failures where the gateway exhausts its connection pool trying to reach a dead service. After the cooldown (30-60 seconds), try a single request (half-open state). If it succeeds, close the circuit and resume normal traffic. If it fails, open the circuit again. Fourth layer: fallbacks. For read operations, return cached data if the backend is down—stale data is better than no data. For write operations, queue the request for later processing or return a 503 with a Retry-After header. Fifth layer: load shedding. If the gateway itself is overloaded, reject requests early (return 429 Too Many Requests) rather than accepting them and timing out. Implement this at the edge before expensive routing logic runs. Observability is crucial: emit metrics for circuit breaker state changes, timeout rates, and fallback usage. Alert when circuit breakers open—this indicates a service is down. Netflix’s Hystrix library pioneered many of these patterns, and they’re now standard in gateways like Envoy and Spring Cloud Gateway.
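The closed/open/half-open cycle described above can be sketched as a small state machine. Thresholds are illustrative; a production breaker (as in Envoy or resilience4j) would also track error rates over a window and limit half-open probes.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed -> open after repeated failures,
    half-open after a cooldown, closed again on a successful probe."""

    def __init__(self, failure_threshold=5, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: allow a probe request through
        return False     # open: fail fast without touching the backend

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open the circuit
```

The gateway keeps one breaker per backend service, checks allow_request before forwarding, and returns 503 immediately when the circuit is open.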

Red flags: Not mentioning timeouts (hanging requests are a common failure mode). Suggesting the gateway should retry indefinitely (causes cascading failures). Not discussing circuit breakers or how to prevent thundering herd when a service recovers. Saying you’d return 500 errors without context (clients need actionable error messages).

Question 5: How do you implement canary releases using gateway routing?

60-second answer: Deploy the new version alongside the old version. Configure the gateway to route a small percentage of traffic (5%) to the canary version while 95% goes to the stable version. Monitor error rates, latency, and business metrics for the canary. If metrics are healthy, gradually increase traffic to the canary (10%, 25%, 50%, 100%) over hours or days. If metrics degrade, instantly roll back by routing 100% to the stable version. Use consistent routing (same user always goes to the same version) to avoid confusing users with inconsistent behavior.

2-minute answer: Canary releases use weighted routing to validate new versions with real production traffic before full rollout. Step 1: Deploy the canary version to a small subset of infrastructure (e.g., 2 instances) while keeping the stable version running (e.g., 18 instances). Step 2: Configure the gateway to route 5% of traffic to the canary based on a consistent hash of user ID or session ID—this ensures the same user always hits the same version, preventing confusion from inconsistent behavior. Step 3: Define success criteria: error rate < 1%, P99 latency < 200ms, key business metrics (conversion rate, etc.) within 5% of stable version. Monitor these metrics in real-time using dashboards. Step 4: If metrics are healthy after 30 minutes, increase canary traffic to 10%, then 25%, then 50%, monitoring at each step. If metrics degrade at any point, instantly roll back by routing 100% to stable. Step 5: Once canary reaches 100% and has been stable for several hours, decommission the old version. Implementation details: the gateway needs a routing rule like if hash(user_id) % 100 < canary_percentage then route_to_canary else route_to_stable. Use feature flags to control the canary percentage—this allows instant rollback without redeploying the gateway. Log which version served each request for debugging. For critical services, implement automated rollback: if error rates spike, automatically set canary percentage to 0. Google uses this extensively: they’ve published that canary releases catch 90% of bugs before they impact all users. The key is patience—resist the urge to rush to 100%. Many incidents happen because teams increased canary traffic too quickly without sufficient monitoring time.
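The ramp-and-rollback schedule above can be sketched as a single function: roll back to 0% the moment success criteria are violated, otherwise advance to the next step. The thresholds and the 5 → 10 → 25 → 50 → 100 sequence are the illustrative values from this answer.

```python
# Canary ramp/rollback sketch driven by the success criteria above.
STEPS = [5, 10, 25, 50, 100]

def next_canary_percentage(current, error_rate, p99_latency_ms):
    """Return the next traffic percentage for the canary, or 0 to
    trigger an instant rollback when success criteria are violated."""
    if error_rate > 0.01 or p99_latency_ms > 200:
        return 0  # roll everything back to stable immediately
    for step in STEPS:
        if step > current:
            return step
    return 100  # already fully rolled out
```

In practice this runs on a timer (e.g., every 30 minutes) against real-time metrics, and the resulting percentage feeds the gateway's weighted-routing rule via a feature flag, so rollback never requires a redeploy.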

Red flags: Not mentioning consistent routing (random routing causes users to see inconsistent behavior). Suggesting you’d route based on time of day or other non-user attributes (makes debugging impossible). Not discussing rollback strategy or success criteria. Saying you’d go from 5% to 100% in one step (defeats the purpose of gradual rollout).

Red Flags to Avoid

Red Flag 1: “Gateway routing and load balancing are the same thing.”

Why it’s wrong: This reveals a fundamental misunderstanding. Load balancing distributes traffic across multiple instances of the same service (horizontal scaling). Gateway routing directs traffic to different services based on request attributes. A load balancer might distribute requests across 10 instances of the user-service, while a gateway routes /users to user-service and /orders to order-service. They solve different problems and often work together: the gateway routes to a service, then a load balancer distributes across instances of that service.

What to say instead: “Gateway routing and load balancing are complementary. The gateway routes requests to the appropriate service based on URL path, headers, or content, while load balancers distribute requests across multiple instances of each service. In a typical setup, the gateway sits in front of load balancers—it routes /users to the user-service load balancer, which then distributes across user-service instances. Some gateways include load balancing functionality, but conceptually they’re distinct concerns.”

Red Flag 2: “Just use Nginx as a reverse proxy—you don’t need a fancy API gateway.”

Why it’s wrong: While Nginx can do basic routing, dismissing API gateways misses their value for complex systems. API gateways provide service discovery integration, dynamic routing rules, circuit breakers, rate limiting, authentication, request transformation, and observability out of the box. Nginx requires custom Lua scripts or third-party modules for these features. For simple systems with a few services, Nginx is fine. For microservices at scale, a purpose-built gateway (Kong, Apigee, AWS API Gateway) saves months of development time.

What to say instead: “Nginx works well for simple routing scenarios—if you have 5-10 services with static routes, it’s a solid choice. But as you scale to dozens or hundreds of services, you need features like dynamic service discovery, circuit breakers, and sophisticated routing rules. Purpose-built API gateways provide these out of the box. The decision depends on your scale and complexity. I’d start with Nginx for an MVP, then migrate to a full gateway as the system grows.”

Red Flag 3: “The gateway should aggregate data from multiple services to reduce client requests.”

Why it’s wrong: While this can be useful (Backend for Frontend pattern), it’s not a core responsibility of a routing gateway and can cause problems. Aggregation logic couples the gateway to multiple services—when any service’s API changes, the gateway needs updates. It also makes the gateway stateful and complex, harder to scale and maintain. Aggregation belongs in dedicated BFF services or in the client itself. The gateway should focus on routing, not business logic.

What to say instead: “Aggregation can be useful, but it’s not a core gateway responsibility. If you need aggregation, implement it in a Backend for Frontend (BFF) service that sits behind the gateway. The gateway routes to the BFF, which then calls multiple services and aggregates responses. This keeps the gateway simple and focused on routing. Alternatively, use GraphQL with a gateway that supports schema stitching—the gateway understands the semantic meaning of queries and can intelligently route and aggregate. But for a standard REST API gateway, keep it dumb—just route requests.”

Red Flag 4: “Route all traffic through the gateway, including service-to-service communication.”

Why it’s wrong: Routing internal service-to-service traffic through a centralized gateway creates a bottleneck and single point of failure. It also adds unnecessary latency to every internal call. The gateway should handle edge traffic (external clients to your system), but internal services should communicate directly or through a service mesh. Forcing everything through a central gateway doesn’t scale and complicates debugging.

What to say instead: “The gateway should handle edge traffic—requests from external clients like mobile apps, web browsers, or third-party integrations. Internal service-to-service communication should bypass the gateway and use direct service-to-service calls or a service mesh. This reduces latency, eliminates a central bottleneck, and scales better. The gateway enforces policies at the system boundary, but internal services trust each other and communicate directly. If you need routing logic for internal traffic, use a service mesh like Istio where routing happens at the sidecar level, not through a central gateway.”

Red Flag 5: “Use regular expressions for all routing rules—they’re flexible.”

Why it’s wrong: While regex is powerful, it’s slow and error-prone for routing. Evaluating complex regex patterns on every request adds latency. Regex is also hard to reason about—overlapping patterns can cause unexpected behavior, and debugging regex-based routing is painful. Most gateways use prefix matching or path templates (e.g., /users/{id}/orders) which are faster and more maintainable. Reserve regex for edge cases, not as the primary routing mechanism.

What to say instead: “Use simple prefix matching or path templates for most routes—they’re fast and easy to understand. For example, /users/** matches all user-related paths, and /users/{id}/orders matches a specific pattern. With a trie, lookup cost is proportional to the path depth and independent of how many routes you have. Reserve regex for complex edge cases where you need pattern matching that prefix matching can’t handle. But even then, be cautious—matching cost grows with the number of regex routes, backtracking engines can take pathological time on certain inputs, and regex routing can become a performance bottleneck. Most production gateways use a combination: fast prefix matching for 95% of routes, with regex as a fallback for the remaining 5%.”
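A route trie makes the cost argument concrete: lookup walks one node per path segment, so it scales with path depth rather than with the number of routes. This sketch handles literal segments only, not {id}-style templates or wildcards.

```python
# Trie-based prefix router: each path segment is one trie level, and the
# longest prefix with a registered service wins.
class RouteTrie:
    def __init__(self):
        self.children = {}   # segment -> RouteTrie
        self.service = None  # set if a route ends at this node

    def add(self, path, service):
        node = self
        for segment in path.strip("/").split("/"):
            node = node.children.setdefault(segment, RouteTrie())
        node.service = service

    def match(self, path):
        node, best = self, None
        for segment in path.strip("/").split("/"):
            if node.service is not None:
                best = node.service  # remember longest prefix seen so far
            node = node.children.get(segment)
            if node is None:
                return best          # no deeper match: use longest prefix
        return node.service if node.service is not None else best
```

Compared with scanning a list of regexes, adding the ten-thousandth route here does not slow down lookups for the other routes at all.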


Key Takeaways

  • Gateway Routing consolidates multiple backend services behind a single entry point, intelligently directing requests based on URL paths, headers, or content. It decouples clients from your internal service topology, enabling you to reorganize services without breaking client code.

  • Routing is not load balancing. Routing directs requests to different services based on request attributes. Load balancing distributes requests across instances of the same service. They’re complementary: the gateway routes to a service, then a load balancer distributes across instances.

  • Choose routing strategies based on your needs: path-based for RESTful APIs, header-based for versioning and multi-tenancy, content-based for GraphQL, weighted for canary releases, and geographic for compliance and latency. Most production systems use a combination.

  • The gateway is critical infrastructure—design for failure. Implement health checks, circuit breakers, timeouts, and fallbacks. When backend services fail, the gateway should fail fast and return meaningful errors rather than hanging. Monitor gateway metrics obsessively: latency, error rates, and circuit breaker state.

  • Keep the gateway focused on routing, not business logic. Smart gateways that aggregate data and implement business logic become bottlenecks and complicate maintenance. Push logic to Backend for Frontend services or to clients. The gateway should be a thin, fast routing layer that scales horizontally.

Prerequisites:

  • Load Balancing - Understanding load balancing helps clarify the difference between routing (to different services) and load balancing (across instances)
  • Microservices Architecture - Gateway routing is essential for microservices; understand the architecture first
  • Service Discovery - Gateways use service discovery to find backend services dynamically
  • API Design - Gateway routing decisions depend on API structure (REST, GraphQL, gRPC)

Related Patterns:

  • Backend for Frontend - BFF services sit behind the gateway and aggregate data for specific clients
  • Circuit Breaker - Gateways implement circuit breakers to handle backend failures gracefully
  • Strangler Fig - Gateway routing enables gradual migration from monoliths to microservices
  • API Gateway Pattern - Gateway routing is a core capability of the broader API Gateway pattern

Next Steps:

  • Service Mesh - Evolution from centralized gateway routing to distributed routing with sidecars
  • Rate Limiting - Gateways often implement rate limiting alongside routing
  • API Versioning - Deep dive into versioning strategies that depend on gateway routing
  • Distributed Tracing - Gateways are the entry point for traces; understand how to propagate trace context