gRPC: High-Performance RPC Framework Guide

Intermediate · 14 min read · Updated 2026-02-11

After this topic, you will be able to:

  • Explain gRPC’s architecture including Protocol Buffers, HTTP/2 transport, and code generation
  • Evaluate gRPC’s four streaming modes (unary, server streaming, client streaming, bidirectional) for different use cases
  • Assess when gRPC’s performance benefits justify its complexity compared to REST or other protocols

TL;DR

gRPC is Google’s open-source RPC framework built on HTTP/2 and Protocol Buffers, delivering 5-10x better performance than REST for service-to-service communication. It provides four streaming modes (unary, server streaming, client streaming, bidirectional), strongly-typed contracts, and automatic code generation in 11+ languages. Use gRPC for internal microservices where performance matters; stick with REST for public APIs where browser compatibility and human readability are priorities.

Cheat Sheet: Protocol Buffers for serialization • HTTP/2 for transport • 4 streaming modes • Sub-millisecond latency • Strongly-typed contracts • Poor browser support

Background

gRPC emerged from Google’s internal RPC system called Stubby, which handled over 10 billion requests per second across their infrastructure. When Google open-sourced gRPC in 2015, they addressed a fundamental problem: REST APIs over HTTP/1.1 with JSON were becoming performance bottlenecks in microservices architectures. At scale, JSON parsing consumes significant CPU (Netflix reported 20-30% CPU overhead), HTTP/1.1’s head-of-line blocking limits throughput, and text-based protocols waste bandwidth.

The name gRPC originally stood for “gRPC Remote Procedure Calls” (a recursive acronym), though Google now emphasizes it as a general-purpose framework. Unlike REST, which treats everything as resources with HTTP verbs, gRPC embraces the RPC model where clients call server methods directly as if they were local functions. This paradigm shift, combined with binary serialization and HTTP/2 multiplexing, enables the sub-millisecond latencies required for modern distributed systems.

For RPC fundamentals and how gRPC compares to REST philosophically, see RPC. gRPC’s killer feature isn’t just speed—it’s the combination of performance, type safety through Protocol Buffers, and first-class support for streaming, which REST handles awkwardly through techniques like long polling or Server-Sent Events.

Architecture

gRPC’s architecture consists of four key layers that work together to provide high-performance remote procedure calls. At the top, developers define services in .proto files using Protocol Buffers Interface Definition Language (IDL). A service definition looks like a traditional interface: methods with typed request and response messages. The Protocol Buffers compiler (protoc) generates client stubs and server skeletons in your target language—Java, Go, Python, C++, and the other officially supported languages.
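The shape of such a definition can be sketched as follows (a hypothetical UserService; all names are illustrative):

```proto
syntax = "proto3";

package user.v1;

// Request/response messages are ordinary protobuf messages.
message GetUserRequest {
  int32 user_id = 1;
}

message User {
  string name = 1;
  int32 age = 2;
  string email = 3;
}

// protoc (with the gRPC plugin) generates a client stub and a server
// skeleton for this service in each target language.
service UserService {
  rpc GetUser(GetUserRequest) returns (User);
}
```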

The client stub provides a local interface that looks like a normal function call. When you invoke userService.GetUser(userId), the stub serializes the request into Protocol Buffers binary format, adds gRPC metadata (method name, timeout, authentication tokens), and sends it over an HTTP/2 connection. This is where gRPC leverages HTTP/2’s multiplexing: multiple RPC calls share a single TCP connection without head-of-line blocking. For details on HTTP/2’s multiplexing and stream prioritization, see HTTP.

On the server side, the gRPC runtime receives the binary payload, deserializes it using the generated code, and invokes your business logic. The response follows the reverse path: serialize, transmit over HTTP/2, deserialize on the client. The entire round trip typically completes in 1-5 milliseconds for local data center calls, compared to 10-50ms for equivalent REST calls with JSON.

The architecture supports four communication patterns through streaming modes: unary (request-response), server streaming (one request, stream of responses), client streaming (stream of requests, one response), and bidirectional streaming (both sides stream). Each mode uses the same underlying HTTP/2 transport but with different flow control semantics. Load balancing happens at multiple levels: client-side load balancing using service discovery, proxy-based load balancing through Envoy or NGINX, or DNS-based load balancing for simpler deployments.

gRPC Request-Response Architecture

graph LR
    Client["Client Application"]
    Stub["Client Stub<br/><i>Generated Code</i>"]
    HTTP2["HTTP/2 Connection<br/><i>Multiplexed Streams</i>"]
    Server["gRPC Server<br/><i>Generated Skeleton</i>"]
    Business["Business Logic<br/><i>Service Implementation</i>"]
    
    Client --"1. Call method<br/>getUserInfo(123)"--> Stub
    Stub --"2. Serialize to<br/>Protocol Buffers"--> HTTP2
    HTTP2 --"3. Binary payload<br/>over HTTP/2 stream"--> Server
    Server --"4. Deserialize<br/>protobuf message"--> Business
    Business --"5. Process request<br/>fetch user data"--> Server
    Server --"6. Serialize response<br/>to protobuf"--> HTTP2
    HTTP2 --"7. Binary response<br/>over same stream"--> Stub
    Stub --"8. Deserialize<br/>return User object"--> Client

gRPC’s layered architecture showing how a client method call is serialized to Protocol Buffers, transmitted over HTTP/2, and deserialized on the server. The generated stubs handle all serialization and network communication, making remote calls look like local function invocations.

gRPC Streaming Modes

gRPC’s four streaming modes address different communication patterns that microservices encounter in production. Understanding when to use each mode is critical for interview discussions and real-world design decisions.

Unary RPC is the simplest mode: client sends one request, server sends one response. This mirrors traditional REST GET/POST calls and works for 80% of use cases. Example: GetUser(userId) → User. Uber uses unary RPCs for their rider app to fetch driver locations, where each request-response pair is independent. Latency is typically 1-3ms in the same availability zone.

Server Streaming handles cases where the server needs to send multiple responses to a single client request. The client opens a stream, sends one request, and receives a sequence of responses until the server closes the stream. Netflix uses this for their recommendation service: the client requests “recommendations for user X,” and the server streams back batches of movie suggestions as they’re computed, allowing the UI to render incrementally rather than waiting for all results. This reduces perceived latency from 500ms to 100ms for first results.

Client Streaming inverts the pattern: the client sends a stream of requests and receives a single response when done. This is perfect for uploading large datasets or aggregating metrics. Google Cloud’s logging service uses client streaming—applications stream log entries continuously, and the server acknowledges with a single response containing the number of entries persisted. This reduces network overhead by batching acknowledgments.

Bidirectional Streaming allows both client and server to send streams independently. This is the most powerful mode for real-time systems. Spotify uses bidirectional streaming for their collaborative playlist feature: as users add songs, the client streams updates to the server, which simultaneously streams changes from other users back. The streams are fully asynchronous—neither side blocks waiting for the other. Trading platforms use this for market data feeds where clients stream orders while receiving price updates.
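In the .proto definition, the four modes differ only in where the stream keyword appears (service, method, and message names here are illustrative):

```proto
service ExampleService {
  // Unary: one request, one response
  rpc GetUser(GetUserRequest) returns (User);

  // Server streaming: one request, a stream of responses
  rpc ListRecommendations(RecRequest) returns (stream Movie);

  // Client streaming: a stream of requests, one response
  rpc UploadLogs(stream LogEntry) returns (UploadResult);

  // Bidirectional streaming: both sides stream independently
  rpc TrackLocation(stream Location) returns (stream DriverUpdate);
}
```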

Each streaming mode uses HTTP/2 flow control to prevent fast senders from overwhelming slow receivers. The receiver advertises a window size, and the sender pauses when the window is exhausted. This backpressure mechanism prevents memory exhaustion, a common problem with naive streaming implementations.
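The backpressure mechanism can be illustrated with a toy model (pure Python, not the actual HTTP/2 implementation; the real protocol tracks windows per stream and per connection):

```python
class FlowControlWindow:
    """Toy model of HTTP/2 flow control: the sender consumes window credit,
    and the receiver replenishes it (a WINDOW_UPDATE frame) as it reads data."""

    def __init__(self, initial: int = 65535):  # HTTP/2's default initial window
        self.window = initial

    def can_send(self, nbytes: int) -> bool:
        return nbytes <= self.window

    def send(self, nbytes: int) -> None:
        if not self.can_send(nbytes):
            # A real sender buffers or pauses here instead of raising
            raise RuntimeError("window exhausted: sender must pause (backpressure)")
        self.window -= nbytes

    def replenish(self, nbytes: int) -> None:
        self.window += nbytes


w = FlowControlWindow(initial=10)
w.send(7)
print(w.can_send(5))   # False: only 3 bytes of window remain
w.replenish(7)
print(w.can_send(5))   # True: receiver caught up, sender may resume
```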

Four gRPC Streaming Modes

sequenceDiagram
    participant C1 as Client
    participant S1 as Server
    
    Note over C1,S1: Unary RPC (Request-Response)
    C1->>S1: GetUser(userId=123)
    S1->>C1: User{name="Alice", age=30}
    
    Note over C1,S1: Server Streaming (One Request, Multiple Responses)
    C1->>S1: ListRecommendations(userId=123)
    S1->>C1: Movie{title="Inception"}
    S1->>C1: Movie{title="Interstellar"}
    S1->>C1: Movie{title="Tenet"}
    S1->>C1: [Stream closed]
    
    Note over C1,S1: Client Streaming (Multiple Requests, One Response)
    C1->>S1: [Open UploadLogs stream]
    C1->>S1: LogEntry{level="INFO", msg="..."}
    C1->>S1: LogEntry{level="ERROR", msg="..."}
    C1->>S1: LogEntry{level="WARN", msg="..."}
    C1->>S1: [Close stream]
    S1->>C1: UploadResult{count=3, status="OK"}
    
    Note over C1,S1: Bidirectional Streaming (Both Stream)
    C1->>S1: [Open LocationTracking stream]
    C1->>S1: Location{lat=37.7, lng=-122.4}
    S1->>C1: DriverUpdate{eta=5min, distance=2km}
    C1->>S1: Location{lat=37.71, lng=-122.41}
    S1->>C1: DriverUpdate{eta=4min, distance=1.5km}
    C1->>S1: Location{lat=37.72, lng=-122.42}
    S1->>C1: DriverUpdate{eta=3min, distance=1km}

Sequence diagrams showing gRPC’s four streaming modes. Unary is simple request-response. Server streaming sends multiple responses (Netflix recommendations). Client streaming aggregates multiple requests (log upload). Bidirectional streaming enables real-time communication (Uber location tracking).

Internals

Under the hood, gRPC’s performance comes from three technical decisions: Protocol Buffers serialization, HTTP/2 transport, and efficient connection management.

Protocol Buffers Serialization: Protocol Buffers (protobuf) is a binary serialization format that’s 3-10x smaller and 20-100x faster than JSON. A protobuf message is defined with numbered fields: message User { string name = 1; int32 age = 2; }. The compiler generates code that serializes this into a compact binary format where each field is encoded as a tag (field number + wire type) followed by the value. For example, name="Alice" becomes 0x0a 0x05 0x41 0x6c 0x69 0x63 0x65 (tag 0x0a = field 1, wire type 2; length 5; UTF-8 bytes). This binary encoding eliminates JSON’s overhead of field names, quotes, and whitespace.
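The encoding described above is simple enough to reproduce by hand. A minimal sketch in Python (no protobuf library; it covers only the two wire types used here):

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

def string_field(field_number: int, value: str) -> bytes:
    """Length-delimited field (wire type 2): tag, length, UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_varint(field_number << 3 | 2) + encode_varint(len(data)) + data

def int_field(field_number: int, value: int) -> bytes:
    """Varint field (wire type 0): tag, then the varint-encoded value."""
    return encode_varint(field_number << 3 | 0) + encode_varint(value)

# message User { string name = 1; int32 age = 2; }
payload = string_field(1, "Alice") + int_field(2, 30)
print(payload.hex(" "))  # 0a 05 41 6c 69 63 65 10 1e
```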

Protobuf’s schema evolution is carefully designed: you can add new fields without breaking old clients (they ignore unknown fields), and you can remove fields as long as you don’t reuse field numbers. This forward/backward compatibility is crucial for microservices that deploy independently. The numbered fields also keep encoding and parsing efficient—tags for low field numbers fit in a single byte, and a parser can skip any field it doesn’t recognize using only the wire type, without needing that field’s schema.
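To illustrate why unknown fields are harmless, here is a sketch of how a decoder skips a field using only its wire type (pure Python, handling just the varint and length-delimited wire types):

```python
def read_varint(buf: bytes, i: int) -> tuple[int, int]:
    """Decode a varint starting at index i; return (value, next index)."""
    value = shift = 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return value, i

def skip_field(buf: bytes, i: int, wire_type: int) -> int:
    """Advance past a field's value without knowing its schema."""
    if wire_type == 0:                 # varint
        _, i = read_varint(buf, i)
        return i
    if wire_type == 2:                 # length-delimited (strings, bytes, messages)
        length, i = read_varint(buf, i)
        return i + length
    raise ValueError("wire type not handled in this sketch")

# Field 1 ("Alice", 7 bytes total) followed by field 2 (varint 30).
buf = bytes.fromhex("0a05416c696365101e")
tag, i = read_varint(buf, 0)           # tag 0x0a -> field 1, wire type 2
i = skip_field(buf, i, tag & 0x7)      # an old client skips the whole value
print(i, hex(buf[i]))  # 7 0x10 -- the decoder lands cleanly on the next tag
```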

HTTP/2 Transport: gRPC maps RPC calls to HTTP/2 streams. Each RPC becomes a single HTTP/2 stream with headers (:method POST, :path /service/method, content-type: application/grpc), data frames containing the serialized protobuf, and trailers for status codes. HTTP/2’s binary framing eliminates HTTP/1.1’s text parsing overhead, and multiplexing allows 100+ concurrent RPCs on one TCP connection without head-of-line blocking at the application layer (though TCP-level HOL blocking still exists).
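Inside the HTTP/2 data frames, each message is wrapped in gRPC’s length-prefixed framing: a 1-byte compressed flag followed by a 4-byte big-endian message length. A sketch:

```python
import struct

def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """gRPC length-prefixed message: flag byte + 4-byte big-endian length + payload."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

# A 9-byte protobuf payload (a User message: name + age fields) gets a 5-byte prefix.
framed = frame_message(bytes.fromhex("0a05416c696365101e"))
print(framed.hex(" "))  # 00 00 00 00 09 0a 05 41 6c 69 63 65 10 1e
```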

gRPC relies on HTTP/2’s flow control. HTTP/2 also defines stream prioritization, but most gRPC implementations do not expose per-RPC priorities directly; in practice, mixed workloads (interactive queries versus background batch jobs) are separated with deadlines, dedicated channels, or server-side quotas so that user-facing traffic is not starved.

Connection Management: gRPC maintains persistent connections and supports periodic keepalive pings (opt-in in most implementations, typically configured on the order of tens of seconds). This avoids TCP handshake overhead (1-2ms per connection) and TLS handshake overhead (5-10ms). However, persistent connections complicate load balancing—new servers don’t receive traffic until clients reconnect. gRPC addresses this with client-side load balancing: clients query a name resolver (DNS, Consul, etcd), get a list of server IPs, and distribute RPCs using round-robin or least-request algorithms. When a server fails, the client detects it via keepalive timeout and removes it from the pool.
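A minimal sketch of a client-side picker (illustrative, not the grpc library’s API; real pickers also handle health checking, subchannel state, and resolver updates):

```python
class RoundRobinPicker:
    """Rotate RPCs across the backend list returned by the name resolver."""

    def __init__(self, backends: list[str]):
        self._backends = list(backends)
        self._next = 0

    def pick(self) -> str:
        if not self._backends:
            raise RuntimeError("no healthy backends")
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        return backend

    def remove(self, backend: str) -> None:
        """Drop a backend from rotation, e.g. after a keepalive timeout."""
        if backend in self._backends:
            self._backends.remove(backend)


picker = RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"])
print([picker.pick() for _ in range(4)])  # wraps around after the third backend
picker.remove("10.0.0.2:50051")           # failed server leaves the pool
```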

Error handling uses gRPC status codes (17 codes: OK plus 16 error codes such as UNAVAILABLE, DEADLINE_EXCEEDED, and PERMISSION_DENIED) carried in HTTP/2 trailers. Clients can implement retry logic with exponential backoff, and gRPC provides built-in retry policies in the service config. Deadlines (timeouts) propagate across service boundaries—if Service A calls Service B with a 100ms deadline, and Service B calls Service C, the deadline shrinks to account for time already spent.
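Both mechanisms are easy to sketch. Deadline propagation is budget arithmetic, and the backoff schedule below mirrors the typical retry-policy shape (initial backoff, multiplier, cap); the numbers are illustrative:

```python
def downstream_budget(total_ms: float, elapsed_ms: float) -> float:
    """Deadline propagation: a downstream call only gets what's left of the caller's budget."""
    remaining = total_ms - elapsed_ms
    if remaining <= 0:
        raise TimeoutError("DEADLINE_EXCEEDED before the downstream call was made")
    return remaining

def backoff_ms(attempt: int, initial: float = 100.0, multiplier: float = 2.0,
               cap: float = 1000.0) -> float:
    """Exponential backoff: initial * multiplier^attempt, capped (jitter omitted)."""
    return min(initial * multiplier ** attempt, cap)

# Service A gives B a 100 ms deadline; B spends 20 ms, then calls C with 80 ms.
print(downstream_budget(100.0, 20.0))       # 80.0
print([backoff_ms(a) for a in range(5)])    # [100.0, 200.0, 400.0, 800.0, 1000.0]
```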

Protocol Buffers vs JSON Serialization

graph TB
    subgraph JSON Serialization
        JSON_Input["User Object<br/>name='Alice'<br/>age=30<br/>email='alice@example.com'"]
        JSON_Text["Text Format<br/>{<br/>  'name': 'Alice',<br/>  'age': 30,<br/>  'email': 'alice@example.com'<br/>}<br/><b>Size: 62 bytes</b>"]
        JSON_Parse["Parse Overhead<br/>• Scan for quotes<br/>• Parse field names<br/>• Type conversion<br/><b>Time: ~500ns</b>"]
    end
    
    subgraph Protocol Buffers Serialization
        Proto_Input["User Object<br/>name='Alice'<br/>age=30<br/>email='alice@example.com'"]
        Proto_Schema["Schema Definition<br/>message User {<br/>  string name = 1;<br/>  int32 age = 2;<br/>  string email = 3;<br/>}"]
        Proto_Binary["Binary Format<br/>0x0a 0x05 0x41 0x6c 0x69 0x63 0x65<br/>0x10 0x1e<br/>0x1a 0x11 0x61 0x6c 0x69 0x63 0x65...<br/><b>Size: 28 bytes (55% smaller)</b>"]
        Proto_Parse["Parse Overhead<br/>• Read tag (field# + type)<br/>• Read length<br/>• Copy bytes<br/><b>Time: ~25ns (20x faster)</b>"]
    end
    
    JSON_Input --> JSON_Text --> JSON_Parse
    Proto_Input --> Proto_Schema --> Proto_Binary --> Proto_Parse

Comparison of JSON and Protocol Buffers serialization for the same data. Protobuf achieves 55% smaller payload size and 20x faster parsing by using binary encoding with numbered fields instead of text-based field names. The tag-length-value format enables efficient parsing without scanning the entire message.

gRPC Load Balancing Strategies

graph TB
    subgraph LB4["Problem: L4 Load Balancing"]
        Client1["Client 1"]
        L4LB["L4 Load Balancer<br/><i>TCP-based</i>"]
        Server1["Server 1<br/>⚠️ Gets all traffic"]
        Server2["Server 2<br/>⚠️ Idle"]
        Server3["Server 3<br/>⚠️ Idle"]
        
        Client1 --"Single HTTP/2<br/>connection"--> L4LB
        L4LB --"All RPCs on<br/>one connection"--> Server1
        L4LB -."No traffic".-> Server2
        L4LB -."No traffic".-> Server3
    end
    
    subgraph CSLB["Solution 1: Client-Side Load Balancing"]
        Client2["Client with LB Logic"]
        SD["Service Discovery<br/><i>Consul/etcd/DNS</i>"]
        ServerA["Server A"]
        ServerB["Server B"]
        ServerC["Server C"]
        
        Client2 --"1. Query backends"--> SD
        SD --"2. Return IPs"--> Client2
        Client2 --"3. RPC 1"--> ServerA
        Client2 --"4. RPC 2"--> ServerB
        Client2 --"5. RPC 3"--> ServerC
    end
    
    subgraph L7LB["Solution 2: L7 Proxy Load Balancing"]
        Client3["Client"]
        Envoy["Envoy Proxy<br/><i>L7 Load Balancer</i>"]
        ServerX["Server X"]
        ServerY["Server Y"]
        ServerZ["Server Z"]
        
        Client3 --"HTTP/2 connection"--> Envoy
        Envoy --"Distribute RPCs<br/>by stream"--> ServerX
        Envoy --"Round-robin<br/>or least-request"--> ServerY
        Envoy --> ServerZ
    end

gRPC load balancing challenges and solutions. L4 load balancers fail because persistent HTTP/2 connections route all traffic to one server. Client-side load balancing distributes RPCs across backends using service discovery. L7 proxies like Envoy inspect HTTP/2 streams and distribute individual RPCs across servers.

Performance Characteristics

gRPC delivers 5-10x better throughput and 50-80% lower latency than REST with JSON in typical microservices scenarios. Benchmarks from Google Cloud show gRPC handling 100,000 requests per second per core compared to 10,000-20,000 for REST. Latency for a simple unary RPC in the same data center averages 1-2ms (p50) and 5-10ms (p99), while equivalent REST calls measure 10-20ms (p50) and 50-100ms (p99).

The performance advantage comes from three factors. First, Protocol Buffers serialization is 20-100x faster than JSON parsing. Netflix measured 30% CPU reduction after migrating internal APIs from JSON to protobuf. Second, HTTP/2 multiplexing eliminates connection overhead—a single TCP connection handles thousands of concurrent RPCs, whereas HTTP/1.1 requires connection pooling with 6-8 connections per client. Third, binary framing reduces bandwidth by 30-60% compared to text-based protocols.
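The bandwidth claim is easy to check for a small record, by hand-encoding the User message from the Internals section and comparing it against compact (whitespace-free) JSON:

```python
import json

record = {"name": "Alice", "age": 30, "email": "alice@example.com"}
json_bytes = json.dumps(record, separators=(",", ":")).encode()

# The same record as protobuf wire bytes (fields 1, 2, 3 of the User message):
# field 1 "Alice", field 2 varint 30, field 3 "alice@example.com"
proto_bytes = (bytes.fromhex("0a05416c696365")
               + bytes.fromhex("101e")
               + b"\x1a\x11alice@example.com")

print(len(json_bytes), len(proto_bytes))  # 53 28 -- roughly 47% smaller than even compact JSON
```

The gap widens further against pretty-printed JSON, since protobuf never pays for field names, quotes, or whitespace.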

Streaming performance is where gRPC truly shines. Server streaming can push 10,000-50,000 messages per second over a single connection with sub-millisecond latency between messages. Bidirectional streaming achieves similar throughput in both directions simultaneously. This enables real-time systems that would struggle with REST’s request-response model.

Scalability characteristics depend on the deployment model. With client-side load balancing, gRPC scales horizontally—adding servers immediately increases capacity since clients distribute load across all available backends. With proxy-based load balancing (L7 load balancer like Envoy), the proxy becomes a potential bottleneck, though modern proxies handle 100,000+ requests per second.

Memory usage is efficient due to connection reuse and binary serialization. A gRPC server handling 10,000 concurrent RPCs typically uses 100-200MB of memory, compared to 500MB-1GB for equivalent REST servers with connection pooling. However, bidirectional streaming can consume significant memory if clients stream large datasets without proper flow control—this is a common production issue.

Cold start latency is higher than REST because gRPC requires HTTP/2 connection establishment (1 RTT) plus TLS handshake (1-2 RTTs). For serverless environments or short-lived connections, this overhead (10-20ms) can negate gRPC’s performance benefits. REST over HTTP/1.1 with connection reuse may actually perform better for infrequent, bursty traffic.

Trade-offs

gRPC excels at internal service-to-service communication where performance, type safety, and streaming matter. It falls short for public APIs, browser-based clients, and scenarios requiring human readability.

Strengths: The primary advantage is performance—5-10x better throughput and 50-80% lower latency than REST. This translates directly to cost savings at scale. Uber reported a 30% reduction in infrastructure costs after migrating critical paths to gRPC. Type safety through Protocol Buffers catches errors at compile time rather than runtime—change a method signature or message type, and clients fail to compile against the regenerated stubs until they are updated. This prevents entire classes of production bugs.

Streaming support is first-class, not an afterthought. Bidirectional streaming enables real-time features (chat, collaborative editing, live dashboards) without WebSocket complexity. Code generation eliminates boilerplate—you define the service once in .proto, and the compiler generates client and server code in 11+ languages. This ensures consistency across polyglot microservices.

Weaknesses: Browser support is poor. Browsers speak HTTP/2, but they don’t give JavaScript the low-level control gRPC needs (notably access to HTTP/2 trailers), so gRPC-Web—the browser variant—requires a proxy to translate between gRPC and a browser-compatible format. This adds latency and operational complexity. For public APIs consumed by web apps, REST remains the better choice.

Human readability suffers with binary protocols. You can’t curl a gRPC endpoint and inspect the response—you need specialized tools like grpcurl or Postman. Debugging production issues is harder because you can’t simply read request/response payloads in logs. Teams often add JSON logging for debugging, which defeats the bandwidth savings.

Load balancing is complex. L4 load balancers (TCP-based) don’t work well because gRPC multiplexes all RPCs over one connection—all traffic goes to the first backend. You need L7 load balancing (Envoy, NGINX) or client-side load balancing, both of which add operational overhead. REST works fine with simple L4 load balancers.

Ecosystem maturity lags REST. While gRPC has good support in major languages, third-party libraries and tools assume REST. API gateways, monitoring systems, and security tools often require custom integration for gRPC. REST’s ubiquity means better tooling, more Stack Overflow answers, and easier hiring.

When to Use (and When Not To)

Choose gRPC when performance and type safety justify the operational complexity. The decision matrix is straightforward: internal microservices benefit from gRPC; public APIs favor REST.

Use gRPC for:

  • Internal microservices where you control both client and server. Uber uses gRPC for all backend services, achieving 30% cost reduction. The type safety prevents breaking changes, and the performance handles high throughput.
  • Real-time streaming requirements. Spotify uses gRPC for collaborative playlists and live lyrics. Bidirectional streaming is far simpler than managing WebSocket connections with custom protocols.
  • Polyglot environments where services are written in different languages. Netflix has Java, Python, Node.js, and Go services all communicating via gRPC. The generated code ensures compatibility.
  • High-throughput, low-latency systems. Trading platforms use gRPC for order routing where every millisecond matters. The binary protocol and HTTP/2 multiplexing deliver consistent sub-5ms latencies.
  • Mobile backends where bandwidth costs money. Protocol Buffers’ compact encoding reduces data transfer by 30-60%, lowering mobile data usage and improving battery life.

Avoid gRPC for:

  • Public APIs consumed by web browsers. REST with JSON is the standard, and gRPC-Web adds complexity without clear benefits. Stripe, Twilio, and AWS all use REST for public APIs.
  • Simple CRUD applications where REST’s simplicity wins. If you’re building a basic web app with a database, REST over HTTP/1.1 is easier to develop, debug, and deploy.
  • Serverless environments with cold starts. AWS Lambda and Google Cloud Functions have 100-500ms cold start times, which dwarf gRPC’s performance benefits. REST’s simpler connection model may actually perform better.
  • Teams without Protocol Buffers expertise. gRPC requires learning protobuf syntax, code generation workflows, and binary debugging tools. If your team is comfortable with REST and JSON, the migration cost may not justify the performance gains.

Alternatives: For public APIs, use REST with JSON. For real-time browser communication, use WebSockets or Server-Sent Events. For extreme performance, consider custom binary protocols over raw TCP (though you lose gRPC’s ecosystem). For simple internal services, REST with MessagePack or Avro provides a middle ground—better performance than JSON without gRPC’s complexity.

gRPC vs REST Decision Tree

flowchart TB
    Start(["Need API Communication"])
    Internal{"Internal<br/>microservices?"}    
    Performance{"High throughput<br/>or low latency<br/>required?"}    
    Streaming{"Need real-time<br/>streaming?"}    
    Browser{"Browser clients<br/>or public API?"}    
    Team{"Team comfortable<br/>with protobuf?"}    
    
    UseGRPC["✅ Use gRPC<br/>• Binary protobuf<br/>• HTTP/2 multiplexing<br/>• Type safety<br/>• 5-10x better performance"]
    
    UseREST["✅ Use REST<br/>• JSON over HTTP/1.1<br/>• Simple debugging<br/>• Universal browser support<br/>• Mature ecosystem"]
    
    UseWebSocket["✅ Use WebSocket<br/>• Full-duplex browser support<br/>• Custom protocol<br/>• Good for chat/gaming"]
    
    Hybrid["✅ Hybrid Approach<br/>• gRPC for internal services<br/>• REST for public API<br/>• API gateway translates"]
    
    Start --> Internal
    Internal -->|Yes| Performance
    Internal -->|No| Browser
    
    Performance -->|Yes| Streaming
    Performance -->|No| Team
    
    Streaming -->|Yes| UseGRPC
    Streaming -->|No| Team
    
    Team -->|Yes| UseGRPC
    Team -->|No| UseREST
    
    Browser -->|Yes| UseREST
    Browser -->|No| Streaming

Decision tree for choosing between gRPC and REST. Use gRPC for internal microservices with high performance needs and streaming requirements. Use REST for public APIs, browser clients, and teams without Protocol Buffers expertise. Consider a hybrid approach with gRPC internally and REST externally.

Real-World Examples

Netflix (microservices communication for a streaming platform): Netflix migrated their internal API layer from REST with JSON to gRPC with Protocol Buffers in 2018. Their architecture has 700+ microservices handling billions of requests per day. The migration reduced CPU usage by 30% across their fleet, translating to millions in infrastructure savings. They use server streaming for recommendation services—the client requests recommendations, and the server streams results as they’re computed from multiple ML models. This reduced p99 latency from 500ms to 150ms for the first batch of recommendations, improving user experience. Netflix built custom load balancing logic that considers server load and geographic proximity, routing requests to the least-loaded backend in the same AWS region. They also implemented automatic retries with exponential backoff for transient failures, achieving 99.99% success rates. Interesting detail: Netflix developed an internal tool called ‘gRPC Inspector’ that logs request/response payloads in JSON format for debugging while keeping production traffic in binary protobuf. This solved the observability problem without sacrificing performance.

Uber (real-time location tracking and dispatch): Uber’s dispatch system uses bidirectional streaming to coordinate drivers and riders. When a rider requests a ride, the client opens a bidirectional stream: the client streams location updates every 2 seconds, while the server streams driver locations and ETAs. This replaced their previous polling-based system (REST calls every 5 seconds), reducing backend load by 70% and improving battery life on mobile devices by 20%. The system handles 10 million concurrent streams during peak hours. Uber uses client-side load balancing with a custom service discovery system called ‘Ringpop’ that maintains consistent hashing of drivers to backend servers. When a server fails, affected drivers automatically reconnect to the next server in the ring within 1-2 seconds. Interesting detail: Uber’s gRPC implementation includes custom metadata for tracing requests across 1000+ microservices. Each RPC includes a trace ID that propagates through the entire call chain, enabling end-to-end latency analysis. They found that 80% of p99 latency issues were caused by cascading retries in downstream services.

Google (internal infrastructure and Google Cloud Platform): Google’s internal infrastructure processes over 10 billion gRPC calls per second across millions of servers. Every Google service—Search, Gmail, YouTube, Maps—uses gRPC for internal communication. Google Cloud Platform exposes many services via gRPC (Cloud Spanner, Cloud Pub/Sub, Cloud Speech-to-Text) alongside REST APIs. Their Cloud Spanner database uses bidirectional streaming for transaction processing: clients stream SQL queries while receiving result sets and transaction status updates. This achieves 99.999% availability with single-digit millisecond latencies globally. Google’s load balancing uses a custom protocol called ‘gRPC Load Balancing Protocol’ where backends report their load to a central load balancer, which distributes traffic based on actual server capacity rather than simple round-robin. Interesting detail: Google’s internal gRPC implementation includes automatic deadline propagation across service boundaries. If Service A calls Service B with a 100ms deadline, and 20ms have elapsed, Service B receives an 80ms deadline. This prevents cascading timeouts and ensures requests fail fast when deadlines are exceeded.


Interview Essentials

Mid-Level

Explain gRPC’s architecture: Protocol Buffers for serialization, HTTP/2 for transport, and code generation. Describe how a client stub serializes a request, sends it over HTTP/2, and the server deserializes and processes it.

Compare gRPC’s four streaming modes with examples: unary (simple request-response), server streaming (one request, multiple responses like Netflix recommendations), client streaming (multiple requests, one response like log aggregation), and bidirectional streaming (both stream like Uber’s dispatch system).

Discuss Protocol Buffers’ advantages over JSON: 3-10x smaller payload size, 20-100x faster serialization, and schema evolution with backward/forward compatibility through numbered fields.

Explain gRPC’s load balancing challenges: persistent HTTP/2 connections mean L4 load balancers don’t distribute traffic evenly. Solutions include client-side load balancing with service discovery or L7 proxies like Envoy.

Describe when to use gRPC vs REST: gRPC for internal microservices with high throughput and streaming needs; REST for public APIs, browser clients, and simple CRUD applications.

Senior

Design a gRPC-based microservices architecture for a real-time system like Uber’s dispatch. Explain bidirectional streaming for location updates, client-side load balancing with consistent hashing, and failure handling with automatic reconnection.

Analyze gRPC’s performance characteristics: 5-10x better throughput than REST due to binary serialization and HTTP/2 multiplexing. Discuss latency profiles (1-2ms p50, 5-10ms p99) and when cold start overhead negates benefits (serverless environments).

Explain Protocol Buffers’ wire format and schema evolution: how fields are encoded as tag-length-value, why field numbers matter for compatibility, and strategies for evolving schemas without breaking clients (add new fields, never reuse numbers).

Discuss gRPC’s error handling: the 17 status codes (OK, UNAVAILABLE, DEADLINE_EXCEEDED, etc.), deadline propagation across services, and retry strategies with exponential backoff. How do you prevent cascading failures?

Compare gRPC’s streaming with alternatives: WebSockets for browser clients, Server-Sent Events for one-way streaming, and message queues for asynchronous communication. When does each make sense?

Staff+

Architect a migration from REST to gRPC for a large-scale system. Address challenges: gradual rollout with dual REST/gRPC support, load balancer configuration (L7 vs client-side), observability (binary protocol debugging), and team training. How do you measure success?

Design a custom load balancing strategy for gRPC that considers server load, geographic proximity, and request priority. Explain how backends report load, how clients make routing decisions, and how to handle server failures without dropping requests.

Analyze gRPC’s trade-offs at scale: connection management (persistent connections vs connection pooling), memory usage with bidirectional streaming (flow control to prevent memory exhaustion), and operational complexity (debugging binary protocols, load balancing).

Discuss gRPC’s limitations and workarounds: poor browser support (gRPC-Web with proxy), human readability (JSON logging for debugging), and ecosystem maturity (custom integrations for API gateways and monitoring). When do these limitations disqualify gRPC?

Design a hybrid architecture using both gRPC and REST: gRPC for internal services, REST for public APIs. Explain the API gateway layer that translates between protocols, authentication/authorization strategies, and how to maintain consistency across both APIs.

Common Interview Questions

Why does gRPC use HTTP/2 instead of raw TCP? (Answer: HTTP/2 provides multiplexing, flow control, and standardization while avoiding custom protocol complexity. Raw TCP requires implementing these features manually.)

How does gRPC handle backward compatibility when you change a service definition? (Answer: Protocol Buffers’ numbered fields enable schema evolution. Add new fields without breaking old clients; they ignore unknown fields. Never reuse field numbers.)

What’s the difference between client-side and server-side streaming? (Answer: Server streaming: one request, multiple responses (e.g., search results). Client streaming: multiple requests, one response (e.g., log upload). Bidirectional: both stream independently.)

Why is gRPC faster than REST? (Answer: Binary Protocol Buffers vs JSON text, HTTP/2 multiplexing vs HTTP/1.1 connection overhead, and persistent connections vs per-request handshakes. Typically 5-10x better throughput.)

How do you load balance gRPC services? (Answer: L4 load balancers don’t work due to persistent connections. Use L7 proxies (Envoy, NGINX) or client-side load balancing with service discovery (Consul, etcd, DNS).)

Red Flags to Avoid

Claiming gRPC is always better than REST without discussing trade-offs (browser support, debugging, ecosystem maturity).

Not understanding Protocol Buffers’ schema evolution or how numbered fields enable backward compatibility.

Confusing the four streaming modes or unable to provide concrete use cases for each (unary, server streaming, client streaming, bidirectional).

Ignoring load balancing challenges with persistent HTTP/2 connections or proposing L4 load balancers for gRPC.

Recommending gRPC for public APIs without acknowledging browser compatibility issues and the need for gRPC-Web with a proxy.


Key Takeaways

gRPC combines Protocol Buffers (binary serialization), HTTP/2 (multiplexing), and code generation to deliver 5-10x better performance than REST. Use it for internal microservices where throughput and latency matter.

Four streaming modes address different patterns: unary (simple RPC), server streaming (one request, multiple responses), client streaming (multiple requests, one response), and bidirectional streaming (both stream independently). Real-time systems like Uber’s dispatch rely on bidirectional streaming.

Protocol Buffers provide type safety and schema evolution through numbered fields. You can add new fields without breaking old clients, but never reuse field numbers. This prevents breaking changes in polyglot microservices.

Load balancing is complex due to persistent HTTP/2 connections. L4 load balancers don’t distribute traffic evenly. Use L7 proxies (Envoy) or client-side load balancing with service discovery (Consul, etcd).

gRPC’s weaknesses: poor browser support (needs gRPC-Web proxy), binary protocol makes debugging harder (can’t curl endpoints), and ecosystem maturity lags REST. For public APIs and simple CRUD apps, REST remains the better choice.