Communication Protocols in System Design
After this topic, you will be able to:
- Differentiate between synchronous and asynchronous communication patterns in distributed systems
- Analyze the trade-offs between request-response, streaming, and bidirectional communication models
- Map communication protocols to appropriate use cases based on latency, throughput, and reliability requirements
TL;DR
Communication protocols define how distributed systems exchange data. Transport protocols (TCP, UDP) move bytes across the network, while application protocols (HTTP, REST, RPC, GraphQL, gRPC) define message structure and semantics. Understanding the trade-offs between synchronous/asynchronous patterns, request-response/streaming models, and protocol characteristics is essential for designing scalable systems.
Cheat Sheet:
- Synchronous: Caller waits for response (REST, RPC). Simple but couples services.
- Asynchronous: Fire-and-forget or callback-based (message queues, webhooks). Decouples but adds complexity.
- Streaming: Continuous data flow (gRPC streams, WebSockets). Efficient for real-time data.
- Protocol layers: Transport (TCP/UDP) vs Application (HTTP/REST/RPC). Choose based on reliability needs and latency tolerance.
Why This Matters
No modern system exists in isolation. Whether you’re building a mobile app that talks to backend services, microservices that coordinate to fulfill orders, or data pipelines that process events, communication protocols are the connective tissue of distributed systems. In interviews, your ability to choose the right protocol reveals whether you understand the fundamental trade-offs that shape system behavior: latency versus throughput, consistency versus availability, simplicity versus flexibility.
The protocol landscape has exploded beyond simple HTTP request-response. Netflix uses gRPC for internal microservice communication to achieve sub-10ms latency. Uber’s real-time location tracking relies on WebSockets for bidirectional streaming. Stripe’s webhook system uses asynchronous HTTP callbacks to decouple payment processing from merchant systems. Each choice reflects deep understanding of requirements: How much latency can users tolerate? Do we need guaranteed delivery? Can clients maintain persistent connections? Will messages fit in a single packet?
Interviewers use protocol selection to probe your systems thinking. When you choose REST over gRPC, or TCP over UDP, you’re making statements about reliability requirements, client capabilities, operational complexity, and failure modes. A senior engineer doesn’t just know what protocols exist—they understand why each exists, what problems it solves, and what new problems it introduces. This topic provides the mental framework to navigate these decisions confidently, whether you’re designing a chat system, a video streaming platform, or a distributed transaction coordinator.
The Landscape
The communication protocol landscape operates across two fundamental layers. At the transport layer, TCP and UDP provide the foundation for reliable or fast data transport. TCP guarantees ordered, error-checked delivery at the cost of latency overhead—essential for financial transactions or file transfers where every byte matters. UDP sacrifices reliability for speed, making it ideal for video streaming or gaming where occasional packet loss is acceptable but lag is not. These protocols handle the mechanics of getting bits from one machine to another.
At the application layer, protocols define how services structure and interpret messages. HTTP remains the universal language of the web, powering everything from browser requests to API calls. The REST architectural style layers on top of HTTP, using standard methods (GET, POST, PUT, DELETE) and resource-oriented URLs to create intuitive APIs. RPC (Remote Procedure Call) frameworks like gRPC treat network calls like local function invocations, optimizing for performance with binary serialization and HTTP/2 multiplexing. GraphQL introduces a query language that lets clients request exactly the data they need, solving the over-fetching and under-fetching problems of REST.
Beyond these established protocols, specialized patterns address specific needs. WebSockets enable full-duplex communication for real-time applications like chat or collaborative editing. Server-Sent Events (SSE) provide server-to-client streaming over standard HTTP. Message queues like RabbitMQ and Kafka implement publish-subscribe patterns for asynchronous, event-driven architectures. Each protocol occupies a niche in the design space, optimized for particular trade-offs.
The key insight is that protocols exist in a hierarchy. You might use TCP for reliable transport, HTTP/2 for efficient multiplexing, gRPC for structured RPC calls, and Protocol Buffers for compact serialization—all in a single system. Understanding how these layers compose and where each adds value is what separates junior engineers who know protocol names from senior engineers who design robust distributed systems.
Communication Protocol Stack: Transport and Application Layers
graph TB
subgraph Application Layer
REST["REST API<br/><i>Resource-oriented</i>"]
gRPC["gRPC<br/><i>Binary RPC</i>"]
GraphQL["GraphQL<br/><i>Query Language</i>"]
WS["WebSockets<br/><i>Bidirectional</i>"]
SSE["Server-Sent Events<br/><i>Server Push</i>"]
end
subgraph Transfer Protocols
HTTP1["HTTP/1.1<br/><i>Text-based</i>"]
HTTP2["HTTP/2<br/><i>Multiplexing</i>"]
WS_Proto["WebSocket Protocol<br/><i>Full-duplex</i>"]
end
subgraph Transport Layer
TCP["TCP<br/><i>Reliable, Ordered</i>"]
UDP["UDP<br/><i>Fast, Unreliable</i>"]
end
REST --> HTTP1
REST --> HTTP2
GraphQL --> HTTP1
GraphQL --> HTTP2
gRPC --> HTTP2
SSE --> HTTP1
WS --> WS_Proto
HTTP1 --> TCP
HTTP2 --> TCP
WS_Proto --> TCP
HTTP3["HTTP/3<br/><i>QUIC</i>"] -."runs over".-> UDP
Communication protocols form a layered hierarchy: application protocols (REST, gRPC, GraphQL) build on transfer protocols (HTTP/1.1, HTTP/2), which in turn rely on transport protocols (TCP, UDP). Each layer solves specific problems: TCP and UDP handle reliable or fast delivery, HTTP provides request-response semantics, and application protocols define message structure and API patterns.
Communication Patterns: Request-Response, Streaming, and Asynchronous
graph LR
subgraph Request-Response Pattern
C1["Client"] --"1. Request"--> S1["Server"]
S1 --"2. Response"--> C1
end
subgraph Server Streaming
C2["Client"] --"1. Request"--> S2["Server"]
S2 --"2. Stream Response 1"--> C2
S2 --"3. Stream Response 2"--> C2
S2 --"4. Stream Response N"--> C2
end
subgraph Bidirectional Streaming
C3["Client"] <--"Continuous<br/>Data Flow"--> S3["Server"]
end
subgraph Asynchronous Pattern
P["Publisher"] --"1. Publish Event"--> Q["Message Queue"]
Q --"2. Consume (later)"--> Sub1["Subscriber 1"]
Q --"3. Consume (later)"--> Sub2["Subscriber 2"]
end
Four fundamental communication patterns serve different use cases: Request-response (REST, RPC) for synchronous interactions where immediate response is needed; Server streaming (SSE, gRPC streams) for continuous server-to-client data flow; Bidirectional streaming (WebSockets) for real-time two-way communication; and Asynchronous messaging (queues, events) for decoupled, fire-and-forget interactions.
Synchronous vs Asynchronous Communication Trade-offs
graph TB
subgraph Synchronous Communication
SC_Client["Client"] --"1. Request (blocks)"--> SC_ServiceA["Service A"]
SC_ServiceA --"2. Call Service B"--> SC_ServiceB["Service B"]
SC_ServiceB --"3. Call Service C"--> SC_ServiceC["Service C"]
SC_ServiceC --"4. Response"--> SC_ServiceB
SC_ServiceB --"5. Response"--> SC_ServiceA
SC_ServiceA --"6. Response"--> SC_Client
SC_Note["❌ Tight Coupling<br/>❌ Latency Propagation<br/>❌ Failure Cascade<br/>✅ Simple to Debug<br/>✅ Immediate Feedback"]
end
subgraph Asynchronous Communication
AC_Client["Client"] --"1. Publish Event"--> AC_Queue["Message Queue"]
AC_Client -."2. Continue immediately".-> AC_Client
AC_Queue --"3. Consume (later)"--> AC_ServiceA["Service A"]
AC_ServiceA --"4. Publish Event"--> AC_Queue2["Message Queue"]
AC_Queue2 --"5. Consume (later)"--> AC_ServiceB["Service B"]
AC_Note["✅ Loose Coupling<br/>✅ Fault Tolerance<br/>✅ Parallel Processing<br/>❌ Complex Error Handling<br/>❌ Eventual Consistency"]
end
Synchronous communication creates tight coupling where Service A’s response time depends on Service B and C, propagating latency and failures through the call chain. Asynchronous communication decouples services through message queues, allowing each service to process independently and continue even if downstream services are temporarily unavailable, but at the cost of complexity in error handling and eventual consistency.
Key Areas
Communication Patterns
Request-response is the most common pattern: client sends a request, waits for a response, then proceeds. REST APIs, database queries, and RPC calls all follow this synchronous model. It’s simple to reason about but creates tight coupling—the caller blocks until the callee responds. If the callee is slow or unavailable, the caller suffers.
Streaming patterns send continuous data flows rather than discrete messages. Server streaming (one request, many responses) powers live sports scores or stock tickers. Client streaming (many requests, one response) enables file uploads or sensor data aggregation. Bidirectional streaming (many requests, many responses) supports real-time collaboration or multiplayer games.
Asynchronous patterns decouple sender and receiver through message queues or event buses. The sender publishes a message and continues immediately; the receiver processes it later. This improves resilience (receivers can be temporarily down) and scalability (multiple receivers can process messages in parallel) but adds complexity around message ordering, duplicate handling, and eventual consistency.
Choosing the right pattern depends on whether you need immediate feedback, can tolerate delays, and how tightly coupled your services should be.
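The request-response and streaming shapes above can be sketched with plain Python: a single return versus a generator that yields many responses. (The match IDs and score values here are invented for illustration.)

```python
from typing import Dict, Iterator

def get_score(match_id: str) -> Dict[str, str]:
    """Request-response: one request, one complete answer."""
    return {"match": match_id, "score": "1-0"}

def stream_scores(match_id: str) -> Iterator[Dict[str, str]]:
    """Server streaming: one request, a sequence of incremental updates."""
    for update in ["0-0", "1-0", "1-1"]:  # stand-in for a live feed
        yield {"match": match_id, "score": update}

# The caller of get_score blocks once and is done; the caller of
# stream_scores consumes updates as they arrive.
final = get_score("m1")
updates = list(stream_scores("m1"))
```

The generator is a useful mental model: gRPC server streaming and SSE both look like "iterate over responses" from the client's point of view.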
Synchronous vs Asynchronous Communication
Synchronous communication means the caller waits for a response before continuing. HTTP requests, gRPC calls, and database queries are synchronous. This model is intuitive—you ask a question, you get an answer—but it propagates failures and latency. If Service A calls Service B synchronously, and B calls C, then C’s slowness affects A’s response time. A cascade of synchronous calls creates a distributed monolith where one slow service degrades the entire system.
Asynchronous communication decouples services through message queues, event streams, or webhooks. Service A publishes an event and moves on; Service B processes it later. This improves fault tolerance (B can be down temporarily), enables parallel processing (multiple B instances consume messages), and smooths traffic spikes (the queue acts as a buffer). However, asynchronous systems are harder to debug (no stack traces across services), require careful handling of message ordering and idempotency, and complicate error handling (what if B fails to process a message?).
In practice, most systems use both: synchronous for user-facing requests where immediate feedback matters, asynchronous for background processing, analytics, and cross-service notifications. The decision hinges on whether the caller needs an immediate answer and whether temporary unavailability is acceptable.
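A minimal in-process sketch of the asynchronous pattern, using Python’s standard `queue` and `threading` modules as a stand-in for a real broker; the event names are illustrative:

```python
import queue
import threading

events = queue.Queue()
processed = []

def worker() -> None:
    """Consumer: drains events whenever they arrive, independently of the producer."""
    while True:
        msg = events.get()
        if msg is None:           # sentinel: shut down
            break
        processed.append(f"handled:{msg}")
        events.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# Producer publishes and continues immediately; it never blocks on the consumer.
events.put("order-created")
events.put("payment-captured")
events.put(None)                  # signal shutdown
t.join()
```

A real queue adds durability, acknowledgments, and redelivery on top of this shape, but the decoupling idea is the same: the producer’s latency is independent of the consumer’s.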
Transport Layer Protocols (TCP vs UDP)
TCP (Transmission Control Protocol) provides reliable, ordered, error-checked delivery. It establishes a connection, acknowledges received packets, retransmits lost ones, and ensures bytes arrive in order. This reliability comes at a cost: connection setup overhead (three-way handshake), retransmission delays when packets are lost, and head-of-line blocking (later packets wait for earlier ones to arrive). TCP is the foundation for HTTP, gRPC, database connections, and any application where data integrity is non-negotiable.
UDP (User Datagram Protocol) is connectionless and unreliable. It sends packets without acknowledgment, doesn’t retransmit losses, and doesn’t guarantee order. This makes UDP fast and lightweight—ideal for video streaming (occasional dropped frames are acceptable), online gaming (old position updates are irrelevant), DNS queries (small enough to fit in one packet), and VoIP (real-time audio can’t wait for retransmissions).
The choice between TCP and UDP is fundamental: Do you need every byte to arrive correctly, or is speed more important than perfection? For most business logic, TCP’s reliability is worth the overhead. For real-time media or high-frequency telemetry, UDP’s speed wins. Some modern protocols like QUIC (used by HTTP/3) build reliability features on top of UDP to get the best of both worlds.
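The difference is visible with the standard `socket` module: UDP fires off a datagram with no connection setup, while TCP must complete a handshake (`accept`) before any bytes flow. A loopback sketch (ports are chosen by the OS):

```python
import socket
import threading

# --- UDP: connectionless; send a datagram with no handshake ---
udp_srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_srv.bind(("127.0.0.1", 0))                # OS picks a free port
udp_port = udp_srv.getsockname()[1]

def udp_echo():
    data, addr = udp_srv.recvfrom(1024)
    udp_srv.sendto(data, addr)                # echo back; no delivery guarantee

threading.Thread(target=udp_echo, daemon=True).start()

udp_cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_cli.settimeout(2)
udp_cli.sendto(b"ping", ("127.0.0.1", udp_port))
udp_reply, _ = udp_cli.recvfrom(1024)

# --- TCP: connection-oriented; handshake, then an ordered byte stream ---
tcp_srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_srv.bind(("127.0.0.1", 0))
tcp_srv.listen(1)
tcp_port = tcp_srv.getsockname()[1]

def tcp_echo():
    conn, _ = tcp_srv.accept()                # three-way handshake completes here
    conn.sendall(conn.recv(1024))
    conn.close()

threading.Thread(target=tcp_echo, daemon=True).start()

tcp_cli = socket.create_connection(("127.0.0.1", tcp_port), timeout=2)
tcp_cli.sendall(b"ping")
tcp_reply = tcp_cli.recv(1024)
tcp_cli.close()
```

Both round-trips succeed on loopback; on a lossy network, only the TCP path would retransmit on your behalf.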
Application Layer Protocols
HTTP is the universal application protocol, powering web browsers, mobile apps, and API integrations. It’s text-based, stateless, and human-readable, making it easy to debug and widely supported. REST leverages HTTP’s methods and status codes to create resource-oriented APIs that feel intuitive. However, HTTP/1.1’s request-response model suffers from head-of-line blocking and connection overhead. HTTP/2 introduced multiplexing (multiple requests over one connection), header compression, and server push, dramatically improving performance for modern web applications.
gRPC builds on HTTP/2 with binary serialization (Protocol Buffers), strongly-typed service definitions, and built-in streaming support. It’s faster and more efficient than REST but requires code generation and isn’t browser-friendly without proxies. GraphQL provides a query language that lets clients specify exactly what data they need, solving REST’s over-fetching (getting too much data) and under-fetching (needing multiple requests) problems. It’s powerful for complex UIs with varying data needs but adds server-side complexity and can be harder to cache.
Each protocol optimizes for different priorities: HTTP/REST for simplicity and ubiquity, gRPC for performance and type safety, GraphQL for flexible data fetching. The right choice depends on your clients (browsers vs services), performance requirements (milliseconds matter?), and team expertise.
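To see why binary serialization matters, compare the wire size of the same record as JSON versus a fixed binary layout built with the standard `struct` module. This is a crude stand-in for Protocol Buffers, and the field layout is invented for illustration:

```python
import json
import struct

# One logical record: user_id (uint32), balance_cents (uint64), active (bool)
record = {"user_id": 42, "balance_cents": 1999, "active": True}

as_json = json.dumps(record).encode("utf-8")
as_binary = struct.pack(
    "<IQ?", record["user_id"], record["balance_cents"], record["active"]
)  # little-endian uint32 + uint64 + bool = 13 bytes

print(len(as_json), len(as_binary))  # the binary form is several times smaller
```

JSON repeats field names in every message; a schema-based binary format carries only the values, which is part of why gRPC payloads are smaller and cheaper to parse.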
Real-Time Communication Protocols
WebSockets enable full-duplex, bidirectional communication over a single TCP connection. After an HTTP handshake, the connection upgrades to WebSocket, allowing both client and server to send messages anytime. This is essential for chat applications, live dashboards, multiplayer games, and collaborative editing where updates must flow in both directions with minimal latency.
Server-Sent Events (SSE) provide server-to-client streaming over standard HTTP, making them simpler than WebSockets for one-way updates like live sports scores or stock prices. SSE automatically reconnects on disconnect and works through HTTP proxies without special configuration.
Long polling is a fallback technique where the client makes an HTTP request that the server holds open until new data is available, then immediately makes another request. It’s less efficient than WebSockets or SSE but works everywhere HTTP works.
The choice depends on whether you need bidirectional communication (WebSockets), server-to-client streaming (SSE), or maximum compatibility (long polling). For details on implementing these patterns, see the long-polling-websockets-sse topic.
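SSE is just a line-oriented text format over HTTP: `data:` lines accumulate, an optional `event:` line names the event type, and a blank line terminates each event. A minimal parser sketch (handles only these two fields; real streams also carry `id:` and `retry:`):

```python
from typing import List, Tuple

def parse_sse(raw: str) -> List[Tuple[str, str]]:
    """Parse a raw SSE stream into (event_type, data) pairs.

    Events are separated by blank lines; the default event type is "message".
    """
    events = []
    event_type, data_lines = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:   # blank line terminates an event
            events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

stream = "data: score 1-0\n\nevent: goal\ndata: home team scores\n\n"
print(parse_sse(stream))
```

That the whole format fits in a dozen lines of parsing is exactly why SSE is the low-complexity choice for one-way server push.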
Protocol Selection Decision Tree
Choosing the right protocol requires mapping requirements to protocol characteristics. Start with latency sensitivity: If sub-100ms response times matter (trading systems, gaming, real-time collaboration), consider gRPC or WebSockets. If seconds are acceptable (batch processing, analytics), REST or message queues work fine. Next, evaluate message size and frequency: Small, frequent messages favor binary protocols like gRPC or Protocol Buffers. Large, infrequent messages (file uploads, reports) work fine with HTTP multipart or chunked encoding.
Client type heavily influences protocol choice. Browser-based clients almost always use HTTP/REST or WebSockets because browsers don’t natively support other protocols. Mobile apps can use gRPC for efficiency but must handle network transitions gracefully. Server-to-server communication has the most flexibility—gRPC excels here for performance, but REST remains popular for simplicity and debugging. Reliability requirements determine the transport protocol: TCP for guaranteed delivery (payments, orders, user data), UDP for speed over reliability (video streaming, telemetry, gaming).
Consider communication direction: One-way requests (client asks, server responds) suit REST or RPC. Server-initiated updates (notifications, live data) need WebSockets, SSE, or webhooks. Bidirectional conversations (chat, collaboration) require WebSockets or bidirectional gRPC streams. Coupling tolerance matters too: Synchronous protocols (REST, gRPC) tightly couple services, propagating failures and latency. Asynchronous patterns (message queues, event streams) decouple services but complicate error handling and debugging.
Finally, weigh operational complexity: REST over HTTP is universally understood, easy to debug with curl, and works through firewalls and proxies. gRPC requires code generation, binary debugging tools, and proxy configuration. WebSockets need connection management and reconnection logic. Message queues add infrastructure (Kafka, RabbitMQ) and operational overhead. Start simple (REST) and evolve to specialized protocols (gRPC, WebSockets) only when requirements demand it. The best protocol is the simplest one that meets your needs.
Protocol Selection Decision Tree
flowchart TB
Start["Protocol Selection"] --> Latency{"Latency<br/>Requirement?"}
Latency -->|"< 100ms"| RealTime{"Bidirectional?"}
Latency -->|"Seconds OK"| Async["Message Queue<br/>Kafka, RabbitMQ"]
RealTime -->|"Yes"| WS["WebSockets<br/>Chat, Collaboration"]
RealTime -->|"No"| ServerPush{"Server Push<br/>Only?"}
ServerPush -->|"Yes"| SSE["Server-Sent Events<br/>Live Updates"]
ServerPush -->|"No"| ClientType{"Client Type?"}
ClientType -->|"Browser"| REST["REST API<br/>HTTP/JSON"]
ClientType -->|"Mobile App"| Mobile{"Performance<br/>Critical?"}
ClientType -->|"Service-to-Service"| Internal{"Performance<br/>Critical?"}
Mobile -->|"Yes"| gRPC_Mobile["gRPC<br/>Binary, Efficient"]
Mobile -->|"No"| REST_Mobile["REST API<br/>Simple, Compatible"]
Internal -->|"Yes"| gRPC_Internal["gRPC<br/>Type-safe, Fast"]
Internal -->|"No"| REST_Internal["REST API<br/>Easy to Debug"]
Async --> Reliability{"Guaranteed<br/>Delivery?"}
Reliability -->|"Yes"| TCP_Queue["TCP-based Queue<br/>At-least-once"]
Reliability -->|"No"| UDP_Stream["UDP Streaming<br/>Best-effort"]
Protocol selection follows a decision tree based on requirements: latency sensitivity determines whether you need real-time protocols; communication direction (bidirectional vs server-push) narrows choices to WebSockets or SSE; client type and performance requirements guide the choice between REST and gRPC; and reliability needs determine whether to use TCP-based or UDP-based transport.
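The decision tree above can be encoded as a small helper, handy for sanity-checking a design discussion. The categories and branch order mirror the flowchart and are deliberate simplifications, not hard rules:

```python
def pick_protocol(latency_sensitive: bool, bidirectional: bool,
                  server_push_only: bool, client: str,
                  performance_critical: bool = False) -> str:
    """Toy encoding of the protocol-selection decision tree."""
    if not latency_sensitive:
        return "message queue (Kafka, RabbitMQ)"
    if bidirectional:
        return "WebSockets"
    if server_push_only:
        return "Server-Sent Events"
    if client == "browser":
        return "REST"
    # Mobile apps and service-to-service calls can trade simplicity for speed.
    return "gRPC" if performance_critical else "REST"

print(pick_protocol(True, True, False, "browser"))         # chat UI
print(pick_protocol(True, False, False, "service", True))  # internal RPC
print(pick_protocol(False, False, False, "service"))       # background job
```

In an interview, walking through branches like these out loud demonstrates that your choice follows from requirements rather than habit.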
How Things Connect
Communication protocols form a layered stack where each layer solves specific problems. At the bottom, TCP and UDP handle reliable or fast transport across networks. These transport protocols are covered in depth in the tcp and udp topics. On top of TCP, HTTP provides a request-response model with methods, headers, and status codes—the foundation for web communication, detailed in http.
Application protocols build on HTTP to address specific use cases. REST applies architectural constraints (statelessness, resource-orientation, uniform interface) to create intuitive, cacheable APIs, explored in rest. RPC frameworks like gRPC optimize for performance with binary serialization and HTTP/2 multiplexing, making remote calls feel like local function invocations—see rpc and grpc. GraphQL introduces a query language that shifts control to clients, letting them request exactly the data they need, covered in graphql.
For real-time, bidirectional communication, WebSockets upgrade HTTP connections to enable full-duplex streaming. Server-Sent Events provide server-to-client streaming over standard HTTP. Long polling offers a fallback for environments where persistent connections aren’t supported. These patterns are compared in long-polling-websockets-sse.
The key insight is that these protocols aren’t mutually exclusive—they compose. A modern system might use REST for public APIs (simplicity, cacheability), gRPC for internal microservices (performance, type safety), WebSockets for real-time dashboards (bidirectional streaming), and message queues for asynchronous background processing (decoupling, resilience). Understanding how these pieces fit together—when to use each, how they interact, what trade-offs they make—is what enables you to design systems that are both performant and maintainable.
Protocol Composition in a Modern System
graph TB
subgraph External Clients
Browser["Web Browser"]
Mobile["Mobile App"]
ThirdParty["Third-party API"]
end
subgraph API Gateway Layer
Gateway["API Gateway<br/><i>REST/HTTP</i>"]
WSGateway["WebSocket Gateway<br/><i>Real-time</i>"]
end
subgraph Internal Services
AuthService["Auth Service"]
UserService["User Service"]
OrderService["Order Service"]
NotificationService["Notification Service"]
end
subgraph Async Processing
Queue["Message Queue<br/><i>Kafka</i>"]
Worker["Background Worker"]
end
Browser --"REST/HTTP"--> Gateway
Mobile --"REST/HTTP"--> Gateway
Mobile --"WebSocket"--> WSGateway
ThirdParty --"REST/HTTP"--> Gateway
Gateway --"gRPC"--> AuthService
Gateway --"gRPC"--> UserService
Gateway --"gRPC"--> OrderService
WSGateway --"gRPC"--> NotificationService
OrderService --"Publish Event"--> Queue
Queue --"Consume"--> Worker
Worker --"gRPC"--> NotificationService
Real-world systems compose multiple protocols strategically: REST/HTTP for external-facing APIs (compatibility, simplicity), WebSockets for real-time client updates (bidirectional streaming), gRPC for internal microservice communication (performance, type safety), and message queues for asynchronous background processing (decoupling, resilience). Each protocol serves its optimal use case.
Real-World Context
Real-world systems rarely use a single protocol—they compose multiple protocols to meet different requirements. Netflix uses REST for public APIs that third-party developers integrate with, ensuring broad compatibility and ease of use. Internally, Netflix microservices communicate via gRPC to achieve the sub-10ms latency required for real-time recommendations and playback decisions. For asynchronous processing like encoding videos or updating search indexes, Netflix uses message queues to decouple services and handle traffic spikes gracefully.
Uber’s real-time location tracking demonstrates protocol composition at scale. Mobile apps use WebSockets to stream driver locations to riders with minimal latency. The backend uses gRPC for high-throughput communication between microservices—matching riders to drivers, calculating ETAs, routing trips. For less time-sensitive operations like billing or analytics, Uber employs Kafka event streams to process millions of events per second asynchronously. This layered approach optimizes each interaction for its specific latency, throughput, and reliability requirements.
Stripe’s payment platform shows how asynchronous communication enables resilience. When a merchant processes a payment, Stripe’s API responds synchronously with immediate success or failure. But for subsequent events—payment captured, refund processed, dispute created—Stripe uses webhooks (asynchronous HTTP callbacks) to notify merchants. This decouples Stripe’s internal processing from merchant systems, allowing Stripe to retry failed notifications and merchants to handle events at their own pace. Stripe also provides REST APIs for merchants to poll for updates, giving them flexibility in how they receive information.
Slack combines multiple protocols for different features. The web and desktop clients use WebSockets for real-time message delivery, ensuring instant updates when someone posts in a channel. For file uploads and downloads, Slack uses standard HTTP multipart requests. Slack’s APIs expose REST endpoints for integrations and bots, prioritizing developer experience and compatibility. Internally, Slack’s microservices communicate via gRPC for performance. This pragmatic mixing of protocols—using the right tool for each job—is characteristic of mature, large-scale systems.
The lesson from production systems is that protocol choice isn’t about finding the “best” protocol—it’s about matching protocols to requirements. Use REST for public APIs where simplicity and compatibility matter. Use gRPC for internal services where performance is critical. Use WebSockets for real-time bidirectional communication. Use message queues for asynchronous, decoupled processing. The best systems use multiple protocols strategically, not dogmatically.
Interview Essentials
Mid-Level
At the mid-level, demonstrate understanding of the protocol landscape and basic trade-offs. Explain the difference between TCP and UDP: TCP guarantees reliable, ordered delivery but adds latency; UDP is fast but unreliable. Describe common application protocols: REST for resource-oriented APIs, gRPC for high-performance RPC, WebSockets for real-time bidirectional communication. Articulate the synchronous vs asynchronous distinction: synchronous calls block until response (simple but couples services), asynchronous messaging decouples services but complicates error handling. When designing a system, choose protocols that match requirements: REST for public APIs (compatibility), gRPC for internal microservices (performance), message queues for background processing (decoupling). Show you understand that most systems use multiple protocols, not just one. Be ready to explain why you’d use HTTP/REST for a mobile app backend (browser compatibility, simplicity) but gRPC for service-to-service calls (speed, type safety).
Senior
Senior engineers must justify protocol choices with concrete trade-offs and production considerations. When choosing between REST and gRPC, discuss not just performance (gRPC is faster) but operational complexity (REST is easier to debug, works through proxies, has better tooling). Explain how synchronous communication propagates latency and failures: if Service A calls B calls C, C’s slowness affects A’s response time. Describe when asynchronous patterns (message queues, event streams) improve resilience and scalability, but acknowledge the complexity they add: message ordering, idempotency, eventual consistency, debugging distributed traces. Discuss protocol layering: how HTTP/2 multiplexing improves REST performance, how gRPC builds on HTTP/2 and Protocol Buffers, how WebSockets upgrade HTTP connections. Explain real-time communication trade-offs: WebSockets for bidirectional streaming (chat, collaboration), SSE for server-to-client updates (live dashboards), long polling as a fallback (maximum compatibility). Show awareness of failure modes: what happens when a WebSocket disconnects? How do you handle message queue backlogs? How do you version gRPC service definitions? Reference real systems: Netflix uses gRPC internally for performance, Stripe uses webhooks for asynchronous notifications, Slack uses WebSockets for real-time messaging.
Staff+
Staff-plus engineers must demonstrate strategic thinking about communication architecture across an entire organization. Discuss how protocol choices affect system evolution: REST’s loose coupling enables independent service deployment, but gRPC’s strong typing catches breaking changes at compile time. Explain the organizational implications: REST APIs are easier for external developers to adopt, gRPC requires code generation and language-specific tooling. Describe how to migrate between protocols: running REST and gRPC in parallel, using API gateways to translate protocols, versioning strategies. Discuss observability: how do you trace requests across synchronous calls (distributed tracing with correlation IDs), how do you monitor asynchronous message flows (event tracking, dead letter queues). Explain capacity planning: how protocol choice affects throughput (binary protocols like gRPC pack more requests per connection), latency (HTTP/2 multiplexing reduces head-of-line blocking), and operational costs (WebSocket connections consume server resources). Address security: how do you authenticate and authorize gRPC calls (mTLS, JWT tokens), how do you secure WebSocket connections (origin validation, token-based auth), how do you prevent message queue poisoning (schema validation, rate limiting). Show you can design communication patterns that balance performance, reliability, operational complexity, and team capabilities—and evolve them as requirements change.
Common Interview Questions
When would you choose REST over gRPC? Choose REST for public APIs where compatibility and ease of integration matter more than raw performance. REST works in browsers without special libraries, is easy to debug with curl, and has mature tooling. Choose gRPC for internal microservices where performance is critical, type safety matters, and you control both client and server. gRPC’s binary serialization and HTTP/2 multiplexing typically deliver several times the throughput of JSON-over-HTTP REST, though the exact gain depends on payload shape and workload.
How do you handle real-time updates in a web application? For server-to-client updates (live dashboards, notifications), use Server-Sent Events (SSE) for simplicity or WebSockets if you also need client-to-server messages. For bidirectional communication (chat, collaboration), WebSockets are the standard choice. Implement reconnection logic with exponential backoff, and consider long polling as a fallback for restrictive networks. Use message queuing on the backend to buffer updates and handle connection failures gracefully.
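The reconnection logic mentioned above is usually jittered exponential backoff. A sketch, where `connect` is a placeholder for whatever dials the WebSocket or SSE endpoint:

```python
import random
import time

def reconnect_with_backoff(connect, max_attempts=6, base_delay=0.5, cap=30.0):
    """Call `connect` until it succeeds, sleeping an exponentially growing,
    jittered delay between failures. Re-raises if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            delay = min(cap, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids thundering herds
```

In a real client you would also reset the attempt counter after a healthy connection, and resubscribe or replay missed events (SSE’s `Last-Event-ID` header exists for exactly this).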
What are the trade-offs between synchronous and asynchronous communication? Synchronous communication (REST, gRPC) is simpler to reason about and debug—you make a call, you get a response. But it couples services tightly, propagates failures and latency, and limits scalability. Asynchronous communication (message queues, event streams) decouples services, improves resilience (receivers can be temporarily down), and enables parallel processing. However, it complicates error handling (what if a message fails?), requires careful handling of ordering and idempotency, and makes debugging harder (no stack traces across services). Use synchronous for user-facing requests where immediate feedback matters, asynchronous for background processing and cross-service notifications.
How do you ensure reliable message delivery in a distributed system? At the network layer, use TCP for guaranteed delivery. At the application layer, implement idempotency (processing the same message multiple times has the same effect as processing it once) and at-least-once delivery semantics. Use message queues with acknowledgments: the receiver explicitly confirms successful processing before the message is removed from the queue. Implement dead letter queues for messages that fail repeatedly after retries. For critical operations like payments, use transactional outbox pattern: write the message to a database in the same transaction as the business logic, then reliably publish it to the message queue. Monitor queue depths and processing rates to detect backlogs early.
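The idempotency and dead-letter ideas above can be sketched together: the consumer deduplicates by message ID so at-least-once redeliveries are no-ops, retries transient failures, and parks poison messages for inspection. All names here are illustrative:

```python
processed_ids = set()     # in production: a durable store, not process memory
results = []
dead_letters = []

def handle(msg: dict, process, max_retries: int = 3) -> None:
    """At-least-once consumer: dedupe by message id, retry, then dead-letter."""
    if msg["id"] in processed_ids:     # duplicate delivery: already handled
        return
    for _ in range(max_retries):
        try:
            results.append(process(msg["payload"]))
            processed_ids.add(msg["id"])
            return
        except Exception:
            continue                   # transient failure: retry
    dead_letters.append(msg)           # exhausted retries: park for inspection

handle({"id": "m1", "payload": "charge"}, str.upper)
handle({"id": "m1", "payload": "charge"}, str.upper)      # redelivery: ignored
handle({"id": "m2", "payload": "boom"}, lambda p: 1 / 0)  # poison message
```

The transactional outbox pattern closes the remaining gap: recording `processed_ids` and the side effect in the same database transaction makes "exactly-once effect" hold even if the process crashes mid-message.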
Red Flags to Avoid
Claiming one protocol is always better. Saying “gRPC is always faster than REST” or “WebSockets are always better than HTTP” shows lack of nuance. Every protocol has trade-offs. gRPC is faster but harder to debug and less compatible. WebSockets enable real-time communication but complicate connection management. Good engineers choose protocols based on requirements, not dogma.
Not understanding the OSI model layers. Confusing TCP/UDP (transport layer) with HTTP/REST/gRPC (application layer) suggests shallow understanding. You should know that HTTP runs on top of TCP, that gRPC uses HTTP/2, and that WebSockets upgrade HTTP connections. Protocol layering is fundamental to system design.
Ignoring operational complexity. Choosing gRPC or WebSockets without acknowledging the operational overhead (code generation, binary debugging, connection management, proxy configuration) shows you haven’t run these systems in production. Always discuss how you’ll debug, monitor, and operate the protocols you choose.
Not considering failure modes. Failing to discuss what happens when connections drop, messages are lost, or services are unavailable suggests you haven’t thought through resilience. Every protocol has failure modes: synchronous calls propagate failures, WebSockets need reconnection logic, message queues need dead letter handling. Address these proactively.
Overengineering with exotic protocols. Jumping straight to gRPC, GraphQL, or WebSockets for simple use cases shows poor judgment. Start with the simplest protocol that meets requirements (usually REST), and evolve to specialized protocols only when you have concrete performance or functionality needs. Premature optimization is a red flag.
Key Takeaways
Communication protocols operate in layers: Transport protocols (TCP, UDP) handle reliable or fast delivery; application protocols (HTTP, REST, RPC, GraphQL, gRPC) define message structure and semantics. Understanding this layering is essential for choosing the right protocol stack.
Synchronous vs asynchronous is a fundamental trade-off: Synchronous communication (REST, gRPC) is simple but couples services and propagates failures. Asynchronous communication (message queues, webhooks) decouples services and improves resilience but complicates error handling and debugging. Most systems use both strategically.
Match protocols to requirements, not trends: Use REST for public APIs (compatibility, simplicity), gRPC for internal microservices (performance, type safety), WebSockets for real-time bidirectional communication (chat, collaboration), and message queues for background processing (decoupling, scalability). The best protocol is the simplest one that meets your needs.
Real-world systems compose multiple protocols: Netflix uses REST for public APIs, gRPC for internal services, and message queues for asynchronous processing. Uber combines WebSockets for real-time location tracking, gRPC for microservices, and Kafka for event streaming. Don’t pick one protocol—design a communication architecture that uses the right tool for each job.
Operational complexity matters as much as performance: gRPC is faster than REST, but it requires code generation, binary debugging tools, and proxy configuration. WebSockets enable real-time communication, but they need connection management and reconnection logic. Always consider how you’ll debug, monitor, and operate the protocols you choose, not just their theoretical performance.