Async Request-Reply Pattern Explained
After working through this topic, you will be able to:
- Implement async request-reply pattern using message queues and correlation IDs
- Design callback mechanisms for long-running operations
- Evaluate when async request-reply is preferable to synchronous RPC
TL;DR
The async request-reply pattern decouples long-running backend operations from frontend clients by returning an immediate acknowledgment with a correlation ID, then delivering results later via polling or callbacks. This prevents clients from blocking while waiting for operations like video transcoding, report generation, or payment processing that take seconds to minutes.
Cheat Sheet: Client sends request → Backend returns 202 Accepted + correlation ID → Client polls status endpoint or registers callback → Backend completes work → Client retrieves result using correlation ID.
The Problem It Solves
Modern applications frequently need to perform operations that take too long for synchronous HTTP request-response cycles. When a user uploads a video for transcoding, generates a complex financial report, or initiates a payment that requires fraud checks, holding the HTTP connection open for 30 seconds or more creates a terrible user experience and wastes server resources. Synchronous blocking ties up connection pools, makes clients vulnerable to timeouts, and forces backends to maintain expensive long-lived connections. The problem intensifies at scale: if 10,000 users simultaneously request operations that each take 45 seconds, you need infrastructure to maintain 10,000 concurrent connections, most of which sit idle while waiting. Cloud providers charge for connection time, and load balancers often have hard timeout limits (typically 60-120 seconds). You need a way to acknowledge requests immediately, free up connections, and deliver results when processing completes—without forcing clients to implement complex distributed systems patterns themselves.
Solution Overview
The async request-reply pattern breaks the request-response cycle into three distinct phases. First, the client sends a request and immediately receives a 202 Accepted response containing a correlation ID and a status endpoint URL. The backend queues the work and returns control to the client within milliseconds. Second, the backend processes the request asynchronously—this might involve calling multiple services, waiting for external APIs, or performing compute-intensive operations. Third, the client retrieves results either by polling the status endpoint or receiving a callback webhook when processing completes. The correlation ID acts as a claim ticket that ties the original request to its eventual response across time and system boundaries. This pattern transforms a blocking operation into a non-blocking one without requiring clients to understand message queues, distributed transactions, or event-driven architectures. The backend can use any asynchronous processing mechanism—message queues, worker pools, serverless functions—while presenting a simple HTTP interface to clients.
Async Request-Reply System Architecture
graph LR
Client["Client<br/><i>Web/Mobile App</i>"] -->|"1. POST /api/job<br/>{params}"| LB["Load Balancer"]
LB --> API["API Gateway<br/><i>Validates & assigns ID</i>"]
API -->|"2. Publish message<br/>{correlation_id, params}"| Queue[("Message Queue<br/><i>SQS/RabbitMQ</i>")]
API -->|"3. Insert status<br/>{id, PENDING}"| StatusDB[("Status Database<br/><i>PostgreSQL</i>")]
API -.->|"4. 202 Accepted<br/>{id, status_url}"| Client
Queue -->|"5. Consume"| Worker1["Worker 1<br/><i>Processing</i>"]
Queue -->|"5. Consume"| Worker2["Worker 2<br/><i>Processing</i>"]
Queue -->|"5. Consume"| Worker3["Worker 3<br/><i>Processing</i>"]
Worker1 & Worker2 & Worker3 -->|"6. Update progress<br/>{id, PROCESSING, 45%}"| StatusDB
Worker1 & Worker2 & Worker3 -->|"7. Store result"| Blob[("Blob Storage<br/><i>S3/Azure Blob</i>")]
Worker1 & Worker2 & Worker3 -->|"8. Update final status<br/>{id, COMPLETED, result_url}"| StatusDB
Client -.->|"9. Poll: GET /api/job/{id}<br/>(every 1s, 2s, 4s...)"| API
API -->|"10. Query status"| StatusDB
API -.->|"11. Return status<br/>{status, progress, result_url}"| Client
Worker1 -.->|"12. Optional: POST webhook<br/>{id, result}"| Client
subgraph Frontend Layer
Client
LB
end
subgraph Backend Layer
API
Queue
Worker1
Worker2
Worker3
end
subgraph Storage Layer
StatusDB
Blob
end
Complete async request-reply architecture showing how components interact to decouple long-running operations from HTTP connections. The message queue enables horizontal scaling of workers while the status database provides a single source of truth for operation state.
How It Works
Step 1: Client Initiates Request. The client sends a POST request to initiate a long-running operation. For example, POST /api/reports with parameters for a complex analytics report. The API gateway validates the request and assigns a unique correlation ID (typically a UUID). Instead of processing the report synchronously, the gateway publishes a message to a queue with the correlation ID and request parameters, then immediately returns 202 Accepted with a response body like {"id": "550e8400-e29b-41d4-a716-446655440000", "status_url": "/api/reports/550e8400-e29b-41d4-a716-446655440000"}. This entire flow completes in under 100ms.
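The submission step above can be sketched in a few lines. This is a minimal, framework-free illustration using in-memory stand-ins: JOB_QUEUE, STATUS_DB, and submit_job are illustrative names, not part of any real library; in production the queue would be SQS/RabbitMQ and the status store a database.

```python
import uuid

# In-memory stand-ins for the message queue and status database.
JOB_QUEUE: list[dict] = []
STATUS_DB: dict[str, dict] = {}

def submit_job(params: dict) -> tuple[int, dict]:
    """Validate the request, queue the work, and acknowledge immediately."""
    correlation_id = str(uuid.uuid4())
    # Record the job as PENDING before acknowledging, so a status query
    # issued right after the 202 never sees an unknown ID.
    STATUS_DB[correlation_id] = {"status": "PENDING", "progress": 0}
    JOB_QUEUE.append({"id": correlation_id, "params": params})
    body = {"id": correlation_id,
            "status_url": f"/api/reports/{correlation_id}"}
    return 202, body  # 202 Accepted: queued, not yet processed

status_code, body = submit_job({"report_type": "analytics"})
print(status_code, body["status_url"])
```

Note the ordering: the status record is written before the 202 is returned, closing the race where a fast client polls an ID the backend has not yet registered.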
Step 2: Backend Processes Asynchronously. Worker processes consume messages from the queue and perform the actual work. For the report generation, this might involve querying multiple databases, aggregating data, and rendering a PDF. The worker updates a status table in the database as it progresses: PENDING → PROCESSING → COMPLETED or FAILED. The correlation ID serves as the primary key for tracking state. If processing fails, the worker can retry with exponential backoff or move the message to a dead-letter queue. The key insight: the original HTTP connection is long gone, but the correlation ID maintains the logical link.
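A worker loop with the status transitions and retry/dead-letter behavior described above might look like this sketch. All names (process_one, generate_report, DEAD_LETTER_QUEUE, MAX_ATTEMPTS) are illustrative; a real worker would consume from a broker and add backoff delays between retries.

```python
import uuid

STATUS_DB: dict[str, dict] = {}
DEAD_LETTER_QUEUE: list[dict] = []
MAX_ATTEMPTS = 3
JOB_QUEUE = [{"id": str(uuid.uuid4()), "params": {"report": "q3"}, "attempts": 0}]

def generate_report(params: dict) -> dict:
    """Placeholder for the actual long-running work."""
    return {"rows": 1024}

def process_one() -> None:
    msg = JOB_QUEUE.pop(0)
    job_id = msg["id"]
    STATUS_DB[job_id] = {"status": "PROCESSING", "progress": 0}
    try:
        generate_report(msg["params"])
    except Exception:
        msg["attempts"] += 1
        if msg["attempts"] < MAX_ATTEMPTS:
            STATUS_DB[job_id]["status"] = "RETRYING"
            JOB_QUEUE.append(msg)          # requeue; real systems delay with backoff
        else:
            STATUS_DB[job_id]["status"] = "FAILED"
            DEAD_LETTER_QUEUE.append(msg)  # park for manual investigation
        return
    STATUS_DB[job_id] = {"status": "COMPLETED",
                         "result_url": f"/api/reports/{job_id}/download"}

process_one()
```

The correlation ID is the only link between the queue message and the status record, which is why it keys both.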
Step 3: Client Retrieves Results (Polling). The client periodically calls GET /api/reports/550e8400-e29b-41d4-a716-446655440000 to check status. Early responses return {"status": "PROCESSING", "progress": 45}. When complete, the response includes {"status": "COMPLETED", "result_url": "/api/reports/550e8400-e29b-41d4-a716-446655440000/download"}. Clients typically use exponential backoff polling: check after 1s, 2s, 4s, 8s to balance responsiveness with server load. The status endpoint is cheap to serve—just a database lookup—so polling doesn’t create significant load.
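The exponential-backoff polling loop described above can be written generically. This sketch takes the status-fetching call as a parameter (here simulated with a canned sequence) so the backoff logic stands alone; poll_until_done and its parameters are illustrative names.

```python
import time

def poll_until_done(get_status, initial_delay=1.0, max_delay=8.0, timeout=60.0):
    """Poll a status source with exponential backoff until a terminal state."""
    delay, waited = initial_delay, 0.0
    while waited < timeout:
        status = get_status()
        if status["status"] in ("COMPLETED", "FAILED", "TIMEOUT"):
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # 1s, 2s, 4s, 8s, 8s, ...
    raise TimeoutError("gave up waiting for result")

# Simulated status endpoint: two in-progress responses, then done.
responses = iter([
    {"status": "PROCESSING", "progress": 40},
    {"status": "PROCESSING", "progress": 80},
    {"status": "COMPLETED", "result_url": "/api/reports/abc/download"},
])
result = poll_until_done(lambda: next(responses), initial_delay=0.01)
print(result["status"])  # COMPLETED
```

Capping the delay at max_delay keeps worst-case notification latency bounded even for very long jobs.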
Step 3 Alternative: Callback Webhooks. Instead of polling, the client can provide a callback URL in the initial request: {"callback_url": "https://client.com/webhooks/report-complete"}. When processing finishes, the backend POSTs results to this URL with the correlation ID. This eliminates polling overhead but requires clients to expose a publicly accessible endpoint and handle webhook security (signature verification, replay protection). Stripe uses this pattern extensively for payment processing—you initiate a payment and receive a webhook when it settles.
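The signature verification mentioned above is typically an HMAC over the raw payload with a shared secret. A minimal stdlib sketch (function names and the secret are illustrative; providers like Stripe also fold a timestamp into the signed message for replay protection):

```python
import hashlib
import hmac

def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature the sender attaches as a header."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, signature_header)

secret = b"shared-webhook-secret"
payload = b'{"id": "550e8400", "status": "COMPLETED"}'
sig = sign_payload(secret, payload)
print(verify_webhook(secret, payload, sig))                 # True: valid delivery
print(verify_webhook(secret, b'{"tampered": true}', sig))   # False: rejected
```

hmac.compare_digest avoids timing side channels that a plain == comparison would leak.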
Step 4: Timeout and Cleanup. The backend sets a maximum processing time (e.g., 5 minutes). If work doesn’t complete, it marks the request as TIMEOUT and may retry or alert operators. Status records are typically retained for 24-48 hours to allow clients to retrieve results, then purged to manage storage costs. This prevents unbounded growth of the status table.
Async Request-Reply Flow with Polling
sequenceDiagram
participant Client
participant API Gateway
participant Queue
participant Worker
participant Database
participant Storage
Client->>API Gateway: 1. POST /api/reports<br/>{params}
API Gateway->>API Gateway: 2. Generate correlation ID<br/>(UUID)
API Gateway->>Queue: 3. Publish message<br/>{id, params}
API Gateway->>Database: 4. Insert status record<br/>{id, PENDING}
API Gateway-->>Client: 5. 202 Accepted<br/>{id, status_url}
Note over Client,API Gateway: Connection closed (~100ms)
Worker->>Queue: 6. Consume message
Worker->>Database: 7. Update status<br/>{id, PROCESSING}
Worker->>Worker: 8. Generate report<br/>(30-60 seconds)
Worker->>Storage: 9. Store result<br/>(PDF/JSON)
Worker->>Database: 10. Update status<br/>{id, COMPLETED, result_url}
loop Polling with exponential backoff
Client->>API Gateway: 11. GET /api/reports/{id}
API Gateway->>Database: 12. Query status
API Gateway-->>Client: 13. {status: PROCESSING}
Note over Client: Wait 1s, 2s, 4s, 8s...
end
Client->>API Gateway: 14. GET /api/reports/{id}
API Gateway->>Database: 15. Query status
API Gateway-->>Client: 16. {status: COMPLETED,<br/>result_url}
Client->>API Gateway: 17. GET result_url
API Gateway->>Storage: 18. Fetch result
API Gateway-->>Client: 19. Return report data
Complete async request-reply flow showing the three phases: immediate acknowledgment with correlation ID, asynchronous backend processing, and client polling for results. The original HTTP connection closes in under 100ms while work continues in the background.
Timeout and Retry Handling Flow
flowchart TB
Start([Worker receives message]) --> Process[Start processing]
Process --> Check{Processing<br/>complete?}
Check -->|Yes| Success[Update status:<br/>COMPLETED]
Success --> Store[Store result in<br/>blob storage]
Store --> Notify[Send webhook<br/>if configured]
Notify --> Cleanup[Schedule cleanup<br/>after 24-48h]
Cleanup --> End([Done])
Check -->|No| Timeout{Exceeded<br/>max time<br/>5 min?}
Timeout -->|No| Progress[Update progress<br/>in database]
Progress --> Check
Timeout -->|Yes| RetryCount{Retry<br/>count &lt; 3?}
RetryCount -->|Yes| Requeue[Requeue with<br/>exponential backoff]
Requeue --> UpdateRetry[Update status:<br/>RETRYING]
UpdateRetry --> Start
RetryCount -->|No| Failed[Update status:<br/>FAILED]
Failed --> DLQ[Move to dead-letter<br/>queue for investigation]
DLQ --> Alert[Alert operations team]
Alert --> End
Process --> Error{Error<br/>occurred?}
Error -->|Yes| Transient{Transient<br/>error?}
Transient -->|Yes| RetryCount
Transient -->|No| Failed
Error -->|No| Check
Backend worker timeout and retry logic showing how the system handles long-running operations, transient failures, and permanent errors. Automatic retries with exponential backoff prevent cascading failures while dead-letter queues capture operations that need manual intervention.
Variants
Polling-Based Variant: Client repeatedly queries status endpoint until completion. When to use: Clients are simple web browsers or mobile apps that can’t receive webhooks. Pros: No client infrastructure required, works behind firewalls, simple to implement. Cons: Creates polling traffic, adds latency (client must wait for next poll), wastes client resources checking status.
Webhook Callback Variant: Backend pushes results to client-provided URL when ready. When to use: Server-to-server communication where both parties can expose HTTP endpoints. Pros: Zero polling overhead, immediate notification, efficient resource usage. Cons: Requires client to expose public endpoint, adds webhook security complexity, fails if client is temporarily unavailable. Stripe, Twilio, and GitHub use this for event notifications.
Hybrid Variant: Backend supports both polling and optional webhooks. Client can poll for immediate needs but also register a webhook for background processing. When to use: Public APIs serving diverse client types. Pros: Maximum flexibility, clients choose their preferred pattern. Cons: More complex to implement and document. AWS S3 event notifications support both CloudWatch polling and SNS/SQS webhooks.
Result Caching Variant: Backend stores completed results in blob storage and returns a pre-signed URL. When to use: Results are large (videos, reports, datasets) and expensive to regenerate. Pros: Clients can download results multiple times, reduces backend load for repeated requests. Cons: Requires managing blob storage lifecycle, adds storage costs. YouTube uses this for processed videos.
Polling vs Webhook Callback Comparison
graph TB
subgraph Polling Pattern
PC[Client] -->|1. POST /api/job| PG[API Gateway]
PG -->|2. 202 Accepted + ID| PC
PC -.->|3. Poll every 2s| PG
PC -.->|4. Poll every 4s| PG
PC -.->|5. Poll every 8s| PG
PG -->|6. COMPLETED + result| PC
PNote["✓ No client infrastructure<br/>✓ Works behind firewalls<br/>✗ Polling overhead<br/>✗ Added latency"]
end
subgraph Webhook Pattern
WC[Client] -->|1. POST /api/job<br/>{callback_url}| WG[API Gateway]
WG -->|2. 202 Accepted + ID| WC
WG -->|3. Process async| WW[Worker]
WW -->|4. POST callback_url<br/>{id, result}| WC
WNote["✓ Zero polling traffic<br/>✓ Immediate notification<br/>✗ Requires public endpoint<br/>✗ Webhook security needed"]
end
subgraph Hybrid Pattern
HC[Client] -->|1. POST /api/job<br/>{callback_url: optional}| HG[API Gateway]
HG -->|2. 202 Accepted + ID| HC
HC -.->|3a. Can poll if needed| HG
HG -->|3b. Webhook if provided| HC
HNote["✓ Maximum flexibility<br/>✓ Serves diverse clients<br/>✗ More complex to implement"]
end
Three variants of async request-reply showing different result delivery mechanisms. Polling suits simple clients, webhooks optimize for efficiency, and hybrid approaches support both patterns for maximum flexibility.
Trade-offs
Complexity vs Responsiveness: Synchronous APIs are simpler to implement and understand—one request, one response. Async request-reply adds correlation IDs, status tracking, and polling/webhook logic. But synchronous APIs force clients to wait, potentially timing out on slow operations. Decision criteria: Use async when operations take >5 seconds or have unpredictable duration. The added complexity pays for itself in improved user experience and resource efficiency.
Polling vs Webhooks: Polling is simpler for clients (no endpoint exposure) but creates constant background traffic and adds latency. Webhooks are efficient but require clients to handle security, retries, and idempotency. Decision criteria: Use polling for browser/mobile clients or when latency isn’t critical. Use webhooks for server-to-server integration where efficiency matters. Offer both for public APIs.
Stateful vs Stateless: Storing status in a database makes the pattern stateful, requiring database lookups for every status check. Stateless alternatives (encoding status in signed tokens) eliminate database dependency but can’t be updated once issued. Decision criteria: Use stateful tracking for long-running operations where progress updates matter. Use stateless tokens for simple “job submitted” acknowledgments.
Immediate vs Eventual Consistency: Async patterns accept eventual consistency—the client sees “processing” for some time before seeing “completed.” Synchronous APIs provide immediate consistency. Decision criteria: Use async when the business process naturally has latency (payment clearing, video encoding). Avoid async for operations where users expect instant feedback (liking a post, sending a message).
When to Use (and When Not To)
Use async request-reply when: Operations take more than 5 seconds to complete (video transcoding, ML model inference, complex report generation). You need to integrate with external services that have unpredictable latency (payment gateways, third-party APIs). The operation involves multiple steps that might fail independently (multi-stage data pipelines). You want to decouple frontend responsiveness from backend processing capacity. Your infrastructure has connection timeout limits that operations might exceed.
Avoid async request-reply when: Operations complete in under 1 second—the overhead of correlation IDs and status tracking isn’t worth it. Users expect immediate feedback (CRUD operations, real-time chat). The operation must complete before the user can proceed (authentication, authorization checks). You’re building an internal API where all clients are under your control and can use message queues directly. The added complexity of polling or webhooks outweighs the benefits.
Anti-patterns: Using async for operations that are actually fast but occasionally slow (fix the slow cases instead). Implementing async without proper timeout handling (leads to zombie requests). Polling too frequently (creates unnecessary load) or too infrequently (poor user experience). Not providing progress updates for long operations (users don’t know if the system is working). Forgetting to clean up completed status records (database bloat).
Real-World Examples
company: Stripe
system: Payment Processing API
implementation: When you create a payment intent, Stripe returns a 200 OK with status: requires_action and a client secret. The actual payment processing happens asynchronously, involving fraud checks, 3D Secure authentication, and bank communication. Stripe sends webhook events (payment_intent.succeeded, payment_intent.failed) to your callback URL when processing completes. You can also poll the payment intent endpoint to check status. This pattern handles the reality that payment processing can take 5-30 seconds and occasionally requires user interaction.
interesting_detail: Stripe includes an idempotency_key header that clients can set to safely retry requests. If you retry with the same key, Stripe returns the original response instead of creating a duplicate payment. This solves the “did my request succeed?” problem when network failures occur during async operations.
company: AWS Lambda
system: Asynchronous Invocation
implementation: When you invoke a Lambda function asynchronously (using the Event invocation type), AWS returns a 202 Accepted immediately with a request ID. Lambda queues the invocation and processes it when capacity is available. You can configure a destination (SQS, SNS, EventBridge) to receive success or failure notifications. For long-running workflows, Step Functions extends this pattern by orchestrating multiple async Lambda invocations with state tracking.
interesting_detail: AWS automatically retries failed async invocations twice with exponential backoff, then sends the event to a dead-letter queue. This built-in retry logic means you don’t need to implement it yourself, but you must design Lambda functions to be idempotent since they might execute multiple times.
company: Microsoft Azure
system: Durable Functions
implementation: Azure Durable Functions implements async request-reply using the orchestrator pattern. When you start an orchestration, you receive a 202 Accepted with URLs for checking status, sending events, or terminating the orchestration. The framework handles correlation IDs, status persistence, and retry logic automatically. Orchestrations can run for days or weeks, checkpointing progress to survive infrastructure failures.
interesting_detail: Durable Functions provides a built-in HTTP API that returns status URLs following a standard format: {host}/runtime/webhooks/durabletask/instances/{instanceId}. This eliminates the need to build custom status endpoints. The framework also supports external events—you can POST to the instance URL to send signals to running orchestrations, enabling human-in-the-loop workflows.
Interview Essentials
Mid-Level
Explain the three phases of async request-reply (submit, process, retrieve) and why correlation IDs are necessary. Implement a basic polling client that checks status every 2 seconds with exponential backoff. Describe how to store status in a database and what fields are needed (id, status, result, created_at, updated_at). Discuss the difference between 202 Accepted (request queued) and 200 OK (request completed). Explain why you can’t use HTTP status codes alone to track async operations.
Senior
Design a complete async request-reply system including queue selection, worker scaling, status API, and webhook delivery. Implement idempotency using request IDs to handle duplicate submissions. Handle edge cases: what happens if the client never polls for results? How do you prevent status table bloat? How do you handle partial failures in multi-step operations? Compare polling vs webhooks and recommend when to use each. Discuss timeout strategies: should you fail fast or retry indefinitely? Explain how to provide progress updates for long-running operations (percentage complete, current step).
Idempotency and Duplicate Request Handling
sequenceDiagram
participant Client
participant API
participant Cache
participant Queue
participant Worker
participant DB
Note over Client,DB: First Request (Network Success)
Client->>API: 1. POST /api/job<br/>Idempotency-Key: abc-123
API->>Cache: 2. Check if key exists
Cache-->>API: 3. Key not found
API->>Cache: 4. Store key → correlation_id<br/>(TTL: 24h)
API->>DB: 5. Insert {id, PENDING}
API->>Queue: 6. Publish message
API-->>Client: 7. 202 Accepted<br/>{id: uuid-456}
Note over Client,DB: Retry (Client thinks request failed)
Client->>API: 8. POST /api/job<br/>Idempotency-Key: abc-123<br/>(Same key!)
API->>Cache: 9. Check if key exists
Cache-->>API: 10. Found! Returns uuid-456
API->>DB: 11. Query existing status
DB-->>API: 12. {id: uuid-456, PROCESSING}
API-->>Client: 13. 202 Accepted<br/>{id: uuid-456}<br/>(Same response!)
Note over Client,API: No duplicate work created
Note over Client,DB: Worker Processing (Idempotent)
Queue->>Queue: 14. Message redelivered<br/>(Worker crashed)
Queue->>Worker: 15. Process message<br/>{id: uuid-456}
Worker->>DB: 16. Check status
DB-->>Worker: 17. Already COMPLETED
Worker->>Worker: 18. Skip processing<br/>(Idempotent check)
Note over Worker: No duplicate execution
Idempotency implementation using client-provided keys to handle duplicate requests safely. The cache stores the mapping from idempotency key to correlation ID, ensuring retries return the same response without creating duplicate work. Workers also check status before processing to handle message redelivery.
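The key-to-correlation-ID mapping in the diagram reduces to a small lookup on the submit path. A minimal in-memory sketch (IDEMPOTENCY_CACHE and submit are illustrative names; a real system would use Redis or a database with a TTL on the key):

```python
import uuid

IDEMPOTENCY_CACHE: dict[str, str] = {}  # idempotency key -> correlation ID
STATUS_DB: dict[str, dict] = {}

def submit(params: dict, idempotency_key: str) -> tuple[int, dict]:
    """Return the same correlation ID for retries that reuse a key."""
    if idempotency_key in IDEMPOTENCY_CACHE:
        job_id = IDEMPOTENCY_CACHE[idempotency_key]
        # Replay the original acknowledgment; no duplicate work is queued.
        return 202, {"id": job_id, "status": STATUS_DB[job_id]["status"]}
    job_id = str(uuid.uuid4())
    IDEMPOTENCY_CACHE[idempotency_key] = job_id
    STATUS_DB[job_id] = {"status": "PENDING"}
    # ... publish {job_id, params} to the queue here ...
    return 202, {"id": job_id, "status": "PENDING"}

_, first = submit({"report": "q3"}, "abc-123")
_, retry = submit({"report": "q3"}, "abc-123")  # client retry, same key
print(first["id"] == retry["id"])  # True: no duplicate job created
```

Note the check-then-store here is not atomic; under concurrent retries a real implementation would use an atomic set-if-absent (e.g. Redis SETNX or a unique database constraint).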
Staff+
Architect async request-reply at scale: how do you handle 1M concurrent operations? Design a system that survives database failures without losing track of in-flight requests. Implement exactly-once delivery semantics for webhooks despite network failures and retries. Discuss the CAP theorem implications: can you guarantee that status queries always reflect current state? Design a multi-tenant system where different customers have different SLAs (some need results in 1 minute, others can wait 1 hour). Explain how to migrate from synchronous to async APIs without breaking existing clients. Discuss observability: what metrics and traces are essential for debugging async operations? How do you implement circuit breakers for downstream services in async workflows?
Common Interview Questions
Why not just use WebSockets or Server-Sent Events instead of polling? (Answer: Async request-reply works over standard HTTP, doesn’t require persistent connections, and is firewall-friendly. WebSockets are better for real-time bidirectional communication, not one-time async operations.)
How do you prevent clients from polling too frequently and overwhelming your servers? (Answer: Implement rate limiting on status endpoints, return Retry-After headers suggesting next poll time, or use exponential backoff recommendations in documentation.)
What happens if the backend crashes while processing a request? (Answer: The message queue retains the work item and another worker picks it up. The status remains ‘PROCESSING’ until the new worker updates it. This is why idempotency matters—the operation might execute multiple times.)
How long should you retain completed status records? (Answer: Depends on your use case. Stripe retains webhook events for 30 days. AWS Lambda keeps logs for 7 days by default. Balance client needs for result retrieval against storage costs. Implement automatic cleanup with configurable TTLs.)
Red Flags to Avoid
Claiming async request-reply is always better than synchronous APIs (it adds complexity that’s only justified for long-running operations)
Not implementing timeouts (leads to zombie requests that consume resources forever)
Polling without exponential backoff (creates unnecessary server load)
Forgetting to make operations idempotent (causes duplicate processing when retries occur)
Not providing a way for clients to cancel in-flight requests (wastes resources on work nobody wants)
Storing large results in the status table instead of using blob storage (causes database bloat)
Not documenting expected processing times (clients don’t know how long to wait before considering a request failed)
Key Takeaways
Async request-reply decouples long-running operations from HTTP request-response cycles by returning 202 Accepted with a correlation ID immediately, then delivering results via polling or webhooks when processing completes.
Correlation IDs are the glue that ties requests to responses across time and system boundaries. They enable stateful tracking without requiring persistent connections.
Choose polling for simple clients (browsers, mobile apps) that can’t expose endpoints. Choose webhooks for server-to-server integration where efficiency matters. Offer both for public APIs.
Idempotency is critical because async operations may execute multiple times due to retries. Use request IDs or correlation IDs to detect and ignore duplicates.
The pattern trades simplicity for responsiveness and scalability. Use it when operations take >5 seconds or have unpredictable duration. Avoid it for fast operations where the overhead isn’t justified.