Choreography Pattern: Saga & Event-Driven Coordination

After this topic, you will be able to:

Compare choreography vs orchestration for service coordination
Design event-driven workflows using choreography pattern
Evaluate trade-offs between decoupling benefits and debugging complexity
Assess when choreography is preferable to orchestration

TL;DR

Choreography is a distributed coordination pattern where services communicate through events without a central controller. Each service listens for events, performs its work, and publishes new events for others to consume. Unlike orchestration, no single service owns the workflow—the business process emerges from independent service interactions.

Cheat Sheet: Event-driven coordination | No central controller | Services react to domain events | High decoupling, harder debugging | Best for: loosely coupled domains, event-driven architectures, saga compensations

The Problem It Solves

In microservices architectures, coordinating multi-step business processes across service boundaries creates a fundamental tension. You need services to work together to complete workflows like order fulfillment (inventory check → payment → shipping → notification), but tight coupling through direct service-to-service calls creates brittle systems. When Service A calls Service B, which calls Service C, you’ve created a dependency chain where any failure cascades and changes ripple across boundaries.

The traditional solution—a central orchestrator that commands each service—solves coordination but reintroduces coupling at the workflow level. The orchestrator becomes a god service that knows too much about every participant’s internal logic. When the payment service adds a fraud check step, the orchestrator needs updating. When you add a new notification channel, the orchestrator grows more complex. You’ve traded service coupling for orchestrator coupling.

The real problem is this: how do you coordinate distributed workflows while preserving service autonomy? Services should be able to evolve independently, deploy on different schedules, and fail without bringing down the entire system. You need coordination without a coordinator—a way for services to collaborate on business processes while remaining loosely coupled and independently deployable.

Solution Overview

Choreography solves distributed coordination by inverting the control flow. Instead of a central orchestrator telling services what to do, each service watches for domain events and decides independently what actions to take. Services communicate by publishing events to a message broker (Kafka, RabbitMQ, AWS SNS/SQS), and interested services subscribe to relevant event types.

When an order is placed, the Order Service publishes an “OrderCreated” event and moves on without waiting for a reply. The Inventory Service listens for OrderCreated events, reserves stock, and publishes “InventoryReserved.” The Payment Service listens for InventoryReserved, processes payment, and publishes “PaymentCompleted.” Each service knows only the events it consumes and produces. It knows nothing about the other services or the overall workflow.
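The following is a minimal in-process sketch of these mechanics. The `EventBus` class is a hypothetical stand-in for a real broker such as Kafka or RabbitMQ. Delivery is synchronous and nothing is persisted, whereas real brokers add durability, partitioning, and delivery guarantees. The handler names are illustrative.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a broker: synchronous delivery, no persistence."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[tuple[str, dict]] = []  # every published event, in publish order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append((event_type, payload))
        # Deliver to every subscriber; the publisher never learns who they are.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


bus = EventBus()


# Each "service" knows only the event it consumes and the event it emits.
def inventory_on_order_created(evt: dict) -> None:
    bus.publish("InventoryReserved", {"order_id": evt["order_id"]})


def payment_on_inventory_reserved(evt: dict) -> None:
    bus.publish("PaymentCompleted", {"order_id": evt["order_id"]})


bus.subscribe("OrderCreated", inventory_on_order_created)
bus.subscribe("InventoryReserved", payment_on_inventory_reserved)

# The Order Service publishes and moves on.
bus.publish("OrderCreated", {"order_id": "o-1"})
print([event_type for event_type, _ in bus.log])
# ['OrderCreated', 'InventoryReserved', 'PaymentCompleted']
```

A new reaction, such as a fraud check, needs only one more `bus.subscribe("OrderCreated", ...)` call. Neither the Order Service nor the inventory handler changes.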

The business process emerges from these independent reactions. There’s no master workflow definition in code. Instead, the workflow is implicit in how services respond to events. This creates strong decoupling: as long as the event contracts they rely on stay compatible, services can be added, removed, or modified without touching other services. The Payment Service doesn’t know or care that a new Fraud Detection Service started listening to OrderCreated events. The workflow adapts organically as services join the event stream.

This pattern shines in event-driven architectures and saga patterns for distributed transactions. When you need to coordinate across bounded contexts (in Domain-Driven Design terms) or when services are owned by different teams with different release cycles, choreography preserves autonomy while enabling collaboration.

Choreography Architecture: Event-Driven Service Coordination

graph TB
    subgraph Client Layer
        Client["Customer<br/><i>Web/Mobile App</i>"]
    end
    
    subgraph Event-Driven Choreography
        subgraph Message Broker
            Kafka["Kafka / RabbitMQ<br/><i>Event Bus</i>"]
            Topics["Topics:<br/>• order-events<br/>• inventory-events<br/>• payment-events<br/>• shipping-events"]
        end
        
        subgraph Independent Services
            OrderSvc["Order Service<br/><i>Publishes: OrderCreated</i><br/><i>Subscribes: ShipmentCreated</i>"]
            InvSvc["Inventory Service<br/><i>Publishes: InventoryReserved</i><br/><i>Subscribes: OrderCreated</i>"]
            PaySvc["Payment Service<br/><i>Publishes: PaymentCompleted</i><br/><i>Subscribes: InventoryReserved</i>"]
            ShipSvc["Shipping Service<br/><i>Publishes: ShipmentCreated</i><br/><i>Subscribes: PaymentCompleted</i>"]
            NotifSvc["Notification Service<br/><i>Subscribes: PaymentCompleted</i>"]
        end
        
        subgraph Data Stores
            OrderDB[("Order DB")]
            InvDB[("Inventory DB")]
            PayDB[("Payment DB")]
            ShipDB[("Shipping DB")]
        end
    end
    
    Client --"POST /orders"--> OrderSvc
    
    OrderSvc --"Publish events"--> Kafka
    InvSvc --"Publish events"--> Kafka
    PaySvc --"Publish events"--> Kafka
    ShipSvc --"Publish events"--> Kafka
    
    Kafka --"Subscribe"--> InvSvc
    Kafka --"Subscribe"--> PaySvc
    Kafka --"Subscribe"--> ShipSvc
    Kafka --"Subscribe"--> NotifSvc
    Kafka --"Subscribe"--> OrderSvc
    
    OrderSvc --> OrderDB
    InvSvc --> InvDB
    PaySvc --> PayDB
    ShipSvc --> ShipDB

In choreography, services communicate exclusively through a message broker, publishing domain events and subscribing to events they care about. No service calls another directly—the workflow emerges from independent reactions. Each service maintains its own database and can be deployed, scaled, and modified independently. Notice how new services (like Notification) can be added without modifying existing services.

Choreography vs Orchestration

The choice between choreography and orchestration is one of the most common system design decisions you’ll face in distributed systems. Understanding when each pattern fits requires examining multiple dimensions:

Control Flow: Orchestration uses explicit, centralized control—an orchestrator service executes a workflow definition that calls other services in sequence. Choreography uses implicit, distributed control—services react to events independently, and the workflow emerges from their interactions. Choose orchestration when you need a single source of truth for workflow logic (regulatory compliance, audit trails). Choose choreography when services span organizational boundaries or need independent evolution.

Coupling: Orchestration couples services to the orchestrator’s workflow definition. When business logic changes, you modify the orchestrator. Choreography couples services to event schemas. When logic changes, you modify event handlers within each service. Choose orchestration when a single team owns all services and can coordinate changes. Choose choreography when services are owned by different teams or when you expect frequent independent changes.

Visibility: Orchestration provides clear workflow visibility—you can inspect the orchestrator to see the entire process. Choreography requires distributed tracing to reconstruct workflows from event logs. Choose orchestration when you need to explain workflows to non-technical stakeholders or when debugging must be straightforward. Choose choreography when you have mature observability infrastructure (distributed tracing, event logging) and can invest in tooling.

Failure Handling: Orchestration centralizes retry logic and compensation in the orchestrator. Choreography distributes failure handling across services—each service must implement its own retry and compensation logic. Choose orchestration when failure scenarios are complex and require centralized decision-making. Choose choreography when services can handle failures independently and when you need resilience to orchestrator failures.

Scalability: Orchestration can create a bottleneck at the orchestrator, which must track the state of every workflow instance. Choreography scales horizontally: each service scales independently based on its event load. Choose orchestration when workflow volume is moderate and scaling the orchestrator is acceptable. Choose choreography when you need extreme scale or when different services have vastly different load characteristics.

Uber uses choreography for trip lifecycle events (trip requested → driver assigned → trip started → trip completed) because services like surge pricing, driver matching, and fraud detection need to react independently to trip events. They use orchestration for complex workflows like driver onboarding, where a single team owns the multi-week process and needs centralized visibility for support teams.

Orchestration vs Choreography: Control Flow Comparison

graph TB
    subgraph Orchestration Pattern
        O["Orchestrator<br/><i>Central Controller</i>"]
        S1["Inventory Service"]
        S2["Payment Service"]
        S3["Shipping Service"]
        O --"1. Check inventory"--> S1
        S1 --"2. Stock available"--> O
        O --"3. Process payment"--> S2
        S2 --"4. Payment success"--> O
        O --"5. Create shipment"--> S3
        S3 --"6. Shipment created"--> O
    end
    
    subgraph Choreography Pattern
        MB["Message Broker<br/><i>Kafka/RabbitMQ</i>"]
        I["Inventory Service"]
        P["Payment Service"]
        Sh["Shipping Service"]
        Ord["Order Service"]
        Ord --"OrderCreated event"--> MB
        MB --"Subscribe"--> I
        I --"InventoryReserved event"--> MB
        MB --"Subscribe"--> P
        P --"PaymentCompleted event"--> MB
        MB --"Subscribe"--> Sh
    end

Orchestration uses centralized control where the orchestrator commands services sequentially, while choreography uses distributed control where services react independently to events published through a message broker. Notice how orchestration creates a hub-and-spoke dependency on the orchestrator, while choreography eliminates direct service-to-service coupling.

How It Works

Let’s walk through a concrete example: an e-commerce order fulfillment workflow using choreography. This will show how services coordinate without a central controller.

Step 1: Order Creation. A customer submits an order through the Order Service. The service validates the request, persists the order with status “PENDING,” and publishes an “OrderCreated” event to Kafka containing order ID, customer ID, items, and total amount. The Order Service’s job is done—it doesn’t call other services or wait for responses.
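Here is one possible shape for the OrderCreated event. The business fields come from Step 1. The envelope fields (`event_id`, `correlation_id`, `occurred_at`, `schema_version`) are illustrative assumptions, not a standard. They support the deduplication, tracing, and schema-versioning practices discussed later in this topic.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderItem:
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderCreated:
    # Business payload described in Step 1.
    order_id: str
    customer_id: str
    items: tuple[OrderItem, ...]
    total_cents: int  # integer minor units avoid floating-point rounding in money
    # Envelope metadata (illustrative names, not a standard):
    correlation_id: str  # shared by every event in this order's workflow
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique per event, for dedup
    occurred_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    event_type: str = "OrderCreated"
    schema_version: int = 1

    def to_json(self) -> str:
        return json.dumps(asdict(self))


event = OrderCreated(
    order_id="o-42",
    customer_id="c-7",
    items=(OrderItem("SKU-1", 2),),
    total_cents=3998,
    correlation_id="o-42",  # reusing the order ID is one simple choice
)
print(event.to_json())
```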

Step 2: Inventory Reservation. The Inventory Service subscribes to “OrderCreated” events. When it receives one, it checks stock levels for the requested items. If sufficient inventory exists, it reserves the items, updates its database, and publishes an “InventoryReserved” event with order ID and reserved item details. If inventory is insufficient, it publishes an “InventoryInsufficient” event instead. The Inventory Service doesn’t know what happens next—it just publishes the outcome.
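The following is a minimal sketch of the Inventory Service handler. Brokers are commonly configured for at-least-once delivery, so the handler remembers processed `event_id`s and ignores redelivered duplicates. The in-memory stock table, the `publish` callback, and the payload field names are all assumptions made for illustration.

```python
from typing import Callable

Publish = Callable[[str, dict], None]


class InventoryService:
    def __init__(self, stock: dict[str, int], publish: Publish) -> None:
        self.stock = stock                                  # sku -> units on hand
        self.reservations: dict[str, dict[str, int]] = {}  # order_id -> {sku: qty}
        self.processed_event_ids: set[str] = set()         # idempotency guard
        self.publish = publish

    def on_order_created(self, evt: dict) -> None:
        if evt["event_id"] in self.processed_event_ids:
            return  # redelivered duplicate: already handled
        items: dict[str, int] = evt["items"]
        if all(self.stock.get(sku, 0) >= qty for sku, qty in items.items()):
            for sku, qty in items.items():
                self.stock[sku] -= qty
            self.reservations[evt["order_id"]] = dict(items)
            self.publish("InventoryReserved", {"order_id": evt["order_id"], "items": items})
        else:
            self.publish("InventoryInsufficient", {"order_id": evt["order_id"]})
        # In a real service, this record and the stock update would commit in one transaction.
        self.processed_event_ids.add(evt["event_id"])


published: list[tuple[str, dict]] = []
inv = InventoryService({"SKU-1": 5}, lambda t, p: published.append((t, p)))
evt = {"event_id": "e-1", "order_id": "o-1", "items": {"SKU-1": 2}}
inv.on_order_created(evt)
inv.on_order_created(evt)  # redelivered duplicate is ignored
inv.on_order_created({"event_id": "e-2", "order_id": "o-2", "items": {"SKU-1": 10}})
print([t for t, _ in published])  # ['InventoryReserved', 'InventoryInsufficient']
print(inv.stock)                  # {'SKU-1': 3}
```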

Step 3: Payment Processing. The Payment Service subscribes to “InventoryReserved” events (it ignores “InventoryInsufficient” events—those are handled by other services). When inventory is reserved, Payment Service charges the customer’s payment method. On success, it publishes “PaymentCompleted” with transaction ID and amount. On failure, it publishes “PaymentFailed” with error details. Each event represents a state transition in the payment domain.
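The sketch below shows the Payment Service. It consumes only InventoryReserved and turns each charge outcome into exactly one of two events. The `charge` callable is a hypothetical stand-in for a payment gateway. The sketch also assumes, for simplicity, that the incoming event carries `customer_id` and `amount_cents`; a real service might look these up instead.

```python
import uuid
from typing import Callable

Publish = Callable[[str, dict], None]
Charge = Callable[[str, int], bool]  # hypothetical gateway: (customer_id, amount_cents) -> approved?


class PaymentService:
    def __init__(self, charge: Charge, publish: Publish) -> None:
        self.charge = charge
        self.publish = publish

    def subscriptions(self) -> dict[str, Callable[[dict], None]]:
        # The only event this service consumes; InventoryInsufficient is never routed here.
        return {"InventoryReserved": self.on_inventory_reserved}

    def on_inventory_reserved(self, evt: dict) -> None:
        if self.charge(evt["customer_id"], evt["amount_cents"]):
            self.publish("PaymentCompleted", {
                "order_id": evt["order_id"],
                "transaction_id": str(uuid.uuid4()),
                "amount_cents": evt["amount_cents"],
            })
        else:
            self.publish("PaymentFailed", {"order_id": evt["order_id"], "error": "declined"})


def approve_small_amounts(customer_id: str, amount_cents: int) -> bool:
    return amount_cents <= 10_000  # stand-in approval rule for the demo


events: list[tuple[str, dict]] = []
svc = PaymentService(approve_small_amounts, lambda t, p: events.append((t, p)))
svc.on_inventory_reserved({"order_id": "o-1", "customer_id": "c-1", "amount_cents": 3998})
svc.on_inventory_reserved({"order_id": "o-2", "customer_id": "c-1", "amount_cents": 50_000})
print([t for t, _ in events])  # ['PaymentCompleted', 'PaymentFailed']
```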

Step 4: Parallel Reactions. Multiple services react to “PaymentCompleted” simultaneously. The Shipping Service creates a shipment and publishes “ShipmentCreated.” The Notification Service sends a confirmation email. The Loyalty Service awards points. These services don’t coordinate with each other—they independently react to the same event. This parallel processing is a key choreography advantage.
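One way to sketch the fan-out uses Python's `asyncio`. Each subscriber to PaymentCompleted runs as its own task, so a slow or failing subscriber neither blocks nor hides the others. The three handlers and their delays are hypothetical. In Kafka, the same independence typically comes from giving each service its own consumer group, so every group receives every PaymentCompleted event.

```python
import asyncio


async def create_shipment(evt: dict) -> str:
    await asyncio.sleep(0.02)  # simulated I/O
    return f"shipment for {evt['order_id']}"


async def send_confirmation(evt: dict) -> str:
    await asyncio.sleep(0.01)  # simulated I/O
    return f"email for {evt['order_id']}"


async def award_points(evt: dict) -> str:
    raise RuntimeError("loyalty DB unavailable")  # one subscriber fails


SUBSCRIBERS = [create_shipment, send_confirmation, award_points]


async def deliver(evt: dict) -> list:
    # Every handler receives the same event and runs concurrently.
    # return_exceptions=True keeps one failure from masking the other results.
    return await asyncio.gather(*(handler(evt) for handler in SUBSCRIBERS),
                                return_exceptions=True)


results = asyncio.run(deliver({"order_id": "o-1"}))
for handler, result in zip(SUBSCRIBERS, results):
    print(handler.__name__, "->", result)
```

In a real deployment, the failed loyalty handler would retry on its own schedule, and shipping and notification would be unaffected.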

Step 5: Compensation Flow. If payment fails, the Payment Service publishes “PaymentFailed.” The Inventory Service listens for this event and releases the reserved inventory, publishing “InventoryReleased.” The Order Service updates the order status to “CANCELLED.” This compensation logic is distributed—each service knows how to undo its own work when it sees failure events.
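The compensation side can be sketched as two independent handlers, each using hypothetical in-memory state. The Inventory Service undoes its own reservation on PaymentFailed, and the Order Service marks the order cancelled. Neither handler knows the other exists. The release is written to be safe under redelivery, which is an assumption worth making explicit in any real design.

```python
from typing import Callable

Publish = Callable[[str, dict], None]


class InventoryCompensation:
    def __init__(self, stock: dict[str, int], reservations: dict[str, dict[str, int]],
                 publish: Publish) -> None:
        self.stock = stock
        self.reservations = reservations  # order_id -> {sku: qty}
        self.publish = publish

    def on_payment_failed(self, evt: dict) -> None:
        reserved = self.reservations.pop(evt["order_id"], None)
        if reserved is None:
            return  # nothing reserved, or already released: a redelivery is a no-op
        for sku, qty in reserved.items():
            self.stock[sku] += qty
        self.publish("InventoryReleased", {"order_id": evt["order_id"], "items": reserved})


class OrderStatusCompensation:
    def __init__(self, orders: dict[str, str]) -> None:
        self.orders = orders  # order_id -> status

    def on_payment_failed(self, evt: dict) -> None:
        self.orders[evt["order_id"]] = "CANCELLED"


published: list[tuple[str, dict]] = []
inventory = InventoryCompensation({"SKU-1": 3}, {"o-1": {"SKU-1": 2}},
                                  lambda t, p: published.append((t, p)))
orders = OrderStatusCompensation({"o-1": "PENDING"})

failed = {"order_id": "o-1", "error": "declined"}
for handler in (inventory.on_payment_failed, orders.on_payment_failed):
    handler(failed)
inventory.on_payment_failed(failed)  # duplicate delivery does not double-release

print(inventory.stock, orders.orders, [t for t, _ in published])
```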

Step 6: Workflow Completion. The Order Service subscribes to “ShipmentCreated” events. When it receives one, it updates the order status to “SHIPPED” and publishes “OrderShipped.” The workflow is complete, but no single service orchestrated the entire process. The business outcome emerged from independent service reactions to domain events.
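The following sketch models the Order Service's part of the workflow as an explicit state machine. The transition table is an assumption drawn from the steps above, including treating InventoryInsufficient as a cancellation. The table also acts as a guard: a duplicated or late event cannot re-trigger a transition, so OrderShipped is published at most once per order.

```python
from typing import Callable

Publish = Callable[[str, dict], None]

# (current status, incoming event) -> new status; any other combination is ignored.
TRANSITIONS = {
    ("PENDING", "ShipmentCreated"): "SHIPPED",
    ("PENDING", "PaymentFailed"): "CANCELLED",
    ("PENDING", "InventoryInsufficient"): "CANCELLED",
}


class OrderService:
    def __init__(self, publish: Publish) -> None:
        self.status: dict[str, str] = {}
        self.publish = publish

    def create(self, order_id: str) -> None:
        self.status[order_id] = "PENDING"
        self.publish("OrderCreated", {"order_id": order_id})

    def on_event(self, event_type: str, evt: dict) -> None:
        order_id = evt["order_id"]
        new_status = TRANSITIONS.get((self.status.get(order_id), event_type))
        if new_status is None:
            return  # unknown order or invalid transition: ignore
        self.status[order_id] = new_status
        if new_status == "SHIPPED":
            self.publish("OrderShipped", {"order_id": order_id})


log: list[str] = []
svc = OrderService(lambda t, p: log.append(t))
svc.create("o-1")
svc.on_event("ShipmentCreated", {"order_id": "o-1"})
svc.on_event("ShipmentCreated", {"order_id": "o-1"})  # redelivery: no second OrderShipped
svc.create("o-2")
svc.on_event("PaymentFailed", {"order_id": "o-2"})
print(svc.status)  # {'o-1': 'SHIPPED', 'o-2': 'CANCELLED'}
print(log)         # ['OrderCreated', 'OrderShipped', 'OrderCreated']
```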

Notice what’s missing: no service calls another service directly. No service knows the complete workflow. Each service only understands its domain and the events it cares about. This is choreography’s core principle—coordination through events, not commands.

E-commerce Order Fulfillment: Choreographed Event Flow

sequenceDiagram
    participant Customer
    participant OrderSvc as Order Service
    participant Kafka as Event Broker
    participant InvSvc as Inventory Service
    participant PaySvc as Payment Service
    participant ShipSvc as Shipping Service
    participant NotifSvc as Notification Service
    
    Customer->>OrderSvc: Submit Order
    OrderSvc->>OrderSvc: Persist order (PENDING)
    OrderSvc->>Kafka: Publish OrderCreated event
    Note over OrderSvc: Job done, no waiting
    
    Kafka->>InvSvc: OrderCreated event
    InvSvc->>InvSvc: Check & reserve stock
    InvSvc->>Kafka: Publish InventoryReserved
    
    Kafka->>PaySvc: InventoryReserved event
    PaySvc->>PaySvc: Charge payment method
    PaySvc->>Kafka: Publish PaymentCompleted
    
    par Parallel Reactions
        Kafka->>ShipSvc: PaymentCompleted event
        ShipSvc->>ShipSvc: Create shipment
        ShipSvc->>Kafka: Publish ShipmentCreated
    and
        Kafka->>NotifSvc: PaymentCompleted event
        NotifSvc->>Customer: Send confirmation email
    end
    
    Kafka->>OrderSvc: ShipmentCreated event
    OrderSvc->>OrderSvc: Update status (SHIPPED)

The order fulfillment workflow emerges from independent service reactions to domain events. Services publish events after completing their work and subscribe only to the events they care about. Notice how the Shipping and Notification services react in parallel to PaymentCompleted, with no coordination between them.

Choreography Compensation Flow: Handling Payment Failure

graph LR
    subgraph Happy Path
        OC["OrderCreated"] --> IR["InventoryReserved"]
        IR --> PC["PaymentCompleted"]
        PC --> SC["ShipmentCreated"]
    end
    
    subgraph Compensation Path
        IR2["InventoryReserved"] --> PF["PaymentFailed"]
        PF --> IRL["InventoryReleased"]
        PF --> OCA["OrderCancelled"]
    end
    
    subgraph Service Responsibilities
        InvSvc["Inventory Service<br/><i>Listens: OrderCreated, PaymentFailed</i><br/><i>Publishes: InventoryReserved, InventoryReleased</i>"]
        PaySvc["Payment Service<br/><i>Listens: InventoryReserved</i><br/><i>Publishes: PaymentCompleted, PaymentFailed</i>"]
        OrdSvc["Order Service<br/><i>Listens: PaymentFailed</i><br/><i>Publishes: OrderCancelled</i>"]
    end

When payment fails, services independently execute compensation logic by listening for failure events. The Inventory Service releases reserved stock upon receiving PaymentFailed, while the Order Service cancels the order. Each service knows how to undo its own work without central coordination—this distributed compensation is a key choreography pattern.

Variants

Event Sourcing Choreography: Services don’t just publish events—they use events as the primary source of truth for state. Each service rebuilds its state by replaying events from the event log. This variant provides complete audit trails and time-travel debugging capabilities. Use when you need regulatory compliance, complex temporal queries, or the ability to rebuild state from scratch. The trade-off is increased complexity in event schema evolution and storage costs for retaining all events. Netflix uses this variant for billing systems where audit requirements demand complete event history.
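The following minimal sketch shows state rebuilt by replay. Current inventory is a left fold of a pure `apply` function over the event log rather than a mutable row. Event names follow this topic's example. The starting `StockAdded` event and the payload shape are assumptions. Replaying a prefix of the log gives the "time-travel" view.

```python
from functools import reduce


def apply(state: dict[str, int], event: dict) -> dict[str, int]:
    """Pure function: return new stock levels after one event."""
    new_state = dict(state)
    sku, qty = event["sku"], event["qty"]
    if event["type"] in ("StockAdded", "InventoryReleased"):
        new_state[sku] = new_state.get(sku, 0) + qty
    elif event["type"] == "InventoryReserved":
        new_state[sku] = new_state.get(sku, 0) - qty
    return new_state  # unknown event types leave state unchanged


event_log = [
    {"type": "StockAdded", "sku": "SKU-1", "qty": 10},
    {"type": "InventoryReserved", "sku": "SKU-1", "qty": 3},
    {"type": "InventoryReserved", "sku": "SKU-1", "qty": 2},
    {"type": "InventoryReleased", "sku": "SKU-1", "qty": 2},
]

current = reduce(apply, event_log, {})
as_of_third_event = reduce(apply, event_log[:3], {})  # "time travel": replay a prefix
print(current, as_of_third_event)  # {'SKU-1': 7} {'SKU-1': 5}
```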

Saga Choreography: A specialized variant for distributed transactions. Each service publishes both success and failure events, and each step has a compensating action. When a step fails, the services listening for the failure event undo their own work. For example, “PaymentFailed” triggers “InventoryReleased” and “OrderCancelled” events. Use this variant when you need an all-or-nothing business outcome across services without distributed transactions such as two-phase commit. The trade-off is complex compensation logic and a loss of isolation. Until compensation completes, other services can observe intermediate states, such as stock reserved for an order that is later cancelled. This is the most common choreography variant in microservices architectures.

Hybrid Choreography-Orchestration: Critical path steps use orchestration for visibility and control, while ancillary reactions use choreography for decoupling. For example, order → payment → shipping uses an orchestrator, but notifications, analytics, and loyalty points use choreography off the “OrderShipped” event. Use when you need both workflow visibility and extensibility. The trade-off is architectural complexity from mixing patterns. Stripe uses this variant—payment processing is orchestrated, but webhooks enable choreographed reactions for customer systems.
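The hybrid split can be sketched with hypothetical step functions. The critical path runs as an explicit sequence in one orchestrator function, and the compensation decision lives in that one place. Only the final OrderShipped event is published for choreographed consumers such as notifications, analytics, and loyalty points.

```python
from typing import Callable

Publish = Callable[[str, dict], None]
Step = Callable[[str], bool]


def fulfil_order(order_id: str, reserve: Step, pay: Step, ship: Callable[[str], None],
                 release: Callable[[str], None], publish: Publish) -> str:
    """Orchestrated critical path for one order; returns the final status."""
    if not reserve(order_id):
        return "CANCELLED"
    if not pay(order_id):
        release(order_id)  # compensation is decided here, in one place
        return "CANCELLED"
    ship(order_id)
    # Hand-off point: everything after OrderShipped is choreographed.
    publish("OrderShipped", {"order_id": order_id})
    return "SHIPPED"


def always(_order_id: str) -> bool:
    return True


def never(_order_id: str) -> bool:
    return False


shipped: list[str] = []
released: list[str] = []
events: list[str] = []
status_1 = fulfil_order("o-1", always, always, shipped.append, released.append,
                        lambda t, p: events.append(t))
status_2 = fulfil_order("o-2", always, never, shipped.append, released.append,
                        lambda t, p: events.append(t))
print(status_1, status_2, shipped, released, events)
# SHIPPED CANCELLED ['o-1'] ['o-2'] ['OrderShipped']
```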

Trade-offs

Decoupling vs. Visibility: Choreography maximizes service independence—services can be deployed, scaled, and modified without coordinating with others. You gain organizational autonomy and independent evolution. However, you sacrifice workflow visibility. There’s no single place to see the complete business process. To understand what happens when an order is placed, you must trace events across multiple services. Choose choreography when team autonomy and independent deployment are more valuable than centralized workflow understanding. Choose orchestration when stakeholders need to visualize and modify workflows without diving into distributed traces.

Scalability vs. Debuggability: Choreography scales horizontally without bottlenecks—each service processes events independently, and you can add consumers without affecting publishers. You gain extreme scalability and resilience to individual service failures. However, debugging becomes significantly harder. When an order fails, you must correlate events across services using distributed tracing. Race conditions and timing issues are difficult to reproduce. Choose choreography when you need to handle millions of events per second and have mature observability infrastructure. Choose orchestration when debugging must be straightforward and you can accept orchestrator scaling limits.

Flexibility vs. Consistency: Choreography enables easy workflow extension: new services can subscribe to existing events without modifying publishers. You gain flexibility to add features (fraud detection, analytics, A/B testing) without touching core services. However, maintaining consistency is harder. Services might process events in different orders, leading to temporary inconsistencies. Ensuring all services eventually reach the same conclusion requires careful event design and idempotent handlers. Choose choreography when you expect frequent workflow changes and can tolerate eventual consistency. Choose orchestration when you need predictable execution order and a single component that always knows each workflow’s current state. An orchestrated saga is still eventually consistent across services, but its sequencing is explicit.
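One common way to cope with out-of-order delivery is sketched below. It assumes each order event carries a per-order `version` assigned by a single publisher. The consumer keeps the highest version it has applied and discards anything older or equal. Redeliveries then become harmless, and a stale event cannot overwrite newer state. The `OrderViewProjection` read model is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrderView:
    status: str
    version: int


class OrderViewProjection:
    """Hypothetical read model fed by order events that carry a per-order version."""

    def __init__(self) -> None:
        self.views: dict[str, OrderView] = {}

    def apply(self, evt: dict) -> bool:
        current = self.views.get(evt["order_id"])
        if current is not None and evt["version"] <= current.version:
            return False  # stale or duplicate: newer state already applied
        self.views[evt["order_id"]] = OrderView(evt["status"], evt["version"])
        return True


proj = OrderViewProjection()
# Delivered out of order: version 2 arrives before version 1.
proj.apply({"order_id": "o-1", "status": "SHIPPED", "version": 2})
applied = proj.apply({"order_id": "o-1", "status": "PENDING", "version": 1})
print(proj.views["o-1"].status, applied)  # SHIPPED False
```

This "highest version wins" rule suits events that carry the full new state. Delta events, such as "add 2 units", would instead need buffering until any gap in the sequence is filled.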

Resilience vs. Complexity: Choreography removes the orchestrator as a single point of failure, although the message broker itself must be run as highly available infrastructure. Services continue processing events even if other services are down. With a durable broker, events for a failed consumer wait until it recovers. You gain resilience through decentralization. However, failure handling becomes more complex. Each service must implement retry logic, dead letter queues, and compensation logic independently. Ensuring consistent failure behavior across services requires discipline and shared libraries. Choose choreography when you need extreme availability and can invest in distributed failure handling. Choose orchestration when centralized failure management is acceptable and you want simpler service implementations.
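The following is a sketch of per-service failure handling. A consumer wrapper retries a handler a bounded number of times with exponential backoff. It then parks the event in a dead letter queue instead of blocking the stream. The `max_attempts` and delay values and the list-based DLQ are illustrative assumptions. In practice, brokers such as RabbitMQ (dead letter exchanges) and Amazon SQS (redrive policies) provide dead-lettering natively.

```python
import time
from typing import Callable

Handler = Callable[[dict], None]


def consume(evt: dict, handler: Handler, dead_letters: list[dict],
            max_attempts: int = 3, base_delay: float = 0.01) -> bool:
    """Return True if handled, False if the event was dead-lettered."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(evt)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                dead_letters.append({"event": evt, "error": repr(exc), "attempts": attempt})
                return False
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return False  # only reached if max_attempts < 1


calls = {"n": 0}


def flaky(evt: dict) -> None:
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("downstream timeout")  # transient: succeeds on retry


def poison(evt: dict) -> None:
    raise ValueError("missing field 'order_id'")  # permanent: never succeeds


dlq: list[dict] = []
print(consume({"order_id": "o-1"}, flaky, dlq))  # True
print(consume({"id": "bad"}, poison, dlq))       # False
print(len(dlq), dlq[0]["attempts"])              # 1 3
```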

Choreography Trade-offs: Visibility vs Debugging Complexity

graph TB
    subgraph Orchestration: Clear Visibility
        OW["Workflow Definition<br/><i>Single source of truth</i>"]
        OW --> Step1["1. Check Inventory"]
        Step1 --> Step2["2. Process Payment"]
        Step2 --> Step3["3. Create Shipment"]
        Step3 --> Complete["Order Complete"]
        
        Debug1["Debugging: Inspect orchestrator state"] -.-> OW
    end
    
    subgraph Choreography: Distributed Traces
        E1["OrderCreated<br/><i>t=0ms, correlation_id=abc123</i>"]
        E2["InventoryReserved<br/><i>t=45ms, correlation_id=abc123</i>"]
        E3["PaymentCompleted<br/><i>t=120ms, correlation_id=abc123</i>"]
        E4["ShipmentCreated<br/><i>t=200ms, correlation_id=abc123</i>"]
        
        E1 -.-> E2 -.-> E3 -.-> E4
        
        Debug2["Debugging: Query event logs<br/>Reconstruct from traces"] -.-> E1
        Debug2 -.-> E2
        Debug2 -.-> E3
        Debug2 -.-> E4
    end
    
    Trade["Trade-off Decision"]
    Trade --> Choice1["Need workflow visibility?<br/>Choose Orchestration"]
    Trade --> Choice2["Need service autonomy?<br/>Choose Choreography<br/><i>Invest in observability</i>"]

Orchestration provides a single workflow definition that’s easy to inspect and debug, while choreography requires reconstructing workflows from distributed event logs using correlation IDs and tracing tools. The visibility trade-off is choreography’s biggest challenge—you gain service independence but must invest heavily in observability infrastructure to understand system behavior.
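The reconstruction step can be sketched as a query over collected event records. Group them by `correlation_id`, order each group by timestamp, and flag workflows that stopped before an expected terminal event. The record shape mirrors the diagram above. The `expected_end` parameter is an assumption.

```python
from collections import defaultdict


def reconstruct(events: list[dict], expected_end: str = "ShipmentCreated") -> dict[str, dict]:
    by_workflow: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        by_workflow[e["correlation_id"]].append(e)

    report = {}
    for cid, evts in by_workflow.items():
        evts.sort(key=lambda e: e["t_ms"])
        report[cid] = {
            "path": [e["type"] for e in evts],
            "duration_ms": evts[-1]["t_ms"] - evts[0]["t_ms"],
            "complete": evts[-1]["type"] == expected_end,
        }
    return report


log = [  # events as collected from several services, in arbitrary order
    {"correlation_id": "abc123", "type": "PaymentCompleted", "t_ms": 120},
    {"correlation_id": "abc123", "type": "OrderCreated", "t_ms": 0},
    {"correlation_id": "xyz789", "type": "OrderCreated", "t_ms": 10},
    {"correlation_id": "abc123", "type": "ShipmentCreated", "t_ms": 200},
    {"correlation_id": "abc123", "type": "InventoryReserved", "t_ms": 45},
    {"correlation_id": "xyz789", "type": "InventoryReserved", "t_ms": 60},
]

for cid, summary in reconstruct(log).items():
    print(cid, summary)
```

Sorting by wall-clock timestamps assumes reasonably synchronized clocks across services. Distributed tracing systems usually also record parent/child span links so that causal order does not depend on clock accuracy.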

When to Use (and When Not To)

Use choreography when:

Services span organizational boundaries. When different teams or companies own services, choreography preserves autonomy. Each team can evolve their service independently as long as they honor event contracts. This is why B2B integrations and partner ecosystems favor choreography—you can’t force external partners to coordinate deployments with your orchestrator changes.

Workflow changes frequently. When business requirements evolve rapidly and you need to add new reactions to existing events without modifying core services, choreography shines. Adding a new notification channel or analytics pipeline shouldn’t require touching the order service. Event-driven architectures enable this extensibility.

You need extreme scale. When handling millions of events per second, orchestrator bottlenecks become unacceptable. Choreography allows each service to scale independently based on its event load. Uber’s trip events and Netflix’s viewing events require choreography-level scale.

Services are loosely coupled domains. When services represent distinct bounded contexts in Domain-Driven Design terms, choreography respects domain boundaries. Payment, inventory, and shipping are separate domains that shouldn’t be tightly coupled through orchestrator logic.

Avoid choreography when:

Workflows require strict ordering. When steps must execute in a specific sequence with no parallelism, orchestration’s explicit control flow is clearer. Choreography can enforce ordering through event dependencies, but it’s more complex than orchestrator sequence definitions.

Debugging must be straightforward. When non-technical stakeholders need to understand workflows or when your team lacks distributed tracing expertise, orchestration’s centralized visibility is valuable. Choreography requires sophisticated observability infrastructure.

Compensation logic is complex. When failure scenarios require centralized decision-making (should we retry payment or cancel the order?), orchestration’s centralized control simplifies implementation. Choreography distributes these decisions across services, increasing complexity.

You have few services. When your system has 3-5 services owned by a single team, orchestration’s simplicity outweighs choreography’s decoupling benefits. The overhead of event infrastructure and distributed tracing isn’t justified for small systems.

Real-World Examples

Uber: Trip Lifecycle Management

How they use it: Uber uses choreography for coordinating trip events across 50+ microservices. When a trip is requested, the Trip Service publishes a “TripRequested” event. The Surge Pricing Service, Driver Matching Service, Fraud Detection Service, and ETA Calculation Service all react independently to this event. When a driver accepts, “TripAccepted” triggers parallel reactions in the Notification Service (rider SMS), Mapping Service (route calculation), and Payment Service (pre-authorization). No central orchestrator coordinates these services; the trip workflow emerges from independent event reactions.

Interesting detail: Uber’s choreography architecture enables them to add new features like safety checks or accessibility matching without modifying core trip services. New services simply subscribe to existing trip events. However, they maintain a dedicated “Trip Observability” team that builds tooling to visualize event flows across services, addressing choreography’s visibility challenge. They use distributed tracing with correlation IDs to reconstruct trip workflows from event logs.

Netflix: Content Encoding Pipeline

How they use it: When new content is uploaded, Netflix uses choreography to coordinate encoding, quality analysis, thumbnail generation, and metadata extraction. The Upload Service publishes a “ContentUploaded” event. The Encoding Service creates multiple quality versions (4K, 1080p, 720p) and publishes “EncodingCompleted” for each. The Quality Analysis Service validates each encoding, the Thumbnail Service generates preview images, and the CDN Service distributes files to edge locations, all reacting independently to encoding completion events.

Interesting detail: Netflix chose choreography because encoding requirements change frequently (new codecs, quality levels, analysis algorithms). Adding a new encoding format or analysis step doesn’t require modifying the upload service or coordinating deployments. However, they found that debugging encoding failures required building custom tooling to correlate events across services. They now maintain an “Encoding Dashboard” that reconstructs the complete pipeline state from event logs, providing orchestration-like visibility for support teams.

Stripe: Webhook Delivery System

How they use it: Stripe uses choreography for delivering payment events to customer systems. When a payment succeeds, Stripe publishes internal events that trigger multiple reactions: updating the customer’s balance, recording the transaction for accounting, checking for fraud patterns, and queuing webhook deliveries to customer endpoints. Customer systems then react to Stripe’s webhooks with their own choreographed workflows, such as updating order status, sending confirmation emails, and triggering fulfillment.

Interesting detail: Stripe’s webhook system is a real-world example of choreography enabling ecosystem extensibility. Thousands of customer systems react to Stripe events without Stripe knowing or caring about their internal workflows. However, Stripe provides extensive webhook debugging tools (event logs, replay functionality, delivery status) because they recognized that choreography’s visibility challenges affect their customers. They essentially built orchestration-like observability on top of a choreographed system.

Interview Essentials

Mid-Level

At the mid-level, interviewers expect you to explain choreography’s basic mechanics and compare it to orchestration. You should be able to design a simple choreographed workflow (order fulfillment, user registration) and explain how services communicate through events. Demonstrate understanding of event-driven architecture fundamentals: publish-subscribe patterns, message brokers, and event schemas. Be prepared to discuss basic trade-offs: ‘Choreography decouples services but makes debugging harder.’ Show awareness of when choreography fits: ‘When services are owned by different teams and need independent deployment.’ Red flags: confusing choreography with simple pub-sub (choreography coordinates workflows, not just notifications), not understanding the visibility trade-off, or claiming choreography is always better than orchestration.

Senior

Senior engineers must demonstrate deep understanding of choreography’s trade-offs and implementation challenges. Explain how to handle distributed failures: idempotency, dead letter queues, compensation events, and eventual consistency. Discuss event schema evolution: versioning strategies, backward compatibility, and schema registries. Show experience with observability: ‘We used correlation IDs and distributed tracing to reconstruct workflows from event logs.’ Compare choreography variants: saga choreography vs. event sourcing choreography. Be prepared to design complex workflows: ‘How would you handle a multi-step booking process with potential failures at each step?’ Discuss when to choose choreography over orchestration with nuance: ‘We used choreography for the core order flow because services were owned by different teams, but we used orchestration for the returns process because it required complex decision logic.’ Red flags: not addressing failure handling, ignoring observability challenges, or being dogmatic about pattern choice without considering context.

Staff+

Staff+ engineers must demonstrate strategic thinking about choreography’s organizational and architectural implications. Discuss how choreography affects team structure: ‘Event-driven choreography enables teams to own services end-to-end without coordinating deployments.’ Explain how to evolve choreographed systems: ‘We migrated from orchestration to choreography by introducing events alongside existing service calls, then gradually removing direct dependencies.’ Address governance challenges: ‘We established an event catalog and schema registry to prevent event proliferation and ensure consistent event design across teams.’ Show experience with hybrid approaches: ‘We use choreography for the happy path and orchestration for exception handling because stakeholders need visibility into failure scenarios.’ Discuss observability at scale: ‘We built custom tooling to visualize event flows and detect missing event handlers.’ Be prepared to debate architectural philosophy: ‘Choreography optimizes for organizational scalability at the cost of technical complexity—it’s a trade-off worth making when you have 50+ services owned by different teams.’ Red flags: not addressing organizational implications, lacking experience with choreography at scale, or not having opinions on when choreography is the wrong choice.

Common Interview Questions

When would you choose choreography over orchestration? (Answer: When services span team boundaries, when workflows change frequently, when you need extreme scale, or when services represent distinct bounded contexts. Avoid choreography when workflows require strict ordering, when debugging must be straightforward, or when you have few services owned by one team.)

How do you handle failures in choreographed workflows? (Answer: Each service implements idempotency to handle duplicate events, uses dead letter queues for poison messages, publishes compensation events when operations fail, and relies on eventual consistency. Distributed tracing with correlation IDs helps debug failures across services.)

How do you maintain visibility in choreographed systems? (Answer: Use distributed tracing with correlation IDs to reconstruct workflows, maintain an event catalog documenting all events and their consumers, build dashboards that visualize event flows, and implement comprehensive logging. Some teams build ‘workflow reconstruction’ tools that query event logs to show the current state of business processes.)

How do you evolve event schemas without breaking consumers? (Answer: Use a schema registry to version events. Maintain backward compatibility by adding optional fields rather than removing or renaming required ones, and use separate event types for breaking changes. Treat event schemas as published contracts. Many teams also apply Postel’s Law, the robustness principle: be conservative in what you publish, liberal in what you accept.)
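The following sketch shows a tolerant reader for two versions of OrderCreated. The assumed schema change is that v2 added an optional `currency` field. The consumer reads only the fields it depends on and supplies a default for the missing optional field. It ignores keys it does not recognize, so a publisher could roll out v2 (or later additive versions) before every consumer upgrades. The `"USD"` default is a hypothetical policy.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderCreatedView:
    """The subset of OrderCreated this consumer actually depends on."""
    order_id: str
    total_cents: int
    currency: str


def read_order_created(raw: str) -> OrderCreatedView:
    data = json.loads(raw)
    # Read only the needed keys; unknown keys (e.g. new optional fields) are ignored.
    return OrderCreatedView(
        order_id=data["order_id"],             # required in every version
        total_cents=int(data["total_cents"]),
        currency=data.get("currency", "USD"),  # optional since v2; default is assumed
    )


v1 = '{"schema_version": 1, "order_id": "o-1", "total_cents": 3998}'
v2 = ('{"schema_version": 2, "order_id": "o-2", "total_cents": 500, '
      '"currency": "EUR", "gift_note": "hi"}')
print(read_order_created(v1))  # OrderCreatedView(order_id='o-1', total_cents=3998, currency='USD')
print(read_order_created(v2))  # OrderCreatedView(order_id='o-2', total_cents=500, currency='EUR')
```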

Red Flags to Avoid

Claiming choreography is always better than orchestration without discussing trade-offs

Not addressing failure handling and compensation logic in choreographed workflows

Ignoring observability and debugging challenges inherent in distributed event-driven systems

Confusing choreography with simple pub-sub or event notifications (choreography coordinates multi-step workflows)

Not understanding eventual consistency implications and how to handle temporary inconsistencies

Designing choreographed workflows without considering event schema evolution and versioning

Not having experience with distributed tracing or correlation IDs for debugging

Being unable to explain when orchestration would be a better choice than choreography

Key Takeaways

Choreography coordinates distributed workflows through events rather than central control. Services react independently to domain events, and the business process emerges from their interactions. This maximizes service autonomy but sacrifices workflow visibility.

The choreography vs. orchestration decision hinges on organizational structure and scale requirements. Choose choreography when services span team boundaries, when workflows change frequently, or when you need extreme scale. Choose orchestration when you need workflow visibility, strict ordering, or when a single team owns all services.

Choreography requires sophisticated observability infrastructure. Without distributed tracing, correlation IDs, and event logging, debugging choreographed workflows becomes nearly impossible. Invest in observability tooling before adopting choreography at scale.

Failure handling is distributed in choreography. Each service must implement idempotency, dead letter queues, and compensation logic independently. This increases service complexity but removes the orchestrator as a single point of failure, making the system more resilient provided the message broker itself is highly available.

Event schema evolution is critical for long-term choreography success. Use schema registries, maintain backward compatibility, and version events carefully. Breaking changes to event schemas require coordinating updates across all consumers, undermining choreography’s decoupling benefits.