Sequential Convoy Pattern: Ordered Message Processing

intermediate 9 min read Updated 2026-02-11

After reading this topic, you will be able to:

  • Implement sequential convoy pattern for ordered message processing
  • Design session-based message grouping strategies
  • Evaluate trade-offs between ordering guarantees and throughput

TL;DR

Sequential Convoy processes related messages in strict order without blocking unrelated message groups. Messages are grouped by session ID or partition key, ensuring order within each group while maintaining parallelism across groups. Critical for workflows where order matters (payment → fulfillment) but global ordering would kill throughput.

Cheat Sheet: Group messages by session/entity ID → Process each group sequentially → Parallelize across groups → Use partition keys for routing → Trade global throughput for per-entity ordering guarantees.

The Problem It Solves

Imagine an e-commerce system processing thousands of orders per second. Each order flows through validation → payment → inventory → shipping. If payment completes before validation, you charge customers for invalid orders. If shipping triggers before inventory reservation, you promise products you don’t have. You need strict ordering per order, but processing all orders sequentially would crater throughput from 10,000 to 100 orders/second.

The naive solution—a single queue with one consumer—guarantees order but creates a bottleneck. The Competing Consumers pattern achieves high throughput by parallelizing across multiple consumers, but messages arrive out of order. Order #1234 might get processed by Consumer A while Order #1235 goes to Consumer B, and B finishes first. If both orders are for the same customer updating their address, you get race conditions.

You need ordering guarantees for related messages (same order, same user session, same bank account) while maintaining parallelism for unrelated messages. Sequential Convoy solves this by grouping related messages into “convoys” that process sequentially, while different convoys process in parallel. It’s the Goldilocks solution between single-threaded ordering and chaotic parallelism.

The Ordering vs. Parallelism Dilemma

graph TB
    subgraph SQ["Single Queue - Ordered but Slow"]
        Q1["Queue"]
        C1["Consumer"]
        Q1 --"Sequential<br/>100 msg/sec"--> C1
    end
    
    subgraph CC["Competing Consumers - Fast but Chaotic"]
        Q2["Queue"]
        C2A["Consumer A"]
        C2B["Consumer B"]
        C2C["Consumer C"]
        Q2 --"Order #1234<br/>Step 2"--> C2A
        Q2 --"Order #1234<br/>Step 1"--> C2B
        Q2 --"Order #5678<br/>Step 1"--> C2C
        C2B -."Finishes first<br/>❌ Out of order".-> C2A
    end
    
    subgraph SC["Sequential Convoy - Ordered Groups in Parallel"]
        Q3["Queue with<br/>Session IDs"]
        C3A["Consumer A<br/><i>Session: ORD-1234</i>"]
        C3B["Consumer B<br/><i>Session: ORD-5678</i>"]
        Q3 --"Order #1234<br/>Step 1→2→3"--> C3A
        Q3 --"Order #5678<br/>Step 1→2→3"--> C3B
        C3A -."✓ In order<br/>within session".-> C3A
        C3B -."✓ Parallel<br/>across sessions".-> C3B
    end

Sequential Convoy solves the trade-off between single-threaded ordering (slow but correct) and competing consumers (fast but chaotic) by grouping related messages into sessions that process sequentially while different sessions run in parallel.

Solution Overview

Sequential Convoy groups messages by a correlation identifier (session ID, user ID, order ID) and ensures messages within each group process in order, while different groups process concurrently. Think of it as multiple single-file lanes on a highway—cars in each lane maintain order, but lanes move independently.

The pattern works through three mechanisms. First, messages carry a session identifier that groups related operations. Second, the message broker or application logic routes messages with the same session ID to the same processing unit (queue partition, consumer instance, or worker thread). Third, each processing unit handles one message at a time for its assigned sessions, maintaining order within the session while other units process different sessions in parallel.

This differs from global ordering (all messages sequential) and pure parallelism (no ordering). It’s domain-driven ordering—you define what “related” means for your use case. For Slack, messages in Channel A must arrive in order, but Channel A and Channel B can process concurrently. For a banking system, transactions for Account #5678 must be sequential, but Account #5678 and Account #9012 can run in parallel.

How It Works

Step 1: Assign Session Identifiers. Each message includes a session ID that groups related operations. In an order processing system, the order ID becomes the session ID. For a chat application, the channel ID or conversation ID serves as the session. The producer stamps this identifier when publishing: {"orderId": "ORD-1234", "action": "VALIDATE", "sessionId": "ORD-1234"}.
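Step 1 can be sketched in a few lines. This is an illustrative producer helper, not any broker's API; the function name is an assumption. The point is that the producer stamps the session ID before publishing:

```python
import json

def make_order_message(order_id: str, action: str) -> str:
    # The order ID doubles as the session ID, so every step of one
    # order joins the same convoy. Field names follow the example in
    # the text above; the helper itself is a sketch.
    return json.dumps({
        "orderId": order_id,
        "action": action,
        "sessionId": order_id,
    })
```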

Step 2: Partition by Session. The message broker uses the session ID as a partition key. Azure Service Bus uses SessionId, Kafka uses the message key, AWS SQS uses MessageGroupId for FIFO queues. Messages with the same session ID always route to the same partition or queue. This is critical—if Order #1234’s messages scatter across partitions, you lose ordering guarantees.
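The routing rule in Step 2 reduces to a stable hash of the session ID. A minimal sketch (real brokers use their own hash functions; Kafka, for instance, hashes the message key with murmur2):

```python
import hashlib

def partition_for(session_id: str, num_partitions: int) -> int:
    # A stable hash guarantees the same session always lands on the
    # same partition, which is what preserves per-session ordering.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Note the hard constraint this implies: changing `num_partitions` remaps sessions to different partitions, which is one reason resizing a live topic can break ordering.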

Step 3: Lock and Process Sequentially. When a consumer picks up a message, it acquires a session lock. While processing messages for session “ORD-1234”, that consumer won’t process other messages for the same session, but it can process messages from different sessions. The consumer processes messages in the order they arrived within the session: VALIDATE → PAYMENT → INVENTORY → SHIP.

Step 4: Release and Continue. After processing a message (or batch of messages for the session), the consumer releases the session lock and either picks up the next message in the same session or switches to a different session. If Consumer A is processing Order #1234 and Consumer B is processing Order #5678, both run in parallel. But if Order #1234 has 10 steps, Consumer A processes all 10 sequentially before moving to another order.

Step 5: Handle Failures Carefully. If processing fails mid-session, the pattern must decide: retry immediately (blocking the session) or dead-letter and continue (breaking order). Most implementations retry with exponential backoff while holding the session lock, accepting that a poison message can block an entire session. This is the pattern’s Achilles’ heel.
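The two failure strategies from Step 5 can be expressed as one decision function. A hedged sketch: `handler` and `dead_letter` are hypothetical callables standing in for your processing logic and DLQ client, not a real broker SDK.

```python
import time

def process_with_retry(handler, message, dead_letter,
                       critical=True, max_retries=3, base_delay=1.0):
    # Retries run while the session lock is held, so every retry
    # blocks the whole session: the trade-off described above.
    attempt = 0
    while True:
        try:
            return handler(message)
        except Exception:
            attempt += 1
            if attempt >= max_retries:
                if critical:
                    raise                 # retry-and-block: alert, keep order
                dead_letter(message)      # dead-letter-and-continue
                return None
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s...
```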

Sequential Convoy Message Flow

sequenceDiagram
    participant P as Producer
    participant B as Message Broker<br/>(Partitioned)
    participant C1 as Consumer 1
    participant C2 as Consumer 2
    
    Note over P: Step 1: Assign Session IDs
    P->>B: Msg1 {orderId: "ORD-1234", action: "VALIDATE", sessionId: "ORD-1234"}
    P->>B: Msg2 {orderId: "ORD-1234", action: "PAYMENT", sessionId: "ORD-1234"}
    P->>B: Msg3 {orderId: "ORD-5678", action: "VALIDATE", sessionId: "ORD-5678"}
    P->>B: Msg4 {orderId: "ORD-1234", action: "SHIP", sessionId: "ORD-1234"}
    
    Note over B: Step 2: Partition by Session<br/>ORD-1234 → Partition 1<br/>ORD-5678 → Partition 2
    
    Note over C1,C2: Step 3: Lock and Process Sequentially
    B->>C1: Acquire session lock: ORD-1234
    B->>C2: Acquire session lock: ORD-5678
    
    C1->>C1: Process VALIDATE (ORD-1234)
    C2->>C2: Process VALIDATE (ORD-5678)
    
    C1->>C1: Process PAYMENT (ORD-1234)
    Note over C1: Sequential within session
    
    C1->>C1: Process SHIP (ORD-1234)
    
    Note over C1: Step 4: Release and Continue
    C1->>B: Release session lock: ORD-1234
    C2->>B: Release session lock: ORD-5678
    
    Note over C1,C2: Both consumers processed<br/>different sessions in parallel

Messages with the same session ID route to the same partition and process sequentially by a single consumer, while different sessions process concurrently across multiple consumers. The broker manages session locks to prevent concurrent processing within a session.

Failure Handling: Retry vs. Dead-Letter Trade-off

stateDiagram-v2
    [*] --> Processing: Consumer receives<br/>message from session
    
    Processing --> Success: Processing<br/>succeeds
    Processing --> Failure: Processing<br/>fails
    
    Success --> [*]: Release session lock<br/>Continue to next message
    
    Failure --> RetryDecision: Evaluate retry count
    
    RetryDecision --> RetryAndBlock: Retry count < max<br/>(e.g., < 3 attempts)
    RetryDecision --> DeadLetterAndContinue: Retry count >= max<br/>(e.g., >= 3 attempts)
    
    RetryAndBlock --> Backoff: Hold session lock<br/>⚠️ Blocks entire session
    Backoff --> Processing: Exponential backoff<br/>(1s, 2s, 4s...)
    
    DeadLetterAndContinue --> DeadLetter: Move to DLQ<br/>❌ Breaks ordering
    DeadLetter --> [*]: Release session lock<br/>Continue with next message
    
    note right of RetryAndBlock
        Preserves ordering
        Blocks session on poison message
        Use for: Financial transactions
    end note
    
    note right of DeadLetterAndContinue
        Maintains throughput
        Breaks ordering guarantee
        Use for: Chat messages, logs
    end note

When message processing fails, Sequential Convoy must choose between retry-and-block (preserves order but one poison message stops the entire session) or dead-letter-and-continue (maintains throughput but breaks ordering). The choice depends on domain criticality—financial systems block, user-facing features continue.

Variants

Session-Based Convoy (Azure Service Bus): Uses explicit session IDs and session-aware consumers. The broker manages session locks automatically. When to use: When you need broker-level ordering guarantees and can afford vendor lock-in. Pros: Broker handles complexity, strong guarantees. Cons: Requires a session-aware message broker, which in practice means Azure Service Bus (RabbitMQ can only approximate sessions via plugins).

Partition Key Convoy (Kafka): Uses partition keys to route messages to the same partition, relying on single-consumer-per-partition semantics for ordering. When to use: When using Kafka or similar log-based brokers. Pros: Scales horizontally by adding partitions, fits Kafka’s model naturally. Cons: Partition count is effectively fixed (it can only grow, and growing it remaps keys to partitions, breaking ordering for in-flight sessions), and consumer rebalancing can temporarily break order.

Application-Level Convoy: The application maintains in-memory queues per session and processes them sequentially, even with a non-ordered message broker. When to use: When your broker doesn’t support sessions or partitions (basic SQS). Pros: Works with any message broker, full control over retry logic. Cons: Complex to implement correctly, risk of message loss on consumer crash, requires distributed locking for multiple consumer instances.
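A minimal sketch of the application-level variant: one in-memory queue per session, each drained by its own worker thread. Class and method names are this example's assumptions, and it deliberately inherits the cons above (no durability, single-process only), so treat it as a shape, not a production implementation.

```python
import queue
import threading

class ConvoyDispatcher:
    def __init__(self, handler):
        self._handler = handler      # called once per message, in order
        self._lock = threading.Lock()
        self._queues = {}            # session_id -> Queue

    def submit(self, session_id, message):
        with self._lock:
            q = self._queues.get(session_id)
            if q is None:            # first message: spawn this session's worker
                q = queue.Queue()
                self._queues[session_id] = q
                threading.Thread(target=self._drain, args=(q,),
                                 daemon=True).start()
            q.put(message)

    def _drain(self, q):
        while True:                  # one message at a time per session
            msg = q.get()
            self._handler(msg[0], msg[1]) if isinstance(msg, tuple) else None
            q.task_done()

    def wait(self):                  # block until every session queue drains
        for q in list(self._queues.values()):
            q.join()
```

Different sessions drain in parallel (one thread each) while each session's messages are handled strictly in submission order. A real implementation would add persistence and distributed locking before trusting it with production traffic.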

Sequential Convoy Implementation Variants

graph TB
    subgraph SB["Session-Based Convoy - Azure Service Bus"]
        P1["Producer<br/>Sets SessionId"] --> ASB["Azure Service Bus<br/><i>Session-aware</i>"]
        ASB --> C1A["Consumer A<br/>Locked to Session X"]
        ASB --> C1B["Consumer B<br/>Locked to Session Y"]
        ASB -."Broker manages<br/>session locks".-> ASB
    end
    
    subgraph PKC["Partition Key Convoy - Kafka"]
        P2["Producer<br/>Sets partition key"] --> K["Kafka Topic<br/><i>3 partitions</i>"]
        K --> P0["Partition 0"]
        K --> P1K["Partition 1"]
        K --> P2K["Partition 2"]
        P0 --> C2A["Consumer A<br/>Reads P0 only"]
        P1K --> C2B["Consumer B<br/>Reads P1 only"]
        P2K --> C2C["Consumer C<br/>Reads P2 only"]
    end
    
    subgraph ALC["Application-Level Convoy - Basic SQS"]
        P3["Producer<br/>No special ID"] --> SQS["SQS Queue<br/><i>No ordering</i>"]
        SQS --> C3["Consumer<br/><i>App logic</i>"]
        C3 --> QM["In-Memory<br/>Session Queues"]
        QM --> S1["Session A Queue"]
        QM --> S2["Session B Queue"]
        S1 --> W1["Worker Thread 1"]
        S2 --> W2["Worker Thread 2"]
        C3 -."App maintains<br/>session state".-> QM
    end

Three implementation approaches: Session-Based uses broker-managed session locks (Azure Service Bus), Partition Key relies on single-consumer-per-partition semantics (Kafka), and Application-Level implements session queues in application code (works with any broker but adds complexity).

Trade-offs

Ordering Scope: Global ordering (all messages sequential) vs. session-based ordering (per-entity sequential) vs. no ordering (pure parallelism). Decision criteria: Use session-based when domain entities have ordering requirements but the system has multiple independent entities. Global ordering is for audit logs or event sourcing where causality matters across all events. No ordering is for idempotent operations where order doesn’t affect correctness.

Throughput vs. Latency: Sequential processing within sessions reduces throughput for hot sessions. If 80% of messages belong to one session, you’ve effectively created a bottleneck. Decision criteria: Profile your session distribution. If sessions are evenly distributed (many users, each with few messages), Sequential Convoy scales well. If you have power users generating 1000x more messages than average users, consider splitting hot sessions into sub-sessions or accepting eventual consistency.
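Profiling session distribution can start as a one-liner. A sketch of a simple skew heuristic (the metric name is this example's, not a standard one):

```python
from collections import Counter

def session_skew(session_ids):
    # Fraction of total traffic carried by the single busiest session.
    # Near 1/num_sessions means an even spread; near 1.0 means one hot
    # session is capping you at single-consumer throughput.
    counts = Counter(session_ids)
    return max(counts.values()) / len(session_ids)
```

For the 80%-of-messages-in-one-session case described above, this returns 0.8, a signal to consider sub-sessions or eventual consistency.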

Failure Handling: Retry-and-block vs. dead-letter-and-continue. Blocking preserves order but one poison message stops the entire session. Dead-lettering maintains throughput but breaks ordering. Decision criteria: For critical workflows (financial transactions), block and alert on poison messages. For user-facing features (chat messages), dead-letter after N retries and continue—better to skip one message than freeze the entire conversation.

Complexity: Broker-managed sessions vs. application-managed queues. Broker-managed is simpler but locks you to specific technologies. Application-managed works anywhere but requires careful state management. Decision criteria: Use broker-managed for greenfield projects on Azure/AWS. Use application-managed when migrating existing systems or when broker features are insufficient.

Hot Session Bottleneck Problem

graph LR
    subgraph ED["Even Distribution - Good Parallelism"]
        MB1["Message Broker<br/><i>1000 sessions</i>"]
        MB1 --> S1["Session A<br/>10 msgs"]
        MB1 --> S2["Session B<br/>10 msgs"]
        MB1 --> S3["Session C<br/>10 msgs"]
        MB1 --> S4["...<br/>997 more"]
        S1 --> C1["Consumer 1"]
        S2 --> C2["Consumer 2"]
        S3 --> C3["Consumer 3"]
        S4 --> C4["Consumer N"]
        C1 & C2 & C3 & C4 -."✓ 10,000 msgs/sec<br/>across all consumers".-> Result1["High Throughput"]
    end
    
    subgraph HSB["Hot Session - Bottleneck"]
        MB2["Message Broker<br/><i>10 sessions</i>"]
        MB2 --> HS["Hot Session X<br/>8000 msgs<br/>⚠️ Bottleneck"]
        MB2 --> S5["Session Y<br/>200 msgs"]
        MB2 --> S6["Session Z<br/>200 msgs"]
        MB2 --> S7["...<br/>7 more"]
        HS --> C5["Consumer 1<br/><i>Overloaded</i>"]
        S5 --> C6["Consumer 2<br/><i>Idle 80%</i>"]
        S6 --> C7["Consumer 3<br/><i>Idle 80%</i>"]
        S7 --> C8["Consumer N<br/><i>Idle 80%</i>"]
        C5 -."❌ 100 msgs/sec<br/>limited by one consumer".-> Result2["Low Throughput"]
    end

Sequential Convoy throughput depends on session distribution. With even distribution (many sessions, few messages each), parallelism is excellent. With hot sessions (one session dominates traffic), that session becomes a bottleneck since it can only be processed by one consumer at a time.

When to Use (and When Not To)

Use Sequential Convoy when you have multiple independent entities that each require ordered processing, but global ordering would be overkill. Classic scenarios: order processing systems where each order must flow through steps sequentially, but different orders can process in parallel; chat applications where messages in each conversation must arrive in order; financial systems where transactions per account must be sequential; workflow engines where each workflow instance has ordered steps.

The pattern fits when session distribution is relatively even. If you have 10,000 users each sending 10 messages per day, you get excellent parallelism. If you have 10 users each sending 10,000 messages per day, you’ve created 10 bottlenecks.

Anti-patterns: Don’t use Sequential Convoy for globally ordered event streams—use event sourcing or append-only logs instead. Don’t use it when operations are truly independent and idempotent—Competing Consumers is simpler and faster. Don’t use it when you need cross-session ordering (e.g., “process all payments before any shipments”)—you need distributed transactions or sagas. Don’t use it when your broker doesn’t support sessions/partitions and you’re not prepared to build application-level session management—the complexity isn’t worth it.

Real-World Examples

Slack (message delivery infrastructure): Slack uses Sequential Convoy to ensure messages within a channel arrive in order while different channels process independently. Each channel has a session ID, and messages for that channel route to the same partition in Slack’s Kafka-based infrastructure. This guarantees that if User A sends “Hello” then “World” in Channel #general, all users see them in that order, even during high load. Slack doesn’t use global ordering across all channels because that would limit throughput to a single consumer’s capacity; by partitioning on channel ID, they achieve millions of messages per second across the platform while maintaining per-channel ordering. Interesting detail: Slack handles hot channels (large public channels with thousands of messages per minute) by sub-partitioning within the channel session, effectively creating multiple ordered sub-streams that merge at the client level.

Uber (trip state machine): Uber’s trip processing uses Sequential Convoy to ensure state transitions happen in order: REQUESTED → ACCEPTED → ARRIVED → STARTED → COMPLETED. Each trip has a unique session ID, and all events for that trip route to the same processor, preventing race conditions where a driver could complete a trip before it’s marked as started. The pattern runs on Kafka with trip ID as the partition key: Uber processes millions of trips concurrently, but each individual trip’s state machine is strictly sequential. Interesting detail: Uber discovered that naive partition key selection (using trip ID directly) created hot partitions in high-density areas where many trips started simultaneously, so they now hash the trip ID with a time-based salt to distribute load more evenly while maintaining per-trip ordering.


Interview Essentials

Mid-Level

Explain the core problem: how do you maintain order for related messages while processing unrelated messages in parallel? Describe session IDs and partition keys as the grouping mechanism. Walk through a simple example like order processing: validation → payment → fulfillment must happen in order per order, but Order A and Order B can run concurrently. Discuss the basic trade-off: you get per-session ordering but lose global throughput if sessions are unevenly distributed. Know that Azure Service Bus, Kafka, and AWS SQS FIFO queues all support this pattern with different mechanisms.

Senior

Dive into failure scenarios: what happens when a message fails mid-session? Explain retry-and-block vs. dead-letter-and-continue trade-offs with specific examples. Discuss partition key selection strategies and how poor choices create hot partitions. Compare broker-managed sessions (Azure Service Bus SessionId) vs. partition-based ordering (Kafka) vs. application-managed queues. Explain how to monitor session distribution and detect hot sessions that become bottlenecks. Discuss the relationship to other patterns: how Sequential Convoy differs from Competing Consumers (no ordering) and event sourcing (global ordering). Be ready to design a system that uses Sequential Convoy, including partition count calculations and consumer scaling strategy.

Staff+

Discuss architectural decisions around ordering guarantees: when is per-entity ordering sufficient vs. when do you need causal ordering across entities? Explain how to handle hot sessions at scale—sub-partitioning strategies, dynamic partition splitting, or accepting eventual consistency for non-critical operations. Discuss the CAP theorem implications: Sequential Convoy prioritizes consistency (order) over availability (throughput) within a session. Explain how to evolve a system from global ordering to session-based ordering without breaking existing consumers. Discuss cross-region Sequential Convoy: how do you maintain ordering when messages replicate across data centers? Explain the relationship between Sequential Convoy and distributed transactions—when do you need both (e.g., two-phase commit within an ordered workflow)? Be prepared to design a system that handles millions of sessions with highly skewed distribution (power law), including strategies for detecting and mitigating hot sessions in real-time.

Common Interview Questions

How does Sequential Convoy differ from Competing Consumers? (Answer: Competing Consumers has no ordering guarantees—any consumer can process any message. Sequential Convoy groups related messages and ensures they process in order within the group.)

What happens if a consumer crashes while holding a session lock? (Answer: The broker’s session timeout expires and another consumer can pick up the session. Messages may be reprocessed, so handlers must be idempotent.)
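Idempotency is what makes that redelivery safe. A minimal dedupe sketch; in production the seen-set would live in a durable store keyed by message ID, not process memory:

```python
def make_idempotent(apply_fn):
    seen = set()  # in production: Redis/DB keyed by message ID
    def handler(message_id, payload):
        # Redelivered messages (e.g. after a session-lock timeout)
        # are detected by ID and skipped instead of re-applied.
        if message_id in seen:
            return "skipped"
        apply_fn(payload)
        seen.add(message_id)
        return "applied"
    return handler
```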

How do you choose partition count for a Kafka-based Sequential Convoy? (Answer: Partition count caps consumer parallelism, since each partition is read by at most one consumer in a group, and many sessions share a partition. Too few partitions limit parallelism, too many waste resources. Start with 3-5x your consumer count and monitor session distribution.)

Can you have ordering across multiple sessions? (Answer: Not with Sequential Convoy alone. You’d need distributed transactions, sagas, or event sourcing with causal ordering.)

Red Flags to Avoid

Claiming Sequential Convoy provides global ordering across all messages (it only orders within sessions)

Not discussing failure handling—poison messages can block entire sessions indefinitely

Ignoring session distribution—assuming all sessions have equal message volume

Not mentioning partition keys or session IDs as the core grouping mechanism

Confusing Sequential Convoy with event sourcing or distributed transactions


Key Takeaways

Sequential Convoy solves the Goldilocks problem: strict ordering for related messages (same order, user, account) while maintaining parallelism for unrelated messages. It’s the middle ground between single-threaded global ordering and chaotic Competing Consumers.

The pattern relies on session IDs or partition keys to group related messages. All messages with the same session ID route to the same processing unit, which handles them sequentially. Different sessions process in parallel across multiple units.

The critical trade-off is throughput vs. ordering scope. Per-session ordering scales horizontally (add more sessions/partitions), but hot sessions become bottlenecks. Monitor session distribution and consider sub-partitioning or eventual consistency for power users.

Failure handling is the pattern’s Achilles’ heel. Retry-and-block preserves order but one poison message stops the entire session. Dead-letter-and-continue maintains throughput but breaks ordering. Choose based on domain criticality.

Implementation varies by message broker: Azure Service Bus uses explicit SessionId with broker-managed locks, Kafka uses partition keys with single-consumer-per-partition semantics, and basic brokers require application-level session management. Pick the approach that matches your infrastructure and complexity tolerance.