Event Sourcing Pattern: Store State as Events

intermediate 11 min read Updated 2026-02-11

By the end of this topic, you will be able to:

  • Design event-sourced systems with immutable event logs
  • Implement event replay and state reconstruction mechanisms
  • Evaluate trade-offs between event sourcing and traditional state storage
  • Assess snapshot strategies for performance optimization

TL;DR

Event Sourcing stores every state change as an immutable event in an append-only log, rather than persisting only current state. The current state is reconstructed by replaying events from the beginning. This provides a complete audit trail, enables time travel debugging, and supports complex business logic that depends on historical context—at the cost of increased complexity and eventual consistency challenges.

Cheat Sheet: Store events, not state → Replay events to rebuild state → Use snapshots for performance → Natural audit trail → Pairs well with CQRS → Complexity trade-off for auditability and temporal queries.

The Problem It Solves

Traditional CRUD systems face several fundamental limitations. First, they lose information—when you update a user’s address, the old address vanishes unless you build explicit audit tables. Second, they make temporal queries difficult: “What was this account’s balance on March 15th?” requires complex point-in-time reconstruction logic. Third, they struggle with business processes that depend on why something changed, not just what changed. A refund processed due to fraud requires different handling than one processed due to customer dissatisfaction, but traditional state storage treats them identically.

Financial systems illustrate this pain acutely. When Stripe processes a payment, it needs to know not just the current balance, but every transaction that contributed to it—for compliance, dispute resolution, and reconciliation. Building this as an afterthought with audit tables creates synchronization nightmares: the audit log can drift from the actual state, and you end up maintaining two sources of truth. The fundamental problem is that CRUD systems treat state as primary and history as secondary, when many domains need the opposite.

Solution Overview

Event Sourcing inverts the traditional model: events become the source of truth, and current state becomes a derived view. Every state change is captured as an immutable event in an append-only log called the event store. Events are facts about what happened—“OrderPlaced”, “PaymentProcessed”, “ItemShipped”—with timestamps and complete context.

To answer “what is the current state?”, the system replays events from the beginning, applying each one in sequence to reconstruct state. This sounds expensive, but snapshots (periodic state checkpoints) make it practical: replay from the most recent snapshot, not from the dawn of time. The event log becomes your database, and traditional state storage becomes a cache that can be rebuilt at any time.

This approach provides automatic audit trails (every event is logged), enables time travel (replay to any point), supports event-driven architectures (other systems subscribe to your events), and makes complex business logic explicit (the events are your business process). The trade-off is complexity: you’re building a database on top of a log, which requires careful design of event schemas, replay logic, and snapshot strategies.

Event Sourcing Architecture: Events as Source of Truth

graph LR
    Client["Client Application"]
    CommandHandler["Command Handler<br/><i>Validates & Processes</i>"]
    EventStore[("Event Store<br/><i>Append-Only Log</i>")]
    Projections["Read Models<br/><i>Projections</i>"]
    QueryService["Query Service<br/><i>Serves Reads</i>"]
    
    Client --"1. POST /order<br/>(Command)"--> CommandHandler
    CommandHandler --"2. Append Event<br/>OrderPlaced"--> EventStore
    EventStore --"3. Publish Event"--> Projections
    Projections --"4. Update View"--> QueryService
    Client --"5. GET /order/789"--> QueryService
    
    EventStore -."Replay Events<br/>to Rebuild State".-> CommandHandler

Event Sourcing inverts the traditional model: events in the append-only event store are the source of truth, while read models are derived views. Commands generate events, which are replayed to reconstruct state and projected to optimized read models.

How It Works

Step 1: Capture Commands as Events

When a user action occurs, the system validates it and generates one or more events. For example, when a customer places an order, the system doesn’t directly update an “orders” table. Instead, it appends an “OrderPlaced” event to the event store:

{
  "eventId": "evt_1a2b3c",
  "eventType": "OrderPlaced",
  "aggregateId": "order_789",
  "timestamp": "2024-01-15T10:30:00Z",
  "data": {
    "customerId": "cust_456",
    "items": [{"sku": "WIDGET-1", "quantity": 2}],
    "totalAmount": 49.98
  },
  "version": 1
}

Events are immutable and append-only. You never update or delete an event; if something was wrong, you append a compensating event like “OrderCancelled”.

Step 2: Store Events in Sequence

The event store persists events in order, typically partitioned by aggregate ID (the entity the events describe). Each aggregate has its own event stream. For “order_789”, the stream might contain: OrderPlaced → PaymentAuthorized → ItemsReserved → OrderShipped. The sequence number (version) ensures ordering and detects concurrency conflicts.
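The append and version-check mechanics described above can be sketched as a minimal in-memory event store. This is an illustrative toy, not a specific product's API—the class and method names (`InMemoryEventStore`, `append`, `get_events`) are assumptions for the example:

```python
class ConcurrencyError(Exception):
    """Raised when the expected version does not match the stream head."""


class InMemoryEventStore:
    """Minimal append-only event store, partitioned by aggregate ID."""

    def __init__(self):
        self._streams = {}  # aggregate_id -> ordered list of event dicts

    def append(self, aggregate_id, event, expected_version):
        """Append one event; fail if another writer got there first."""
        stream = self._streams.setdefault(aggregate_id, [])
        current_version = len(stream)
        if expected_version != current_version:
            raise ConcurrencyError(
                f"expected v{expected_version}, stream is at v{current_version}"
            )
        stream.append({**event, "version": current_version + 1})

    def get_events(self, aggregate_id, after_version=0):
        """Return events in order, optionally skipping past a snapshot."""
        return self._streams.get(aggregate_id, [])[after_version:]


store = InMemoryEventStore()
store.append("order_789", {"eventType": "OrderPlaced"}, expected_version=0)
store.append("order_789", {"eventType": "PaymentAuthorized"}, expected_version=1)
print([e["eventType"] for e in store.get_events("order_789")])
# → ['OrderPlaced', 'PaymentAuthorized']
```

The `expected_version` check is the optimistic concurrency control mentioned above: two clients that both load version 1 and try to append will have one succeed and one receive a `ConcurrencyError`, which it resolves by reloading and retrying.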

Step 3: Reconstruct State by Replaying Events

To answer “what’s the current state of order_789?”, the system loads all events for that aggregate and applies them sequentially:

def rebuild_order(order_id):
    events = event_store.get_events(order_id)
    order = Order()  # Empty state
    for event in events:
        order.apply(event)  # Apply each event
    return order

The apply method contains the business logic: when applying “OrderPlaced”, set status to “pending” and populate items; when applying “OrderShipped”, set status to “shipped” and record tracking number. This is called event folding or reduction.
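A minimal sketch of that fold, with the `Order` fields and per-event handlers chosen for illustration (real aggregates would carry more state and validation):

```python
class Order:
    """Aggregate state rebuilt purely by folding events."""

    def __init__(self):
        self.status = None
        self.items = []
        self.tracking_number = None

    def apply(self, event):
        # Dispatch to a handler per event type; unknown types are ignored,
        # which keeps replay tolerant of newer event types
        handler = getattr(self, f"_apply_{event['eventType']}", None)
        if handler:
            handler(event["data"])

    def _apply_OrderPlaced(self, data):
        self.status = "pending"
        self.items = data["items"]

    def _apply_OrderShipped(self, data):
        self.status = "shipped"
        self.tracking_number = data["trackingNumber"]


events = [
    {"eventType": "OrderPlaced",
     "data": {"items": [{"sku": "WIDGET-1", "quantity": 2}]}},
    {"eventType": "OrderShipped", "data": {"trackingNumber": "TRK123"}},
]
order = Order()
for event in events:
    order.apply(event)
print(order.status, order.tracking_number)  # → shipped TRK123
```

Note that `apply` never makes decisions—it only records the consequences of facts that already happened. Validation belongs in the command handler, before the event is appended.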

Step 4: Optimize with Snapshots

Replaying thousands of events for every read is impractical. Snapshots solve this: periodically save the current state after event N, then replay only events after N. For example, snapshot order_789 after event 100, then replay only events 101-150 to get current state. Snapshots are purely a performance optimization—you can always rebuild them from events.
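The snapshot-aware rebuild can be sketched as follows. The `SnapshotStore`, the dict-based state, and the interval of 3 are all toy assumptions for the demo (production systems snapshot far less often and persist snapshots durably):

```python
from dataclasses import dataclass

SNAPSHOT_INTERVAL = 3  # snapshot every N events (tiny, for the demo)


@dataclass
class Snapshot:
    state: dict
    version: int


class SnapshotStore:
    """Keeps only the most recent snapshot per aggregate."""

    def __init__(self):
        self._latest = {}

    def get_latest(self, aggregate_id):
        return self._latest.get(aggregate_id)

    def save(self, aggregate_id, state, version):
        self._latest[aggregate_id] = Snapshot(dict(state), version)


def rebuild(order_id, events, snapshots):
    """Replay from the latest snapshot instead of from event zero."""
    snap = snapshots.get_latest(order_id)
    state, version = (dict(snap.state), snap.version) if snap else ({}, 0)
    new_events = events[version:]        # only events after the checkpoint
    for event in new_events:
        state.update(event["data"])      # trivial 'apply' for the demo
        version += 1
    if len(new_events) >= SNAPSHOT_INTERVAL:
        snapshots.save(order_id, state, version)  # checkpoint for next read
    return state, version


events = [{"data": {"status": s}} for s in
          ["pending", "paid", "reserved", "shipped", "delivered"]]
snapshots = SnapshotStore()
state, v = rebuild("order_789", events, snapshots)  # full replay, saves snapshot
state, v = rebuild("order_789", events, snapshots)  # replays 0 events this time
print(state["status"], v)  # → delivered 5
```

Because the snapshot is just a cache, deleting it is always safe: the next rebuild falls back to a full replay and recreates it.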

Step 5: Project Events to Read Models

For complex queries (“show all shipped orders for customer X”), replaying events is too slow. Instead, maintain read models (projections) that subscribe to events and build optimized views. This is where Event Sourcing naturally pairs with CQRS (see CQRS): the event store handles writes, projections handle reads. When “OrderShipped” occurs, update a “shipped_orders” table for fast querying.
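A projection is just an event consumer that maintains a denormalized view. The sketch below keeps its view in memory for clarity—in practice it would write to a read database, and the class and field names are illustrative:

```python
class ShippedOrdersProjection:
    """Read model: shipped order IDs grouped by customer."""

    def __init__(self):
        self.by_customer = {}  # customer_id -> list of shipped order ids
        self._customers = {}   # order_id -> customer_id, learned from events

    def handle(self, event):
        if event["eventType"] == "OrderPlaced":
            # Remember which customer owns the order for later events
            self._customers[event["aggregateId"]] = event["data"]["customerId"]
        elif event["eventType"] == "OrderShipped":
            customer = self._customers.get(event["aggregateId"])
            self.by_customer.setdefault(customer, []).append(event["aggregateId"])


projection = ShippedOrdersProjection()
for event in [
    {"eventType": "OrderPlaced", "aggregateId": "order_789",
     "data": {"customerId": "cust_456"}},
    {"eventType": "OrderShipped", "aggregateId": "order_789", "data": {}},
]:
    projection.handle(event)
print(projection.by_customer)  # → {'cust_456': ['order_789']}
```

Like snapshots, projections are disposable: if the view logic changes or the read database is lost, replay the event stream through the handler and the view is rebuilt from scratch.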

State Reconstruction with Snapshots

graph TB
    subgraph "Event Stream for order_789"
        E1["Event 1<br/>OrderPlaced"]
        E2["Event 2<br/>PaymentAuthorized"]
        E3["Event 3<br/>ItemsReserved"]
        Snap["📸 Snapshot<br/>State at Event 100<br/><i>Status: Processing</i>"]
        E101["Event 101<br/>OrderShipped"]
        E102["Event 102<br/>TrackingAdded"]
        E150["Event 150<br/>DeliveryConfirmed"]
    end
    
    E1 --> E2 --> E3 -."...".-> Snap
    Snap --> E101 --> E102 -."...".-> E150
    
    ReplayEngine["Replay Engine"]
    CurrentState["Current State<br/><i>Status: Delivered</i><br/><i>Tracking: TRK123</i>"]
    
    Snap --"1. Load Snapshot<br/>(Fast Start)"--> ReplayEngine
    E101 & E102 & E150 --"2. Replay Events<br/>101-150"--> ReplayEngine
    ReplayEngine --"3. Apply Events<br/>Sequentially"--> CurrentState
    
    Note["💡 Without snapshot:<br/>Replay all 150 events<br/><br/>With snapshot:<br/>Replay only 50 events"]

Snapshots optimize state reconstruction by checkpointing state at event N, then replaying only events after N. This reduces replay time from O(all events) to O(events since snapshot), making Event Sourcing practical for long-lived aggregates.

Event Sourcing Write Flow with CQRS Projections

sequenceDiagram
    participant Client
    participant API as API Gateway
    participant Command as Command Handler
    participant EventStore as Event Store
    participant Projection1 as Projection: Orders View
    participant Projection2 as Projection: Analytics
    participant ReadDB as Read Database
    
    Client->>API: 1. POST /orders<br/>{customerId, items}
    API->>Command: 2. PlaceOrder command
    Command->>Command: 3. Validate business rules
    Command->>EventStore: 4. Append OrderPlaced event<br/>{orderId, customerId, items, v1}
    EventStore-->>Command: 5. Event persisted (v1)
    Command-->>API: 6. 201 Created {orderId}
    API-->>Client: 7. Order created
    
    EventStore->>Projection1: 8. Publish OrderPlaced event
    EventStore->>Projection2: 8. Publish OrderPlaced event
    
    Projection1->>ReadDB: 9. INSERT into orders_view
    Projection2->>ReadDB: 9. UPDATE analytics_summary
    
    Note over EventStore,ReadDB: Eventual consistency: projections lag behind events
    
    Client->>API: 10. GET /orders/{orderId}
    API->>ReadDB: 11. SELECT from orders_view
    ReadDB-->>API: 12. Order data
    API-->>Client: 13. 200 OK {order}

The write flow shows command validation, event persistence, and asynchronous projection updates. Events are the source of truth (step 4), while projections build eventually-consistent read models (steps 8-9). Reads query projections, not the event store.

Event Versioning

Event schemas evolve as business requirements change, but old events in the log never change. This creates a versioning challenge: how do you replay events with schema version 1 alongside events with schema version 3?

Upcasting Strategy: Transform old events to the latest schema during replay. When loading “OrderPlaced_v1”, convert it to “OrderPlaced_v3” format before applying. This keeps business logic simple (only handles current schema) but requires maintaining transformation code for every historical version. Uber uses this approach for their trip events, maintaining upcast functions for 5+ years of schema evolution.
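One common way to implement upcasting is to chain per-version transformation functions until the event reaches the latest schema. The field names below follow the v1/v2/v3 schemas in the diagram that follows; the defaults (`customerId="unknown"`, `currency="USD"`, `tax=0`) are illustrative assumptions:

```python
def upcast_v1_to_v2(event):
    """v1 lacked customerId and currency; fill documented defaults."""
    data = dict(event["data"], customerId="unknown", currency="USD")
    return {**event, "schemaVersion": 2, "data": data}


def upcast_v2_to_v3(event):
    """v3 nests total/currency under 'pricing' and adds tax."""
    data = dict(event["data"])
    data["pricing"] = {"total": data.pop("total"),
                       "currency": data.pop("currency"), "tax": 0}
    return {**event, "schemaVersion": 3, "data": data}


UPCASTERS = {1: upcast_v1_to_v2, 2: upcast_v2_to_v3}


def upcast(event, target_version=3):
    """Chain upcasters so replay logic only ever sees the latest schema."""
    while event.get("schemaVersion", 1) < target_version:
        event = UPCASTERS[event.get("schemaVersion", 1)](event)
    return event


old_event = {"eventType": "OrderPlaced", "schemaVersion": 1,
             "data": {"orderId": "order_789", "total": 49.98}}
print(upcast(old_event)["data"]["pricing"])
# → {'total': 49.98, 'currency': 'USD', 'tax': 0}
```

Chaining means each new schema version only needs one new function (n-1 → n), rather than a direct transformation from every historical version.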

Multi-Version Handlers: The apply method handles multiple event versions explicitly: apply(OrderPlaced_v1), apply(OrderPlaced_v2), etc. This avoids transformation overhead but duplicates business logic across versions. Use this when versions differ significantly in semantics, not just structure.

Weak Schema: Store events as flexible JSON with optional fields. New fields are added without version bumps; old events simply lack those fields. Business logic handles missing fields gracefully with defaults. This works for additive changes but breaks down for structural changes (renaming fields, changing types).

Copy-and-Transform: When schema changes are breaking, append transformed events to the stream. For example, if “OrderPlaced” splits into “OrderCreated” + “PaymentInitiated”, append both new events and mark the old event as deprecated. Future replays use the new events. This effectively rewrites history but maintains the original events for audit purposes.

The key principle: never modify existing events. All versioning strategies work around this immutability constraint. Most production systems use upcasting for minor changes and copy-and-transform for major refactorings.

Event Schema Evolution: Upcasting Strategy

graph TB
    subgraph "Event Store"
        V1["OrderPlaced_v1<br/>{orderId, items, total}"]
        V2["OrderPlaced_v2<br/>{orderId, customerId,<br/>items, total, currency}"]
        V3["OrderPlaced_v3<br/>{orderId, customerId,<br/>items, pricing: {total,<br/>currency, tax}}"]
    end
    
    subgraph "Replay Engine"
        Upcast1["Upcast v1→v3<br/>Add customerId='unknown'<br/>Add currency='USD'<br/>Restructure pricing"]
        Upcast2["Upcast v2→v3<br/>Restructure pricing<br/>Add tax=0"]
        NoUpcast["No Upcast<br/>Already v3"]
    end
    
    subgraph "Business Logic"
        Handler["OrderPlaced_v3<br/>Handler<br/><i>Single version logic</i>"]
    end
    
    V1 --"Transform"--> Upcast1
    V2 --"Transform"--> Upcast2
    V3 --"Pass through"--> NoUpcast
    
    Upcast1 & Upcast2 & NoUpcast --"Normalized to v3"--> Handler
    
    Handler --> State["Current State<br/><i>All events processed<br/>as v3 format</i>"]
    
    Note["💡 Upcasting Pros:<br/>• Simple business logic<br/>• Single schema version<br/><br/>Cons:<br/>• Maintain transformations<br/>• Replay overhead"]

Upcasting transforms old event schemas to the latest version during replay, keeping business logic simple. Each schema version requires a transformation function, but the apply logic only handles the current schema (v3).

Variants

Event Sourcing with CQRS: The most common variant pairs Event Sourcing (write side) with CQRS read models. Events flow from the event store to multiple projections optimized for different queries. LinkedIn uses this for their member profile system: events capture all profile changes, while read models serve different views (public profile, recruiter view, connection suggestions). This variant handles read-heavy workloads but adds complexity of maintaining projections.

Event Sourcing without CQRS: Use Event Sourcing for the write side but rebuild state synchronously for reads. Simpler to implement but limits read performance. Works well for aggregates with small event streams (< 100 events) where replay is cheap. A shopping cart service might use this: cart events are few, and rebuilding cart state on every read is acceptable.

Hybrid Event Sourcing: Store both events and current state, using events for audit/replay but current state for reads. This sacrifices Event Sourcing’s purity (two sources of truth) but gains read performance without CQRS complexity. The event log becomes a backup/audit trail rather than the primary database. Many financial systems start here before evolving to full Event Sourcing.

Event Sourcing with Temporal Queries: Extend the event store with time-travel capabilities: query state at any historical point. “What was account balance on March 15th?” replays events up to that timestamp. Requires indexing events by time and careful handling of out-of-order events. Stripe’s ledger system supports this for financial reconciliation and dispute resolution.
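An "as-of" query can be sketched by binary-searching a time-sorted stream for the cutoff, then folding only the events at or before it. The ledger shape here (balance deltas per event) is a toy assumption:

```python
from bisect import bisect_right
from datetime import datetime, timezone


def state_as_of(events, timestamp):
    """Fold only events at or before `timestamp` (events sorted by time)."""
    times = [e["timestamp"] for e in events]
    cutoff = bisect_right(times, timestamp)  # binary search, like a time index
    balance = 0
    for event in events[:cutoff]:
        balance += event["amount"]
    return balance


def ts(s):
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)


ledger = [
    {"timestamp": ts("2024-03-10T09:00:00"), "amount": 100},
    {"timestamp": ts("2024-03-14T17:30:00"), "amount": -40},
    {"timestamp": ts("2024-03-16T08:00:00"), "amount": 25},
]
print(state_as_of(ledger, ts("2024-03-15T23:59:59")))  # → 60
```

A production implementation would also consult time-indexed snapshots so the fold starts near the target timestamp instead of at event zero, and would handle late-arriving events explicitly.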

Event Sourcing Variants: CQRS vs Hybrid Approaches

graph TB
    subgraph "Variant 1: Event Sourcing + CQRS"
        ES1["Event Store<br/><i>Source of Truth</i>"]
        P1["Projection 1<br/><i>Orders View</i>"]
        P2["Projection 2<br/><i>Analytics</i>"]
        P3["Projection 3<br/><i>Search Index</i>"]
        
        ES1 --> P1 & P2 & P3
    end
    
    subgraph "Variant 2: Event Sourcing without CQRS"
        ES2["Event Store<br/><i>Source of Truth</i>"]
        Rebuild["Rebuild State<br/>on Every Read<br/><i>Replay events</i>"]
        
        ES2 -."Synchronous<br/>Replay".-> Rebuild
    end
    
    subgraph "Variant 3: Hybrid Event Sourcing"
        ES3["Event Store<br/><i>Audit Trail</i>"]
        StateDB[("State Database<br/><i>Primary for Reads</i>")]
        Sync["Sync Process<br/><i>Keep in sync</i>"]
        
        ES3 <-."Dual Write".-> StateDB
        ES3 -."Backup/Audit".-> Sync
        Sync -."Reconcile".-> StateDB
    end
    
    Use1["✓ Read-heavy workloads<br/>✓ Multiple views needed<br/>✗ Complex to maintain"]
    Use2["✓ Simple implementation<br/>✓ Small event streams<br/>✗ Poor read performance"]
    Use3["✓ Fast reads<br/>✓ Audit capability<br/>✗ Two sources of truth"]
    
    P1 & P2 & P3 -.-> Use1
    Rebuild -.-> Use2
    StateDB -.-> Use3

Three common Event Sourcing variants trade off complexity, performance, and purity. CQRS provides scalable reads but requires projection management. Without CQRS is simpler but slower. Hybrid maintains both events and state, sacrificing single source of truth for read performance.

Trade-offs

Auditability vs. Complexity: Event Sourcing provides automatic, tamper-proof audit trails—every state change is logged with full context. Traditional systems require explicit audit tables that can drift from actual state. However, Event Sourcing adds significant complexity: you’re building a database on top of a log, managing event schemas, replay logic, and projections. Choose Event Sourcing when audit requirements justify the complexity (financial systems, healthcare, compliance-heavy domains).

Temporal Queries vs. Storage Costs: Event Sourcing enables time travel: replay to any historical point, answer “what if” questions, debug production issues by replaying events. Traditional systems struggle with temporal queries, requiring complex point-in-time reconstruction. The cost is storage: every event is kept forever (or until explicit retention policies delete them). A high-volume system might generate terabytes of events annually. Use Event Sourcing when historical analysis is critical, not just nice-to-have.

Event-Driven Architecture vs. Eventual Consistency: Events naturally enable event-driven systems: other services subscribe to your events and react. This decouples services and enables real-time data pipelines. However, projections are eventually consistent—there’s a lag between event occurrence and projection update. Traditional CRUD systems offer immediate consistency. Choose Event Sourcing when you can tolerate eventual consistency and benefit from event-driven integration.

Flexibility vs. Performance: Event Sourcing makes schema changes easier: add new projections without migrating data, replay events with new business logic, support multiple views of the same data. Traditional systems require schema migrations and data backfills. However, Event Sourcing has higher read latency (replay or projection lag) and write amplification (one command generates multiple events). Use Event Sourcing when flexibility and evolvability outweigh raw performance needs.

When to Use (and When Not To)

Use Event Sourcing when:

  • Audit requirements are non-negotiable: Financial systems, healthcare records, legal documents where you must prove what happened and when. The event log is your audit trail.
  • Business logic depends on history: Fraud detection (analyze transaction patterns), pricing (loyalty discounts based on purchase history), workflows (approval chains where context matters).
  • Temporal queries are valuable: “What was the state on date X?”, “How did we get to this state?”, “Replay this scenario with different logic”. Debugging production issues by replaying events is incredibly powerful.
  • Multiple views of the same data: Different teams need different projections (analytics, operations, customer service). Event Sourcing lets you build multiple read models from the same events.
  • Event-driven integration: Other systems need to react to your state changes in real-time. Publishing events is natural with Event Sourcing.

Avoid Event Sourcing when:

  • Simple CRUD dominates: A basic user profile service with no audit requirements doesn’t benefit from Event Sourcing’s complexity. Use traditional CRUD.
  • Strong consistency is required: If reads must reflect writes immediately, Event Sourcing’s eventual consistency is problematic. Use traditional transactions.
  • Event streams are unbounded: If aggregates accumulate thousands of events with no natural lifecycle (deletion, archival), replay becomes impractical even with snapshots.
  • Team lacks experience: Event Sourcing requires discipline around event schema design, versioning, and projection management. A team new to these concepts will struggle.

Red flag: Using Event Sourcing because it’s “cool” or “modern” without clear audit, temporal, or event-driven requirements. The complexity tax is real.

Real-World Examples

Uber: Trip State Management

Uber uses Event Sourcing for trip lifecycle management. Every trip generates events: TripRequested, DriverAssigned, TripStarted, TripCompleted, PaymentProcessed. These events flow through Kafka and are stored in an event store. Multiple projections consume these events: one for real-time driver/rider apps, one for billing, one for analytics, one for fraud detection. When a payment dispute occurs, Uber replays trip events to reconstruct exactly what happened—which driver, what route, what charges. The event log is the source of truth for financial reconciliation.

Interesting detail: Uber maintains 5+ years of trip events, totaling petabytes of data. They use aggressive snapshotting (every 10 events) and tiered storage (recent events on SSD, old events on S3) to manage costs while preserving the ability to replay any historical trip.

Stripe: Ledger System

Stripe’s financial ledger uses Event Sourcing to track every monetary movement. Events like ChargeCreated, RefundIssued, TransferCompleted are immutable facts. Account balances are derived by replaying events, not stored directly. This enables Stripe to answer temporal queries (“What was the balance at end of day on March 15th?”) for financial reporting and compliance. When customers dispute charges, Stripe replays events to generate a complete transaction history for the dispute resolution process.

Interesting detail: Stripe’s event store supports “as-of” queries natively: query the state at any historical timestamp without manual replay. This is implemented by indexing events by both aggregate ID and timestamp, enabling binary search to find the snapshot + events needed for any point in time.

LinkedIn: Member Profile System

LinkedIn uses Event Sourcing for member profiles, capturing events like ProfileCreated, ExperienceAdded, SkillEndorsed, ConnectionAccepted. These events feed multiple projections: the public profile view, the recruiter search index, the “People You May Know” recommendation engine, and analytics pipelines. Each projection consumes events and builds an optimized read model for its use case. This decouples the write side (profile updates) from the read side (various views), allowing each to scale independently.

Interesting detail: LinkedIn’s event store uses Kafka for event transport and Espresso (their distributed database) for event persistence. They maintain projections in multiple databases (Voldemort for key-value, Galene for search) depending on query patterns, all fed from the same event stream.


Interview Essentials

Mid-Level

Explain the difference between storing state vs. storing events. Walk through how you’d reconstruct current state from an event log.

Describe how snapshots work and why they’re necessary. What’s the trade-off between snapshot frequency and replay performance?

How would you handle a bug in event application logic? Can you fix past events, or do you need compensating events?

What happens if two clients try to modify the same aggregate concurrently? How do you detect and resolve conflicts?

Senior

Design an event store for a payment processing system. What are your partitioning, indexing, and retention strategies?

How do you handle event schema evolution? Walk through a scenario where an event structure needs to change after 2 years of production data.

Compare Event Sourcing + CQRS vs. traditional CRUD + audit tables. When would you choose each approach?

How would you implement temporal queries (“show me state at timestamp T”)? What indexes and optimizations are needed?

What are the failure modes of Event Sourcing? How do you handle event store corruption, projection lag, or replay failures?

Staff+

You’re migrating a legacy CRUD system to Event Sourcing. Design the migration strategy, including how you’d handle existing data and dual-write periods.

How do you ensure event ordering guarantees in a distributed system? What are the trade-offs between strict ordering and scalability?

Design a multi-tenant event store that supports data isolation, per-tenant retention policies, and cross-tenant analytics. What are the security and performance considerations?

How would you implement event sourcing across multiple bounded contexts in a microservices architecture? How do events cross service boundaries?

Evaluate the cost model of Event Sourcing at scale. How do storage costs, replay costs, and projection maintenance costs grow with event volume?

Common Interview Questions

Q: Isn’t Event Sourcing just an audit log? A: No. An audit log is a secondary concern, often incomplete and inconsistent with actual state. Event Sourcing makes events the primary source of truth—state is derived from events, not the other way around. This fundamental inversion enables capabilities beyond auditing: temporal queries, event-driven integration, and flexible projections.

Q: How do you delete data for GDPR compliance if events are immutable? A: Three approaches: (1) Encrypt events with per-user keys, then delete the key (crypto-shredding). (2) Append a “PersonalDataDeleted” event and filter it during replay. (3) Rewrite the event stream, removing personal data (breaks immutability but may be legally required). Most systems use crypto-shredding for GDPR.

Q: What if the event stream grows too large to replay? A: Use aggressive snapshotting (snapshot every N events, not just periodically) and archival (move old events to cold storage). For aggregates with unbounded growth, consider lifecycle boundaries—close the aggregate and start a new one. For example, close a bank account’s event stream annually and start a new stream for the next year.

Q: How do you handle out-of-order events? A: Depends on your ordering guarantees. If events have causal dependencies, use version numbers or vector clocks to detect out-of-order delivery and buffer until ordering is restored. If events are independent, apply them as they arrive. Most systems partition events by aggregate ID to ensure ordering within an aggregate.

Red Flags to Avoid

Claiming Event Sourcing is “always better” than CRUD without acknowledging the complexity trade-off. It’s a tool for specific problems, not a universal solution.

Not understanding the difference between Event Sourcing and event-driven architecture. Event Sourcing is about storage; event-driven is about communication. They complement each other but are distinct.

Ignoring event schema versioning. “We’ll just never change event schemas” is naive—business requirements evolve, and you need a versioning strategy from day one.

Treating projections as an afterthought. In Event Sourcing + CQRS, projections are critical infrastructure that require monitoring, error handling, and replay capabilities.

Not considering the operational burden. Event Sourcing requires tooling for event store management, projection monitoring, replay capabilities, and debugging. Underestimating this leads to operational nightmares.


Key Takeaways

Event Sourcing stores events, not state: Every state change is captured as an immutable event in an append-only log. Current state is derived by replaying events, making the event log the source of truth.

Snapshots are essential for performance: Replaying thousands of events is impractical. Snapshot current state periodically, then replay only events since the last snapshot. Snapshots are a cache that can be rebuilt from events.

Event schema versioning is critical: Events are immutable, but schemas evolve. Use upcasting (transform old events to new schema during replay) or multi-version handlers. Plan for versioning from day one.

Event Sourcing naturally pairs with CQRS: Use Event Sourcing for writes (event store) and CQRS for reads (projections). This decouples write and read concerns, enabling independent scaling and optimization.

Trade complexity for auditability and flexibility: Event Sourcing provides automatic audit trails, temporal queries, and event-driven integration. The cost is increased complexity in event schema design, replay logic, and projection management. Use it when these benefits justify the complexity, not as a default choice.