Deployment Stamps Pattern: Multi-Region Scale-Out

Intermediate · 36 min read · Updated 2026-02-11

TL;DR

Deployment stamps (also called scale units or cells) are independent, self-contained copies of your entire application stack that serve a subset of users or tenants. Instead of scaling a single monolithic deployment, you deploy multiple identical stamps and route traffic across them. This pattern enables near-linear horizontal scaling, improves fault isolation, simplifies regional expansion, and provides natural tenant data separation—making it a cornerstone pattern for global SaaS platforms.

Cheat Sheet: Deploy multiple identical, independent copies of your full stack → Route users/tenants to specific stamps → Scale by adding stamps, not just instances → Each stamp failure affects only its users → Natural data residency boundaries.

The Analogy

Think of deployment stamps like franchise restaurant locations. Instead of building one massive restaurant that serves an entire city (vertical scaling), you open multiple identical franchise locations across different neighborhoods (horizontal scaling via stamps). Each location has its own kitchen, staff, and inventory (independent resources), serves customers in its area (tenant assignment), and operates independently—if one location has a kitchen fire, the others keep serving customers. When demand grows, you don’t make existing restaurants bigger; you open new franchises. This is exactly how Spotify scales: each “stamp” serves millions of users, and they add new stamps as they grow globally.

Why This Matters in Interviews

Deployment stamps come up in interviews when discussing multi-tenant SaaS architecture, global scaling strategies, or blast radius reduction. Interviewers want to see that you understand the difference between scaling within a deployment (adding instances) versus scaling the deployment itself (adding stamps). Strong candidates explain the tenant routing layer, discuss stamp sizing decisions with actual numbers, and compare stamps to alternatives like sharding or multi-region active-active. This pattern signals experience with production systems at scale—companies like Stripe, GitHub, and Azure extensively use stamps, so mentioning real-world examples demonstrates practical knowledge beyond textbook theory.


Core Concept

Deployment stamps are a reliability and scalability pattern where you deploy multiple independent, identical copies of your entire application infrastructure, with each copy serving a bounded subset of your total user base or tenant population. Unlike traditional horizontal scaling where you add more instances of individual services within a single deployment, stamps replicate the entire stack—load balancers, application servers, databases, caches, message queues, everything. Each stamp operates as a self-contained unit with its own resources, data stores, and failure domain.

The pattern emerged from the limitations of scaling monolithic deployments. When you have a single large deployment serving all users, you face coordination overhead, blast radius concerns (one bad deployment affects everyone), and practical limits on how large a single database or cache cluster can grow. Stamps solve this by partitioning your user base across multiple independent deployments. If you have 10 million users and each stamp can handle 1 million users, you deploy 10 stamps. When you reach 11 million users, you deploy an 11th stamp. This approach provides near-linear scalability because each stamp’s complexity remains constant regardless of total system size.
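The capacity arithmetic can be stated in a few lines of Python; the one-million-users-per-stamp figure is the example's number, not a recommendation:

```python
import math

def stamps_needed(total_users: int, users_per_stamp: int) -> int:
    # Each stamp adds a fixed, known unit of capacity, so the total
    # stamp count grows linearly with the user base.
    return math.ceil(total_users / users_per_stamp)

print(stamps_needed(10_000_000, 1_000_000))  # 10 stamps
print(stamps_needed(11_000_000, 1_000_000))  # 11 stamps
```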

The key architectural component is the routing layer that sits in front of all stamps and directs each request to the correct stamp based on user identity, tenant ID, geographic location, or other partitioning keys. This routing layer is typically stateless and highly available, using a lookup service or consistent hashing to determine stamp assignments. Once a user or tenant is assigned to a stamp, all their requests go to that stamp, ensuring data locality and simplifying the architecture within each stamp.

Deployment Stamps Architecture Overview (C4 Context)

graph TB
    User["User/Tenant<br/><i>Web/Mobile Client</i>"]
    Router["Global Routing Layer<br/><i>Stateless Proxy</i>"]
    Mapping[("Tenant Mapping DB<br/><i>DynamoDB/Spanner</i>")]
    
    subgraph S1["Stamp 1"]
        LB1["Load Balancer"]
        App1["App Servers"]
        DB1[("Database<br/><i>PostgreSQL</i>")]
        Cache1[("Cache<br/><i>Redis</i>")]
    end

    subgraph S2["Stamp 2"]
        LB2["Load Balancer"]
        App2["App Servers"]
        DB2[("Database<br/><i>PostgreSQL</i>")]
        Cache2[("Cache<br/><i>Redis</i>")]
    end

    subgraph SN["Stamp N"]
        LBN["Load Balancer"]
        AppN["App Servers"]
        DBN[("Database<br/><i>PostgreSQL</i>")]
        CacheN[("Cache<br/><i>Redis</i>")]
    end
    
    User --"1. API Request"--> Router
    Router --"2. Lookup Tenant"--> Mapping
    Mapping --"3. Return Stamp ID"--> Router
    Router --"4. Route to Stamp"--> LB1
    Router -."Route to Stamp".-> LB2
    Router -."Route to Stamp".-> LBN
    
    LB1 --> App1
    App1 --> DB1
    App1 --> Cache1
    
    LB2 --> App2
    App2 --> DB2
    App2 --> Cache2
    
    LBN --> AppN
    AppN --> DBN
    AppN --> CacheN

Each stamp is a complete, independent copy of the application stack with its own databases and caches. The global routing layer uses a tenant mapping database to direct requests to the correct stamp, ensuring no runtime dependencies between stamps.

How It Works

Step 1: Define the Stamp Blueprint. Create an infrastructure-as-code template that defines your complete application stack. This includes all services (web servers, API servers, background workers), data stores (databases, caches, object storage), networking components (load balancers, VPCs, subnets), and monitoring infrastructure. The blueprint should be parameterized so you can deploy identical stamps with different identifiers (stamp-1, stamp-2, etc.) and in different regions. At Stripe, each stamp includes PostgreSQL clusters, Redis instances, application servers, and dedicated monitoring stacks—everything needed to process payments independently.
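A blueprint's parameterization can be sketched in Python (a real deployment would use an IaC tool such as Terraform or Bicep); the field names and defaults below are illustrative assumptions, not any company's actual configuration:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class StampBlueprint:
    # Only environment-specific values vary per stamp; every other
    # field is identical across stamps (see Principle 3 below).
    stamp_id: str
    region: str
    cidr_block: str
    app_server_count: int = 6            # identical default for every stamp
    db_instance_class: str = "db.r6g.2xlarge"  # hypothetical instance size

def render(stamp_id: str, region: str, cidr_block: str) -> dict:
    """Produce the deployable parameter set for one stamp."""
    return asdict(StampBlueprint(stamp_id, region, cidr_block))

print(render("stamp-1", "us-east-1", "10.1.0.0/16"))
print(render("stamp-2", "eu-west-1", "10.2.0.0/16"))
```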

Step 2: Implement the Routing Layer. Build a global routing service that maps users or tenants to specific stamps. This service maintains a mapping table (often in a globally replicated database like DynamoDB or Spanner) that records which stamp serves which tenant. When a request arrives, the router looks up the tenant ID, determines the assigned stamp, and forwards the request. The router must be highly available and low-latency since it’s on the critical path for every request. GitHub’s routing layer uses consistent hashing with virtual nodes to distribute repositories across stamps while maintaining the ability to rebalance when adding new stamps.
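A minimal sketch of the lookup path, with the globally replicated mapping table mocked as a dict; the cache TTL and class shape are illustrative assumptions:

```python
import time

class StampRouter:
    """Stateless-style router: tenant -> stamp lookup with a short-lived
    local cache in front of the mapping store. In production the store
    would be a globally replicated database (DynamoDB, Spanner)."""

    def __init__(self, mapping_store: dict, cache_ttl: float = 30.0):
        self._store = mapping_store
        self._cache: dict = {}   # tenant_id -> (stamp_id, fetched_at)
        self._ttl = cache_ttl

    def stamp_for(self, tenant_id: str) -> str:
        hit = self._cache.get(tenant_id)
        if hit and time.monotonic() - hit[1] < self._ttl:
            return hit[0]                     # cached assignment, no store round trip
        stamp_id = self._store[tenant_id]     # authoritative lookup
        self._cache[tenant_id] = (stamp_id, time.monotonic())
        return stamp_id

router = StampRouter({"acme-corp": "stamp-2"})
print(router.stamp_for("acme-corp"))  # stamp-2
```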

Step 3: Deploy Initial Stamps. Provision your first set of stamps across your target regions. Each stamp should have sufficient capacity to handle its assigned load with headroom for growth and traffic spikes. For a SaaS platform launching in North America and Europe, you might deploy two stamps in us-east-1, two in us-west-2, and two in eu-west-1. Each stamp is completely independent—separate databases, separate deployments, separate monitoring dashboards. This initial deployment establishes your baseline capacity.

Step 4: Assign Tenants to Stamps. Implement a tenant assignment strategy that distributes load evenly across stamps while respecting constraints like data residency requirements. New tenants are typically assigned to the stamp with the most available capacity in their required region. Some systems use consistent hashing for automatic distribution, while others use explicit assignment with a control plane service. The assignment is recorded in the routing layer’s mapping table and becomes the source of truth for all future requests from that tenant.
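One possible least-loaded assignment strategy, sketched in Python; the stamp records and capacity numbers are hypothetical:

```python
def assign_tenant(tenant_region: str, stamps: list) -> str:
    """Pick the stamp with the most free capacity in the tenant's
    required region (data residency constraint)."""
    candidates = [s for s in stamps
                  if s["region"] == tenant_region
                  and s["tenants"] < s["capacity"]]
    if not candidates:
        raise RuntimeError(f"no stamp with capacity in {tenant_region}; provision one")
    # Lowest utilization ratio wins.
    return min(candidates, key=lambda s: s["tenants"] / s["capacity"])["stamp_id"]

stamps = [
    {"stamp_id": "stamp-1", "region": "us-east-1", "tenants": 900, "capacity": 1000},
    {"stamp_id": "stamp-2", "region": "us-east-1", "tenants": 400, "capacity": 1000},
    {"stamp_id": "stamp-3", "region": "eu-west-1", "tenants": 100, "capacity": 1000},
]
print(assign_tenant("us-east-1", stamps))  # stamp-2 (least loaded in region)
```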

Step 5: Monitor Stamp Health and Capacity. Each stamp reports metrics on resource utilization (CPU, memory, database connections, queue depth) and business metrics (requests per second, active users, data volume). A central monitoring system aggregates these metrics to identify stamps approaching capacity limits or experiencing degraded performance. When a stamp reaches 70-80% of its capacity threshold, that’s the signal to deploy a new stamp and start assigning new tenants to it. Existing tenants typically remain on their current stamp unless you implement tenant migration capabilities.
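The threshold check might look like this sketch; the 80% trigger matches the range mentioned above, and the metric names are assumptions:

```python
def scaling_signal(stamp_metrics: list, threshold: float = 0.8) -> list:
    """Return the stamps that have crossed the provision-a-new-stamp
    threshold, using tenant count as the capacity metric."""
    return [m["stamp_id"] for m in stamp_metrics
            if m["tenants"] / m["tenant_limit"] >= threshold]

metrics = [
    {"stamp_id": "stamp-1", "tenants": 850, "tenant_limit": 1000},
    {"stamp_id": "stamp-2", "tenants": 400, "tenant_limit": 1000},
]
print(scaling_signal(metrics))  # ['stamp-1']
```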

Step 6: Scale by Adding Stamps. When capacity needs increase, deploy additional stamps using your infrastructure-as-code blueprint. The new stamp goes through a deployment and validation process, then gets registered with the routing layer as available for new tenant assignments. This scaling operation is independent of existing stamps—you’re not modifying running infrastructure, just adding new capacity. This makes scaling safer and more predictable than in-place scaling operations that risk affecting existing users.

Step 7: Handle Stamp Failures Gracefully. When a stamp experiences issues (deployment failure, infrastructure outage, cascading failure), the blast radius is limited to tenants assigned to that stamp. The routing layer can detect unhealthy stamps through health checks and either retry requests to the same stamp (for transient issues) or redirect traffic to a backup stamp if you’ve implemented multi-stamp redundancy for critical tenants. The key is that other stamps continue operating normally, maintaining availability for the majority of users.

Request Flow Through Stamp Architecture

sequenceDiagram
    participant User
    participant Router as Global Router
    participant Mapping as Mapping DB
    participant Stamp as Stamp-2
    participant DB as Stamp-2 DB
    
    User->>Router: 1. POST /api/orders<br/>tenant_id: acme-corp
    Router->>Mapping: 2. SELECT stamp_id<br/>WHERE tenant='acme-corp'
    Mapping-->>Router: 3. stamp_id: stamp-2
    Router->>Stamp: 4. Forward request to<br/>stamp-2.example.com
    Stamp->>DB: 5. INSERT INTO orders
    DB-->>Stamp: 6. Success
    Stamp-->>Router: 7. 201 Created
    Router-->>User: 8. 201 Created
    
    Note over User,DB: All subsequent requests from<br/>acme-corp go to stamp-2
    
    User->>Router: 9. GET /api/orders<br/>tenant_id: acme-corp
    Router->>Mapping: 10. Lookup (cached)
    Mapping-->>Router: 11. stamp-2
    Router->>Stamp: 12. Forward to stamp-2
    Stamp->>DB: 13. SELECT * FROM orders
    DB-->>Stamp: 14. Return data
    Stamp-->>Router: 15. 200 OK + data
    Router-->>User: 16. 200 OK + data

The routing layer performs a lightweight lookup to determine stamp assignment, then forwards requests to the appropriate stamp. Once assigned, all tenant requests consistently route to the same stamp, ensuring data locality and eliminating cross-stamp queries.

Key Principles

Principle 1: Complete Independence. Each stamp must be fully self-contained with no shared runtime dependencies on other stamps. This means separate databases, separate caches, separate message queues—no shared infrastructure except the routing layer. Violating this principle creates hidden coupling that defeats the blast radius isolation benefits. For example, if all stamps share a single Redis cluster for session storage, that Redis cluster becomes a single point of failure affecting all stamps. Netflix learned this lesson early: their original architecture had shared dependencies that caused cascading failures, leading them to adopt fully independent stamps (they call them “cells”) where each cell can fail without impacting others. The only acceptable shared components are global routing infrastructure and control plane services that manage stamp lifecycle, and even these should be designed for high availability.

Principle 2: Bounded Capacity Per Stamp. Define explicit capacity limits for each stamp based on resource constraints and performance requirements. A stamp might support 1 million users, 10,000 tenants, 50,000 requests per second, or 5TB of data—whatever metric makes sense for your system. This bound serves multiple purposes: it ensures predictable performance (you’ve tested and validated the system at this scale), simplifies capacity planning (adding a stamp adds a known amount of capacity), and prevents runaway growth in individual stamps. Shopify sizes their stamps based on the number of stores and transaction volume, with hard limits that trigger automatic stamp provisioning when approached. Without bounded capacity, you end up with unevenly sized stamps that are difficult to manage and reason about.

Principle 3: Identical Stamp Configuration. All stamps should be deployed from the same infrastructure-as-code template with minimal variation. Configuration differences should be limited to environment-specific values like region, network CIDR blocks, and stamp identifiers. Avoid the temptation to customize individual stamps for specific tenants or use cases—this creates operational complexity and makes it difficult to reason about system behavior. When Stripe deploys a new stamp, it’s byte-for-byte identical to existing stamps except for region-specific parameters. This uniformity means that testing one stamp validates all stamps, deployment procedures are consistent, and engineers can work on any stamp without learning stamp-specific quirks. If you need different configurations for different tenant tiers, deploy separate stamp types (e.g., standard-stamps and premium-stamps) rather than customizing individual stamps.

Principle 4: Stateless Routing Layer. The routing infrastructure that directs traffic to stamps must be stateless, highly available, and horizontally scalable. It should not maintain session state or perform complex business logic—its job is purely to look up the stamp assignment and forward the request. This simplicity is critical because the router is on the critical path for every request and becomes a potential bottleneck or single point of failure if not designed carefully. GitHub’s routing layer uses GeoDNS for initial region selection, then a stateless proxy tier that performs tenant-to-stamp lookup using a cached mapping table. The lookup data is replicated globally with eventual consistency, and the routing logic can run on thousands of instances behind a load balancer. If routing requires complex logic or state, you’ve probably designed it wrong.
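Consistent hashing with virtual nodes can be sketched generically (this is the technique the text names, not GitHub's actual implementation):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map tenants onto stamps via a hash ring. Virtual nodes smooth
    the distribution; adding a stamp remaps only a fraction of tenants."""

    def __init__(self, stamps: list, vnodes: int = 100):
        # Each stamp occupies `vnodes` positions on the ring.
        self._ring = sorted(
            (self._h(f"{s}#{i}"), s) for s in stamps for i in range(vnodes)
        )
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def stamp_for(self, tenant_id: str) -> str:
        # First ring position at or after the tenant's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._h(tenant_id)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["stamp-1", "stamp-2", "stamp-3"])
print(ring.stamp_for("acme-corp"))  # deterministic for a given ring
```

Because lookups are a pure function of the ring membership and the tenant ID, the routing tier itself stays stateless: any instance with the same stamp list returns the same answer.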

Principle 5: Graceful Degradation and Blast Radius Containment. Design stamps to fail independently without cascading to other stamps or the routing layer. When a stamp fails, only its assigned tenants are affected—the system as a whole continues operating at reduced capacity. This requires careful attention to failure modes: a stamp shouldn’t make requests to other stamps, shouldn’t depend on shared services that could fail, and shouldn’t generate traffic patterns that overwhelm the routing layer when failing. Azure’s stamp architecture includes circuit breakers and bulkheads within each stamp to prevent cascading failures, and the routing layer monitors stamp health to stop sending traffic to failed stamps. The goal is that a stamp failure looks like a partial outage (“10% of tenants affected”) rather than a total outage, and recovery involves fixing or replacing the failed stamp without touching healthy stamps.


Deep Dive

Types / Variants

Geographic Stamps deploy stamps in different geographic regions to serve users with low latency and comply with data residency requirements. Each region has one or more stamps, and the routing layer directs users to stamps in their nearest region or required jurisdiction. This variant is common for global SaaS platforms where European customers must have their data stored in EU data centers. Spotify uses geographic stamps extensively: they have stamps in North America, Europe, Asia, and other regions, with users assigned to stamps in their region. The routing layer uses GeoDNS to direct users to the appropriate region, then load balances across stamps within that region. The main advantage is compliance with data sovereignty laws and reduced latency for users. The downside is increased operational complexity—you’re managing infrastructure in multiple regions, dealing with region-specific cloud provider quirks, and potentially handling cross-region traffic for features like user migration. Use geographic stamps when you have a global user base with data residency requirements or when latency to distant regions significantly impacts user experience.

Tenant-Tier Stamps create different stamp types for different customer tiers (free, standard, premium, enterprise). Each tier gets stamps with different resource allocations, SLAs, and feature sets. This allows you to provide differentiated service levels and isolate noisy neighbors—free tier users can’t impact enterprise customers because they’re on completely separate infrastructure. Salesforce uses this approach with different stamp types for different customer editions. Premium stamps might have larger database instances, more aggressive caching, dedicated support, and stricter rate limits, while standard stamps use more cost-effective configurations. The advantage is clear service differentiation and the ability to optimize costs by matching infrastructure to willingness to pay. The disadvantage is operational complexity—you’re maintaining multiple stamp types with different configurations, and tenant migration between tiers requires moving them between stamp types. Use tenant-tier stamps when you have clear service tiers with different SLAs and want to prevent cross-tier impact.

Functional Stamps partition the system by functionality rather than by tenant. For example, you might have stamps dedicated to API requests, separate stamps for batch processing, and separate stamps for analytics workloads. Each functional stamp is optimized for its workload type—API stamps prioritize low latency, batch stamps prioritize throughput, analytics stamps have large data stores. This variant is less common but useful for systems with very different workload characteristics. Twitter (now X) uses functional separation where real-time tweet delivery runs on different infrastructure than batch analytics jobs. The advantage is workload isolation and the ability to optimize each stamp type for its specific needs. The disadvantage is increased complexity in the routing layer (requests must be routed based on function, not just tenant) and potential data synchronization challenges if different functional stamps need access to the same data. Use functional stamps when workload characteristics are so different that shared infrastructure leads to resource contention or when you want to scale different functions independently.

Hybrid Stamps combine multiple partitioning strategies. For example, you might have geographic stamps (US-East, US-West, EU) with tenant-tier separation within each region (standard and premium stamps per region). This provides the benefits of multiple approaches but significantly increases operational complexity. GitHub uses a hybrid approach with geographic distribution and repository-size-based partitioning—very large repositories (Linux kernel, Chromium) get dedicated stamps to prevent them from impacting smaller repositories. The advantage is fine-grained control over resource allocation and isolation. The disadvantage is complexity: you’re managing a matrix of stamp types, the routing layer needs to consider multiple dimensions, and capacity planning becomes more sophisticated. Use hybrid stamps only when you have clear requirements that justify the complexity, such as global operations with strict data residency requirements AND significant customer tier differentiation.

Ephemeral Stamps are short-lived stamps created for specific purposes like testing, staging, or temporary capacity. Rather than maintaining permanent staging environments, you deploy ephemeral stamps on-demand, run tests or handle traffic spikes, then tear them down. This variant is particularly useful for preview environments or handling predictable traffic spikes. Stripe deploys ephemeral stamps for large-scale load testing—they spin up a complete stamp, run tests, collect metrics, and destroy it. The advantage is cost efficiency (you only pay for resources when needed) and the ability to test at production scale without maintaining expensive permanent environments. The disadvantage is deployment time—if it takes 30 minutes to deploy a stamp, ephemeral stamps aren’t useful for rapid response to unexpected traffic. Use ephemeral stamps for testing, staging, and predictable capacity needs where you can plan ahead.

Geographic Stamps Deployment Pattern

graph TB
    subgraph USE["Region: US-East"]
        subgraph USE1["Stamp US-E-1"]
            LB_USE1["Load Balancer"]
            App_USE1["App Servers"]
            DB_USE1[("Primary DB")]
        end
        subgraph USE2["Stamp US-E-2"]
            LB_USE2["Load Balancer"]
            App_USE2["App Servers"]
            DB_USE2[("Primary DB")]
        end
    end

    subgraph EUW["Region: EU-West"]
        subgraph EUW1["Stamp EU-W-1"]
            LB_EUW1["Load Balancer"]
            App_EUW1["App Servers"]
            DB_EUW1[("Primary DB")]
        end
        subgraph EUW2["Stamp EU-W-2"]
            LB_EUW2["Load Balancer"]
            App_EUW2["App Servers"]
            DB_EUW2[("Primary DB")]
        end
    end

    subgraph APS["Region: AP-South"]
        subgraph APS1["Stamp AP-S-1"]
            LB_APS1["Load Balancer"]
            App_APS1["App Servers"]
            DB_APS1[("Primary DB")]
        end
    end
    
    GeoDNS["GeoDNS<br/><i>Route 53</i>"]
    USUsers["US Users"]
    EUUsers["EU Users<br/><i>GDPR Compliant</i>"]
    APUsers["Asia Users"]
    
    USUsers --"Low Latency"--> GeoDNS
    EUUsers --"Data Residency"--> GeoDNS
    APUsers --"Regional Access"--> GeoDNS
    
    GeoDNS --"Route"--> LB_USE1
    GeoDNS --"Route"--> LB_USE2
    GeoDNS --"Route"--> LB_EUW1
    GeoDNS --"Route"--> LB_EUW2
    GeoDNS --"Route"--> LB_APS1
    
    LB_USE1 --> App_USE1 --> DB_USE1
    LB_USE2 --> App_USE2 --> DB_USE2
    LB_EUW1 --> App_EUW1 --> DB_EUW1
    LB_EUW2 --> App_EUW2 --> DB_EUW2
    LB_APS1 --> App_APS1 --> DB_APS1

Geographic stamps deploy independent copies in multiple regions for low latency and data residency compliance. Each region contains multiple stamps for capacity and redundancy, with GeoDNS routing users to their nearest region. EU stamps ensure GDPR compliance by keeping European user data within EU boundaries.

Trade-offs

Stamp Size: Large vs. Small Stamps. Large stamps (serving millions of users or thousands of tenants) provide better resource utilization and lower per-user costs because you’re amortizing fixed overhead (monitoring, management, control plane) across more users. However, large stamps have a bigger blast radius when failures occur, longer deployment times, and more complex database management. Small stamps (serving thousands of users or tens of tenants) provide better fault isolation, faster deployments, and easier tenant migration, but incur higher per-user costs due to fixed overhead. The decision framework: choose large stamps when cost efficiency is paramount and you have confidence in reliability (mature systems, extensive testing); choose small stamps when blast radius reduction is critical and you’re willing to pay the overhead cost (early-stage systems, high-value customers). Shopify started with large stamps for cost efficiency but moved toward smaller stamps as they matured, prioritizing reliability over cost optimization. A practical middle ground is medium-sized stamps (tens of thousands of users) that balance cost and blast radius.

Routing Strategy: Static vs. Dynamic Assignment. Static assignment permanently assigns tenants to stamps at provisioning time, with explicit migration required to move tenants. This provides predictable routing, simpler implementation, and no risk of accidental tenant movement. Dynamic assignment uses algorithms like consistent hashing or load-based routing to automatically distribute and rebalance tenants across stamps. This provides automatic load balancing and easier scaling but introduces complexity in the routing layer and potential for unexpected tenant movement. The decision framework: choose static assignment when tenant data is large (migration is expensive), when you have data residency requirements (can’t automatically move tenants across regions), or when you need predictable tenant placement for debugging and support. Choose dynamic assignment when tenants are small and stateless, when you need automatic load balancing, or when you’re building a system from scratch and can design for tenant mobility. Most production systems use static assignment with explicit migration tools because the operational simplicity outweighs the benefits of automatic rebalancing.

Data Architecture: Stamp-Local vs. Globally Replicated Data. Stamp-local data stores all tenant data within the stamp, with no cross-stamp data access. This provides complete isolation, simpler architecture, and no cross-stamp dependencies, but makes cross-tenant features difficult (global search, analytics across all tenants) and requires careful handling of global data like user accounts. Globally replicated data uses distributed databases (Spanner, DynamoDB Global Tables) to replicate certain data across all stamps, enabling cross-tenant features and global user accounts, but introduces consistency challenges, higher latency for writes, and potential for cross-stamp dependencies. The decision framework: choose stamp-local data when tenant isolation is paramount, when you don’t need cross-tenant features, and when you want to minimize complexity. Choose globally replicated data when you need global features (user can access any tenant from any stamp), when you have small amounts of global data (user profiles, not tenant data), and when you can handle eventual consistency. A hybrid approach is common: tenant data is stamp-local, but user authentication and authorization data is globally replicated.

Deployment Strategy: All-at-Once vs. Progressive Rollout. All-at-once deployment updates all stamps simultaneously, providing consistent versions across all stamps and faster rollout of features. However, it has maximum blast radius if the deployment has issues—all stamps are affected simultaneously. Progressive rollout (canary or ring-based) deploys to stamps sequentially, starting with a small percentage of stamps and gradually expanding. This limits blast radius and provides early warning of issues, but results in version skew across stamps and slower feature rollout. The decision framework: choose all-at-once deployment when you have high confidence in the deployment (extensive testing, mature CI/CD), when version consistency is critical (database schema changes that require all services to be updated), or when you have good rollback mechanisms. Choose progressive rollout when you’re deploying risky changes, when you want to validate in production with real traffic before full rollout, or when you can tolerate version skew. Most production systems use progressive rollout for application changes but all-at-once for infrastructure changes that require consistency.
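A ring-based rollout can be sketched as a loop that widens the deployed fraction only while health checks pass; the ring fractions and function signatures here are illustrative:

```python
def progressive_rollout(stamps: list, deploy, healthy,
                        rings=(0.1, 0.5, 1.0)) -> tuple:
    """Deploy ring by ring; halt on the first failed health check so the
    blast radius stays bounded to the current ring."""
    done = 0
    for frac in rings:
        target = max(1, int(len(stamps) * frac))
        for stamp in stamps[done:target]:
            deploy(stamp)
            if not healthy(stamp):
                return ("halted", stamp)   # stop before the next, wider ring
        done = target
    return ("complete", None)

stamps = [f"stamp-{i}" for i in range(1, 11)]
deployed = []
status, failed = progressive_rollout(stamps, deployed.append, lambda s: True)
print(status, len(deployed))  # complete 10
```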

Tenant Migration: Enabled vs. Disabled. Enabling tenant migration allows you to move tenants between stamps for load balancing, stamp decommissioning, or tenant tier changes. This provides operational flexibility and the ability to optimize resource utilization over time. However, it requires building migration tooling, handling data consistency during migration, and managing downtime or degraded performance during the move. Disabling tenant migration means tenants stay on their initial stamp forever, simplifying the architecture but making it difficult to rebalance load or decommission old stamps. The decision framework: enable tenant migration when you have large tenants that might outgrow a stamp, when you need to rebalance load across stamps, or when you plan to decommission stamps over time (moving to new regions, upgrading infrastructure). Disable tenant migration when migration is technically difficult (large data volumes, complex state), when downtime is unacceptable, or when the operational cost of building migration tooling exceeds the benefit. Many systems start without migration and add it later when operational needs justify the investment.

Stamp Size Tradeoff Analysis

graph LR
    subgraph Large["Large Stamps: 10,000 tenants"]
        LS_Cost["💰 Lower Cost<br/>$15/tenant/month"]
        LS_Blast["⚠️ Large Blast Radius<br/>10,000 tenants affected"]
        LS_Deploy["⏱️ Slow Deployment<br/>45-60 minutes"]
        LS_DB["🗄️ Large Database<br/>500TB per stamp"]
    end

    subgraph Small["Small Stamps: 1,000 tenants"]
        SS_Cost["💰 Higher Cost<br/>$20/tenant/month"]
        SS_Blast["✅ Small Blast Radius<br/>1,000 tenants affected"]
        SS_Deploy["⚡ Fast Deployment<br/>10-15 minutes"]
        SS_DB["🗄️ Manageable DB<br/>50TB per stamp"]
    end

    Decision{"Stamp Size<br/>Decision"}

    Decision --"Prioritize Cost<br/>Mature System"--> Large
    Decision --"Prioritize Reliability<br/>High-Value Customers"--> Small

    LS_Cost -."25% cheaper".-> SS_Cost
    LS_Blast -."10x impact".-> SS_Blast
    LS_Deploy -."4x slower".-> SS_Deploy
    LS_DB -."10x larger".-> SS_DB

Large stamps offer better cost efficiency ($15 vs $20 per tenant per month, a 25% saving) but create a larger blast radius and operational challenges. Small stamps provide better fault isolation and faster operations but at higher per-tenant cost. The optimal size balances cost, reliability requirements, and operational maturity.

Common Pitfalls

Pitfall 1: Shared Dependencies Between Stamps. Teams often introduce shared services between stamps for convenience—a shared Redis cluster for caching, a shared message queue for async processing, or a shared database for “global” data. This defeats the primary benefit of stamps: fault isolation. When the shared dependency fails, all stamps fail simultaneously, creating a single point of failure worse than not using stamps at all. This happens because shared services seem efficient (why run 10 Redis clusters when one will do?) and because some data genuinely needs to be global (user accounts, configuration). The solution is to be ruthlessly disciplined about stamp independence. For truly global data, use a distributed database designed for global replication (Spanner, DynamoDB Global Tables) rather than a shared single-region database. For services that seem like they should be shared, deploy a copy within each stamp. Yes, this increases costs, but that’s the price of reliability. If cost is prohibitive, you might not need stamps—consider whether simpler patterns like multi-region active-active would suffice.

Pitfall 2: Unbounded Stamp Growth. Without explicit capacity limits, stamps grow indefinitely as you add tenants, eventually becoming too large to manage effectively. A stamp that started serving 1,000 tenants grows to 10,000, then 50,000, with increasingly large databases, longer deployment times, and bigger blast radius. This happens because it’s operationally easier to keep adding tenants to existing stamps than to deploy new stamps and implement tenant assignment logic. The solution is to define and enforce hard capacity limits per stamp from day one. When a stamp reaches 80% of its capacity limit, automatically trigger new stamp provisioning and redirect new tenant assignments to the new stamp. Monitor stamp capacity metrics (tenant count, database size, request rate) and alert when approaching limits. Some systems implement automatic tenant assignment that refuses to assign tenants to stamps at capacity, forcing operators to deploy new stamps. The key is making stamp capacity a first-class operational metric that drives scaling decisions.

Pitfall 3: Inconsistent Stamp Configuration. Over time, stamps diverge in configuration as teams make one-off changes to fix issues or accommodate specific tenants. Stamp-A gets a larger database instance for a big customer, Stamp-B gets a custom rate limit configuration, Stamp-C runs a different version of the application. This configuration drift makes it impossible to reason about system behavior—bugs appear in some stamps but not others, deployments succeed in some stamps and fail in others, and operational runbooks don’t work consistently. This happens because making one-off changes is faster than updating the infrastructure-as-code template and redeploying all stamps. The solution is to treat stamp configuration as immutable and enforce it through automation. All configuration changes must go through the infrastructure-as-code template and be deployed to all stamps (or to a new stamp type if you need differentiation). Use configuration management tools to detect drift and alert when stamps diverge from the template. If a tenant needs special treatment, deploy a separate stamp type rather than customizing an existing stamp. The operational discipline required here is significant but essential for maintaining the benefits of stamps.

Pitfall 4: Inadequate Routing Layer Design. Teams underestimate the importance of the routing layer, treating it as a simple proxy that looks up stamp assignments. In production, the routing layer becomes a critical bottleneck and single point of failure if not designed carefully. Common issues include: routing layer becomes stateful (caching tenant assignments in memory, creating session affinity), routing logic becomes complex (business rules about which stamp to use), and routing layer lacks redundancy (single region, insufficient capacity). This happens because routing seems simple initially—just a lookup table—but grows in complexity as you add features like health checking, circuit breaking, and tenant migration support. The solution is to design the routing layer as a first-class distributed system from the start. Use a globally replicated, low-latency data store for tenant-to-stamp mappings (DynamoDB, Spanner). Keep routing logic simple and stateless—complex decisions should happen in a separate control plane service that updates the mapping table. Deploy routing infrastructure in multiple regions with automatic failover. Load test the routing layer independently to ensure it can handle peak traffic. Monitor routing layer latency and error rates as critical metrics. The routing layer should be the most reliable component in your system because everything depends on it.

Pitfall 5: Ignoring Cross-Stamp Operations. Some features inherently require cross-stamp data access: global search across all tenants, analytics dashboards showing system-wide metrics, or user accounts that can access multiple tenants on different stamps. Teams often discover these requirements after building a stamp architecture that assumes complete isolation, then struggle to retrofit cross-stamp capabilities. This happens because initial requirements focus on single-tenant operations, and cross-stamp needs emerge later. The solution is to identify cross-stamp requirements early in the design phase and build appropriate mechanisms. For global search, implement a separate search index that aggregates data from all stamps (using change data capture or event streaming). For analytics, build a data warehouse that pulls data from all stamps for reporting. For multi-tenant user accounts, use a global user service that maintains user identity and permissions separately from stamp-local tenant data. The key insight is that cross-stamp operations require different architectural patterns than stamp-local operations—you can’t just query across stamps directly. Design these patterns into your architecture from the start rather than trying to add them later.

Stamp Independence Violation Anti-Pattern

graph TB
    subgraph AntiPattern["Anti-Pattern: Shared Dependencies"]
        subgraph Stamp A
            AppA["App Servers"]
            DBA[("Local DB")]
        end
        
        subgraph Stamp B
            AppB["App Servers"]
            DBB[("Local DB")]
        end
        
        subgraph Stamp C
            AppC["App Servers"]
            DBC[("Local DB")]
        end
        
        SharedRedis[("❌ Shared Redis<br/><i>Single Point of Failure</i>")]
        SharedQueue["❌ Shared Message Queue<br/><i>Cascading Failure Risk</i>"]
        
        AppA --> DBA
        AppB --> DBB
        AppC --> DBC
        
        AppA --> SharedRedis
        AppB --> SharedRedis
        AppC --> SharedRedis
        
        AppA --> SharedQueue
        AppB --> SharedQueue
        AppC --> SharedQueue
    end
    
    subgraph CorrectPattern["Correct Pattern: Full Independence"]
        subgraph Stamp X
            AppX["App Servers"]
            DBX[("Local DB")]
            RedisX[("✅ Dedicated Redis")]
            QueueX["✅ Dedicated Queue"]
        end
        
        subgraph Stamp Y
            AppY["App Servers"]
            DBY[("Local DB")]
            RedisY[("✅ Dedicated Redis")]
            QueueY["✅ Dedicated Queue"]
        end
        
        AppX --> DBX
        AppX --> RedisX
        AppX --> QueueX
        
        AppY --> DBY
        AppY --> RedisY
        AppY --> QueueY
    end
    
    Failure["💥 Redis Failure"] -."Affects ALL Stamps".-> SharedRedis
    Isolation["✅ Failure Isolated"] -."Only Affects Stamp X".-> RedisX

Sharing infrastructure between stamps (Redis, message queues, databases) creates hidden dependencies that defeat fault isolation. When the shared component fails, all stamps fail simultaneously. Each stamp must have dedicated copies of all infrastructure to maintain independence and limit blast radius.


Math & Calculations

Stamp Capacity Planning. Determining the right size for each stamp requires calculating resource requirements based on tenant load. Let’s work through a realistic example for a SaaS platform.

Variables:

  • Target tenants per stamp: T
  • Average requests per second per tenant: R_tenant
  • Average database queries per request: Q
  • Average response time requirement: L (latency)
  • Database connection pool size: C
  • Peak load multiplier: P (typically 3-5x for daily peaks)

Calculation:

Total requests per second per stamp: RPS_stamp = T × R_tenant × P

Total database queries per second: QPS_db = RPS_stamp × Q

Required database connections (using Little’s Law): C_required = QPS_db × L

Worked Example:

Suppose we’re designing stamps for a project management SaaS:

  • Target: 1,000 tenants per stamp
  • Average: 5 requests/second per tenant during business hours
  • Database queries: 8 queries per request (typical for CRUD operations)
  • Response time SLA: 200ms (0.2 seconds)
  • Peak multiplier: 4x (lunch hour spike)

RPS_stamp = 1,000 × 5 × 4 = 20,000 requests/second

QPS_db = 20,000 × 8 = 160,000 queries/second

C_required = 160,000 × 0.2 = 32,000 concurrent database operations

This tells us each stamp needs database capacity to handle 160,000 queries/second with 32,000 concurrent operations. For PostgreSQL, this might translate to a cluster with 16-32 read replicas (each handling 5,000-10,000 QPS) plus a primary for writes.
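The calculation above can be encoded directly; this sketch just packages the formulas (including Little's Law for concurrent database operations) with the worked numbers.

```python
# Sketch: stamp capacity planning from the variables defined above.
def stamp_capacity(tenants: int, rps_per_tenant: float, queries_per_request: int,
                   latency_s: float, peak_multiplier: float) -> dict:
    """Compute per-stamp request, query, and connection requirements."""
    rps = tenants * rps_per_tenant * peak_multiplier          # RPS_stamp = T × R × P
    qps = rps * queries_per_request                           # QPS_db = RPS × Q
    concurrent_ops = qps * latency_s                          # Little's Law: C = QPS × L
    return {"rps": rps, "qps": qps, "concurrent_db_ops": concurrent_ops}
```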

Storage Capacity:

  • Average data per tenant: D_tenant
  • Growth rate per month: G
  • Planning horizon: M months

Storage_stamp = T × D_tenant × (1 + G)^M

Example:

  • 1,000 tenants per stamp
  • 50GB average per tenant initially
  • 10% monthly growth
  • 24-month planning horizon

Storage_stamp = 1,000 × 50GB × (1.1)^24 = 1,000 × 50GB × 9.85 = 492TB per stamp

This indicates each stamp needs approximately 500TB of storage capacity with room for growth.
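The compound-growth projection above is a one-liner; this sketch reproduces the worked numbers.

```python
# Sketch: per-stamp storage projection, Storage = T × D_tenant × (1 + G)^M.
def projected_storage_gb(tenants: int, gb_per_tenant: float,
                         monthly_growth: float, months: int) -> float:
    """Project stamp storage after compounding monthly data growth."""
    return tenants * gb_per_tenant * (1 + monthly_growth) ** months
```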

Cost Analysis:

  • Fixed cost per stamp (infrastructure, monitoring, management): F
  • Variable cost per tenant (storage, compute): V
  • Total tenants across all stamps: T_total
  • Number of stamps: N

Total cost = N × F + T_total × V

Cost per tenant = (N × F) / T_total + V

Example with different stamp sizes:

Scenario A (Large stamps): 10,000 tenants per stamp, F = $50,000/month, V = $10/tenant

For 100,000 total tenants: N = 10 stamps

Total cost = 10 × $50,000 + 100,000 × $10 = $500,000 + $1,000,000 = $1,500,000/month

Cost per tenant = $15/tenant

Scenario B (Small stamps): 1,000 tenants per stamp, F = $10,000/month, V = $10/tenant

For 100,000 total tenants: N = 100 stamps

Total cost = 100 × $10,000 + 100,000 × $10 = $1,000,000 + $1,000,000 = $2,000,000/month

Cost per tenant = $20/tenant

This shows the cost tradeoff: larger stamps are more cost-efficient ($15 vs $20 per tenant) but have larger blast radius (10,000 vs 1,000 tenants affected by a stamp failure).
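The cost model above is straightforward to compute; this sketch reproduces both scenarios.

```python
# Sketch: stamp cost model, Total = N × F + T_total × V.
import math

def cost_per_tenant(total_tenants: int, tenants_per_stamp: int,
                    fixed_per_stamp: float, variable_per_tenant: float):
    """Return (number of stamps, total monthly cost, cost per tenant)."""
    stamps = math.ceil(total_tenants / tenants_per_stamp)
    total = stamps * fixed_per_stamp + total_tenants * variable_per_tenant
    return stamps, total, total / total_tenants
```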

Scaling Timeline:

  • Current tenants: T_current
  • Number of stamps: N
  • Tenants per stamp: T_stamp
  • Growth rate: G_rate (tenants per month)
  • Stamp deployment time: D_deploy (days)

Months until the next stamp is needed: M_next = (N × T_stamp - T_current) / G_rate

Example:

  • Current: 8,500 tenants across 8 stamps (average 1,062 per stamp)
  • Capacity: 1,200 tenants per stamp
  • Growth: 300 tenants/month
  • Deployment time: 5 days

Remaining capacity in current stamps: 8 × 1,200 - 8,500 = 1,100 tenants

Months until capacity exhausted: 1,100 / 300 = 3.67 months

Since stamp deployment takes 5 days, you should start provisioning the 9th stamp at month 3.5 to have it ready before capacity is exhausted. This calculation drives your capacity planning and stamp provisioning schedule.
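The timeline calculation above can be encoded as follows, including the lead-time adjustment for stamp deployment (the 30-day month is an assumed approximation).

```python
# Sketch: when to start provisioning the next stamp, from the variables above.
def months_until_new_stamp_needed(current_tenants: int, num_stamps: int,
                                  tenants_per_stamp: int, growth_per_month: int,
                                  deploy_days: int):
    """Return (remaining capacity, months to exhaustion, month to start provisioning)."""
    remaining = num_stamps * tenants_per_stamp - current_tenants
    months_to_exhaustion = remaining / growth_per_month
    # Start early enough that the stamp is ready before capacity runs out.
    start_month = months_to_exhaustion - deploy_days / 30
    return remaining, months_to_exhaustion, start_month
```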


Real-World Examples

Stripe: Payment Processing Stamps. Stripe uses deployment stamps (they call them “cells”) to scale their payment processing infrastructure globally. Each stamp is a complete copy of Stripe’s payment stack including API servers, PostgreSQL databases, Redis caches, and background job processors. Tenants (Stripe customers) are assigned to specific stamps based on their account creation time and geographic region. An interesting detail: Stripe sizes their stamps to handle approximately 10,000 merchants each, with capacity planning based on transaction volume rather than just merchant count. High-volume merchants like Shopify or Lyft might be assigned to dedicated stamps to prevent them from impacting smaller merchants. When a stamp approaches 70% capacity, Stripe’s infrastructure team provisions a new stamp and begins assigning new merchants to it. This architecture has allowed Stripe to scale from thousands to millions of businesses while maintaining 99.99% uptime. The stamp pattern also enables Stripe to comply with data residency requirements—European merchants are assigned to stamps running in EU regions, ensuring their payment data never leaves the EU. Stripe’s routing layer uses a globally distributed mapping service built on top of their own infrastructure, with sub-10ms lookup latency to determine which stamp serves each API request.

GitHub: Repository Stamps. GitHub partitions their infrastructure using a stamp pattern where repositories are assigned to specific stamps (they call them “repository clusters”). Each stamp contains the full GitHub stack: web servers, Git storage, MySQL databases, Redis caches, and search indexes. Repositories are assigned to stamps using consistent hashing based on repository ID, with special handling for very large repositories. The Linux kernel repository, for example, runs on a dedicated stamp because its size and traffic would overwhelm a shared stamp. An interesting operational detail: GitHub uses stamp-level feature flags, allowing them to test new features on a single stamp (affecting a subset of repositories) before rolling out globally. This caught a critical bug in their Git LFS implementation that only manifested at scale—the bug affected one stamp’s repositories, but the other stamps continued operating normally. GitHub’s stamp architecture also simplified their migration to a new data center: they moved one stamp at a time, validating each migration before moving the next, rather than attempting a big-bang migration of their entire infrastructure. The routing layer uses GeoDNS to direct users to stamps in their nearest region, then uses repository-to-stamp mapping stored in a globally replicated MySQL database to route requests to the correct stamp.

Shopify: Multi-Tenant E-commerce Stamps. Shopify operates one of the largest multi-tenant e-commerce platforms using a stamp architecture where each stamp serves thousands of online stores. Each stamp is a complete Shopify deployment including web servers, Ruby on Rails applications, MySQL databases (sharded within the stamp), Redis caches, and background job processors. Stores are assigned to stamps at creation time based on available capacity, with special handling for Shopify Plus customers (enterprise tier) who get dedicated stamps for performance isolation. An interesting challenge Shopify faced: Black Friday and Cyber Monday create massive traffic spikes (10-50x normal load) concentrated in a few hours. Rather than over-provisioning all stamps year-round, Shopify deploys ephemeral stamps specifically for peak shopping events. These temporary stamps are provisioned a week before Black Friday, handle the traffic spike, then are decommissioned a week later. This approach saves millions in infrastructure costs while maintaining performance during critical sales periods. Shopify’s stamp architecture also enabled their geographic expansion: when entering the Australian market, they deployed stamps in Sydney with data residency guarantees, allowing Australian merchants to keep their data in-country. The routing layer uses a combination of GeoDNS for region selection and a store-to-stamp mapping service that handles over 1 million lookups per second during peak traffic.


Interview Expectations

Mid-Level

What You Should Know: Explain the basic concept of deployment stamps as independent copies of your application stack serving different subsets of users. Describe the key components: the stamp itself (complete application infrastructure), the routing layer (directs traffic to the correct stamp), and the tenant assignment mechanism (determines which users go to which stamp). Understand the primary benefits: horizontal scalability by adding stamps, fault isolation (one stamp failure doesn’t affect others), and simplified regional expansion. Be able to contrast stamps with traditional horizontal scaling (adding instances within a single deployment) and explain when you’d choose stamps over simpler approaches. Walk through a basic stamp architecture for a multi-tenant SaaS application, identifying where stamps make sense and where they might be overkill.

Bonus Points: Discuss stamp sizing decisions with rough numbers (“each stamp serves 10,000 users” or “each stamp handles 50,000 requests/second”). Mention real companies that use stamps (Stripe, GitHub, Shopify) and briefly describe their implementation. Explain the tradeoff between stamp size and blast radius—larger stamps are more efficient but affect more users when they fail. Describe how you’d implement the routing layer using a simple lookup table in a distributed database. Recognize that stamps are most valuable for multi-tenant systems with clear tenant boundaries, and might be unnecessary for single-tenant applications or systems with very few users.

Common Mistakes to Avoid: Don’t confuse stamps with simple multi-region deployment (stamps are about scaling within a region, not just geographic distribution). Don’t claim stamps solve all scaling problems—they add operational complexity and are overkill for small systems. Don’t ignore the routing layer—it’s a critical component that must be highly available and low-latency. Don’t suggest sharing databases or other infrastructure between stamps—that defeats the isolation benefits.

Senior

What You Should Know: Everything from mid-level, plus deep understanding of stamp architecture tradeoffs and operational considerations. Explain different stamp variants (geographic, tenant-tier, functional) and when to use each. Discuss the routing layer in detail: how to implement tenant-to-stamp mapping, how to handle stamp health checks and failover, and how to minimize routing latency. Analyze stamp sizing decisions with actual capacity calculations: given X tenants with Y requests/second each, how many stamps do you need and how do you size them? Understand the operational challenges: deploying new stamps, monitoring stamp health, handling stamp failures, and managing configuration consistency across stamps. Explain how stamps interact with other reliability patterns like circuit breakers, bulkheads, and multi-region active-active.

Bonus Points: Discuss tenant migration strategies: how to move tenants between stamps for load balancing or stamp decommissioning, including data migration approaches and handling downtime. Explain how to handle cross-stamp operations like global search or analytics that need data from all stamps. Describe progressive rollout strategies for stamp deployments (canary, ring-based) and when to use all-at-once vs. gradual rollout. Analyze the cost implications of stamps: fixed cost per stamp vs. variable cost per tenant, and how stamp size affects total cost. Mention specific technologies used in production stamp architectures: consistent hashing for tenant assignment, DynamoDB or Spanner for routing layer storage, infrastructure-as-code tools like Terraform for stamp provisioning. Discuss how stamps enable compliance with data residency requirements (GDPR, data sovereignty laws).

Common Mistakes to Avoid: Don’t design stamps with shared dependencies—each stamp must be fully independent. Don’t ignore the complexity of the routing layer—it’s a distributed system that needs careful design. Don’t assume all data can be stamp-local—identify global data (user accounts, configuration) that needs different handling. Don’t overlook monitoring and observability—you need stamp-level metrics and the ability to aggregate across all stamps. Don’t claim stamps are always the right choice—they add significant operational complexity and are only justified for systems with specific scaling or isolation requirements.

Staff+

What You Should Know: Everything from senior level, plus strategic thinking about when stamps are the right architectural choice and how to evolve stamp architectures over time. Analyze the full spectrum of scaling patterns (vertical scaling, horizontal scaling, sharding, multi-region, stamps) and explain the decision tree for choosing between them. Design stamp architectures that handle edge cases: very large tenants that need dedicated stamps, cross-stamp features that require global data, tenant migration with zero downtime, and stamp decommissioning. Understand the organizational implications: how stamps affect team structure (stamp-focused teams vs. service-focused teams), how to manage configuration and deployment across many stamps, and how to maintain consistency while allowing stamp-level experimentation. Explain how stamps fit into broader architectural evolution: starting with a monolith, scaling with horizontal scaling, introducing stamps when you hit limits, and potentially moving to more sophisticated patterns like service mesh or serverless.

Distinguishing Signals: Discuss anti-patterns and when NOT to use stamps: systems with few tenants (stamps add unnecessary complexity), systems with heavy cross-tenant interactions (stamps make this difficult), or systems where tenant data is small and migration is cheap (simpler patterns suffice). Explain how to evolve from a non-stamp architecture to stamps: identifying the right partitioning key (tenant ID, user ID, geographic region), implementing the routing layer without disrupting existing traffic, and migrating tenants to stamps gradually. Design hybrid architectures that combine stamps with other patterns: stamps for tenant data with a separate global service mesh for cross-cutting concerns, or stamps within regions with multi-region active-active for disaster recovery. Analyze the economics of stamps at scale: at what point does the fixed cost per stamp become prohibitive, and how do you optimize for cost while maintaining isolation benefits? Discuss the future of stamps in cloud-native architectures: how serverless and Kubernetes change the stamp pattern, and whether stamps remain relevant as cloud providers offer better scaling primitives.

Common Mistakes to Avoid: Don’t apply stamps dogmatically—they’re a tool for specific problems, not a universal architecture. Don’t ignore the organizational challenges—stamps require mature DevOps practices and infrastructure-as-code discipline. Don’t design stamps that are too small (excessive overhead) or too large (excessive blast radius)—find the right balance for your specific context. Don’t overlook the evolution path—how do you migrate from your current architecture to stamps, and what’s the rollback plan if stamps don’t work out? Don’t assume stamps are a permanent architecture—be prepared to evolve to different patterns as your system grows and requirements change.

Common Interview Questions

Q1: When should I use deployment stamps vs. simple horizontal scaling?

60-second answer: Use stamps when you need fault isolation between groups of users (one group’s failure doesn’t affect others), when you’re hitting practical limits on single-deployment size (database too large, deployment too slow), or when you have data residency requirements (different regions need separate data stores). Use simple horizontal scaling when you have a single-tenant application, when your system is small enough that one deployment can handle all traffic, or when the operational complexity of stamps isn’t justified by your reliability or scaling needs.

2-minute answer: The decision comes down to three factors: scale, isolation, and operational maturity. For scale, if your database is approaching 10TB, your deployment takes over an hour, or you’re running hundreds of application instances, you’re hitting the practical limits of a single deployment and stamps can help. For isolation, if you have multiple customer tiers with different SLAs, if you need to comply with data residency laws, or if you want to limit blast radius of failures, stamps provide natural boundaries. For operational maturity, stamps require infrastructure-as-code, automated deployment pipelines, sophisticated monitoring, and a team comfortable managing distributed systems—if you don’t have these, stamps will create more problems than they solve. A good rule of thumb: if you’re serving fewer than 100,000 users or 1,000 tenants, you probably don’t need stamps yet. Start with simple horizontal scaling and introduce stamps when you have clear evidence that you need them (hitting scale limits, experiencing blast radius issues, or facing compliance requirements).

Red flags: Saying “we should use stamps because Google uses them” without understanding your specific requirements. Claiming stamps are always better than horizontal scaling. Not considering the operational complexity and cost of managing multiple stamps.

Q2: How do you handle cross-stamp operations like global search or analytics?

60-second answer: Build separate infrastructure for cross-stamp operations rather than trying to query across stamps directly. For global search, use change data capture or event streaming to populate a centralized search index (Elasticsearch, Algolia) that aggregates data from all stamps. For analytics, build a data warehouse that pulls data from all stamps for reporting. For user accounts that access multiple tenants, use a global user service that maintains identity and permissions separately from stamp-local tenant data.

2-minute answer: The key insight is that stamps are designed for isolation, so cross-stamp operations require different architectural patterns. For search, implement a pipeline where each stamp publishes data changes to a message queue (Kafka, Kinesis), and a separate search indexing service consumes these events to build a global search index. This index is eventually consistent with stamp data but provides fast global search. For analytics, use a similar pattern: each stamp exports data to a data lake (S3, BigQuery), and a separate analytics service processes this data to generate reports. The analytics data is typically hours or days behind real-time, which is acceptable for reporting use cases. For user accounts, maintain a global user service (using a distributed database like Spanner or DynamoDB Global Tables) that handles authentication and stores which tenants a user can access. When a user logs in, the global service authenticates them and returns a list of accessible tenants with their stamp locations. The user’s browser or app then makes requests directly to the appropriate stamps. The critical principle: never make synchronous cross-stamp requests in the critical path of user requests—this creates dependencies that defeat stamp isolation. All cross-stamp operations should be asynchronous and eventually consistent.

Red flags: Suggesting synchronous queries across stamps in the request path. Not considering the consistency implications of cross-stamp data. Claiming you can maintain strong consistency across stamps without understanding the performance implications.

Q3: How do you size stamps and decide when to deploy a new one?

60-second answer: Define capacity limits based on resource constraints (database size, request rate, tenant count) and performance requirements (response time, throughput). Monitor stamp utilization and deploy a new stamp when existing stamps reach 70-80% of capacity. Use actual production metrics to refine your capacity model over time. For example, if each stamp can handle 50,000 requests/second and you’re growing by 10,000 requests/second per month, deploy a new stamp every 3-4 months.

2-minute answer: Start by identifying your bottleneck resource—usually the database, but could be CPU, memory, or network. For database-constrained systems, measure how database size and query load affect performance, then set a hard limit (e.g., 5TB per stamp, 100,000 queries/second). For compute-constrained systems, measure CPU and memory utilization under load and set limits that maintain acceptable response times (e.g., 70% CPU utilization at peak). Once you have capacity limits, instrument your stamps to track utilization: tenant count, database size, request rate, and resource utilization. Set alerts at 70% and 80% of capacity—70% is when you start planning to deploy a new stamp, 80% is when you must have a new stamp ready. The lead time for deploying a new stamp (provisioning infrastructure, testing, registering with routing layer) determines when you need to act. If stamp deployment takes two weeks, start at 70% capacity to ensure the new stamp is ready before you hit 80%. Use historical growth rates to predict when you’ll hit capacity thresholds. For example, if you’re adding 1,000 tenants per month and each stamp supports 10,000 tenants, you need a new stamp every 7-8 months (accounting for the 70-80% threshold). Build automation to deploy new stamps when capacity thresholds are reached—manual provisioning doesn’t scale.

Red flags: Not having defined capacity limits for stamps. Waiting until stamps are at 100% capacity before deploying new ones. Not accounting for traffic spikes and growth in capacity planning. Sizing stamps based on current load without headroom for growth.

Q4: What’s your strategy for deploying updates across multiple stamps?

60-second answer: Use progressive rollout: deploy to a small percentage of stamps first (canary), monitor for errors and performance degradation, then gradually expand to more stamps. If issues are detected, halt the rollout and rollback affected stamps. For critical infrastructure changes that require consistency (database schema changes), use all-at-once deployment with robust rollback mechanisms. Automate the deployment process to ensure consistency and reduce human error.

2-minute answer: Implement a ring-based deployment strategy with multiple stages: (1) Deploy to a single canary stamp (1-5% of stamps) and monitor for 1-4 hours. Watch error rates, latency, resource utilization, and business metrics. (2) If canary is healthy, deploy to 10% of stamps and monitor for several hours. (3) Deploy to 50% of stamps. (4) Deploy to remaining stamps. At each stage, have automated checks that halt the rollout if metrics degrade beyond thresholds. For rollback, maintain the ability to quickly revert to the previous version—this might mean keeping old application versions running, using blue-green deployment within stamps, or having automated rollback scripts. For database migrations, use a different strategy: make schema changes backward-compatible (add columns, don’t remove them), deploy the schema change to all stamps simultaneously, then deploy application code that uses the new schema progressively. This ensures all stamps can handle both old and new application versions during the rollout. Use feature flags to decouple deployment from feature activation—deploy code to all stamps with features disabled, then gradually enable features stamp-by-stamp. This allows you to test in production with real traffic before full activation. The key is having observability at the stamp level: you need to see metrics for each stamp individually to detect issues during rollout, not just aggregate metrics across all stamps.

Red flags: Deploying to all stamps simultaneously without progressive rollout. Not having automated rollback mechanisms. Deploying database schema changes that aren’t backward-compatible. Not monitoring stamp-level metrics during deployment.

Q5: How do you handle tenant migration between stamps?

60-second answer: Implement a multi-phase migration: (1) Replicate tenant data from source stamp to destination stamp. (2) Run both stamps in parallel, writing to both but reading from source. (3) Switch reads to destination stamp. (4) Verify data consistency and monitor for issues. (5) Delete data from source stamp. Use feature flags to control which stamp serves each tenant, allowing quick rollback if issues occur. Expect some downtime or degraded performance during migration unless you build sophisticated dual-write mechanisms.

2-minute answer: Tenant migration is complex and should be avoided if possible, but when necessary, use this approach: First, build a migration service that orchestrates the process. This service coordinates data replication, routing changes, and validation. Second, implement data replication from source to destination stamp. For databases, use logical replication (PostgreSQL logical replication, MySQL binlog) to stream changes in real-time. For object storage, use sync tools to copy files. Third, enter a dual-write phase where the application writes to both stamps but reads from the source. This ensures the destination stamp has all new data while you verify the initial replication. Fourth, update the routing layer to direct reads to the destination stamp while continuing dual writes. Monitor error rates and latency carefully—this is when issues typically appear. Fifth, after a soak period (hours to days depending on risk tolerance), stop writing to the source stamp and verify data consistency. Use checksums or row counts to validate that source and destination match. Sixth, delete data from the source stamp and update capacity tracking. For large tenants, this process might take days or weeks. For small tenants, you might batch-migrate multiple tenants simultaneously. The key challenges are: maintaining data consistency during migration, handling application state (sessions, caches), and minimizing downtime. Some systems accept brief downtime (maintenance window), while others build complex dual-write mechanisms to achieve zero-downtime migration. The decision depends on your SLA requirements and the cost of building zero-downtime migration.

Red flags: Claiming tenant migration is easy or can be done instantly. Not having a rollback plan if migration fails. Not considering data consistency during migration. Migrating tenants frequently without understanding the operational cost.

Cross-Stamp Operations Architecture Pattern

```mermaid
graph TB
    subgraph "Stamp 1"
        App1["App Servers"]
        DB1[("Tenant Data")]
        CDC1["Change Data<br/>Capture"]
    end

    subgraph "Stamp 2"
        App2["App Servers"]
        DB2[("Tenant Data")]
        CDC2["Change Data<br/>Capture"]
    end

    subgraph "Stamp 3"
        App3["App Servers"]
        DB3[("Tenant Data")]
        CDC3["Change Data<br/>Capture"]
    end

    subgraph "Global Services"
        EventStream["Event Stream<br/><i>Kafka/Kinesis</i>"]
        SearchIndex[("Global Search Index<br/><i>Elasticsearch</i>")]
        DataWarehouse[("Analytics Warehouse<br/><i>BigQuery/Redshift</i>")]
        UserService["Global User Service<br/><i>Spanner/DynamoDB</i>"]
    end

    App1 --> DB1
    App2 --> DB2
    App3 --> DB3

    DB1 --> CDC1
    DB2 --> CDC2
    DB3 --> CDC3

    CDC1 --"Async Events"--> EventStream
    CDC2 --"Async Events"--> EventStream
    CDC3 --"Async Events"--> EventStream

    EventStream --"Index Updates"--> SearchIndex
    EventStream --"ETL Pipeline"--> DataWarehouse

    App1 & App2 & App3 -."Auth/Authz".-> UserService

    User["User"] --"Global Search"--> SearchIndex
    User --"Analytics Query"--> DataWarehouse
    User --"Login"--> UserService

    Note1["❌ Never: Synchronous<br/>cross-stamp queries"]
    Note2["✅ Always: Async event<br/>streaming + eventual<br/>consistency"]
```

Red Flags to Avoid

Red Flag 1: “Stamps are just multi-region deployment.” This conflates geographic distribution with the stamp pattern. Multi-region deployment means running your application in multiple geographic regions for latency and disaster recovery, but you might have a single deployment per region. Stamps are about partitioning within a region (or across regions) to create independent failure domains and enable horizontal scaling beyond single-deployment limits. You can have stamps within a single region, or you can have multi-region deployment without stamps. They’re orthogonal concepts that can be combined (geographic stamps). What to say instead: “Stamps are independent copies of your full application stack that serve different subsets of users, providing fault isolation and horizontal scalability. You can deploy stamps within a single region for scaling, or across multiple regions for geographic distribution. The key characteristic is that each stamp is self-contained with its own data stores and no runtime dependencies on other stamps.”
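The routing layer that sends each tenant to its stamp can be sketched as a stateless lookup. Everything here is illustrative: `STAMP_ENDPOINTS`, `TENANT_TO_STAMP`, and the tenant/stamp names are invented, and in production the mapping would live in a globally replicated store fronted by a cache rather than an in-process dict.

```python
# Hypothetical tenant→stamp mapping; real systems keep this in a globally
# replicated, highly available store (e.g. DynamoDB Global Tables).
STAMP_ENDPOINTS = {
    "stamp-us-east-1": "https://use1.example.com",
    "stamp-us-east-2": "https://use2.example.com",
    "stamp-eu-west-1": "https://euw1.example.com",
}

TENANT_TO_STAMP = {
    "acme-corp": "stamp-us-east-1",
    "globex": "stamp-eu-west-1",   # EU tenant pinned to an EU stamp for data residency
}

def route(tenant_id: str) -> str:
    """Stateless lookup: the router holds no per-request state, so any
    replica can answer, and one stamp's outage never blocks routing itself."""
    stamp = TENANT_TO_STAMP.get(tenant_id)
    if stamp is None:
        raise KeyError(f"unknown tenant: {tenant_id}")
    return STAMP_ENDPOINTS[stamp]
```

Note that this layer is orthogonal to geography, as the red flag above argues: the same lookup works whether the stamps sit in one region or many.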

Red Flag 2: “We should share the database across stamps for efficiency.” This completely defeats the primary benefit of stamps: fault isolation. If all stamps share a database, a database failure affects all stamps simultaneously, creating a single point of failure. The whole point of stamps is that each stamp can fail independently without cascading to others. Sharing infrastructure between stamps (databases, caches, message queues) creates hidden coupling that makes your system less reliable than a single deployment. What to say instead: “Each stamp must have its own data stores to maintain fault isolation. Yes, this increases costs because you’re running multiple databases instead of one, but that’s the price of reliability. If cost is prohibitive, stamps might not be the right pattern—consider whether simpler approaches like read replicas or sharding would suffice. For truly global data that must be shared, use a distributed database designed for global replication like Spanner or DynamoDB Global Tables, not a shared single-region database.”

Red Flag 3: “Stamps solve all scaling problems.” Stamps are a tool for specific scaling challenges, not a universal solution. They add significant operational complexity: you’re managing multiple independent deployments, coordinating configuration across stamps, monitoring many systems, and handling cross-stamp operations. For small systems (fewer than 100,000 users, single region, simple architecture), stamps are overkill—the operational burden outweighs the benefits. Stamps are most valuable when you’re hitting practical limits of single-deployment scaling (database too large, deployment too slow, blast radius too big) or when you have specific isolation requirements (data residency, tenant tiers). What to say instead: “Stamps are appropriate when you need fault isolation between user groups, when you’re hitting practical limits on single-deployment size, or when you have data residency requirements. For smaller systems, simpler patterns like horizontal scaling with read replicas are more appropriate. The decision to use stamps should be driven by specific requirements, not by copying what large companies do. Start simple and introduce stamps when you have clear evidence that you need them.”

Red Flag 4: “We can query across stamps in real-time for global features.” Synchronous cross-stamp queries in the request path create dependencies that defeat stamp isolation and introduce latency. If Stamp-A needs to query Stamp-B to serve a user request, Stamp-A’s availability now depends on Stamp-B’s availability, and the request latency includes network round-trip time between stamps. This creates a distributed monolith that’s worse than a single deployment. Cross-stamp operations should be asynchronous and eventually consistent, using patterns like event streaming to a centralized index or data warehouse. What to say instead: “Cross-stamp operations require different architectural patterns than stamp-local operations. For global search, use change data capture to populate a centralized search index that aggregates data from all stamps. For analytics, build a data warehouse that pulls data from all stamps for reporting. For user accounts that access multiple tenants, use a global user service that maintains identity separately from stamp-local tenant data. These approaches accept eventual consistency in exchange for maintaining stamp independence and low latency for stamp-local operations.”
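The change-data-capture approach described here can be sketched as a consumer that applies stamp-local change events to a centralized index. This is a toy sketch under assumptions: the event shape and `handle_cdc_event` are invented for illustration (not a real Debezium/Kafka schema), and the "index" is a plain dict standing in for Elasticsearch.

```python
import json

def handle_cdc_event(event: dict, search_index: dict) -> None:
    """Apply one change-data-capture event from a stamp to a centralized
    search index. The index is eventually consistent; stamps never query
    each other synchronously."""
    # Namespace documents by stamp + tenant so independent stamps never collide.
    doc_id = f'{event["stamp"]}:{event["tenant_id"]}:{event["key"]}'
    if event["op"] == "delete":
        search_index.pop(doc_id, None)
    else:  # insert / update
        search_index[doc_id] = event["value"]

# Simulate events arriving asynchronously from two independent stamps.
index: dict = {}
for raw in [
    '{"stamp": "stamp-1", "tenant_id": "acme", "key": "doc9", "op": "upsert", "value": "Q3 report"}',
    '{"stamp": "stamp-2", "tenant_id": "globex", "key": "doc1", "op": "upsert", "value": "EU invoice"}',
    '{"stamp": "stamp-1", "tenant_id": "acme", "key": "doc9", "op": "delete", "value": null}',
]:
    handle_cdc_event(json.loads(raw), index)
```

Because the index is built from the event stream, a stamp outage delays its updates but never blocks searches or other stamps, which is exactly the isolation property the red flag defends.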

Red Flag 5: “Stamps are a permanent architecture that we’ll never change.” No architecture is permanent, and stamps are no exception. As your system evolves, you might find that stamps are no longer the right pattern: you might move to serverless architectures that provide better scaling primitives, you might consolidate stamps as you improve single-deployment scalability, or you might adopt different partitioning strategies as requirements change. Treating stamps as a permanent decision prevents you from adapting to changing needs and technologies. What to say instead: “Stamps are an architectural pattern that solves specific problems at a certain scale and maturity level. As our system evolves, we should continuously evaluate whether stamps remain the right choice. We might consolidate stamps if we improve single-deployment scalability, or we might evolve to different patterns like service mesh or serverless as cloud platforms provide better primitives. The key is building with flexibility in mind: use infrastructure-as-code so we can evolve our architecture, design APIs that don’t assume stamps exist, and avoid coupling application logic to stamp topology. Architecture is a journey, not a destination.”


Key Takeaways

  • Deployment stamps are independent, self-contained copies of your entire application stack that serve different subsets of users or tenants, enabling horizontal scaling beyond single-deployment limits while providing natural fault isolation and data residency boundaries.

  • Each stamp must be completely independent with no shared runtime dependencies (separate databases, caches, queues) to maintain fault isolation—the primary benefit of stamps is that one stamp’s failure doesn’t cascade to others.

  • The routing layer is a critical component that must be stateless, highly available, and low-latency, using a globally replicated mapping service to direct requests to the correct stamp based on tenant ID, user identity, or geographic location.

  • Stamp sizing requires balancing cost efficiency against blast radius—larger stamps (10,000+ tenants) are more cost-effective but affect more users when they fail, while smaller stamps (1,000 tenants) provide better isolation but higher per-user costs due to fixed overhead.

  • Cross-stamp operations require different architectural patterns than stamp-local operations—use asynchronous event streaming to centralized indexes for global search, data warehouses for analytics, and global user services for authentication, never synchronous cross-stamp queries in the request path.

  • Stamps are most valuable for multi-tenant SaaS systems at scale (100,000+ users, 1,000+ tenants) with clear tenant boundaries, data residency requirements, or the need to limit blast radius—smaller systems should use simpler patterns like horizontal scaling until they have clear evidence that stamps are necessary.
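The sizing trade-off in the takeaways can be made concrete with rough arithmetic. The dollar figures below are assumptions invented for illustration, not benchmarks: the point is only that fixed per-stamp overhead amortizes with stamp size while blast radius grows with it.

```python
def per_tenant_cost(tenants_per_stamp: int,
                    fixed_cost: float = 5_000.0,   # assumed monthly fixed overhead per stamp
                    variable_cost: float = 2.0) -> float:
    """Monthly cost per tenant: fixed stamp overhead (databases, queues,
    baseline compute) amortized across tenants, plus per-tenant usage cost."""
    return fixed_cost / tenants_per_stamp + variable_cost

# Larger stamps amortize fixed costs better but widen the blast radius:
small_stamp = per_tenant_cost(1_000)    # 5000/1000 + 2 = 7.0 per tenant
large_stamp = per_tenant_cost(10_000)   # 5000/10000 + 2 = 2.5 per tenant
# ...yet a failed 10,000-tenant stamp affects 10x the tenants of a 1,000-tenant one.
```

Under these assumed numbers, the 10x larger stamp is roughly 2.8x cheaper per tenant, which is why cost pressure pushes toward large stamps and reliability pressure pushes back.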

Prerequisites: Understanding Horizontal Scaling is essential before deployment stamps—stamps are a form of horizontal scaling at the deployment level. Familiarity with Multi-Tenant Architecture helps understand tenant isolation and data partitioning strategies. Knowledge of Load Balancing is important for understanding the routing layer.

Related Patterns: Database Sharding is conceptually similar to stamps but operates at the database level rather than the full stack. Circuit Breakers and Bulkheads complement stamps by providing fault isolation within each stamp. Multi-Region Active-Active can be combined with stamps for geographic distribution.

Next Steps: After understanding stamps, explore Chaos Engineering to test stamp failure scenarios and validate isolation. Study Infrastructure as Code for managing stamp deployments at scale. Learn about Service Mesh for advanced traffic management within stamps.