Databases in System Design: SQL, NoSQL & More
After this topic, you will be able to:
- Explain the fundamental differences between RDBMS and NoSQL database paradigms
- Describe ACID and BASE properties and their trade-offs in distributed systems
- Identify appropriate database types for different use case requirements
TL;DR
Databases are the persistent storage layer in every system design, and choosing the right one fundamentally shapes your architecture’s performance, scalability, and operational complexity. The core decision is between RDBMS (strong consistency, structured data) and NoSQL (flexible schemas, horizontal scalability), driven by your data model, consistency requirements, and scale. Cheat sheet: ACID = consistency first (banking, inventory); BASE = availability first (social feeds, analytics); pick based on what failure mode you can tolerate.
Why This Matters
Every system design interview eventually asks “where do you store the data?” and your answer reveals how deeply you understand distributed systems trade-offs. Databases aren’t just storage—they’re the foundation that determines whether your system can scale to millions of users, maintain data integrity during failures, or support complex queries efficiently. At Netflix, choosing Cassandra for viewing history versus MySQL for billing isn’t arbitrary; it reflects fundamental trade-offs between consistency and availability that cascade through the entire architecture.
In production, database decisions are expensive to reverse. Stripe can’t easily migrate from PostgreSQL to MongoDB for payment records because ACID transactions are non-negotiable for financial correctness. Twitter’s shift from MySQL to Manhattan (their distributed key-value store) for tweets took years and required rethinking their entire data model. Understanding the database landscape helps you make informed choices upfront and articulate the “why” behind your decisions—exactly what interviewers want to hear.
The database layer also exposes you to core distributed systems concepts: consistency models, replication, partitioning, and failure handling. Master these fundamentals here, and you’ll speak fluently about trade-offs throughout system design. A senior engineer doesn’t just pick “a NoSQL database”—they explain why Cassandra’s eventual consistency fits user activity logs but would be disastrous for inventory management, citing specific failure scenarios and recovery strategies.
The Landscape
The database world divides into two major paradigms: relational databases (RDBMS) and NoSQL databases, each optimized for different use cases. RDBMS like PostgreSQL, MySQL, and Oracle have dominated for decades because they guarantee ACID properties—atomicity, consistency, isolation, durability—making them the default choice for transactional systems where correctness matters more than raw throughput. Your bank account balance, e-commerce inventory, and user authentication all live in RDBMS because losing a transaction or showing stale data isn’t acceptable.
NoSQL emerged in the 2000s when companies like Google, Amazon, and Facebook hit scaling walls with traditional databases. They traded strict consistency for horizontal scalability and flexible schemas, embracing BASE properties (Basically Available, Soft state, Eventually consistent). This spawned four main NoSQL categories: key-value stores (Redis, DynamoDB), document databases (MongoDB, Couchbase), column-family stores (Cassandra, HBase), and graph databases (Neo4j, Amazon Neptune). Each optimizes for specific access patterns—key-value for caching, document for flexible schemas, column-family for time-series data, graph for relationship-heavy queries.
Between these extremes sit NewSQL databases like CockroachDB and Google Spanner, attempting to provide ACID guarantees at NoSQL scale through sophisticated distributed consensus protocols. They’re powerful but complex, typically appearing in scenarios where you absolutely need both strong consistency and horizontal scalability—think global financial systems or multi-region SaaS platforms. The landscape also includes specialized databases: time-series databases (InfluxDB, TimescaleDB) for metrics, search engines (Elasticsearch) for full-text queries, and data warehouses (Snowflake, BigQuery) for analytics. Modern systems often use polyglot persistence, combining multiple database types to match each data domain’s requirements.
Database Landscape: RDBMS vs NoSQL Categories
graph TB
DB["Database Systems"]
DB --> RDBMS["RDBMS<br/><i>ACID Properties</i>"]
DB --> NoSQL["NoSQL<br/><i>BASE Properties</i>"]
DB --> NewSQL["NewSQL<br/><i>ACID at Scale</i>"]
DB --> Specialized["Specialized<br/><i>Domain-Specific</i>"]
RDBMS --> PG["PostgreSQL<br/>MySQL<br/>Oracle"]
RDBMS -."Use Cases".-> RDBMS_UC["Transactions<br/>Complex Queries<br/>Strong Consistency"]
NoSQL --> KV["Key-Value<br/><i>Redis, DynamoDB</i>"]
NoSQL --> Doc["Document<br/><i>MongoDB, Couchbase</i>"]
NoSQL --> Col["Column-Family<br/><i>Cassandra, HBase</i>"]
NoSQL --> Graph["Graph<br/><i>Neo4j, Neptune</i>"]
KV -."Use Cases".-> KV_UC["Caching<br/>Session Storage"]
Doc -."Use Cases".-> Doc_UC["Flexible Schemas<br/>Rapid Iteration"]
Col -."Use Cases".-> Col_UC["Time-Series<br/>High Write Throughput"]
Graph -."Use Cases".-> Graph_UC["Social Networks<br/>Recommendations"]
NewSQL --> Cockroach["CockroachDB<br/>Google Spanner"]
NewSQL -."Use Cases".-> NewSQL_UC["Global Scale<br/>+ ACID Guarantees"]
Specialized --> TS["Time-Series<br/><i>InfluxDB</i>"]
Specialized --> Search["Search<br/><i>Elasticsearch</i>"]
Specialized --> Warehouse["Data Warehouse<br/><i>Snowflake</i>"]
The database landscape organized by paradigm and use case. RDBMS provides ACID guarantees for transactional systems, NoSQL offers four categories optimized for different access patterns with BASE properties, NewSQL attempts to bridge both worlds, and specialized databases target specific domains like search or analytics.
Database Selection Decision Tree
flowchart TB
Start["Database Selection<br/>Decision Point"]
Start --> Q1{"Need ACID<br/>Transactions?"}
Q1 -->|Yes| Q2{"Complex Queries<br/>with Joins?"}
Q1 -->|No| Q3{"Data Structure"}
Q2 -->|Yes| RDBMS["RDBMS<br/><i>PostgreSQL, MySQL</i>"]
Q2 -->|No| Q4{"Need Global<br/>Distribution?"}
Q4 -->|Yes| NewSQL["NewSQL<br/><i>CockroachDB, Spanner</i>"]
Q4 -->|No| RDBMS
Q3 -->|"Key-Value"| Q5{"In-Memory<br/>Speed Needed?"}
Q3 -->|"Document"| Doc["Document DB<br/><i>MongoDB, Couchbase</i>"]
Q3 -->|"Time-Series"| Col["Column-Family<br/><i>Cassandra, HBase</i>"]
Q3 -->|"Graph"| Graph["Graph DB<br/><i>Neo4j, Neptune</i>"]
Q5 -->|Yes| Redis["Redis<br/><i>Caching, Sessions</i>"]
Q5 -->|No| KV["DynamoDB<br/><i>Persistent K-V</i>"]
RDBMS -."Use Case".-> UC1["Banking, E-commerce<br/>Inventory, Auth"]
NewSQL -."Use Case".-> UC2["Global SaaS<br/>Multi-Region Finance"]
Doc -."Use Case".-> UC3["User Profiles<br/>Content Management"]
Col -."Use Case".-> UC4["IoT Data<br/>Activity Logs"]
Graph -."Use Case".-> UC5["Social Networks<br/>Recommendations"]
Redis -."Use Case".-> UC6["Session Store<br/>Rate Limiting"]
KV -."Use Case".-> UC7["User Preferences<br/>Feature Flags"]
A decision tree for database selection based on requirements. Start with ACID needs and query complexity to determine if RDBMS fits, then branch into NoSQL categories based on data structure and access patterns. Each path leads to specific technologies with example use cases, helping you articulate the ‘why’ behind database choices in interviews.
Key Areas
name: ACID vs BASE Properties why_it_matters: This is the fundamental trade-off that drives every database decision. ACID properties—Atomicity (all-or-nothing transactions), Consistency (data integrity rules enforced), Isolation (concurrent transactions don’t interfere), Durability (committed data survives crashes)—guarantee correctness but limit scalability. When you transfer $100 between bank accounts, ACID ensures both the debit and credit happen together or not at all, even if the server crashes mid-transaction. PostgreSQL and MySQL achieve this through write-ahead logging, locking, and single-node coordination.
BASE properties flip the priority: Basically Available (system responds even during failures), Soft state (data may be inconsistent temporarily), Eventually consistent (given enough time without updates, all replicas converge). Cassandra writing your tweet to three replicas doesn’t wait for all to acknowledge—it returns success after one or two, accepting that reads might see stale data briefly. This enables massive write throughput and survives datacenter failures, but you can’t use it for financial transactions where temporary inconsistency means lost money. Understanding when each model is acceptable separates junior engineers who memorize acronyms from seniors who reason about failure modes.
name: Data Modeling and Normalization why_it_matters: How you structure data determines query performance, storage efficiency, and schema evolution flexibility. Relational databases use normalization—organizing data into tables with foreign keys to eliminate redundancy. Third normal form (3NF) means each non-key attribute depends only on the primary key, preventing update anomalies. A user’s email appears once in the users table, referenced by foreign keys in orders and reviews. This ensures consistency (change email once, it updates everywhere) but requires joins for queries, which get expensive at scale.
NoSQL databases often denormalize deliberately, duplicating data across documents or rows to avoid joins. MongoDB might store a user’s name and email directly in each order document, accepting that updating a user’s email requires touching multiple documents. This trades storage and update complexity for read performance—fetching an order doesn’t require joining to the users table. The key insight: normalization optimizes for write consistency and storage, denormalization optimizes for read performance and availability. In interviews, explaining this trade-off shows you understand that database design isn’t about following rules blindly—it’s about matching structure to access patterns and consistency requirements.
name: Database Selection Criteria why_it_matters: Choosing the right database requires analyzing multiple dimensions: data structure (fixed schema vs flexible), consistency requirements (strong vs eventual), query patterns (simple lookups vs complex joins), scale (thousands vs billions of records), and operational complexity (managed service vs self-hosted). A common interview mistake is jumping to “I’d use MongoDB” without justifying why. Strong candidates walk through the decision tree: “User profiles need flexible schemas as we add features, reads outnumber writes 100:1, and we can tolerate eventual consistency for profile updates—document database fits well. But payment transactions require ACID guarantees and complex queries across orders, users, and inventory—that’s RDBMS territory.”
Real-world systems rarely use one database. Netflix uses MySQL for billing (ACID), Cassandra for viewing history (scale + availability), and Elasticsearch for search (full-text queries). This polyglot persistence approach matches each data domain to the optimal storage engine, but introduces operational complexity—multiple systems to monitor, backup, and keep in sync. In interviews, acknowledging this trade-off demonstrates production experience. You might start with a single PostgreSQL instance for simplicity, then explain when you’d introduce specialized databases as scale or requirements evolve.
name: Consistency Models and CAP Theorem why_it_matters: The CAP theorem states that distributed databases can provide at most two of three guarantees: Consistency (all nodes see the same data), Availability (every request gets a response), Partition tolerance (system works despite network failures). Since network partitions are inevitable in distributed systems, you’re really choosing between consistency (CP systems like HBase, MongoDB in default mode) and availability (AP systems like Cassandra, DynamoDB). This isn’t just theory—it determines how your system behaves during failures. A CP system might reject writes during a network split to prevent inconsistency, while an AP system accepts writes on both sides and reconciles conflicts later.
For a detailed exploration of CAP theorem implications and how different databases make this trade-off, see SQL vs NoSQL. In interviews, use CAP to explain database choices: “For a shopping cart, I’d choose availability (AP) because showing a user their cart with slightly stale data is better than showing an error. For inventory management, I’d choose consistency (CP) because overselling products due to stale reads causes real business problems.” This reasoning shows you think about user experience and business impact, not just technical properties.
name: Scaling Strategies why_it_matters: Databases scale vertically (bigger machines) or horizontally (more machines). Vertical scaling is simple—upgrade to a larger instance—but hits physical limits and creates a single point of failure. Horizontal scaling requires distributing data across multiple nodes through replication (copying data for redundancy) or sharding (partitioning data by key range or hash). RDBMS traditionally scaled vertically because maintaining ACID across distributed nodes is hard, while NoSQL databases were designed for horizontal scaling from day one.
For replication strategies and their trade-offs, see Replication (RDBMS). For sharding implementation details, see Sharding. In interviews, explain the progression: “We’d start with a single PostgreSQL instance with read replicas for read-heavy workloads. If writes become the bottleneck, we’d consider sharding by user_id or switching to a horizontally-scalable database like Cassandra. The decision depends on whether we can tolerate eventual consistency and whether our queries fit a sharded model—cross-shard joins are expensive.” This shows you understand scaling isn’t just adding more machines; it’s about matching the scaling strategy to your consistency and query requirements.
How Things Connect
The database landscape isn’t a random collection of technologies—it’s a coherent design space where each choice represents a different point on the consistency-availability-scalability spectrum. ACID properties and normalization form the foundation of RDBMS, enabling complex transactions and queries but limiting horizontal scalability. When systems outgrow vertical scaling, you face a fork: maintain ACID through sophisticated distributed consensus (NewSQL) or embrace BASE and eventual consistency (NoSQL).
This decision cascades through your architecture. Choosing Cassandra for eventual consistency means your application code must handle conflicts and stale reads—you can’t rely on the database to enforce invariants. Choosing PostgreSQL for ACID means you’ll eventually need replication for availability and possibly sharding for scale, each adding operational complexity. The CAP theorem explains why this trade-off is fundamental: in a distributed system with network partitions, you can’t have both perfect consistency and perfect availability.
Database selection criteria tie everything together. You start with requirements (data structure, consistency needs, query patterns, scale) and map them to database properties (ACID vs BASE, schema flexibility, query capabilities). This leads to specific technologies (PostgreSQL for transactions, Cassandra for time-series, Redis for caching) and scaling strategies (read replicas, sharding, multi-region). Understanding these connections helps you design coherent systems where the database choice aligns with consistency requirements, scaling strategy, and operational capabilities. In interviews, explicitly drawing these connections—“because we chose eventual consistency for availability, we need conflict resolution in the application layer”—demonstrates systems thinking beyond memorized facts.
ACID vs BASE: Property Comparison and Trade-offs
graph LR
subgraph ACID Properties
A["Atomicity<br/><i>All-or-nothing</i>"]
C["Consistency<br/><i>Integrity rules enforced</i>"]
I["Isolation<br/><i>No interference</i>"]
D["Durability<br/><i>Survives crashes</i>"]
end
subgraph Trade-off Space
Trade["CAP Theorem<br/>Choose 2 of 3:<br/>Consistency<br/>Availability<br/>Partition Tolerance"]
end
subgraph BASE Properties
BA["Basically Available<br/><i>Responds during failures</i>"]
S["Soft State<br/><i>Temporary inconsistency</i>"]
E["Eventually Consistent<br/><i>Converges over time</i>"]
end
ACID -."Guarantees".-> Strong["Strong Consistency<br/>Correctness First"]
BASE -."Guarantees".-> Eventual["Eventual Consistency<br/>Availability First"]
Strong --> UseCase1["Banking<br/>Inventory<br/>Payments"]
Eventual --> UseCase2["Social Feeds<br/>Analytics<br/>Logs"]
Strong -."Limitation".-> Limit1["Vertical Scaling<br/>Single Point of Failure"]
Eventual -."Limitation".-> Limit2["Stale Reads<br/>Conflict Resolution"]
Trade -."Drives".-> ACID
Trade -."Drives".-> BASE
ACID and BASE represent opposite ends of the consistency-availability spectrum, driven by the CAP theorem. ACID prioritizes correctness for transactional systems but limits scalability, while BASE prioritizes availability for high-scale systems but accepts temporary inconsistency. The choice depends on which failure mode your use case can tolerate.
Database Scaling Progression and Trade-offs
graph LR
Start["Single Instance<br/><i>Simple, Limited Scale</i>"]
Start --"1. Read Traffic Grows"--> Replicas["Primary + Read Replicas<br/><i>Horizontal Read Scaling</i>"]
Replicas --"2. Write Traffic Grows"--> Fork{"Scaling Fork"}
Fork --"Option A: Keep ACID"--> Vertical["Vertical Scaling<br/><i>Bigger Instance</i>"]
Fork --"Option B: Keep ACID"--> Sharding["Sharding<br/><i>Partition by Key</i>"]
Fork --"Option C: Trade Consistency"--> NoSQL["Switch to NoSQL<br/><i>Horizontal + BASE</i>"]
Vertical -."Trade-off".-> VT["+ Simple<br/>- Cost Ceiling<br/>- Single Point of Failure"]
Sharding -."Trade-off".-> ST["+ ACID per Shard<br/>- Complex Queries<br/>- Rebalancing Overhead"]
NoSQL -."Trade-off".-> NT["+ Massive Scale<br/>- Eventual Consistency<br/>- App-Level Conflicts"]
Vertical -."Eventually hits limit".-> Sharding
Sharding -."If queries don't fit".-> NoSQL
subgraph Real World Example
Uber["Uber: PostgreSQL → Schemaless<br/><i>Sharded MySQL for ACID at scale</i>"]
Stripe["Stripe: PostgreSQL + Vertical<br/><i>Optimize before switching</i>"]
end
The typical progression of database scaling strategies and their trade-offs. Start simple with a single instance, add read replicas for read-heavy workloads, then face a fork when writes become the bottleneck: vertical scaling (simple but limited), sharding (ACID but complex), or NoSQL (scalable but eventual consistency). Real companies like Uber and Stripe show different paths based on their specific requirements.
Real-World Context
Netflix’s database architecture illustrates polyglot persistence at scale. They use MySQL for billing and subscription management because financial transactions require ACID guarantees—you can’t charge a customer twice or lose a payment due to eventual consistency. For viewing history and recommendations, they use Cassandra, a NoSQL database optimized for high write throughput and availability across multiple AWS regions. Losing a few viewing events during a network partition is acceptable; showing users an error when they try to watch a movie isn’t. For search and discovery, they use Elasticsearch, which excels at full-text queries but isn’t suitable for transactional data. This isn’t over-engineering—each database matches its data domain’s requirements.
Uber’s transition from PostgreSQL to Schemaless (their custom MySQL-based sharded system) shows how scaling forces architectural evolution. Early Uber ran on a single PostgreSQL instance with read replicas, which worked fine for thousands of rides per day. As they scaled to millions of rides globally, they hit write bottlenecks and needed horizontal scaling. Rather than switch to NoSQL and lose ACID properties for critical data like trip state and payments, they built Schemaless: a sharded MySQL system that maintains transactional guarantees within shards while scaling horizontally. This hybrid approach shows that real-world systems often need custom solutions when off-the-shelf databases don’t fit.
Stripe’s use of PostgreSQL for payment processing demonstrates when you don’t compromise on consistency. Despite scaling to billions of API requests, they’ve stuck with PostgreSQL because financial correctness is non-negotiable. They scale through careful schema design, read replicas, and vertical scaling of primary instances. Their architecture accepts that some operations (like complex financial reports) are slower than they’d be in a NoSQL system because the alternative—eventual consistency in payment processing—could mean lost money or compliance violations. This pragmatism is what interviewers want to see: understanding that the “best” database depends entirely on your requirements and constraints.
Netflix Polyglot Persistence Architecture
graph TB
User["User<br/><i>Web/Mobile App</i>"]
subgraph Netflix System
API["API Gateway"]
subgraph Billing Domain
BillingSvc["Billing Service"]
MySQL[("MySQL<br/><i>ACID for Payments</i>")]
end
subgraph Viewing Domain
ViewingSvc["Viewing History Service"]
Cassandra[("Cassandra<br/><i>High Write Throughput</i>")]
end
subgraph Search Domain
SearchSvc["Search Service"]
Elastic[("Elasticsearch<br/><i>Full-Text Queries</i>")]
end
end
User --"1. Subscribe/Pay"--> API
API --"2. Process Payment"--> BillingSvc
BillingSvc --"3. ACID Transaction"--> MySQL
User --"4. Watch Content"--> API
API --"5. Log View Event"--> ViewingSvc
ViewingSvc --"6. High-Volume Writes"--> Cassandra
User --"7. Search Titles"--> API
API --"8. Query"--> SearchSvc
SearchSvc --"9. Full-Text Search"--> Elastic
MySQL -."Why: Financial correctness<br/>requires ACID guarantees".-> MySQL
Cassandra -."Why: Availability + scale<br/>for millions of events".-> Cassandra
Elastic -."Why: Optimized for<br/>complex text queries".-> Elastic
Netflix uses three different databases for three different domains, each matching the data’s consistency and scale requirements. MySQL handles billing where ACID is non-negotiable, Cassandra handles viewing history where high availability and write throughput matter most, and Elasticsearch handles search where full-text query capabilities are essential. This polyglot persistence approach is common in production systems.
Interview Essentials
Mid-Level
At the mid-level, you should articulate the basic trade-offs between RDBMS and NoSQL without just listing features. When asked “what database would you use for X?”, walk through the decision: “For a social media feed, I’d consider a NoSQL database like Cassandra because we need high write throughput for posts, reads can tolerate eventual consistency, and we’ll need to scale horizontally. The data model is simple—posts by user_id and timestamp—which fits a column-family store well.” Explain ACID properties concretely: “Atomicity means if we’re transferring money between accounts, both the debit and credit happen or neither does—we can’t have a partial transaction.” Demonstrate you understand normalization: “We’d normalize user data into a separate table to avoid duplicating email addresses across orders, but we might denormalize product names into the orders table to avoid joins on the hot read path.” Show awareness of scaling: “We’d start with a single instance and add read replicas when read traffic grows, considering sharding only if write traffic becomes the bottleneck.”
Senior
Senior engineers should explain database choices in terms of business impact and failure modes, not just technical properties. “For inventory management, eventual consistency is unacceptable because it leads to overselling—if two customers see the same item as available due to stale reads, we’ll have to cancel one order and damage customer trust. That’s why we need a CP system with strong consistency, even if it means lower availability during network partitions.” Discuss operational complexity: “While Cassandra gives us horizontal scalability, it requires expertise in tuning compaction, managing repairs, and handling consistency levels. For a startup, PostgreSQL with read replicas might be the right choice even if we’ll need to re-architect later—premature optimization for scale we don’t have yet adds operational burden.” Reference specific technologies with nuance: “MongoDB’s default write concern changed in version 4.0 to wait for majority acknowledgment, making it more CP than AP. Earlier versions were AP by default, which caused data loss in some failure scenarios.” Show you’ve dealt with real scaling challenges: “We hit a bottleneck at 10K writes/second on our primary PostgreSQL instance. Sharding would have required rewriting queries to avoid cross-shard joins, so we vertically scaled to a larger instance and optimized hot queries first—sometimes the simple solution is the right one.”
Staff+
Staff+ engineers should demonstrate strategic thinking about database architecture evolution and organizational impact. “Our database strategy needs to balance technical requirements with team capabilities. Introducing Cassandra for one use case means hiring engineers with distributed systems expertise, building monitoring and operational runbooks, and maintaining two database platforms. That’s only justified if the business value—say, supporting 10x user growth—outweighs the organizational cost.” Discuss cross-cutting concerns: “Database choice affects our disaster recovery strategy. PostgreSQL’s point-in-time recovery gives us fine-grained backup/restore, but Cassandra’s eventual consistency means we need application-level conflict resolution during multi-datacenter failover. These operational characteristics often matter more than raw performance.” Show awareness of industry trends: “NewSQL databases like CockroachDB promise ACID at scale, but they’re complex to operate and have subtle consistency edge cases. For most companies, the proven stability of PostgreSQL plus careful sharding is lower risk than betting on newer technology.” Frame decisions in terms of trade-offs: “We could use a graph database for our social network, but that introduces a specialized system for one feature. Storing relationships in PostgreSQL with recursive CTEs is slower but keeps our stack simpler. The right choice depends on whether graph queries are core to our product or a nice-to-have feature.”
Common Interview Questions
“When would you choose NoSQL over SQL?” → Walk through requirements: “If we need flexible schemas for rapid iteration, high write throughput, horizontal scalability, and can tolerate eventual consistency—like for user activity logs or social media posts—NoSQL fits well. If we need complex queries with joins, ACID transactions, or strong consistency—like for financial transactions or inventory—SQL is the better choice.”
“How do you handle database scaling?” → Show progression: “Start with vertical scaling and read replicas for read-heavy workloads. If writes become the bottleneck, consider caching to reduce database load, optimizing queries, or sharding. Sharding adds complexity—you need a sharding key that distributes load evenly and supports your query patterns without cross-shard joins. For some use cases, switching to a horizontally-scalable NoSQL database is simpler than sharding an RDBMS.”
“Explain ACID properties with an example” → Use concrete scenarios: “For a bank transfer, Atomicity ensures both the debit and credit happen together or not at all. Consistency means the database enforces rules like ‘account balance can’t be negative.’ Isolation means if two transfers happen simultaneously, they don’t interfere—each sees a consistent snapshot. Durability means once we confirm the transfer, it survives server crashes.”
“What’s the CAP theorem and how does it affect database choice?” → Explain practically: “CAP says distributed databases can’t simultaneously guarantee Consistency, Availability, and Partition tolerance. Since network partitions happen, you choose between CP (strong consistency, may reject requests during partitions) or AP (always available, may return stale data). For a shopping cart, AP is fine—stale data is better than errors. For inventory, CP prevents overselling.”
“How would you design a database schema for [specific system]?” → Think aloud: “First, identify entities and relationships. For an e-commerce system: users, products, orders, order_items. Normalize to 3NF to avoid redundancy—user email in one place, referenced by foreign keys. Consider denormalization for hot paths—storing product name in order_items avoids joins when displaying order history. Choose appropriate indexes for common queries—index on user_id for ‘orders by user’, compound index on (product_id, created_at) for product sales reports.”
Red Flags to Avoid
Choosing a database without explaining why: “I’d use MongoDB” without discussing schema flexibility, consistency requirements, or query patterns shows you’re pattern-matching, not reasoning.
Claiming NoSQL is always faster or more scalable: NoSQL trades consistency for scalability. For workloads that fit in memory or don’t need horizontal scaling, a well-tuned RDBMS often outperforms NoSQL.
Ignoring operational complexity: Saying “we’ll just use Cassandra” without acknowledging the operational expertise required to run it reliably suggests you haven’t operated distributed databases in production.
Not considering the simple solution first: Jumping to sharding or NoSQL before trying vertical scaling, read replicas, or query optimization shows premature optimization.
Misunderstanding ACID or CAP: Saying “NoSQL doesn’t support transactions” (many do, within a document or partition) or “we need all three of CAP” (impossible) reveals fundamental gaps.
Forgetting about data migration: Proposing a database switch without discussing how to migrate existing data and handle the transition period shows lack of production experience.
Key Takeaways
ACID vs BASE is the fundamental trade-off: RDBMS prioritize consistency and correctness through ACID properties, accepting scaling limitations. NoSQL prioritizes availability and scalability through BASE properties, accepting eventual consistency. Choose based on what failure mode your use case can tolerate—financial systems need ACID, social feeds can handle BASE.
Polyglot persistence is common in production: Real systems use multiple databases, each optimized for its data domain. Netflix uses MySQL for billing (ACID), Cassandra for viewing history (scale), and Elasticsearch for search (full-text queries). Don’t force one database to handle all use cases—match the tool to the requirements.
Database choice cascades through your architecture: Choosing eventual consistency means your application must handle conflicts and stale reads. Choosing ACID means you’ll need replication and possibly sharding for scale. Understanding these implications helps you design coherent systems where the database aligns with consistency requirements and scaling strategy.
Start simple, scale when needed: Begin with a single RDBMS instance and add complexity (read replicas, caching, sharding, NoSQL) only when measurements show you need it. Premature optimization for scale you don’t have yet adds operational burden and slows development. Stripe scaled to billions of requests on PostgreSQL through careful optimization before considering alternatives.
Explain decisions with business impact, not just technical properties: In interviews, connect database choices to user experience and business outcomes. “We need strong consistency for inventory to prevent overselling and customer complaints” is better than “we need ACID.” This shows you think about the system holistically, not just the technical layer.