Database Indexing Strategies: B-Tree, Hash & Composite

intermediate 13 min read Updated 2026-02-11

After reading this topic, you will be able to:

  • Compare B-tree, hash, and bitmap index structures and their use cases
  • Evaluate covering index benefits for query performance
  • Assess index trade-offs between read performance and write overhead

TL;DR

Database indexes are data structures that dramatically speed up query performance by creating shortcuts to data, trading write overhead and storage space for faster reads. B-tree indexes handle range queries and sorting, hash indexes excel at exact-match lookups, and bitmap indexes compress low-cardinality data efficiently. The art of indexing lies in choosing the right structure, column order, and coverage strategy based on your query patterns—over-indexing kills write performance while under-indexing cripples reads.

Cheat Sheet: B-tree for ranges/sorting (most common), hash for exact matches only, bitmap for low-cardinality (status, gender), covering indexes eliminate table lookups, composite index column order matters (high-selectivity first), expect 10-30% write penalty per index.

Background

In the early days of databases, finding a specific row meant scanning every record sequentially—a full table scan. For a million-row table, this could mean reading a million rows just to find one customer. The problem became acute as databases grew from thousands to billions of rows. Database indexing emerged in the 1970s as a fundamental solution, inspired by library card catalogs and book indexes. The B-tree was invented by Rudolf Bayer and Edward McCreight in 1970, and IBM’s System R adopted B-tree indexes in the mid-1970s, establishing the pattern that still dominates relational databases today.

The core insight: maintain a separate, sorted data structure that maps search keys to row locations. Instead of scanning a million rows, you traverse a tree with perhaps 4-5 levels, examining only dozens of nodes. This transforms O(n) operations into O(log n), making the difference between a 10-second query and a 10-millisecond query. Modern databases like PostgreSQL, MySQL, and Oracle have evolved sophisticated indexing strategies, but the fundamental trade-off remains: indexes accelerate reads at the cost of slower writes and additional storage. Understanding when and how to index is what separates database amateurs from experts who can scale systems to billions of queries per day.

Architecture

A database index consists of three architectural layers: the index structure itself, the storage layer that persists it, and the query optimizer that decides when to use it. The index structure is a separate data structure—typically a B-tree, hash table, or bitmap—that stores sorted or hashed copies of indexed column values alongside pointers (row IDs or physical addresses) to the actual table rows. When you create an index on a users.email column, the database builds a B-tree where each node contains email values in sorted order, with leaf nodes pointing to the corresponding user records.

The storage layer manages how indexes are physically stored on disk. Indexes are typically stored in pages (4KB-16KB blocks) separate from table data, with their own buffer pool allocation. A clustered index (like MySQL InnoDB’s primary key) stores the actual table data within the index structure, making the index and table one entity. Non-clustered indexes store only the indexed columns plus row pointers, requiring a second lookup to fetch remaining columns unless the query is covered.

The query optimizer sits above both, analyzing SQL queries and deciding whether to use an index, which index to choose if multiple exist, and whether to combine multiple indexes. It estimates the cost of index seeks versus table scans based on statistics like cardinality, data distribution, and index selectivity. For a query like SELECT * FROM users WHERE email = 'john@example.com' AND status = 'active', the optimizer might choose a composite index on (email, status), use only the email index, or even skip indexes entirely if the table is tiny. This three-layer architecture—structure, storage, optimizer—determines whether your indexes actually improve performance or just waste space.
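The optimizer's decision for a query like the one above can be observed directly with EXPLAIN. A minimal sketch using SQLite as a stand-in engine (the schema and index name are illustrative, not from the text):

```python
import sqlite3

# Illustrative schema; SQLite stands in for the optimizer discussion above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_email_status ON users (email, status)")

# Ask the optimizer how it would run the query from the text.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users "
    "WHERE email = 'john@example.com' AND status = 'active'"
).fetchall()[0][-1]

# The plan reports an index seek using both composite columns,
# e.g. "SEARCH users USING INDEX idx_email_status (email=? AND status=?)".
assert "idx_email_status" in detail
```

Running EXPLAIN before deploying a query is the cheapest way to confirm the optimizer actually picks the index you built.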

Internals

B-tree indexes are the workhorse of relational databases, handling the vast majority of indexing needs. A B-tree is a self-balancing tree where each node contains multiple keys (typically hundreds) and child pointers, keeping the tree shallow even with billions of rows. With 200 keys per node, capacity multiplies by 200 at each level (200 → 40,000 → 8,000,000 → 1,600,000,000), so a table with 100 million rows needs only four levels. Each level requires one disk I/O, so finding any row takes about four reads instead of scanning 100 million rows. Leaf nodes are linked, enabling efficient range scans—WHERE age BETWEEN 25 AND 35 walks the tree once to find 25, then follows leaf pointers until reaching 35. B-trees maintain sorted order, supporting ORDER BY without additional sorting. PostgreSQL’s B-tree implementation uses a fill factor (default 90%) to leave space for updates, reducing page splits that would otherwise fragment the index.
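The depth arithmetic can be sanity-checked in a few lines of Python (the fanout of 200 keys per node is the illustrative figure from the text, not a universal constant):

```python
import math

def btree_levels(rows: int, fanout: int = 200) -> int:
    """Smallest number of levels L such that fanout**L >= rows."""
    return max(1, math.ceil(math.log(rows) / math.log(fanout)))

# 200**4 = 1.6 billion slots, so 100 million rows need only 4 levels,
# i.e. roughly 4 disk I/Os on a cold cache.
print(btree_levels(100_000_000))  # -> 4
print(btree_levels(1_000))        # -> 2
```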

Hash indexes use a hash function to map keys to buckets, providing O(1) lookup for exact matches. When you query WHERE user_id = 12345, the database hashes 12345, jumps directly to that bucket, and retrieves the row pointer. Hash indexes are 20-30% faster than B-trees for equality checks but completely useless for range queries—you can’t hash-scan from 12345 to 12400. They also don’t support sorting. PostgreSQL historically avoided hash indexes due to crash recovery issues (they weren’t WAL-logged until version 10), while MySQL’s MEMORY engine uses them by default. The bucket structure means hash indexes can suffer from collisions, where multiple keys hash to the same bucket, degrading to linked-list performance in pathological cases.
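A plain Python dict makes the trade-off concrete: it is a hash table, so exact lookups are O(1) on average, but a range query has no ordering to exploit and must touch every key. This is only an analogy for the on-disk structure described above:

```python
# key -> "row pointer" (here just an integer row id); a dict models
# the bucket structure of a hash index.
hash_index = {}
for row_id, user_id in enumerate([12345, 99887, 12389, 50001]):
    hash_index[user_id] = row_id

# Exact match: hash the key, jump straight to the bucket.
assert hash_index[12345] == 0

# Range query: no ordering exists, so we degrade to scanning all keys.
in_range = sorted(k for k in hash_index if 12345 <= k <= 12400)
assert in_range == [12345, 12389]
```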

Bitmap indexes excel at low-cardinality columns like gender, status, or country. Instead of storing each row individually, a bitmap index creates one bit vector per distinct value: for a status column with values (active, inactive, suspended), you get three bitmaps where bit N is 1 if row N has that status. Querying WHERE status = 'active' AND country = 'US' becomes a fast bitwise AND operation between two bitmaps. Oracle popularized bitmap indexes for data warehouses where columns have few distinct values but millions of rows. The compression is dramatic—a 100-million-row table with 3 status values needs only 37.5MB (100M bits × 3 / 8). However, bitmap indexes are disastrous for OLTP workloads: updating a single row requires locking the entire bitmap, serializing all concurrent updates to that column.
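The bitwise-AND trick can be sketched with Python integers as bit vectors (bit N set means row N has that value); real engines layer compression schemes such as roaring bitmaps on top of this idea:

```python
# Four sample rows of (status, country); row numbers are the bit positions.
rows = [("active", "US"), ("inactive", "UK"), ("active", "US"), ("suspended", "CA")]

bitmaps = {}
for n, (status, country) in enumerate(rows):
    bitmaps[("status", status)] = bitmaps.get(("status", status), 0) | (1 << n)
    bitmaps[("country", country)] = bitmaps.get(("country", country), 0) | (1 << n)

# WHERE status = 'active' AND country = 'US' becomes one bitwise AND.
match = bitmaps[("status", "active")] & bitmaps[("country", "US")]
matching_rows = [n for n in range(len(rows)) if (match >> n) & 1]
assert matching_rows == [0, 2]
```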

Covering indexes include all columns needed by a query, eliminating the need to access the table at all. If you frequently run SELECT name, email FROM users WHERE status = 'active', a covering index on (status, name, email) lets the database satisfy the entire query from the index alone—an “index-only scan.” This is transformative for read-heavy workloads. The trade-off is index size: including extra columns makes indexes larger and slower to maintain. PostgreSQL’s INCLUDE clause lets you add non-key columns to the leaf level without affecting the tree structure: CREATE INDEX ON users(status) INCLUDE (name, email) keeps the tree sorted only by status while storing name and email for coverage.
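An index-only scan is easy to observe in SQLite. SQLite lacks PostgreSQL's INCLUDE clause, so this sketch simply puts the extra columns into the index key; the schema and index name are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_cover ON users (status, name, email)")

# Every column the query touches lives in the index, so the table
# itself is never read.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name, email FROM users WHERE status = 'active'"
).fetchall()[0][-1]
assert "COVERING INDEX" in detail  # SQLite's name for an index-only scan
```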

Composite indexes on multiple columns follow a left-prefix rule: an index on (last_name, first_name, age) can satisfy queries filtering on last_name alone, or last_name + first_name, or all three, but NOT first_name alone or age alone. Column order matters enormously. Place high-selectivity columns first—if last_name has 10,000 distinct values but first_name only 100, (last_name, first_name) is far more effective than the reverse. The index can narrow down to a small subset using last_name, then filter within that subset by first_name. Reversing the order forces scanning all rows with a given first_name (potentially millions) before applying the last_name filter.
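The left-prefix rule is directly visible in query plans. A sketch in SQLite (the extra bio column ensures SELECT * cannot be satisfied from the index alone; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last_name TEXT, first_name TEXT, age INTEGER, bio TEXT)")
conn.execute("CREATE INDEX idx_name ON people (last_name, first_name, age)")

def plan(where):
    """First detail line of the query plan for a given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM people WHERE " + where
    return conn.execute(sql).fetchall()[0][-1]

# Leftmost column present: index seek (SEARCH ... USING INDEX idx_name).
assert "idx_name" in plan("last_name = 'Smith'")
assert "idx_name" in plan("last_name = 'Smith' AND first_name = 'Al'")

# Leftmost column missing: falls back to a full table scan.
assert "SCAN" in plan("first_name = 'Al'")
assert "idx_name" not in plan("age = 30")
```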

B-tree Index Structure and Traversal

graph TB
    subgraph "B-tree Index on users.email"
        Root["Root Node<br/>keys: f | m"]
        L1A["Level 1<br/>a...f"]
        L1B["Level 1<br/>g...m"]
        L1C["Level 1<br/>n...z"]
        L2A["Level 2<br/>a...c"]
        L2B["Level 2<br/>d...f"]
        L2C["Level 2<br/>g...j"]
        L2D["Level 2<br/>k...m"]
        LeafA["Leaf<br/>alice@...<br/>bob@...<br/>carol@..."]
        LeafB["Leaf<br/>dave@...<br/>eve@...<br/>frank@..."]
        LeafC["Leaf<br/>grace@...<br/>iris@...<br/>john@..."]
    end
    
    subgraph "Table Data (Heap)"
        Row1["Row: alice@... | Alice | ..."]
        Row2["Row: bob@... | Bob | ..."]
        Row3["Row: john@... | John | ..."]
    end
    
    Query["Query: WHERE email = 'john@example.com'"]
    
    Query --"1. Start at root"--> Root
    Root --"2. Compare 'john' → descend to g...m"--> L1B
    L1B --"3. Navigate to range"--> L2C
    L2C --"4. Find in leaf"--> LeafC
    LeafC --"5. Follow pointer"--> Row3
    
    LeafA -."Linked for range scans".-> LeafB
    LeafB -."Linked for range scans".-> LeafC

B-tree index traversal for an email lookup requires only 4-5 I/O operations even with millions of rows. Each node contains hundreds of keys, keeping the tree shallow. Leaf nodes are linked to enable efficient range scans without re-traversing the tree.

Hash Index vs B-tree Lookup Comparison

graph LR
    subgraph "Hash Index: O(1) Exact Match Only"
        HQuery["WHERE user_id = 12345"]
        HHash["Hash Function<br/>hash(12345) = bucket_42"]
        HBucket["Bucket 42<br/>[12345 → RowPtr]<br/>[12389 → RowPtr]"]
        HRow["Direct Row Access"]
        
        HQuery --"1. Hash key"--> HHash
        HHash --"2. Jump to bucket"--> HBucket
        HBucket --"3. Follow pointer"--> HRow
    end
    
    subgraph "B-tree Index: Supports Ranges & Sorting"
        BQuery["WHERE user_id BETWEEN<br/>12345 AND 12400"]
        BRoot["Root Node"]
        BLevel["Internal Nodes"]
        BLeaf["Leaf Nodes<br/>12345...12400"]
        BRows["Multiple Rows<br/>(sorted order)"]
        
        BQuery --"1. Traverse tree"--> BRoot
        BRoot --"2. Navigate down"--> BLevel
        BLevel --"3. Find range start"--> BLeaf
        BLeaf --"4. Scan linked leaves"--> BRows
    end
    
    HNoRange["❌ Cannot do:<br/>BETWEEN, >, <, ORDER BY"]
    BSupports["✓ Supports:<br/>Equality, Ranges, Sorting"]
    
    HBucket -.-> HNoRange
    BLeaf -.-> BSupports

Hash indexes provide O(1) lookup for exact matches but cannot handle range queries, sorting, or inequality comparisons. B-trees are slightly slower for exact matches (O(log n)) but support the full spectrum of query operations, making them the default choice for most use cases.

Bitmap Index Structure for Low-Cardinality Columns

graph TB
    subgraph "Table: 100M Users"
        Table["user_id | status | country<br/>1 | active | US<br/>2 | inactive | UK<br/>3 | active | US<br/>4 | suspended | CA<br/>... 100 million rows"]
    end
    
    subgraph "Bitmap Indexes (Compressed)"
        StatusBitmaps["Status Bitmaps<br/>active: 101000...<br/>inactive: 010000...<br/>suspended: 000100..."]
        CountryBitmaps["Country Bitmaps<br/>US: 101000...<br/>UK: 010000...<br/>CA: 000100..."]
    end
    
    subgraph "Query Execution"
        Query["WHERE status = 'active'<br/>AND country = 'US'"]
        BitAND["Bitwise AND Operation<br/>active: 101000...<br/>US: 101000...<br/>Result: 101000..."]
        Result["Matching Row IDs:<br/>1, 3, ..."]
    end
    
    Table --"Build bitmaps"--> StatusBitmaps
    Table --"Build bitmaps"--> CountryBitmaps
    Query --"1. Fetch bitmaps"--> StatusBitmaps
    Query --"2. Fetch bitmaps"--> CountryBitmaps
    StatusBitmaps --"3. AND operation"--> BitAND
    CountryBitmaps --"3. AND operation"--> BitAND
    BitAND --"4. Extract row IDs"--> Result
    
    Storage["Storage: 37.5MB<br/>(100M bits × 3 statuses / 8)<br/>vs 2-3GB for B-tree"]
    Warning["⚠️ Write Lock Issue:<br/>Updating 1 row locks<br/>entire bitmap for that value"]
    
    StatusBitmaps -.-> Storage
    StatusBitmaps -.-> Warning

Bitmap indexes compress low-cardinality data dramatically (10-100x vs B-trees) and enable fast boolean operations through bitwise AND/OR. However, updates lock entire bitmaps, making them suitable only for read-heavy data warehouse workloads with batch updates, not OLTP systems.

Covering Index: Eliminating Table Lookups

graph LR
    subgraph "Without Covering Index"
        Q1["SELECT name, email<br/>FROM users<br/>WHERE status = 'active'"]
        I1["Index on (status)"]
        T1["Table Lookup<br/>for name, email"]
        R1["Result Set"]
        
        Q1 --"1. Index seek (5ms)"--> I1
        I1 --"2. Get row pointers"--> T1
        T1 --"3. Fetch columns (20ms)"--> R1
    end
    
    subgraph "With Covering Index"
        Q2["SELECT name, email<br/>FROM users<br/>WHERE status = 'active'"]
        I2["Covering Index<br/>(status, name, email)"]
        R2["Result Set"]
        
        Q2 --"1. Index-only scan (5ms)"--> I2
        I2 --"2. All data in index"--> R2
    end
    
    Time1["Total: 25ms<br/>Index + Table"]
    Time2["Total: 5ms<br/>Index Only<br/>(5x faster)"]
    
    Cost1["Index Size: 100MB<br/>Write Overhead: +15%"]
    Cost2["Index Size: 300MB<br/>Write Overhead: +30%"]
    
    R1 -.-> Time1
    R2 -.-> Time2
    I1 -.-> Cost1
    I2 -.-> Cost2

Covering indexes include all columns needed by a query, enabling index-only scans that eliminate table lookups. This can reduce query latency by 50-90% but increases index size 2-3x and write overhead proportionally. Use selectively for high-frequency, performance-critical queries.

Composite Index Column Order: Left-Prefix Rule

graph TB
    subgraph "Index: (city, age, status)"
        Idx["Composite Index<br/>Sorted by: city → age → status"]
    end
    
    subgraph "Queries That CAN Use This Index"
        Q1["✓ WHERE city = 'NYC'<br/>(uses city)"]
        Q2["✓ WHERE city = 'NYC'<br/>AND age = 25<br/>(uses city, age)"]
        Q3["✓ WHERE city = 'NYC'<br/>AND age = 25<br/>AND status = 'active'<br/>(uses all three)"]
        Q4["✓ WHERE city = 'NYC'<br/>AND status = 'active'<br/>(uses city, skips age)"]
    end
    
    subgraph "Queries That CANNOT Use This Index"
        Q5["❌ WHERE age = 25<br/>(missing city prefix)"]
        Q6["❌ WHERE status = 'active'<br/>(missing city prefix)"]
        Q7["❌ WHERE age = 25<br/>AND status = 'active'<br/>(missing city prefix)"]
    end
    
    subgraph "Column Order Matters"
        Order1["(city, age, status)<br/>city: 1000 values (high selectivity)<br/>age: 100 values<br/>status: 3 values (low selectivity)"]
        Order2["(status, age, city)<br/>❌ Poor: starts with low selectivity<br/>Scans many rows before filtering"]
    end
    
    Idx --> Q1
    Idx --> Q2
    Idx --> Q3
    Idx --> Q4
    Idx -."Cannot use".-> Q5
    Idx -."Cannot use".-> Q6
    Idx -."Cannot use".-> Q7
    
    Rule["Left-Prefix Rule:<br/>Must use leftmost column(s)<br/>Can skip middle columns<br/>but loses remaining columns"]
    
    Idx -.-> Rule

Composite indexes follow the left-prefix rule: queries must filter on the leftmost column(s) to use the index. Column order is critical—place high-selectivity columns first to narrow down results quickly. An index on (city, age, status) cannot help queries filtering only on age or status.

Performance Characteristics

Index performance depends on three factors: tree depth (I/O operations), selectivity (rows matched), and cache hit rates. A well-designed B-tree index on a 100-million-row table requires 4-5 disk I/Os for a single-row lookup, translating to 4-5ms on SSDs or 40-50ms on spinning disks. If the index fits in memory (buffer pool), this drops to microseconds. Range scans add sequential I/O: scanning 10,000 rows via an index might take 50-100ms, while a full table scan of 100 million rows could take 30-60 seconds. The speedup is 300-600x, but only if the index is selective—if your query matches 50% of rows, the optimizer may skip the index entirely and scan the table, avoiding the overhead of following millions of pointers.

Write performance degrades with each index. Every INSERT, UPDATE, or DELETE must update all indexes on that table. Adding one index typically slows writes by 10-30%, depending on index size and tree depth. A table with 5 indexes might see 50-100% write overhead. This is why write-heavy OLTP systems like payment processors or high-frequency trading platforms keep indexes minimal, while read-heavy analytics systems liberally add indexes. Index maintenance also matters: B-trees require periodic rebalancing and page splits, which can cause write amplification. PostgreSQL’s VACUUM process reclaims dead index entries, while MySQL’s InnoDB performs background merge operations.

Hash indexes provide O(1) lookup in theory but suffer from hash collisions and poor cache locality. In practice, they’re 20-30% faster than B-trees for exact matches on high-cardinality keys, but the difference shrinks if the B-tree root and first few levels are cached. Bitmap indexes shine in data warehouses: Oracle reports 10-100x compression compared to B-trees for low-cardinality columns, and bitwise operations are extremely fast. However, bitmap updates are so slow (locking entire bitmaps) that they’re only viable for read-mostly workloads with batch updates.

Covering indexes can eliminate 50-90% of query time by avoiding table lookups. A query that previously required 5ms for index seek + 20ms for table fetch now takes just 5ms. The cost is index size—a covering index might be 2-3x larger than a simple index—and proportionally slower writes. Netflix’s Cassandra clusters use covering indexes (materialized views) extensively, accepting 2x write amplification to achieve sub-10ms read latencies at petabyte scale.

Trade-offs

The fundamental trade-off is read performance versus write overhead and storage cost. Indexes can accelerate queries by 100-1000x, but each index slows writes by 10-30% and consumes disk space equal to 10-50% of the table size. A table with 10 indexes might have 3x the storage footprint and 3x slower writes than an unindexed table. For read-heavy workloads (analytics, reporting, search), this is a bargain. For write-heavy workloads (logging, time-series data, high-frequency trading), excessive indexing can cripple throughput.

B-trees are the Swiss Army knife: they handle equality, ranges, sorting, and even prefix matching (LIKE 'abc%'), making them the default choice. The downside is logarithmic complexity—even with excellent caching, you’re doing multiple comparisons per lookup. Hash indexes are faster for exact matches but completely inflexible: no ranges, no sorting, no prefix matching. They’re a one-trick pony that’s rarely worth the loss of versatility.

Bitmap indexes offer incredible compression and fast boolean operations for low-cardinality data, but they’re write-hostile. Updating a single row locks the entire bitmap for that value, serializing all concurrent updates. Oracle and PostgreSQL support bitmap indexes primarily for data warehouses with bulk-load patterns, not OLTP. Covering indexes eliminate table lookups but bloat index size and slow writes proportionally. They’re a win when queries repeatedly access the same column subset, but over-covering wastes space on columns that are rarely queried together.

Composite indexes are powerful but brittle: column order determines which queries they can satisfy. An index on (A, B, C) doesn’t help queries filtering only on B or C, forcing you to create additional indexes or accept full table scans. This leads to index proliferation—a table with 5 columns might need 10+ indexes to cover all query patterns, multiplying write overhead. The art is finding the minimal set of indexes that covers 95% of queries, accepting that rare queries might be slower.

When to Use (and When Not To)

Create B-tree indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses, especially high-cardinality columns (many distinct values) like user IDs, email addresses, or timestamps. If you’re filtering on user_id in 80% of queries, index it. If queries frequently join orders.user_id to users.id, index both. B-trees are the default choice unless you have a specific reason to use something else.

Use hash indexes only for exact-match lookups on high-cardinality columns where you never need ranges or sorting, and only if your database supports them reliably (PostgreSQL 10+, MySQL MEMORY tables). In practice, B-trees are fast enough that hash indexes rarely justify their limitations. Consider them for in-memory lookup tables or caches where you’re doing millions of equality checks per second.

Bitmap indexes are for data warehouses and analytics databases with low-cardinality dimensions (status, category, region) and read-heavy, batch-update workloads. If you’re running complex queries with multiple filters on columns that have 10-1000 distinct values across billions of rows, bitmap indexes can reduce storage by 10-100x and accelerate boolean operations. Never use them in OLTP systems with concurrent updates.

Covering indexes are worth the overhead when queries repeatedly access the same small subset of columns and those queries are performance-critical. If your homepage query is SELECT name, avatar FROM users WHERE status = 'active' ORDER BY last_login LIMIT 20 and it runs 10,000 times per second, a covering index on (status, last_login, name, avatar) eliminates table lookups and might reduce latency from 50ms to 5ms. The 2x write overhead is worth it for that level of read improvement.

Composite indexes should follow the left-prefix rule and prioritize high-selectivity columns first. For a query filtering on (city, age, status), if city has 1000 values, age has 100, and status has 3, index (city, age, status). This lets the index narrow down to a small subset via city, then filter further by age and status. Avoid creating redundant indexes: if you have an index on (A, B, C), you don’t need separate indexes on (A) or (A, B)—they’re covered by the left-prefix rule.

Skip indexes on very small tables (under 1000 rows), very low-cardinality columns (boolean flags, unless using bitmaps), or columns that are updated frequently but queried rarely. The overhead isn’t worth it. Also avoid indexing columns used only in functions or expressions—WHERE YEAR(created_at) = 2023 can’t use an index on created_at unless you create a functional index on YEAR(created_at).
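The function-wrapping pitfall and the functional-index fix can be demonstrated with SQLite's expression indexes (SQLite has no YEAR(), so strftime('%Y', ...) stands in; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_created ON orders (created_at)")

def plan(sql):
    """First detail line of the query plan."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

query = "SELECT * FROM orders WHERE strftime('%Y', created_at) = '2023'"

# Wrapping the column in a function hides it from the plain index.
assert "SCAN" in plan(query) and "idx_created" not in plan(query)

# A functional (expression) index on the same formula restores the seek.
conn.execute("CREATE INDEX idx_year ON orders (strftime('%Y', created_at))")
assert "idx_year" in plan(query)
```

The query must use the exact expression the index was built on, or the planner falls back to a scan.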

Real-World Examples

Uber’s trip database uses composite B-tree indexes extensively on (city_id, created_at) to support queries like “show me all trips in San Francisco in the last hour.” The city_id narrows down to a single city’s data (high selectivity), then created_at enables efficient time-range scans. Without this composite index, queries would scan all cities’ trips and filter by time, or scan all times and filter by city—both disastrous at Uber’s scale of 20+ million trips per day. They also use covering indexes on (rider_id, created_at, status, fare) for rider history queries, eliminating table lookups for the 95% of queries that only need those four fields. The write overhead (Uber estimates 15-20% slower inserts) is acceptable because trips are written once but queried hundreds of times over their lifetime.

Stripe’s payment ledger relies on B-tree indexes on (account_id, created_at) and (transaction_id) to support both account-level queries (“show me all transactions for account X”) and direct transaction lookups. They deliberately avoid indexing low-selectivity columns like currency or status, accepting that queries filtering only on those fields will be slower. Stripe’s database team found that over-indexing their payments table (they briefly had 12 indexes) caused write latency to spike from 5ms to 30ms during peak load, so they pruned to 5 carefully chosen indexes that cover 98% of query patterns. They use PostgreSQL’s EXPLAIN ANALYZE religiously to verify that new queries use indexes efficiently before deploying code.

Pinterest’s homefeed system uses bitmap indexes in their analytics warehouse (built on Druid) to filter billions of pin impressions by dimensions like country, device_type, and content_category. A query like “show me impressions from US mobile users viewing fashion pins” becomes a fast bitwise AND of three bitmaps. The compression is extreme: 100 billion impressions with 200 countries, 10 device types, and 50 categories requires only ~250GB of bitmap indexes instead of several terabytes for B-trees. Pinterest accepts that these indexes are read-only—they rebuild them nightly during batch ingestion—because their analytics workload is entirely read-heavy. For their OLTP pin database, they use standard B-tree indexes and avoid bitmaps entirely.


Interview Essentials

Mid-Level

Explain the difference between B-tree and hash indexes, including when you’d choose each. Describe what a covering index is and why it improves performance. Demonstrate understanding that indexes speed up reads but slow down writes. Be able to explain why column order matters in composite indexes (left-prefix rule). Show awareness that indexes consume storage space. Discuss how to identify which columns to index based on query patterns (WHERE clauses, JOINs, ORDER BY). Understand that full table scans are sometimes faster than indexes for low-selectivity queries.

Senior

Design an indexing strategy for a specific system (e.g., e-commerce order database, social media feed). Justify index choices based on read/write ratios, query patterns, and cardinality. Explain how the query optimizer decides whether to use an index, including concepts like selectivity and cost estimation. Discuss trade-offs between multiple narrow indexes versus fewer wide covering indexes. Describe index maintenance issues: fragmentation, bloat, and when to rebuild indexes. Explain bitmap indexes and why they’re unsuitable for OLTP. Discuss how to measure index effectiveness using EXPLAIN plans and query statistics. Address the write amplification problem and how to balance read performance against write throughput.

Staff+

Architect indexing strategies for systems at massive scale (billions of rows, millions of QPS). Discuss sharding implications: how indexes interact with partition keys and cross-shard queries. Explain advanced techniques like partial indexes (PostgreSQL), filtered indexes (SQL Server), and functional indexes. Describe how different databases implement indexes differently (InnoDB clustered vs. heap, PostgreSQL MVCC impact on indexes). Discuss index-only scans and visibility maps. Explain how to handle index hotspots in distributed systems (e.g., monotonically increasing IDs causing write contention). Address operational concerns: online index creation, index rebuild strategies, monitoring index bloat. Discuss when to denormalize or use materialized views instead of complex indexes. Explain how modern columnar databases (Snowflake, BigQuery) handle indexing differently than row-stores.

Common Interview Questions

Why would a database choose a full table scan over using an index?

How does a composite index on (A, B, C) differ from three separate indexes on A, B, and C?

Explain how a covering index eliminates table lookups. When is this optimization worth the cost?

What happens to index performance when you update a column that’s heavily indexed?

How would you design indexes for a query that filters on A, sorts by B, and returns columns C and D?

Why are bitmap indexes unsuitable for OLTP workloads?

How do you decide whether to add a new index or optimize the query instead?

What’s the relationship between index selectivity and query performance?

Red Flags to Avoid

Claiming indexes always improve performance without mentioning write overhead or storage cost

Not understanding the left-prefix rule for composite indexes

Suggesting hash indexes as a general-purpose solution (they’re very limited)

Indexing every column ‘just in case’ without considering write impact

Not knowing what a covering index is or why it matters

Inability to explain when a full table scan might be faster than an index

Confusing clustered and non-clustered indexes

Not understanding that index maintenance (rebuilds, vacuuming) is necessary

Claiming bitmap indexes are good for OLTP (they’re write-hostile)

Not knowing how to use EXPLAIN to verify index usage


Key Takeaways

Indexes trade write performance and storage for dramatically faster reads—expect 10-30% write overhead per index but 100-1000x read speedup for selective queries. The art is finding the minimal set of indexes that covers your critical query patterns.

B-tree indexes are the default choice, handling equality, ranges, and sorting. Hash indexes are faster for exact matches but inflexible. Bitmap indexes excel at low-cardinality columns in read-heavy warehouses but are disastrous for OLTP due to write contention.

Composite index column order matters enormously: place high-selectivity columns first and remember the left-prefix rule. An index on (A, B, C) can satisfy queries on A, (A, B), or (A, B, C), but not B alone or C alone.

Covering indexes eliminate table lookups by including all queried columns, potentially reducing latency by 50-90%. The cost is larger indexes and slower writes, so use them selectively for high-frequency, performance-critical queries.

Index effectiveness depends on selectivity: if your query matches 50% of rows, the optimizer may skip the index and scan the table. Always verify index usage with EXPLAIN plans, and remember that over-indexing can cripple write throughput in high-volume systems.