- rtshkmr's digital garden/
- References/
- Architecture Design Basics/
- Pattern Taxonomy/
- Data Storage & Retrieval/
- Index Trade-offs (B-Tree vs LSM)/
Index Trade-offs (B-Tree vs LSM)
Table of Contents
🟠P1 — DDIA Ch3 deep knowledge; shows understanding of storage engine internals
Problem #
Databases must index data for efficient retrieval. The index structure determines write amplification, read amplification, and space amplification trade-offs.
The Two Dominant Structures #
| Dimension | B-Tree | LSM-Tree (Log-Structured Merge) |
|---|---|---|
| Write path | In-place update (random I/O) | Append-only (sequential I/O) |
| Read path | Single tree traversal | Check memtable + multiple levels |
| Write amplification | Moderate (page splits) | High (compaction rewrites) |
| Read amplification | Low (one tree) | Higher (multiple sorted runs) |
| Space amplification | Low (one copy) | Higher (during compaction) |
| Write throughput | Good | Excellent (sequential writes) |
| Range queries | Excellent (sorted) | Good (sorted runs) |
Used by #
- B-Tree: PostgreSQL, MySQL (InnoDB), traditional RDBMS
- LSM-Tree: RocksDB, LevelDB, Cassandra, HBase, CockroachDB
Instinct #
B-Tree for read-heavy OLTP workloads where you need fast point reads and range scans with predictable latency.
LSM-Tree for write-heavy workloads (time-series, event logs, high-ingest) where sequential I/O and compaction in the background are acceptable. Knowing this distinction signals depth in interviews — most candidates treat databases as black boxes.
Secondary & Covering Indexes #
| Index Type | What it does | When to use |
|---|---|---|
| Secondary index | Index on non-primary key column | Query patterns beyond primary key |
| Composite index | Multi-column index (order matters!) | Multi-predicate queries |
| Covering index | Index includes all queried columns | Avoid table lookups entirely |
| Partial index | Index with a WHERE clause filter | Sparse hot data (e.g. active=true) |
Instinct: The most impactful performance optimisation is often adding the right index, not scaling hardware. Index order matters for composites — (user_id, created_at) is different from (created_at, user_id).
References #
- Use The Index, Luke! — Markus Winand; the best free resource on SQL indexing
- PostgreSQL Index Documentation
DDIA 2e Reference #
- Chapter 3: Storage and Retrieval — this IS Chapter 3’s core content