Skip to main content
  1. References/
  2. Architecture Design Basics/
  3. Pattern Taxonomy/
  4. Data Storage & Retrieval/

Index Trade-offs (B-Tree vs LSM)

·· 286 words· 2 mins

🟠 P1 — DDIA Ch3 deep knowledge; shows understanding of storage engine internals

Problem #

Databases must index data for efficient retrieval. The index structure determines write amplification, read amplification, and space amplification trade-offs.

The Two Dominant Structures #

DimensionB-TreeLSM-Tree (Log-Structured Merge)
Write pathIn-place update (random I/O)Append-only (sequential I/O)
Read pathSingle tree traversalCheck memtable + multiple levels
Write amplificationModerate (page splits)High (compaction rewrites)
Read amplificationLow (one tree)Higher (multiple sorted runs)
Space amplificationLow (one copy)Higher (during compaction)
Write throughputGoodExcellent (sequential writes)
Range queriesExcellent (sorted)Good (sorted runs)

Used by #

  • B-Tree: PostgreSQL, MySQL (InnoDB), traditional RDBMS
  • LSM-Tree: RocksDB, LevelDB, Cassandra, HBase, CockroachDB

Instinct #

B-Tree for read-heavy OLTP workloads where you need fast point reads and range scans with predictable latency.

LSM-Tree for write-heavy workloads (time-series, event logs, high-ingest) where sequential I/O and compaction in the background are acceptable. Knowing this distinction signals depth in interviews — most candidates treat databases as black boxes.

Secondary & Covering Indexes #

Index TypeWhat it doesWhen to use
Secondary indexIndex on non-primary key columnQuery patterns beyond primary key
Composite indexMulti-column index (order matters!)Multi-predicate queries
Covering indexIndex includes all queried columnsAvoid table lookups entirely
Partial indexIndex with a WHERE clause filterSparse hot data (e.g. active=true)

Instinct: The most impactful performance optimisation is often adding the right index, not scaling hardware. Index order matters for composites — (user_id, created_at) is different from (created_at, user_id).

References #

DDIA 2e Reference #

  • Chapter 3: Storage and Retrieval — this IS Chapter 3’s core content