Write-Ahead Logging

🟠 P1 — the durability mechanism underlying every serious database

Problem

If a database crashes mid-write, data can be left corrupted or lost: a page may be only partially written, or an acknowledged update may exist only in memory. We need crash recovery without sacrificing write performance.

Mechanism

Client write request
  ↓
1. Append to WAL (sequential write to disk — fast)
  ↓
2. Update in-memory data structure
  ↓
3. Acknowledge to client
  ↓
(Later) Flush in-memory data to data files (checkpoint)

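The WAL is an append-only log of every mutation. On crash recovery, the database replays the WAL (from the last checkpoint onward) to reconstruct any in-memory state that had not yet reached the data files. Because the WAL is written sequentially and fsynced before the write is acknowledged, it is both fast and durable.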
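
The append, apply, acknowledge, and replay steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular database implements its log: the `TinyWAL` class, the one-JSON-object-per-line record format, and the `demo` helper are all hypothetical names chosen for this example.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy key-value store that logs every mutation before applying it.

    Hypothetical sketch: the class name and JSON-lines format are
    assumptions made for illustration only.
    """

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()                   # crash recovery happens on startup
        self.log = open(self.path, "ab")

    def _replay(self):
        """Rebuild in-memory state by re-applying every complete log record."""
        if not os.path.exists(self.path):
            return
        valid_bytes = 0
        with open(self.path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break                # torn final record: crash mid-append
                try:
                    record = json.loads(line)
                except ValueError:
                    break                # unreadable record: stop replaying
                self.state[record["key"]] = record["value"]
                valid_bytes += len(line)
        # Drop the torn tail so later appends start on a clean boundary.
        with open(self.path, "r+b") as f:
            f.truncate(valid_bytes)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8")
        self.log.write(record + b"\n")   # 1. append to the WAL
        self.log.flush()
        os.fsync(self.log.fileno())      #    force the record to disk
        self.state[key] = value          # 2. update the in-memory structure
        return "ok"                      # 3. acknowledge to the client


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    store = TinyWAL(path)
    store.put("a", 1)
    store.put("a", 2)
    store.put("b", 3)
    store.log.close()
    # Simulate a crash that left a half-written record at the end of the log.
    with open(path, "ab") as f:
        f.write(b'{"key": "c", "val')
    recovered = TinyWAL(path)            # a fresh process replays the log
    recovered.log.close()
    return recovered.state
```

Calling `demo()` recovers the three acknowledged writes and discards the half-written record for `"c"`, which was never acknowledged. Production logs typically detect torn records with checksums and length prefixes rather than by parsing; the newline check here is a simplification.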

Key Trade-offs

Durability vs latency: fsync on every write guarantees durability but adds latency to each write (roughly 1 ms, depending on the storage device). Batching fsyncs (group commit) improves throughput but risks losing the last unsynced batch on a crash.
WAL size: The WAL must be truncated periodically after checkpointing. If checkpoints are infrequent, the WAL grows large and recovery time increases, because more records must be replayed.
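
Both trade-offs can be made concrete with a small sketch. The code below is a hypothetical illustration, not a real database's design: `BatchedWAL`, `batch_size`, the `snapshot.json` file name, and the `demo` helper are assumptions introduced here. It fsyncs once per batch instead of once per write, and it truncates the log after a checkpoint has made the state durable elsewhere.

```python
import json
import os
import tempfile


class BatchedWAL:
    """Toy WAL with batched fsync and checkpoint-driven truncation.

    Hypothetical sketch: names and file layout are assumptions.
    """

    def __init__(self, directory, batch_size=3):
        self.wal_path = os.path.join(directory, "wal.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.batch_size = batch_size
        self.pending = 0        # records written but not yet fsynced
        self.fsync_calls = 0    # counted only to make the trade-off visible
        self.state = {}
        self.log = open(self.wal_path, "ab")

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8")
        self.log.write(record + b"\n")
        self.state[key] = value
        self.pending += 1
        # Records written since the last sync would be lost in a crash.
        if self.pending >= self.batch_size:
            self.sync()

    def sync(self):
        self.log.flush()
        os.fsync(self.log.fileno())
        self.fsync_calls += 1
        self.pending = 0

    def checkpoint(self):
        """Persist the full state, then discard the log it makes redundant."""
        if self.pending:
            self.sync()
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)  # atomic rename
        # The snapshot now covers every logged record, so the WAL can be
        # truncated. (A real system would also fsync the directory.)
        self.log.close()
        self.log = open(self.wal_path, "wb")


def demo():
    wal = BatchedWAL(tempfile.mkdtemp(), batch_size=3)
    for i in range(7):
        wal.put(f"k{i}", i)
    fsyncs_before = wal.fsync_calls  # syncs triggered by full batches
    at_risk = wal.pending            # writes a crash right now would lose
    wal.checkpoint()
    wal_size = os.path.getsize(wal.wal_path)
    wal.log.close()
    return fsyncs_before, at_risk, wal_size
```

With seven writes and a batch size of three, `demo()` performs two fsyncs instead of seven and leaves one write unsynced, which is exactly the window of data a crash would lose. After `checkpoint()`, the log is empty, so recovery would load the snapshot and have nothing left to replay.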

Instinct

WAL is not a pattern you implement; it’s a pattern you understand. Knowing that PostgreSQL’s WAL is also the basis for replication (streaming replication ships WAL records to standbys) demonstrates knowledge of database internals. It also connects conceptually to event sourcing: both are append-only logs of mutations.

DDIA 2e Reference

Chapter 3: B-Tree crash recovery, LSM memtable durability
Chapter 5: Replication via WAL shipping