Zero-Downtime Migrations
🟠P1 — underrated staff-level signal; how to change schemas on a live system
Problem #
You need to rename a column, change a data type, or split a table — but the application is serving traffic 24/7 and you can’t take it down.
Expand-Contract Pattern #
Phase 1 (Expand):
- Add new column (nullable or with default)
- Deploy code that writes to BOTH old and new columns
- Backfill: copy data from old column to new column
Phase 2 (Migrate):
- Deploy code that reads from new column
- Verify correctness
Phase 3 (Contract):
- Deploy code that stops writing to old column
- Drop old column
Dual-Write Pattern (for service migrations) #
Phase 1: Write to old service + new service simultaneously
Phase 2: Read from new service (verify against old)
Phase 3: Stop writing to old service
Phase 4: Decommission old service
Dual-Write Pitfalls #
- Inconsistency window: If the write to the new system succeeds but the write to the old one fails (or vice versa), the two systems diverge
- Ordering: Concurrent writes can reach the two systems in different orders, so their final states diverge
- Performance: Writing to both systems sequentially roughly doubles write latency. Consider writing to the new system asynchronously
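The inconsistency pitfall can be made concrete with a small sketch. Everything here is hypothetical: the `DualWriter` class, the dict-backed stores, and the `fail_new` switch that simulates an outage. It assumes the old system stays the source of truth, so the old write happens first and a failed new-side write is only recorded for later repair. A real system would keep the repair queue somewhere durable rather than in memory.

```python
class DualWriter:
    """Hypothetical dual-writer: old store is the source of truth."""

    def __init__(self, old_store, new_store, fail_new=False):
        self.old_store = old_store
        self.new_store = new_store
        self.fail_new = fail_new      # test hook: simulate the new system being down
        self.pending_repair = []      # keys whose new-side write failed

    def _write_new(self, key, value):
        if self.fail_new:
            raise ConnectionError("new system unavailable")
        self.new_store[key] = value

    def write(self, key, value):
        # Old write first: if it raises, the new system is never touched,
        # so "new succeeded, old failed" cannot happen.
        self.old_store[key] = value
        try:
            self._write_new(key, value)
        except ConnectionError:
            # The systems have now diverged; remember the key for repair.
            self.pending_repair.append(key)

    def repair(self):
        """Re-copy failed keys from the old store. Safe to run repeatedly."""
        still_failing = []
        for key in self.pending_repair:
            try:
                self._write_new(key, self.old_store[key])
            except ConnectionError:
                still_failing.append(key)
        self.pending_repair = still_failing

    def read_verified(self, key):
        """Phase 2 read: serve from new, and report whether old agrees."""
        new_val = self.new_store.get(key)
        return new_val, new_val == self.old_store.get(key)


writer = DualWriter(old_store={}, new_store={})
writer.write("a", 1)
writer.fail_new = True
writer.write("b", 2)          # new-side write fails, key is queued
diverged = writer.read_verified("b")
writer.fail_new = False
writer.repair()
healed = writer.read_verified("b")
```

The verified read is what makes Phase 2 safe: mismatches surface as data to investigate before the old service is switched off.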
Instinct: “Prefer CDC over application-level dual-write whenever possible. CDC reads the database’s own change log, so the new system sees committed writes in commit order instead of depending on application code to write twice correctly.”
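A toy simulation of the ordering argument, not a model of any particular CDC tool. The `apply_events` helper and the `balance` key are made up for illustration, and the CDC case assumes the consumer applies events in the same order as the source database's log.

```python
def apply_events(events):
    """Apply (key, value) writes in order; the last write per key wins."""
    store = {}
    for key, value in events:
        store[key] = value
    return store


# Two concurrent requests update the same key.
w1 = ("balance", 100)
w2 = ("balance", 250)

# The old database commits w1, then w2.
old_log = [w1, w2]
old_state = apply_events(old_log)

# Application-level dual-write: the two requests race on the way to the
# new system and arrive in the opposite order.
new_state_dual_write = apply_events([w2, w1])

# CDC: the new system replays the old database's log in commit order.
new_state_cdc = apply_events(old_log)
```

Under dual-write the new system ends with `balance = 100` while the old one holds `250`. Replaying the log gives both systems the same final value.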
Instinct #
Every migration is a multi-deploy operation. Never combine schema change and code change in one deploy. The expand-contract pattern ensures backward compatibility at every step. The hardest part is usually the backfill: it must be idempotent, resumable, and rate-limited to avoid overloading the database.
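A minimal sketch of a backfill with those three properties, using Python's built-in `sqlite3` so it runs anywhere. The table and column names (`users`, `fullname`, `display_name`, `backfill_checkpoint`) and the `backfill` function are assumptions for illustration. A production backfill would also need to account for concurrent writers and replication lag.

```python
import sqlite3
import time


def backfill(conn, batch_size=2, pause_s=0.0, max_batches=None):
    """Copy users.fullname into users.display_name in id-ordered batches.

    Idempotent: only rows where display_name IS NULL are updated.
    Resumable: the last processed id is persisted in backfill_checkpoint.
    Rate-limited: sleeps pause_s seconds between batches.
    Returns the number of rows updated during this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backfill_checkpoint "
        "(job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT last_id FROM backfill_checkpoint WHERE job = 'display_name'"
    ).fetchone()
    last_id = row[0] if row else 0
    updated = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break  # nothing left past the checkpoint
        cur = conn.execute(
            "UPDATE users SET display_name = fullname "
            "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
            (ids[0], ids[-1]),
        )
        updated += cur.rowcount
        last_id = ids[-1]
        conn.execute(
            "INSERT OR REPLACE INTO backfill_checkpoint (job, last_id) "
            "VALUES ('display_name', ?)",
            (last_id,),
        )
        conn.commit()  # batch and checkpoint are committed together
        batches += 1
        time.sleep(pause_s)
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.executemany(
    "INSERT INTO users (fullname) VALUES (?)",
    [("Ada",), ("Grace",), ("Linus",), ("Barbara",), ("Ken",)],
)
# Expand step: the new column is nullable, so existing code keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

first_run = backfill(conn, batch_size=2, max_batches=1)  # stopped after one batch
second_run = backfill(conn, batch_size=2)                # resumes from the checkpoint
```

Committing the batch update and the checkpoint in the same transaction means a crash can never record progress that was not actually written. The `display_name IS NULL` guard also keeps the backfill from overwriting values that dual-writing code has already set.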
References #
- Parallel Change (Expand-Contract) — Martin Fowler
- Online Migrations at Scale — Stripe Engineering (essential for Stripe interviews!)
- Debezium CDC Tutorial