Bulkhead Isolation
🟠 P1 — isolating failures so one component’s problems don’t sink the ship
Problem #
If all requests share one thread pool or connection pool, a single slow downstream can tie up every thread or connection, starving requests to healthy downstreams.
Mechanism #
Named after ship bulkheads: compartmentalise resources so a breach in one compartment doesn’t flood the entire ship.
Without bulkhead:
[Shared thread pool: 100 threads]
→ Slow Service A consumes 95 threads
→ Service B, C starved (5 threads for everything else)
With bulkhead:
[Pool for Service A: 30 threads] ← A can only consume 30
[Pool for Service B: 30 threads]
[Pool for Service C: 30 threads]
[General pool: 10 threads]
Isolation Strategies #
- Thread pool isolation: Separate thread pools per downstream. Strong isolation, higher resource overhead.
- Semaphore isolation: Count-based limiting without dedicated threads. Lighter, but the call runs on the caller’s thread, so the caller still blocks for as long as a slow call takes.
- Process isolation: Separate processes/containers per workload. Strongest isolation, highest overhead.
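As a minimal sketch of thread pool isolation, the Python snippet below gives each downstream its own `ThreadPoolExecutor`. The service functions, pool names, and pool sizes are hypothetical stand-ins, scaled down from the diagram so the demo finishes quickly. It only illustrates the idea: a backlog of slow calls to Service A queues inside A's pool, while a call to Service B still gets a free worker straight away.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical pool sizes, scaled down from the diagram for a quick demo.
pool_a = ThreadPoolExecutor(max_workers=2, thread_name_prefix="svc-a")
pool_b = ThreadPoolExecutor(max_workers=2, thread_name_prefix="svc-b")


def call_service_a():
    """Stand-in for a slow downstream call (assumed, not a real client)."""
    time.sleep(0.2)
    return "A ok"


def call_service_b():
    """Stand-in for a healthy downstream call."""
    return "B ok"


def demo():
    # Submit three times as many slow calls as pool A has workers,
    # so A's pool is saturated and the rest wait in A's own queue.
    a_futures = [pool_a.submit(call_service_a) for _ in range(6)]

    # B has its own workers, so it does not wait behind A's backlog.
    start = time.monotonic()
    b_result = pool_b.submit(call_service_b).result(timeout=1.0)
    b_elapsed = time.monotonic() - start

    wait(a_futures)  # let A's backlog drain before returning
    return b_result, b_elapsed, [f.result() for f in a_futures]


if __name__ == "__main__":
    b_result, b_elapsed, a_results = demo()
    print(b_result, f"after {b_elapsed:.3f}s;", len(a_results), "A calls completed")
```

Note that `ThreadPoolExecutor` queues excess work without a bound, so a real bulkhead would likely also cap the queue or reject work once the pool is full.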
Instinct #
Bulkhead isolation is most valuable for your most critical paths.
If payment processing and analytics reporting share resources, a slow analytics query should never be able to degrade payment latency.
Pair with circuit breakers: bulkheads cap how many resources a failing dependency can hold, while circuit breakers stop sending calls to a dependency once it is known to be failing.
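The pairing can be sketched in Python with a semaphore-style bulkhead and a deliberately simplified circuit breaker. All class names and thresholds here are hypothetical, and the breaker omits the reset timer and half-open state that a production breaker would have. The bulkhead rejects a call immediately when no permit is free, and the breaker fails fast after a run of consecutive failures.

```python
import threading


class BulkheadFull(Exception):
    """Raised when the bulkhead has no free permit."""


class CircuitOpen(Exception):
    """Raised when the breaker is refusing calls."""


class SemaphoreBulkhead:
    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject at once instead of queueing the caller.
        if not self._sem.acquire(blocking=False):
            raise BulkheadFull("no permits available")
        try:
            # The call runs on the caller's own thread.
            return fn(*args, **kwargs)
        finally:
            self._sem.release()


class CircuitBreaker:
    """Simplified: opens after N consecutive failures and never resets itself."""

    def __init__(self, failure_threshold):
        self._threshold = failure_threshold
        self._failures = 0
        self._lock = threading.Lock()

    def call(self, fn, *args, **kwargs):
        with self._lock:
            if self._failures >= self._threshold:
                raise CircuitOpen("too many consecutive failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            with self._lock:
                self._failures += 1
            raise
        with self._lock:
            self._failures = 0  # any success clears the failure streak
        return result


def guarded_call(bulkhead, breaker, fn, *args, **kwargs):
    # Bulkhead outside, breaker inside: a bulkhead rejection is not counted
    # as a downstream failure. This ordering is one possible design choice.
    return bulkhead.call(breaker.call, fn, *args, **kwargs)
```

For example, `guarded_call(SemaphoreBulkhead(30), CircuitBreaker(5), fetch)` would allow at most 30 concurrent calls to a hypothetical `fetch` function and stop calling it after 5 consecutive failures.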
References #
- Release It! — Michael Nygard