Bulkhead Isolation

🟠 P1 — isolating failures so one component’s problems don’t sink the ship

Problem #

If all requests share a single thread pool or connection pool, one slow downstream dependency can tie up every thread or connection, starving requests to healthy downstreams.

Mechanism #

Named after ship bulkheads: compartmentalise resources so a breach in one compartment doesn’t flood the entire ship.

Without bulkhead:
  [Shared thread pool: 100 threads]
  → Slow Service A consumes 95 threads
  → Service B, C starved (5 threads for everything else)

With bulkhead:
  [Pool for Service A: 30 threads]  ← A can only consume 30
  [Pool for Service B: 30 threads]
  [Pool for Service C: 30 threads]
  [General pool: 10 threads]
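The partitioning in the diagram can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is a minimal illustration, not a production design: the pool names, the `submit` helper, and the `slow_call`/`fast_call` stand-ins are hypothetical, and the pool sizes simply mirror the numbers above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One bounded pool per downstream; sizes mirror the diagram (assumed values).
POOLS = {
    "service_a": ThreadPoolExecutor(max_workers=30, thread_name_prefix="svc-a"),
    "service_b": ThreadPoolExecutor(max_workers=30, thread_name_prefix="svc-b"),
    "service_c": ThreadPoolExecutor(max_workers=30, thread_name_prefix="svc-c"),
    "general": ThreadPoolExecutor(max_workers=10, thread_name_prefix="general"),
}


def submit(downstream, fn, *args):
    """Hypothetical helper: run fn on the pool reserved for `downstream`."""
    pool = POOLS.get(downstream, POOLS["general"])
    return pool.submit(fn, *args)


def slow_call():
    """Stand-in for a degraded Service A."""
    time.sleep(0.3)
    return "a"


def fast_call():
    """Stand-in for a healthy Service B."""
    return "b"


def demo():
    # Flood Service A's compartment with more work than it has threads.
    a_futures = [submit("service_a", slow_call) for _ in range(90)]

    # Service B has its own threads, so it is not queued behind A's backlog.
    start = time.monotonic()
    result = submit("service_b", fast_call).result(timeout=1)
    elapsed = time.monotonic() - start

    # Drop the A calls that have not started yet.
    for future in a_futures:
        future.cancel()
    return result, elapsed
```

In this sketch, `demo()` should return Service B's result promptly even while Service A's pool is saturated. Note that `ThreadPoolExecutor` queues excess work without limit, so a real bulkhead would usually also cap or reject the backlog for each pool.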

Isolation Strategies #

  • Thread pool isolation: Separate thread pools per downstream. Strong isolation, higher resource overhead.
  • Semaphore isolation: Count-based limiting of concurrent calls without dedicated threads. Lighter, but the call runs on the caller’s thread, which stays blocked for as long as the downstream is slow.
  • Process isolation: Separate processes/containers per workload. Strongest isolation, highest overhead.
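As a rough sketch of the semaphore strategy, the class below caps concurrent calls to one downstream using `threading.BoundedSemaphore` and rejects callers once the compartment is full. The names `SemaphoreBulkhead`, `BulkheadFullError`, and `max_wait_seconds` are assumptions made for illustration; they do not come from any particular library.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(Exception):
    """Raised when the compartment has no free permits (hypothetical name)."""


class SemaphoreBulkhead:
    def __init__(self, max_concurrent, max_wait_seconds=0.0):
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._max_wait = max_wait_seconds

    @contextmanager
    def slot(self):
        # Either fail fast or wait a bounded time for a permit.
        if self._max_wait > 0:
            acquired = self._permits.acquire(timeout=self._max_wait)
        else:
            acquired = self._permits.acquire(blocking=False)
        if not acquired:
            raise BulkheadFullError("no free permits for this downstream")
        try:
            # The protected call runs here, on the caller's own thread.
            yield
        finally:
            self._permits.release()
```

A caller would wrap each downstream call in `with bulkhead.slot(): ...` and treat `BulkheadFullError` as a signal to shed load or fall back. Because the work still runs on the caller's thread, the downstream call itself needs its own timeout; the semaphore only bounds how many callers can be stuck at once.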

Instinct #

Bulkhead isolation is most valuable for your most critical paths.

If payment processing and analytics reporting share resources, a slow analytics query should never be able to degrade payment latency.

Pair with circuit breakers: bulkheads cap how many resources a struggling dependency can hold, while circuit breakers stop sending calls to a dependency that is already known to be failing.
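One way the pairing might look in code is sketched below, assuming a deliberately simplified breaker that opens after a number of consecutive failures and allows calls again after a cooldown. `GuardedCall`, its parameters, and both exception names are hypothetical, and the half-open handling is cruder than what resilience libraries typically provide.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Raised while the breaker is open (hypothetical name)."""


class BulkheadFullError(Exception):
    """Raised when the concurrency cap is reached (hypothetical name)."""


class GuardedCall:
    def __init__(self, max_concurrent=10, failure_threshold=5,
                 reset_after=30.0, clock=time.monotonic):
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        # Circuit breaker: refuse calls while the dependency is known to be bad.
        with self._lock:
            if self._opened_at is not None:
                if self._clock() - self._opened_at < self._reset_after:
                    raise CircuitOpenError("dependency marked as failing")
                # Cooldown elapsed: allow calls again; one failure re-opens.
                self._opened_at = None
                self._failures = self._threshold - 1

        # Bulkhead: cap how many callers can be inside the dependency at once.
        if not self._permits.acquire(blocking=False):
            raise BulkheadFullError("concurrency cap reached")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()
            raise
        else:
            with self._lock:
                self._failures = 0
            return result
        finally:
            self._permits.release()
```

The ordering is a design choice: checking the breaker first means calls to a known-bad dependency are rejected without consuming a bulkhead permit at all.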
