Sync vs Async Communication
🔴 P0 — the most fundamental inter-service communication decision
Problem #
When Service A needs something from Service B, should it wait for a response (synchronous) or send a message and move on (asynchronous)? This decision affects latency, coupling, reliability, and failure propagation.
Mechanism #
| Dimension | Synchronous (RPC) | Asynchronous (Messaging) |
|---|---|---|
| Coupling | Temporal: A waits for B | Decoupled: A doesn’t wait |
| Latency | Caller waits for the sum of all hops in the chain | Caller waits only for the enqueue (fire-and-forget); end-to-end processing is eventual |
| Failure impact | Cascading (if B is down, A fails) | Buffered (queue absorbs B’s downtime) |
| Consistency | Easier to reason about | Eventually consistent |
| Debugging | Linear call trace | Distributed, harder to trace |
| Throughput | Limited by slowest service | Burst-absorbing (queue as buffer) |
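The "Failure impact" row can be made concrete with a minimal sketch. Everything here is hypothetical: `ServiceB`, `call_sync`, `send_async`, and `drain` are illustrative names, and an in-process `queue.Queue` stands in for a real message broker.

```python
import queue


class ServiceDown(Exception):
    """Raised by the hypothetical Service B while it is unavailable."""


class ServiceB:
    def __init__(self):
        self.up = True
        self.processed = []

    def handle(self, item):
        if not self.up:
            raise ServiceDown("B is unavailable")
        self.processed.append(item)
        return f"done:{item}"


def call_sync(b, item):
    # Synchronous: A blocks on B, so B's failure becomes A's failure.
    return b.handle(item)


def send_async(q, item):
    # Asynchronous: A only enqueues; B's health does not matter right now.
    q.put(item)
    return "accepted"


def drain(q, b):
    # B's consumer loop: processes the backlog once B is healthy again.
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        try:
            b.handle(item)
        except ServiceDown:
            q.put(item)  # keep the message for a later retry (goes to the tail)
            return


b = ServiceB()
q = queue.Queue()

b.up = False  # simulate an outage of B
try:
    call_sync(b, "order-1")
    sync_result = "ok"
except ServiceDown:
    sync_result = "failed"  # the outage cascades to the caller

async_result = send_async(q, "order-1")  # the caller is unaffected

b.up = True  # B recovers
drain(q, b)  # the buffered message is processed late, not lost
```

During the outage the synchronous caller fails, while the asynchronous caller gets its acknowledgement and the work completes after recovery. The cost is that the caller no longer knows when, or whether, processing succeeded.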
When to Use Each #
Synchronous:
- User-facing request-response (user expects immediate result)
- Read-heavy paths where latency matters
- Simple service graphs (A → B, not A → B → C → D)
Asynchronous:
- Write-heavy or batch processing workloads
- When downstream can be temporarily unavailable
- Long chains of processing steps
- Workload smoothing (absorb traffic spikes)
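Workload smoothing can be illustrated with a small deterministic simulation. This is a sketch under assumed numbers: the `simulate` function, the arrival pattern, and the per-tick capacity are all made up for illustration.

```python
from collections import deque


def simulate(arrivals, capacity_per_tick):
    """Return the queue depth after each tick for a fixed-rate consumer."""
    backlog = deque()
    depths = []
    for tick, n in enumerate(arrivals):
        # The burst lands in the queue instead of hitting the consumer directly.
        backlog.extend((tick, i) for i in range(n))
        # The consumer works at its own steady pace.
        for _ in range(min(capacity_per_tick, len(backlog))):
            backlog.popleft()
        depths.append(len(backlog))
    return depths


# A spike of 50 requests in the first tick, then silence.
arrivals = [50, 0, 0, 0, 0, 0]
depths = simulate(arrivals, capacity_per_tick=10)
print(depths)  # [40, 30, 20, 10, 0, 0]
```

The consumer never sees more than 10 items per tick even though 50 arrived at once; the queue trades that protection for added processing delay on the items at the back.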
Instinct #
Default to sync for reads, async for writes. A user querying their balance should get a synchronous response. A user initiating a payment can get an immediate acknowledgement, with the actual processing happening asynchronously. The boundary between sync and async is often the most important architectural decision in a system design interview.
Framing #
Here’s an example of how this judgement can be framed and communicated:
The payment API returns `202 Accepted` with a charge ID synchronously, so the user has their confirmation immediately. The actual ledger posting, fraud check, and settlement happen asynchronously via events. If any downstream step fails, we compensate via the saga. This gives us sub-200ms API response times while keeping the complex processing decoupled and retryable.
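The framing above can be sketched in code. This is a minimal illustration, not a real payment flow: `create_payment`, `run_saga`, `worker`, the step functions, and the 100,000-cent settlement limit are all hypothetical, an in-process `queue.Queue` stands in for the event bus, and a dict stands in for the database.

```python
import queue
import uuid

events = queue.Queue()  # stand-in for a message broker
charges = {}            # charge_id -> status (stand-in for a database)


def create_payment(amount_cents):
    """API handler: record intent, enqueue the work, return immediately."""
    charge_id = str(uuid.uuid4())
    charges[charge_id] = "pending"
    events.put({"charge_id": charge_id, "amount_cents": amount_cents})
    return 202, {"charge_id": charge_id, "status": "pending"}


def run_saga(event, steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    done = []
    try:
        for action, compensate in steps:
            action(event)
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate(event)
        charges[event["charge_id"]] = "failed"
        return
    charges[event["charge_id"]] = "settled"


def worker(steps):
    """Drain the queue; in a real system this would run in a separate process."""
    while not events.empty():
        run_saga(events.get(), steps)


log = []

def fraud_check(e): log.append("fraud_check")
def post_ledger(e): log.append("post_ledger")
def reverse_ledger(e): log.append("reverse_ledger")
def noop(e): pass

def settle(e):
    if e["amount_cents"] > 100_000:  # assumed limit, for illustration only
        raise RuntimeError("settlement rejected")
    log.append("settle")


steps = [(fraud_check, noop), (post_ledger, reverse_ledger), (settle, noop)]

status, body = create_payment(250_000)  # returns before any processing happens
worker(steps)                           # settlement fails, ledger entry is reversed
```

The caller sees `202` and a charge ID regardless of what happens downstream; the final state (`settled` or `failed`) has to be read later from the charge record or delivered via a notification.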
MISCONCEPTION: “Always push processing behind a queue.”
The nature of the job matters. Short-running jobs that complete in <100ms are better handled synchronously: doing so keeps the architecture simpler, provides clearer back-pressure, and gives better UX. Reserve async processing (job queues, worker pools) for genuinely long-running tasks (video encoding, report generation, batch processing).
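One way to express this rule is a dispatcher that routes by expected duration. This is a rough sketch: `handle`, `SYNC_BUDGET_MS`, and the caller-supplied `expected_ms` estimate are assumptions made for illustration, and a thread pool stands in for a real job queue.

```python
from concurrent.futures import ThreadPoolExecutor

SYNC_BUDGET_MS = 100  # threshold taken from the text; tune per system
pool = ThreadPoolExecutor(max_workers=2)


def handle(job, expected_ms):
    """Run short jobs inline; hand long-running ones to the worker pool."""
    if expected_ms < SYNC_BUDGET_MS:
        # Short job: the caller gets the result in the same response.
        return {"status": 200, "result": job()}
    # Long job: acknowledge now, compute in the background.
    future = pool.submit(job)
    return {"status": 202, "future": future}


fast = handle(lambda: 2 + 2, expected_ms=5)
slow = handle(lambda: sum(range(1_000_000)), expected_ms=5_000)

# A real client would poll or be notified; blocking here is only for the demo.
slow_result = slow["future"].result()
pool.shutdown()
```

The short job needs no queue, worker, or status endpoint at all, which is the simplification the paragraph above argues for.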
References #
- Enterprise Integration Patterns — Hohpe & Woolf; the canonical messaging patterns reference
DDIA 2e Reference #
- Chapter 4: Message-passing dataflow
- Chapter 11: Stream processing (async paradigm)