Rate Limiting & Backpressure

🔴 P0 — definitely Stripe territory; protecting services from overload

Problem

Without rate limiting, a single misbehaving client can overwhelm your service, degrading experience for everyone. Without backpressure, upstream services can flood downstream services faster than they can process.

Rate Limiting Algorithms

| Algorithm | Mechanism | Pros | Cons |
| --- | --- | --- | --- |
| Token Bucket | Tokens added at fixed rate, consumed per request | Allows bursts up to bucket size | Per-client storage needed |
| Leaky Bucket | Requests queue and drain at fixed rate | Smooth output rate | No burst tolerance |
| Fixed Window | Count requests per time window | Simple | Boundary burst problem |
| Sliding Window Log | Track exact timestamps of each request | Precise | Memory-intensive |
| Sliding Window Counter | Hybrid: weighted current + previous window | Good balance | Approximate |
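
The "weighted current + previous window" idea is easier to see in code. The following is a minimal single-process sketch, not a production limiter: the class name, the `allow` method, and the injectable `clock` parameter are all assumptions made for illustration. It estimates the request count in the sliding window as the current fixed window's count plus the previous window's count scaled by how much of that window still overlaps.

```python
import time


class SlidingWindowCounter:
    """Approximate limiter: current window count + weighted previous window count."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock            # injectable so the sketch can be tested
        self.current_index = None     # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        index = int(now // self.window)
        if self.current_index is None:
            self.current_index = index
        elif index != self.current_index:
            # Roll forward. If more than one whole window has passed,
            # the previous window contributes nothing.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index

        # Fraction of the current fixed window that has already elapsed.
        elapsed_fraction = (now % self.window) / self.window
        estimated = self.previous_count * (1.0 - elapsed_fraction) + self.current_count
        if estimated + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

With a limit of 4 per 10 seconds, four requests at t=0 followed by requests at t=15 see the previous window weighted at 0.5, so only two more are admitted. That weighting is what removes the fixed-window boundary burst, at the cost of assuming requests were spread evenly across the previous window.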

Backpressure Mechanisms

  • Queue depth limits: Reject new work when queue exceeds threshold
  • Load shedding: Drop lower-priority requests when overloaded
  • Circuit breaker: Stop calling a downstream that’s failing (see also: Circuit Breaker)
  • Adaptive concurrency: Dynamically adjust the number of concurrent requests based on latency signals
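
Three of these mechanisms can be sketched in a few lines (the circuit breaker is left to its own page). This is an illustrative sketch under stated assumptions: `BoundedWorkQueue`, `Overloaded`, `AIMDConcurrencyLimit`, the two-level priority flag, and the halve-on-slow / add-one-on-fast policy are all hypothetical choices, not a reference implementation.

```python
from collections import deque


class Overloaded(Exception):
    """Raised when work is rejected so the caller can back off or retry later."""


class BoundedWorkQueue:
    """Queue depth limit plus simple load shedding by priority."""

    def __init__(self, max_depth, shed_threshold):
        self.max_depth = max_depth            # hard cap: reject everything at this depth
        self.shed_threshold = shed_threshold  # soft cap: reject low-priority work from here
        self.items = deque()

    def submit(self, item, high_priority=False):
        depth = len(self.items)
        if depth >= self.max_depth:
            raise Overloaded("queue full")
        if depth >= self.shed_threshold and not high_priority:
            raise Overloaded("shedding low-priority work")
        self.items.append(item)

    def take(self):
        return self.items.popleft() if self.items else None


class AIMDConcurrencyLimit:
    """Adaptive concurrency: additive increase, multiplicative decrease on latency."""

    def __init__(self, initial, min_limit, max_limit, target_latency):
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target_latency = target_latency  # seconds; assumed to be tuned per service

    def on_response(self, latency_seconds):
        if latency_seconds > self.target_latency:
            # Slow response: halve the allowed concurrency (assumed factor).
            self.limit = max(self.min_limit, self.limit // 2)
        else:
            # Healthy response: probe for more capacity one slot at a time.
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

Rejecting with an explicit error rather than blocking is the design choice that matters here: it pushes the "slow down" signal back to the caller instead of letting latency grow silently inside an unbounded queue.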

Instinct

Token bucket is the default for API rate limiting (Stripe uses it). It allows reasonable bursts while maintaining an average rate. For distributed rate limiting (multiple gateway instances), use either a centralised counter (e.g. in Redis) or an approximate local-plus-global scheme, in which each instance tracks counts locally and periodically syncs to a global counter.
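
As a concrete illustration of "bursts up to the bucket size, average rate over time", here is a minimal in-process token bucket sketch. It is not Stripe's implementation; the names `TokenBucket` and `PerClientLimiter`, the lazy refill on each call, and the injectable `clock` are assumptions for the example.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holds at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock                # injectable so the sketch can be tested
        self.tokens = float(capacity)     # start full: an idle client may burst
        self.last_refill = clock()

    def allow(self, cost=1.0):
        # Refill lazily based on elapsed time instead of running a timer.
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client id; this is the per-client storage cost."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A client with `rate=2, capacity=5` can send five requests at once, then roughly two per second after that. A real deployment would also need to evict idle buckets, which this sketch omits.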

Backpressure is the complementary, more mature mechanism: rate limiting says “no” to clients, while backpressure says “slow down” to upstream services. Both are needed.

  • EXP: I’ve implemented per-endpoint rate limiting at the API gateway layer. The key operational insight: rate limiting without good observability is useless. You need to see which clients are hitting limits and why; otherwise you’re flying blind on whether limits are too aggressive or too lax.
    • Also: dead-letter queues (DLQs) are useful when dealing with wildly unpredictable API providers.

Distributed Rate Limiting

| Strategy | Mechanism | Trade-off |
| --- | --- | --- |
| Centralised (Redis) | All instances check shared counter | Accurate but Redis is SPOF |
| Local + sync | Per-instance limiting, periodic sync | Approximate but resilient |
| Sliding window (Redis) | MULTI/EXEC with sorted sets | Precise, single round-trip |

Instinct: At Stripe’s scale, local rate limiting with periodic global sync is the pragmatic choice. Perfect accuracy isn’t worth the availability risk of a centralised counter.
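
To make the accuracy trade-off of local limiting with periodic sync concrete, here is a rough sketch. Everything in it is an assumption for illustration: `GlobalCounter` is an in-memory stand-in for a shared store such as Redis (its `add_and_get` method is hypothetical, not a Redis API), syncing is triggered every `sync_every` local requests rather than by a timer, and window reset is omitted.

```python
class GlobalCounter:
    """In-memory stand-in for a shared store (hypothetical interface)."""

    def __init__(self):
        self.totals = {}

    def add_and_get(self, key, delta):
        self.totals[key] = self.totals.get(key, 0) + delta
        return self.totals[key]


class LocalSyncLimiter:
    """Per-instance limiter that pushes local counts to a global counter in batches."""

    def __init__(self, store, limit, sync_every):
        self.store = store
        self.limit = limit            # global limit per client (single window)
        self.sync_every = sync_every  # local requests between syncs (assumed trigger)
        self.pending = {}             # local counts not yet pushed
        self.known_global = {}        # global total as of this instance's last sync

    def allow(self, client_id):
        pending = self.pending.get(client_id, 0)
        seen = self.known_global.get(client_id, 0)
        # Decide using only local knowledge: no network call on the hot path.
        if seen + pending >= self.limit:
            return False
        pending += 1
        self.pending[client_id] = pending
        if pending >= self.sync_every:
            self.sync(client_id)
        return True

    def sync(self, client_id):
        # Push the local delta and learn what the other instances have counted.
        delta = self.pending.pop(client_id, 0)
        self.known_global[client_id] = self.store.add_and_get(client_id, delta)
```

With two instances, a limit of 10, and `sync_every=3`, the instances can jointly admit a few requests beyond 10 before each learns of the other's traffic. The overshoot is bounded roughly by the number of instances times the sync batch size, and if the shared store is unavailable each instance can keep limiting on its stale view, which is the resilience being bought.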

References