Rate Limiting & Backpressure
🔴 P0 — definitely Stripe territory; protecting services from overload
Problem
Without rate limiting, a single misbehaving client can overwhelm your service and degrade the experience for every other client. Without backpressure, upstream services can send work to downstream services faster than those services can process it.
Rate Limiting Algorithms
| Algorithm | Mechanism | Pros | Cons |
|---|---|---|---|
| Token Bucket | Tokens added at fixed rate, consumed per request | Allows bursts up to bucket size | Per-client storage needed |
| Leaky Bucket | Requests queue and drain at fixed rate | Smooth output rate | No burst tolerance |
| Fixed Window | Count requests per time window | Simple | Boundary burst problem: up to 2x the limit can pass around a window boundary |
| Sliding Window Log | Track exact timestamps of each request | Precise | Memory-intensive |
| Sliding Window Counter | Hybrid: weighted current + previous window | Good balance | Approximate |
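The "weighted current + previous window" idea in the last row is easier to see in code. Below is a minimal single-process sketch, not a production implementation. The class name `SlidingWindowCounter`, its parameters, and the injectable `clock` are assumptions made for illustration.

```python
import time


class SlidingWindowCounter:
    """Approximate sliding-window limiter (illustrative sketch, one client)."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behaviour can be tested
        self.window_index = int(clock() // window_s)
        self.current = 0   # requests admitted in the current fixed window
        self.previous = 0  # requests admitted in the window before it

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window_s)
        if index != self.window_index:
            # Roll forward. If more than one window elapsed, the old count is stale.
            self.previous = self.current if index == self.window_index + 1 else 0
            self.current = 0
            self.window_index = index
        # Fraction of the previous window that the sliding window still covers.
        overlap = 1.0 - (now % self.window_s) / self.window_s
        estimated = self.previous * overlap + self.current
        if estimated >= self.limit:
            return False
        self.current += 1
        return True
```

The estimate assumes requests in the previous window were evenly spread, which is where the "Approximate" in the Cons column comes from. In exchange, the state is two integers per client rather than one timestamp per request.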
Backpressure Mechanisms
- Queue depth limits: Reject new work when queue exceeds threshold
- Load shedding: Drop lower-priority requests when overloaded
- Circuit breaker: Stop calling a downstream that’s failing (see also: Circuit Breaker)
- Adaptive concurrency: Dynamically adjust the number of concurrent requests based on latency signals
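The first two mechanisms above (queue depth limits and load shedding) can be combined in one small admission check. This is a sketch under assumptions: `BoundedWorkQueue`, `max_depth`, and `shed_depth` are hypothetical names, and real systems usually have more than two priority levels.

```python
import queue


class BoundedWorkQueue:
    """Bounded queue that sheds low-priority work first (illustrative sketch)."""

    def __init__(self, max_depth: int, shed_depth: int):
        # max_depth: hard cap on queued items.
        # shed_depth: soft threshold past which low-priority work is dropped.
        self._q = queue.Queue(maxsize=max_depth)
        self._shed_depth = shed_depth

    def submit(self, item, high_priority: bool = False) -> bool:
        """Return True if the item was queued, False if it was rejected."""
        # Load shedding: past the soft threshold only high-priority work gets in.
        # qsize() is approximate under concurrency, which is fine for a soft limit.
        if not high_priority and self._q.qsize() >= self._shed_depth:
            return False
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            # Queue depth limit: reject instead of buffering without bound.
            return False

    def take(self):
        """Remove and return the next item (raises queue.Empty if none)."""
        return self._q.get_nowait()
```

A `False` return is the backpressure signal: the caller can translate it into an HTTP 429 or 503, or propagate it further upstream, rather than letting latency grow silently.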
Instinct
Token bucket is the default for API rate limiting (Stripe uses it): it allows reasonable bursts while maintaining an average rate. For distributed rate limiting across multiple gateway instances, either use a centralised counter (Redis) or approximate with local plus global counting, where each instance tracks requests locally and periodically syncs to a global counter.
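For reference, a token bucket fits in a few lines. This is a minimal single-process sketch, not Stripe's implementation. `TokenBucket`, `rate`, `capacity`, and the injectable `clock` are names chosen here for illustration.

```python
import time


class TokenBucket:
    """Token bucket for one client (illustrative sketch, not thread-safe)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second (long-run average rate)
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = capacity    # start full so an idle client may burst
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill lazily based on elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Per-client limiting is then a dictionary from client ID to `TokenBucket`, which is the per-client storage cost noted in the algorithm table.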
Backpressure is the mature approach: rate limiting says “no” to clients; backpressure says “slow down” to upstream services. Both are needed.
- EXP: I’ve implemented per-endpoint rate limiting at the API gateway layer. The key operational insight: rate limiting without good observability is useless — you need to see which clients are hitting limits and why, else you’re flying blind on whether limits are too aggressive or too lax.
- Also relevant: dead-letter queues (DLQs) for wildly unpredictable API providers, so that failed calls can be parked and retried later.
Distributed Rate Limiting
| Strategy | Mechanism | Trade-off |
|---|---|---|
| Centralised (Redis) | All instances check shared counter | Accurate, but Redis becomes a single point of failure (SPOF) |
| Local + sync | Per-instance limiting, periodic sync | Approximate but resilient |
| Sliding window (Redis) | MULTI/EXEC with sorted sets | Precise, single round-trip; memory grows with request volume |
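To make the sorted-set row concrete, here is an in-memory sketch of the sliding window log it relies on. It deliberately does not talk to Redis: a `deque` stands in for the sorted set, and the comments name the Redis commands each step roughly corresponds to. `SlidingWindowLog` and its parameters are hypothetical names.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Exact sliding-window limiter for one client (in-memory sketch)."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.log = deque()  # timestamps of admitted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        cutoff = now - self.window_s
        # Evict expired entries (roughly ZREMRANGEBYSCORE key -inf cutoff).
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        # Count what remains in the window (roughly ZCARD key).
        if len(self.log) >= self.limit:
            return False
        # Record this request (roughly ZADD key now member).
        self.log.append(now)
        return True
```

In Redis these three steps would be wrapped in MULTI/EXEC so that concurrent gateway instances observe a consistent count. The sketch stores one timestamp per admitted request, which is the memory cost of the precision.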
Instinct: At Stripe’s scale, local rate limiting with periodic global sync is the pragmatic choice. Perfect accuracy isn’t worth the availability risk of a centralised counter.
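A rough sketch of the local-plus-sync strategy shows where the approximation comes from. Everything here is an assumption for illustration: `GlobalCounter` is an in-memory stand-in for a shared store such as Redis, `LocalSyncLimiter` is a hypothetical name, and window resets are omitted to keep the example short.

```python
import time


class GlobalCounter:
    """In-memory stand-in for a shared counter store (hypothetical)."""

    def __init__(self):
        self.counts: dict[str, int] = {}

    def incr_by(self, key: str, amount: int) -> int:
        # Add to the shared total and return the new value.
        self.counts[key] = self.counts.get(key, 0) + amount
        return self.counts[key]


class LocalSyncLimiter:
    """One gateway instance: decides locally, reports to the store periodically."""

    def __init__(self, store, key: str, limit: int,
                 sync_every_s: float = 1.0, clock=time.monotonic):
        self.store = store
        self.key = key
        self.limit = limit
        self.sync_every_s = sync_every_s
        self.clock = clock
        self.pending = 0      # admitted locally, not yet reported
        self.global_seen = 0  # global total as of the last sync
        self.last_sync = clock()

    def _sync(self) -> None:
        # Push local admissions and read back the total across all instances.
        self.global_seen = self.store.incr_by(self.key, self.pending)
        self.pending = 0
        self.last_sync = self.clock()

    def allow(self) -> bool:
        if self.clock() - self.last_sync >= self.sync_every_s:
            self._sync()
        # Decision uses a possibly stale global view plus local knowledge.
        if self.global_seen + self.pending >= self.limit:
            return False
        self.pending += 1
        return True
```

Between syncs, each instance can admit requests the others do not know about, so the global limit can be overshot by roughly what the other instances admit in one sync interval. If the store is unreachable, an instance could keep limiting on its local view, which is the resilience side of the trade-off.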
References
- Scaling your API with Rate Limiters — Stripe Engineering; the definitive blog
- Counting Things at Cloudflare