Rate Limiting & Backpressure

🔴 P0 — definitely Stripe territory; protecting services from overload

Problem

Without rate limiting, a single misbehaving client can overwhelm your service, degrading the experience for everyone. Without backpressure, upstream services can flood downstream services with work faster than they can process it.

Rate Limiting Algorithms

Algorithm	Mechanism	Pros	Cons
Token Bucket	Tokens added at fixed rate, consumed per request	Allows bursts up to bucket size	Per-client storage needed
Leaky Bucket	Requests queue and drain at fixed rate	Smooth output rate	No burst tolerance: bursts wait in the queue or are dropped when it is full
Fixed Window	Count requests per time window	Simple	Boundary burst problem
Sliding Window Log	Track exact timestamps of each request	Precise	Memory-intensive
Sliding Window Counter	Hybrid: weighted current + previous window	Good balance	Approximate
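The "weighted current + previous window" idea in the last row can be sketched as below. This is a minimal in-memory illustration, not a production limiter: the class name `SlidingWindowCounter`, its parameters, and the injectable `clock` are assumptions made for the example, and old window keys are never expired here.

```python
import time
from collections import defaultdict


class SlidingWindowCounter:
    """Approximate sliding-window limiter built from two fixed-window counters."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        # (client_id, window_index) -> request count.
        # Assumption: a real store would expire old windows; this sketch does not.
        self.counts = defaultdict(int)

    def allow(self, client_id):
        now = self.clock()
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        previous = self.counts.get((client_id, index - 1), 0)
        current = self.counts.get((client_id, index), 0)
        # Weight the previous window by the share of it that still
        # overlaps the sliding window ending now.
        estimate = previous * (1 - elapsed_fraction) + current
        if estimate >= self.limit:
            return False
        self.counts[(client_id, index)] += 1
        return True
```

Unlike a plain fixed window, a client that used its full quota at the end of one window cannot immediately spend a second full quota at the start of the next, because the previous count still carries most of its weight.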

Backpressure Mechanisms

Queue depth limits: Reject new work when queue exceeds threshold
Load shedding: Drop lower-priority requests when overloaded
Circuit breaker: Stop calling a downstream that’s failing (see also: Circuit Breaker)
Adaptive concurrency: Dynamically adjust the number of concurrent requests based on latency signals
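The first two mechanisms (queue depth limits and load shedding) can be combined in one small sketch. The names `BoundedWorkQueue`, `Overloaded`, `max_depth`, and `shed_depth` are hypothetical, and the code is single-threaded for brevity; a real service would need locking or an async-safe queue.

```python
from collections import deque


class Overloaded(Exception):
    """Raised when work is rejected, signalling the caller to back off."""


class BoundedWorkQueue:
    def __init__(self, max_depth, shed_depth):
        if shed_depth > max_depth:
            raise ValueError("shed_depth must not exceed max_depth")
        self.max_depth = max_depth    # hard limit: reject everything beyond this
        self.shed_depth = shed_depth  # soft limit: start dropping low-priority work
        self.items = deque()

    def submit(self, item, high_priority=False):
        depth = len(self.items)
        if depth >= self.max_depth:
            # Queue depth limit: refuse new work instead of queueing without bound.
            raise Overloaded("queue full")
        if depth >= self.shed_depth and not high_priority:
            # Load shedding: keep the remaining headroom for high-priority work.
            raise Overloaded("shedding low-priority work")
        self.items.append(item)

    def take(self):
        """Return the oldest queued item, or None if the queue is empty."""
        return self.items.popleft() if self.items else None
```

Rejecting early is the point: the `Overloaded` error is the "slow down" signal that the upstream caller can translate into a retry with backoff or an HTTP 429/503.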

Instinct

Token bucket is the default for API rate limiting (Stripe uses it). It allows reasonable bursts while maintaining an average rate. For distributed rate limiting (multiple gateway instances), use either a centralised counter (Redis) or an approximate local-plus-global scheme, where each instance tracks counts locally and periodically syncs to a global counter.
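A minimal single-process token bucket, to make "bursts up to bucket size, average rate over time" concrete. The class and parameter names (`TokenBucket`, `rate`, `capacity`, `cost`) are illustrative assumptions, not taken from Stripe's implementation; a per-client limiter would keep one bucket per client key.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second (the average rate)
        self.capacity = capacity  # maximum tokens held (the burst size)
        self.clock = clock        # injectable so the behaviour can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill lazily from elapsed time instead of running a timer,
        # and never exceed the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=2` and `capacity=5`, a client can send 5 requests at once, then roughly 2 per second after that; the `cost` argument lets expensive endpoints consume more than one token.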

Backpressure is the more mature complement to rate limiting: rate limiting says “no” to clients, while backpressure says “slow down” to upstream services. Both are needed.

EXP: I’ve implemented per-endpoint rate limiting at the API gateway layer. The key operational insight: rate limiting without good observability is useless. You need to see which clients are hitting limits and why; otherwise you’re flying blind on whether limits are too aggressive or too lax.
- Also: dead-letter queues (DLQs) are useful when dealing with wildly unpredictable API providers.

Distributed Rate Limiting

Strategy	Mechanism	Trade-off
Centralised (Redis)	All instances check shared counter	Accurate, but Redis is a single point of failure (SPOF)
Local + sync	Per-instance limiting, periodic sync	Approximate but resilient
Sliding window (Redis)	`MULTI/EXEC` with sorted sets	Precise, single round-trip; memory grows with request volume
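The sorted-set row can be sketched as follows. It assumes `redis_client` behaves like a redis-py `redis.Redis` instance (`pipeline(transaction=True)` plus `zremrangebyscore`, `zadd`, `zcard`, `expire`); the function name `allow_request` and the `ratelimit:` key prefix are hypothetical.

```python
import time
import uuid


def allow_request(redis_client, client_id, limit, window_seconds, now=None):
    """Sliding window log on a sorted set, sent as one MULTI/EXEC batch."""
    now = time.time() if now is None else now
    key = f"ratelimit:{client_id}"  # hypothetical key naming scheme
    # Unique member so two requests with the same timestamp do not collide.
    member = f"{now}:{uuid.uuid4().hex}"

    pipe = redis_client.pipeline(transaction=True)       # queued as MULTI ... EXEC
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # drop entries outside the window
    pipe.zadd(key, {member: now})                        # record this request
    pipe.zcard(key)                                      # count requests still in the window
    pipe.expire(key, int(window_seconds) + 1)            # let idle keys age out
    _, _, count, _ = pipe.execute()
    # Note: rejected requests are also recorded, so a client that keeps
    # hammering the limit keeps extending its own lockout.
    return count <= limit
```

One sorted-set entry is stored per request in the window, which is where the memory cost in the table comes from.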

Instinct: At Stripe’s scale, local rate limiting with periodic global sync is the pragmatic choice. Perfect accuracy isn’t worth the availability risk of a centralised counter.
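A rough sketch of the local-plus-sync trade-off, assuming a dict-like shared counter stands in for the global store (in practice this might be a Redis `INCRBY`). The class `SyncedLimiter` and its fields are hypothetical, and window resets are omitted to keep the example short.

```python
import time


class SyncedLimiter:
    """Per-instance counter that periodically reconciles with a shared store."""

    def __init__(self, global_store, limit, sync_interval, clock=time.monotonic):
        self.store = global_store  # assumption: any dict-like shared counter
        self.limit = limit
        self.sync_interval = sync_interval
        self.clock = clock
        self.pending = {}          # client_id -> requests not yet pushed globally
        self.known_global = {}     # client_id -> global total seen at last sync
        self.last_sync = clock()

    def allow(self, client_id):
        if self.clock() - self.last_sync >= self.sync_interval:
            self.sync()
        # Decide from the last known global total plus local unsynced requests.
        used = self.known_global.get(client_id, 0) + self.pending.get(client_id, 0)
        if used >= self.limit:
            return False
        self.pending[client_id] = self.pending.get(client_id, 0) + 1
        return True

    def sync(self):
        # Push local deltas, then refresh the local view of the global totals.
        for client_id, delta in self.pending.items():
            self.store[client_id] = self.store.get(client_id, 0) + delta
        self.pending.clear()
        self.known_global = dict(self.store)
        self.last_sync = self.clock()
```

Between syncs, two instances can each admit requests against a stale total, so the global limit can be overshot by up to roughly one interval's worth of traffic per instance. In exchange, a request never blocks on the shared store, and an instance can keep limiting locally if the store is unreachable.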

References

Scaling your API with Rate Limiters — Stripe Engineering; the definitive blog
Counting Things at Cloudflare