- rtshkmr's digital garden/
- References/
- Architecture Design Basics/
- Pattern Taxonomy/
- Scaling & Performance/
- Consistent Hashing/
Consistent Hashing
Table of Contents
🔴 P0 — minimises data movement when nodes are added or removed
Problem #
With naive hash-based sharding (hash(key) % N), adding or removing a node changes N and remaps nearly every key. This causes massive data movement.
Mechanism #
Hash Ring (0 to 2³²):
Node A (pos 100)
/ \
/ \
Node D (pos 800) Node B (pos 300)
\ /
\ /
Node C (pos 600)
Key "user:123" hashes to position 250
→ Walk clockwise → lands on Node BWhen Node B is removed, only keys between Node A and Node B are remapped (to Node C). The rest of the ring is unaffected.
Virtual Nodes #
Each physical node gets multiple positions on the ring (virtual nodes). This ensures even load distribution even when physical nodes have different capacities.
We arrange servers and keys on a virtual ring, then hash each key and place it on the ring. That’s the key that belongs to the next server you encounter going clockwise.
This reduces movement of data to only ~10% of total data movement (compared to the ~90% data that needs to move for naive modulo hashing)
- on adding a new server, only the keys between that new server and the previous server need to move (remap).
- on removing a server: only it’s keys need to relocate to the next server on the ring, all else remain
Usage Examples #
This is a generic solution to the volume distribution problems, so it shows up in a few places:
- @ distributed caches
- for distributing keys across cache nodes
- e.g. Memcached, Redis Cluster
- @ distributed dbs
- used for sharding
- e.g. Cassandra, DynamoDB
- @ load-balancers
- for assigning requests to backend servers in a stable way when servers come and go
- e.g. used by CDNs
Key Trade-offs #
- Minimal disruption: Only K/N keys move on node addition/removal (K = total keys, N = nodes)
- Even distribution: Requires virtual nodes; without them, distribution is uneven
- Complexity: More complex than simple hash mod, but well-understood and widely implemented
Instinct #
Consistent hashing is the standard for any distributed cache or storage system. Redis Cluster, DynamoDB, Cassandra all use variants. In interviews, explain the hash ring, why virtual nodes matter, and the K/N redistribution property. Know that jump consistent hash is a simpler alternative when nodes are added at the end only.
Framing #
it should be alright to hand wave the details around this, useful to mention when we care about the elastic-scaling properties of the system we’re designing. This means cases when the system needs to add/remove cache nodes, or when db sharding should be done based on load then the consistent hashing mechanisms are relevant to frame because we need a practical way to achieve that elasticity without massive data-movements.
for cache distribution
we’ll use consistent hashing to distribute data across cache nodes
for sharding
we’ll use consistent hashing for the shard key
INTERVIEW: It’s fine to hand-wave the implementation details. What matters is demonstrating awareness of the mechanism and when to invoke it:
We’ll use consistent hashing to distribute cache keys across nodes, so adding or removing a node only redistributes ~K/N keys instead of remapping everything.
References #
- Consistent Hashing and Random Trees — Karger et al. (1997); original paper
- Maglev: A Fast and Reliable Software Network Load Balancer — Google (2016)
DDIA 2e Reference #
- Chapter 6: Partitioning — consistent hashing as a partitioning strategy