Chapter Five: Design Consistent Hashing

To achieve horizontal scaling, it is important to distribute requests/data efficiently and evenly across servers. Consistent hashing is a commonly used technique to achieve this goal.

The rehashing problem

Untitled

This approach works well when the size of the server pool is fixed, and the data distribution is even.

Untitled

When server 1 goes offline, most cache clients will connect to the wrong servers to fetch data. This causes a storm of cache misses. Consistent hashing is an effective technique to mitigate this problem.

Consistent hashing

Consistent hashing is a kind of hashing such that when a hash table is resized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys, and n is the number of slots.

Hash space and hash ring

Assume SHA-1 is used as the hash function f, and the output range of the hash function is: x0, x1, x2, x3, ..., xn. In cryptography, SHA-1’s hash space goes from 0 to 2^160 - 1.

Untitled

Using the same hash function f, we map servers based on server IP or name onto the ring.

Untitled

As shown in picture, the keys and values are all hashed onto the hash ring. To determine which server a key is stored on, we go clockwise from the key position on the ring until a server is found.

Using the logic described above, adding a new server will only require redistribution of a fraction of keys.

Untitled

Two issues in the basic approach

The basic steps are:

Map servers and keys on to the ring using a uniformly distributed hash function.
To find out which server a key is mapped to, go clockwise from the key position until the first server on the ring is found.

Two problems:

Impossible to keep the same size of partitions on the ring for all servers considering a server can be added or removed.
Possible to have a non-uniform key distribution on the ring.

A technique called virtual nodes or replicas is used to solve these problems.