Consistent Hashing – Learning About Distributed Databases :: Issue 002

One of the core tools in the belt of the distributed database is consistent hashing. In Riak this is especially true, as it stands at the core of a Riak Cluster. Hashing, using a hash function, is an algorithm that maps data to variable length to data that’s fixed. In other words, odd things like the name of things mapped to integers. Consistent hashing is a special kind of hashing that provides the pattern for mapping keys and all related functionality around a cluster ring in Riak.

Consistent hashing was originally devised by David Karger, a professor of computer science at MIT (Massachusetts Institute of Technology). He’s also known for Karger’s Algorithm, a Monte Carlo method that computes the minimum cut in a connected graph (graph theory related stuff). Along with these developments he’s been part of many other efforts and contributed to computer science in many ways.

Remapping, Mapping and Keeping Distributed (& Available)

One key property of a consistent hash is that it minimizes the number of keys that must be remapped. With a regular hash changes, the entire key hash must be remapped.

Consistent hashing is based around mapping each object to a point of a circle. The system maps each storage bucket to pseudo-randomly distributed points on the edge of this circle.

The system finds where to place the object based on the key on the edge of the circle. It then walks the circle falling into the first bucket it finds. This results in the buckets containing the resources between its point and the next bucket point.

When a bucket disappears for any reason, the pseudo randomly mapped objects will now get re-mapped to different buckets. When a bucket appears, such as becoming available again or being added, a similar process occurs.

The Basho Docs describe in brief that,

Consistent hashing is a technique used to limit the reshuffling of keys when a hash-table data structure is rebalanced (when slots are added or removed). Riak uses consistent hashing to organize its data storage and replication. Specifically, the vnodes in the Riak Ring responsible for storing each object are determined using the consistent hashing technique.

NOTES: This is not a single blog entry topic by any means. This is merely a cursory look at consistent hashing. This entry I aimed to provide a basic description and coverage of the actions around consistent hashing. For more information and to dive even deeper into consistent hashing I’ve included a few links that have extensive information on the topic: