Consistent Hashing

I wrote about consistent hashing in 2013 while working at Basho, where I had started a series called “Learning About Distributed Databases.” Today I’m kicking that series back off after a few years (ok, after 5 or so years!) with this post on consistent hashing.

As with Riak, which I wrote about in 2013, Cassandra remains one of the core active distributed database projects alive today that provides an effective and reliable consistent hash ring for a clustered distributed database system. A hash function is an algorithm that maps data of variable length to data of a fixed length. Consistent hashing is a kind of hashing that uses this pattern to map keys to particular nodes around the ring in Cassandra. One can think of it as a kind of Dewey Decimal Classification system in which the cluster nodes are the various bookshelves in the library.
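
To make that "variable length in, fixed length out" idea concrete, here’s a minimal sketch in Python. It uses the standard library’s MD5 as a stand-in hash (Cassandra actually uses Murmur3 for its signed 64-bit tokens), and the `token` function name is just for illustration:

```python
import hashlib

def token(key: str) -> int:
    """Map a variable-length key to a fixed-size signed 64-bit token.

    Illustrative only: MD5 stands in for Cassandra's Murmur3.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Take the first 8 bytes of the digest as a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big", signed=True)

for key in ["jim", "carol", "johnny", "suzy"]:
    print(key, token(key))
```

Whatever the key length, the token always lands in the same fixed 64-bit space, which is what lets the ring carve that space into per-node ranges.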

Ok, so maybe the Dewey Decimal system isn’t the best analogy. Does anybody even learn about that any more? If you don’t know what it is, please read up and support your local library.

Consistent hashing allows data distributed across a cluster to be reorganized minimally when nodes are added or removed. The data is divided into partitions based on a particular partition key. The partition key shouldn’t be confused with a primary key; it’s the part of the primary key that determines which node stores the data, as when a primary key is a composite key made up of a partition key plus additional clustering columns.
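
That "minimize reorganization" property is the whole point, and it’s easy to see with a toy ring. The sketch below is a rough illustration, assuming one token per node and MD5 standing in for Murmur3 (the `HashRing` class and names are hypothetical, not Cassandra’s implementation). With a naive `hash(key) % node_count` scheme nearly every key would change owners when a node joins; on the ring, only the keys falling on the new node’s arc move.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    """Toy consistent hash ring: one token per node, no virtual nodes."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node token at or past the key's
        # hash, wrapping around to the start of the ring if needed.
        i = bisect.bisect(self._tokens, _hash(key)) % len(self._ring)
        return self._ring[i][1]

keys = [f"user-{n}" for n in range(1000)]
before = HashRing(["node1", "node2", "node3", "node4"])
after = HashRing(["node1", "node2", "node3", "node4", "node5"])

# Only keys that now land on node5's arc change owners;
# every other key keeps its original node.
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Real systems, including Cassandra with vnodes, assign many tokens per node so the moved share stays close to 1/node_count rather than depending on where a single token happens to land.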

As an example, let’s take a look at sample data from the DataStax docs on consistent hashing. Suppose you have the following data:

name age car gender
jim 36 camaro M
carol 37 345s F
johnny 12 supra M
suzy 10 mustang F

The database assigns a hash value to each partition key:

Partition key Murmur3 hash value
jim -2245462676723223822
carol 7723358927203680754
johnny -6723372854036780875
suzy 1168604627387940318

Each node in the cluster is responsible for a range of data based on the hash value.

Hash values in a four node cluster

DataStax Enterprise places the data on each node according to the value of the partition key and the range that the node is responsible for. For example, in a four node cluster, the data in this example is distributed as follows:

Node Start range End range Partition key Hash value
1 -9223372036854775808 -4611686018427387904 johnny -6723372854036780875
2 -4611686018427387903 -1 jim -2245462676723223822
3 0 4611686018427387903 suzy 1168604627387940318
4 4611686018427387904 9223372036854775807 carol 7723358927203680754
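
The table above is easy to check with a few lines of code. This sketch hard-codes the four ranges from the table; a function like `owner` is purely illustrative, since a real cluster derives ranges from each node’s tokens:

```python
# Token ranges for a four node cluster with the Murmur3 partitioner,
# exactly as in the table above (an even split of signed 64-bit space).
RANGES = [
    (1, -9223372036854775808, -4611686018427387904),
    (2, -4611686018427387903, -1),
    (3, 0, 4611686018427387903),
    (4, 4611686018427387904, 9223372036854775807),
]

def owner(token: int) -> int:
    """Return the node number whose range contains the given token."""
    for node, start, end in RANGES:
        if start <= token <= end:
            return node
    raise ValueError("token outside the signed 64-bit range")

hashes = {
    "jim": -2245462676723223822,
    "carol": 7723358927203680754,
    "johnny": -6723372854036780875,
    "suzy": 1168604627387940318,
}
for name, h in hashes.items():
    print(name, "-> node", owner(h))
```

Running this places johnny on node 1, jim on node 2, suzy on node 3, and carol on node 4, matching the table.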

So there ya go, that’s consistent hashing and how it works in a distributed database like Apache Cassandra, the derived distributed database DataStax Enterprise, or the mostly defunct Riak (RIP). If you’d like to dig in further, I’ve also found Distributed Hash Tables interesting, along with a host of other articles that delve into coding up a consistent hash table, a respective ring, and the whole enchilada. Check out these articles for more information and details:

    • Simple Magic Consistent by Mathias Meyer @roidrage, CTO of Travis CI. Mathias’s post is well written and drives home some good points.
    • Consistent Hashing: Algorithmic Tradeoffs by Damien Gryski @dgryski. This post from Damien is pretty intense, and if you want code, he’s got code for ya.
    • How Ably Efficiently Implemented Consistent Hashing by Srushtika Neelakantam. Srushtika does a great job not only of describing what consistent hashing is but also has drawn up diagrams, charts, and more to visualize what is going on. But that isn’t all, she also wrote up some code to show nodes coming and going. A really great post.

For more distributed database things to know, subscribe to the blog (of course the ole’ RSS feed works great too) and follow @CompositeCode on Twitter for blog updates.

Distributed Database Things to Know Series

  1. Consistent Hashing (this post)
  2. Apache Cassandra Datacenter & Racks


Strata, Ninjas, Distributed Data Day, and Graph Day Trip Recap

This last week was a helluva set of trips, conferences to attend, topics to discuss, and projects to move forward on. In this post I’ll attempt to run through the gamut of events and the graph of things traversing from the conference nodes onward! (See what I did there? Yeah, transpiling that graph verbiage onto events and related efforts!)

Monday Flight(s)

Monday involved some flying around the country for me via United. It was supposed to be a singular flight, but hey, why not some adventures around the country for shits and giggles, right? Two TILs (Things I Learned) that I might have known already, but repetition reinforces one’s memory.

  1. If you think you’ve bought a nonstop ticket, be sure to verify that there isn’t a stopover halfway through the trip. If there are any delays or related changes, your plane might be taken away, you’ll get shuffled off to who knows what other flight, and then you end up spending the whole day flying around instead of the 6 hour flight you’re supposed to have.
  2. Twitter sentiment tends to be right: it’s good policy to avoid United. They schedule their planes, logistical positions, and crews in ways that quickly become problematic when there’s even a mere minor delay or two.

Tuesday Strata Day Zero (Train & Workshop Day)

Tuesday rolled in and Strata kicked off with a host of activities. I rolled in to scope out our booth but overall, Tuesday was a low-yield activity day. Eventually I met up with the team and we rolled out for an impromptu team dinner, drinks, and further discussions. We headed off to Ninja, which, if you haven’t been there, is a worthy adventure for those brave enough. I had enough fun that I felt I should relay this info and provide a link or three so you too could go check it out.

Wednesday Strata Day One

Day two of Strata kicked off and my day involved mostly discussions with speakers, meetings, a few analyst discussions, and going around to booths to check out which technology I needed to add to my “check it out soon” list. Here are a few of the things I noted that are now on the list.

I also worked with the video team and cut some video introductions for Strata and upcoming DataStax Developer Days Announcements. DataStax Developer Days are free events coming to a range of cities. Check them out here and sign up for whichever you’re up for attending. I’m looking forward to teaching those sessions and learning from attendees about their use cases and domains in which they’re working.

The cities you’ll find us coming to soon:

I wish I could come and teach in every city but I narrowed it down to Chicago and Dallas, so if you’re in those cities, I look forward to meeting you there! Otherwise you’ll get to meet other excellent members of the team!

This evening we went to Death Ave. The food was great, drinks solid, and the name was simply straight-up metal. Albeit a rather upper-crust dining experience, with no brutal metal in sight to be seen or heard. However, I’d definitely recommend the joint, especially for groups, as they have a whole room you can get if you’ve got enough people, which improves the experience over standard dining.

Thursday Strata Day Two

I scheduled my flights oddly for this day, which in turn left me without any time to spend at Strata. But those are the issues one runs into when things are booked back to back on opposite coasts of the country! Thus, this day involved me returning to Newark via Penn Station and flying back out to San Francisco. As some of you may know, I’m a bit of a train geek, so I took a New Jersey NEC (Northeast Corridor) train headed for Trenton out of Penn back to the airport.

The train, whether you’re taking the Acela, Metroliner, NJ Transit, or whatever is rolling along to Newark that day, is the way to go in my opinion. I’ve taken the bus, which is slightly cheaper, but meh, it’s an icky east coast intercity bus. The difference in price is a buck or three, nothing significant, and of course you can jump in an Uber, taxi, or other transport too. Even when they can make it faster, I tend to prefer the train. It’s just more comfortable, I don’t have to deal with a driver, and it’s more reliable. The turnpikes and roadways into NYC from Newark aren’t always 100%, and during rush hour don’t even expect to get to the city in a timely manner. To each their own, but for those that might not know, beware the taxi price of a $55 base plus tolls, which will often put your trip into Manhattan into the $99-or-above range. If you’re going to any other borough, you’d better go ahead and take out a loan from the bank.

The trip from Newark to San Francisco was aboard United on a Boeing 757. I kid you not, regardless of airline, if you get to fly on a 757 versus a 737 or an Airbus A319 or A320, it’s preferable, especially for flights in the 2+ hour range. There’s just a bit more space, the engines make less noise, the overall plane flies smoother, and the list of comforts is just a smidgen better all around. The 757 is the way to go for cross-continent flights!

In San Francisco I took the standard BART route straight into the city and over to the airbnb I was staying at in Potrero Hill. Right by Farley’s on Texas Street, if you know the area. I often pick the area because it’s cheap (relatively), super chill, with good food nearby, not really noisy, and super close to the venue for the Distributed Data Summit and Graph Day conferences.

The rest of Thursday included some pizza and a short bout of hacking some Go. Then a moderately early turn in around midnight to get rested for the next day.

Friday Distributed Data Summit

I took the short stroll down Texas Street. While walking I watched a few Caltrain commuter trains roll by heading into downtown San Francisco. Eventually I got to 16th, crossed the rail line, and found the walkway through campus to the conference venue. I walked toward the building entrance and there was my fellow DataStaxian Amanda. We chatted a bit and then I headed over to check out the schedule and our DataStax booth.

We had a plethora of our rather interesting and fun new DataStax t-shirts. I’ll be picking some up the week after next during our DevRel week get-together. I’ll be hauling these back up to Seattle and could prospectively get some sent out to others in the US if you’re interested. Here are a few pictures of the t-shirts.

After that I joined the audience for Nate McCall’s keynote. It was good; he drew a nice parallel between life and finding, then starting to work with and on, Cassandra. A good kickoff, and afterward I delved into a few other talks. Overall, all were solid, and some will even have videos posted on the DataStax Academy YouTube account. Follow me @Adron or the @DataStaxAcademy account to get the tweets when they’re live, or alternatively just subscribe to the YouTube channel (honestly, that’s probably the easiest way)!

After the conference wrapped up we rolled through some pretty standard awesome hanging out DevRel DataStax style. It involved the following ordered events:

  1. Happy hour at Hawthorne in San Francisco with drink tickets, some tasty light snacks, and most excellent conversation about anything and everything on the horizon for Cassandra and also a fair bit of chatter about what we’re lining up for upcoming DataStax releases!
  2. BEER over yonder at the world famous Mikeller Bar. This place is always pretty kick ass. Rock n’ Roll, seriously stout beer, more good convo and plotting to take over the universe, and an all around good time.
  3. Chinese food in CHINATOWN! So good! Some chow mein, curry, and a host of things. I’m a big fan of always taking a walk into Chinatown in San Francisco and getting some eats. It’s worth it!

Alright, after that, unlike everybody else who then walked a mere two blocks to their hotel or took a Lyft back, I took a solid walk all the way down to the Embarcadero. I walked along for a bit until I decided I’d walked enough and boarded a T Third line train out to Dogpatch. Then I walked the last six or so blocks up the hill to Texas Street. ’Twas an excellent night and a great time with everybody!

Saturday Graph Day

Do you do graph stuff? Lately I’ve started looking into graph database tech again, since I’ll be working on and putting together some reference material and code around the DataStax Graph Database that has been built onto the Cassandra distro. I’m still, honestly, kind of a newb at a lot of this, but I’m getting it figured out quickly. I do, after all, have a ton of things I’d like to put into a graph database and be able to query against from a graph perspective. Lots of graph problems of course don’t directly correlate to a graph database being the solution, but it’s indeed part of the solution!

Overall, it was an easy day; the video team got a few more talks and I attended several myself. Again, same as previously mentioned: subscribe to the channel on YouTube or follow me on Twitter @Adron or the crew @DataStaxAcademy to get notified when the videos are released.

Summary

It has been a whirlwind week! Exhausting but worth it. New connections were made, and my own network of contacts and graph of understanding on many topics has expanded. I even got a short little time in New York among all the activity to do some studying, something I always love to break away and do. I do say though, I’m looking forward to getting back to the coding, Twitch streams, and the day to day in Seattle. I’ve got some solid material coming together and am looking forward to blogging that too, and it only gets put together when I’m on the ground at home in Seattle.

Cheers, happy thrashing code!

Wrap Up for August of 2018

Thrashing Code Special Session: Netboot, Debian, Fighting BIOS UEFI, ISC DHCP, Ansible, Cassandra Cluster, and More

Recently my friend Jeremy swung by to record a Twitch stream on setting up the network, hardware, OS loads, PXE boot, and related items for the cluster in the office. Here’s a rundown of notes, details, and related links from the video, and as icing on the cake, here’s the video after some edits. I’ve managed to cut the Twitch stream down to a clean 150 minutes and 40 seconds from over 240 minutes! A little more digestible in this shorter format.

Some additional changes, which you’ll distinctly notice if you watched the original stream: I cleaned up the audio in some places, attempted to raise the audio in others, and bumped the mono audio out to stereo so it isn’t so strange to listen to. I’ve also added callout text for all the configuration files edited, and for some other commands here and there where Jeremy uses his fast typing to open a file so fast you might not see which it was. Overall, it should be easier to listen to, more bite-sized in segments, and more useful to reference.

In addition, below I’ve broken out key parts of the process at their respective time points.

  • 0:00:56 Determining what’s going to be setup and installed; Starting with a net install, setting up bastion server first, then figuring out the cassandra install once we have that.
  • 0:02:10 Added lftp.[sourcecode language=”bash”]sudo apt-get install lftp[/sourcecode]

    lftp Wikipedia page, lftp site, lftp manual page

  • 0:02:50 Whoops, wrong initial iso image. Cut out the cycling through BIOS and mess as that needs to be setup per whatever machines are involved. But now, back to the correct Debian image.
  • 0:04:20 Getting image from Debian Distro FTP Servers. ftp.us.debian.org at path ftp://ftp.us.debian.org/debian-cdimage/current/amd64/iso-cd.
  • 0:04:46 Plan is: the initial server will be a DHCP server, set up so we can serve TFTP options to clients. TFTP (Trivial File Transfer Protocol) will then be used to grab the kernel and ramdisk for Debian, which will contain the installer. This will include a preseed file for Debian.
  • 0:07:36 Initial setup of Debian.
  • 0:08:05 Setup of the NIC with two ports begins. One port for internet, one port for inward facing network for the cluster.
  • 0:11:04 Setup of the drives, since there’s a number of disks in the bastion server. Includes setup of home, swap, etc and Jeremy’s thoughts on that. Then some booting conflicts with the ssd vs. flash vs. other drives.
  • 0:12:40 Jeremy and I discuss swap, failure approaches and practices.
  • 0:14:10 Discussing various ways to automate the installation. Preseed, kickstart, various automated installs, etc.
  • 0:30:38 A request for a bigger font, so Jeremy tweaks that while the webcam camera goes all auto-focus nightmare.
  • 0:31:26 Jeremy installs his dot files. For more on dot files check out here, here, here, and all the dot files on Github here are useful.
  • 0:32:30 Installation & configuration of the ISC (Internet Systems Consortium) DHCP (Dynamic Host Configuration Protocol) server (see the download and Debian docs pages). To install, issue the command[sourcecode language=”bash”]sudo apt-get install isc-dhcp-server[/sourcecode]

    The configuration file is /etc/default/isc-dhcp-server.

  • 0:32:50 Configure the isc-dhcp-server file.
  • 0:33:35 Configure the dhcpd.conf file.
  • 0:34:10 Initial subnet setup with the 192 range.
  • 0:36:44 Oh shit, wrong range. Switching gears, going with the 10-dot range.
  • 0:38:10 Troubleshooting some as server didn’t start immediately.
  • 0:39:17 Beginning of NAT Traffic setup & related.
  • 0:39:38 Setup of iptables begins.
  • 0:40:03 Jeremy declares he’ll just setup the iptables from memory. Then, he does indeed setup iptables from memory.
  • 0:41:49 Setup port 22 and anything on the inside network.
  • 0:42:50 Setup iptables to run on boot.
  • 0:43:18 Set in /etc/network/interfaces
  • 0:43:49 Set firewall to run first.
  • 0:44:17 Reboot confirmation of load sequence.
  • 0:44:55 Switching masquerade to be on interface vs. IP.
  • 0:46:50 iptables hangs and troubleshooting commences.
  • 0:48:42 Setup sshd_config; turn off ‘use dns’.
  • 0:49:33 Jeremy switches to Chromebook to complete the remaining configuration steps.
  • 0:50:48 ssh key setup.
  • 0:51:20 ssh key setup on Chromebook and respective key setup.
  • 0:53:40 Install tftpd.[sourcecode language=”bash”]sudo apt-get install tftpd[/sourcecode]
  • 0:55:00 Adding iptables rules for additional changes made since initial iptables setup.
  • 0:55:40 Setup/download inetd, tftpd, and other tools on bastion to setup remaining network and servers. Jeremy also provides an explanation of the daemons and how and what is getting setup now and respective DDOS concerns.
  • 0:57:40 Starts download of netboot installer and everything else required to netboot a machine.
  • 0:58:48 First pxelinux steps. Setup of configuration file.
  • 1:00:52 First pxeboot attempt of a node server. KVM Switch confusion, and a flicker of life on the first node!
  • 1:01:50 The pxe boot got the installer started, but no further steps were taken. Jeremy delves into the log files to determine why the pxe boot didn’t launch further into the installer.
  • 1:04:20 Looks up pxelinux to setup some of the defaults and determine solutions. Good reference point for further research is available via syslinux.org.
  • 1:05:40 After a couple minutes of pxelinux information, back to the configuration.
  • 1:07:23 Jeremy gets into preseed configuration but before diving in too deep covers some ground on what to do in a production environment versus what is getting setup in the current environment for this video.
  • 1:08:47 Takes a preseed example to work from. Works through the file to setup the specifics of the installation for each of the nodes.
  • 1:16:09[sourcecode language=”bash”]sudo dpkg-reconfigure tzdata[/sourcecode]

    since earlier we set the time zone to PST, as it was one of the only options, but we really wanted UTC. After reconfiguration, execute a time sync:

    [sourcecode language=”bash”]sudo systemctl restart systemd-timesyncd[/sourcecode]

    and then get a status:

    [sourcecode language=”bash”]sudo systemctl status systemd-timesyncd[/sourcecode]

  • 1:21:46 Finished reviewing the installation for the nodes. Started at 1:08:47, so it took a while to cover all those bases!
  • 1:22:28 Moves the new preseed file into the correct path in the installer initrd.
  • 1:27:44 After some troubleshooting, the pxeboot loading gets to the business of successfully loading a node, success! We follow up the short celebration of getting the first pxeboot with a little summary, restating the purpose of pxeboot and some of the form and function of how it will work; i.e., pxeboot kicks in when all other boot methods on a node fail. So when a drive is totally formatted and there’s no master boot record to load from, boot kicks off via pxeboot and we get a new image. Thus it’s as simple as plugging in a new server and turning it on, and I’ll have a brand new node in the cluster.
  • 1:29:30 Cycling through each of the servers, which we’ve powered on to start loading from pxeboot, just to watch them start loading and to try an initial load.
  • 1:29:51 Jeremy discusses the reasoning behind setting up some things specifically to the way they’ve been setup specific to having a database loaded on the servers versus just standard servers or another configuration of use.
  • 1:32:13 With Jeremy declaring, “it’s just YAML right?!” we opt to use Ansible for the next steps of configuration for the nodes and their respective Cassandra database installations and setup.
  • 1:32:22 Jeremy and I have a small debate about whether Python is trash for CLIs or not. I’m wrong, but it’s still garbage for a CLI. Why does it keep getting used for CLIs? Use Go already. Grumble grumble, whatever.
  • 1:35:46 Jeremy and I now have a discussion on the configuration related to IP’s, what the range would give us, how to assign them specifically, service discoverability of the nodes, and how all of this complexity can be mitigated by something simple for us to setup right now, instead of later.
  • 1:38:25 After discussing, we opt to go with a hard coded DHCP configuration for now so we can just use the static (in function, but now literally of course since they’re designated in the DHCP) IP’s.
  • 1:41:56 Setting up the actual Ansible Playbook for Cassandra starts here.
  • 1:43:27 Executing Ansible and troubleshooting connectivity issues between nodes.
  • 1:43:48 Need sshpass, installed with[sourcecode language=”bash”]sudo apt-get install sshpass[/sourcecode]
  • 1:44:18 We realize there’s a problem with the machines actually re-installing via pxe boot and change the boot sequence as a temporary fix.
  • 1:45:25 Back into the Ansible Playbook. Further troubleshooting of the Ansible playbook.
  • 1:45:59 Checking out the ansible hosts file.
  • 1:46:19 Checking out the cassandra.yml for Ansible further. Which leads to…
  • 1:46:53 …the realization we’re also installing Spark in addition to Cassandra. Hmmm, Ok, let’s do that too.
  • 1:47:xx For the next several minutes Jeremy steps through and makes additions and fixes to the Ansible file(s).
  • 1:59:52 Took a break at this point.
  • 2:01:11 Had names for IP’s swapped. Fixed.
  • 2:03:09 3 of the 5 nodes are now reachable after some troubleshooting.
  • 2:03:27 After some more quick checks and troubleshooting executing the playbook again.
  • 2:05:15 Oh dear, it appears to be installing Cassandra finally! Even with 3 nodes up and 2 failing, that’s enough for quorum, so moving forward!
  • 2:08:44 Further edits and tweaks to the playbook.
  • 2:09:20 Executing playbook again.
  • 2:09:36 Removing further cruft from the playbook we copied.
  • 2:09:53 Troubleshooting the Cassandra loads.
  • 2:15:04 At this point there are three Cassandra installs executing on nodes, pulled from FTP for installation.
  • 2:30:40 End

As a shout out and props to Jeremy: when you need or want a new hard drive, check out https://diskprices.com/. It’s pretty solid for finding excellent disk prices, and Jeremy gets a few pennies per order. Cheers, thanks for watching, and thanks Jeremy for swinging by and educating us all!


September & October Op & Dev Dis Sys Meetups Posted

I’m excited to announce several new speakers coming to Seattle. Meet Karthik Ramasamy, Joseph Jacks, and Luc Perkins. They’re going to cover a range of technologies; to list just a few: Heron, messaging, queueing, streaming, Apache Cassandra, Apache Pulsar, Prometheus, Kubernetes, and others.

Everybody meet Karthik Ramasamy!

Karthik Ramasamy

Karthik Ramasamy is the co-founder of Streamlio that focuses on building next generation real time infrastructure. Before Streamlio, he was the engineering manager and technical lead for real-time infrastructure at Twitter where he co-created Twitter Heron. He has two decades of experience working with companies such as Teradata, Greenplum, and Juniper in their rapid growth stages building parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company that specializes in real-time streaming processing on Hadoop and Cassandra using SQL, that was acquired by Twitter. Karthik has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases. During his college tenure several of his research projects were later spun off as a company acquired by Teradata. Karthik is the author of several publications, patents, and Network Routing: Algorithms, Protocols and Architectures.

Presentation: Unifying Messaging, Queuing, Streaming & Light Weight Compute with Apache Pulsar

Data processing use cases, from transformation to analytics, perform tasks that require various combinations of queuing, streaming, and lightweight processing steps. Until now, supporting all of those needs has required a different system for each task: stream processing engines, messaging and queuing middleware, and streaming messaging systems. That has led to increased complexity for development and operations.

In this session, we’ll discuss the need to unify these capabilities in a single system and how Apache Pulsar was designed to address that. Apache Pulsar is a next generation distributed pub-sub system that was developed and deployed at Yahoo. Karthik will explain how the architecture and design of Pulsar provides the flexibility to support developers and applications needing any combination of queuing, messaging, streaming, and lightweight compute.

Everybody meet Joseph Jacks & Luc Perkins!

Joseph Jacks & Luc Perkins

More about Joseph

https://twitter.com/asynchio
https://www.linkedin.com/in/josephjacks/

Joseph was the founder and organizer of KubeCon (the Kubernetes community conference, donated to and now run by the Linux Foundation’s CNCF). He also co-founded Kismatic (the first commercial open source Kubernetes tools and services company), acquired by Apprenda in 2016. Joseph previously worked at Enstratius Networks (acquired by Dell Software), TIBCO, and Talend (2016 IPO). He was also a founding strategy and product consultant at Mesosphere. Recently, Joseph served as a corporate EIR at Quantum Corporation in support of the Rook project. He currently serves as the co-founder and CEO of a new stealth technology startup.

More about Luc

https://twitter.com/lucperkins
https://www.linkedin.com/in/luc-perkins-a087b322/

Luc joined the tech industry a few years back after a foray into choral tunes and thrashing guitar virtuosity. He was educated at Reed in Portland, Oregon, and then went on to Duke, where he wrapped up. Then it was back to Portlandia, where he joined AppFog for a bit, working in the platform-as-a-service world before delving into the complexities of distributed databases at Basho. Having worked with Luc there along with Eric Redmond, I wasn’t surprised to see Luc just release the 2nd edition of the Seven Databases in Seven Weeks book. Recently he also joined the CNCF as a Developer Advocate after spending some time at Twitter and Streamlio working on streaming and related distributed systems.

Presentation: Prometheus, Grafana, Kubernetes, and a Cassandra Cluster

Over the past few years, Prometheus has emerged as a best-of-breed OSS monitoring and observability solution. In this talk, I’ll walk you through setting up a full-fledged Prometheus setup for a Cassandra cluster running on Kubernetes, including Grafana dashboards, Alertmanager notifications via Slack, and more.

Presentations: Title TBD – Stay Tuned!

I’ll post more details on Joseph’s talk in the next couple of days. But you can get an idea that it’ll be some seriously interesting material!

RSVP to the Meetups Here