Tag Archives: distributed systems

Lena presents “So You Want to Run Data-Intensive Systems on Kubernetes”

If you’re interested in running data-intensive systems (think Apache Cassandra, DataStax Enterprise, Kafka, Spark, Tensorflow, Elasticsearch, Redis, etc) in Kubernetes this is a great talk. @Lenadroid covers what options are available in Kubernetes, how architectural features around pods, jobs, stateful sets, and replica sets work together to provide distributed systems capabilities. Other features she continues and delves into include custom resource definitions (CRDs), operators, and HELM Charts, which include future and peripheral feature capabilities that can help you host various complex distributed systems. I’ve included references below the video here, enjoy.


Cassandra Datacenter & Racks

The last post in this series is Distributed Database Things to Know: Consistent Hashing.

Let’s talk about the analogy of Apache Cassandra Datacenter & Racks to actual datacenter and racks. I kind of enjoy the use of the terms datacenter and racks to describe architectural elements of Cassandra. However, as time moves on the relationship between these terms and why they’re called datacenter and racks can be obfuscated.

Take for instance, a datacenter could just be a cloud provider, an actual physical datacenter location, a zone in Azure, or region in some other provider. What an actual Datacenter in Cassandra parlance actually is can vary, but the origins of why it’s called a Datacenter remains the same. The elements of racks also can vary, but also remain the same.

Origins: Racks & Datacenters?

Let’s cover the actual things in this industry we call datacenter and racks first, unrelated to Apache Cassandra terms.

Racks: The easiest way to describe a physical rack is to show pictures of datacenter racks via the ole’ Google images.


A rack is something that is located in a data-center, or even just someone’s garage in some odd scenarios. Ya know, if somebody wants serious hardware to work with. The rack then has a number of servers, often various kinds, within that rack itself. As you can see from the images above there’s a wide range of these racks.

Datacenter: Again the easiest way to describe a datacenter is to just look at a bunch of pictures of datacenter, albeit you see lots of racks again. But really, that’s what a datacenter is, is a building that has lots and lots of racks.


However in Apache Cassandra (and respectively DataStax Enterprise products) a datacenter and rack do not directly correlate to a physical rack or datacenter. The idea is more of an abstraction than hard mapping to the physical realm. In turn it is better to think of datacenter and racks as a way to structure and organize your DataStax Enterprise or Apache Cassandra architecture. From a tree perspective of organizing your cluster, think of things in this hierarchy.

  • Cluster
    • Datacenter(s)
      • Rack(s)
        • Server(s)
          • Node (vnode)

Apache Cassandra Datacenter

An Apache Cassandra Datacenter is a group of nodes, related and configured within a cluster for replication purposes. Setting up a specific set of related nodes into a datacenter helps to reduce latency, prevent transactions from impact by other workloads, and related effects. The replication factor can also be setup to write to multiple datacenter, providing additional flexibility in architectural design and organization. One specific element of datacenter to note is that they must contain only one node type:

Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations.Each datacenter usually contains only one node type. The node types are:

  • Transactional: Previously referred to as a Cassandra node.
  • DSE Graph: A graph database for managing, analyzing, and searching highly-connected data.
  • DSE Analytics: Integration with Apache Spark.
  • DSE Search: Integration with Apache Solr. Previously referred to as a Solr node.
  • DSE SearchAnalytics: DSE Search queries within DSE Analytics jobs.

Apache Cassandra Racks

An Apache Cassandra Rack is a grouped set of servers. The architecture of Cassandra uses racks so that no replica is stored redundantly inside a singular rack, ensuring that replicas are spread around through different racks in case one rack goes down. Within a datacenter there could be multiple racks with multiple servers, as the hierarchy shown above would dictate.

To determine where data goes within a rack or sets of racks Apache Cassandra uses what is referred to as a snitch. A snitch determines which racks and datacenter a particular node belongs to, and by respect of that, determines where the replicas of data will end up. This replication strategy which is informed by the snitch can take the form of numerous kinds of snitches, some examples include;

  • SimpleSnitch – this snitch treats order as proximity. This is primarily only used when in a single-datacenter deployment.
  • Dynamic Snitching – the dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down.
  • RackInferringSnitch – Proximity is determined by rack and datacenter, assumed corresponding to 3rd and 2nd octet of each node’s IP address. This particular snitch is often used as an example for writing a custom snitch class since it isn’t particularly useful unless it happens to match one’s deployment conventions.

In the future I’ll outline a few more snitches, how some of them work with more specific detail, and I’ll get into a whole selection of other topics. Be sure to subscribe to the blog, the ole’ RSS feed works great too, and follow @CompositeCode for blog updates. For discourse and hot takes follow me @Adron.

Distributed Database Things to Know Series

  1. Consistent Hashing
  2. Apache Cassandra Datacenter & Racks (this post)


DataStax Developer Days

Over the last week I had the privilege and adventure of coming out to Chicago and Dallas to teach about operations and security capabilities of DataStax Enterprise. More about that later in this post, first I’ll elaborate on and answer the following:

  • What is DataStax Developer Day? Why would you want to attend?
  • Where are the current DataStax Developer Day events that have been held, and were future events are going to be held?
  • Possibilities for future events near a city you live in.

What is DataStax Developer Day?

The way we’ve organized this developer day event at DataStax, is focused around the DataStax Enterprise built on Apache Cassandra product, however I have to add the very important note that this isn’t merely just a product pitch type of thing, you can and will learn about distributed databases and systems in a general sense too. We talk about a number of the core principles behind distributed systems such as the pivotally important consistent hash ring, datacenter and racks, gossip, replication, snitches, and more. We feel it’s important that there’s enough theory that comes along with the configuration and features covered to understand who, what, where, why, and how behind the configuration and features too.

The starting point of the day’s course material is based on the idea that one has not worked with or played with a Apache Cassandra or DataStax Enterprise. However we have a number of courses throughout the day that delve into more specific details and advanced topics. There are three specific tracks:

  1. Cassandra Track – this track consists of three workshops: Core Cassandra, Cassandra Data Modeling, and Cassandra Application Development. [more details]
  2. DSE Track – this track consists of three workshops: DataStax Enterprise Search, DataStax Enterprise Analytics, and DataStax Enterprise Graph. [more details]
  3. Bonus Content – This track has two workshops: DataStax Enterprise Overview and DataStax Enterprise Operations and Security.  [more details]

Why would you want to attend?

  • One huge rad awesome reason is that the developer day events are FREE. But really, nothing is ever free right? You’d want to take a day away from the office to join us, so there’s that.
  • You also might want to even stay a little later after the event as we always have a solidly enjoyable happy hour so we can all extend conversations into the evening and talk shop. After all, working with distributed databases, managing data, and all that jazz is honestly pretty enjoyable when you’ve got awesome systems like this to work with, so an extended conversation into the evening is more than worth it!
  • You’ll get a firm basis of knowledge and skillset around the use, management, and more than a few ideas about how Apache Cassandra and DataStax Enterprise can extend your system’s systemic capabilities.
  • You’ll get a chance to go beyond merely the distributed database system of Apache Cassandra itself and delve into graph, what it is and how it works, analytics, and search too. All workshops take a look at the architecture, uses, and what these capabilities will provide your systems.
  • You’ll also have one on one time with DataStax engineers, and other technical members of the team to ask questions, talk about architecture and solutions that you may be working on, or generally discuss any number of graph, analytics, search, or distributed systems related questions.

Where are the current DataStax Developer Day events that have been held, and were future events are going to be held? So far we’ve held events in New York City, Washington DC, Chicago, and Dallas. We’ve got two more events scheduled with one in London, England and one in Paris, France.

Future events? With a number of events completed and a few on the calendar, we’re interested in hearing about future possible locations for events. Where are you located and where might an event of this sort be useful for the community? I can think of a number of cities, but organizing them into order to know where to get something scheduled next is difficult, which is why the team is looking for input. So ping me via @Adron, email, or just send me a quick message from here.

Consistent Hashing

I wrote about consistent hashing in 2013 when I worked at Basho, when I had started a series called “Learning About Distributed Databases” and today I’m kicking that back off after a few years (ok, after 5 or so years!) with this post on consistent hashing.

As with Riak, which I wrote about in 2013, Cassandra remains one of the core active distributed database projects alive today that provides an effective and reliable consistent hash ring for the clustered distributed database system. This hash function, is an algorithm that maps data to variable length to data that’s fixed. This consistent hash is a kind of hashing that provides this pattern for mapping keys to particular nodes around the ring in Cassandra. One can think of this as a kind of Dewey Decimal Classification system where the cluster nodes are the various bookshelves in the library.

Ok, so maybe the Dewey Decimal system isn’t the best analogy. Does anybody even learn about that any more? If you don’t know what it is, please read up and support your local library.

Consistent hashing allows data distributed across a cluster to minimize reorganization when nodes are added or removed. These partitions are based on a particular partition key. The partition key shouldn’t be confused with a primary key either, it’s more like a unique identifier controlled by the system that would make up part of a primary key of a primary key that is made up of multiple candidate keys in a composite key.

For an example, let’s take a look at sample data from the DataStax docs on consistent hashing.

For example, if you have the following data:

name age car gender
jim 36 camaro M
carol 37 345s F
johnny 12 supra M
suzy 10 mustang F

The database assigns a hash value to each partition key:

Partition key Murmur3 hash value
jim -2245462676723223822
carol 7723358927203680754
johnny -6723372854036780875
suzy 1168604627387940318

Each node in the cluster is responsible for a range of data based on the hash value.

Hash values in a four node cluster

DataStax Enterprise places the data on each node according to the value of the partition key and the range that the node is responsible for. For example, in a four node cluster, the data in this example is distributed as follows:

Node Start range End range Partition key Hash value
1 -9223372036854775808 -4611686018427387904 johnny -6723372854036780875
2 -4611686018427387903 -1 jim -2245462676723223822
3 0 4611686018427387903 suzy 1168604627387940318
4 4611686018427387904 9223372036854775807 carol 7723358927203680754

So there ya go, that’s consistent hashing and how it works in a distributed database like Apache Cassandra, the derived distributed database DataStax Enterprise, or the mostly defunct RIP Riak. If you’d like to dig in further, I’ve also found Distributed Hash Tables interesting and also a host of other articles that delve into coding up a consistent has table, respective ring, and the whole enchilada. Check out these articles for more information and details:

    • Simple Magic Consistent by Mathias Meyer @roidrage CTO of Travis CI. Mathias’s post is well written and drives home some good points.
    • Consistent Hashing: Algorithmic Tradeoffs by Damien Gryski @dgryski. This post from Damien is pretty intense, and if you want code, he’s got code for ya.
    • How Ably Efficiently Implemented Consistent Hashing by Srushtika Neelakantam. Srushtika does a great job not only of describing what consistent hashing is but also has drawn up diagrams, charts, and more to visualize what is going on. But that isn’t all, she also wrote up some code to show nodes coming and going. A really great post.

For more on distributed database things to know subscribe to the blog, of course the ole’ RSS feed works great too, and follow @CompositeCode on Twitter for blog updates.

Distributed Database Things to Know Series

  1. Consistent Hashing (this post)
  2. Apache Cassandra Datacenter & Racks


Strata, Ninjas, Distributed Data Day, and Graph Day Trip Recap

This last week was a helluva set of trips, conferences to attend, topics to discuss, and projects to move forward on. This post I’ll attempt to run through the gamut of events and the graph of things that are transversing from the conference nodes onward! (See what I did there, yeah, transpiling that graph verbiage onto events and related efforts!)

Monday Flight(s)

Monday involved some flying around the country for me via United. It was supposed to be a singular flight, but hey, why not some adventures around the country for shits and giggles right! Two TIL’s (Things I Learned) that I might have known already, but repetition reinforces one’s memory.

  1. If you think you’ve bought a nonstop ticket be sure to verify that there isn’t a stopover half way through the trip. If there’s any delays or related changes your plane might be taken away, you’ll get shuffled off to who know’s what other flight, and then you end up spending the whole day flying around instead of the 6 hour flight you’re supposed to have.
  2. Twitter sentiment tends to be right, it’s good policy to avoid United, they schedule their planes and the logistical positions and crews in ways that generally become problematic quickly when there’s a mere minor delay or two.

Tuesday Strata Day Zero (Train & Workshop Day)

Tuesday rolled in and Strata kicked off with a host of activities. I rolled in to scope out our booth but overall, Tuesday was a low yield activity day. Eventually met up with the team and we rolled out for an impromptu team dinner, drinks, and further discussions. We headed off to Ninja, which if you haven’t been there it’s a worthy adventure for those brave enough. I had enough fun that I felt I should relay this info and provide a link or three so you too could go check it out.

Wednesday Strata Day One

Day two of Strata kicked off and my day involved mostly discussions with speakers, meetings, a few analyst discussions, and going around to booths to check out which technology I needed to add to my “check it out soon” list. Here are a few of the things I noted and are now on the list.

I also worked with the video team and cut some video introductions for Strata and upcoming DataStax Developer Days Announcements. DataStax Developer Days are free events coming to a range of cities. Check them out here and sign up for whichever you’re up for attending. I’m looking forward to teaching those sessions and learning from attendees about their use cases and domains in which they’re working.

The cities you’ll find us coming to soon:

I wish I could come and teach in every city but I narrowed it down to Chicago and Dallas, so if you’re in those cities, I look forward to meeting you there! Otherwise you’ll get to meet other excellent members of the team!

This evening we went to Death Ave. The food was great, drinks solid, and the name was simply straight up metal. Albeit it be a rather upper crust dining experience and no brutal metal was in sight to be seen or heard. However, I’d definitely recommend the joint, especially for groups as they have a whole room you can get if you’ve got enough people and that improves the experience over standard dining.

Thursday Strata Day Two

I scheduled my flights oddly for this day. Which in turn left me without any time to spend at Strata. But that’s the issues one runs into when things are booked back to back on opposite coasts of the country! Thus, this day involved me returning to Newark via Penn Station and flying back out to San Francisco. As some of you may know, I’m a bit of a train geek, so I took a New Jersey NEC (Northeast Corridor) train headed for Trenton out of Penn back to the airport.

The train, whether you’re taking the Acela, Metroliner, NJ Transit, or whatever is rolling along to Newark that day is the way to go in my opinion. I’ve taken the bus, which is slightly cheaper, but meh it’s an icky east coast intercity bus. The difference in price in a buck or three or something, nothing significant, and of course you can jump in an Uber, Taxi, or other transport also. Even when they can make it faster I tend to prefer the train. It’s just more comfortable, I don’t have to deal with a driver, and they’re more reliable. The turnpikes and roadways into NYC from Newark aren’t always 100%, and during rush hour don’t even expect to get to the city in a timely manner. But to each their own, but for those that might not know, beware the taxi price range of $55 base plus tolls which often will put your trip into Manhattan into the $99 or above price range. If you’re going to any other boroughs you better go ahead and take a loan out of the bank.

The trip from Newark to San Francisco was aboard United on a Boeing 757. I kid you not, regardless of airline, if you get to fly on a 757 versus a 737 or Airbus 319 or 320, it’s preferable. Especially for flights in the 2+ hour range. There is just a bit more space, the engines make less noise, the overall plane flies smoother, and the list of comforts is just a smidgen better all around. The 757 is the way to go for cross continent flights!

In San Francisco I took the standard BART route straight into the city and over to the airbnb I was staying at in Protrero Hill. Right by Farley’s on Texas Street if you know the area. I often pick the area because it’s cheap (relatively), super chill, good food nearby, not really noisy, and super close to where the Distributed Data Summit and Graph Day Conferences Venue is located.

The rest of Thursday included some pizza and a short bout of hacking some Go. Then a moderately early turn in around midnight to get rested for the next day.

Friday Distributed Data Summit

I took the short stroll down Texas Street. While walking I watched a few Caltrain Commuter Trains roll by heading into downtown San Francisco. Eventually I got to 16th and cross the rail line and found the walkway through campus to the conference venue. Walked toward the building entrance and there was my fellow DataStaxian Amanda. We chatted a bit and then I headed over to check out the schedule and our DataStax Booth.

We had a plethora of our rather interesting and fun new DataStax tshirts. I’ll be picking some up week after next during our DevRel week get together. I’ll be hauling these back up to Seattle and could prospectively get some sent out to others in the US if you’re interested. Here’s a few pictures of the tshirts.

After that joined the audience for Nate McCall’s Keynote. It was good, he put together a good parallel of life and finding and starting to work with and on Cassandra. Good kick off, and after I delved into a few other talks. Overall, all were solid, and some will even have videos posted on the DataStax Academy Youtube Account. Follow me @Adron or the @DataStaxAcademy account to get the tweets when they’re live, or alternatively just subscribe to the YouTube Channel (honestly, that’s probably the easiest way)!

After the conference wrapped up we rolled through some pretty standard awesome hanging out DevRel DataStax style. It involved the following ordered events:

  1. Happy hour at Hawthorne in San Francisco with drink tickets, some tasty light snacks, and most excellent conversation about anything and everything on the horizon for Cassandra and also a fair bit of chatter about what we’re lining up for upcoming DataStax releases!
  2. BEER over yonder at the world famous Mikeller Bar. This place is always pretty kick ass. Rock n’ Roll, seriously stout beer, more good convo and plotting to take over the universe, and an all around good time.
  3. Chinese Food in CHINA TOWN! So good! Some chow mein, curry, and a host of things. I’m a big fan of always taking a walk into Chinatown in San Francicsco and getting some eats. It’s worth it!

Alright, after that, unlike everybody else that then walked a mere two blocks to their hotel or had taken a Lyft back, I took a solid walk all the way down to the Embarcadero. Walked along for a bit until I decided I’d walked enough and boarded a T-third line train out to Dogpatch. Then walked that last 6 or so blocks up the hill to Texas Street. Twas an excellent night and a great time with everybody!

Saturday Graph Day

Do you do graph stuff? Lately I’ve started looking into Graph Database tech again since I’ll be working on and putting together some reference material and code around the DataStax Graph Database that has been built onto the Cassandra distro. I’m still, honestly kind of a newb at a lot of this but getting it figured out quickly. I do after all have a ton of things I’d like to put into and be able to query against from a graph database perspective. Lot’s of graph problems of course don’t directly correlate to a graph database being a solution, but it’s indeed part of the solution!

Overall, it was an easy day, the video team got a few more talks and I attended several myself. Again, same thing as previously mentioned subscribe to the channel on Youtube or follow me on Twitter @Adron or the crew @DataStaxAcademy to get notified when the videos are released.


It has been a whirlwind week! Exhausting but worth it. New connections made, my own network of contacts and graph of understanding on many topics has expanded. I even got a short little time in New York among all the activity to do some studying, something I always love to break away and do. I do say though, I’m looking forward to getting back to the coding, Twitch streams, and the day to day in Seattle again. Got some solid material coming together and looking forward to blogging that too, and it only gets put together when I’m on the ground at home in Seattle.

Cheers, happy thrashing code!

Wrap Up for August of 2018

Thrashing Code Special Session: Netboot, Debian, Fighting BIOS UEFI, ISC DHCP, Ansible, Cassandra Cluster, and More

Recently my friend swung by to record a Twitch stream on setting up the network, hardware, OS loads, PXE boot, and related items for the cluster in the office. Here’s a rundown of notes, details, and related links and such from the video, and as icing on the cake, here’s the video after some edits. I’ve managed to cut the Twitch stream down to a clean 150 minutes and 40 seconds from over 240 minutes! A little bit more digestible in this shorter format.

Some of the additional changes, that you’ll distinctly notice if you watched the original stream, is that I cleaned up the audio in some places, attempted to raise the audio in others, and for the mono audio I bumped it out to stereo so it isn’t so strange to listen to. I’ve also added callout text for all the configuration files edited, and some other commands here and there were Jeremy uses his fast typing to open a file so fast you might not see which it was. Overall, it should be easier to listen to, more bite size in segments, and more useful to reference.

In addition, below I’ve broken out key parts of the process at their respective time points.

  • 0:00:56 Determining what’s going to be setup and installed; Starting with a net install, setting up bastion server first, then figuring out the cassandra install once we have that.
  • 0:02:10 Added lftp.sudo apt-get install lftp

    lftp wikipedialftp sitelftp manual page

  • 0:02:50 Whoops, wrong initial iso image. Cut out the cycling through BIOS and mess as that needs to be setup per whatever machines are involved. But now, back to the correct Debian image.
  • 0:04:20 Getting image from Debian Distro FTP Servers. ftp.us.debian.org at path ftp://ftp.us.debian.org/debian-cdimage/current/amd64/iso-cd.
  • 0:04:46 Plan is: Initial server will be a DHCP server setup so we can setup TFTP options to clients. – TFTP Trivial File Transfer Protocol – TFTP will then be used to grab the kernal and ramdisk for Debian which will contain the installer. This will include a preseed file for Debian.
  • 0:07:36 Initial setup of Debian.
  • 0:08:05 Setup of the NIC with two ports begins. One port for internet, one port for inward facing network for the cluster.
  • 0:11:04 Setup of the drives, since there’s a number of disks in the bastion server. Includes setup of home, swap, etc and Jeremy’s thoughts on that. Then some booting conflicts with the ssd vs. flash vs. other drives.
  • 0:12:40 Jeremy and I discuss swap, failure approaches and practices.
  • 0:14:10 Discussing various ways to automate the installation. Preseed, kickstart, various automated installs, etc.
  • 0:30:38 A request for a bigger font, so Jeremy tweaks that while the webcam camera goes all auto-focus nightmare.
  • 0:31:26 Jeremy installs his dot files. For more on dot files check out here, here, here, and all the dot files on Github here are useful.
  • 0:32:30 Installing & configuration of ISC (Internet Systems Consortium) DHCP (Dynamic Host Configuration Protocol) download. The Debian docs page. To install issue command
    sudo apt-get install isc-dhcp-server

    . Configuration file is /etc/default/isc-dhcp-server.

  • 0:32:50 Configure the isc-dhcp-server file.
  • 0:33:35 Configure the dhcpd.conf file.
  • 0:34:10 Initial subnet setup with 192 range,
  • 0:36:44 Oh shit, wrong range. Switching to guns, going with 10 dot range.
  • 0:38:10 Troubleshooting some as server didn’t start immediately.
  • 0:39:17 Beginning of NAT Traffic setup & related.
  • 0:39:38 Setup of iptables begins.
  • 0:40:03 Jeremy declares he’ll just setup the iptables from memory. Then, he does indeed setup iptables from memory.
  • 0:41:49 Setup port 22 and anything on the inside network.
  • 0:42:50 Setup iptables to run on boot.
  • 0:43:18 Set in /etc/network/interfaces
  • 0:43:49 Set firewall to run first.
  • 0:44:17 Reboot confirmation of load sequence.
  • 0:44:55 Switching masquerade to be on interface vs. IP.
  • 0:46:50 iptables hangs and troubleshooting commences.
  • 0:48:42 Setup sshd_config; turn off ‘use dns’.
  • 0:49:33 Jeremy switches to Chromebook to complete the remaining configuration steps.
  • 0:50:48 ssh key setup.
  • 0:51:20 ssh key setup on Chromebook and respective key setup.
  • 0:53:40 Install tftpd.
    sudo apt-get install tftpd
  • 0:55:00 Adding iptables rules for additional changes made since initial iptables setup.
  • 0:55:40 Setup/download inetd, tftpd, and other tools on bastion to setup remaining network and servers. Jeremy also provides an explanation of the daemons and how and what is getting setup now and respective DDOS concerns.
  • 0:57:40 Starts download of netboot installer and everything else required to netboot a machine.
  • 0:58:48 First pxelinux steps. Setup of configuration file.
  • 1:00:52 First pxeboot attempt of a node server. KVM Switch confusion, and a flicker of life on the first node!
  • 1:01:50 The pxe boot got the installer started, but no further steps were taken. Jeremy delves into the log files to determine why the pxe boot didn’t launch further into the installer.
  • 1:04:20 Looks up pxelinux to setup some of the defaults and determine solutions. Good reference point for further research is available via syslinux.org.
  • 1:05:40 After a couple minutes of pxelinux information, back to the configuration.
  • 1:07:23 Jeremy gets into preseed configuration but before diving in too deep covers some ground on what to do in a production environment versus what is getting setup in the current environment for this video.
  • 1:08:47 Takes a preseed example to work from. Works through the file to setup the specifics of the installation for each of the nodes.
  • 1:16:09
    sudo dpkg-reconfigure tzdata

    since earlier we set the time zone to PST since it was one of the only options but really wanted UTC. After reconfiguration execute a timesync.

    sudo systemctl restart systemd-timesyncd

    and then get a status

    sudo systemctl restart systemd-timesyncd
  • 1:21:46 Finished reviewing the installation for the nodes. Started at 1:08:47, so it took a while to cover all those bases!
  • 1:22:28 Moves new preseed file into the correct path in the installer inetrd.
  • 1:27:44 After some troubleshooting, the pxeboot loading gets to the business of successfully loading a node, success! We follow up a short celebration of getting the first pxeboot with a little summary, restating the purpose of pxeboot, and some form and function of how pxeboot will work, i.e. pxeboot works when all other boot methods on a node fail. So when a drive is totally formatted and no master boot record to load from, boot, kicks off via pxeboot and we get a new image. Thus, it’s as simple as plugging in a new server and turning it on and I’ll have a brand new node in the cluster.
  • 1:29:30 Cycling through each of the servers, which we’ve powered on to start loading from pxeboot, just to watch them start loading and to try an initial load.
  • 1:29:51 Jeremy discusses the reasoning behind setting up some things specifically to the way they’ve been setup specific to having a database loaded on the servers versus just standard servers or another configuration of use.
  • 1:32:13 With Jeremy declaring, “it’s just YAML right?!” we opt to use Ansible for the next steps of configuration for the nodes and their respective Cassandra database installations and setup.
  • 1:32:22 Jeremy and I have a small debate about Python being trash for CLI’s or not. I’m wrong but it’s still garbage for a CLI, why does it keep getting used for CLI’s, use Go already. Grumble grumble, whatever.
  • 1:35:46 Jeremy and I now have a discussion on the configuration related to IP’s, what the range would give us, how to assign them specifically, service discoverability of the nodes, and how all of this complexity can be mitigated by something simple for us to setup right now, instead of later.
  • 1:38:25 After discussing, we opt to go with a hard coded DHCP configuration for now so we can just use the static (in function, but now literally of course since they’re designated in the DHCP) IP’s.
  • 1:41:56 Setting up the actual Ansible Playbook for Cassandra starts here.
  • 1:43:27 Executing Ansible and troubleshooting connectivity issues between nodes.
  • 1:43:48 Need sshpass, installed with
    sudo apt-get install sshpass
  • 1:44:18 We realize there’s a problem with the machines actually re-installing via pxe boot and change boot sequence to give that a temp fix.
  • 1:45:25 Back into the Ansible Playbook. Further troubleshooting of the Ansible playbook.
  • 1:45:59 Checking out the ansible hosts file.
  • 1:46:19 Checking out the cassandra.yml for Ansible further. Which leads to…
  • 1:46:53 …the realization we’re also installing Spark in addition to Cassandra. Hmmm, Ok, let’s do that too.
  • 1:47:xx For the next several minutes Jeremy steps through and makes additions and fixes to the Ansible file(s).
  • 1:59:52 Took a break at this point.
  • 2:01:11 Had names for IP’s swapped. Fixed.
  • 2:03:09 3 of the 5 nodes are now reachable after some troubleshooting.
  • 2:03:27 After some more quick checks and troubleshooting executing the playbook again.
  • 2:05:15 Oh dear it appears to be install Cassandra finally! Even with 3 nodes and 2 failing, that’s enough for quorum so moving forward!
  • 2:08:44 Further edits and tweaks to the playbook.
  • 2:09:20 Executing playbook again.
  • 2:09:36 Removing further cruft from the playbook we copied.
  • 2:09:53 Troubleshooting the Cassandra loads.
  • 2:15:04 At this point there’s three executing Cassandra installs pulled from FTP for installation and on nodes.
  • 2:30:40 End

As a shout out and props to Jeremy, when you need or want a new hard drive, check our https://diskprices.com/. It’s pretty solid for finding excellent disk prices and Jeremy gets a few pennies per order. Cheers, thanks for watching, and thanks Jeremy for swinging by and educating us all!