In this article, I’ll explore the CAP Theorem and its implications for distributed systems, focusing on Apache Kafka, Apache Flink, and Apache Cassandra. I’ll then dissect how CAP influences these systems in real-world scenarios, delve into edge cases like split-brain scenarios, and offer actionable strategies to mitigate the challenges. Finally, I’ll wrap up with deployment strategies for self-hosted environments and discuss how Confluent Cloud tackles CAP-related challenges.
What is the CAP Theorem?
The CAP Theorem, introduced by Eric Brewer, states that a distributed data system can guarantee at most two of the following three properties at the same time:
Consistency (C): Every read receives the most recent write or an error.
Availability (A): Every request receives a non-error response, even if it does not reflect the most recent write.
Partition Tolerance (P): The system continues to function despite network partitions.
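This means that distributed systems inherently make trade-offs, and understanding these trade-offs is key to designing robust architectures. In practice, network partitions can’t be ruled out, so the real choice is between consistency and availability for as long as a partition lasts.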
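To make the trade-off concrete, here is a minimal sketch of a two-replica store that can run in a "CP" or an "AP" mode. Everything in it (`TwoNodeStore`, `Replica`, `Unavailable`, the `partitioned` flag) is a hypothetical name made up for illustration; it is a toy model of the definitions above, not how Kafka, Flink, or Cassandra actually implement replication.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses to answer during a partition."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy two-replica store (hypothetical, for illustration only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot talk

    def write(self, replica, value):
        peer = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("cannot replicate during a partition")
            # Availability first: accept locally, the peer is now stale.
            replica.value = value
            return
        # No partition: both replicas get the write.
        replica.value = value
        peer.value = value

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            # The peer may hold a newer write we cannot see, so return an error.
            raise Unavailable("cannot confirm latest value during a partition")
        return replica.value
```

With a partition in place, the CP store raises `Unavailable` on both reads and writes, while the AP store keeps answering but lets replica `a` and replica `b` return different values until they can reconcile.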
Hello to all the data curious, database lovers, and sciency datamungers! I have a small favor to ask of you all. At DataStax we just opened up our Apollo service, i.e. our “Apache Cassandra as a Service” DBaaS offering, and I’m looking for people who want to test drive the database! Now, you don’t have to actually tell me you’re using it or anything, but I’d love to know if you are. Maybe we could even chat about your experience using it.
Pick a driver here. [C#/F#, Node.js/JavaScript, Java, C++, and Python] – I added F# cuz ya know, that’s how F# works and all, you just use the C# driver and BOOM, you’ve got F# access!!
Alright, the profit part is where you let me know what works for you and what doesn’t. Feel free to comment here, ping me via Twitter @adron, or via the response form here, or however you’ve got to message me. I’d be super stoked to chat!
Currently during beta we have AWS as the provider option, and you can choose between Developer, Startup, Standard, and Enterprise. Each offers various configurations and prospective future SLAs and such.
Once you have the database name, keyspace, user name, and your password set, click on Launch Database and the spin up of the multi-node database will begin. You’ll be greeted with a message notifying you that it’ll take a little bit of time for the database to spin up and that an email will be sent once it is done. Enjoy a coffee in the meantime.
Once the database spins up there are two key sections on the database page. First, there are the connection details. They’re located in the bottom left of the database page.
If you click on the “Learn How” you’ll get directly linked to the docs pages with multiple examples of how to get connected to the database you’ve just created. You can also reset your password here and retrieve the security bundle (it’s a tar/zip file) that you’ll need to authenticate any applications with.
The other part that can be really helpful, especially as you do any development or testing with your database, is the Grafana dashboard. It’s on the Health tab of the database page.
A trick I used to get an easier, full-screen view of all the metrics: inspect the page right at the metrics, and within that you’ll find the iframe whose link points specifically to the Grafana metrics. They look pretty nice broken out of frame! As you work through queries and such, keep an eye on this for extra insight.
Any other thoughts, contemplation, or otherwise do get in touch!
Today’s trip is care of Alaska Airlines Flight 2 out of SEATAC Airport (Seattle & Tacoma’s airport) to National (Reagan) in Alexandria, Virginia. I’ll be staying there and commuting daily across the Potomac River to the Gaylord Resort and Convention Center (at National Harbor). I decided I’d write up something about this trip for a few specific reasons:
I finally purchased a Brompton bicycle, which I’ve been wanting to get and use for trips that require air travel or don’t have enough space for a proper bicycle.
The adventure is entirely new to me, I’ve not been to these locations at any point in my life. New for me, new for those reading this (or adventuring along with me on my Twitch channel).
I also picked up a number of new things, and I want to see how they’ll work for streaming while on the go. These include an Android phone and a new dual GoPro + phone mount for the bike, alongside a few existing devices like my trusty set of GoPro cameras.
I flew over in first class for various reasons. I wanted to share some of the advantages, why I think it’s more than worth it to fly first class vs. coach, and why companies should rethink their ideas around this when positions require frequent travel and working on the go.
Leaving Cascadia
The first thing I did was pack up the Brompton. I got a hardshell case to go along with it since I’d read during my research that airlines will sometimes snap off parts of the bike when a softshell case is used. The other advantage: the hardshell case has wheels! Inside it I also put my front-mount messenger bag and some bungee cords so I can mount this stuff on the bike upon arrival.
Once that was packed it was time to get my Mission Workshop ARKIV backpack locked and loaded. In my pack, which is the larger of the two sizes, I get all my clothes, toothbrush, razors, and related amenities. In the side pouches that I mount specifically for longer trips, the quickest to access, I put my power brick and the other electric plugs I’d need regularly. The other things go in various assorted pockets here and there. Since this is such a short trip, I also skip the outer backpack laptop pouch and just put the laptop in the inner sleeve.
All in all, a fairly heavy load, but the cool thing is with the configuration and post-arrival setup I have there isn’t actually much to carry. Backpack goes on my back and the hardshell case rolls along like a carry on. What makes it even easier, I’ve got an express bus with plenty of space and light rail with special areas specific for luggage like this. My 17x Express arrives on time, I board and ride off with my pack and hard shell sitting right next to me.
When I arrive downtown I merely pack up and roll downstairs to the Sound Transit LINK, board the train and off to the airport I go. No need to mess with a driver, no need for chatter or worrying about the implications of social anxiety or evils of clicking “don’t talk to me uber driver”. Just board and go. Then, read a book, check your phone, or whatever comes to mind. That’s what I do.
At the airport I strolled and rolled into the first class lounge, which I attempted to record via my new Android with the Twitch app. It… went oddly I’m assuming. Let’s take a look here.
Once I got situated in the lounge I made some pancakes (a tradition I have now) and sat down for some coding. The seats are comfortable, the views are great, and along with the coding I get to nerd out on all the planes taking off and landing. At least, when one is flying in and out of C Gates at SEATAC. N Gates are kind of “meh”.
Eventually I left the relaxing lounge and headed into the boarding area of C Gates. The Alaska Air 737-900 arrived and started deplaning. With deplaning, boarding, and refueling done for the trip back east to DC we headed back out on the tarmac to queue up 15th in line to take off. Check that out, total plane traffic jam!
Once in the air we flew through some piddly turbulence and into more clouds. Once we cleared 10,000 feet, laptops came out and a little bit more coding resumed. In addition I started this post, took a few pictures, and knocked out a few other things I needed to do.
After a while food and drink service began. In first class, on any flight over an hour you can safely assume a meal will be served. This time it was tortellini or a sandwich of some sort. I got the tortellini. The meal is served in three parts: a little salad and soup to start, then the entree, and finally a dessert.
The soup was tasty, which somewhat surprised me, whereas the salad was merely a salad with some cherry tomatoes, carrots, and greens. Nothing real special, but then of course it’s a salad so it’s not like there’s much expectation.
The tortellini was pretty good. Even in comparison to other food outside of the airlines. A little salt and pepper brought it up just slightly to something I’d even have been happy with in an actual restaurant!
Finally we wrapped up with some Salt & Straw for dessert. Considering this is an airplane I was kind of amazed they’d get Salt & Straw, but then again, Alaska Airlines does like to play to the local products and all!
After food, a couple more hours of coding and prep for the oncoming days of Accelerate.
Arrival in the District of Columbia
I arrived in DC, retrieved my Brompton and racked up the case it packs in and threw my bag on the front. Now for a 26 minute bike ride from the airport to Alexandria.
On the way, the setting was magnificent, with honeysuckle providing a divine fragrance while I rode along the bike trail beside the Potomac River. The moon shone down, almost full, and in spectacular fashion!
Eventually I arrived at my new home for the week. The ride was a success, experiment that it was.
Bootcamp!
NOTE: I am an employee at DataStax, just so you know, in case you didn’t know. I always do my best to give you the direct details, but just so you don’t think I’m being a shill here. Some people don’t seem to be able to determine how people and occupations are correlated, so I like to keep things on the up and up.
First day, or maybe it’s zero day on account of zero based indexes and all, bootcamp kicked off!
In the bootcamp we covered a lot of material to get attendees up to speed on Apache Cassandra. To boot, Patrick McFadin announced that everybody would get to use DataStax Constellation, our new Cassandra as a Service offering, currently in test. The awesome part of this whole bootcamp was that we provided Constellation for everybody, without a blip on the radar! No system issues came up, though we did cross a few programmatic network wires, which got remedied in seconds. With that all wrapped up, released, with a bow on top, bootcamp went off without a hitch. Also a huge shout out to the dozens of team members that provided support throughout the room of 300+ attendees!
Good times in success!
Day 1 – Announcing DataStax Constellation
The first day, based on our zero based index numbering of conference days, started with Billy Bosworth, CEO of DataStax, giving keynote number one.
In the keynote Billy talks about the direction of DataStax, the upcoming releases, and the current releases as of Accelerate 2019. Then Chelsea Navo joins Billy to do a LIVE (emphasis on LIVE) demo of DataStax Enterprise (i.e. Apache Cassandra and all the goodies) running multi-cloud in Azure, AWS, and GCP.
9:23 – Demo of DataStax Enterprise – Multi-cloud in real life. “Not a pretend demo!”
15:17 – Chelsea shows how we introduced a little chaos into the mix, and introduces the ability to simply and easily bring a datacenter down, in real time, as the related reads and writes are occurring. Nothing stops, not even a blip… whoops, did I spoil it? Give it a watch, it’s a solid keynote demo!
At the 20 minute mark, Billy introduced DataStax Constellation. Watch it, learn more, etc. Following that Billy talks about Insights, which will be built in and will provide services-based AI, system health, and related capabilities within the cloud offering.
After the keynote, everybody broke out into technical sessions on a wide, very wide range of topics. From Apache Cassandra to DataStax to Kafka to Vue.js! Great day!
Day 2 – Apache Cassandra v4.0
On day two Billy starts off the keynotes, and introduces others including Nate McCall. Nate is the Apache Cassandra PMC Chair & committer to the project. He dove into the new features, capabilities, and changes of v4.
Next up is DataStax CTO (and founder!) and Apache Cassandra committer of yore, and more, Jonathan Ellis! (video is time point linked below so you can dive right into the talk).
After the keynotes more technical sessions. I attended some architecture discussions around graph and related technology. Lots of good conversations. I really enjoyed it, and to wrap it all up that evening we had an ending keynote with Keren Elazari.
Another flight down to the Bay Area. Today it was Alaska Air Flight 330 from Seattle to San Jose. It was mostly a clear day at the start, with a solid layer of bright cloud cover exiting Washington on the way down to Oregon. As we crossed over that arbitrary human defined line of Oregon and California, nature presented us with even more perfectly glowing bright cloud cover. This is Cascadia after all and it’s basically covered in clouds the majority of the time. On departure I also noted Bremerton has three aircraft carriers in dock along with the normal plethora of other naval vessels. The amount of naval power in the area is always pretty awe inspiring.
Why was I in flight once again? I am heading down to teach with Jeff Carpenter (@jscarp) at the South Bay Cassandra User Group’s Cassandra Day events. These are single day events, where we cover an introduction to Apache Cassandra, concepts of data-modeling for Apache Cassandra, and then a wrap up of application development with the respective drivers. Now if you aren’t in Santa Clara (or ya know Menlo Park, San Jose, Oakland, San Francisco, or well, the surrounding area) there are other days scheduled! We also have days scheduled that aren’t even located in the Bay, so check out the full list of events:
NOTE: If you’re interested in Seattle, Portland, or Vancouver BC area events, scroll all the way down to the end of this blog entry, where I’ve got more details for you!
Introduction to Apache Cassandra
In the introduction to Apache Cassandra we cover an overview of the architecture and features of the distributed database. We start off with a definition of a distributed hash ring and how Apache Cassandra uses it to store data across the nodes that make up the database. Moving on, we’ll get into the other capabilities, the trade-offs of data replication between nodes, configuration settings, and a lot more.
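If the hash ring idea is new to you, here is a rough sketch of it in Python. This is not the Cassandra implementation: the `HashRing` class and `token` helper are names I made up for illustration, MD5 from the standard library stands in for Cassandra’s partitioner, and the replica walk is a simplified "next distinct nodes clockwise" placement that ignores racks and datacenters.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a 64-bit position on the ring (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node claims several positions (virtual nodes) on the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, partition_key, replication_factor=3):
        """Return the distinct nodes that would hold copies of this key."""
        rf = min(replication_factor, self._node_count)
        # First ring position at or after the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        owners = []
        while len(owners) < rf:
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = HashRing(["node1", "node2", "node3", "node4"])
print(ring.replicas("user:42", replication_factor=3))
```

The same key always lands on the same nodes, and adding a node only moves the keys that fall into the slices of the ring the new node claims, which is the main reason for using a ring at all.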
Data Modeling
For data modeling we start off with a short review of relational data modeling to provide something more familiar for many people. From there, we build on concepts around denormalization and breaking apart the various normal forms, then get into the thinking and approach behind modeling an application for a distributed database, and go deeper into the details for Apache Cassandra.
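As a rough illustration of that shift in thinking, here is a small Python sketch using plain dictionaries and lists rather than a real database. The data and names (`users`, `comments`, `comments_by_video`) are made up for the example. The normalized version joins at read time; the denormalized version copies the user name into a structure keyed by the thing we query on, which is roughly what a query-first table with a partition key does.

```python
from collections import defaultdict

# Normalized, relational style: each fact is stored once and joined at read time.
users = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
comments = [
    {"comment_id": 10, "user_id": 1, "video": "intro", "text": "Nice overview"},
    {"comment_id": 11, "user_id": 2, "video": "intro", "text": "More please"},
    {"comment_id": 12, "user_id": 1, "video": "modeling", "text": "Very helpful"},
]


def comments_for_video_normalized(video):
    """Scan all comments and look up each author: a join done at read time."""
    return [
        (users[c["user_id"]]["name"], c["text"])
        for c in comments
        if c["video"] == video
    ]


def build_comments_by_video(users, comments):
    """Denormalize at write time: one 'partition' per video, author name copied in."""
    table = defaultdict(list)
    for c in sorted(comments, key=lambda c: c["comment_id"]):
        table[c["video"]].append(
            {
                "comment_id": c["comment_id"],
                "user_name": users[c["user_id"]]["name"],  # duplicated on purpose
                "text": c["text"],
            }
        )
    return table


comments_by_video = build_comments_by_video(users, comments)


def comments_for_video_denormalized(video):
    """One keyed lookup, no join: the read touches a single partition."""
    return [(row["user_name"], row["text"]) for row in comments_by_video[video]]
```

The cost of the denormalized form is duplicated data that has to be kept in sync on writes; the benefit is that each query is answered from one partition, which is the trade a distributed database generally wants you to make.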
Application Development
For application development, focusing on the Java language and technology stack, we’ll start with some concepts around how the drivers connect to and work with Apache Cassandra. We’ll open up some code too, and get into some code changes and additions, to get more familiar with how the driver works and some of the capabilities of the driver itself.
Most of the code, concepts, and related material in use around Java and the tech stack are directly usable on C#, JavaScript, and even using the community open source Go CQL Library.
Coming soon…
In the coming weeks (ok, maybe a month or two) we’ll be updating this material for Apache Cassandra v4 and additionally, I’m aiming to line up some half day and probably some full day workshops in the Cascadian region: Portland, Seattle, and Vancouver BC. They’ll be almost identical except for a few tweaks, but you’ll have to RSVP to find out the details!
Also, if you’re in between any of those cities and have a stop on the Amtrak Cascades, let me know and we’ll get an RSVP list started for your city and see if we can get the required attendee count to make it official!