Cassie Schema Migrator >> CaSMa

A few weeks back I started working on a schema migration tool for Apache Cassandra and DataStax Enterprise. Just for context, here are the short definitions of what each of the elements of CaSMa are.

  • cstar-iconApache Cassandra
    • Definition: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
    • History: Avinash Lakshman, one of the authors of Amazon’s Dynamo, and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google code in July 2008. In March 2009 it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. Facebook developers named their database after the Trojan mythological prophet Cassandra, with classical allusions to a curse on an oracle.
  • dse-logoDataStax Enterprise
    • Definition: DataStax Enterprise, or routinely just referred to as DSE, is an extended version of Apache Cassandra with multi-model capabilities around graph, search, analytics, and other features like security capabilities and a core data engine 2x speed improvement.
    • History: DataStax was formed in 2009 by Jonathan Ellis and Matt Pfeil and originally named Riptano. In 2011 Riptano changes names to DataStax. For more history check out the Wikipedia page or company page for a timeline of events.
  • command-toolsSchema Migration
    • Definition:In software engineering, schema migration (also database migration, database change management) refers to the management of incremental, reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database’s schema to some newer or older version. Migrations are generally performed programmatically by using a schema migration tool. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state.
    • Addition reference and related materials:

iconmonstr-twitch-5Over the next dozen weeks or so as I work on this application via the DataStax Devs Twitch stream (next coding session events list) I’ll also be posting some blog posts in parallel about schema migration and my intent to expand on the notion of schema migration specifically for multi-model databases and larger scale NoSQL systems; namely Apache Cassandra and DataStax Enterprise. Here’s a shortlist for the next three episodes;

The other important pieces include the current code base on Github, the continuous integration build, and the tasks and issues.

Alright, now that all the collateral and context is listed, let’s get into at a high level what this is all about.

CaSMa’s Mission

Schema migration is a powerful tool to get a project on track and consistently deployed and development working against the core database(s). However, it’s largely entrenched in the relational database realm. This means it’s almost entirely focused on a schema with the notions of primary and foreign keys, the complexities around many to many relationships, indexes, and other errata that needs to be built consistently for a relational database. Many of those things need to be built for a distributed columnar store, key value, graph, time series, or a million other possibilities too. However, in our current data schema world, that tooling isn’t always readily available.

The mission of CaSMa is to first resolve this gap around schema migration, first and foremost for Apache Cassandra and prospectively in turn for DataStax Enterprise and then onward for other database systems. Then the mission will continue around multi-model systems that should, can, and ought to take advantage of schema migration for graph, and related schema modeling. At some point the mission will expand to include other schema, data, and state management focused around software development and data needs within that state

As progress continues I’ll publish additional posts here on the different data model concepts and nature behind various multi-model database options. These modeling options will put us in a position to work consistently, context based, and seamlessly with ongoing development efforts. In addition to all this, there will be the weekly Twitch sessions where I’ll get into coding and reviewing what coding I’ve done off camera too. Check those out on the DataStax Devs Channel.

If you’d like to get into the project and help out just ping me via Twitter @Adron or message me here.

Conflicts of Building a Real World Example Application Starter Kit

As I dug through a number of JavaScript user interface frameworks lately, reading a number of posts, and building a more informed opinion. All this to decide which one I should use for a sample application for some starter kits. One post I read hit home that it does need to be a bit more complex than a todo app.

However, I’m still starting with a todo app anyway, but it’s going to turn into something else that is much more than a mere todo app. In this post I’m going to write up some of those larger plans and what complexities lie in wait – dragons are indeed there – for this more extensive real world app.

Modernizing Real World US Passenger Rail Ticket Sales!

Ok, I picked this topic since it is one of the things I find frustrating in the United States. The passenger rail systems, pretty much all of them, are barely better than many 3rd world countries, let alone the developed nations. One of those elements that the United States falls far behind on is an effective, efficient, accurate, and useful ticketing and seat assignment system. Let’s talk about this particular problem for a moment and you’ll start to visualize the problems that exist with the current system.

The Problem(s): Train Seating Options

Siemens Charger engine waiting with Talgo train.
Siemens Charger engine waiting with Talgo train.

Getting people on and off of a transport system like a train, airplane, ferry, or other mode of transport isn’t a simple process. However, many times it doesn’t have to be as complex, wrought with error, confusion, or disarray as there often is in the United States. Let’s step back and focus on one particular set of trains, the four particular trains that leave form King Street Station in Seattle, Washington on an almost daily basis.

  1. Sound Transit Sounder – [Stations] [Fares] [Wikipedia] This is a commuter route that has two lines:
    1. North Line – Seattle to Everett.
    2. South Line – Seattle to Tacoma, then onward to Lakewood.
  2. Amtrak Cascades – [Wikipedia] Seattle is one of the major stops on the Cascades route, which starts in Eugene down in Oregon and traverses all the way into Canada to Vancouver.
  3. Amtrak Empire Build – [Wikipedia] This is one of the two Superliner cross country overnight trains that leaves Seattle, connects with a sister train in Spokane everyday from Portland, and then combines and travels all the way to Chicago!
  4. Amtrak Coast Starlight – [Wikipedia] This is one of the other Superliner cross country overnight trains. It departs from Seattle, travels south with a number of stops and eventually ends in Los Angeles.

These four trains use specific train equipment with a particular accommodations for ticket sales.

One of the Amtrak Superliner Coach Car's seating layout.
One of the Amtrak Superliner Coach Car’s seating layout. (Images found here)

The Sounder provides tickets via the Sound Transit System in the area, which is a relatively cheap, non-reserved seat, heavily used train. Often there’s standing room only. It’s one of those things, that if one could purchase a ticket and know if they’re getting a seat, or if the train is full or not, that would encourage or discourage use accordingly. Currently, you buy a ticket and just get on. Rarely are they even checked, there is no gated entry, it’s basically a free for all.

The Amtrak Cascades are a reserved seat system. You purchase a ticket with the contract agreement that you will be provided a seat – either business class or regular – upon boarding. Emphasis on upon boarding as this can cause great confusion when entering the station and attempting to determine how to pick up these seat assignments even though you’ve already purchased a ticket. It adds time to boarding, requires the train sits waiting longer, and passengers have to arrive much earlier than the train departure. Albeit, just for context this earlier arrival (~20-30 minutes before) is nothing compared to the horrors of airports (2 hour suggested arrival before departure), it’s still unnecessary if modern systems were used to provide a streamlined and more efficient boarding process.

Amtrak Empire Builder
Amtrak Empire Builder

The Amtrak Empire Builder and Coast Starlight are currently an interesting mix. Both trains have sleeping accommodations that give a reserved room number before boarding. A very efficient process indeed, something to aim for. Since one knows the car number and room number, one could theoretically just board without even being guided. The rest of the seats however, some 200-300 or more of them depending on the train, are reserved seats albeit one doesn’t receive the seat assignment until they arrive at the station. Again, causing unnecessary chaos.

The Problem(s): Technology Deeper Dive

Problem: Passenger Navigation to Seat Reservation

Amtrak Cascades Bistro
Amtrak Cascades Bistro

Every single one of the trains listed above: Amtrak Empire Builder, Amtrak Coast Starlight, Amtrak Cascades, and Sound Transit Sounder all have some similar characteristics that would make it cheap and relatively easy to implement a ticketing and seat reservation system. In all of the train equipment, whether Sounder Bombardier, Superliner, or Talgo Amtrak Cascades there are seat numbers and car numbers. This provides us a core basis in which to work, to make all of this processing much easier.

At each station where these trains stop, each car of each train stops at a particular point – or could be made to stop at a particular point – at each station. The Sounder trains for example all have floor mats at the station that read “Welcome Aboard”! This is another element we could use to navigate a particular seat reservation. Automating the process of not just assigning a seat, but providing the information on each ticket for where and exactly when each passenger should arrive at a particular point at the station.

Since the cars and stations all have known characteristics about where to be, where the train will arrive and depart from, and what car number and door position is at this can all be automated per train. This is a repeatable process. Something that easily meets the exact definition of why we build computer systems and automate things with computer systems!

Problem: Equipment Changes, Modifiable Trains

Sometimes I’ve had conversations with what might change within the system. Almost all changes with a rail system are very known. From a disaster all the way to a simple everyday equipment change. For example, the train arriving may have an extra coach car or sleeper car on the Coast Starlight for some reason. Since we can build a system to model around the specific vehicles, and the vehicles numbers on a train can easily be set these changes can extrapolate out to tickets so they can be accurately assigned by a computer the day of. Changing equipment may take multiple minutes in the rail yard, but in the computer it’s a few keystrokes and it’s done. All tickets re-assigned, everything rebalanced, it’s almost as magical as a distributed database.

Problem: Common Concurrency, Purchasing, and Related Issues

There are also a number of issues a proper ticketing and reservation system would have to cover, such as managing for multiple people attempting to buy the same seat at relatively the same time. A locking and concurrency mechanism will be needed, something that’s been solved before, so appropriate planning around this will solve the issue.

There are of course timing issues too, once a ticket is locked, eventing within the system should unlock it appropriately. These event based timers will be an interesting challenge too. Solved already, but fun that they’ll need solved again specifically for this system!

Problem: Or Feature “See a Mountain”?

Aerial view of mount Rainier
Aerial view of mount Rainier

Some other things I’ve pondered include, the selling of some seats as choice preferences. For example, for the Empire Builder, Coast Starlight, and Cascades trains each have specific views that are easier or harder to see depending on the side of the train the accommodations are for. An example, if you’re facing west on the Coast Starlight you get all of the ocean views in southern California. If you’re on the east side, you get views of all the mountains like Rainier (see above picture!) and even Shasta if there is a full moon. Depending on these views and related characteristics, I’d happily pay a few bucks more to ensure I get a specific assignment or get to pick a specific assignment, so why not offer the ability to choose the seat for a specific fare?

IMG_4090-XL
The Puget Sound, traveling north out of Seattle on the Amtrak Cascades or Sound Transit Sounder north line.

Summary  & Next Steps

Summary – This is post one of many about the very distributed nature of purchasing tickets for one of the trains into and out of the city. As comparison with my todo app, this will definitely provide a very real world application option indeed! As soon as I wrap up the initial todo app samples, just to get started and provide details on how to get started I’m going to move on to building a real, real world applications sample, so real that it could be implemented by Sound Transit, Brightline, Virgin Rail, SNCF’s TGV, Germany’s ICE, or even good ole’ Amtrak here in the United States.

Next Steps – Next up I’m going to finish up the todo applications, with the notion that they provide some starting points for people but also for this more complex real world application. I’ll also add some more details and thoughts, and would love to converse, discuss, contributions, or co-hack on this project. Maybe you’ll join me, onward, and may you enjoy this flanged wheel ride and code slinging adventure!

Jonathan Ellis talks about Five Lessons in Distributed Databases

Notes on the talk…

  1. If it’s not SQL it’s not a database. Watch, you’ll get to hear why… ha!

Then Jonathan covers the recent history (sort of recent, the last ~20ish years) of the industry and how we’ve gotten to this point in database technology.

  1. It takes 5+ years to build a database.

Also the tens of millions of dollars with that period of time. Both are needed, in droves, time and money.

…more below the video.

  1. The customer is always right.

Even when they’re clearly wrong, they’re largely right.

For number 4 and 5 you’ll have to watch the video. Lot’s good stuff in this video including comparisons of Cosmos, Dynamo DB, Apache Cassandra, DataStax Enterprise, and how these distributed databases work, their performance (3rd Party metrics are shown) and more details!

Cassandra Datacenter & Racks

The last post in this series is Distributed Database Things to Know: Consistent Hashing.

Let’s talk about the analogy of Apache Cassandra Datacenter & Racks to actual datacenter and racks. I kind of enjoy the use of the terms datacenter and racks to describe architectural elements of Cassandra. However, as time moves on the relationship between these terms and why they’re called datacenter and racks can be obfuscated.

Take for instance, a datacenter could just be a cloud provider, an actual physical datacenter location, a zone in Azure, or region in some other provider. What an actual Datacenter in Cassandra parlance actually is can vary, but the origins of why it’s called a Datacenter remains the same. The elements of racks also can vary, but also remain the same.

Origins: Racks & Datacenters?

Let’s cover the actual things in this industry we call datacenter and racks first, unrelated to Apache Cassandra terms.

Racks: The easiest way to describe a physical rack is to show pictures of datacenter racks via the ole’ Google images.

racks.png

A rack is something that is located in a data-center, or even just someone’s garage in some odd scenarios. Ya know, if somebody wants serious hardware to work with. The rack then has a number of servers, often various kinds, within that rack itself. As you can see from the images above there’s a wide range of these racks.

Datacenter: Again the easiest way to describe a datacenter is to just look at a bunch of pictures of datacenter, albeit you see lots of racks again. But really, that’s what a datacenter is, is a building that has lots and lots of racks.

data-center.png

However in Apache Cassandra (and respectively DataStax Enterprise products) a datacenter and rack do not directly correlate to a physical rack or datacenter. The idea is more of an abstraction than hard mapping to the physical realm. In turn it is better to think of datacenter and racks as a way to structure and organize your DataStax Enterprise or Apache Cassandra architecture. From a tree perspective of organizing your cluster, think of things in this hierarchy.

  • Cluster
    • Datacenter(s)
      • Rack(s)
        • Server(s)
          • Node (vnode)

Apache Cassandra Datacenter

An Apache Cassandra Datacenter is a group of nodes, related and configured within a cluster for replication purposes. Setting up a specific set of related nodes into a datacenter helps to reduce latency, prevent transactions from impact by other workloads, and related effects. The replication factor can also be setup to write to multiple datacenter, providing additional flexibility in architectural design and organization. One specific element of datacenter to note is that they must contain only one node type:

Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations.Each datacenter usually contains only one node type. The node types are:

  • Transactional: Previously referred to as a Cassandra node.
  • DSE Graph: A graph database for managing, analyzing, and searching highly-connected data.
  • DSE Analytics: Integration with Apache Spark.
  • DSE Search: Integration with Apache Solr. Previously referred to as a Solr node.
  • DSE SearchAnalytics: DSE Search queries within DSE Analytics jobs.

Apache Cassandra Racks

An Apache Cassandra Rack is a grouped set of servers. The architecture of Cassandra uses racks so that no replica is stored redundantly inside a singular rack, ensuring that replicas are spread around through different racks in case one rack goes down. Within a datacenter there could be multiple racks with multiple servers, as the hierarchy shown above would dictate.

To determine where data goes within a rack or sets of racks Apache Cassandra uses what is referred to as a snitch. A snitch determines which racks and datacenter a particular node belongs to, and by respect of that, determines where the replicas of data will end up. This replication strategy which is informed by the snitch can take the form of numerous kinds of snitches, some examples include;

  • SimpleSnitch – this snitch treats order as proximity. This is primarily only used when in a single-datacenter deployment.
  • Dynamic Snitching – the dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down.
  • RackInferringSnitch – Proximity is determined by rack and datacenter, assumed corresponding to 3rd and 2nd octet of each node’s IP address. This particular snitch is often used as an example for writing a custom snitch class since it isn’t particularly useful unless it happens to match one’s deployment conventions.

In the future I’ll outline a few more snitches, how some of them work with more specific detail, and I’ll get into a whole selection of other topics. Be sure to subscribe to the blog, the ole’ RSS feed works great too, and follow @CompositeCode for blog updates. For discourse and hot takes follow me @Adron.

Distributed Database Things to Know Series

  1. Consistent Hashing
  2. Apache Cassandra Datacenter & Racks (this post)

 

Consistent Hashing

I wrote about consistent hashing in 2013 when I worked at Basho, when I had started a series called “Learning About Distributed Databases” and today I’m kicking that back off after a few years (ok, after 5 or so years!) with this post on consistent hashing.

As with Riak, which I wrote about in 2013, Cassandra remains one of the core active distributed database projects alive today that provides an effective and reliable consistent hash ring for the clustered distributed database system. This hash function, is an algorithm that maps data to variable length to data that’s fixed. This consistent hash is a kind of hashing that provides this pattern for mapping keys to particular nodes around the ring in Cassandra. One can think of this as a kind of Dewey Decimal Classification system where the cluster nodes are the various bookshelves in the library.

Ok, so maybe the Dewey Decimal system isn’t the best analogy. Does anybody even learn about that any more? If you don’t know what it is, please read up and support your local library.

Consistent hashing allows data distributed across a cluster to minimize reorganization when nodes are added or removed. These partitions are based on a particular partition key. The partition key shouldn’t be confused with a primary key either, it’s more like a unique identifier controlled by the system that would make up part of a primary key of a primary key that is made up of multiple candidate keys in a composite key.

For an example, let’s take a look at sample data from the DataStax docs on consistent hashing.

For example, if you have the following data:

 
name age car gender
jim 36 camaro M
carol 37 345s F
johnny 12 supra M
suzy 10 mustang F

The database assigns a hash value to each partition key:

 
Partition key Murmur3 hash value
jim -2245462676723223822
carol 7723358927203680754
johnny -6723372854036780875
suzy 1168604627387940318

Each node in the cluster is responsible for a range of data based on the hash value.

Hash values in a four node cluster

DataStax Enterprise places the data on each node according to the value of the partition key and the range that the node is responsible for. For example, in a four node cluster, the data in this example is distributed as follows:

Node Start range End range Partition key Hash value
1 -9223372036854775808 -4611686018427387904 johnny -6723372854036780875
2 -4611686018427387903 -1 jim -2245462676723223822
3 0 4611686018427387903 suzy 1168604627387940318
4 4611686018427387904 9223372036854775807 carol 7723358927203680754

So there ya go, that’s consistent hashing and how it works in a distributed database like Apache Cassandra, the derived distributed database DataStax Enterprise, or the mostly defunct RIP Riak. If you’d like to dig in further, I’ve also found Distributed Hash Tables interesting and also a host of other articles that delve into coding up a consistent has table, respective ring, and the whole enchilada. Check out these articles for more information and details:

    • Simple Magic Consistent by Mathias Meyer @roidrage CTO of Travis CI. Mathias’s post is well written and drives home some good points.
    • Consistent Hashing: Algorithmic Tradeoffs by Damien Gryski @dgryski. This post from Damien is pretty intense, and if you want code, he’s got code for ya.
    • How Ably Efficiently Implemented Consistent Hashing by Srushtika Neelakantam. Srushtika does a great job not only of describing what consistent hashing is but also has drawn up diagrams, charts, and more to visualize what is going on. But that isn’t all, she also wrote up some code to show nodes coming and going. A really great post.

For more on distributed database things to know subscribe to the blog, of course the ole’ RSS feed works great too, and follow @CompositeCode on Twitter for blog updates.

Distributed Database Things to Know Series

  1. Consistent Hashing (this post)
  2. Apache Cassandra Datacenter & Racks