Tag Archives: distributed databases

Conflicts of Building a Real World Example Application Starter Kit

As I dug through a number of JavaScript user interface frameworks lately, reading a number of posts, and building a more informed opinion. All this to decide which one I should use for a sample application for some starter kits. One post I read hit home that it does need to be a bit more complex than a todo app.

However, I’m still starting with a todo app anyway, but it’s going to turn into something else that is much more than a mere todo app. In this post I’m going to write up some of those larger plans and what complexities lie in wait – dragons are indeed there – for this more extensive real world app.

Modernizing Real World US Passenger Rail Ticket Sales!

Ok, I picked this topic since it is one of the things I find frustrating in the United States. The passenger rail systems, pretty much all of them, are barely better than many 3rd world countries, let alone the developed nations. One of those elements that the United States falls far behind on is an effective, efficient, accurate, and useful ticketing and seat assignment system. Let’s talk about this particular problem for a moment and you’ll start to visualize the problems that exist with the current system.

The Problem(s): Train Seating Options

Siemens Charger engine waiting with Talgo train.

Siemens Charger engine waiting with Talgo train.

Getting people on and off of a transport system like a train, airplane, ferry, or other mode of transport isn’t a simple process. However, many times it doesn’t have to be as complex, wrought with error, confusion, or disarray as there often is in the United States. Let’s step back and focus on one particular set of trains, the four particular trains that leave form King Street Station in Seattle, Washington on an almost daily basis.

  1. Sound Transit Sounder – [Stations] [Fares] [Wikipedia] This is a commuter route that has two lines:
    1. North Line – Seattle to Everett.
    2. South Line – Seattle to Tacoma, then onward to Lakewood.
  2. Amtrak Cascades – [Wikipedia] Seattle is one of the major stops on the Cascades route, which starts in Eugene down in Oregon and traverses all the way into Canada to Vancouver.
  3. Amtrak Empire Build – [Wikipedia] This is one of the two Superliner cross country overnight trains that leaves Seattle, connects with a sister train in Spokane everyday from Portland, and then combines and travels all the way to Chicago!
  4. Amtrak Coast Starlight – [Wikipedia] This is one of the other Superliner cross country overnight trains. It departs from Seattle, travels south with a number of stops and eventually ends in Los Angeles.

These four trains use specific train equipment with a particular accommodations for ticket sales.

One of the Amtrak Superliner Coach Car's seating layout.

One of the Amtrak Superliner Coach Car’s seating layout. (Images found here)

The Sounder provides tickets via the Sound Transit System in the area, which is a relatively cheap, non-reserved seat, heavily used train. Often there’s standing room only. It’s one of those things, that if one could purchase a ticket and know if they’re getting a seat, or if the train is full or not, that would encourage or discourage use accordingly. Currently, you buy a ticket and just get on. Rarely are they even checked, there is no gated entry, it’s basically a free for all.

The Amtrak Cascades are a reserved seat system. You purchase a ticket with the contract agreement that you will be provided a seat – either business class or regular – upon boarding. Emphasis on upon boarding as this can cause great confusion when entering the station and attempting to determine how to pick up these seat assignments even though you’ve already purchased a ticket. It adds time to boarding, requires the train sits waiting longer, and passengers have to arrive much earlier than the train departure. Albeit, just for context this earlier arrival (~20-30 minutes before) is nothing compared to the horrors of airports (2 hour suggested arrival before departure), it’s still unnecessary if modern systems were used to provide a streamlined and more efficient boarding process.

Amtrak Empire Builder

Amtrak Empire Builder

The Amtrak Empire Builder and Coast Starlight are currently an interesting mix. Both trains have sleeping accommodations that give a reserved room number before boarding. A very efficient process indeed, something to aim for. Since one knows the car number and room number, one could theoretically just board without even being guided. The rest of the seats however, some 200-300 or more of them depending on the train, are reserved seats albeit one doesn’t receive the seat assignment until they arrive at the station. Again, causing unnecessary chaos.

The Problem(s): Technology Deeper Dive

Problem: Passenger Navigation to Seat Reservation

Amtrak Cascades Bistro

Amtrak Cascades Bistro

Every single one of the trains listed above: Amtrak Empire Builder, Amtrak Coast Starlight, Amtrak Cascades, and Sound Transit Sounder all have some similar characteristics that would make it cheap and relatively easy to implement a ticketing and seat reservation system. In all of the train equipment, whether Sounder Bombardier, Superliner, or Talgo Amtrak Cascades there are seat numbers and car numbers. This provides us a core basis in which to work, to make all of this processing much easier.

At each station where these trains stop, each car of each train stops at a particular point – or could be made to stop at a particular point – at each station. The Sounder trains for example all have floor mats at the station that read “Welcome Aboard”! This is another element we could use to navigate a particular seat reservation. Automating the process of not just assigning a seat, but providing the information on each ticket for where and exactly when each passenger should arrive at a particular point at the station.

Since the cars and stations all have known characteristics about where to be, where the train will arrive and depart from, and what car number and door position is at this can all be automated per train. This is a repeatable process. Something that easily meets the exact definition of why we build computer systems and automate things with computer systems!

Problem: Equipment Changes, Modifiable Trains

Sometimes I’ve had conversations with what might change within the system. Almost all changes with a rail system are very known. From a disaster all the way to a simple everyday equipment change. For example, the train arriving may have an extra coach car or sleeper car on the Coast Starlight for some reason. Since we can build a system to model around the specific vehicles, and the vehicles numbers on a train can easily be set these changes can extrapolate out to tickets so they can be accurately assigned by a computer the day of. Changing equipment may take multiple minutes in the rail yard, but in the computer it’s a few keystrokes and it’s done. All tickets re-assigned, everything rebalanced, it’s almost as magical as a distributed database.

Problem: Common Concurrency, Purchasing, and Related Issues

There are also a number of issues a proper ticketing and reservation system would have to cover, such as managing for multiple people attempting to buy the same seat at relatively the same time. A locking and concurrency mechanism will be needed, something that’s been solved before, so appropriate planning around this will solve the issue.

There are of course timing issues too, once a ticket is locked, eventing within the system should unlock it appropriately. These event based timers will be an interesting challenge too. Solved already, but fun that they’ll need solved again specifically for this system!

Problem: Or Feature “See a Mountain”?

Aerial view of mount Rainier

Aerial view of mount Rainier

Some other things I’ve pondered include, the selling of some seats as choice preferences. For example, for the Empire Builder, Coast Starlight, and Cascades trains each have specific views that are easier or harder to see depending on the side of the train the accommodations are for. An example, if you’re facing west on the Coast Starlight you get all of the ocean views in southern California. If you’re on the east side, you get views of all the mountains like Rainier (see above picture!) and even Shasta if there is a full moon. Depending on these views and related characteristics, I’d happily pay a few bucks more to ensure I get a specific assignment or get to pick a specific assignment, so why not offer the ability to choose the seat for a specific fare?


The Puget Sound, traveling north out of Seattle on the Amtrak Cascades or Sound Transit Sounder north line.

Summary  & Next Steps

Summary – This is post one of many about the very distributed nature of purchasing tickets for one of the trains into and out of the city. As comparison with my todo app, this will definitely provide a very real world application option indeed! As soon as I wrap up the initial todo app samples, just to get started and provide details on how to get started I’m going to move on to building a real, real world applications sample, so real that it could be implemented by Sound Transit, Brightline, Virgin Rail, SNCF’s TGV, Germany’s ICE, or even good ole’ Amtrak here in the United States.

Next Steps – Next up I’m going to finish up the todo applications, with the notion that they provide some starting points for people but also for this more complex real world application. I’ll also add some more details and thoughts, and would love to converse, discuss, contributions, or co-hack on this project. Maybe you’ll join me, onward, and may you enjoy this flanged wheel ride and code slinging adventure!

Jonathan Ellis talks about Five Lessons in Distributed Databases

Notes on the talk…

  1. If it’s not SQL it’s not a database. Watch, you’ll get to hear why… ha!

Then Jonathan covers the recent history (sort of recent, the last ~20ish years) of the industry and how we’ve gotten to this point in database technology.

  1. It takes 5+ years to build a database.

Also the tens of millions of dollars with that period of time. Both are needed, in droves, time and money.

…more below the video.

  1. The customer is always right.

Even when they’re clearly wrong, they’re largely right.

For number 4 and 5 you’ll have to watch the video. Lot’s good stuff in this video including comparisons of Cosmos, Dynamo DB, Apache Cassandra, DataStax Enterprise, and how these distributed databases work, their performance (3rd Party metrics are shown) and more details!

Cassandra Datacenter & Racks

The last post in this series is Distributed Database Things to Know: Consistent Hashing.

Let’s talk about the analogy of Apache Cassandra Datacenter & Racks to actual datacenter and racks. I kind of enjoy the use of the terms datacenter and racks to describe architectural elements of Cassandra. However, as time moves on the relationship between these terms and why they’re called datacenter and racks can be obfuscated.

Take for instance, a datacenter could just be a cloud provider, an actual physical datacenter location, a zone in Azure, or region in some other provider. What an actual Datacenter in Cassandra parlance actually is can vary, but the origins of why it’s called a Datacenter remains the same. The elements of racks also can vary, but also remain the same.

Origins: Racks & Datacenters?

Let’s cover the actual things in this industry we call datacenter and racks first, unrelated to Apache Cassandra terms.

Racks: The easiest way to describe a physical rack is to show pictures of datacenter racks via the ole’ Google images.


A rack is something that is located in a data-center, or even just someone’s garage in some odd scenarios. Ya know, if somebody wants serious hardware to work with. The rack then has a number of servers, often various kinds, within that rack itself. As you can see from the images above there’s a wide range of these racks.

Datacenter: Again the easiest way to describe a datacenter is to just look at a bunch of pictures of datacenter, albeit you see lots of racks again. But really, that’s what a datacenter is, is a building that has lots and lots of racks.


However in Apache Cassandra (and respectively DataStax Enterprise products) a datacenter and rack do not directly correlate to a physical rack or datacenter. The idea is more of an abstraction than hard mapping to the physical realm. In turn it is better to think of datacenter and racks as a way to structure and organize your DataStax Enterprise or Apache Cassandra architecture. From a tree perspective of organizing your cluster, think of things in this hierarchy.

  • Cluster
    • Datacenter(s)
      • Rack(s)
        • Server(s)
          • Node (vnode)

Apache Cassandra Datacenter

An Apache Cassandra Datacenter is a group of nodes, related and configured within a cluster for replication purposes. Setting up a specific set of related nodes into a datacenter helps to reduce latency, prevent transactions from impact by other workloads, and related effects. The replication factor can also be setup to write to multiple datacenter, providing additional flexibility in architectural design and organization. One specific element of datacenter to note is that they must contain only one node type:

Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations.Each datacenter usually contains only one node type. The node types are:

  • Transactional: Previously referred to as a Cassandra node.
  • DSE Graph: A graph database for managing, analyzing, and searching highly-connected data.
  • DSE Analytics: Integration with Apache Spark.
  • DSE Search: Integration with Apache Solr. Previously referred to as a Solr node.
  • DSE SearchAnalytics: DSE Search queries within DSE Analytics jobs.

Apache Cassandra Racks

An Apache Cassandra Rack is a grouped set of servers. The architecture of Cassandra uses racks so that no replica is stored redundantly inside a singular rack, ensuring that replicas are spread around through different racks in case one rack goes down. Within a datacenter there could be multiple racks with multiple servers, as the hierarchy shown above would dictate.

To determine where data goes within a rack or sets of racks Apache Cassandra uses what is referred to as a snitch. A snitch determines which racks and datacenter a particular node belongs to, and by respect of that, determines where the replicas of data will end up. This replication strategy which is informed by the snitch can take the form of numerous kinds of snitches, some examples include;

  • SimpleSnitch – this snitch treats order as proximity. This is primarily only used when in a single-datacenter deployment.
  • Dynamic Snitching – the dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down.
  • RackInferringSnitch – Proximity is determined by rack and datacenter, assumed corresponding to 3rd and 2nd octet of each node’s IP address. This particular snitch is often used as an example for writing a custom snitch class since it isn’t particularly useful unless it happens to match one’s deployment conventions.

In the future I’ll outline a few more snitches, how some of them work with more specific detail, and I’ll get into a whole selection of other topics. Be sure to subscribe to the blog, the ole’ RSS feed works great too, and follow @CompositeCode for blog updates. For discourse and hot takes follow me @Adron.

Distributed Database Things to Know Series

  1. Consistent Hashing
  2. Apache Cassandra Datacenter & Racks (this post)


Consistent Hashing

I wrote about consistent hashing in 2013 when I worked at Basho, when I had started a series called “Learning About Distributed Databases” and today I’m kicking that back off after a few years (ok, after 5 or so years!) with this post on consistent hashing.

As with Riak, which I wrote about in 2013, Cassandra remains one of the core active distributed database projects alive today that provides an effective and reliable consistent hash ring for the clustered distributed database system. This hash function, is an algorithm that maps data to variable length to data that’s fixed. This consistent hash is a kind of hashing that provides this pattern for mapping keys to particular nodes around the ring in Cassandra. One can think of this as a kind of Dewey Decimal Classification system where the cluster nodes are the various bookshelves in the library.

Ok, so maybe the Dewey Decimal system isn’t the best analogy. Does anybody even learn about that any more? If you don’t know what it is, please read up and support your local library.

Consistent hashing allows data distributed across a cluster to minimize reorganization when nodes are added or removed. These partitions are based on a particular partition key. The partition key shouldn’t be confused with a primary key either, it’s more like a unique identifier controlled by the system that would make up part of a primary key of a primary key that is made up of multiple candidate keys in a composite key.

For an example, let’s take a look at sample data from the DataStax docs on consistent hashing.

For example, if you have the following data:

name age car gender
jim 36 camaro M
carol 37 345s F
johnny 12 supra M
suzy 10 mustang F

The database assigns a hash value to each partition key:

Partition key Murmur3 hash value
jim -2245462676723223822
carol 7723358927203680754
johnny -6723372854036780875
suzy 1168604627387940318

Each node in the cluster is responsible for a range of data based on the hash value.

Hash values in a four node cluster

DataStax Enterprise places the data on each node according to the value of the partition key and the range that the node is responsible for. For example, in a four node cluster, the data in this example is distributed as follows:

Node Start range End range Partition key Hash value
1 -9223372036854775808 -4611686018427387904 johnny -6723372854036780875
2 -4611686018427387903 -1 jim -2245462676723223822
3 0 4611686018427387903 suzy 1168604627387940318
4 4611686018427387904 9223372036854775807 carol 7723358927203680754

So there ya go, that’s consistent hashing and how it works in a distributed database like Apache Cassandra, the derived distributed database DataStax Enterprise, or the mostly defunct RIP Riak. If you’d like to dig in further, I’ve also found Distributed Hash Tables interesting and also a host of other articles that delve into coding up a consistent has table, respective ring, and the whole enchilada. Check out these articles for more information and details:

    • Simple Magic Consistent by Mathias Meyer @roidrage CTO of Travis CI. Mathias’s post is well written and drives home some good points.
    • Consistent Hashing: Algorithmic Tradeoffs by Damien Gryski @dgryski. This post from Damien is pretty intense, and if you want code, he’s got code for ya.
    • How Ably Efficiently Implemented Consistent Hashing by Srushtika Neelakantam. Srushtika does a great job not only of describing what consistent hashing is but also has drawn up diagrams, charts, and more to visualize what is going on. But that isn’t all, she also wrote up some code to show nodes coming and going. A really great post.

For more on distributed database things to know subscribe to the blog, of course the ole’ RSS feed works great too, and follow @CompositeCode on Twitter for blog updates.

Distributed Database Things to Know Series

  1. Consistent Hashing (this post)
  2. Apache Cassandra Datacenter & Racks


September & October Op & Dev Dis Sys Meetups Posted

I’m excited to announce several new speakers coming to Seattle. Meet Karthik Ramasamy, Joseph Jacks, and Luc Perkins. They’re going to cover a range of technologies, but to list just a few; Heron, messaging, queueing, streaming, Apache Cassandra, Apache Pulsar, Prometheus, Kubernetes, and others.

Everybody meet Karthik Ramasamy!


Karthik Ramasamy

Karthik Ramasamy is the co-founder of Streamlio that focuses on building next generation real time infrastructure. Before Streamlio, he was the engineering manager and technical lead for real-time infrastructure at Twitter where he co-created Twitter Heron. He has two decades of experience working with companies such as Teradata, Greenplum, and Juniper in their rapid growth stages building parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company that specializes in real-time streaming processing on Hadoop and Cassandra using SQL, that was acquired by Twitter. Karthik has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases. During his college tenure several of his research projects were later spun off as a company acquired by Teradata. Karthik is the author of several publications, patents, and Network Routing: Algorithms, Protocols and Architectures.

Presentation: Unifying Messaging, Queuing, Streaming & Light Weight Compute with Apache Pulsar

Data processing use cases, from transformation to analytics, perform tasks that require various combinations of queuing, streaming and lightweight processing steps. Until now, supporting all of those needs has required different systems for each task–stream processing engines, messaging queuing middleware, and streaming messaging systems. That has led to increased complexity for development and operations.

In this session, we’ll discuss the need to unify these capabilities in a single system and how Apache Pulsar was designed to address that. Apache Pulsar is a next generation distributed pub-sub system that was developed and deployed at Yahoo. Karthik, will explain how the architecture and design of Pulsar provides the flexibility to support developers and applications needing any combination of queuing, messaging, streaming and lightweight compute.

Everybody meet Joseph Jacks & Luc Perkins!


Joseph Jacks & Luc Perkins

More about Joseph


Joseph was the founder and organizer of KubeCon (the Kubernetes community conference, donated to and now run by the Linux Foundation’s CNCF). He also co-founded Kismatic (the first commercial open source Kubernetes tools and services company), acquired by Apprenda in 2016. Joseph previously worked at Enstratius Networks (acquired by Dell Software), TIBCO, and Talend (2016 IPO). He was also a founding strategy and product consultant at Mesosphere. Recently, Joseph served as a corporate EIR at Quantum Corporation in support of the Rook project. He currently serves as the co-founder and CEO of a new stealth technology startup.

More about Luc


Luc has joined the tech industry a few years back after a foray in choral tunes and thrashing guitar virtuosity. Educated at Reed in Portland Oregon and then on to Duke where he wrapped up. Then back to Portlandia and then joined AppFog for a bit working in he platform as a service world before delving into the complexities of distributed databases at Basho. Having working with Luc there along with Eric Redmond I wasn’t surprised to see Luc just release the 2nd edition of the Seven Databases in Seven Weeks book. Recently he also joined CNCF as a Developer Advocate after drifting through some time at Twitter and Streamli working on streaming & related distributed systems.

Presentation: Prometheus, Grafana, Kubernetes, and a Cassandra Cluster

Over the past few years, Prometheus has emerged as a best-of-breed OSS monitoring and observability solution. In this talk, I’ll walk you through setting up a full-fledged Prometheus setup for a Cassandra cluster running on Kubernetes, including Grafana dashboards, Alertmanager notifications via Slack, and more.

Presentations: Title TBD – Stay Tuned!

I’ll post more details on Joseph’s talk in the next couple of days. But you can get an idea that it’ll be some seriously interesting material!

RSVP to the Meetups Here

A Collection of Links & Tour of DataStax Docker Images

Another way to get up and running with a DataStax Enterprise 6 setup on your local machine is to use the available (and supported by DataStax) Docker images. For additional description of what each of the images is, what is contained on the images, I read up on via Kathryn Erickson’s (@012345) blog entry “DataStax Now Offering Docker Images for Development“. Also there’s a video Jeff Carpenter (@jscarp) put together which talks from the 5.x version of the release (since v6 wasn’t released at the time).

Continue reading

Riak Developer Guidance

The “Client Round Robin Anti-Pattern”

One of the features that is often available in Riak Client software (including the CorrguatedIron .NET Client, the riak-js client and others) is the ability to send requests to the Riak Cluster through a round robin style approach. What this means is each IP, of each node within the Riak Cluster is entered into a config file for the client. The client then goes through that list to send off requests to read, write or delete data in the database.

The client being responsible and knowledgeable about the data tier of the application in an architecture is an immediate red flag! The concept around SoC (Separation of Concerns) dictates that

“SoC is a principle for separating a computer program into distinct sections, such that each section addresses a separate concern.

Having the client provide a network tier layer to round robin communication with the database leaves us in a scenario that should be separated into individual concerns. Below is some basic guidance on eliminating this SoC issue.

  • Client ONLY sends and receives communication: The client, especially in the situation with a distributed system like Riak should only be dealing with sending and receiving information from the cluster or a facade that provides an interface for that cluster.
  • Another layer should deal with the network communication and division of nodes and node communication. Ideally, in the case or Riak, and most distributed systems this should be dealt with at the network device layer (router).
  • The network device (router) layer would ideally be able to have (through software likely) a way to automate the failure, inclusion or exclusion of nodes with the cluster system. If a node goes down, the network device should handle the immediate cessation of communication with that node from all clients, routing the communication accordingly to an active node.
  • The node itself needs to maintain a continual information state available to the network. Ideally the network state would identify any addition or removal of a node and if possible the immediate failure of a node. Of course it isn’t always possible to be informed of a failure, but the first line of defense should start within the cluster itself among the nodes.

The Anti-Pattern

Having the client handle all of these parts of the functional architecture leads to a number of problems, not merely that the guidance of the SoC concept is broken. With the client attempting to track and be aware of the individual nodes in the cluster, it sets the client with a huge responsibility.

Take for instance the riak-js client. If a node goes down the client will need to be aware of which node has gone down. For a few seconds (yes, you have to wait entire seconds at this level) the node will be gone and the client won’t know it is down. The client would just have to reasonably wait. When the communication times out, the client would then have to have the responsibility of marking that particular node as down. At this point the client must track which node it is in some type of data repository local to the client. The client must also set a time or some way to identify when the node comes back up. Several questions start to come up such as;

  • Does the client do an arbitrary test to determine when the node comes back up?
  • When the node comes back up is it considered alive or damaged?
  • How would the client manage the IP (or identifier) of the node that has gone down?
  • How long would the client store that the node is down?

The list of questions can get long pretty quick, thus the bad karma of not following a good practice around separating your concerns appropriately! One has to be careful, a god class might be right around the corner otherwise! That’s it for this quick journey into some distributed database usage guidelines. Until next, happy data sciencing.  😉