Cassie Schema Migrator >> CaSMa

A few weeks back I started working on a schema migration tool for Apache Cassandra and DataStax Enterprise. Just for context, here are the short definitions of what each of the elements of CaSMa are.

  • cstar-iconApache Cassandra
    • Definition: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
    • History: Avinash Lakshman, one of the authors of Amazon’s Dynamo, and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google code in July 2008. In March 2009 it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. Facebook developers named their database after the Trojan mythological prophet Cassandra, with classical allusions to a curse on an oracle.
  • dse-logoDataStax Enterprise
    • Definition: DataStax Enterprise, or routinely just referred to as DSE, is an extended version of Apache Cassandra with multi-model capabilities around graph, search, analytics, and other features like security capabilities and a core data engine 2x speed improvement.
    • History: DataStax was formed in 2009 by Jonathan Ellis and Matt Pfeil and originally named Riptano. In 2011 Riptano changes names to DataStax. For more history check out the Wikipedia page or company page for a timeline of events.
  • command-toolsSchema Migration
    • Definition:In software engineering, schema migration (also database migration, database change management) refers to the management of incremental, reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database’s schema to some newer or older version. Migrations are generally performed programmatically by using a schema migration tool. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state.
    • Addition reference and related materials:

iconmonstr-twitch-5Over the next dozen weeks or so as I work on this application via the DataStax Devs Twitch stream (next coding session events list) I’ll also be posting some blog posts in parallel about schema migration and my intent to expand on the notion of schema migration specifically for multi-model databases and larger scale NoSQL systems; namely Apache Cassandra and DataStax Enterprise. Here’s a shortlist for the next three episodes;

The other important pieces include the current code base on Github, the continuous integration build, and the tasks and issues.

Alright, now that all the collateral and context is listed, let’s get into at a high level what this is all about.

CaSMa’s Mission

Schema migration is a powerful tool to get a project on track and consistently deployed and development working against the core database(s). However, it’s largely entrenched in the relational database realm. This means it’s almost entirely focused on a schema with the notions of primary and foreign keys, the complexities around many to many relationships, indexes, and other errata that needs to be built consistently for a relational database. Many of those things need to be built for a distributed columnar store, key value, graph, time series, or a million other possibilities too. However, in our current data schema world, that tooling isn’t always readily available.

The mission of CaSMa is to first resolve this gap around schema migration, first and foremost for Apache Cassandra and prospectively in turn for DataStax Enterprise and then onward for other database systems. Then the mission will continue around multi-model systems that should, can, and ought to take advantage of schema migration for graph, and related schema modeling. At some point the mission will expand to include other schema, data, and state management focused around software development and data needs within that state

As progress continues I’ll publish additional posts here on the different data model concepts and nature behind various multi-model database options. These modeling options will put us in a position to work consistently, context based, and seamlessly with ongoing development efforts. In addition to all this, there will be the weekly Twitch sessions where I’ll get into coding and reviewing what coding I’ve done off camera too. Check those out on the DataStax Devs Channel.

If you’d like to get into the project and help out just ping me via Twitter @Adron or message me here.

Bunches of Databases in Bunches of Weeks – PostgreSQL Day 1

May the database deluge begin, it’s time for “Bunches of Databases in Bunches of Weeks”. We’ll get into looking at databases similar to how they’re approached in “7 Databases in 7 Weeks“. In this session I got into a hard look at PostgreSQL or as some refer to it just Postgres. This is the first of a few sessions on PostgreSQL in which I get the database installed locally on Ubuntu. Which is transferable to any other operating system really, PostgreSQL is awesome like that. Then after installing and getting pgAdmin 4, the user interface for PostgreSQL working against that, I go the Docker route. Again, pointing pgAdmin 4 at that and creating a database and an initial table.

Below the video here I’ve added the timeline and other details, links, and other pertinent information about this series.

0:00 – The intro image splice and metal intro with tunes..
3:34 – Start of the video database content.
4:34 – Beginning the local installation of Postgres/PostgreSQL on the local machine.
20:30 – Getting pgAdmin 4 installed on local machine.
24:20 – Taking a look at pgAdmin 4, a stroll through setting up a table, getting some basic SQL from and executing with pgAdmin 4.
1:00:05 – Installing Docker and getting PostgreSQL setup as a container!
1:00:36 – Added the link to the stellar post at Digital Ocean’s Blog.
1:00:55 – My declaration that if Digital Ocean just provided documentation I’d happily pay for it, their blog entries, tutorials, and docs are hands down some of the best on the web!
1:01:10 – Installing Postgesql on Ubuntu 18.04.
1:06:44 – Signing in to Docker hub and finding the official Postgresql Docker Image.
1:09:28 – Starting the container with Docker.
1:10:24 – Connecting to the Docker Postgresql Container with pgadmin4.
1:13:00 – Creating a database and working with SQL, tables, and other resources with pgAdmin4 against the Docker container.
1:16:03 – The hacker escape outtro. Happy thrashing code!

For each of these sessions for the “Bunches of Databases in Bunches of Weeks” series I’ll follow this following sequence. I’ll go through each database in this list of my top 7 databases for day 1 (see below), then will go through each database and work through the day 2, and so on. Accumulating additional days similarly to the “7 Databases in 7 Weeks

Day 1” of the Database, I’ll work toward building a development installation of the particular database. For example, in this session I setup PostgreSQL by installing it to the local machine and also pulled a Docker image to run PostgreSQL.

Day 2” of the respective database, I’ll get into working against the database with CQL, SQL, or whatever that one would use to work specifically with the database directly. At this point I’ll also get more deeply into the types, inserting, and storing data in the respective database.

Day 3” of the respective database, I’ll get into connecting an application with C#, Node.js, and Go. Implementing a simple connection, prospectively a test of the connection, and do a simple insert, update, and delete of some sort against the respective database built on the previous day 2 of the same database.

Day 4” and onward I’ll determine the path and layout of the topic later, so subscribe on YouTube and Twitch, and tune in. The events are scheduled, with the option to be notified when a particular episode is coming on that you’d like to watch here on Twitch.

Next Events for “Bunches of Databases in Bunches of Days

Cedrick Lunven on Creating an API for your database with Rest, GraphQL, gRPC

Here’s a talk Cedrick Lunven (who I have the fortune of working with!) about creating API’s for your database, your distributed database. He starts out with a few objectives for the talk:

  1. Provide you a working API implementing Rest, gRPC, and GraphQL.
  2. Give implementation details through Demo.
  3. Reveal hints to choose and WHY, (specifically to work with Databases)

Other topics include specific criteria around conceptual data models, shifting from relational to distributed columnar store, with differentiation between entities, relationships, queries, and their respective behaviors. All of this is pertinent to our Killrvideo reference application we have too.


A New Adventure of Multi-model Distribute Graph Time Series […etc…] Database(s) Explorations Begins!

I arrived at the airport, sending a few tweets of this or that nature with all of this Github and Microsoft News. I have a great view out the window from the Alaska Lounge just before heading to the D gates. For you aeronautics fans like myself, here’s a picture of that view and a few of those Alaska Planes with one of the newly acquired Virgin America Planes!


All this news with Github and Microsoft was easily eclipsing WWDC18 and in the meanwhile little ole’ me is on my way to a new adventure in my career. So priorities what they are, the news being excited, I’m more excited today to announce today I’m joining a most excellent team at DataStax! to bring forth investigation, research, knowledge, ideas, and whatever else I can as a Developer Evangelist with the crew here at DataStax! I’m unbelievably stoked as I’ve been searching for a company that would check all of my “will this work” check boxes for some months now! DataStax won out among the other prospective candidate companies and I’m starting today!


To kick off this adventure, I’m heading to San Francisco to join in the fun attending DevxCon. I’ll be there a little later today, hopefully in time for the kick off (ya know, pending flights and BART are all timely and such)! Then a full day of the conf, then later will join the team for a visit to DataStax HQ and maybe a few surprises. I’m super excited and ready to bring awesome content your way, while inventing, building, and experimenting my way through some awesome technologies!

Docker Tips n’ Tricks for Devs – #0001 – 3 Second to Postgresql

The easiest implementation of a docker container with Postgresql that I’ve found recently allows the following commands to pull and run a Postgresql server for you.

[sourcecode language=”bash”]
docker pull postgres:latest
docker run -p 5432:5432 postgres

Then you can just connect to the Postgresql Server by opening up pgadmin with the following connection information:

  • Host: localhost
  • Port: 5432
  • Maintenance DB: postgres
  • Username: postgres
  • Password:

With that information you’ll be able to connect and use this as a development database that only takes about 3 seconds to launch whenever you need it.