Cassie Schema Migrator >> CaSMa

A few weeks back I started working on a schema migration tool for Apache Cassandra and DataStax Enterprise. Just for context, here are the short definitions of what each of the elements of CaSMa are.

  • cstar-iconApache Cassandra
    • Definition: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
    • History: Avinash Lakshman, one of the authors of Amazon’s Dynamo, and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google code in July 2008. In March 2009 it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. Facebook developers named their database after the Trojan mythological prophet Cassandra, with classical allusions to a curse on an oracle.
  • dse-logoDataStax Enterprise
    • Definition: DataStax Enterprise, or routinely just referred to as DSE, is an extended version of Apache Cassandra with multi-model capabilities around graph, search, analytics, and other features like security capabilities and a core data engine 2x speed improvement.
    • History: DataStax was formed in 2009 by Jonathan Ellis and Matt Pfeil and originally named Riptano. In 2011 Riptano changes names to DataStax. For more history check out the Wikipedia page or company page for a timeline of events.
  • command-toolsSchema Migration
    • Definition:In software engineering, schema migration (also database migration, database change management) refers to the management of incremental, reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database’s schema to some newer or older version. Migrations are generally performed programmatically by using a schema migration tool. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state.
    • Addition reference and related materials:

iconmonstr-twitch-5Over the next dozen weeks or so as I work on this application via the DataStax Devs Twitch stream (next coding session events list) I’ll also be posting some blog posts in parallel about schema migration and my intent to expand on the notion of schema migration specifically for multi-model databases and larger scale NoSQL systems; namely Apache Cassandra and DataStax Enterprise. Here’s a shortlist for the next three episodes;

The other important pieces include the current code base on Github, the continuous integration build, and the tasks and issues.

Alright, now that all the collateral and context is listed, let’s get into at a high level what this is all about.

CaSMa’s Mission

Schema migration is a powerful tool to get a project on track and consistently deployed and development working against the core database(s). However, it’s largely entrenched in the relational database realm. This means it’s almost entirely focused on a schema with the notions of primary and foreign keys, the complexities around many to many relationships, indexes, and other errata that needs to be built consistently for a relational database. Many of those things need to be built for a distributed columnar store, key value, graph, time series, or a million other possibilities too. However, in our current data schema world, that tooling isn’t always readily available.

The mission of CaSMa is to first resolve this gap around schema migration, first and foremost for Apache Cassandra and prospectively in turn for DataStax Enterprise and then onward for other database systems. Then the mission will continue around multi-model systems that should, can, and ought to take advantage of schema migration for graph, and related schema modeling. At some point the mission will expand to include other schema, data, and state management focused around software development and data needs within that state

As progress continues I’ll publish additional posts here on the different data model concepts and nature behind various multi-model database options. These modeling options will put us in a position to work consistently, context based, and seamlessly with ongoing development efforts. In addition to all this, there will be the weekly Twitch sessions where I’ll get into coding and reviewing what coding I’ve done off camera too. Check those out on the DataStax Devs Channel.

If you’d like to get into the project and help out just ping me via Twitter @Adron or message me here.

Thrashing Code Twitch Schedule September 19th-October 2nd

I’ve got everything queued back up with some extra Thrashing Code Sessions and will have some on the rails travel streams. Here’s what the schedule looks like so far.

Today at 3pm PST (UPDATED: Sep 19th 2018)

thrashing-code-terraformUPDATED: Video available https://youtu.be/NmlzGKUnln4

I’m going to get back into the roll of things this session after the travels last week. In this session I’m aiming to do several things:

  1. Complete next steps toward getting a DataStax Enterprise Apache Cassandra cluster up and running via Terraform in Google Cloud Platform. My estimate is I’ll get to the point that I’ll have three instances that launch and will automate the installation of Cassandra on the three instances. Later I’ll aim to expand this, but for now I’m just going to deploy 3 nodes and then take it from there. Another future option is to bake the installation into a Packer deployed image and use it for the Terraform execution. Tune in to find out the steps and what I decide to go with.
  2. I’m going to pull up the InteroperabilityBlackBox and start to flesh out some objects for our model. The idea, is based around something I stumbled into last week during travels, the thread on that is here.

Friday (Today) @ 3pm PST

thrashing-code-gopherThis Friday I’m aiming to cover some Go basics before moving further into the Colligere CLI  app. Here are the highlights of the plan.

  1.  I’m going to cover some of the topics around program structure including: type declarations, tuple assignment, variable lifetime, pointers, and other variables.
  2.  I’m going to cover some basics on packages, initialization of packages, imports, and scope. This is an important aspect of ongoing development with Colligere since we’ll be pulling in a number of packages for generation of the data.
  3. Setting up configuration and schema for the Colligere application using Viper and related tooling.

Tuesday, October 2nd @ 3pm PST

thrashing-code-terraformThis session I’m aiming to get some more Terraform work done around the spin up and shutdown of the cluster. I’ll dig into some more specific points depending on where I progress to in sessions previous to this one. But it’s on the schedule, so I’ll update this one in the coming days.

 

DSE6 + .NET v?

Project Repo: Interoperability Black Box

First steps. Let’s get .NET installed and setup. I’m running Ubuntu 18.04 for this setup and start of project. To install .NET on Ubuntu one needs to go through a multi-command process of keys and some other stuff, fortunately Microsoft’s teams have made this almost easy by providing the commands for the various Linux distributions here. The commands I ran are as follows to get all this initial setup done.

[sourcecode language=”bash”]
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg –dearmor > microsoft.asc.gpg
sudo mv microsoft.asc.gpg /etc/apt/trusted.gpg.d/
wget -q https://packages.microsoft.com/config/ubuntu/18.04/prod.list
sudo mv prod.list /etc/apt/sources.list.d/microsoft-prod.list
sudo chown root:root /etc/apt/trusted.gpg.d/microsoft.asc.gpg
sudo chown root:root /etc/apt/sources.list.d/microsoft-prod.list
[/sourcecode]

After all this I could then install the .NET SDK. It’s been so long since I actually installed .NET on anything that I wasn’t sure if I just needed the runtime, the SDK, or what I’d actually need. I just assumed it would be safe to install the SDK and then install the runtime too.

[sourcecode language=”bash”]
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-2.1
[/sourcecode]

Then the runtime.

[sourcecode language=”bash”]
sudo apt-get install aspnetcore-runtime-2.1
[/sourcecode]

logoAlright. Now with this installed, I wanted to also see if Jetbrains Rider would detect – or at least what would I have to do – to have the IDE detect that .NET is now installed. So I opened up the IDE to see what the results would be. Over the left hand side of the new solution dialog, if anything isn’t installed Rider usually will display a message that X whatever needs installed. But it looked like everything is showing up as installed, “yay for things working (at this point)!

rider-01

Next up is to get a solution started with the pertinent projects for what I want to build.

dse2

Kazam_screenshot_00001

For the next stage I created three projects.

  1. InteroperationalBlackBox – A basic class library that will be used by a console application or whatever other application or service that may need access to the specific business logic or what not.
  2. InteroperationalBlackBox.Tests – An xunit testing project for testing anything that might need some good ole’ testing.
  3. InteroperationalBlackBox.Cli – A console application (CLI) that I’ll use to interact with the class library and add capabilities going forward.

Alright, now that all the basic projects are setup in the solution, I’ll go out and see about the .NET DataStax Enterprise driver. Inside Jetbrains Rider I can right click on a particular project that I want to add or manage dependencies for. I did that and then put “dse” in the search box. The dialog pops up from the bottom of the IDE and you can add it by clicking on the bottom right plus sign in the description box to the right. Once you click the plus sign, once installed, it becomes a little red x.

dse-adding-package

Alright. Now it’s almost time to get some code working. We need ourselves a database first however. I’m going to setup a cluster in Google Cloud Platform (GCP), but feel free to use whatever cluster you’ve got. These instructions will basically be reusable across wherever you’ve got your cluster setup. I wrote up a walk through and instructions for the GCP Marketplace a few weeks ago. I used the same offering to get this example cluster up and running to use. So, now back to getting the first snippets of code working.

Let’s write a test first.

[sourcecode language=”csharp”]
[Fact]
public void ConfirmDatabase_Connects_False()
{
var box = new BlackBox();
Assert.Equal(false, box.ConfirmConnection());
}
[/sourcecode]

In this test, I named the class called BlackBox and am planning to have a parameterless constructor. But as things go tests are very fluid, or ought to be, and I may change it in the next iteration. I’m thinking, at least to get started, that I’ll have a method to test and confirm a connection for the CLI. I’ve named it ConfirmConnection for that purpose. Initially I’m going to test for false, but that’s primarily just to get started. Now, time to implement.

[sourcecode language=”csharp”]
namespace InteroperabilityBlackBox
using System;
using Dse;
using Dse.Auth;

namespace InteroperabilityBlackBox
{
public class BlackBox
{
public BlackBox()
{}

public bool ConfirmConnection()
{
return false;
}
}
}
[/sourcecode]

That gives a passing test and I move forward. For more of the run through of moving from this first step to the finished code session check out this

By the end of the coding session I had a few tests.

[sourcecode language=”csharp”]
using Xunit;

namespace InteroperabilityBlackBox.Tests
{
public class MakingSureItWorksIntegrationTests
{
[Fact]
public void ConfirmDatabase_Connects_False()
{
var box = new BlackBox();
Assert.Equal(false, box.ConfirmConnection());
}

[Fact]
public void ConfirmDatabase_PassedValuesConnects_True()
{
var box = new BlackBox(“cassandra”, “”, “”);
Assert.Equal(false, box.ConfirmConnection());
}

[Fact]
public void ConfirmDatabase_PassedValuesConnects_False()
{
var box = new BlackBox(“cassandra”, “notThePassword”, “”);
Assert.Equal(false, box.ConfirmConnection());
}
}
}
[/sourcecode]

The respective code for connecting to the database cluster, per the walk through I wrote about here, at session end looked like this.

[sourcecode language=”csharp”]
using System;
using Dse;
using Dse.Auth;

namespace InteroperabilityBlackBox
{
public class BlackBox : IBoxConnection
{
public BlackBox(string username, string password, string contactPoint)
{
UserName = username;
Password = password;
ContactPoint = contactPoint;
}

public BlackBox()
{
UserName = “ConfigValueFromSecretsVault”;
Password = “ConfigValueFromSecretsVault”;
ContactPoint = “ConfigValue”;
}

public string ContactPoint { get; set; }
public string UserName { get; set; }
public string Password { get; set; }

public bool ConfirmConnection()
{
IDseCluster cluster = DseCluster.Builder()
.AddContactPoint(ContactPoint)
.WithAuthProvider(new DsePlainTextAuthProvider(UserName, Password))
.Build();

try
{
cluster.Connect();
return true;
}
catch (Exception e)
{
Console.WriteLine(e);
return false;
}

}
}
}
[/sourcecode]

With my interface providing the contract to meet.

[sourcecode language=”csharp”]
namespace InteroperabilityBlackBox
{
public interface IBoxConnection
{
string ContactPoint { get; set; }
string UserName { get; set; }
string Password { get; set; }
bool ConfirmConnection();
}
}
[/sourcecode]

Conclusions & Next Steps

After I wrapped up the session two things stood out that needed fixed for the next session. I’ll be sure to add these as objectives for the next coding session at 3pm PST on Thursday.

  1. The tests really needed to more resiliently confirm the integrations that I was working to prove out. My plan at this point is to add some Docker images that would provide the development integration tests a point to work against. This would alleviate the need for something outside of the actual project in the repository to exist. Removing that fragility.
  2. The application, in its “Black Box”, should do something. For the next session we’ll write up some feature requests we’d want, or maybe someone has some suggestions of functionality they’d like to see implemented in a CLI using .NET Core working against a DataStax Enterprise Cassandra Database Cluster? Feel free to leave a comment or three about a feature, I’ll work on adding it during the next session.

Chapter 2 in My Twitch Streaming

A while back I started down the path of getting a Twitch Channel started. At this point I’ve gotten a channel setup which I’ve dubbed Thrashing Code albeit it still just has “adronhall” all over it. I’ll get those details further refined as I work on it more.

Today I recorded a new Twitch stream about doing a twitch stream and created an edited video of all the pieces and cameras and angles. I could prospectively help people get started, it’s just my experiences so far and efforts to get everything connected right. The actual video stream recording is available, and I’ll leave it on the channel. However the video I edited will be available and I’ll post a link here.

Tomorrow will be my first official Twitch stream at 3pm PST. If you’re interested in watching check out my Twitch profile here follow and it’ll ping you when I go live. This first streaming session, or episode, or whatever you want to call it, will include a couple topics. I’ll be approaching these topics from that of someone just starting, so if you join help hold me to that! Don’t let me skip ahead or if you notice I left out something key please join and chat at me during the process. I want to make sure I’m covering all the bases as I step through toward achieving the key objectives. Which speaking of…

Tomorrow’s Mission Objectives

  1. Create a DataStax Enterprise Cassandra Cluster in Google Cloud Platform.
  2. Create a .NET project using the latest cross-platform magical bits that will have a library for abstracting the data source(s), a console interface for using the application, and of course a test project.
  3. Configure & connect to the distributed database cluster.

Mission Stretch Objectives

  1. Start a github repo to share the project with others.
  2. Setup some .github templates for feature request issues or related issues.
  3. Write up some Github Issue Feature requests and maybe even sdd some extra features to the CLI for…??? no idea ??? determine 2-3 during the Twitch stream.

If you’d like to follow along, here’s what I have installed. You’re welcome to a range of tooling to follow along with that is the same as what I’ve got here or a variance of other things. Feel free to bring up tooling if you’re curious about it via chat and I’ll answer questions where and when I can.

  • Ubuntu v18.04
  • .NET core v2.1
  • DataStax Enterprise v6