Tag Archives: database

Cassie Schema Migrator >> CaSMa

A few weeks back I started working on a schema migration tool for Apache Cassandra and DataStax Enterprise. Just for context, here are the short definitions of what each of the elements of CaSMa are.

  • cstar-iconApache Cassandra
    • Definition: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
    • History: Avinash Lakshman, one of the authors of Amazon’s Dynamo, and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google code in July 2008. In March 2009 it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. Facebook developers named their database after the Trojan mythological prophet Cassandra, with classical allusions to a curse on an oracle.
  • dse-logoDataStax Enterprise
    • Definition: DataStax Enterprise, or routinely just referred to as DSE, is an extended version of Apache Cassandra with multi-model capabilities around graph, search, analytics, and other features like security capabilities and a core data engine 2x speed improvement.
    • History: DataStax was formed in 2009 by Jonathan Ellis and Matt Pfeil and originally named Riptano. In 2011 Riptano changes names to DataStax. For more history check out the Wikipedia page or company page for a timeline of events.
  • command-toolsSchema Migration
    • Definition:In software engineering, schema migration (also database migration, database change management) refers to the management of incremental, reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database’s schema to some newer or older version. Migrations are generally performed programmatically by using a schema migration tool. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state.
    • Addition reference and related materials:

iconmonstr-twitch-5Over the next dozen weeks or so as I work on this application via the DataStax Devs Twitch stream (next coding session events list) I’ll also be posting some blog posts in parallel about schema migration and my intent to expand on the notion of schema migration specifically for multi-model databases and larger scale NoSQL systems; namely Apache Cassandra and DataStax Enterprise. Here’s a shortlist for the next three episodes;

The other important pieces include the current code base on Github, the continuous integration build, and the tasks and issues.

Alright, now that all the collateral and context is listed, let’s get into at a high level what this is all about.

CaSMa’s Mission

Schema migration is a powerful tool to get a project on track and consistently deployed and development working against the core database(s). However, it’s largely entrenched in the relational database realm. This means it’s almost entirely focused on a schema with the notions of primary and foreign keys, the complexities around many to many relationships, indexes, and other errata that needs to be built consistently for a relational database. Many of those things need to be built for a distributed columnar store, key value, graph, time series, or a million other possibilities too. However, in our current data schema world, that tooling isn’t always readily available.

The mission of CaSMa is to first resolve this gap around schema migration, first and foremost for Apache Cassandra and prospectively in turn for DataStax Enterprise and then onward for other database systems. Then the mission will continue around multi-model systems that should, can, and ought to take advantage of schema migration for graph, and related schema modeling. At some point the mission will expand to include other schema, data, and state management focused around software development and data needs within that state

As progress continues I’ll publish additional posts here on the different data model concepts and nature behind various multi-model database options. These modeling options will put us in a position to work consistently, context based, and seamlessly with ongoing development efforts. In addition to all this, there will be the weekly Twitch sessions where I’ll get into coding and reviewing what coding I’ve done off camera too. Check those out on the DataStax Devs Channel.

If you’d like to get into the project and help out just ping me via Twitter @Adron or message me here.

Bunches of Databases in Bunches of Weeks – PostgreSQL Day 1

May the database deluge begin, it’s time for “Bunches of Databases in Bunches of Weeks”. We’ll get into looking at databases similar to how they’re approached in “7 Databases in 7 Weeks“. In this session I got into a hard look at PostgreSQL or as some refer to it just Postgres. This is the first of a few sessions on PostgreSQL in which I get the database installed locally on Ubuntu. Which is transferable to any other operating system really, PostgreSQL is awesome like that. Then after installing and getting pgAdmin 4, the user interface for PostgreSQL working against that, I go the Docker route. Again, pointing pgAdmin 4 at that and creating a database and an initial table.

Below the video here I’ve added the timeline and other details, links, and other pertinent information about this series.

0:00 – The intro image splice and metal intro with tunes..
3:34 – Start of the video database content.
4:34 – Beginning the local installation of Postgres/PostgreSQL on the local machine.
20:30 – Getting pgAdmin 4 installed on local machine.
24:20 – Taking a look at pgAdmin 4, a stroll through setting up a table, getting some basic SQL from and executing with pgAdmin 4.
1:00:05 – Installing Docker and getting PostgreSQL setup as a container!
1:00:36 – Added the link to the stellar post at Digital Ocean’s Blog.
1:00:55 – My declaration that if Digital Ocean just provided documentation I’d happily pay for it, their blog entries, tutorials, and docs are hands down some of the best on the web!
1:01:10 – Installing Postgesql on Ubuntu 18.04.
1:06:44 – Signing in to Docker hub and finding the official Postgresql Docker Image.
1:09:28 – Starting the container with Docker.
1:10:24 – Connecting to the Docker Postgresql Container with pgadmin4.
1:13:00 – Creating a database and working with SQL, tables, and other resources with pgAdmin4 against the Docker container.
1:16:03 – The hacker escape outtro. Happy thrashing code!

For each of these sessions for the “Bunches of Databases in Bunches of Weeks” series I’ll follow this following sequence. I’ll go through each database in this list of my top 7 databases for day 1 (see below), then will go through each database and work through the day 2, and so on. Accumulating additional days similarly to the “7 Databases in 7 Weeks

Day 1” of the Database, I’ll work toward building a development installation of the particular database. For example, in this session I setup PostgreSQL by installing it to the local machine and also pulled a Docker image to run PostgreSQL.

Day 2” of the respective database, I’ll get into working against the database with CQL, SQL, or whatever that one would use to work specifically with the database directly. At this point I’ll also get more deeply into the types, inserting, and storing data in the respective database.

Day 3” of the respective database, I’ll get into connecting an application with C#, Node.js, and Go. Implementing a simple connection, prospectively a test of the connection, and do a simple insert, update, and delete of some sort against the respective database built on the previous day 2 of the same database.

Day 4” and onward I’ll determine the path and layout of the topic later, so subscribe on YouTube and Twitch, and tune in. The events are scheduled, with the option to be notified when a particular episode is coming on that you’d like to watch here on Twitch.

Next Events for “Bunches of Databases in Bunches of Days

Cedrick Lunven on Creating an API for your database with Rest, GraphQL, gRPC

Here’s a talk Cedrick Lunven (who I have the fortune of working with!) about creating API’s for your database, your distributed database. He starts out with a few objectives for the talk:

  1. Provide you a working API implementing Rest, gRPC, and GraphQL.
  2. Give implementation details through Demo.
  3. Reveal hints to choose and WHY, (specifically to work with Databases)

Other topics include specific criteria around conceptual data models, shifting from relational to distributed columnar store, with differentiation between entities, relationships, queries, and their respective behaviors. All of this is pertinent to our Killrvideo reference application we have too.

Enjoy!

A New Adventure of Multi-model Distribute Graph Time Series […etc…] Database(s) Explorations Begins!

I arrived at the airport, sending a few tweets of this or that nature with all of this Github and Microsoft News. I have a great view out the window from the Alaska Lounge just before heading to the D gates. For you aeronautics fans like myself, here’s a picture of that view and a few of those Alaska Planes with one of the newly acquired Virgin America Planes!

IMG_5264

All this news with Github and Microsoft was easily eclipsing WWDC18 and in the meanwhile little ole’ me is on my way to a new adventure in my career. So priorities what they are, the news being excited, I’m more excited today to announce today I’m joining a most excellent team at DataStax! to bring forth investigation, research, knowledge, ideas, and whatever else I can as a Developer Evangelist with the crew here at DataStax! I’m unbelievably stoked as I’ve been searching for a company that would check all of my “will this work” check boxes for some months now! DataStax won out among the other prospective candidate companies and I’m starting today!

datastax_logo_blue

To kick off this adventure, I’m heading to San Francisco to join in the fun attending DevxCon. I’ll be there a little later today, hopefully in time for the kick off (ya know, pending flights and BART are all timely and such)! Then a full day of the conf, then later will join the team for a visit to DataStax HQ and maybe a few surprises. I’m super excited and ready to bring awesome content your way, while inventing, building, and experimenting my way through some awesome technologies!

Linux Containers

Docker Tips n’ Tricks for Devs – #0001 – 3 Second to Postgresql

The easiest implementation of a docker container with Postgresql that I’ve found recently allows the following commands to pull and run a Postgresql server for you.

docker pull postgres:latest
docker run -p 5432:5432 postgres

Then you can just connect to the Postgresql Server by opening up pgadmin with the following connection information:

  • Host: localhost
  • Port: 5432
  • Maintenance DB: postgres
  • Username: postgres
  • Password:

With that information you’ll be able to connect and use this as a development database that only takes about 3 seconds to launch whenever you need it.

Xamarin and I Are Hella Busy Hacking This Week

This week, along with the normal duties of getting everything from SSL working to code slung for account management to intellectual property (what is that exactly 😮 )…  this week is going to get hella busy. Here’s a few of the public events and training that I’ll be attending this week along with the normal bike n’ hacking n’ gettin’ shit done.

Shared Code Projects, PCL and Xamarin on 7/8/14 @

Intel JFCC Auditorium
2111 N.E. 25th Ave
Hillsboro, OR

James Montemagno  from Xamarin is coming to learn us the deets on how to create common core code that can run on any or all common platforms. Find out the differences between shared code project, portable class libraries, and simple file linking to share more code on iOS, Android, and Windows. This should be pretty kick ass to help kick OrchestrateExecutive off the ground. There’s a little more info here for the event: http://www.padnug.org/.

Database Stuff that aint RDBMS on 7/10/14 @

I’ll @adron be presenting on database types, what’s available out there outside of the relational and RDBMS world. How to resolve various problems with alternate data solutions for better results, better performance and ways to leap around the hurdles that are sometimes faced with RDBMS use.  More info here: http://www.meetup.com/ssdevelopers/events/176032122/

Xamarin Hands-on-Lab/Hackathon on 7/12/14 @

Montgomery Park

Kelly White @mckhendry has put together a hands-on-lab and hackathon, just a few days later hosting a hand-on-lab working with Xamarin to build apps. I’m going to hit up this event too (then go ride a 100+ kilometer bike ride, anybody up for the ride, ping me?) and sling some code on OrchestrateExecutive. Also a little more info here for the event: http://www.padnug.org/.

There’s more, but these are the top few meets I’ll be attending over the next two weeks. Happy hacking!

In-memory Orchestrate Local Development Database

I was talking with Tory Adams @BEZEI2K about working with Orchestrate‘s Services. We’re totally sold on what they offer and are looking forward to a lot of the technology that is in the works. The day to day building against Orchestrate is super easy, and setting up collections for dev or test or whatever are so easy nothing has stood in our way. Except one thing…

Every once in a while we have to work disconnected. For whatever the reason might be; Comcast cable goes out, we decide to jump on a train or one of us ends up on one of those Q400 puddle jumpers that doesn’t have wifi! But regardless of being disconnected from wifi, cable or internet connectivity we still want to be able to code and test!

In Memory Orchestrate Wrapper

Enter the idea of creating an in memory Orchestrate database wrapper. Using something like convict.js one could easily redirect all the connections as necessary when developing locally. That way development continues right along and when the application is pushed live, it’s redirected to the appropriate Orchestrate connections and keys!

This in memory “fake” or “mock” would need to have the key value, events, and graph store setup just like Orchestrate. With the possibility of having this in memory one could also easily write tests against a real fake and be able to test connected or disconnected without mocking. Not to say that’s a good or bad idea, but just one more tool in the tool chest doesn’t hurt!

If something like this doesn’t pop up in the next week or three, I might just have to kick off this project myself! If anybody is interested please reach out to me and let’s discuss! I’m open to writing it in JavaScript, C#, Java or whatever poison pill you’d prefer. (I’m not polyglot to limit my options!!)

Other Ideas, Development Shop Swap

Another idea that I’ve been pondering is setting up a development shop swap. I’ll leave the reader to determine what that means!  😉  Feel free to throw down ideas that this might bring up and I’ll incorporate that into the soon to be implementation. I’ll have more information about that idea right here once the project gets rolling. In the meantime, happy coding!