Tag Archives: software development

Cassie Schema Migrator >> CaSMa

A few weeks back I started working on a schema migration tool for Apache Cassandra and DataStax Enterprise. Just for context, here are the short definitions of what each of the elements of CaSMa are.

  • cstar-iconApache Cassandra
    • Definition: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
    • History: Avinash Lakshman, one of the authors of Amazon’s Dynamo, and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google code in July 2008. In March 2009 it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. Facebook developers named their database after the Trojan mythological prophet Cassandra, with classical allusions to a curse on an oracle.
  • dse-logoDataStax Enterprise
    • Definition: DataStax Enterprise, or routinely just referred to as DSE, is an extended version of Apache Cassandra with multi-model capabilities around graph, search, analytics, and other features like security capabilities and a core data engine 2x speed improvement.
    • History: DataStax was formed in 2009 by Jonathan Ellis and Matt Pfeil and originally named Riptano. In 2011 Riptano changes names to DataStax. For more history check out the Wikipedia page or company page for a timeline of events.
  • command-toolsSchema Migration
    • Definition:In software engineering, schema migration (also database migration, database change management) refers to the management of incremental, reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database’s schema to some newer or older version. Migrations are generally performed programmatically by using a schema migration tool. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state.
    • Addition reference and related materials:

iconmonstr-twitch-5Over the next dozen weeks or so as I work on this application via the DataStax Devs Twitch stream (next coding session events list) I’ll also be posting some blog posts in parallel about schema migration and my intent to expand on the notion of schema migration specifically for multi-model databases and larger scale NoSQL systems; namely Apache Cassandra and DataStax Enterprise. Here’s a shortlist for the next three episodes;

The other important pieces include the current code base on Github, the continuous integration build, and the tasks and issues.

Alright, now that all the collateral and context is listed, let’s get into at a high level what this is all about.

CaSMa’s Mission

Schema migration is a powerful tool to get a project on track and consistently deployed and development working against the core database(s). However, it’s largely entrenched in the relational database realm. This means it’s almost entirely focused on a schema with the notions of primary and foreign keys, the complexities around many to many relationships, indexes, and other errata that needs to be built consistently for a relational database. Many of those things need to be built for a distributed columnar store, key value, graph, time series, or a million other possibilities too. However, in our current data schema world, that tooling isn’t always readily available.

The mission of CaSMa is to first resolve this gap around schema migration, first and foremost for Apache Cassandra and prospectively in turn for DataStax Enterprise and then onward for other database systems. Then the mission will continue around multi-model systems that should, can, and ought to take advantage of schema migration for graph, and related schema modeling. At some point the mission will expand to include other schema, data, and state management focused around software development and data needs within that state

As progress continues I’ll publish additional posts here on the different data model concepts and nature behind various multi-model database options. These modeling options will put us in a position to work consistently, context based, and seamlessly with ongoing development efforts. In addition to all this, there will be the weekly Twitch sessions where I’ll get into coding and reviewing what coding I’ve done off camera too. Check those out on the DataStax Devs Channel.

If you’d like to get into the project and help out just ping me via Twitter @Adron or message me here.

Learning Go Episode 4 – Composite Types, Slices, Arrays, Etc.

Episode Post & Video Links:  1, 2, 3, 4 (this post), & 5 (almost done)

If you’d like to go through this material too in book form, I highly suggest “The Go Programming Language” by Alan A.A. Donovan & Brian W. Kernighan. I use it throughout these sessions to provide a guideline. I however add a bunch of other material about IDE’s, development tips n’ tricks and other material.

8:00 Starting the core content with some notes. Composite types, arrays, slices, etc.
11:40 Announcement of my first reload – https://compositecode.blog/2019/02/11/… – second successful time blogged here) of the XPS 15 I have, which – https://youtu.be/f0z1chi4v1Q – actually ended in catastrophe the first time!
14:08 Starting the project for this session.
16:48 Setting up arrays, the things that could be confusing, and setup of our first code for the day. I work through assignment, creation, new vs. comparison, and various other characteristics of working with arrays during this time.

29:36 Creation of a type, called Currency, of type int, setting up constants, and using this kind of like an enumerator to work with code that reads cleaner. Plus of course, all the various things that you might want to, or need for a setup of types, ints, and related composite types like this.

43:48 Creating an example directly from the aforementioned book enumerating the months of the year. This is a great example I just had to work through it a bit for an example.

52:40 Here I start showing, and in the process, doing some learning of my own about runes. I wasn’t really familiar with them before digging in just now!

1:09:40 Here I break things down and start a new branch for some additional examples. I also derail off into some other things about meetups and such for a short bit. Skip to the next code bits at the next time point.
1:23:58 From here on to the remainder of the video I work through a few examples of how to setup maps, how make works, and related coding around how to retrieve, set, and otherwise manipulate the maps one you’ve got them.

That’s it for this synopsis. Until next episode, happy code thrashing and go coding!

DevRel Data: Presentation & Deductions

Before diving into conclusions, let’s take a look at some answers to questions asked. This is a slice of answers, with totals for the charts and such. After a few months of answers I’ll have another follow up to see how things may or may not change.

Do you like video material?

chart

What specifically do you, or would you like to watch in video? Screencasts, short videos, conversational, or some other type of videos?

  • Screencasts/tutorials
  • I love both screencasts going through big topics and short videos that cover smaller tips and gotchas.
  • Videos with a specific outcome as the goal, whether achieved or not. Showing the process of something.. like hey, here’s how you building out a Postgres cluster using streaming replication and repmgr and pgpool… Kind of thing.
  • Bite sized content, maybe 2 minutes, to teach me one thing.
  • Editing. No jokes, no “hey what’s up guys” with 60 second intros. Discuss the problem, then solve it.
  • Demos, learning a new way of doing something
  • Doesn’t matter short or long, but has to be deeply technical with code examples that I can actually apply
  • I watch videos mostly for fun.
  • Screencast
  • Short videos of say 5-10 minutes each covering different concept of the subject matter
  • (videos work best in a classroom setting where time/attention is precommitted, or as part of a tutorial)
  • conversational
  • Short videos.
  • If it’s too long, it ends up on my todo list forever (not good). So shorter is better. And something that benefits from visuals, rather than something that could just be written.
  • I also watch LinkedIn Learning when just starting a new tech. to get a general overview and pick up a tip or tow, then I read books and the Internet from there.
  • short videos

What kind of written material do you like?

chart2

Do you like other material mixed in that details the reason for the tech, the story, or such?

chart3

Is there anything that comes to mind, that you’d like to have me or the team I’m working with (@ DataStax) put together that you’d find useful, entertaining, or related.

  • Place priorities on designing materials for more depth (i.e., more linked material) as well as less attention-nuisance. That’s no criticism of your work, merely the gestalt of where we work — so less noise is a better way to stand out and make materials useful.
  • Maybe focus more on written material – code & architecture material (books, articles) rather than videos and twitch. It is much easier to consume and is easily googlable. Also I’d suggest making blog posts target a specific common issue or question – sometimes I see posts that I don’t really care about or the problem is so narrow that I don’t want to read about it. I’d read about building resilient and highly available architectures in various configurations.
  • Database reliability, scalability, migrations and such stuff is interesting.
  • Anything to do with machine learning.
  • Data model examples, starting up a Cassandra node, configuring YAML, etc

Deductions

I’m going to go backwards through the questions and discuss what I’ve deducted, and in some ways what has surprised me among the answers!

First there’s the “Is there anything that comes to mind, that you’d like to have me or the team I’m working with (@ DataStax) put together that you’d find useful, entertaining, or related.” request and questions.

The answers here didn’t surprise me much at all. Within DevRel from Microsoft to DataStax to Google to many other organizations we have this ongoing battle between “write a whole book on it” or “make it 2 minutes short”. It’s wildly difficult to determine what format, what timing, and what structure material needs to be in for it to be most useful to people. So when I saw the answer that reads, “Place priorities on designing materials for more depth (i.e., more linked material) as well as less attention-nuisance.” I immediately thought, “yeah, for real, but ugh…” it’s difficult. However, I’m working on more thorough material, some of it will be paywalled via LinkedIn Learning or Pluralsight and other material may be available by book in the coming months. But there will be other material that will indeed be long form how to material on how to really put things together – from scratch and from the basis of “we have X thing and need to hack it so we can add a feature”.

The next answer I got in this section that I completely agree with is increasing the focus on written material. I’m making tons of video, and I’ve got that down to the point where it’s actually easier and faster to do most of it then it is to write things down. However I realized, especially from my own point of view, that written material actually ends up being vastly more useful than video material. That’s also why, even with the video material, when I’m covering specifics I aim to provide a linkable timeline and a blog entry with the code and other changes shown in the video. Thanks for reinforcing these efforts and giving me that indirect encouragement to make this process and the results even better. More written material is on its way!

As for the database reliability, scalability, migrations, machine learning, data modeling, Cassandra node starting, and all that it’s in the queue and I’m getting to it as fast as I possibly can.

Next question I asked is, “Do you like other material mixed in that details the reason for the tech, the story, or such?

It appears, albeit not a huge contingent of people, some people are curious about biking, train coding, and making good grub! Hey, that’s groovy cuz I’ve got a show coming out which is basically the behind the scenes videos about all those topics that make the coding and technology hacking possible!

The one outlier in this set however is clearly the request for “Ways to simplify life to dig through those algorithms faster, easier, better?” which I didn’t suspect would be any different then the other answers for this questions. Which left me surprised and ill-prepared on what to do about fulfilling what is clearly a demand. I’ll have to up level my blog posts around algorithms. I did do a couple a long while ago now in “Algorithms 101: Big Sums” which I completed in Go and another I wrote up “Algorithms 101: Roads & Town Centers” which I have 90% of the answer complete but I’ve never finished the blog entry! I guess it’s time to get the algorithm train coupled up and ready to depart!

Then the question, “What kind of written material do you like?

Two options lead by a healthy margin for this question: Demos w/ Write Up and Blog Articles. With this coupled up to the first question it’s clear that written material via blog and demoes via blog should and ought to be top priority. They are, however they’re a whole helluva lot of work, so I only get them produced but so fast. Got some gems coming on Go, Bash, Cassandra, and a few other demo, tech, and historical information.

Next up was single page cheat sheets and documentation, followed closely by books. I kind of expected documentation and books, but wow that single page cheat sheets option is higher rated than I would have suspected and by proxy I’ve immediately added that to my produce this type of material list! I put it in as a very secondary thought but it’s going to get into that increased focus queue.

The last one with some semblance of demand is pamphlet size short form. This one almost seems like a fluke, but I’ll ponder putting together some of these. I know O’Reilly has their short novelette size books which cover a particular topic. They hand these out for free at conferences and seem pretty solid. Maybe I’ll work one of those into the queue? Maybe.

The other three options scraped by with 1%, so somebody was choosing them. So the vi mug isn’t a priority nor the short explainer videos. Which seems in contention with video content demands around shorter content. I guess, explainer videos just doesn’t sound useful!

The next question I just put together a top three of the results, “What specifically do you, or would you like to watch in video? Screencasts, short videos, conversational, or some other type of videos?

  1. Make screen casts.
  2. Make screen casts generally short.
  3. Make screen cases that are short and on a specific and deep dive into a topic.

This seems kind of in conflict with itself, but I’m going to aim for it and try to hone the skill further. So that I can produce screen casts, screen casts that are generally short, and make sure that these screen casts that are short are on a specific and deep dive into a topic. Whew, got it.

Finally, “do you like video material?

chart

At this time, 53.8% of you have said yes. I had guessed it would be around 50%.

I had guessed no would be about 25%, and at 23.1% I wasn’t to far off.

The other respective mishmash of answers made for interesting depth to the questions that followed this question.

Article Summary & TLDR

Produce more topic specific, detailed material around screen casts and blog entries!

End of story.

For more on this information, why I asked, and what I do check out my article titled “Evangelism, Advocacy, and Activism in The Technology Industry” and for some of the big victories for big corporations check out “The Developer Advocates – Observations on Microsoft’s New Competence“.

Let’s Really Discuss Lock In

For to long lock-in has been referred to with an almost entirely negative connotation even though it can be inferred in positive and negative situations. The fact is that there’s a much more nuanced and balanced range to benefits and disadvantages of lock-in. Often this may even be referred to as this or that dependency, but either way a dependency often is just another form of lock in. Weighing those and finding the right balance for your projects can actually lead to lock-in being a positive game changer or something that simply provides one a basis in which to work and operate. Sometimes lock-in actually will provide a way to remove lock-in by providing more choices to other things, that in turn may provide another variance of lock-in.

Concrete Lock-in Examples

The JavaScript Lock-In

IT Security icons. Simplus seriesTake the language we choose to build an application in. JavaScript is a great example. It has become the singular language of the web, at least on the client side. This was long ago, a form of lock-in that browser makers (and standards bodies) chose that dictated how and in which direction the web – at least web pages – would progress.

JavaScript has now become a prominent language on the server side now too thanks to Node.js. It has even moved in as a first class language in serverless technology like AWS’s Lambda. JavaScript is a perfect example of a language, initially being a source of specific lock-in, but required for the client, that eventually expanded to allow programming in a number of other environments – reducing JavaScript’s lock in – but displacing lock in through abstractions to other spaces such as the server side and and serverless functions.

The .NET Windows SQL Server Lock In

IT Security icons. Simplus seriesJavaScript is merely one example, and a relatively positive one that expands one’s options in more ways than limits one’s efforts. But let’s say the decision is made to build a high speed trading platform and choose SQL Server, .NET C#, and Windows Server. Immediately this is a technology combination that has notoriously illuminated in the past * how lock-in can be extremely dangerous.

This application, say it was built out with this set of technology platforms and used stored procedures in SQL Server, locking the application into the specific database, used proprietary Windows specific libraries in .NET with the C# code, and on Windows used IIS specific advances to make the application faster. When it was first built it seemed plenty fast and scaled just right according to the demand at the time.

Fast forward to today. The application now has a sharded database when it hit a mere 8 Terabytes, loaded on two super pumped up – at least for today – servers that have many cores, many CPUs, GPUs, and all that jazz. They came in around $240k each! The application is tightly coupled to a middle tier, that is then sort of tightly coupled to those famous stored procedures, and the application of course has a turbo capability per those IIS Servers.

But today it’s slow. Looking at benchmarks and query times the database is having a hard time dealing with things as is, and the application has outages on a routine basis for a whole variation of reasons. Sometimes tracing and debugging solves the problems quickly, other times the servers just oversubscribe resources and sit thrashing.

Where does this application go? How does one resolve the database loading issues? They’ve already sunk a half million on servers, they’re pegged out already, horizontally scaling isn’t an option, they’re tightly coupled to Window Servers running IIS removing the possibility of effectively scaling out the application servers via container technologies, and other issues. Without recourse, this is the type of lock in that will kill the company if something is changed in a massive way very soon.

To add, this is the description of an actual company that is now defunct. I phrased it as existing today only to make the point. The hard reality is the company went under, almost entirely because of the costs of maintaining and unsustainable architecture that caused an exorbitant lock in to very specific tools – largely because the company drank the cool aid to use the tools as suggested. They developed the product into a corner. That mistake was so expensive that it decimated the finances of the company. Not a good scenario, not a happy outcome, and something to be avoided in every way! This is truly the epitomy of negative lock in.

Of course there’s this distinctive lock in we have to steer clear from, but there’s the lock in associated with languages and other technology capabilities that will help your company move forward faster, easier, and with increasing capabilities. Those are the choices, the ties to technology and capabilities that decision makers can really leverage with fewer negative consequences.

The “Lock In” That Enables

IT Security icons. Simplus seriesOne common statement is, “the right tool for the job”. This is of course for the ideal world where ideal decisions can be made all the time. This doesn’t exist and we have to strive for balance between decisions that will wreck the ship or decisions that will give us clear waters ahead.

For databases we need to choose the right databases for where we want to go versus where we are today. Not to gold plate the solution, but to have intent and a clear focus on what we want our future technology to hold for us. If we intend to expand our data and want to maintain the ability to effectively query – let’s take the massive SQL Server for example – what could we have done to prevent it from becoming a debilitating decision?

A solution that could have effectively come into play would have been not to shard the relational database, but instead to either export or split the data in a more horizontal way and put it into a distributed database store. Start building the application so that this system could be used instead of being limited by the relational database. As the queries are built out and the tight coupling to SQL Server removed, the new distributed database could easily add nodes to compensate for the ever growing size of the data stored. The options are numerous, that all are a form of lock-in, but not the kind that eventually killed this company that had limited and detrimentally locked itself into use of a relational database.

At the application tier, another solution could have been made to remove the ties to IIS and start figuring out a way to containerize the application. One way years ago would have been to move away from .NET, but let’s say that wasn’t really an option for other reasons. The idea to mimic containerization could have been done through shifting to a self-contained web server on Windows that would allow the .NET application to run under a singular service and then have those services spin off the application as needed. This would decouple from IIS, and enable spreading the load more quickly across a set number of machines and eventually when .NET Core was released offer the ability to actually containerize and shift entirely off of Windows Server to a more cost efficient solution under Linux.

These are just some ideas. The solutions of course would vary and obviously provide different results. Above all there are pathways away from negative lock in and a direction toward positive lock in that enables. Realize there’s the balance, and find those that leverage lock in positively.

Nuanced Pedantic Notes:

  • Note I didn’t say all examples, but just that this combo has left more than a few companies out on a limb over the years. There are of course other technologies that have put companies (people actually) in awkward situations too. I’m just using this combo here as an example. For instance, probably some of the most notorious lock in comes from the legal ramifications of using Oracle products and being tied into their sales agreements. On the opposite end of the spectrum, Stack Overflow is a great example of how choosing .NET and scaling with it, SQL Server, and related technologies can work just fine.

Pricing Calculations for Work & Projects

Recently I’ve been paying a lot more attention to the job market as I go through the finding, hiring, and gaining customers routine with a number of different companies, recruiters, and individuals I know within the industry. Conversations kick off a million different ways. In this post I’m going to tackle how I work pricing of a job for myself. Many companies have some criteria based on what others in the company are paid and someone in human resources often has some spreadsheet or whatever that tracks this stuff. This is the flip side of the coin, this is my pricing list. It’s really simple, it’s based on what I’m looking for in life and what a company can give me, which in turn dictates what I can do for a company and aligning with what they want too. In the end, this helps insure we both get a win out of the relationship. Continue reading

Reading Up on Observability and Monitoring

In the next few weeks I’ll be doing some work for Honeycomb.io. In preparation I’ve been ramping up on a number of topics, namely the big one is the specific differences in things like observability and monitoring. The aside that makes this relatively easy, is I live and breath this space from a site reliability, apps, and services perspective. Not only am I fortunate in my situation to dive into this topic, I’m excited to do so as well.

I started this research just checking out what material Honeycomb.io has available onsite and additionally what Charity Majors has written, or spoken about at conferences. The first two were easy to find with a simple Google search and checking out the Honeycomb.io Blog. For each of these materials I’ve provided a summary before each linked reference. Enjoy. Continue reading

I’ve Got a JavaScript & Node.js Webinar, Webstorm Tutorial Videos, Work & Flow With JavaScript Development and More…

Webinar: Node.js Development Workflow in WebStorm

This coming week I’m doing an intro to work and flow with Node.js JavaScript Programming that I’m working with JetBrains on. In the webinar I’ll be covering the following key topics in the webinar:

  • Open an existing project & getting WebStorm configured for running, testing and related working tasks.
  • A quick tour of other IDE features that help with daily work. Some in pretty huge ways.
  • Running WebStorm & debugging Node.js JavaScript applications.
  • Checking out Mocha, how it works and what it gives WebStorm the power to do. Then we’ll write a few tests & implement that code too.

All this will include Q & A throughout and at the end of the webinar. Be sure to register soon!

WebStorm Tutorials: Learning Shortcuts, Customizing Layout and Others

These WebStorm Tutorials have been put together by John Lindquist @johnlindquist for JetBrains. There solid, quick snippets of useful WebStorm usage. Two that I’ve found really useful I’ve included here:

John also has a lot of other great totally kick ass material out there. So check out his blog @ http://johnlindquist.com/ and follow his youtube channel too.

Coming Up in the Near Future, The Work & Flow of JavaScript Development

I have a new course I’m working on right now for Pluralsight, that will take these basic precepts and dive even deeper into the daily workflow of the JavaScript Developer. Whether it’s client side hacking or server side coding, I’ll be diving into a whole lot of JavaScript goodness. If you’d like me to ping you when the course is done, hit me up on Twitter @adron and just let me know. In the meantime get a Pluralsight subscription (free to sign up and at least give it a try) and check out these courses by myself and others.