Tag Archives: cassandra

TRIP REPORT: Accelerate 2019 in Washington DC, I mean National Harbor!

Trip Time.

Today’s trip care of Alaska Airlines Flight 2 out of SEATAC Airport (Seattle & Tacoma’s airport) to National (Reagan) in Alexandria, Virginia. I’ll be staying there and commuting daily across the Potomoc River to Gaylord Resort and Convention Center (at National Harbor). I decided I’d write up something about this trip for a few specific reasons:

  1. I finally purchased a Bromptown Bicycle which I’ve been wanting to attain and use for my trips that require air travel or don’t have enough space for a proper bicycle.
  2. The adventure is entirely new to me, I’ve not been to these locations at any point in my life. New for me, new for those reading this (or adventuring along with me on my Twitch channel).
  3. I also picked up a number of new things that I want to see how they’ll work for streaming while on the go. These include; Android Phone, a new dual Go Pro + Phone mount for the bike, and among these a few existing devices like my trusty set of GoPro Cameras.
  4. I flew over via first class for various reasons. I thus, wanted to share some of the advantages and why I think it’s more than worth it to fly first class vs. coach and why companies should rethink their ideas around this when positions require frequent travel and working on the go.

Leaving Cascadia

The first thing I did was pack up the Brompton. I got a hardshell case to go along with it since I’d read during my research the airlines sometimes will snap off parts of the bike when a softshell case is used. The other advantage, the hardshell case has wheels! Inside this I also put my front mount messenger bag and some bungie cables so I can mount this stuff up to the bike upon arrival.

Once that was packed it was time to get the Mission Workshop ARKIV backpack I have locked and loaded. In my pack, which is the large of the two sizes, I get all my cloths, toothbrush, razors, and related amenities. In the side pouches that I mount up specific for longer trips I put my power brick and other electric plugs I’d need regularly in the quickest to access pouches. The other things go in various assorted pockets here and there. Since this is such a short trip, I also skip the outer backpack laptop pouch and just put the laptop in the inner sleeve.

Stats

Backpack w/ Laptop: 22 lbs. (with laptop)
Hardshell w/ Bike: 32 lbs.

All in all, a fairly heavy load, but the cool thing is with the configuration and post-arrival setup I have there isn’t actually much to carry. Backpack goes on my back and the hardshell case rolls along like a carry on. What makes it even easier, I’ve got an express bus with plenty of space and light rail with special areas specific for luggage like this. My 17x Express arrives on time, I board and ride off with my pack and hard shell sitting right next to me.

When I arrive downtown I merely pack up and roll downstairs to the Sound Transit LINK, board the train and off to the airport I go. No need to mess with a driver, no need for chatter or worrying about the implications of social anxiety or evils of clicking “don’t talk to me uber driver”. Just board and go. Then, read a book, check your phone, or whatever comes to mind. That’s what I do.

At the airport I strolled and rolled into the first class lounge, which I attempted to record via my new Android with the Twitch app. It… went oddly I’m assuming. Let’s take a look here.

Once I got situated in the lounge I made some pancakes – a tradition I have now – and sat down for some coding. The seats are comfortable, the views are great, and along with the coding I get to nerd out on all the planes taking on and off. At least, when one is flying in and out of C Gate at SEATAC. N Gates are kind of “meh”.

Eventually I left the relaxing lounge and headed into the boarding area of C Gates. The Alaska Air 737-900 arrived and started deplaning. With deplaning, boarding, and refueling done for the trip back east to DC we headed back out on the tarmac to queue up 15th in line to take off. Check that out, total plane traffic jam!

IMG_20190520_140122

Once in the air we flew through some piddly turbulence and into more clouds. Clearing 10,000 foot laptops came out and a little bit more coding resumed. In addition I started this post, took a few pictures, and knocked out a few other things I needed to do.

After a while food and drink services began. In first class anything over an hour can safely assume a meal will be served. This time it was tortellini or a sandwich of some sort. I got the tortellini. The meal is then served in three parts. Starting with a little salad and soup, entree, and then wrapped up with a desert.

The soup was tasty, I was somewhat surprised by this. Where as the salad was merely a salad with some cherry tomatoes, carrots, and greens. Nothing real special, but then of course it’s a salad so not like there’s much expectation.

The tortellini was pretty good. Even in comparison to other food outside of the airlines. A little salt and pepper brought it up just slightly to something I’d even have been happy with in an actual restaurant!

Finally we wrapped up with some Salt & Straw for desert. Considering this is an airplane I was kind of amazed they’d get Salt & Straw, but then again, Alaska Airlines does like to play to the local products and all!

After food, a couple more hours of coding and prep for the oncoming days of Accelerate.

Arrival in the District of Columbia

I arrived in DC, retrieved my Brompton and racked up the case it packs in and threw my bag on the front. Now for a 26 minute bike ride from the airport to Alexandria.

the-path.png

On the way, the setting was magnificent with honey suckle providing a divine fragrance while I road along the bike trail along the Potomac River. The moon shined down, almost full, and in spectacular fashion!

Eventually I arrived at my new home for the week. The ride a success, an experiment that it was.

Bootcamp!

NOTE: I am an employee at DataStax, just so you know, in case you didn’t know. I always do my best to give you the direct details, but just so you don’t think I’m being a shill here. Some people don’t seem to be able to determine how people and occupations are correlated, so I like to keep things on the up and up.

First day, or maybe it’s zero day on account of zero based indexes and all, bootcamp kicked off!

In the boot camp we covered a lot of material to get attendees up to speed on Apache Cassandra. To boot, Patrick McFadin announced that everybody would get to use DataStax Constellation, our new Cassandra as a Service offering – currently in test. The awesomeness about this whole bootcamp was that we provided Constellation for everybody, without a blip on the radar! No system issues came up, albeit we crossed a few programmatic network wires that were crisscrossed but that got remedied in seconds. With that all wrapped up, released, with a bow on top, bootcamp went off without a hitch. Also a huge shout out to the dozens of team members that provided support throughout the room of 300+ attendees!

Good times in success!

Day 1 – Announcing DataStax Constellation

The first day, based on our zero based index numbering of conference days, started with Billy Bosworth CEO of DataStax giving keynote number one.

In the keynote Billy talks about the direction of DataStax and the upcoming releases, and current releases as of Accelerate 2019. Then Chelsea Navo joins Billy to do a LIVE – emphasis on a LIVE demo of DataStax Enterprise (i.e. Apache Cassandra and all the goodies) running multi-cloud in Azure, AWS, and GCP.

9:23 – Demo of DataStax Enterprise – Multi-cloud in real life. “Not a pretend demo!

15:17 – Chealsea shows how we introduced a little chaos into the mix, and introduces the ability to simply and easily bring a datacenter down. In realtime, as the related reads and writes are occurring. Nothing stops, not even a blip… whoops, did I spoil it? Give it a watch, it’s a solid keynote demo!

At the 20 minute mark, Billy introduced DataStax Constellation. Watch it, learn more, etc. Following that Billy talks about Insights, which will be built in and services based AI, system health, and related capabilities within the cloud offering.

After the keynote, everybody broke out into technical sessions on a wide, very wide range of topics. From Apache Cassandra to DataStax to Kafka to Vue.js! Great day!

Day 2 – Apache Cassandra v4.0

On day two Billy starts off the keynotes, and introduces others including Nate McCall. Nate is the Apache Cassandra PMC Chair & committer to the project. He dove into the new features, capabilities, and changes of v4.

Next up is DataStax CTO (and founder!) and Apache Cassandra committer of yore, and more, Jonothan Ellis! (video is time point linked below so you can dive right into the talk).

After the keynotes more technical sessions. I attended some architecture discussions around graph and related technology. Lots of good conversations. I really enjoyed it, and to wrap it all up that evening we had an ending keynote with Keren Elazari.

Departure

I had a great time, and as I always like a little lagniappe. Here’s some photos from the trip back. If you’d like to join DataStax Accelerate for 2020, give a good look at the upcoming conference next year!

 

In Flight to Apache Cassandra Days

Another flight down to the bay area. Today it was Alaska Air Flight 330 from Seattle to San Jose. It was mostly a clear day at start, with a solid layer of bright cloud cover exiting Washington on the way down to Oregon. As we crossed over that arbitrary human defined line of Oregon and California, nature presented us with even more perfectly glowing bright cloud cover. This is Cascadia after all and it’s basically covered in clouds the majority of the time. On departure I also noted Bremerton has three aircraft carriers in dock along with a normal plethora of other naval vessels. The amount of naval power in the area is always pretty awe inspiring.

Why was I in flight once again? I am heading down to teach with Jeff Carpenter (@jscarp) at the South Bay Cassandra User Group‘s Cassandra Day events. These are single day events, where we cover an introduction to Apache Cassandra, concepts of data-modeling for Apache Cassandra, and then a wrap up of application development with the respective drivers. Now if you aren’t in Santa Clara – or ya know Menlo Park, San Jose, Oakland, San Francisco, or well, the surrounding area – there are other days scheduled! We also have days scheduled that aren’t even located in the Bay, so check out the full list of events:

https://www.datastax.com/company/events

NOTE: If you’re interested in Seattle, Portland, or Vancouver BC area events, scroll all the way down to the end of this blog entry I’ve got more details for you!

Introduction to Apache Cassandra

In the introduction to Apache Cassandra we cover an overview of the architecture and features of the distributed database. Starting off with a definition of a distributed hash ring and how this is used in Apache Cassandra to provide data storage across the nodes that make up the Apache Cassandra Database. Moving on we’ll get into the other capabilities, trade offs of data replication between nodes, configuration settings, and a lot more.

Data Modeling

For data modeling we start off with a short review of relational database data modeling to provide something that is more familiar for many people. From this, we then build off of many concepts around denormalization, breaking apart various levels of normalization forms, and then get into the thinking and approach behind modeling an application in a distributed database and go deeper with details around Apache Cassandra.

Application Development

For application development, focusing around the Java language and technology stack, we’ll start with some concepts around how the drivers connect to and work with Apache Cassandra. We’ll open up some code too, get into some code changes and additions, to get more familiar with how the driver works and some of the capabilities of the driver itself.

Most of the code, concepts, and related material in use around Java and the tech stack are directly usable on C#, JavaScript, and even using the community open source Go CQL Library.

Coming soon…

In the coming weeks (ok, maybe a month or two) we’ll be updating this material for Apache Cassandra v4 and additionally, I’m aiming to line up some half day and probably some full day workshops in the Cascadian region: Portland, Seattle, and Vancouver BC. They’ll be almost identical except for a few tweaks, but you’ll have to RSVP to find out the details!

Also, if you’re in between any of those cities and have a stop on the Amtrak Cascades, let me know and we’ll get an RSVP list started for your city and see if we can get the required attendee count to make it official!

Survey of Go Libraries for Database Work

Over the past few months I’ve picked up a number of libraries in the Go ecosystem to help me get work done around database engineering. These libraries are ones that I have used to do a range of work primarily around Apache Cassandra, DataStax Enterprise, PostgreSQL, and to a lesser degree MS SQL Server, MySQL, and others. The following is a survey of libraries that I’ve found to be pretty solid for getting the job done.

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering (5)

I’ve broken the follow tooling libraries out into the following categories:

  • Observability, Monitoring, & Insight – I created this section, and added libraries to it based specifically on the specific and peculiarly pedantic nature of observability in light of monitoring that work to provide insight into one’s applications they’re responsible for. For additional information about observability check out the Wikipedia article on the topic observability, it’s a great starting point. For monitoring however it gets more specific with a breakdown of monitoring types: application performance monitoring, network monitoring, system monitoring, and business transaction monitoring. The libraries in this section apply to some or all of the criteria in this definitions.
  • Data Schema Migration – Managing one’s data schema for a database, even really, truly, honestly if you have a schema-less system you still need to manage the underlying schema at some level.
  • Flow, Pipelines, Extraction, Transformation, and Loading – This section is mutative in the sense that it includes a lot of various types of libraries that have a very wide range of work to do and they offers a plethora of ways to do this work. Creating pipelines, to flow sequences, to extraction and transformation, to standard bulk loading. These libraries provide ways to get the data where you need it when you need it there in effective and reliable ways.
  • Database Backup Libraries – There are a zillion different things to maintaining effective and useful database backups; onsite storage, offsite storage, rotation periods, transmission & security control, scheduling, full or differential, and other topics of concern. One of the most important and often overlooked aspect of database backups is actually restoring the database from backup! These libraries can be used to get those backups, automate, and implement restoration of data in a more seamless way.
  • Database Drivers – At the core of any programmable automation of databases, one needs to have some way to connect to and work with the databases they’re automating, that’s where database drivers come into play. For Go, there’s a ton of support on every relatively known database in existence. MS SQL, Apache Cassandra, PostgreSQL, and dozens more!

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering

Veneur – Largely used by and originating from Stripe. This library works as a distributed, fault tolerant pipeline for data emitted from run time on systems and services throughout your environment. It has server implementations of the DogStatsD protocol or SSF (Sensor Sensibility Format) for aggregating metrics and sending these metrics for storage or via sinks to various other systems. The system can also works up histograms, sets, and counters as global aggregator.

TLDR;

Veneur is a convenient sink for various observability primitives with lots of outputs!

Honeycomb.io – Honeycomb I did some work for back in February of 2018 and gotta say I loved the team. Charity @mipsytipsy, Christine @cyen, Ben @maplebed and crew are tops! Friendly, wildly smart, and humble thrown in for good measure. With that said, I’m also a fan of the product. It’s a solid high cardinality, query and event intake system for observability. There are libraries for Go as well as others, and it’s pretty easy to use the library to setup ingest for appropriately instrumented applications.

TLDR;

Honeycomb.io is a Saas tool with available libraries for Go to provide observability insight and data collection for your applications!

OpenCensus – This framework and toolsetprovides ways to get telemetry out of your services. Currently  there are libraries for a number of languages that allow you to capture, manipulate, and export metrics and distributed traces to your data store of choice. The key idea is that OpenCensus works via tracing through the course of events in an application and that data is logged for awareness, insight, and thus observability of your systems.

TLDR;

OpenCensus is a library that provides ways to gather telemetry for your services and store it in your choice of a location.

RxGo – This library is a reactive extensions built for Go. This one is as much a programming concept as it is a way to enhance and specifically focus on observability, so let’s take a look at the intro example they’ve got on the actual repo README.md itself.

ReactiveX, or Rx for short, is an API for programming with observable streams. This is a ReactiveX API for the Go language.

ReactiveX is a new, alternative way of asynchronous programming to callbacks, promises and deferred. It is about processing streams of events or items, with events being any occurrences or changes within the system.

In Go, it is simpler to think of a observable stream as a channel which can Subscribe to a set of handler or callback functions.

The pattern is that you Subscribe to an Observable using an Observer:

subscription := observable.Subscribe(observer)

An Observer is a type consists of three EventHandler fields, the NextHandlerErrHandler, and DoneHandler, respectively. These handlers can be evoked with OnNextOnError, and OnDone methods, respectively.

The Observer itself is also an EventHandler. This means all types mentioned can be subscribed to an Observable.

nextHandler := func(item interface{}) interface{} {
    if num, ok := item.(int); ok {
        nums = append(nums, num)
    }
}

// Only next item will be handled.
sub := observable.Subscribe(handlers.NextFunc(nextHandler))

TLDR;

RxGo are the reactive extensions that make it easier to go full scale and spectrum observability, with significantly greater insight into your applications over time and the events they execute.

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering (1)

Go-Migrate – This library is written in Go and handles data schema migrations for a significant number of databases; PostgreSQL, MySQL, SQLite, RedShift, Neo4j, CockroadDB, and that’s just a few.

Example:

migrate -source file://path/to/migrations -database postgres://localhost:5432/database up 2

TLDR;

Go-Migrate is an open source library that can be used via CLI or in code to manage all your schema migration needs.

Gocqlx Migrate – This library primarily provides extensions to the Go CQL driver library, and one of those extensions specifically is a data-schema migration functionality.

Example:

package main

import (
    "context"

    "github.com/scylladb/gocqlx/migrate"
)

const dir = "./cql" 

func main() {
    session := CreateSession()
    defer session.Close()

    ctx := context.Background()
    if err := migrate.Migrate(ctx, session, dir); err != nil {
        panic(err)
    }
}

TLDR;

Gocqlx Migrate is a feature of the Gocqlx extensions library that can be used for schema migrations from within code.

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering (2)

Pachyderm – (Open Source Repo) A pachyderm is

a very large mammal with thick skin, especially an elephant, rhinoceros, or hippopotamus.

So it is kind of a fitting name for this library. The library, the project itself, has found funding and bills itself as “Scalable, Reproducible Data Science“. I’ve used it minimally myself, but find it continually popping up on my “use this tool because you’ll need a ton of the features” list.

TLDR;

Pachyderm is an open source library, and paired capital funded company, that does indeed provide scalable, reproducible data science in addition to being a great library for your ETL and related data management needs.

Reflow – This library provides incremental data processing in the cloud. Providing this ability gives scientists and engineers the ability to put tools together, packaged in Docker images, using programming constructs. The library then evaluates the programs transparently parallelizing the work and memoizing results – i.e. using go routines and caching data appropriately to speed up tasks. The library was created at GRAIL to manage our NGS (next generation sequencing) bioinformatics workloads on AWS, but has also been used for many other applications, including model training and ad-hoc data analyses. Severl of Reflow’s key features include:

  • functional, lazy, type-safe Domain Specific Language (DSL) for writing workflow programs.
  • the runtime for the DSL evaluates incrementally, coordinating cluster execution, and memoization.
  • a cluster scheduler to dynamically provision and tear down resources in the cloud (currently AWS is supported).
  • with containers the same processing workloads can also be executed locally.

TLDR;

Reflow provides a way for data scientists, and by proxy database administrators, data programmers, programmers, and anybody that needs to work through ETL or related work to write programs against that data in the cloud or locally.

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering (3)

Restic (Github) – Restic is a backup CLI and Go library that will backup to a number of sources, a few including; local directory, sftp, http REST, S3, Google Cloud Storage, Azure Blob Storage, and others.

Restic follows several objectives:

  • The tool aims to be easy, with minimal singular steps to execute a backup.
  • The tool aims to be fast, using appropriate mechanisms to ensure speedy backups.
  • The tool aims to provide verifiable backups that can easily be restored.
  • The tool aims to incorporate cryptographic guarantees of confidentiality to make sure the backups are secure.
  • The tool aims to be efficient with additional snapshots only taking the storage of the actual increment and de-duplicated to save space in the storage back end.

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering (4)

For each of these there’s a particular single driver that I use for each. Except in the case of Apache Cassandra and DataStax Enterprise I have also picked up gocqlx to add to my gocql usage.

PostgreSQL – Features:

  • SSL
  • Handles bad connections for database/sql
  • Scan time.Time correctly (i.e. timestamp[tz], time[tz], date)
  • Scan binary blobs correctly (i.e. bytea)
  • Package for hstore support
  • COPY FROM support
  • pq.ParseURL for converting urls to connection strings for sql.Open.
  • Many libpq compatible environment variables
  • Unix socket support
  • Notifications: LISTEN/NOTIFY
  • pgpass support

Gocql & Gocqlx

Gocql Features:

  • Modern Cassandra client using the native transport
  • Automatic type conversions between Cassandra and Go
    • Support for all common types including sets, lists and maps
    • Custom types can implement a Marshaler and Unmarshaler interface
    • Strict type conversions without any loss of precision
    • Built-In support for UUIDs (version 1 and 4)
  • Support for logged, unlogged and counter batches
  • Cluster management
    • Automatic reconnect on connection failures with exponential falloff
    • Round robin distribution of queries to different hosts
    • Round robin distribution of queries to different connections on a host
    • Each connection can execute up to n concurrent queries (whereby n is the limit set by the protocol version the client chooses to use)
    • Optional automatic discovery of nodes
    • Policy based connection pool with token aware and round-robin policy implementations
  • Support for password authentication
  • Iteration over paged results with configurable page size
  • Support for TLS/SSL
  • Optional frame compression (using snappy)
  • Automatic query preparation
  • Support for query tracing
  • Support for Cassandra 2.1+ binary protocol version 3
    • Support for up to 32768 streams
    • Support for tuple types
    • Support for client side timestamps by default
    • Support for UDTs via a custom marshaller or struct tags
  • Support for Cassandra 3.0+ binary protocol version 4
  • An API to access the schema metadata of a given keyspace

Gocqlx Features:

  • Binding query parameters form struct or map
  • Scanning results directly into struct or slice
  • CQL query builder (package qb)
  • Super simple CRUD operations based on table model (package table)
  • Database migrations (package migrate)

Go-MSSQLDB – Features:

  • Can be used with SQL Server 2005 or newer
  • Can be used with Microsoft Azure SQL Database
  • Can be used on all go supported platforms (e.g. Linux, Mac OS X and Windows)
  • Supports new date/time types: date, time, datetime2, datetimeoffset
  • Supports string parameters longer than 8000 characters
  • Supports encryption using SSL/TLS
  • Supports SQL Server and Windows Authentication
  • Supports Single-Sign-On on Windows
  • Supports connections to AlwaysOn Availability Group listeners, including re-direction to read-only replicas.
  • Supports query notifications

So this is just a few of the libraries I use, have worked with, and suggest checking out if you’re delving into database work and especially building systems around databases for reliability and related efforts.

If you’ve got other libraries that you’ve used, or really like, definitely leave a comment and let me know and I’ll update the post to include new libraries for Go. Subscribe to the blog too as I’ve got more posts in the cooker for database work, Go libraries and usage with databases, and a lot more. Happy thrashing code!

Let’s Talk Top 7 Options for Database Gumbo

When one starts to dig into databases things get really complex really fast. There’s not only a whole plethora of database companies and projects, but database types, storage engines, and other options and functionality to choose from. One place to get a start is just to take a look at the crazy long list of databases on db-engines. In this post I’m going to take a look at a few of the top database engines to create a starting point – which I’ll reference – for future video streaming coding sessions (follow me @ twitch.tv/adronhall).

My Options for Database Gumbo

  1. Apache Cassandra / DataStax Enterprise
  2. Postgresql
  3. SQL Server
  4. Elasticsearch
  5. Redis
  6. SQLite
  7. Dynamo DB

The Reasons

Ok, so the list is as such, and as stated it’s my list. There are a lot of databases, and of course some are still more used such as Oracle. However here’s some of the logic and reasoning behind my choices above.

Oracle

First off I feel like I need to broach the Oracle topic. Mostly because of their general use in industry. I’m not doing anything with Oracle now, nor have I for years for a long, long, LONG list of reasons. Using their software tends to be buried in bureaucratic, oddly broken and unnecessary usage today anyway. They use predatory market tactics, completely dishonorable approach to sales and services, as well as threatening and suing people for doing benchmarks, and a host of other practices. In face to face experiences, Oracle tends to give off experiences, that Lawrence from Office Space would say, “naw man, I think you’d get your ass kicked for that!” and I agree. Oracle’s practices are too often disgusting. But even from the purely technical point of view, the Oracle Database and ecosystem itself really isn’t better than other options out there. It is indeed a better, more intelligently strategic and tactical option to use a number of alternatives.

Apache Cassandra / DataStax Enterprise

This combo has multiple reasons and logic to be on the list. First and foremost, much of my work today is using DataStax Enterprise (DSE) and Apache Cassandra since I work for DataStax. But it’s important to know I didn’t just go to DataStax because I needed a job, but because I chose them (and obviously they chose me by hiring me) because of the team and technology. Yes, they pay me, but it’s very much a two way street, I advocate Cassandra and DSE because I personally know the tech is top tier and solid.

On the fact that Apache Cassandra is top tier and solid, it is simply the remaining truly masterless distributed database that provides a linear path of scalability on the market that you can use, buy support for, and is actually actively and knowingly maintained not just by DataStax but by members of the community. One could make an argument for MongoDB but I’ll maybe elaborate on that in the future.

In addition to being a solid distributed database there are capabilities inherent in Apache Cassandra because of the data types and respective the CQL (Cassandra Query Language) that make it a great database to use too. DataStax Enterprise extends that to provide spatial (re: GIS/Geo Data/Queries), graph data, analytics engine, and more built on other components like SOLR and related technology. Overall a great database and great prospective combinations with the database.

Postgresql

Postgres is a relational database that has been around for a long time. It’s got some really awesome features like native JSON support, which I’m a big fan of. But I digress, there’s tons of other material that lays out thoroughly why to use Postgres which I very much agree with.

Just from the perspective of the extensive and rich data types Postgres is enough to be put on this list, but considering there are a lot of reasons around multi-tenancy, scalability, and related characteristics that are mostly unique to Postgres it’s held a solid position.

SQL Server

This one is on my list for a few reasons that have nothing to do with features or capabilities. This is the first database I was responsible for in its entirety. Administration, queries, query tuning, setup, and developer against with the application tier. I think of all my experience, this database I’ve spent the most time with, with Apache Cassandra being a close second, then Postgres and finally Riak.

Kind of a pattern there eh? Relational, distributed, relational, distributed!

The other thing about SQL Server however is the integrations, tooling, and related development ecosystem around SQL Server is above and beyond most options out there. Maybe, with a big maybe, Oracle’s ecosystem might be comparable but the pricing is insanely different. In that SQL Server basically can carry the whole workload, reporting, ETL, and other feature capabilities that the Oracle ecosystem has traditionally done. Combine SQL Server with SSIS (SQL Server Integration Services), SSRS (SQL Server Reporting Services), and other online systems like Azure’s SQL Database and the support, tooling, and ecosystem is just massive. Even though I’ve had my ins and outs with Microsoft over the years, I’ve always found myself enjoying working on SQL Server and it’s respective tooling options and such. It’s a feature rich, complete, solidly, and generally well performing relational database, full stop.

Elasticsearch

Ok, this is kind of a distributed database of sorts but focused more exclusively (not totally since it’s kind of expanded its roles) search engine. Overall I’ve had good experiences with Elasticsearch and it’s respective ELK (or Elastic ecosystem) of tooling and such, with some frustrating flakiness here and there over the years. Most of my experience has come from an operational point of view with Elasticsearch. I’ve however done a fair bit of work over the years in supporting teams that are doing actual software development against the system. I probably won’t write a huge amount about Elasticsearch in the coming months, but I’ll definitely bring it up at certain times.

Redis / SQLite / DynamoDB

These I’ll be covering in the coming months. For Redis and DynamoDB I have wanted to dig in for some comparison analysis from the perspective of implementing data tiers against these databases, where they are a good option, and determining where they’re just an outright bad option.

For SQLite I’ve used it on and off for many years, but have wanted to sit down and just learn it and try out some of its features a bit more.

Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval

Part 3 of 3 – Coding Session in Go – Cobra + Viper CLI for Parsing Text Files, Retrieval of Twitter Data, and Exports to various file formats.

UPDATED PARTS:

  1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
  2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation
  3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval (this post)

Updated links to each part will be posted at bottom of  this post when I publish them. For code, written walk through, and the like scroll down below the video and timestamps.

0:54 The thrashing introduction.
3:40 Getting started, with a recap of the previous sessions but I’ve not got the sound on so ignore this until 5:20.
5:20 I notice, and turn on the volume. Now I manage to get the recap, talking about some of the issues with the Twitter API. I step through setup of the app and getting the appropriate ID’s and such for the Twitter API Keys and Secrets.
9:12 I open up the code base, and review where the previous sessions got us to. Using Cobra w/ Go, parsing and refactoring that was previously done.
10:30 Here I talk about configuration again and the specifics of getting it setup for running the application.
12:50 Talking about Go’s fatal panic I was getting. The dependency reference to Github for the application was different than what is in application and don’t show the code that is actually executing. I show a quick fix and move on.
17:12 Back to the Twitter API use by using the go-twitter library. Here I review the issue and what the fix was for another issue I was having previous session with getting the active token! Thought the library handled it but that wasn’t the case!
19:26 Now I step through creating a function to get the active oath bearer token to use.
28:30 After deleting much of the code that doesn’t work from the last session, I go about writing the code around handling the retrieval of Twitter results for various passed in Twitter Accounts.

The bulk of the next section is where I work through a number of functions, a little refactoring, and answering some questions from the audience/Twitch Chat (working on a way to get it into the video!), fighting with some dependency tree issues, and a whole slew of silliness. Once that wraps up I get some things committed into the Github repo and wrap up the core functionality of the Twitz Application.

58:00 Reviewing some of the other examples in the go-twitter library repo. I also do a quick review of the other function calls form the library that take action against the Twitter API.
59:40 One of the PR’s I submitted to the project itself I review and merge into the repo that adds documentation and a build badge for the README.md.
1:02:48 Here I add some more information about the configuration settings to the README.md file.

1:05:48 The Twitz page is now updated: https://adron.github.io/twitz/
1:06:48 Setup of the continuous integration for the project on Travis CI itself: https://travis-ci.org/Adron/twitz
1:08:58 Setup fo the actual travis.yml file for Go. After this I go through a few stages of troubleshooting getitng the build going, with some white space in the ole’ yaml file and such. Including also, the famous casing issue! Ugh!
1:26:20 Here I start a wrap up of what is accomplished in this session.

NOTE: Yes, I realize I spaced and forgot the feature where I export it out to Apache Cassandra. Yes, I will indeed have a future stream where I build out the part that exports the responses to Apache Cassandra! So subcribe, stay tuned, and I’ll get that one done ASAP!!!

1:31:10 Further CI troubleshooting as one build is green and one build is yellow. More CI troubleshooting! Learn about the travis yaml here.
1:34:32 Finished, just the bad ass outtro now!

The Codez

In the previous posts I outlined two specific functions that were built out:

  • Part 1 – The config function for the twitz config command.
  • Part 2 – The parse function for the twitz parse command.

In this post I focused on updating both of these and adding additional functions for the bearer token retrieval for auth and ident against the Twitter API and other functionality. Let’s take a look at what the functions looked like and read like after this last session wrap up.

The config command basically ended up being 5 lines of fmt.Printf functions to print out pertinent configuration values and environment variables that are needed for the CLI to be used.

The parse command was a small bit changed. A fair amount of the functionality I refactored out to the buildTwitterList() and exportFile, and rebuildForExport functions. The buildTwitterList() I put in the helper.go file, which I’ll cover a littler later. But in this file, which could still use some refactoring which I’ll get to, I have several pieces of functionality; the export to formats functions, and the if else if logic of the exportParsedTwitterList function.

Next up after parse, it seems fitting to cover the helpers.go file code. First I have the check function, which simply wraps the routinely copied error handling code snippet. Check out the file directly for that. Then below that I have the buildTwitterList() function which gets the config setting for the file name to open to parse for Twitter accounts. Then the code reads the file, splits the results of the text file into fields, then steps through and parses out the Twitter accounts. This is done with a REGEX (I know I know now I have two problems, but hey, this is super simple!). It basically finds fields that start with an @ and then verifies the alphanumeric nature, combined with a possible underscore, that then remove unnecessary characters on those fields. Wrapping all that up by putting the fields into a string/slice array and returning that string array to the calling code.

The next function in the Helpers.go file is the getBearerToken function. This was a tricky bit of code. This function takes in the consumer key and secret from the Twitter app (check out the video at 5:20 for where to set it up). It returns a string and error, empty string if there’s an error, as shown below.

The code starts out with establishing a POST request against the Twitter API, asking for a token and passing the client credentials. Catches an error if that doesn’t work out, but if it can the code then sets up the b64Token variable with the standard encoding functionality when it receives the token string byte array ( lines 9 and 10). After that the request then has the header built based on the needed authoriztaion and content-type properties (properties, values? I don’t recall what spec calls these), then the request is made with http.DefaultClient.Do(req). The response is returned, or error and empty response (or nil? I didn’t check the exact function signature logic). Next up is the defer to ensure the response is closed when everything is done.

Next up the JSON result is parsed (unmarshalled) into the v struct which I now realize as I write this I probably ought to rename to something that isn’t a single letter. But it works for now, and v has the pertinent AccessToken variable which is then returned.

Wow, ok, that’s a fair bit of work. Up next, the findem.go file and related function for twitz. Here I start off with a few informative prints to the console just to know where the CLI has gotten to at certain points. The twitter list is put together, reusing that same function – yay code reuse right! Then the access token is retrieved. Next up the http client is built, the twitter client is passed that and initialized, and the user lookup request is sent. Finally the users are printed out and below that a count and print out of the count of users is printed.

I realized, just as I wrapped this up I completely spaced on the Apache Cassandra export. I’ll have those post coming soon and will likely do another refactor to get the output into a more usable state before I call this one done. But the core functionality, setup of the systemic environment needed for the tool, the pertinent data and API access, and other elements are done. For now, that’s a wrap, if you’re curious about the final refactor and the Apache Cassandra export then subscribe to my Twitch @adronhall and/or my YouTube channel ThrashingCode.

UPDATED SERIES PARTS

    1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
    2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation
    3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval (this post)

     

Creating Distributed Database Application Starter Kits

I’ve boarded a bus, and as always, when I board a bus I almost always code. Unless of course there are people I’m hanging out with then I chit chat, but right now this is the 212 and I don’t know anybody on this chariot anyway. So into the code I go.

I’ve been re-reviewing the Docker and related collateral we offer at DataStax. In that review it seems like it would be worth having some starter kit applications along with these “default” Docker options. This post I’ve created to provide the first language & tech stack of several starter kits I’m going to create.

Starter Kit – The Todo List Template

This first set of starter kits will be based upon a todo list application. It’s really simple, minimal in features, and offers a complete top to bottom implementation of a service, and an application on top of that service all built on Apache Cassandra. In some places, and I’ll clearly mark these places, I might add a few DataStax Enterprise features around search, analytics, or graph.

The Todo List

Features: The following detail the features, from the users perspective, that this application will provide. Each implementation will provide all of these features.

  • A user wants to create a user account to create todo lists with.
  • A user wants to be able to store a username, full name, email, and some simple notes with their account.
  • A user wants to be able to create a todo list that is identified by a user defined name. (i.e. “Grocery List”, “Guitar List”, or “Stuff to do List”)
  • A user want to be able to logout and return, then retrieve a list from a list of their lists.
  • A user wants to be able to delete a todo list.
  • A user wants to be able to update a todo list name.
  • A user wants to be able to add items to a todo list.
  • A user wants to be able to update items in the todo list.
  • A user wants to be able to delete items in a todo list.

Architecture: The following is the architecture of the todo list starter kit application.

  • Database: Apache Cassandra.
  • Service: A small service to manage the data tier of the application.
  • User Interface: A web interface using React/Vuejs ??

As you can see, some of the items are incomplete, but I’ll decide on them soon. My next review is to check out what I really want to use for the user interface, and also to get a user account system figured out. I don’t really want to create the entire user interface, but instead would like to use something like Auth0 or Okta.

May I Ask?

There are numerous things I’d love help with. Are there any user stories you think are missing? Should I add something? What would make these helpful to you? Leave a comment, or tweet at me @Adron. I’d be happy to get some feedback and other’s thoughts on the matter so that I can ensure that these are simple, to the point, usable, and helpful to people. Cheers!

Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation

Part 2 of 3 – Coding Session in Go – Cobra + Viper CLI for Parsing Text Files, Retrieval of Twitter Data, Exports to various file formats, and export to Apache Cassandra.

UPDATED PARTS:

  1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
  2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation (this post)
  3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval

Updated links to each part will be posted at bottom of  this post when I publish them. For code, written walk through, and the like scroll down below the video and timestamps.

Hacking Together a CLI Installing Cassandra, Setting Up the Twitter API, ENV Vars, etc.

0:04 Kick ass intro. Just the standard rocking tune.

3:40 A quick recap. Check out the previous write “Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files” of this series.

4:30 Beginning of completion of twitz parse command for exporting out to XML, JSON, and CSV (already did the text export previous session). This segment also includes a number of refactorings to clean up the functions, break out the control structures and make the code more readable.

In the end of refactoring twitz parse came out like this. The completed list is put together by calling the buildTwitterList() function which is actually in the helpers.go file. Then prints that list out as is, and checks to see if a file export should be done. If there is a configuration setting set for file export then that process starts with a call to exportParsedTwitterList(exportFilename string, exportFormat string, ... etc ... ). Then a simple single level control if then else structure to determine which format to export the data to, and a call to the respective export function to do the actual export of data and writing of the file to the underlying system. There’s some more refactoring that could be done, but for now, this is cleaned up pretty nicely considering the splattering of code I started with at first.

50:00 I walk through a quick install of an Apache Cassandra single node that I’ll use for development use later. I also show quickly how to start and stop post-installation.

Reference: Apache Cassandra, Download Page, and Installation Instructions.

53:50 Choosing the go-twitter API library for Go. I look at a few real quickly just to insure that is the library I want to use.

Reference: go-twitter library

56:35 At this point I go through how I set a Twitter App within the API interface. This is a key part of the series where I take a look at the consumer keys and access token and access token secrets and where they’re at in the Twitter interface and how one needs to reset them if they just showed the keys on a stream (like I just did, shockers!)

57:55 Here I discuss and show where to setup the environment variables inside of Goland IDE to building and execution of the CLI. Once these are setup they’ll be the main mechanism I use in the IDE to test the CLI as I go through building out further features.

1:00:18 Updating the twitz config command to show the keys that we just added as environment variables. I set these up also with some string parsing and cutting off the end of the secrets so that the whole variable value isn’t shown but just enough to confirm that it is indeed a set configuration or environment variable.

1:16:53 At this point I work through some additional refactoring of functions to clean up some of the code mess that exists. Using Goland’s extract method feature and other tooling I work through several refactoring efforts that clean up the code.

1:23:17 Copying a build configuration in Goland. A handy little thing to know you can do when you have a bunch of build configuration options.

1:37:32 At this part of the video I look at the app-auth example in the code library, but I gotta add the caveat, I run into problems using the exact example. But I work through it and get to the first error messages that anybody would get to pending they’re using the same examples. I get them fixed however in the next session, this segment of the video however provides a basis for my pending PR’s and related work I’ll submit to the repo.

The remainder of the video is trying to figure out what is or isn’t exactly happening with the error.

I’ll include the working findem code in the next post on this series. Until then, watch the wrap up and enjoy!

1:59:20 Wrap up of video and upcoming stream schedule on Twitch.

UPDATED SERIES PARTS

    1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
    2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation (this post)
    3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval