Survey of Go Libraries for Database Work

Over the past few months I’ve picked up a number of libraries in the Go ecosystem to help me get work done around database engineering. These libraries are ones that I have used to do a range of work primarily around Apache Cassandra, DataStax Enterprise, PostgreSQL, and to a lesser degree MS SQL Server, MySQL, and others. The following is a survey of libraries that I’ve found to be pretty solid for getting the job done.

DevOps Days Vancouver - Architecture Guidance - Venomous Database Reliability Engineering (5)

I’ve broken the libraries out into the following categories:

  • Observability, Monitoring, & Insight – I created this section around the specific, and admittedly pedantic, distinction between observability and monitoring, both of which provide insight into the applications one is responsible for. For additional information about observability, the Wikipedia article on the topic is a great starting point. Monitoring gets more specific, with a breakdown into types: application performance monitoring, network monitoring, system monitoring, and business transaction monitoring. The libraries in this section apply to some or all of these definitions.
  • Data Schema Migration – Managing the data schema for a database. Even if you really, truly, honestly have a schema-less system, you still need to manage the underlying schema at some level.
  • Flow, Pipelines, Extraction, Transformation, and Loading – This section is a mixed bag in the sense that it includes many types of libraries that cover a very wide range of work and offer a plethora of ways to do it: creating pipelines, flow sequences, extraction and transformation, and standard bulk loading. These libraries provide ways to get the data where you need it, when you need it there, effectively and reliably.
  • Database Backup Libraries – There are a zillion different things involved in maintaining effective and useful database backups: onsite storage, offsite storage, rotation periods, transmission & security control, scheduling, full or differential, and other topics of concern. One of the most important and often overlooked aspects of database backups is actually restoring the database from backup! These libraries can be used to take those backups, automate them, and implement restoration of data in a more seamless way.
  • Database Drivers – At the core of any programmable automation of databases, one needs some way to connect to and work with the databases being automated. That’s where database drivers come into play. For Go, there’s a ton of support for just about every reasonably well-known database in existence: MS SQL, Apache Cassandra, PostgreSQL, and dozens more!

Observability, Monitoring, & Insight

Veneur – Largely used by and originating from Stripe. This library works as a distributed, fault-tolerant pipeline for runtime data emitted from systems and services throughout your environment. It has server implementations of the DogStatsD protocol and SSF (Sensor Sensibility Format) for aggregating metrics and sending them to storage or, via sinks, to various other systems. It can also compute histograms, sets, and counters as a global aggregator.

TLDR;

Veneur is a convenient sink for various observability primitives with lots of outputs!
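To make the DogStatsD part a little more concrete, here’s a minimal sketch of what emitting a metric over UDP can look like using only the standard library. The listener address, metric name, and tag are all made up for illustration, so check your own Veneur configuration for the real address. This is just the wire format idea, not Veneur’s client library.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// formatMetric renders one metric in the DogStatsD datagram format:
// name:value|type|#tag1,tag2
func formatMetric(name string, value float64, kind string, tags []string) string {
	line := fmt.Sprintf("%s:%g|%s", name, value, kind)
	if len(tags) > 0 {
		line += "|#" + strings.Join(tags, ",")
	}
	return line
}

func main() {
	// Hypothetical listener address; adjust it to match your own setup.
	conn, err := net.Dial("udp", "127.0.0.1:8126")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()

	// "c" marks a counter; the metric name and tag are illustrative only.
	msg := formatMetric("db.query.count", 1, "c", []string{"db:cassandra"})
	if _, err := conn.Write([]byte(msg)); err != nil {
		fmt.Println("write failed:", err)
	}
}
```

Since it’s UDP, the send is fire and forget, which is a big part of why this style of metric emission is cheap enough to sprinkle throughout a service.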

Honeycomb.io – I did some work for Honeycomb back in February of 2018 and gotta say I loved the team. Charity @mipsytipsy, Christine @cyen, Ben @maplebed and crew are tops! Friendly, wildly smart, and humble thrown in for good measure. With that said, I’m also a fan of the product. It’s a solid high-cardinality query and event intake system for observability. There are libraries for Go as well as others, and it’s pretty easy to use the library to set up ingest for appropriately instrumented applications.

TLDR;

Honeycomb.io is a SaaS tool with available libraries for Go to provide observability insight and data collection for your applications!

OpenCensus – This framework and toolset provides ways to get telemetry out of your services. Currently there are libraries for a number of languages that allow you to capture, manipulate, and export metrics and distributed traces to your data store of choice. The key idea is that OpenCensus traces the course of events through an application, and that data is logged for awareness, insight, and thus observability of your systems.

TLDR;

OpenCensus is a library that provides ways to gather telemetry for your services and store it in your choice of a location.

RxGo – This library is an implementation of reactive extensions built for Go. This one is as much a programming concept as it is a way to enhance and specifically focus on observability, so let’s take a look at the intro example they’ve got on the actual repo README.md itself.

ReactiveX, or Rx for short, is an API for programming with observable streams. This is a ReactiveX API for the Go language.

ReactiveX is a new, alternative way of asynchronous programming to callbacks, promises and deferred. It is about processing streams of events or items, with events being any occurrences or changes within the system.

In Go, it is simpler to think of an observable stream as a channel which can Subscribe to a set of handler or callback functions.

The pattern is that you Subscribe to an Observable using an Observer:

subscription := observable.Subscribe(observer)

An Observer is a type consisting of three EventHandler fields: the NextHandler, ErrHandler, and DoneHandler, respectively. These handlers can be invoked with the OnNext, OnError, and OnDone methods, respectively.

The Observer itself is also an EventHandler. This means all types mentioned can be subscribed to an Observable.

// nums collects every int item seen on the stream.
var nums []int
nextHandler := func(item interface{}) {
    if num, ok := item.(int); ok {
        nums = append(nums, num)
    }
}

// Only next item will be handled.
sub := observable.Subscribe(handlers.NextFunc(nextHandler))

TLDR;

RxGo brings reactive extensions to Go, making it easier to build full-spectrum observability, with significantly greater insight into your applications over time and the events they execute.

Data Schema Migration

Go-Migrate – This library is written in Go and handles data schema migrations for a significant number of databases: PostgreSQL, MySQL, SQLite, Redshift, Neo4j, CockroachDB, and that’s just a few.

Example:

migrate -source file://path/to/migrations -database postgres://localhost:5432/database up 2

TLDR;

Go-Migrate is an open source library that can be used via CLI or in code to manage all your schema migration needs.
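The `up 2` at the end of that command means “apply the next two pending migrations”. As a rough mental model only, and definitely not Go-Migrate’s actual implementation, picking which migrations to run could be sketched like this. The migration struct, the pendingUp function, and the file names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// migration is a hypothetical in-memory stand-in for a versioned migration file.
type migration struct {
	version int
	name    string
}

// pendingUp returns up to n migrations whose version is greater than
// current, in ascending version order. n <= 0 means "all pending".
func pendingUp(all []migration, current, n int) []migration {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]migration(nil), all...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].version < sorted[j].version })

	var out []migration
	for _, m := range sorted {
		if m.version <= current {
			continue // already applied
		}
		if n > 0 && len(out) == n {
			break // reached the requested step count
		}
		out = append(out, m)
	}
	return out
}

func main() {
	all := []migration{{3, "add_index"}, {1, "create_users"}, {2, "add_email"}, {4, "drop_legacy"}}
	// With the database at version 1, "up 2" would pick versions 2 and 3.
	for _, m := range pendingUp(all, 1, 2) {
		fmt.Printf("would apply %d_%s\n", m.version, m.name)
	}
}
```

The important design point is that the database remembers its current version, so running the same command twice doesn’t reapply anything.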

Gocqlx Migrate – This library primarily provides extensions to the Go CQL driver library, and one of those extensions is schema migration functionality.

Example:

package main

import (
    "context"

    "github.com/scylladb/gocqlx/migrate"
)

const dir = "./cql" 

func main() {
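    // CreateSession is assumed to be a helper, defined elsewhere, that
    // returns a connected *gocql.Session.
    session := CreateSession()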
    defer session.Close()

    ctx := context.Background()
    if err := migrate.Migrate(ctx, session, dir); err != nil {
        panic(err)
    }
}

TLDR;

Gocqlx Migrate is a feature of the Gocqlx extensions library that can be used for schema migrations from within code.

Flow, Pipelines, Extraction, Transformation, and Loading

Pachyderm – (Open Source Repo) A pachyderm is

a very large mammal with thick skin, especially an elephant, rhinoceros, or hippopotamus.

So it is kind of a fitting name for this library. The project itself has found funding and bills itself as “Scalable, Reproducible Data Science”. I’ve used it minimally myself, but find it continually popping up on my “use this tool because you’ll need a ton of the features” list.

TLDR;

Pachyderm is an open source library, with a funded company behind it, that does indeed provide scalable, reproducible data science in addition to being a great library for your ETL and related data management needs.

Reflow – This library provides incremental data processing in the cloud. It gives scientists and engineers the ability to compose tools, packaged in Docker images, using ordinary programming constructs. The library then evaluates these programs, transparently parallelizing the work and memoizing results, i.e. using goroutines and caching data appropriately to speed up tasks. It was created at GRAIL to manage their NGS (next generation sequencing) bioinformatics workloads on AWS, but has also been used for many other applications, including model training and ad-hoc data analyses. Several of Reflow’s key features include:

  • functional, lazy, type-safe Domain Specific Language (DSL) for writing workflow programs.
  • the runtime for the DSL evaluates incrementally, coordinating cluster execution, and memoization.
  • a cluster scheduler to dynamically provision and tear down resources in the cloud (currently AWS is supported).
  • with containers the same processing workloads can also be executed locally.

TLDR;

Reflow provides a way for data scientists, and by proxy database administrators, data programmers, programmers, and anybody else who needs to work through ETL or related tasks, to write programs against that data in the cloud or locally.

Database Backup Libraries

Restic (GitHub) – Restic is a backup CLI and Go library that will back up to a number of storage targets, including a local directory, SFTP, HTTP REST, S3, Google Cloud Storage, Azure Blob Storage, and others.

Restic follows several objectives:

  • The tool aims to be easy, with minimal singular steps to execute a backup.
  • The tool aims to be fast, using appropriate mechanisms to ensure speedy backups.
  • The tool aims to provide verifiable backups that can easily be restored.
  • The tool aims to incorporate cryptographic guarantees of confidentiality to make sure the backups are secure.
  • The tool aims to be efficient with additional snapshots only taking the storage of the actual increment and de-duplicated to save space in the storage back end.
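That last objective, where additional snapshots only cost the increment, comes down to de-duplication. Here’s a bare-bones sketch of the idea using content hashes as chunk IDs. This is not Restic’s code or its repository format, and it skips the chunking and encryption a real tool needs. The store type and its methods are invented for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store keeps each distinct chunk once, keyed by the SHA-256 of its content.
type store struct {
	chunks map[string][]byte
}

// put saves the chunk if it has not been seen before. It returns the
// chunk ID and whether any new bytes were actually written.
func (s *store) put(chunk []byte) (id string, added bool) {
	sum := sha256.Sum256(chunk)
	id = hex.EncodeToString(sum[:])
	if _, ok := s.chunks[id]; ok {
		return id, false
	}
	s.chunks[id] = append([]byte(nil), chunk...)
	return id, true
}

// snapshot records the chunk IDs that make up one backup and reports
// how many bytes of new storage the snapshot needed.
func (s *store) snapshot(chunks [][]byte) (ids []string, newBytes int) {
	for _, c := range chunks {
		id, added := s.put(c)
		ids = append(ids, id)
		if added {
			newBytes += len(c)
		}
	}
	return ids, newBytes
}

func main() {
	s := &store{chunks: map[string][]byte{}}
	_, first := s.snapshot([][]byte{[]byte("aaaa"), []byte("bbbb")})
	_, second := s.snapshot([][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cc")})
	fmt.Println("first snapshot stored bytes:", first)
	fmt.Println("second snapshot stored bytes:", second)
}
```

The second snapshot references three chunks but only pays for the one it hasn’t seen before, and a snapshot is just a list of chunk IDs that can be resolved back to data on restore.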

Database Drivers

For each of these databases there’s a single driver that I use, except in the case of Apache Cassandra and DataStax Enterprise, where I have also picked up gocqlx to add to my gocql usage.

PostgreSQL (pq) – Features:

  • SSL
  • Handles bad connections for database/sql
  • Scan time.Time correctly (i.e. timestamp[tz], time[tz], date)
  • Scan binary blobs correctly (i.e. bytea)
  • Package for hstore support
  • COPY FROM support
  • pq.ParseURL for converting urls to connection strings for sql.Open.
  • Many libpq compatible environment variables
  • Unix socket support
  • Notifications: LISTEN/NOTIFY
  • pgpass support
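Since URL-style connection strings come up in that list, here’s a small sketch of building one safely with the standard library rather than string concatenation, so that odd characters in passwords get escaped. The postgresURL helper and all the credentials are made up for illustration. The resulting string is the sort of thing you’d hand to sql.Open with the driver registered, which this sketch deliberately doesn’t do so it stays standard-library only.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// postgresURL assembles a postgres:// connection URL, letting net/url
// handle escaping of the user, password, and query values.
func postgresURL(user, password, host string, port int, dbname, sslmode string) string {
	q := url.Values{}
	q.Set("sslmode", sslmode)

	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     net.JoinHostPort(host, strconv.Itoa(port)),
		Path:     "/" + dbname,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// Hypothetical credentials and database name.
	fmt.Println(postgresURL("app", "p@ss", "localhost", 5432, "inventory", "require"))
}
```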

Gocql & Gocqlx

Gocql Features:

  • Modern Cassandra client using the native transport
  • Automatic type conversions between Cassandra and Go
    • Support for all common types including sets, lists and maps
    • Custom types can implement a Marshaler and Unmarshaler interface
    • Strict type conversions without any loss of precision
    • Built-In support for UUIDs (version 1 and 4)
  • Support for logged, unlogged and counter batches
  • Cluster management
    • Automatic reconnect on connection failures with exponential falloff
    • Round robin distribution of queries to different hosts
    • Round robin distribution of queries to different connections on a host
    • Each connection can execute up to n concurrent queries (whereby n is the limit set by the protocol version the client chooses to use)
    • Optional automatic discovery of nodes
    • Policy based connection pool with token aware and round-robin policy implementations
  • Support for password authentication
  • Iteration over paged results with configurable page size
  • Support for TLS/SSL
  • Optional frame compression (using snappy)
  • Automatic query preparation
  • Support for query tracing
  • Support for Cassandra 2.1+ binary protocol version 3
    • Support for up to 32768 streams
    • Support for tuple types
    • Support for client side timestamps by default
    • Support for UDTs via a custom marshaller or struct tags
  • Support for Cassandra 3.0+ binary protocol version 4
  • An API to access the schema metadata of a given keyspace

Gocqlx Features:

  • Binding query parameters from struct or map
  • Scanning results directly into struct or slice
  • CQL query builder (package qb)
  • Super simple CRUD operations based on table model (package table)
  • Database migrations (package migrate)

Go-MSSQLDB – Features:

  • Can be used with SQL Server 2005 or newer
  • Can be used with Microsoft Azure SQL Database
  • Can be used on all go supported platforms (e.g. Linux, Mac OS X and Windows)
  • Supports new date/time types: date, time, datetime2, datetimeoffset
  • Supports string parameters longer than 8000 characters
  • Supports encryption using SSL/TLS
  • Supports SQL Server and Windows Authentication
  • Supports Single-Sign-On on Windows
  • Supports connections to AlwaysOn Availability Group listeners, including re-direction to read-only replicas.
  • Supports query notifications

These are just a few of the libraries I use, have worked with, and suggest checking out if you’re delving into database work and especially building systems around databases for reliability and related efforts.

If you’ve got other libraries that you’ve used, or really like, definitely leave a comment and let me know and I’ll update the post to include new libraries for Go. Subscribe to the blog too as I’ve got more posts in the cooker for database work, Go libraries and usage with databases, and a lot more. Happy thrashing code!

A Classic, Groovier, Stonier, Grittier Metal Monday for ya, This Week of Monday the 25th of March

This last week got a ton of stuff done in spite of the season changing allergy onslaught! Meetup recorded on “Does the Cloud Kill Open Source?” with Richard Seroter and got several new meetups lined up with, as always, solid speakers. The meetup continued on into the evening to McMenamins for drinks and food, and great conversation into the night. The topic brought up a lot of things and just let me Lay Down my Burden.

Then the week continued with more craziness, grooving to code and trying to manage the inability to deal with the seasonal allergy eyes, rivers flooding out of my eye sockets! But hey, a groovy tune by Stoned Jesus and cutting the lights while I hacked away on my upcoming talk for DevOps Days Vancouver, “Architecture Guidance for Venomous Database Reliability Engineering”, worked out pretty well. So much so I’ve got the initial slide deck down now (Monday the 25th!!) and am just shining up some code samples and demos now!

To wrap up the week I found myself also listening to some more groovy stoner doom metal from Conan. Here’s a little Volt Thrower.

 

Meetup Video: “Does the Cloud Kill Open Source?”

🆕 Had a great time at the last Seattle Scalability Meetup, and I’ve just finished processing and fixing up the talk video from it. I feel like I’ve finally gotten the process of streaming and post-stream production down so that I can make videos available almost immediately afterwards.

Here @rseroter gives us a full review of various business models, open source licenses, and a solid situational report on cloud providers and open source.

Join the meetup group here: https://www.meetup.com/Seattle-Scalability-Meetup/

The next meetup on April 23rd we’ve got Dr. Ryan Zhang coming in to talk about serverless options. More details, and additional topic content will be coming soon.

Then in May, on the 28th, Guinevere (@guincodes) is going to present “The Pull Request That Wouldn’t Merge”. More details, and additional topic content will be coming soon.

Here’s some of the talks I streamed recently. Note, didn’t have the gear setup all that well just yet, but the content is there!

Adding and Returning Value to the Community via Twitter, LinkedIn, and Twitch

Twitter

Goal: Grow our follower count and reach, entertain, laugh, make hot takes – as one does on Twitter, educate, and get value out of it ourselves.

Don’t!

  • Don’t buy followers (i.e. don’t pay anybody that promises X followers, market share, or whatever it is they’re selling). We can’t trust this method as it’s often just a pile of Russian bots or other garbage followers. This does nothing to increase visibility and penetration to those that want, are interested in, or need to communicate with us (i.e. customers and fans).
  • Do not just repost things via RT or use tooling to just post arbitrary things. People notice this and won’t follow or will unfollow you. It’s a sure fire way to be blacklisted as *marketing* which will involve going to zero eyeballs, even when the account statistics keep showing people see it.
  • Do not post identical or similar content one tweet after another. i.e. Don’t post a marketing blurb with one image, then post another marketing blurb with another image that’s exactly the same size, theme, and fill up the entire tweet stream this way. The followers you get will not be active, will not be who you actually want to speak with or interact with, and don’t really add value over time if this is all that is done. It’s similar to those blog theft sites that just re-post the exact RSS stream and then, by proxy, get blacklisted and erased from Google/search results.

Do!

  • Just make it about you. Grow your personal brand first and foremost. Such as “Dern this is a wicked awesome band.” or “Wow, best burger in the world” and add pictures, content, and other interesting things for people. It doesn’t have to be “I just cured all diseases yo, check me out!” People will follow based on honesty, integrity, taking a stance, being informative, and providing useful information of all sorts. But more than anything they’ll follow the person, not any specific *thing* you’re selling, pushing or what not. So be yourself, share, and be involved with the network you create.
  • Build things you’re interested in, especially when they’re related in some way to products and services you like to use and find interesting – i.e. Apache Cassandra, DSE, Databases, Application Development, etc. Build on these things via threading, via initiating discussion with others that are discussing these things, and among all this find valuable fellow Twitterers that you want to be connected to. This helps all involved: you, your network, the company, the people and companies you connect to, and more. Building the network wide, with an on-point effort around topics, will dramatically increase opportunity not just for you but for anything and everybody around you.
  • When retweeting, intersperse it among other things, and happily add content to RT’s. In other words don’t just make it endless retweets, but just throw in a few retweets for things you’re interested in or support, and then have your regular stream of tweets, links, and other content.
  • Use emoticons, use pictures, and definitely blurt memes out there. Aim to have fun with Twitter.

Examples of good Twitterers that really provide high value to followers, but also back to the Twitterer themselves in the way of speaking opportunities and all sorts of other things:

LinkedIn

Goal: Build an extensive professional network and return value to the LinkedIn Network of connections you have.

Don’t!

  • Don’t use LinkedIn like Facebook. This is obvious but for some reason much of the world doesn’t seem to get this, so it feels like it needs to be stated for the LOL’s. i.e. Don’t hit on people, don’t ask people out on dates, just talk business. Ideally leave politics out of it too.
  • Ideally, don’t send droves of InMail messages to people unless that’s specifically the game being played on LinkedIn. For a more grassroots and non-marketing community focus, just interact directly with people that you know, and don’t arbitrarily chase down people you do NOT actually know. This is another thing that decreases authenticity, and makes an individual appear like they’re shilling for something, even if they’re not.

Do!

  • Post content regularly about what you’re working on, provide links, and provide respective researched content for other mediums you might have like Medium, a blog, Twitter, and all that jazz.
  • Talk about your professional achievements and whatever else comes up related to your work. Hobbies sometimes count too, provided there’s some business relation or professional angle (e.g. music you play professionally), so put that content into rotation now and again. But do remember, if it fits better on Facebook than LinkedIn, just don’t post it on LinkedIn.
  • Reach out if there is legitimate business that you are both involved in. Start that as a simple conversation, not a sale, not something pushy, just simple, friendly, curious conversation.

Examples of good LinkedIn Accounts, that use their accounts for benefit for themselves but also provide benefit directly or indirectly for all of us:

Twitch

Goal: Grow our follower count and increase our collective content and work material to show, teach, work, and hang out with viewers to build tomorrow’s best, most kick ass, wicked awesome applications, data science analysis, and more!

Don’t!

  • Not a whole lot here yet. Twitch is kind of wide open and not a lot of no no’s here. Don’t do illegal things is all I’ve got at the moment.

Do!

  • Set up your OBS or streaming process so that you have chat on screen, chat somewhere you can monitor it, clear code with readable fonts, and all the interaction content you can add for new follows (alerts), subscribes (alerts), and whatever else comes up.
  • When on stream, take your time, interact with people that follow, subscribe, or chat/whisper with you.
  • Don’t worry about making mistakes, just work through them, let the audience help if they offer it. Even if you know that they’re wrong, work through things with them and let them get involved. Then lead into the correct fix, etc. This is a great way to teach and build involvement on stream so that everybody gets a win, and you get an advocate to your own advocacy.
  • If you’re going to heavily curse or do anything even slightly liberal/conservative/religious/ideological etc it’s probably best to mark one’s stream as 13 or older (I think that’s the setting).

Some excellent Twitch streamers to reference for their involvement, OBS setup, configuration, and general awesomeness in the community.

That’s it for this post. Got more do’s or don’ts? Lemme know, will start a repo!

A little lagniappe for ya, that hygge feeling.

Thrashing Metal Monday for Week of March 18th, Mixing a Dreamy Day At the Gates Among Carcasses! \m/

In an effort to throw you some wild curve balls, here’s some wild options for Monday. Again a brash way to awaken and take on the world!

First, this is a real curve ball, not exactly the cup of tea I’m usually looking for, but I’ve thrown some lemon into the cup and it’s actually a pretty sweet flavor of tunes. This band Dream State I’ve known about for some time and they keep sneaking back into my playlist. This is one of their newer tunes, In This Hell.

This next band is one of the forerunners of, I’d say, melodic death metal. Discover At The Gates.

This next carcass, they’ve been in the mix for a long while now for me. Ever since rolling upon the flanged wheel into New York City and my buddy Mike picking up one of their albums, their first album Heartwork to be specific, at random. We then spent the next 20 hours on the train riding back to NOLA listening to Carcass and our other finds from New York City.