Let’s Talk Top 7 Options for Database Gumbo

When one starts to dig into databases, things get really complex really fast. There’s not only a whole plethora of database companies and projects, but also database types, storage engines, and other options and functionality to choose from. One place to start is the crazy long list of databases on db-engines. In this post I’m going to take a look at a few of the top database engines to create a starting point, which I’ll reference in future video streaming coding sessions (follow me @ twitch.tv/adronhall).

My Options for Database Gumbo

  1. Apache Cassandra / DataStax Enterprise
  2. PostgreSQL
  3. SQL Server
  4. Elasticsearch
  5. Redis
  6. SQLite
  7. DynamoDB

The Reasons

Ok, so that’s the list, and as stated it’s my list. There are a lot of databases, and of course some that aren’t on it, such as Oracle, are still more widely used. However, here’s some of the logic and reasoning behind my choices above.

Oracle

First off, I feel like I need to broach the Oracle topic, mostly because of their general use in industry. I’m not doing anything with Oracle now, nor have I for years, for a long, long, LONG list of reasons. Using their software today tends to mean getting buried in bureaucratic, oddly broken, and unnecessary process anyway. They use predatory market tactics, take a completely dishonorable approach to sales and services, threaten and sue people for doing benchmarks, and engage in a host of other practices. In face to face experiences, Oracle tends to give off the kind of vibe about which Lawrence from Office Space would say, “naw man, I think you’d get your ass kicked for that!” and I agree. Oracle’s practices are too often disgusting. But even from the purely technical point of view, the Oracle Database and ecosystem itself really isn’t better than other options out there. It is indeed a better, more intelligently strategic and tactical option to use one of a number of alternatives.

Apache Cassandra / DataStax Enterprise

There are multiple reasons for this combo to be on the list. First and foremost, much of my work today uses DataStax Enterprise (DSE) and Apache Cassandra, since I work for DataStax. But it’s important to know I didn’t just go to DataStax because I needed a job. I chose them (and obviously they chose me by hiring me) because of the team and technology. Yes, they pay me, but it’s very much a two way street: I advocate Cassandra and DSE because I personally know the tech is top tier and solid.

As for Apache Cassandra being top tier and solid: it is simply the one remaining truly masterless distributed database on the market with a linear path of scalability that you can use, buy support for, and that is actually actively and knowledgeably maintained, not just by DataStax but by members of the community. One could make an argument for MongoDB, but I’ll maybe elaborate on that in the future.

In addition to being a solid distributed database, Apache Cassandra has inherent capabilities, thanks to its data types and the respective CQL (Cassandra Query Language), that make it a great database to use too. DataStax Enterprise extends that to provide spatial (re: GIS/Geo Data/Queries), graph data, an analytics engine, and more, built on other components like Solr and related technology. Overall it’s a great database with great prospective combinations around it.

PostgreSQL

Postgres is a relational database that has been around for a long time. It’s got some really awesome features like native JSON support, which I’m a big fan of. But I digress. There’s tons of other material that lays out thoroughly why to use Postgres, and I very much agree with it.

The extensive and rich data types alone are enough to put Postgres on this list, but there are also a lot of reasons around multi-tenancy, scalability, and related characteristics, mostly unique to Postgres, that have kept it in a solid position.

SQL Server

This one is on my list for a few reasons that have nothing to do with features or capabilities. This is the first database I was responsible for in its entirety: administration, queries, query tuning, setup, and development against it in the application tier. Of all the databases I’ve worked with, this is the one I’ve spent the most time with, with Apache Cassandra being a close second, then Postgres, and finally Riak.

Kind of a pattern there eh? Relational, distributed, relational, distributed!

The other thing about SQL Server, however, is that the integrations, tooling, and related development ecosystem around it are above and beyond most options out there. Maybe, with a big maybe, Oracle’s ecosystem might be comparable, but the pricing is insanely different. SQL Server can basically carry the whole workload, reporting, ETL, and other feature capabilities that the Oracle ecosystem has traditionally handled. Combine SQL Server with SSIS (SQL Server Integration Services), SSRS (SQL Server Reporting Services), and other online systems like Azure’s SQL Database, and the support, tooling, and ecosystem is just massive. Even though I’ve had my ins and outs with Microsoft over the years, I’ve always found myself enjoying working on SQL Server and its respective tooling options and such. It’s a feature rich, complete, solid, and generally well performing relational database, full stop.

Elasticsearch

Ok, this is kind of a distributed database of sorts, but focused more exclusively (not totally, since it’s kind of expanded its roles) on being a search engine. Overall I’ve had good experiences with Elasticsearch and its respective ELK (or Elastic ecosystem) tooling and such, with some frustrating flakiness here and there over the years. Most of my experience with Elasticsearch has come from an operational point of view. I have, however, done a fair bit of work over the years supporting teams that are doing actual software development against the system. I probably won’t write a huge amount about Elasticsearch in the coming months, but I’ll definitely bring it up at certain times.

Redis / SQLite / DynamoDB

These I’ll be covering in the coming months. For Redis and DynamoDB I have wanted to dig in for some comparison analysis from the perspective of implementing data tiers against these databases, where they are a good option, and determining where they’re just an outright bad option.

I’ve used SQLite on and off for many years, but have wanted to sit down and just learn it and try out some of its features a bit more.
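As a small taste of why SQLite is so easy to just sit down with, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database, so there is nothing to install or clean up. The table name `engines`, the helper function names, and the relational/distributed labels are purely illustrative assumptions for this example, not something from a real project.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a tiny illustrative table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per database engine and its broad category.
    conn.execute(
        "CREATE TABLE engines (name TEXT PRIMARY KEY, kind TEXT NOT NULL)"
    )
    rows = [
        ("Apache Cassandra", "distributed"),
        ("PostgreSQL", "relational"),
        ("SQL Server", "relational"),
        ("SQLite", "relational"),
    ]
    # Using the connection as a context manager commits on success
    # and rolls back if the insert raises.
    with conn:
        conn.executemany(
            "INSERT INTO engines (name, kind) VALUES (?, ?)", rows
        )
    return conn


def names_by_kind(conn, kind):
    """Return engine names of the given kind, sorted by name."""
    # The ? placeholder keeps the query parameterized instead of
    # building SQL with string formatting.
    cur = conn.execute(
        "SELECT name FROM engines WHERE kind = ? ORDER BY name", (kind,)
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    db = build_demo_db()
    print(names_by_kind(db, "relational"))
    db.close()
```

Swapping ":memory:" for a file path would persist the data to a single file on disk, which is a big part of SQLite’s appeal for quick experiments.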

All That Tech SitRep – Elastic Meetup and Quote Center Updates

I started working with the Quote Center (QC) back in November, and wrote about it in “After 816 Days I’m Taking a Job!” Now that I’m a few months into the effort, it’s sitrep time. Sitrep, btw, is military speak for Situational Report.

The three core priorities I have at Quote Center in my role are: Community Contributions, Site Reliability, and Talent Recon.

Community Contributions (and Organizing)

Some of the progress I’ve made is direct and immediate involvement with some really interesting groups here in Portland. The first seemed a prime option, and that’s the Elastic User Group.

Some of the QC Team and I traveled late last year to check out the Elasticon Tour stop in Seattle. It was an educational experience where I got some of my first introductions to Elasticsearch and also to a new product Elastic had recently released called Beats. I was fairly impressed by what I saw, and several other things aligned perfectly for follow up community involvement after that.

I’ve since kept in touch with the Elastic Team and started coordinating the Elastic User Group in Portland (Join the group on Meetup for future meetings & content). In March the group will be hosting a great meetup from Ward & Jason…

http://www.meetup.com/The-Portland-ElasticSearch-Meetup-Group/events/228064228/

So be sure to RSVP for that meetup as it’s looking to be a really interesting presentation.

The second group I’ve stepped up to help out with is the Docker Meetup here in Portland. The first meetup we have planned at this time is from Casey West.

http://www.meetup.com/Docker-Portland-OR/events/228249211/

Site Reliability

One of the other priorities I’ve been focusing on is standard site reliability: everything from automation to continuous integration and deployment. I’ve been making progress, although at this stage going from zero to something in the space of a site reliability practice takes time. I’ve achieved a few good milestones, however, which the next steps can build upon.

We’ve started to slowly streamline and change our practice around Rackspace and AWS Usage. This is a very good thing as we move toward a faster paced continuous integration process around our various projects. At this time it’s a wide mixture of .NET Solutions that we’re moving toward .NET Core. At the same time there are some Node.js and other project stacks that we’re adding to our build server process.

TeamCity

Our build server at this time is shaping up to be TeamCity. We have some build processes that are running in Jenkins, but those are being moved off and onto a TeamCity Server for a number of reasons. I’m going to outline these reasons, and I’m happy to hear any reasons there may be other, better options. So feel free to throw a tweet at me or leave a comment or three.

  1. JetBrains has a pretty solid and reliable product in TeamCity. It tends to be cohesive in building the core types of applications that we would prospectively have: Java, .NET, Node.js, C/C++ and a few others. That makes it easy to get all projects onto one build server type.
  2. TeamCity has intelligence about what is and isn’t available for Java & .NET, enabling various package management and other capabilities without extensive scripting or extra coding needed. There are numerous plugins to help with these capabilities also.
  3. TeamCity has fairly solid, quick, and informative support.

Those are my top reasons at this point. There’s another reason, which I didn’t feel should be enumerated because it’s a feeling rather than something I’ve confirmed: the Jenkins Community honestly feels a bit haphazard and disconnected. Maybe I’m just not asking, or haven’t found the right forums to read or something, but I’ve found it a frustrating experience to deal with the Jenkins Server and to find information and help on getting a disparate and wide ranging set of tech stacks building on it. TeamCity has always just been easy, and getting some continuous integration going the easiest way possible is very appealing.

Monitoring

We use a number of resources for monitoring our systems. New Relic is one of them, and they’re great. However, it’s a bit tough when things are locked down inside of a closed (physically closed) network. How does one monitor those systems and the respective network? Well, you get Nagios or something of the sort installed and running.

I installed it, but Nagios left me with another one of those dirty feelings, like I’d just spilled a bunch of sour milk everywhere. I went about cleaning up the Nagios mess I’d made and, after attending the aforementioned Elasticon Tour Stop in Seattle, decided to give Beats a try. After a solid couple of weeks of testing and confirming that the various things worked well and would work well for our specific needs, I went about deploying Beats among our systems.

So far, although I’m only a few weeks into using Beats (and still learning how to actually make reports in Kibana), Beats appears to have been a good decision. It’s dramatically more cohesive and not spastically splintered all over the place like Nagios. I’m already looking into adding additional Beats beyond the known three: Topbeat, Packetbeat, and Filebeat. There are a number of other beats specific to our needs that we could add, and that would be good open source projects. Stay tuned for those. I’ll talk about them in this space and get a release out to all as soon as we lay a single line of code for them.

Talent Recon

Currently, nothing to report, but more to come in the space of talent recon.