Let’s Talk Top 7 Options for Database Gumbo

When one starts to dig into databases, things get really complex really fast. There’s not only a whole plethora of database companies and projects, but also database types, storage engines, and other options and functionality to choose from. One place to start is the crazy long list of databases on DB-Engines. In this post I’m going to take a look at a few of the top database engines to create a starting point, which I’ll reference in future video streaming coding sessions (follow me @ twitch.tv/adronhall).

My Options for Database Gumbo

  1. Apache Cassandra / DataStax Enterprise
  2. PostgreSQL
  3. SQL Server
  4. Elasticsearch
  5. Redis
  6. SQLite
  7. DynamoDB

The Reasons

Ok, so that’s the list, and as stated it’s my list. There are a lot of databases, and of course some, such as Oracle, are still more widely used. However, here’s some of the logic and reasoning behind my choices above.

Oracle

First off, I feel like I need to broach the Oracle topic, mostly because of their general use in industry. I’m not doing anything with Oracle now, nor have I for years, for a long, long, LONG list of reasons. Using their software today tends to get buried in bureaucratic, oddly broken, and unnecessary practices anyway. They use predatory market tactics and a completely dishonorable approach to sales and services, threaten and sue people for doing benchmarks, and engage in a host of other practices. Face to face, Oracle tends to give off the kind of experience about which Lawrence from Office Space would say, “naw man, I think you’d get your ass kicked for that!” and I agree. Oracle’s practices are too often disgusting. But even from a purely technical point of view, the Oracle Database and its ecosystem really aren’t better than other options out there. It is indeed a better, more intelligently strategic and tactical option to use one of a number of alternatives.

Apache Cassandra / DataStax Enterprise

This combo is on the list for multiple reasons. First and foremost, much of my work today uses DataStax Enterprise (DSE) and Apache Cassandra, since I work for DataStax. But it’s important to know I didn’t just go to DataStax because I needed a job; I chose them (and obviously they chose me by hiring me) because of the team and technology. Yes, they pay me, but it’s very much a two way street. I advocate Cassandra and DSE because I personally know the tech is top tier and solid.

As for Apache Cassandra being top tier and solid, it is simply the one remaining truly masterless distributed database on the market that provides a linear path of scalability, that you can use and buy support for, and that is actively and knowledgeably maintained not just by DataStax but by members of the community. One could make an argument for MongoDB, but I’ll maybe elaborate on that in the future.

In addition to being a solid distributed database, Apache Cassandra has inherent capabilities, thanks to its data types and CQL (Cassandra Query Language), that make it a great database to use too. DataStax Enterprise extends that to provide spatial (re: GIS/Geo Data/Queries), graph data, an analytics engine, and more, built on other components like Solr and related technology. Overall, it’s a great database with great prospective combinations around it.

PostgreSQL

Postgres is a relational database that has been around for a long time. It’s got some really awesome features like native JSON support, which I’m a big fan of. But I digress; there’s tons of other material that thoroughly lays out why to use Postgres, and I very much agree with it.

Just from the perspective of its extensive and rich data types, Postgres has enough to be put on this list. Add the many reasons around multi-tenancy, scalability, and related characteristics that are mostly unique to Postgres, and it has held a solid position.

SQL Server

This one is on my list for a few reasons that have nothing to do with features or capabilities. This is the first database I was responsible for in its entirety: administration, queries, query tuning, setup, and development against it with the application tier. Of all the databases in my experience, I think this is the one I’ve spent the most time with, with Apache Cassandra being a close second, then Postgres, and finally Riak.

Kind of a pattern there, eh? Relational, distributed, relational, distributed!

The other thing about SQL Server, however, is that the integrations, tooling, and related development ecosystem around it are above and beyond most options out there. Maybe, with a big maybe, Oracle’s ecosystem is comparable, but the pricing is insanely different, in that SQL Server can basically carry the whole workload, reporting, ETL, and other capabilities that the Oracle ecosystem has traditionally handled. Combine SQL Server with SSIS (SQL Server Integration Services), SSRS (SQL Server Reporting Services), and online systems like Azure’s SQL Database, and the support, tooling, and ecosystem are just massive. Even though I’ve had my ins and outs with Microsoft over the years, I’ve always found myself enjoying working on SQL Server and its respective tooling options. It’s a feature rich, complete, solid, and generally well performing relational database, full stop.

Elasticsearch

Ok, this is kind of a distributed database of sorts, but focused more exclusively (not totally, since it’s kind of expanded its roles) on being a search engine. Overall I’ve had good experiences with Elasticsearch and its respective ELK (or Elastic ecosystem) tooling, with some frustrating flakiness here and there over the years. Most of my experience with Elasticsearch has come from an operational point of view. I have, however, done a fair bit of work over the years supporting teams that are doing actual software development against the system. I probably won’t write a huge amount about Elasticsearch in the coming months, but I’ll definitely bring it up at certain times.

Redis / SQLite / DynamoDB

These I’ll be covering in the coming months. For Redis and DynamoDB I have wanted to dig in for some comparison analysis from the perspective of implementing data tiers against these databases, where they are a good option, and determining where they’re just an outright bad option.

I’ve used SQLite on and off for many years, but have wanted to sit down and just learn it and try out some of its features a bit more.
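As a taste of how little setup SQLite needs, here’s a minimal sketch using Python’s built-in sqlite3 module against an in-memory database. The table name (gumbo), the function names, and the one-word "kind" labels are all my own shorthand for illustration, not anything official, so treat this as a starting point rather than the way to do it.

```python
import sqlite3

# The seven picks from the list above. The "kind" labels are rough,
# hypothetical shorthand chosen for this sketch only.
PICKS = [
    (1, "Apache Cassandra / DataStax Enterprise", "distributed"),
    (2, "PostgreSQL", "relational"),
    (3, "SQL Server", "relational"),
    (4, "Elasticsearch", "search"),
    (5, "Redis", "key-value"),
    (6, "SQLite", "relational"),
    (7, "DynamoDB", "key-value"),
]


def build_db():
    """Create an in-memory SQLite database holding the picks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gumbo ("
        "rank INTEGER PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL)"
    )
    # Parameterized inserts: values are bound to the ? placeholders.
    conn.executemany(
        "INSERT INTO gumbo (rank, name, kind) VALUES (?, ?, ?)", PICKS
    )
    conn.commit()
    return conn


def top_engines(conn, limit=3):
    """Return the names of the first `limit` picks, ordered by rank."""
    rows = conn.execute(
        "SELECT name FROM gumbo ORDER BY rank LIMIT ?", (limit,)
    ).fetchall()
    return [name for (name,) in rows]


def count_by_kind(conn):
    """Return a dict mapping each kind label to how many picks carry it."""
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM gumbo GROUP BY kind"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    db = build_db()
    print(top_engines(db))
    print(count_by_kind(db))
    db.close()
```

No server, no config file, no install beyond Python itself, which is a big part of why SQLite made the list.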

Reality Distortion Field : 17 Companies’ Sitrep

I’m sitting on the bus this morning, as happens almost every day of the week. I’m flipping pages, sort of; it’s an eBook on my Kindle App. I’m reading about Steve Jobs taking over the Macintosh Program at Apple, how things started to fall into place for Apple and for the Macintosh, and how Jobs saw what could be and pushed for it. Everybody else, Microsoft, Xerox, Canon, and practically every single other company, was missing it. Xerox PARC had it right in front of them, the GUI, the mouse, an object oriented language, and about every single thing we assume for computer use and development today, but wasn’t doing anything with it. They were all missing it, except Jobs. The eccentric, crazed, reality distortion field generating Jobs pushed forward and found those that agreed this was absolutely the future. Today’s computers owe so much to Jobs’s efforts to pull these people together around what he saw as the future, and our modern computing world will forever be indebted to Steve Jobs.

Howard Hughes had done this 50 years earlier. He simply stated, “nobody wants to fly on a plane at 10,000 feet and get shaken to pieces, planes need to fly at 30,000 feet or more where the air is smooth!” He then went about working to get a plane built that could do this! The Government was in his way, the industry was fighting him, and everybody said this wasn’t the way to go. Nobody could build a plane that would do that right now! It’s absurd. He did it, and bought every single one of them he could, putting the airline (TWA) in hock at the same time! But it paid off, and his airline had the nicest planes and the best flights in the world, easily. Today’s airlines are all modeled after this ideal, and our modern travel owes a huge debt to what Howard Hughes pushed forward.

The competition and the fighting pushed the envelope, but in both cases a visionary could see the future. To them it was plain as an image on a clear sunny day. To them, the future didn’t need to be tomorrow, it was ready right now. The future just needed to be dragged kicking and screaming directly into today! They did this, they pulled people together who could make these changes, and they with their teams yanked the future right into humanity’s grasp.

Utility Computing / Cloud Computing

With those thoughts flying around at Warp 10 in my mind, everywhere, at every moment, it seemed to occur to me: we’re merely putting the motherboards and cassette tape drives together right now in cloud computing. We have no Macintosh of cloud computing, we have no clear direction, and there has to be something bigger, much bigger. At this point we’re merely making small steps, slight little strides toward the future. What we need to do is create the future and pull it directly into now!

There could be more though. Some of these things are being put together by individuals at various companies, oriented toward the platform level. There is, somewhere, a growing movement toward that next big shift in the way things are done. The gap between big architectures, big ideas, and launching these things is decreasing by the day – literally!

With these big ideas and big architectures, and all the small steps and small pieces, the industry is moving in the right direction. We’ve experienced shifts over the years and some more are definitely coming up very soon!

The Playing Field : Sitrep

With these thoughts racing around, I felt compelled to look at where the industry stands right now. These are in no particular order; they all provide some type of building block for the next big thing, each in some aspect of the industry.

Amazon Web Services : This one should not need explaining. They’re probably the most utilized, nearly the most advanced, robust, price conscious utility storage, compute, and services provider in existence today. They continue to defeat the innovator’s dilemma over and over again. This company, and the departments in the company, are hungry, very hungry, and they fight the fight to stay in the lead.

Cloudability : This company is about keeping utility/cloud computing costs in check, and knowing where and when you’re pushing the pricing limits among all the various building blocks. There have been more than a few issues with billing, and with people blowing through budgets by inadvertently leaving their 1000 node EC2 instances on, and Cloudability helps devops keep these types of things under control, stopping overages cold!

New Relic : The key to this offering is monitoring of everything, everywhere, all the time. New Relic offers absolutely beautiful charting and information displays around services, compute, storage, and a zillion other metrics among Ruby, PHP, Python, .NET, and about everything else available.

Puppet Labs : Imagine operations, IT, and systems administration all rolled into a single bad ass company’s product efforts. Imagine ways to automate and monitor virtualized machines and get them deployed, all with elegant and extremely powerful tools. Imagine that power now, and you’ll know what Puppet Labs provides.

Opscode : The cloud needs management, hard core powerful management. Opscode and their respective Chef product do just that. The influence of Chef has gone so far as to lead Amazon Web Services (and others) to design their systems automation in a way that enables Chef usage. The devops community around Opscode is growing, and the inroads to systems agility they’re making are getting to the point of being considered a disruptive market force!

Joyent : The birthplace of node.js, do I need to add more? Well, ok, I will. Joyent has a host of amazing devs, and amazing ops goals. The advances coming out of Joyent aren’t always associated back to the company (maybe they should be), but rest assured there is some heavy duty research and dev going on over there. Things to check out would be their SmartDataCenter and of course the JoyentCloud.

MongoHQ : MongoHQ is one of the distributed cloud hosting providers for MongoDB. MongoHQ is also a supported provider in several of the other PaaS Providers such as Heroku and AppHarbor.

MongoLabs : MongoLabs is another distributed cloud hosting provider for MongoDB. MongoLabs is also a supported provider in several of the other PaaS Providers such as Heroku and AppHarbor.

Nodester : Nodester is a hosting solution for node.js applications beautifully distributed in a horizontal way.

Nodejitsu : One of the leading node.js hosting providers and a very active participant in the community in and around New York.

AppFog : AppFog is a Platform as a Service (PaaS) provider that is working on a cloud based, horizontally distributed platform for creating applications with a wide variety of frameworks and languages. Some of those include .NET, Ruby on Rails, Java, and many others.

PhpFog : This is the PHP root of the PaaS Provider AppFog. They have a good history and an absolutely spectacular architecture for PHP Applications with a screaming simple and fast deployment model to cloud/utility based systems. They have a really great product.

Heroku : Deploy Ruby, Node.js, Clojure, Java, Python, and Scala. Probably the leader in PaaS based deployment right now. Got git, get Heroku, and git push heroku master is about all the gettin’ it takes for your application to be running there.

EngineYard : Think Ruby, Ruby on Rails, Rubinius, or any other aspect of Ruby and you’ll probably arrive at EngineYard in short order. The teams at EngineYard are heavily active in the cloud & Ruby scene. They are easily one of the leaders in PaaS based git workflow deployment in the Ruby & Ruby on Rails Community. They also, however, support tons of other technologies so don’t think they’re limited to just Ruby & Rails.

AppHarbor : The .NET Framework, often thought to be left completely out in the cold when it comes to serious cloud computing and git based agile workflows, finally got included with AppHarbor! With the release of AppHarbor the trifecta of IaaS and a solid PaaS offering was finally available for the .NET stack.

Windows Azure : Windows Azure is Microsoft’s official cloud service, which supports a host of capabilities centered around a mostly PaaS based service. Windows Azure has however spread into SaaS and IaaS also. Some of the frameworks and tools they support include Ruby on Rails, Java, PHP, .NET (of course), node.js, Hadoop and others.

CloudFoundry : Cloud Foundry is an open source PaaS Solution that serves to link up various back end and front end architectures. Currently it is supported by a host of companies including VMware, AppFog, and others.

Putting the Pieces Together

That’s where we stand in the industry today. We have all the pieces, and they need to be fit together to create something great, something awesome, something truly remarkable. I fully intend to create part of the future. Will I see you there? I’d hope so!