Relational Database Query Optimization

This is a continuation of my posts on relational databases, started here. Previous posts on this theme, “Data Modeling“, “Let’s Talk About Database Schema“, “The Exasperating Topic of Database Indexes“, and “The Keys & Relationships of Relational Databases“.

Query optimization refers to the process of selecting the most efficient way to execute a SQL query. The goal is to retrieve the requested data as quickly and efficiently as possible, making the best use of system resources. Given that there can be multiple strategies to retrieve the same set of data, the optimizer’s role is to choose the best one based on factors like data distribution, statistics, and available resources.

Query Optimizers

Continue reading “Relational Database Query Optimization”

Let’s Talk Top 7 Options for Database Gumbo

When one starts to dig into databases things get really complex really fast. There’s not only a whole plethora of database companies and projects, but database types, storage engines, and other options and functionality to choose from. One place to get a start is just to take a look at the crazy long list of databases on db-engines. In this post I’m going to take a look at a few of the top database engines to create a starting point – which I’ll reference – for future video streaming coding sessions (follow me @ twitch.tv/adronhall).

My Options for Database Gumbo

  1. Apache Cassandra / DataStax Enterprise
  2. Postgresql
  3. SQL Server
  4. Elasticsearch
  5. Redis
  6. SQLite
  7. Dynamo DB

The Reasons

Ok, so the list is as such, and as stated it’s my list. There are a lot of databases, and of course some are still more used such as Oracle. However here’s some of the logic and reasoning behind my choices above.

Oracle

First off I feel like I need to broach the Oracle topic. Mostly because of their general use in industry. I’m not doing anything with Oracle now, nor have I for years for a long, long, LONG list of reasons. Using their software tends to be buried in bureaucratic, oddly broken and unnecessary usage today anyway. They use predatory market tactics, completely dishonorable approach to sales and services, as well as threatening and suing people for doing benchmarks, and a host of other practices. In face to face experiences, Oracle tends to give off experiences, that Lawrence from Office Space would say, “naw man, I think you’d get your ass kicked for that!” and I agree. Oracle’s practices are too often disgusting. But even from the purely technical point of view, the Oracle Database and ecosystem itself really isn’t better than other options out there. It is indeed a better, more intelligently strategic and tactical option to use a number of alternatives.

Apache Cassandra / DataStax Enterprise

This combo has multiple reasons and logic to be on the list. First and foremost, much of my work today is using DataStax Enterprise (DSE) and Apache Cassandra since I work for DataStax. But it’s important to know I didn’t just go to DataStax because I needed a job, but because I chose them (and obviously they chose me by hiring me) because of the team and technology. Yes, they pay me, but it’s very much a two way street, I advocate Cassandra and DSE because I personally know the tech is top tier and solid.

On the fact that Apache Cassandra is top tier and solid, it is simply the remaining truly masterless distributed database that provides a linear path of scalability on the market that you can use, buy support for, and is actually actively and knowingly maintained not just by DataStax but by members of the community. One could make an argument for MongoDB but I’ll maybe elaborate on that in the future.

In addition to being a solid distributed database there are capabilities inherent in Apache Cassandra because of the data types and respective the CQL (Cassandra Query Language) that make it a great database to use too. DataStax Enterprise extends that to provide spatial (re: GIS/Geo Data/Queries), graph data, analytics engine, and more built on other components like SOLR and related technology. Overall a great database and great prospective combinations with the database.

Postgresql

Postgres is a relational database that has been around for a long time. It’s got some really awesome features like native JSON support, which I’m a big fan of. But I digress, there’s tons of other material that lays out thoroughly why to use Postgres which I very much agree with.

Just from the perspective of the extensive and rich data types Postgres is enough to be put on this list, but considering there are a lot of reasons around multi-tenancy, scalability, and related characteristics that are mostly unique to Postgres it’s held a solid position.

SQL Server

This one is on my list for a few reasons that have nothing to do with features or capabilities. This is the first database I was responsible for in its entirety. Administration, queries, query tuning, setup, and developer against with the application tier. I think of all my experience, this database I’ve spent the most time with, with Apache Cassandra being a close second, then Postgres and finally Riak.

Kind of a pattern there eh? Relational, distributed, relational, distributed!

The other thing about SQL Server however is the integrations, tooling, and related development ecosystem around SQL Server is above and beyond most options out there. Maybe, with a big maybe, Oracle’s ecosystem might be comparable but the pricing is insanely different. In that SQL Server basically can carry the whole workload, reporting, ETL, and other feature capabilities that the Oracle ecosystem has traditionally done. Combine SQL Server with SSIS (SQL Server Integration Services), SSRS (SQL Server Reporting Services), and other online systems like Azure’s SQL Database and the support, tooling, and ecosystem is just massive. Even though I’ve had my ins and outs with Microsoft over the years, I’ve always found myself enjoying working on SQL Server and it’s respective tooling options and such. It’s a feature rich, complete, solidly, and generally well performing relational database, full stop.

Elasticsearch

Ok, this is kind of a distributed database of sorts but focused more exclusively (not totally since it’s kind of expanded its roles) search engine. Overall I’ve had good experiences with Elasticsearch and it’s respective ELK (or Elastic ecosystem) of tooling and such, with some frustrating flakiness here and there over the years. Most of my experience has come from an operational point of view with Elasticsearch. I’ve however done a fair bit of work over the years in supporting teams that are doing actual software development against the system. I probably won’t write a huge amount about Elasticsearch in the coming months, but I’ll definitely bring it up at certain times.

Redis / SQLite / DynamoDB

These I’ll be covering in the coming months. For Redis and DynamoDB I have wanted to dig in for some comparison analysis from the perspective of implementing data tiers against these databases, where they are a good option, and determining where they’re just an outright bad option.

For SQLite I’ve used it on and off for many years, but have wanted to sit down and just learn it and try out some of its features a bit more.

Starting an Ubuntu Dev Tools List

I’ve recently setup a completely clean virtual machine for doing web, system, and related development on Ubuntu. Here’s the shortlist of what I’ve installed after a default installation. The ongoing list of tools and related items I have installed on my Linux dev box I’m keeping here, and it will be kept as a living doc, so I’ll change it as I add new tools, apps and related changes. So lemme know what I ought to add to that list and I’ll add it to my docs page here. Here’s what I have so far…

Other To-dos

  • Always run sudo apt-get update once the system is installed. It never hurts to have the latest updates.
  • I always install Chrome as my first app. Sometimes the Ubuntu Software Center flakes out on this, but just try again and it’ll work. I use the 64-bit Chrome btw, as I’ve noticed that the 32-bit often flakes out when attempting installation on my virtual machines. Your mileage may vary.

What this enables…

At this point I can launch into about any language; Java, JavaScript, and a few others with a minimal amount of headache. Since it’s a Linux instance it gives me a full range of Linuxy things at my disposal.


Default Java Installation

  1. Run a ‘sudo apt-get update’.
  2. To install the default Java JRE and the JDK run the following commands.

    [sourcecode language=”bash”]sudo apt-get install default-jre
    sudo apt-get install default-jdk[/sourcecode]


Oracle Java v8 Installation

  1. [sourcecode language=”bash”]sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java8-installer[/sourcecode]


WebStorm Installation

  1. I download the application zip from JetBrains and then run

    [sourcecode language=”bash”]tar xfz WebStorm-*.tar.gz[/sourcecode]

  2. Next I always move the unzipped content to the directory in which I’d like to have the application stored. It’s good practice to not keep things in the download directory, just sayin’. Generally I put these in my usr/bin directory.

    [sourcecode language=”bash”]mv /downloads/WebStorm-* your/desired/spot[/sourcecode]

  3. Now at your terminal, navigate to the path where the application is stored and run the WebStorm.sh executable.

    [sourcecode language=”bash”]./bin/webstorm.sh[/sourcecode]

  4. To add WebStorm to the Quicklaunch, just right click on the icon and select to Lock to Launcher.


IDEA IntelliJ Installation

  1. Follow all the steps listed under WebStorm, it’s the exact same process.


Sublime 3

  1. Go to download the latest v3.
  2. Run the package and it should launch the actual Ubuntu installer, setup Sublime for bash use and get it installed.

(NOTE UPDATED 1/18/2016 > The installer doesn’t seem to get it installed, so I went with this link http://olivierlacan.com/posts/launch-sublime-text-3-from-the-command-line/ which has a good solution.)

What really is Open Source Software and what’s this community nonsense they ask…

Open Source Software (OSS), Why Some Fail At It

OSS has won the war. It has been over for years now. Microsoft has ceded, Oracle, VMware and many others have stepped up and attempted to embrace the open source community. Sometimes they’ve been successful, sometimes they haven’t. They’re slowly changing their models to play well with that of the open source software model. Sure, some software is kept closed, but that software in large part is becoming more and more irrelevant while open source efforts are becoming the forefront of technological progress.

What exactly is open source, besides just the opening up of code for others to download? Open source covers a vastly larger ideal than merely providing code for download. A case in point, has been the learning phase Microsoft has gone through. Microsoft, as a company, used to attempt to dictate to its consumer & developer base standards and practices that the company had deemed necessary or in some cases merely a good idea.

Microsoft failed at this miserably over the last decade. Time and time again an open source project would start and Microsoft would create a duplicate library – sometimes directly taking the OSS Project exactly functionality, sometimes they’d merely duplicate it with a basic understanding. Everything from Entity Framework duplicating the functionality of the dozens of ORMs before it. Basically stabbing those efforts in the back instead of being part of the community, Microsoft would remove itself and attempt to subjugate the community efforts.

It back fired over and over and over…

Oracle did something different, yet still blindly stupid. They purchased entire OSS Projects from mySQL to Java. In each case they’ve tainted the efforts significantly by attempting to make these products encourage an unspoken lock in to their proprietary tooling all while litigating (re suing). Their attempt to patent (another issue we can discuss later) the most absurd features and functionality, akin to patenting the breathing process in people! In addition they’ve tried to set legal precedent for things as simple as a URI end point and other notions. Again, something that hasn’t gone over well in the software development world. With the current result being a growing backlash against Oracle. To top all that off, their patent cases have been far worse than even Apple’s Samsung debacle. Oracle, has taken the crown for stabbing the development community, and especially the OSS Community in the back. Multiple wounds too, not just once. They’re doing it over and over as I write this.

Others have continued to make this mistake. They’re starting to suffer for it, and well justified that they do. To stab the community is more than just merely disrespecting one or two developers. It is desecrating the entire community’s efforts, the individuals and their thoughts, ideas, creations and more. It is a slap in the face in so many ways.

Some are starting to do it right, albeit slowly…

Some companies have started to get their act together. One company that is learning right now, slowly but steadily and confidently (they have a good team working on this) is VMware. With their introduction of Cloud Foundry, some could argue this, but they have generally and are trying diligently to open up and be inclusive in the community around their Cloud Foundry Product. Yes, I might have a slightly biased view since I build products for Cloud Foundry with the Iron Foundry Organization at Tier 3, I talk to the teams & individuals at VMware and they, with all their might, intend and do the best they can. As in almost all cases, as long as management keeps everything in their heads clear, they’ll maintain a great project and the future is bright.

Microsoft is another company, with tons of closed things, many attempts at opening products, and is finally starting to get it. They’re starting to be part of the community instead of trying to dictate to it. The windows azure team directly involves itself with jQuery, Node.js and other projects these days. They actively put forth a good foot and have opened up Web API and other web application frameworks and pieces, allowing for pull requests and openly having conversations int he public for full view and inclusion with the community.

So What Really Is Open Source?

Open source software itself, just the code, is simple. But it is assumed and written into the all legal licenses that are included to protect the software from theft and closing by errant companies & individuals. Open source software is code, used as examples or as production web sites is software that is available freely to others to learn from, fix, change, or otherwise modify. It often excludes sharing and using the code with closed source environments or redistributing with closed source products – because OSS efforts do not want to encourage the bad behavior and errant ideals in closed source software by contributing to it.

Building OSS includes a very specific idealism. One doesn’t just throw something into the code, one encourages and builds a culture of openness and being free in thought when working with and contributing to the code bases. The OSS Community is about sharing ideals between individuals to accelerate learning, expand the capabilities of the community as a whole, and push forward progress and development. To summarize with a standard quote, “open source software is about freedom”, and it truly is.

So far, it’s working in a huge way. Some simple successes that have been massive…

OSS Victories

Linux & FreeBSD pretty much hosts the Internet. From Facebook to Google to Amazon they all use some type of Linux variants. Estimates range from 60% to 93% of the Internet & Super Computing is hosted on UNIX machines of the Linux or FreeBSD variety. The notorious Microsoft Windows Server only claims 0.4% of the super computer space and about 30-37% of the Internet Server space. Summarized, the Internet runs on UNIX and specifically on variances of Linux.

When it comes to the web, not only is the majority of the web hosted on systems built by the open source community, but the web applications hosted and run on those systems are open source. The most widely used framework in the world is PHP. One of the biggest up and comers for serving websites and providing interactive web applications is Node.js, with Ruby on Rails being a stalwart for speedy prototyping and production application for thousands of businesses.

Mysql & postgresql round out two of the most heavily utilized databases in existence. Postgresql has grown from zero code to a massively capable database, regularly one-upping the stalwarts at Oracle, Microsoft or other database makers. Mysql has become the go to database for those starting a website or collecting 60 billion rows of data a day, such as New Relic. Many successful businesses have turned these databases into absolutely powerhouses that truly eclipse the need to expend the revenue on databases like SQL Server or Oracles Databases.

NoSQL has come into existence and exists today because of the community. Not everyone has a big data problem and a need for a NoSQL database, or what is sometimes called Not Only SQL these days. The open source community stepped up to build out solutions where relational databases and their history of vertically scaling falls down for modern web applications that run at larger than normal scale. Companies like Facebook, Twitter and others have helped to bring people into this fold and bring more great minds developing the open source that powers these things. This entire movement has been a huge win for the Internet and increased functionality – and the mere ability for many of the large sites to continue to exist at scale (think LinkedIn, Netflix, Facebook, etc)

O’Reilly Books is a company that does a number of things, two of which are run conferences and publish books. They’re a well respected company that encourages open source software and learning through its books, conferences and is heavily involved in supporting the individuals and community around open source software. Even though they don’t create open source software, they provide a massive boost to the ability of developers to write and create open source software by mere involvement. This is a perfect example of example from outside of software development that gains value and adds value back to the community as a whole.

That’s just a few. So really, there’s no reason to resist the freedom of open source. Ignoring or turning away from open source is like turning your back on your family, your community and those around you. There’s no reason to believe the nonsense about TCO (Total Cost of Ownership) that’s been wielded as a weapon against open source solutions. There’s no reason to listen to the FUD about “will it survive” or “what happens when X happens to the project”. There are many companies out there that will help you understand these solutions further, so the TCO argument is dead on arrival. The only path here is toward more software freedoms, preventing large companies from limiting our development options (re: Oracle suing to control API end point pathing, etc). When you do development these days, your first option and thought should be to open source “all the things”. It will help you as a developer, it will be better for you company and those that are leading the way, and it will be better for the community as a whole.

Stay relevant. Write code, individually contribute and be part of your community.

I’m Adron (@adron twitter or @adron app.net) and I write open source software.

Adam & Krishan Got Me Motivated Today… to toss the trash conversations

I was speaking with Krishan Subramanian (@krishnan) and Adam Seligman (@adamse) today. I love talking to these guys. They’re both smart, intelligent and upbeat guys. They see the positive things we’re all working toward and accomplishing in the technology space, specifically around PaaS, Cloud Computing and around the cultural implications of stronger technology communities, involvement of individuals. We all can see the positives, of how the industry is moving forward so that corporations aren’t the only enablers that are juxtaposed against developers or consumers but instead act to serve consumers based on the progress that individuals make themselves. There’s so much to do and so much progress to be made, the venders can simply follow the community and step up to provide points of leadership.

Absolutely great talking with these guys…

On that topic, what is it that we discussed that has me so motivated? Well there’s a few things that I’m done with and I’m going to make every effort to just throw away the trash. Here’s a few of these things that we discussed and I challenge everybody out there, drop the trash talk and let’s move forward because there is a LOT of awesome things to accomplish. Here’s the two things I’m just dropping…  cold. No reason to discuss them anymore.

  • Toss the language and framework religious wars. It is far simpler than it is sometimes perceived. We have a polyglot industry now where we can easily use the right tool for the job, the right framework, or the language that handles our particular domain the best. There is literally no reason to argue about this anymore. Of course we can talk semantics, debate best use cases, and of course we’ll talk accomplishments and what various things do well. That’s exactly what the focus should be on, not the harping on my X is better than your Y nonsense.
  • The culture war is basically over. Sure there are the hold outs that haven’t gotten a clue yet. But it’s an open source world at this point. Even the dreaded and horrible Oracle has generally conceded this and is frantically waving its marketing arms around trying to get attention. But at the core, mysql, java and the other things that they’ve purchased they’re keeping alive. They’re active participants in the community now, albeit in a somewhat strange way. Considering that even Oracle, Microsoft, Apple and so many others contribute back to the open source community in massive ways, that war can be considered won. Victory, the community and every individual in that community!
  • Lockin is basically dead. The technological reasons to lock in are gone, seriously. There’s some issues around data gravity that are to be overcome, but that’s where a solid architecture (see below) comes in. Anything you need can be contributed to and derived from the development community. Get involved and figure out how technology can be a major piece of your business in a positive way. If you design something poorly, lock in becomes a huge issue. Use the rights tools, don’t get into binding contracts, because in the polyglot world we’re in now there’s no reason to be permanently locked in to anything. Be flexible, be where you need to be, and make those decisions based on the community, your support systems, and your business partners. Don’t tie yourself to vendors unless there is mutual reasons to do exactly that. Lock in is a dead conversation, just don’t, time to move on.

So what are the key conversations today?

  • Ecosystem Architecture – If you’re deploying to AWS, Heroku, Tier 3, AppFog or Windows Azure it all boils down to something very specific that will make or break you. Your architecture. This is where the real value add in the cloud & respective systems are, but there are many discussions and many elements of the technology to understand. This is a fundamentally key conversation topic in the industry today. Pick this one up and drop the other trash.
  • Movement & Data Gravity – How do you access your data, how do you store it, where and how do you derive insight from that data? This is one of the topics that came up in our discusssion and it is huge. The entire computer industry basically exists for the reason of insight. What should we eat today, how do I shift my investments, how is my development team doing, what’s the status of my house being built, where is my family today and can I contact them! All of these things are insights we derive from computer systems. These are the fundamental core reason that computers exist. As an industry we’re finally getting to a point were we can get some pretty solid insightful, intelligent and useful information from our systems. The conversation however continues, there is so much more we can still achieve. So again, drop the wasteful convo and jump on board the conversations about data, information and insights!
  • Community Involvement – I’ve left the key topic for last. This is huge, companies have to be involved today. Companies aren’t dictating progress but instead the community is leading as it should. The community is providing a path for companies to follow or lead, but the community, the individuals are the ones that are seen and known to be innovating. This is so simple it’s wild that it is only now becoming a known reality – companies don’t innovate, people do. Companies don’t involve, people do. Individuals are the drivers of companies, the drivers of Governments, they’re the ones driving innovation and progress. The focus should now and should have always been on the individuals and what they’re working toward to accomplish. So get involved, get the companies involved as a whole and keep the semantic ideal of individuals and the progress they can make core to the way you think of communities. The idea of the “company” innovating is silly, let’s talk and build community with the people that are working around and innovating with these technologies.

Of course there are more, I’d love to hear your take on what the conversations of today should be about. What do we need to resolve? How do we improve our lives, our work and the efforts we’re working toward on a day to day basis?