Over the last few weeks the I’ve been putting together multi-cloud conversations and material related to multi-cloud implementation, conversations, and operational situations that exist today. I took a quick look at some of my repos on Github and realized I’d put together a multi-cloud Node.js sample app some time ago and should update it. I’ll get to that, hopefully, but I also stumbled into some tweets and other material I wanted to collect a few of them together.
Our Killrvideo Reference Application Work @ DataStax which we’ll have more coverage of, blog entries, and related material coming out in the coming weeks about what we’ve been doing to make this reference application multi-cloud ready and then multi-cloud capable.
I attended the PDX Cloud meeting to present, but more to ask a question. Here’s how I posed that question (slide deck at the bottom of this blog entry). I frame the scenario of the distributed development world of cloud computing, dive into the vertical world of enterprise dev and then throw down the big question…
This is a situational report on the current state, of the somewhat bi-polar condition that exists in software development right now. This is reflective of my train of thought around a number of aspects of the industry and what questions have come up time and time again while working with fellow coders and technologists.
The first segment of the industry that we often here about. it’s the hip and cool thing to do, as well as the obvious path into the future right now. It’s not particularly the idea that this segment, of building things as distributed systems is new, it’s just that it has become more important and more capable now than it ever has in the past.
A lot of this has to do with the advent of key technologies around virtualization, cloud computing and large scale object storage and network capabilities. We can spool up enough compute to rival a super computer, sitting alone at home, to storing more data than we can imagine with zero theoretical limit to that storage. All of this networked together behind load balancers, switches and programmable devices that a mere half dozen years ago would have taken more resources than any reasonably sized small business could even afford. All of these capabilities are literally at our fingertips now.
I’ve spooled up a 1000 EC2 instances for a demo before. That was 2 years ago even! Now I as well as many host applications and databases entirely in memory. SSDs as a cloud back end option at AWS and other locations provide another avenue that brings these devices into a world where they can be utilized immediately. Blink an eye, you’ll have the resources.
The storage realm, with costs falling through the floor with Glacier to operationally effective options like S3, EBS, Table Store, Object storage and others make our junk trunks limitless. The option to throw away any data at all seems less and less appealing.
Many developers, but definitely not all, have seized opportunities to alter the way they work and what they’re able to accomplish by using these new capabilities. From the now common asynchronous approach to development, shifting languages and stack to the invention of new paradigms around development and operations into a devop practice, leadership has stepped up to this changing game.
Vertical systems have in the past twenty years held the main position in the enterprise as the go to architectures. Client server or three tier or whatever one may call it. With a synchronous mindset the vertical implementation of systems produced several benefits.
We gained the ability through diligent documentation and widget style architecture to build CRUD (Create Read Update Delete) and LOB (Line of Business) applications at a rapid rate. With a simplified approach like this businesses spent a lot of time focusing on their business, not particularly on efficient utilization of resources, processing or reliability. But who could blame them, with Moore’s Law it seemed the only real ways to scale vertical systems were by writing faster code or buying a bigger computer, for a while that seemed to work fine.
Most of the, what I’ll call “vertical revolution” happened with the GSD mindset. GSD mean Get Shit Done. Again, another idea that sort of worked pretty well as long as Moore’s Law was in effect. But things have started to change, with Moore’s Law faltering.
Management practices also became a complete TLA soup during this time. The last 20 years continued the standard “let’s cookie cut people into widget producers”. It never works as well as it could or should, but the industry – and really all humanity keeps trying – to do this anyway. This is fine, we’ve got to try. The vertical stack however brought this to the extreme forefront as the industry tried to shoe horn all sorts of development into singular types of management practices.
Overall though as long as things stayed simple, we stuck to our KISS principles as software craftspeoples the architecture stays straight forward enough and the stack stays easy. However there are voluminous limitations. There are massive management and project issues with all of this.
Many parts of the industry are screaming for the future. As we have it, some agree on certain aspects of what the future should be and others agree on other aspects of the future.
We have some bright spots amid the confusion that is making the distributed world much easier, and the technology continues to do this.
Some want convergence. Which may work well in some ways, but in others it is converging into a clustered mess. As with the roadways of the 50s and the effervescent ideas of 50s planners, we’re finding the idea of the superhighways aren’t working either. The same is starting to appear for some types of device convergence. So where does this really leave us? Where are our weak spots as an industry? It seems like right now we’re stuck in that traffic jam getting to the next step.
Things are looking a little like this freakingnews.com MAV. Multifunction and not functional at all.
So to gain clarity on direction I pose the question…
How do we change the later world to work as well as the new world of distributed systems?
…and a few follow ups.
What do developers in the industry need to make true distributed computing advances while drawing on the known elements of the vertical computing realm?
What do we need as developers and leaders to more reliably advance the industry without setbacks?
What do we need as leaders to move the industry forward to the next steps, stages and developments in converging technology?
Are these even valid questions? What would you propose to ask?
Over the last couple years there have been two prominent open source PaaS Solutions come onto the market. Cloud Foundry & OpenShift. There’s been a lot of talk about these plays and the talk has slowly but steadily turned into traction. Large enterprises are picking these up and giving their developers and operations staff a real chance to make changes. Sometimes disruptive in a very good way.
However, with all the grandeur I’m going to hit on the negatives. These are the missing parts, the serious pain points beyond just some little deployment nuisance. Then a last note on why, even amidst the pain points, you still need to make real movement with PaaS tooling and technologies.
Negative: The Data Story is Lacking
Both Cloud Foundry and OpenShift have a way to plug into databases easily.
Cloud Foundry and OpenShift however have distinctive weak spots when it comes to services that go beyond a mere single instance database. In the case of a true distributed database such as Cassandra, HBase or Riak, it is inordinately difficult to integrate a system that any PaaS inter-operates with well. In some cases it’s irrelevant to even try.
The key problem being that both of the PaaS systems assume the mantle of master while subjugating the distributed database a lower tier of coordination. The way to resolve this at the moment is to do an autonomous installation of Riak, Cassandra, Neo4j or other database that may be distributed, stored hot swappable, or otherwise spread across multiple machine or instance points. Then create a bound connection between it and the PaaS Application that is hosted. This is the big negative in PaaS systems and tooling right now, the data story just doesn’t expand well to the latest in data and database technologies. I’ll elaborate more about this below.
Negative: Deployment is Sometimes Easy, Maintenance is Sometimes Hard
Cloud Foundry is extremely rough to deploy, unless you use Bosh to deploy to either VMware Virtualized instances or AWS. Now, you could if resources were available get Bosh to deploy your Cloud Foundry environment anywhere you wanted. However, that’s not easy to do. Bosh is still a bit of a black box. I myself along with others in the community are working to document Bosh, but it is slow going.
OpenShift is dramatically easier to deploy, but is missing a few key pieces once deployed that draw some additional operational overhead. One of those is that OpenShift requires more networking management to handle routing between various parts of the PaaS Ecosystem.
Overall, this boils down to what you need between the two PaaS tool chains. If you want Cloud Foundry’s automatic routing and management between nodes. This is a viable route, but if your team wants to manage the networking tier more autonomous from the PaaS environment then maybe OpenShift is the way to go. In the end, it’s negative bumpy territory to determine which you may or may not want based on that.
Negative: Full Spectrum Polyglot, Missing Some
Cloud Foundry has a wider selection of languages and frameworks with community involvement around those with groups like Iron Foundry. OpenShift I’m sure will be getting to parity in the coming months. I have no doubt between both of these PaaS Ecosystems that they’ll expand to new languages and frameworks over time. Being polyglot after all is a no brainer these days!
Why PaaS Is, IMHO, Still Vitally Important
First toss out the idea that huge, web scale, Facebooks and Googles need to be built. Think about what the majority of developers out there in the world work on. Tons and tons and tons of legacy or greenfield enterprise applications. Sometimes the developer is lucky enough to work on a full vertical mix of things for a small business, but generally, the standard developer in the world is working on an enterprise app.
PaaS tooling takes the vast majority of that enterprise app maintenance from an operational side and tosses it out. Instead of managing a bunch of servers with a bunch of different apps the operations team manages an ecosystem that has a bunch of apps. This, for the enterprises that have enough foresight and have managed their IT assets well enough to be able to implement and use PaaS tooling, is HUGE!
For companies working to stay relevant in the enterprise, for companies looking to make inroads into the enterprise and especially for enterprises that are looking to maintain, grow or struggling to keep ahead of the curve – PaaS tooling is something that is a must have.
Just ask a dev, do they want to spend a few hours configuring and testing a server? Do they want to deploy their application and focus on building more value into that application?
…being I’ve spent a few years being the developer, I’ll hedge on the side of adding value.
So what’s next? Two major things in my opinion.
1. Fill the data gap. Most of the PaaS tooling needs to bridge the gap with the data story. I’m working my part with testing, development and efforts to get real options built into these environments, but this often leads back to the data story of PaaS being weak. What’s the solution here? I’m in talks, ongoing, planning sessions ongoing, and we’ll eventually get a solid solution around the data side.
2. Fix deployments & deployment management. Bosh isn’t straight forward or obvious in what it does, Cloud Foundry is easily the hardest thing to deploy with many dependencies. OpenShift is easier to deploy and neither of them actually have a solid management story over time. Bosh does some impressive updates of Cloud Foundry, and OpenShift has some upgrade methods, but still over time and during day to day operations there hasn’t been any clear cut wins with viewing, monitoring and managing nodes and data within these environments.
PaaS Systems aren’t always effectively distributed. Heroku has fallen over every time east-1 has gone down at AWS. Not that I’m saying they’ve done bad, just pointing that out. With Cloud Foundry, there’s several key SPOFs (Single Points of Failure), and with all PaaS Systems the data tier is often the neglected pairing of the system. I’ve been wanting to write about this for a few months now and Deploycon has lit a fire for me to do just that.
Deploycon – “Platform Services and Developer Expectations” **
I’m on a panel at Deploycon titled “Platform Services and Developer Expectations” and this leads right back around to that. This SPOF issue is concerning to me as PaaS Providers talk up the offerings more and more with little light actually shone on this issue. In some ways each is moving away form their respective SPOFs, but overall they’re all pretty prevalent throughout. For security, each has a non-distributed database, which technically needs backed up still – no clear replication or other mechanisms setup to ensure data integrity in a failure situation. Of course, the huge saving grace with a PaaS, is that if the overall system goes down or a SPOF blows up, all the existing deployed applications will generally continue to run. Unless of course the routing and networking are also SPOF. This is the largest glaring concern with PaaS Systems that I see today.
One of the other things about PaaS that has always led to a ton of questions is “what about my PostGresql/mysql/Riak/mongodb/database thing and how do I do X, Y, Z with it to ensure scalability in my PaaS.” In almost every case it ends with a simple and unfortunate answer, “…when it comes to data, a PaaS doesn’t really do a damn thing for ya…” This is obviously not very helpful. The entire reason to put a PaaS into place is to simplify life, the sad fact that it barely does a thing for the data tier isn’t very helpful.
Now, hold on a second before you start screaming at me about “but a PaaS does X, Y and Z and isn’t even supposed to touch that aspect of things…” let me elaborate a bit more. The panel at Deploycon states “…Developer Expectations” and when things are getting simplified in the way a PaaS does, developers assume that if it does all this fancy magic for an application it ought to simplify the data side of things too! Right? Well no, and it isn’t going to for the foreseeable future. But no matter what, it doesn’t change the fact that developers often have that expectation.
Now, I could write at length about all the reasons that PaaS doesn’t really do anything for the data tier. I could wax poetic about how a distributed database (re: Riak, Cassandra, etc) just doesn’t lend itself to a cookie cutter approach to deployment under a PaaS or an RDBMS has umpteen different configurations for stability, scaling, hot swappable services, and other such complexities around the data tier. But instead I’m going to skip all, maybe cover some of those things another day, and jump right into some of the things that are actually moving forward to fill this gap.
BOSH, Cloud Foundry, OpenShift & fixing the data tier…
The most obvious reason there isn’t a simple turn key solution to the data side of things with a PaaS ecosystem is that data is complex and extremely diverse. There’s distributed key/value stores (Riak, Cassandra), there’s sort of kind of distributed databases (Mongo), graph databases (Neo4j), the age old RDBMS (DB2, SQL Server, Oracle’s Stuff, etc) and the million solutions around that, there’s key/value in memory styled databases that are insanely fast, like Redis. Expanding just slightly you have software that works around these systems such as Hadoop & Riak CS & the list goes on. All of it focused on the data tier and maintaining one, two or some form of the three points around CAP Theorem (http://en.wikipedia.org/wiki/CAP_theorem), atomicity and other key capbilities.
All of the PaaS Systems, including public and private often have some sort of plug-in style architectures for data. Whether it is Apprenda which is closed to community and closed source or an ongoing open to community PaaS like OpenShift or Cloud Foundry, things still fall almost entirely to the developers or database team to build an architecture around the data. When looking at solutions to simplify data in PaaS Systems the closed source solutions we have no idea what they’re up to in this regard. The one’s that are open source or in large part public and involved in the community PaaSes, like EngineYard, Heroku, Cloudbees and others we can really see the directions and efforts around creating real PaaS style solutions to the data tier problem.
BOSH, Vagrant, etc… One of the best solutions I’ve seen so far is the ability of Bosh, which was created by the Cloud Foundry team while at VMware, to spool up an environment that includes such things as a Riak Cluster (or other cluster). Currently Brian McClain & Dr Nic have worked to put together such Bosh + Vagrant scripts & get things rolling. I myself will be spending some considerable time on just that. But beyond that this is a good start in enabling data tier back ends.
How to close the gap, between absurdly simple application deployment and still arduous and difficult data tier deployment? For the next several years I think we’ll have cumbersome deployment practices around the data tier. There won’t be anything as elegantly simple as Cloud Foundry’s single line deployment or AppFog’s one click deployment of a web application. The best we can do at this time, is to streamline around pieces and architectures, and at least get them into a kind of simple 3 step deployment.
Please drop a comment or two on how you think we might simplify the data side of the PaaS toolchain. Also drop a few tweets in the twitterverse too, I’m sure that’ll be exploding as usual. I’m @adron, ping me.
What is Riak? Who builds it? Who maintains it? Can I download it? How does it work? What are the features?
Here’s the start of answers to these questions and more.
First, the basic high level description:
“Riak is an open source, highly scalable, fault-tolerant distributed database.”
That’s the first line you’ll read when checking out the product via the Basho product link. It provides good information, but here I’m going to add more to the definition without the need to dig around yourself. Maybe I can save you some time & provide some links directly to solid information in the docs. Kind of a “Cliff Notes” of Riak. Let’s take this feature by feature which will in turn get us to a definitive definition of what exactly Riak is.
Riak is Open Source.
Riak is built and contributed to by the community, with Basho being the steward and an active member that extends, builds and provides support for additional products. The avenues to reach the Riak Open Source Community members is pretty straight forward, following known avenues of communication. Hit us up on the email list, especially feel free to contribute & ask questions via the Github Basho organization, there is the Basho Riak Blog, the weekly recap and jump into the IRC chat room #riak on freenode. Oh, and there’s a twitter feed @basho.
So what exactly does this get you, when you become a user or contributor of Riak? The entire community is behind you, will help you get started using Riak and provide help whenever you run into problems. If you want SLAs or 24 hour support Basho can provide this for you. But for bugs, issues, queries, searching and all sorts of other related development questions there is the community. An open source community like this is passionate, which means you’ll have support like no closed source company will ever provide you, and absolutely no closed source product’s community will provide you. We’re talking about a different level of interest, passion and levels of personal involvement.
Riak is a key value based database store.
Riak is a key value store. What exactly is a key value store? It’s pretty simple and you’re probably already familiar with what a key value store is. A key value is made up of two pieces of data, the first is the identifier for the second element within the data structure. This gives a system or developer using key value storage a schema-less way of working with data.
Riak is designed for highly distributed environments.
This type of distributed isn’t the “we put one database over here and one database over here and you gotta figure out how they work together” type of distribution. So this isn’t some of that oddball pretend stuff Oracle keeps hoisting on people. This is the honest to goodness distribution of the sort, when one node goes down you don’t blink, you don’t stop eating dinner, you don’t sweat it. You just continue onward with life knowing full well that you’ll just spool up another node when you need to.
Riak is master-less, with no single point of failure.
This is one of self explanatory features. But what does a master-less system provide us? One thing is no single point of failure. Being that all nodes can act autonomously to work around the loss of one or more nodes it also helps add to the high availability of the system.
Riak is fault tolerant, like a disk drive you wish was real.
Ever have a backup disk drive? What? You don’t have one of those? Ugh. Ok, so imagine you had a backup disk drive that had an unfortunately high failure rate. Well, why, because you know, they have an oddly high failure rate. If you do backups like good practices dictate, eventually you’ll end up with some dead drives.
RAID, both software and hardware, are built specifically to deal with this type of failure. With a distributed system like Riak, it bumps the level of abstraction above software or hardware RAID, enabling another level of even greater fault tolerance. Not to remove the relevance of RAID capabilities, but with a multi-node system like Riak, you can easily remove nodes and swap them out as needed, keeping costs down by using simple drives in simple machines. If you want to, you could indeed get higher I/O machines and faster drives, but it isn’t necessary to insure fault tolerance in a Riak Database System.
Riak scales, with hot swappable nodes enabling zero downtime.
The ability to commit hot swappable changes while in the midst of operating starts at a very low level for Riak. The language used to build Riak, Erlang has the ability to change pieces of an application system in realtime built into the precepts of the language. This provides, at the core, the inherent capability to change out systems, and by proxy of architectural design, the ability for nodes in Riak to be changed out simply by removing them from a cluster ring. Once that is done it is just as simple to add another node or nodes back into the cluster ring, enabling a number of additional practices around upgrades, hot swaps for failures, or even version changes.
Riak can be used as a building block for distributed (aka cloud) infrastructure.
The concepts and contractual components that Riak Database is built on are available for use via the Riak Core Project. If you’re looking into starting a project around distributed systems this is a great place to get start. Also be sure to do a general web engine (re: google) search for “riak core” and you’ll find lots of material around the project, and projects people have started with the project as a base. I’m currently in the process of putting together one of these projects myself.
Riak is eventually consistent.
The term eventually consistent is becoming more and more common place. Riak is one of the many systems, that inherently often apply to distributed systems, that use the concepts of eventual consistency. The idea, is that even though all nodes may not immediately receive a new piece of data, or updated piece of data, they eventually will receive that update and by synchronized with the cluster ring of nodes. This goes back to the equality of nodes and removal of the master-less concepts, providing the availability and other capabilities, with some trade off in the synchronization of data through eventual consistency.
That’s round one for the many features of Riak. I’ll be adding more in the future, but for now this is a good starting point in knowing about and knowing what Riak is, what it can be used for, and how it might help you extend, maintain or invent the next great piece of technology.