Architectural PaaS Cracks or Crack PaaS

Over the last couple of years, two prominent open source PaaS solutions have come onto the market: Cloud Foundry and OpenShift. There's been a lot of talk about these plays, and the talk has slowly but steadily turned into traction. Large enterprises are picking them up and giving their developers and operations staff a real chance to make changes, sometimes disruptive in a very good way.

However, with all that grandeur, I'm going to hit on the negatives: the missing parts, the serious pain points that go beyond some little deployment nuisance. Then I'll close with a note on why, even amidst those pain points, you still need to make real movement toward PaaS tooling and technologies.

Negative: The Data Story is Lacking

Both Cloud Foundry and OpenShift have a way to plug into databases easily.

Cloud Foundry provides ways to build a Cloud Foundry Service that gets bound and hooked into SQL Server, MySQL, PostgreSQL, Redis or whatever data storage service you need. For more details on building a service, check out the echo example in the vcap sample GitHub project.
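Consuming one of those services from an app, with the vmc CLI of that era, looks roughly like this; the service and app names are just placeholders, so treat it as a sketch rather than gospel.

[sourcecode language="bash"]
# Provision a MySQL service instance and bind it to an existing app.
# The names (mysql-acme, acme-app) are placeholders for your own.
vmc create-service mysql mysql-acme
vmc bind-service mysql-acme acme-app
# At runtime the app reads the bound credentials from the VCAP_SERVICES
# environment variable that Cloud Foundry injects.
[/sourcecode]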

OpenShift has what are called Cartridges, which provide the ability to add databases and other services into the system. For more information about cartridges, check out Red Hat's OpenShift Documentation and the forums.
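For a taste of what that looks like in practice, adding a database to an existing app is roughly a one-liner with the rhc client; the cartridge version and app name below are just illustrative and will vary by installation.

[sourcecode language="bash"]
# Add a MySQL cartridge to an existing application.
# The cartridge version and app name (acme-app) will vary by install.
rhc cartridge add mysql-5.1 --app acme-app

# List the cartridges available on the platform.
rhc cartridge list
[/sourcecode]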

Cloud Foundry and OpenShift, however, have distinct weak spots when it comes to services that go beyond a single-instance database. In the case of a truly distributed database such as Cassandra, HBase or Riak, it is inordinately difficult to integrate the system in a way that either PaaS inter-operates with well. In some cases it's not even worth trying.

The key problem is that both PaaS systems assume the mantle of master while relegating the distributed database to a lower tier of coordination. The way to resolve this at the moment is to do an autonomous installation of Riak, Cassandra, Neo4j or whatever database is distributed, hot swappable, or otherwise spread across multiple machines or instances, and then create a bound connection between it and the application hosted in the PaaS. This is the big negative in PaaS systems and tooling right now: the data story just doesn't extend well to the latest in data and database technologies. I'll elaborate more on this below.
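A rough sketch of that workaround, assuming you've stood up a Riak cluster yourself outside the PaaS: feed the hosted app the cluster endpoints through environment variables (host names and the app name here are made up), since neither platform is going to manage the cluster for you.

[sourcecode language="bash"]
# Point a Cloud Foundry-hosted app (vmc-era CLI) at an externally managed
# Riak cluster. Host names and the app name are placeholders.
vmc env-add acme-app RIAK_HOSTS="riak1.internal.example.com,riak2.internal.example.com"
vmc env-add acme-app RIAK_HTTP_PORT=8098
vmc restart acme-app
# The application code then builds its Riak client connection from these
# variables instead of relying on a PaaS-managed service binding.
[/sourcecode]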

Negative: Deployment is Sometimes Easy, Maintenance is Sometimes Hard

Cloud Foundry is extremely rough to deploy unless you use Bosh to deploy to VMware virtualized instances or AWS. You could, given the resources, get Bosh to deploy your Cloud Foundry environment anywhere you wanted, but that's not easy to do; Bosh is still a bit of a black box. I, along with others in the community, am working to document Bosh, but it is slow going.

OpenShift is dramatically easier to deploy, but once deployed it is missing a few key pieces that add operational overhead. One of those is that OpenShift requires more networking management to handle routing between the various parts of the PaaS ecosystem.

Overall, this boils down to what you need from the two PaaS tool chains. If you want Cloud Foundry's automatic routing and management between nodes, that's a viable route; but if your team wants to manage the networking tier more autonomously from the PaaS environment, then maybe OpenShift is the way to go. Either way, it's bumpy territory to determine which you may or may not want based on that.

Negative: Full Spectrum Polyglot, Missing Some

Cloud Foundry has a wider selection of languages and frameworks, with community involvement around them from groups like Iron Foundry. OpenShift, I'm sure, will be getting to parity in the coming months. I have no doubt that both of these PaaS ecosystems will expand to new languages and frameworks over time. Being polyglot, after all, is a no-brainer these days!

Why PaaS Is, IMHO, Still Vitally Important

First, toss out the idea that everything has to be built at the web scale of a Facebook or a Google. Think about what the majority of developers out there in the world actually work on: tons and tons of legacy or greenfield enterprise applications. Sometimes a developer is lucky enough to work on a full vertical mix of things for a small business, but generally the standard developer in the world is working on an enterprise app.

PaaS tooling takes the vast majority of that enterprise app maintenance off the operational side and tosses it out. Instead of managing a bunch of servers with a bunch of different apps, the operations team manages one ecosystem that hosts a bunch of apps. For enterprises that have enough foresight, and have managed their IT assets well enough to implement and use PaaS tooling, this is HUGE!

For companies working to stay relevant in the enterprise, for companies looking to make inroads into the enterprise, and especially for enterprises trying to maintain, grow, or simply keep ahead of the curve, PaaS tooling is a must have.

Just ask a dev: do they want to spend a few hours configuring and testing a server, or do they want to deploy their application and focus on building more value into that application?

…having spent a few years as that developer, I'll hedge on the side of adding value.

What’s Next?

So what’s next? Two major things in my opinion.

1. Fill the data gap. Most of the PaaS tooling needs to bridge the gap in the data story. I'm doing my part with testing, development and efforts to get real options built into these environments, but the work keeps leading back to how weak the data story of PaaS is. What's the solution here? Talks and planning sessions are ongoing, and we'll eventually get a solid solution around the data side.

2. Fix deployments & deployment management. Bosh isn't straightforward or obvious in what it does, and Cloud Foundry, with its many dependencies, is easily the hardest thing to deploy. OpenShift is easier to deploy, but neither of them actually has a solid management story over time. Bosh does some impressive updates of Cloud Foundry, and OpenShift has some upgrade methods, but over time and during day-to-day operations there haven't been any clear-cut wins for viewing, monitoring and managing nodes and data within these environments.

Red Hat, OpenShift PaaS and Cartridges for Riak

Today I participated in the OpenShift Community Day here in Portland at the Doubletree Hotel. One of the things I wanted to research was the possibility of putting together an OpenShift Origin Cartridge for Riak. As with most PaaS systems this isn't the most straightforward process. The reason is simple: OpenShift and Cloud Foundry have a deployment model based around certain conventions that don't fit the multi-node deployment of a distributed database. But there are ways around this, and my intent was to come up with a plan for a Cartridge that captures those workarounds.

After reading “New OpenShift Cartridge Format – Part 1” by Mike McGrath (@Michael_Mcgrath) I set out to get a Red Hat Enterprise Linux image up and running. The quickest route to that was to spool up an AWS EC2 instance, and 30 seconds later I had exactly that. The next goal was to get Riak installed and running on the instance. I wasn't going to build a cluster right off, but I wanted at least a single running Riak node to use while trying this out.
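Getting that single node going looked roughly like the following; the RPM file name and URL are placeholders for whatever package Basho currently publishes for your RHEL version.

[sourcecode language="bash"]
# Install a single Riak node on RHEL from Basho's RPM package.
# The URL and file name are placeholders; grab the current RPM for your RHEL version.
wget http://downloads.basho.com/riak/CURRENT/rhel/6/riak-x.y.z-1.el6.x86_64.rpm
sudo yum localinstall -y riak-x.y.z-1.el6.x86_64.rpm

# Start the node and confirm it responds.
sudo riak start
sudo riak ping
[/sourcecode]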

In the article “New OpenShift Cartridge Format – Part 1” Mike skips the specifics of the cartridge and focuses on getting a service up and running that will be turned into a Cartridge. As Mike writes,

What do we really need to do to create a new cartridge? Step one is to pick something to create a cartridge for.

…to which my answer is, “alright, creating a Cartridge for Riak!”  😉

However, even though I already had the RHEL instance up and running with Riak installed, I decided I'd follow along with his exact example too. So I dove in with

[sourcecode language="bash"]
sudo yum install httpd
[/sourcecode]

to install Apache. With that done I now have Riak & Apache installed on the RHEL EC2 instance. The goal with both of these services is to get them running as the regular local Unix user in a home directory.

With both Riak and Apache installed, it's time to create a local user directory for each of the respective tools. Before that, though, with this version of Linux on AWS we'll need to create a local user account.

[sourcecode language="bash"]
# Create the user and set a password (run as root or via sudo).
useradd -c "Adron Hall" adron
passwd adron

# passwd prompts interactively; the output looks like this:
Changing password for user adron.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[/sourcecode]

Next I switched to the user I just created with 'su adron' and created directories in the home path to try to get Apache and Riak running locally. I reviewed the rest of the steps for making the Cartridge with Apache, and then immediately started running into a few issues getting Riak set up the way I need it in order to build a cartridge around it. At least, with my first idea of how I should build a cartridge.
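For reference, the shell steps were along these lines; the directory names are my own stand-ins, since the final layout will depend on how the cartridge gets structured.

[sourcecode language="bash"]
# Switch to the new local user and lay out working directories in its home.
# Directory names are illustrative only.
su - adron
mkdir -p ~/apache ~/riak
[/sourcecode]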

At this point I decided we needed to have a conversation about strategy, so I grabbed Bill Decoste, Ryan and some of the other Red Hat team on hand today. After a discussion with Bill it sounds like there are some possibilities for getting Riak running via OpenShift Origin Cartridges.

The Strategy

The plan now is to set up a cartridge that launches a single Riak instance. That instance can then, with post-launch scripts, join itself to the overall Riak cluster. The routing can be done via the internal routing and other capabilities inherent to OpenShift itself. It sounds like it'll take a little more tweaking, but the possibility is there for the near future.
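That post-launch step would boil down to something like the standard Riak cluster join sequence; the node name of the existing cluster member below is, of course, a placeholder.

[sourcecode language="bash"]
# Post-launch sketch: join this freshly started node to an existing cluster.
# riak@riak1.example.internal is a placeholder for a current cluster member.
riak-admin cluster join riak@riak1.example.internal

# Review the staged changes, then commit them to rebalance the ring.
riak-admin cluster plan
riak-admin cluster commit
[/sourcecode]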

At this point I sat down and read up on the Cartridge format a little more before taking off for the day. Overall a good start, and it was interesting to get an overview of the latest around OpenShift.

Thanks to the Red Hat Team, have a great time at the OpenStack Conference and I’ll be hacking on this Cartridge strategy!

References

Deploycon, PaaS & the pending data tier gravity fallout…

For a quick recap of last year's Deploycon and related talks, check out my “Day #3 => DeployCon && Enterprise && Data Gravity” entry.

PaaS systems aren't always effectively distributed. Heroku has fallen over every time AWS east-1 has gone down. Not that I'm saying they've done badly, just pointing that out. With Cloud Foundry, there are several key SPOFs (Single Points of Failure), and with all PaaS systems the data tier is often the neglected part of the system. I've been wanting to write about this for a few months now, and Deploycon has lit a fire for me to do just that.

Deploycon – “Platform Services and Developer Expectations” **

I'm on a panel at Deploycon titled “Platform Services and Developer Expectations” and this leads right back around to that. This SPOF issue concerns me, as PaaS providers talk up their offerings more and more with little light actually shone on it. In some ways each is moving away from its respective SPOFs, but overall they're still pretty prevalent throughout. For security, each has a non-distributed database that still needs to be backed up, with no clear replication or other mechanism set up to ensure data integrity in a failure situation. Of course, the huge saving grace with a PaaS is that if the overall system goes down or a SPOF blows up, the existing deployed applications will generally continue to run, unless of course the routing and networking are also SPOFs. This is the largest glaring concern with PaaS systems that I see today.
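Until the platforms grow real replication for those internal stores, the stopgap is the same one ops teams have always used: a scheduled dump shipped somewhere safe. A rough sketch, assuming the internal store happens to be PostgreSQL and with made-up database name, paths and host:

[sourcecode language="bash"]
# Nightly cron sketch: dump the platform's internal (non-distributed) database
# and copy it off the box. Database name, paths and host are placeholders.
pg_dump --format=custom --file=/var/backups/paas_internal_$(date +%F).dump paas_internal
scp /var/backups/paas_internal_$(date +%F).dump backups@backup-host.example.com:/srv/backups/
[/sourcecode]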

One of the other things about PaaS that has always led to a ton of questions is “what about my PostgreSQL/MySQL/Riak/MongoDB/database thing, and how do I do X, Y, Z with it to ensure scalability in my PaaS?” In almost every case it ends with a simple and unfortunate answer: “…when it comes to data, a PaaS doesn't really do a damn thing for ya…” This is obviously not very helpful. The entire reason to put a PaaS into place is to simplify life, and the sad fact that it barely does a thing for the data tier undercuts exactly that.

Now, hold on a second before you start screaming at me about “but a PaaS does X, Y and Z and isn't even supposed to touch that aspect of things…” and let me elaborate a bit. The panel title ends in “…Developer Expectations,” and when things get simplified the way a PaaS simplifies them, developers assume that if it does all this fancy magic for an application, it ought to simplify the data side of things too! Right? Well, no, and it isn't going to for the foreseeable future. But that doesn't change the fact that developers often have that expectation.

Now, I could write at length about all the reasons that PaaS doesn't really do anything for the data tier. I could wax poetic about how a distributed database (re: Riak, Cassandra, etc.) just doesn't lend itself to a cookie-cutter approach to deployment under a PaaS, or how an RDBMS has umpteen different configurations for stability, scaling, hot swappable services, and other such complexities around the data tier. But instead I'm going to skip all that, maybe cover some of it another day, and jump right into the things that are actually moving forward to fill this gap.

BOSH, Cloud Foundry, OpenShift & fixing the data tier…

The most obvious reason there isn't a simple turn-key solution to the data side of things with a PaaS ecosystem is that data is complex and extremely diverse. There are distributed key/value stores (Riak, Cassandra), sort-of-distributed databases (Mongo), graph databases (Neo4j), the age-old RDBMS (DB2, SQL Server, Oracle's stuff, etc.) and the million solutions around that, plus insanely fast in-memory key/value databases like Redis. Expanding just slightly, you have software that works around these systems, such as Hadoop and Riak CS, and the list goes on. All of it is focused on the data tier and maintaining one, two or some form of the three points of the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem), atomicity and other key capabilities.

All of the PaaS systems, public and private, often have some sort of plug-in style architecture for data. Whether it's Apprenda, which is closed to community and closed source, or a community-driven open source PaaS like OpenShift or Cloud Foundry, things still fall almost entirely to the developers or the database team to build an architecture around the data. When looking at solutions to simplify data in PaaS systems, we have no idea what the closed source players are up to in this regard. With the ones that are open source, or are largely public and involved in the community PaaSes, like EngineYard, Heroku, Cloudbees and others, we can really see the directions and efforts around creating real PaaS-style solutions to the data tier problem.

BOSH, Vagrant, etc. One of the best solutions I've seen so far is the ability of BOSH, which was created by the Cloud Foundry team while at VMware, to spool up an environment that includes such things as a Riak cluster (or another cluster). Currently Brian McClain & Dr Nic have worked to put together such BOSH + Vagrant scripts and get things rolling. I myself will be spending considerable time on just that. Beyond that, this is a good start toward enabling data tier back ends.
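To give a feel for what that looks like, the classic BOSH CLI flow for standing up a cluster from a release runs roughly like this; the director address, stemcell, release and manifest file names are all placeholders for whatever the Riak release you're using actually provides.

[sourcecode language="bash"]
# Rough BOSH CLI flow for deploying a clustered data service from a release.
# Director address, stemcell, release and manifest names are placeholders.
bosh target https://192.168.50.4:25555
bosh upload stemcell bosh-stemcell-aws-ubuntu.tgz
bosh upload release riak-release.tgz
bosh deployment riak-cluster-manifest.yml
bosh deploy
[/sourcecode]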

So how do we close the gap between absurdly simple application deployment and still arduous, difficult data tier deployment? For the next several years I think we'll have cumbersome deployment practices around the data tier. There won't be anything as elegantly simple as Cloud Foundry's single-line deployment or AppFog's one-click deployment of a web application. The best we can do at this time is to streamline around specific pieces and architectures, and at least get them down to a simple three-step deployment.

Please drop a comment or two on how you think we might simplify the data side of the PaaS toolchain. Drop a few tweets in the twitterverse too; I'm sure that'll be exploding as usual. I'm @adron, ping me.

Cheers, happy data architecting.

** The Deploycon panel will be at 4:30pm in Santa Clara on April 2nd. Come check it out.