Deploycon, PaaS & the pending data tier gravity fallout…

For a quick recap of last year’s Deploycon & related talks, check out my “Day #3 => DeployCon && Enterprise && Data Gravity” entry.

PaaS Systems aren’t always effectively distributed. Heroku has fallen over every time us-east-1 has gone down at AWS. I’m not saying they’ve done a bad job, just pointing that out. With Cloud Foundry, there are several key SPOFs (Single Points of Failure), and with all PaaS Systems the data tier is often the neglected part of the system. I’ve been wanting to write about this for a few months now, and Deploycon has lit a fire for me to do just that.

Deploycon – “Platform Services and Developer Expectations” **

I’m on a panel at Deploycon titled “Platform Services and Developer Expectations” and this leads right back around to that. This SPOF issue concerns me because PaaS Providers talk up their offerings more and more while shining little light on it. In some ways each is moving away from its respective SPOFs, but overall they’re still pretty prevalent throughout. For security, each has a non-distributed database, which still needs to be backed up, with no clear replication or other mechanisms set up to ensure data integrity in a failure situation. Of course, the huge saving grace with a PaaS is that if the overall system goes down or a SPOF blows up, all the existing deployed applications will generally continue to run. Unless, of course, the routing and networking are also SPOFs. This is the most glaring concern with PaaS Systems that I see today.
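To put some rough numbers on why a chain of SPOFs worries me, here is a minimal Python sketch. The component names and the 99.9% availability figures are assumptions picked purely for illustration (they are not measurements of Heroku, Cloud Foundry or anybody else), and the replica math assumes failures are independent, which real outages often are not.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain where every component must be up at once."""
    return prod(components)


def replicated_availability(single, replicas):
    """Availability of one component with independent replicas,
    where a single surviving replica is enough to keep serving."""
    return 1 - (1 - single) ** replicas


# Hypothetical figures, chosen only for illustration.
router = 0.999
controller = 0.999
database = 0.999

# Every piece is a single point of failure.
single_points = serial_availability([router, controller, database])

# Same pieces, but each one has replicas behind it.
with_replicas = serial_availability([
    replicated_availability(router, 2),
    replicated_availability(controller, 2),
    replicated_availability(database, 3),
])

print(f"every piece a SPOF: {single_points:.6f}")
print(f"with replicas:      {with_replicas:.6f}")
```

The point of the sketch is only the direction of the numbers: chaining single points multiplies their weaknesses together, while replicating each one pulls the total back up.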

One of the other things about PaaS that has always led to a ton of questions is “what about my PostgreSQL/MySQL/Riak/MongoDB/database thing and how do I do X, Y, Z with it to ensure scalability in my PaaS?” In almost every case it ends with a simple and unfortunate answer, “…when it comes to data, a PaaS doesn’t really do a damn thing for ya…” This is obviously not very helpful. The entire reason to put a PaaS into place is to simplify life, so it’s a sad fact that it barely does a thing for the data tier.

Now, hold on a second. Before you start screaming at me about “but a PaaS does X, Y and Z and isn’t even supposed to touch that aspect of things…”, let me elaborate a bit more. The panel title at Deploycon includes “…Developer Expectations”, and when things are getting simplified in the way a PaaS does, developers assume that if it does all this fancy magic for an application it ought to simplify the data side of things too! Right? Well, no, and it isn’t going to for the foreseeable future. But that doesn’t change the fact that developers often have that expectation.

Now, I could write at length about all the reasons that PaaS doesn’t really do anything for the data tier. I could wax poetic about how a distributed database (e.g. Riak, Cassandra, etc.) just doesn’t lend itself to a cookie cutter approach to deployment under a PaaS, or how an RDBMS has umpteen different configurations for stability, scaling, hot swappable services, and other such complexities around the data tier. But instead I’m going to skip all that, maybe cover some of those things another day, and jump right into some of the things that are actually moving forward to fill this gap.

BOSH, Cloud Foundry, OpenShift & fixing the data tier…

The most obvious reason there isn’t a simple turnkey solution to the data side of things with a PaaS ecosystem is that data is complex and extremely diverse. There are distributed key/value stores (Riak, Cassandra), sort of kind of distributed databases (Mongo), graph databases (Neo4j), the age old RDBMS (DB2, SQL Server, Oracle’s stuff, etc.) and the million solutions around that, and in-memory key/value databases that are insanely fast, like Redis. Expanding just slightly, you have software that works around these systems, such as Hadoop & Riak CS, & the list goes on. All of it is focused on the data tier and on maintaining one, two or some form of the three points of the CAP Theorem (http://en.wikipedia.org/wiki/CAP_theorem), atomicity and other key capabilities.

The PaaS Systems, public and private alike, often have some sort of plug-in style architecture for data. Whether it is Apprenda, which is closed to community and closed source, or an ongoing open to community PaaS like OpenShift or Cloud Foundry, things still fall almost entirely to the developers or database team to build an architecture around the data. When looking at solutions to simplify data in PaaS Systems, we have no idea what the closed source solutions are up to in this regard. With the ones that are open source, or in large part public and involved in the community, like EngineYard, Heroku, Cloudbees and others, we can really see the directions and efforts around creating real PaaS style solutions to the data tier problem.

BOSH, Vagrant, etc… One of the best solutions I’ve seen so far is the ability of BOSH, which was created by the Cloud Foundry team while at VMware, to spool up an environment that includes such things as a Riak cluster (or other cluster). Currently Brian McClain & Dr Nic have worked to put together such BOSH + Vagrant scripts & get things rolling. I myself will be spending some considerable time on just that. Beyond that, this is a good start in enabling data tier back ends.

How do we close the gap between absurdly simple application deployment and still arduous and difficult data tier deployment? For the next several years I think we’ll have cumbersome deployment practices around the data tier. There won’t be anything as elegantly simple as Cloud Foundry’s single line deployment or AppFog’s one click deployment of a web application. The best we can do at this time is to streamline around pieces and architectures, and at least get them into a kind of simple 3 step deployment.
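To make the “3 step” idea a little more concrete, here is a rough Python sketch of the shape I have in mind. Everything in it is hypothetical: the `DataTierPlan` class, the step names (`provision`, `cluster`, `bind`) and the app name are made up for illustration and are not the API of BOSH, Cloud Foundry or any other tool. Each step only records what it would do.

```python
from dataclasses import dataclass, field


@dataclass
class DataTierPlan:
    """Hypothetical description of a data tier to stand up next to a PaaS."""
    store: str
    nodes: int
    log: list = field(default_factory=list)

    def provision(self):
        # Step 1: ask the infrastructure layer for machines (stubbed out here).
        self.log.append(f"provision {self.nodes} nodes for {self.store}")
        return self

    def cluster(self):
        # Step 2: join the provisioned nodes into a single cluster.
        self.log.append(f"join {self.nodes} {self.store} nodes into a cluster")
        return self

    def bind(self, app):
        # Step 3: hand the cluster's connection details to a deployed app.
        self.log.append(f"bind {self.store} cluster to app '{app}'")
        return self


# Illustrative values only.
plan = DataTierPlan(store="riak", nodes=5).provision().cluster().bind("my-web-app")
for step in plan.log:
    print(step)
```

It is nowhere near one line or one click, but three named steps that chain together is about the level of simplicity I think is realistic for the data tier right now.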

Please drop a comment or two on how you think we might simplify the data side of the PaaS toolchain. Drop a few tweets in the twitterverse too; I’m sure that’ll be exploding as usual. I’m @adron, ping me.

Cheers, happy data architecting.

** The Deploycon panel will be at 4:30pm in Santa Clara on April 2nd. Come check it out.

Day #3 => DeployCon && Enterprise && Data Gravity

A lot of things were mentioned during the panels and sessions during DeployCon. Some things I agree with, some things I don’t.

The “Web Way”

There is one prevalent thing that came up over and over: the “web way”. What’s the web way? It is building horizontally scalable systems with RESTful APIs, with applications at an upper tier that are loosely coupled to any back end and generally geographically independent (i.e. dispersed). This could be a mix of IaaS and PaaS, public or private, or a mix of all these things. Above all, it is about building horizontally instead of vertically, so that one can scale big, really big.

It was generally accepted in the dozens of conversations I had about the “web way” that the enterprise is only now realizing that its idea of “big” and “redundant” just doesn’t compare to what has been built using ideas from the Internet Generation. The Facebook PHP scale has blown away almost any enterprise’s scale problem, Twitter’s piping of relationships across big data trumps most enterprises’ service bus concerns, and Netflix’s streaming makes most enterprises’ issues with data transfer seem like a child’s game.

But all this isn’t to spite or hate on enterprise teams. They have an immensely important job to do. With all of these advances, and with management allowing enterprises to step closer into the community and get involved with things built the “web way”, there is a massive upside. The enterprise is getting closer to cloud technologies. They’re learning how to leverage their on-premises assets (for various reasons, such as not throwing out the investment) together with public or private cloud tech.

Combining these things with the complex needs of the enterprise is all benefit: they’re getting the bleeding edge technology and research basically for free, all while finding effective ways to migrate slowly without damaging existing investments or day to day operations, and improving overall resiliency and services within the enterprise itself.

Overall, a huge win-win for everybody.

Data Gravity is Vital to Understand for Application Architecture

Dave McCrory kicked off the Data Gravity analogy for application and data source spacing. It is a pretty flawless analogy that gives us a much clearer understanding of application architecture when it comes to where data sources and applications sit within a system. It also helps to define the value, cost, and gravity of data in relation to its location relative to the application.

Every single time this topic was brought up, it hinged on people fighting the gravity between data and applications. Using cloud technologies, big data and other things is highly advantageous for companies, but as Dave points out in his presentations and writings about data gravity, it is getting exponentially more difficult to pull apart data and applications to rejoin them somewhere else, such as the massively more scalable, powerful and expansive public cloud.

These conversations point to a growing opportunity space in the industry: moving data. This could be done with physical moves of drives, big pipes, pipes as a service or a host of other offerings. So far, very few are doing much in this realm, understandably considering the difficulty. But I’m betting that we’ll start seeing some serious efforts put into this. One of my personal notions is the idea of going cloud to cloud; we’ll see if anything pops up in the near future. If not, I might have to make a play on that myself.
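For a feel of why “big pipes” are a harder answer than they sound, here is a back-of-the-envelope Python sketch. The dataset sizes, the 1 Gbps link and the 80% usable-bandwidth figure are all assumptions for illustration, not numbers from any provider or from the talks.

```python
def transfer_days(dataset_terabytes, link_gbps, efficiency=0.8):
    """Days needed to push a dataset through a network link.

    efficiency is the assumed fraction of the link that is actually
    usable for the transfer (protocol overhead, contention and so on).
    """
    bits = dataset_terabytes * 1e12 * 8          # decimal terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 86_400  # seconds to days


# Illustrative sizes only.
for size_tb in (1, 100, 1000):
    days = transfer_days(size_tb, link_gbps=1)
    print(f"{size_tb:>5} TB over a 1 Gbps pipe: {days:.1f} days")
```

Under these assumptions a terabyte moves in a few hours, while a petabyte ties up the same pipe for months, which is roughly where shipping physical drives starts to look sensible.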

…and last but not least, DeployCon!

DeployCon was the major reason I attended Cloud Expo. DeployCon was about PaaS Tech & the future movement of cloud technologies. Even though the rest of the conference seems bleeding edge to most enterprises, DeployCon was about the truly bleeding edge technology. Not only that, the elephant in the room is the new kingmakers of IT, of technology, and of business in general: the software developers. Yup, I said it. We might want to stick IT on things here and there, but the fact of the matter is that over the next 10 years (if the IT moniker even sticks) it’s going to be more and more developers, business apps, and business development that are needed, not more network or system admin roles. The abstractions are pushing developers into an even more pivotal role in a company and providing even greater benefits to business. You can ask any number of people in the PaaS space, from AppFog, AppHarbor, Tier 3 and Stackato/ActiveState all the way to Windows Azure (they need to actually show up next time, just saying!). They ALL SEE THIS HUGE CHANGE coming. Not only do they see it, they are experiencing the beginnings of the change.

So hats off to Krishnan (@krishnan @ Rishidot Research) for an excellent job putting this together, and to Wendy White (@wendywhite) for throwing in major logistics support, and all the others too! I had a blast and am looking forward to the next event.

Thanks also to the fire starters and technologists on the panels! You guys know who you are, good job! Here are a few parting shots of the awesome people I got to meet and talk tech with!

Parting Shots

@brianmmcclain & @markkropf enjoying a round of drinks! These guys are smart, seriously.
The one and only @wattersjames (thanks for the fire starting!), the man with impeccable taste @mortonheroku, and the man with the plan @guy_marion.
The beautifully witty @briellenikaido and @mortenheroku (GREAT Sake choice btw) showing us where the primo tastes are in New York City.