Is PaaS Tech Still Around? Maybe Containers Will Kill it or Bring it?

Recently a post from @Gigabarb popped up on the ole’ Twitter that started a micro-storm of twitter responses.

This got me thinking about a number of things and I started to write her an email specifically, but realized I should really just blog it. After all, the topic is actually part of what should be the public conversation. It’s about the changing world of technology, which we’re all part of…

First Topic: Usage of PaaS

Barb, just shortly after the tweet above was posted, this other tweet altered what information I might provide her. @TheSteve0 had responded with some items, which @GigaBarb then responded with

Now, not to pick on OpenShift & Red Hat (the effort @TheSteve0 is working with), because they have a great open source effort going on around this PaaS Technology, but Barb had a point. If Cloud Foundry responded with something like this, she’d still have a point. The only companies that continually sign up new companies is AWS & Beanstalk (ok, so they don’t call it PaaS, it gets you to the same place – arguably better than most of the others), a little bit at Windows Azure and a few companies pop up every once in a long while that might take Cloud Foundry or OpenShift and run with it. Most of the early adopters are already on board and most that might get on board are still mostly just waiting in the sidelines.

This fact is frustrating for those

in the space that want to see more penetration, but for those that arent’ technically in the space, it seems kind of like ASP. Oh wait, I should add context now, ASPs as in Application Service Providers. The technology from the beginning of the 21st century similar in many ways to what is dubbed SaaS now. At the time it could have been revolutionary. However at the time nobody picked it up either. This is similar to what PaaS is seeing. However…

A Hypothesis of What Will Happen to PaaS Tech

I have a theory of what will happen to PaaS Tech, it is similar to ASP Tech. PaaS will keep plundering along in odd ways, and eventually one day, it will become a mainstream tech. Right now however it will remain limited. In that same turn, by the time it becomes a common tech, it’ll be called something else.

Here’s a few reasons. One, is that many developers see PaaS and their response, especially if they’re seasoned developers with more than a few years under their belt, is to respond will immediate apprehension to the tech. It removes key elements of what they want to control. It hides things they can’t actually get to and it abstracts in ways that don’t always make sense. The result is that many senior devs stay away from pure PaaS offerings and instead use it only for prototyping, but production gets something totally different. I’ve been there more than a few times myself.

However, the result of what most senior devs end up with, when they get their continuous integration and development environments running at full tilt, is exactly what PaaS is attempting to promise. There are some companies, with senior devs, and extremely intelligent members that have taken PaaS and effectively implemented it into their continuous integration and delivery environment giving them strengths that most companies can only imagine to have.

One of those companies is lucky and smart enough to have Jonathan Murray @adamalthus heading up efforts. On his team he also has Dave McCrory @mccrory and Brian McClain @brianmmclain. To boot, they are close to the Cloud Foundry team (and @wattersjames, who cuts a path when there are issues) and keep a solid effort going working with key partners such as @Tier3 (now part of CenturyLink)  and other companies that help bring together one of the most strategically and tactically relevant PaaS deployments to date.

Other PaaS deployments are questionable for various reasons, they’re trying, but they aren’t there. At least not the types of companies and efforts that Barb was looking for. So really, if there is another out there that’s hiding, but wants serious street cred. A boost to hiring serious A grade talent, and to push forward past competitors, please let us know. Let me know, let Barb know and let’s hear about what you’re doing. If a company is hiding their implementation and doesn’t want to be part of the community, then fine, they can stay hidden and not gain the benefit of the community that presses forward beyond them. But I would love to hear from those that I might have missed, that want to push forward, so ping me. Ping Barb, we’ll get word out there and get developers checking out and making sure your company is getting it done! 😉

Second Topic: PaaS on PaaS and Start Docker

PaaS is nice. If your company can get it deployed and use it effectively, the you’re going to push forward fast in many regards. Deployments, savings, code cleanliness, effective separation of concerns and abstraction at a systems level are some of the things you can expect from a good PaaS implementation. Sometimes however, as the senior devs I mentioned pointed out, you give up control and certain levels of abstraction. However almost all senior devs understand that they want the ability to abstract at the levels that PaaS enables. They want to break apart the app cleanly at the system level from the software level. No reason for an app to know where or what a hard drive is doing right? That’s a rhetorical question, onward with the topic…

Docker has entered the market with a BOOM, part of the abstraction level that enables PaaS tooling in the first place. This tool enables a team to jump into the code or to just deploy the tool to abstract at a PaaS level, but to build the elements that they need specifically. The components are able to be brought together in a composite way that provides all the advantages of PaaS, while put together specifically for the problem space that the team is attacking. For environments that don’t make cookie cutter apps that fit perfectly to PaaS tooling as it is, that needs that little bit extra control of the environment, Docker is the perfect tool to bring those pieces together.

So really, is Docker and containerization that new word (from a technically old tech! lolz), that new tech, that’s going to bring PaaS into the mainstream as the standard implementation? Is it going to make PaaS become containerization when we developers talk about it? It could very well be the next big step. It could be that last mile coverage that devs want to push environments into a PaaS Tech ecosystem and make full use of hardware, software and move to the next stage of application development. Could it? Will it?

Personally I’m ready for the next stage of the whole PaaS thing, are you?

Next up on other thought patterns, WTF are people using Oracle for still when mariadb and postgres mean their freedom to innovate, move forward and surpass their competition.

Learning About Docker

Over the next dozen or so few days I’ll be ramping up on Docker, where my gaps are and where the project itself is going. I’ve been using it on and off and will have more technical content, but today I wanted to write a short piece about what, where, who and how Docker came to be.

As an open source engine Docker automates deployment of lightweight, portable, resilient and self-sufficient containers that run primarily on Linux. Docker containers are used to contain a payload, encapsulate that and consistently run it on a server.

This server can be virtual, on AWS or OpenStack, in clusters, public instances or private, bare-metal servers or wherever one can get an operating system to run. I’d bet it would show up on an Arduino cluster one of these days.  😉

User cases for Docker include taking packaging and deployment of applications and automating it into a simple container bundle. Another is to build PaaS style environments, lightweight that scale up and down extremely fast. Automate testing and continuous integration and deployment, because we all want that. Another big use case is simply building resilient, scalable applications that then can be deployed to Docker containers and scaled up and down rapidly.

A Little History

The creators of Docker formed a company called dotCloud that provided PaaS Services. On October 29th, 2013 however they changed the name from dotCloud to Docker Inc to emphasize the focus change from the dotCloud PaaS Technology to the core of dotCloud, Docker itself. As Docker became the core of a vibrant ecosystem the founders of dotCloud chose to focus on this exciting new technology to help guide and deliver on an ever more robust core.

Docker Ecosystem from the Docker Blog. Hope they don't mind I linked it, it shows the solid lifecycle of the ecosystem. (Click to go view the blog entry that was posted with the image)
Docker Ecosystem from the Docker Blog. Hope they don’t mind I linked it, it shows the solid lifecycle of the ecosystem. (Click to go view the blog entry that was posted with the image)

The community of docker has been super active with a dramatic number of contributors, well over 220 now, most who don’t work for Docker and they’ve made a significant percentage of the commits to the code base. As far as the repo goes, it has been downloaded over a 100,000 times, yup, over a hundred. thousand. times!!! It’s container tech, I’m still impressed just by this fact! On Github the repo has thousands of starred observers and over 15,000 people are using Docker. One other interesting fact is the slice of languages, with a very prominent usage of Go.

Docker Language Breakout on Github
Docker Language Breakout on Github

Overall the Docker project has exploded in popularity, which I haven’t seen since Node.js set the coder world on fire! It’s continuing to gain steam in how and in which ways people deploy and manage their applications – arguably more effectively in many ways.

Portland Docker Meetup. Click image for link to the meetup page.
Portland Docker Meetup. Click image for link to the meetup page.

The community is growing accordingly too, not just a simple push by Docker/dotCloud itself, but actively by grass roots efforts. One is even sprung up in Portland in the Portland Docker Meetup.

So Docker, Getting Operational

The Loading Bay
The Loading Bay

One of the best ways to describe docker (which the Docker team often uses, hat tip to the analogy!) and containers in general is to use a physical parallel. One of the best stories that is a great example is that of the shipping and freight industry. Before containers ships, trains,

Manually Guiding Freight, To Hand Unload Later.
Manually Guiding Freight, To Hand Unload Later.

trucks and buggies (ya know, that horses pulled) all were loaded by hand. There wasn’t any standardization around movement of goods except for a few, often frustrating tools like wooden barrels for liquids, bags for grains and other assorted things. They didn’t mix well and often were stored in a way that caused regular damage to good. This era is a good parallel to hosting applications on full hypervisor virtual machines or physical machines with one operating system. The operating system kind of being the holding bay or ship, with all the freight crammed inside haphazardly.

Shipping Yards, All of a Sudden Organized!
Shipping Yards, All of a Sudden Organized!

When containers were introduced like the shiny blue one shown here, everything began a revolutionary change. The manpower dramatically

A Flawlessly Rendered Container
A Flawlessly Rendered Container

dropped, injuries dropped, shipping became more modular and easy to fit the containers together. To put it simply, shipping was revolutionized through this invention. In the meantime we’ve all benefitted in some way from this change. This can be paralleled to the change in container technology shifting the way we deploy and host applications.

Next post, coming up in just a few hours “Docker, Containers Simplified!”

Sorry Database Nerds, Nobody Actually Gives a Shit…

So I’ve been in more than a few conversations about data structures, various academic conversations and other notions about where and how data should be stored. I’ve been on projects and managed projects that involve teams of people determining how to manage data so that other people can just not manage data. They want to focus on business use and not the data mechanisms underneath. The root of everything around databases really boils down to a single thing – how can we store X and retrieve X – nobody actually trying to get business done or change the world is going to dig into the data storage mechanisms if they don’t have to. To summarize,

nobody actually gives a shit…

At least nobody does until the database breaks, or somebody has to be hired to manage or tune queries or something or some other problem comes up. In the ideal world we could just put data into the ether and have it come back when we ask for it. Unfortunately we have to keep caring for where the data is, how it’s stored, the schema (even in schema-less, you still need to know the schema of the data at some point, it’s just another abstraction to push off dealing with the database), how to backup, recover, data gravity, proximity and a host of other concerns. Wouldn’t it be cool if we could just work on our app or business? Wouldn’t it be nice to just, well, focus on things we actually give a shit about?

Managed Data Systems!

The whole *aaS and PaaS World has been pushing to simplify operations to the point that the primary, if not the only concern, is the business itself. This is a pretty big step in many ways, but holds a lot of hope and promise around fixing the data gravity, proximity, management and related concerns. One provider of services that has an interesting start around the NoSQL realm is Orchestrate.io. I’ll have more about them in the future, as I’ll actually be working on hacking on some code against their platform. They’re currently solving a number of the mentioned issues. Which is great, a solid starting point that takes us past the draconian nature of the old approach to NoSQL and Relational Databases in general.

There has been some others, such as Mongo Labs or such, that have created a sort of DBaaS. This however doesn’t fill the gap that Orchestrate.io is filling. So far almost every *aaS database or other solution has merely been a single type of database that a developer can just throw data at in a single kind of way. Not really flexible, and really only abstracting some manual work, but not providing much additional value add around using the actual data. Orchestrate.io is bridging these together with search, replication and other features to provide a platform on which multiple options are available via the API. Key value, geo, time series and others are all coming together for them nicely. Having all the options actually creates a real value add, versus just provide one single way to do one thing.

Intelligent Data Systems?

After checking out and interviewing Orchestrate.io recently I’ve stumbled into a few other ideas. It would be perfect for them to implement or for the open source community to take a stab at. What would happen if the systems storing the data knew where to put things? What would be the case for providing an intelligent indexing policy or architecture at the schema design decision layer, the area where a person usually must intervene? Could it be done?

A decision tier that scans and makes decisions on the data to revamp the way it is stored against a key value, geo, time series or other method. Could it be done in real time? Would it have to go through some type of processing system? The options around implementing something like this are numerous, but this just leaves a lot of space for providing value add around the data to reduce the complexity of this decision making.

Imagine you have key value data, that needs to be associative based on graph principles, that you must store in a highly available system with pertinent real-time data provided based on those graph relations. A decision layer, to create an intelligent data system, could monitor the data and determine the frequent query paths against the data. If the data is growing old it could move data from real-time to archival via the key value. Other decisions could be made to push up data segments into a cache tier or some other mechanism to provide realtime graph connections to client queries. These are all decisions that would need to be made by somebody working on the data, but could be put into a set of rules to allow for re-allocation of the data via automated mechanisms into better storage options. Why keep old data that isn’t queried in the active in memory graph store, push it to the distributed key store. Why keep the graph data on drive when it can be in memory with correlated keys in a key value in memory store, backed by an on drive key value? All valid decisions, all becoming better understood day by day. It’s about time some of this decision process started to be automated.

What are your thoughts? Pro-intelligent data systems or anti-intelligent data systems? Think it’ll work or is it the wrong approach? Maybe the system should approach some other zenith or axiom point to become truly abstracted and transparent?

Architectural PaaS Cracks or Crack PaaS

Over the last couple years there have been two prominent open source PaaS Solutions come onto the market. Cloud Foundry & OpenShift. There’s been a lot of talk about these plays and the talk has slowly but steadily turned into traction. Large enterprises are picking these up and giving their developers and operations staff a real chance to make changes. Sometimes disruptive in a very good way.

However, with all the grandeur I’m going to hit on the negatives. These are the missing parts, the serious pain points beyond just some little deployment nuisance. Then a last note on why, even amidst the pain points, you still need to make real movement with PaaS tooling and technologies.

Negative: The Data Story is Lacking

Both Cloud Foundry and OpenShift have a way to plug into databases easily.

Cloud Foundry provides ways to build a Cloud Foundry Service that becomes the bound and hooked in SQL Server, MySQL, Postgresql, Redis or whatever data storage service you need. For more details on building a service, check out the echo example on the vcap sample github project.

OpenShift has what are called Cartridges which provide the ability to add databases and other services into the system. For more information about the cartridges check out Red Hat’s OpenShift Documentation and also the forums.

Cloud Foundry and OpenShift however have distinctive weak spots when it comes to services that go beyond a mere single instance database. In the case of a true distributed database such as Cassandra, HBase or Riak, it is inordinately difficult to integrate a system that any PaaS inter-operates with well. In some cases it’s irrelevant to even try.

The key problem being that both of the PaaS systems assume the mantle of master while subjugating the distributed database a lower tier of coordination. The way to resolve this at the moment is to do an autonomous installation of Riak, Cassandra, Neo4j or other database that may be distributed, stored hot swappable, or otherwise spread across multiple machine or instance points. Then create a bound connection between it and the PaaS Application that is hosted. This is the big negative in PaaS systems and tooling right now, the data story just doesn’t expand well to the latest in data and database technologies. I’ll elaborate more about this below.

Negative: Deployment is Sometimes Easy, Maintenance is Sometimes Hard

Cloud Foundry is extremely rough to deploy, unless you use Bosh to deploy to either VMware Virtualized instances or AWS. Now, you could if resources were available get Bosh to deploy your Cloud Foundry environment anywhere you wanted. However, that’s not easy to do. Bosh is still a bit of a black box. I myself along with others in the community are working to document Bosh, but it is slow going.

OpenShift is dramatically easier to deploy, but is missing a few key pieces once deployed that draw some additional operational overhead. One of those is that OpenShift requires more networking management to handle routing between various parts of the PaaS Ecosystem.

Overall, this boils down to what you need between the two PaaS tool chains. If you want Cloud Foundry’s automatic routing and management between nodes. This is a viable route, but if your team wants to manage the networking tier more autonomous from the PaaS environment then maybe OpenShift is the way to go. In the end, it’s negative bumpy territory to determine which you may or may not want based on that.

Negative: Full Spectrum Polyglot, Missing Some

Cloud Foundry has a wider selection of languages and frameworks with community involvement around those with groups like Iron Foundry. OpenShift I’m sure will be getting to parity in the coming months. I have no doubt between both of these PaaS Ecosystems that they’ll expand to new languages and frameworks over time. Being polyglot after all is a no brainer these days!

Why PaaS Is, IMHO, Still Vitally Important

First toss out the idea that huge, web scale, Facebooks and Googles need to be built. Think about what the majority of developers out there in the world work on. Tons and tons and tons of legacy or greenfield enterprise applications. Sometimes the developer is lucky enough to work on a full vertical mix of things for a small business, but generally, the standard developer in the world is working on an enterprise app.

PaaS tooling takes the vast majority of that enterprise app maintenance from an operational side and tosses it out. Instead of managing a bunch of servers with a bunch of different apps the operations team manages an ecosystem that has a bunch of apps. This, for the enterprises that have enough foresight and have managed their IT assets well enough to be able to implement and use PaaS tooling, is HUGE!

For companies working to stay relevant in the enterprise, for companies looking to make inroads into the enterprise and especially for enterprises that are looking to maintain, grow or struggling to keep ahead of the curve – PaaS tooling is something that is a must have.

Just ask a dev, do they want to spend a few hours configuring and testing a server?  Do they want to deploy their application and focus on building more value into that application?

…being I’ve spent a few years being the developer, I’ll hedge on the side of adding value.

What’s Next?

So what’s next? Two major things in my opinion.

1. Fill the data gap. Most of the PaaS tooling needs to bridge the gap with the data story. I’m working my part with testing, development and efforts to get real options built into these environments, but this often leads back to the data story of PaaS being weak. What’s the solution here? I’m in talks, ongoing, planning sessions ongoing, and we’ll eventually get a solid solution around the data side.

2. Fix deployments & deployment management. Bosh isn’t straight forward or obvious in what it does, Cloud Foundry is easily the hardest thing to deploy with many dependencies. OpenShift is easier to deploy and neither of them actually have a solid management story over time. Bosh does some impressive updates of Cloud Foundry, and OpenShift has some upgrade methods, but still over time and during day to day operations there hasn’t been any clear cut wins with viewing, monitoring and managing nodes and data within these environments.

Deploycon, PaaS & the pending data tier gravity fallout…

For a quick recap of last years Deploycon & related talks, check out my “Day #3 => DeployCon && Enterprise && Data Gravity” entry from last year.

PaaS Systems aren’t always effectively distributed. Heroku has fallen over every time east-1 has gone down at AWS. Not that I’m saying they’ve done bad, just pointing that out. With Cloud Foundry, there’s several key SPOFs (Single Points of Failure), and with all PaaS Systems the data tier is often the neglected pairing of the system. I’ve been wanting to write about this for a few months now and Deploycon has lit a fire for me to do just that.

Deploycon – “Platform Services and Developer Expectations” **

I’m on a panel at Deploycon titled “Platform Services and Developer Expectations” and this leads right back around to that. This SPOF issue is concerning to me as PaaS Providers talk up the offerings more and more with little light actually shone on this issue. In some ways each is moving away form their respective SPOFs, but overall they’re all pretty prevalent throughout. For security, each has a non-distributed database, which technically needs backed up still – no clear replication or other mechanisms setup to ensure data integrity in a failure situation. Of course, the huge saving grace with a PaaS, is that if the overall system goes down or a SPOF blows up, all the existing deployed applications will generally continue to run. Unless of course the routing and networking are also SPOF. This is the largest glaring concern with PaaS Systems that I see today.

One of the other things about PaaS that has always led to a ton of questions is “what about my PostGresql/mysql/Riak/mongodb/database thing and how do I do X, Y, Z with it to ensure scalability in my PaaS.” In almost every case it ends with a simple and unfortunate answer, “…when it comes to data, a PaaS doesn’t really do a damn thing for ya…” This is obviously not very helpful. The entire reason to put a PaaS into place is to simplify life, the sad fact that it barely does a thing for the data tier isn’t very helpful.

Now, hold on a second before you start screaming at me about “but a PaaS does X, Y and Z and isn’t even supposed to touch that aspect of things…” let me elaborate a bit more. The panel at Deploycon states “…Developer Expectations” and when things are getting simplified in the way a PaaS does, developers assume that if it does all this fancy magic for an application it ought to simplify the data side of things too! Right? Well no, and it isn’t going to for the foreseeable future. But no matter what, it doesn’t change the fact that developers often have that expectation.

Now, I could write at length about all the reasons that PaaS doesn’t really do anything for the data tier. I could wax poetic about how a distributed database (re: Riak, Cassandra, etc) just doesn’t lend itself to a cookie cutter approach to deployment under a PaaS or an RDBMS has umpteen different configurations for stability, scaling, hot swappable services, and other such complexities around the data tier. But instead I’m going to skip all, maybe cover some of those things another day, and jump right into some of the things that are actually moving forward to fill this gap.

BOSH, Cloud Foundry, OpenShift & fixing the data tier…

The most obvious reason there isn’t a simple turn key solution to the data side of things with a PaaS ecosystem is that data is complex and extremely diverse. There’s distributed key/value stores (Riak, Cassandra), there’s sort of kind of distributed databases (Mongo), graph databases (Neo4j), the age old RDBMS (DB2, SQL Server, Oracle’s Stuff, etc) and the million solutions around that, there’s key/value in memory styled databases that are insanely fast, like Redis. Expanding just slightly you have software that works around these systems such as Hadoop & Riak CS & the list goes on. All of it focused on the data tier and maintaining one, two or some form of the three points around CAP Theorem (http://en.wikipedia.org/wiki/CAP_theorem), atomicity and other key capbilities.

All of the PaaS Systems, including public and private often have some sort of plug-in style architectures for data. Whether it is Apprenda which is closed to community and closed source or an ongoing open to community PaaS like OpenShift or Cloud Foundry, things still fall almost entirely to the developers or database team to build an architecture around the data. When looking at solutions to simplify data in PaaS Systems the closed source solutions we have no idea what they’re up to in this regard. The one’s that are open source or in large part public and involved in the community PaaSes, like EngineYard, Heroku, Cloudbees and others we can really see the directions and efforts around creating real PaaS style solutions to the data tier problem.

BOSH, Vagrant, etc…  One of the best solutions I’ve seen so far is the ability of Bosh, which was created by the Cloud Foundry team while at VMware, to spool up an environment that includes such things as a Riak Cluster (or other cluster). Currently Brian McClain & Dr Nic have worked to put together such Bosh + Vagrant scripts & get things rolling. I myself will be spending some considerable time on just that. But beyond that this is a good start in enabling data tier back ends.

How to close the gap, between absurdly simple application deployment and still arduous and difficult data tier deployment? For the next several years I think we’ll have cumbersome deployment practices around the data tier. There won’t be anything as elegantly simple as Cloud Foundry’s single line deployment or AppFog’s one click deployment of a web application. The best we can do at this time, is to streamline around pieces and architectures, and at least get them into a kind of simple 3 step deployment.

Please drop a comment or two on how you think we might simplify the data side of the PaaS toolchain. Also drop a few tweets in the twitterverse too, I’m sure that’ll be exploding as usual. I’m @adron, ping me.

Cheers, happy data architecting.

** the Deployconpanel will be at 4:30pm in Santa Clara on April 2nd. Come check it out.