Deploycon, PaaS & the pending data tier gravity fallout…

For a quick recap of last year's Deploycon & related talks, check out my “Day #3 => DeployCon && Enterprise && Data Gravity” entry from last year.

PaaS systems aren't always effectively distributed. Heroku has fallen over every time us-east-1 has gone down at AWS. Not that I'm saying they've done a bad job, just pointing it out. Cloud Foundry has several key SPOFs (Single Points of Failure), and with all PaaS systems the data tier is often the neglected half of the system. I've been wanting to write about this for a few months now, and Deploycon has lit a fire under me to do just that.

Deploycon – “Platform Services and Developer Expectations” **

I'm on a panel at Deploycon titled “Platform Services and Developer Expectations” and this leads right back around to that. This SPOF issue concerns me, because PaaS providers talk up their offerings more and more with little light actually shone on it. In some ways each is moving away from its respective SPOFs, but overall they're still pretty prevalent throughout. For security, each has a non-distributed database that still needs to be backed up, with no clear replication or other mechanism in place to ensure data integrity in a failure situation. Of course, the huge saving grace with a PaaS is that if the overall system goes down or a SPOF blows up, the existing deployed applications will generally continue to run. Unless, of course, the routing and networking are also SPOFs. This is the largest glaring concern with PaaS systems that I see today.

One of the other things about PaaS that has always led to a ton of questions is “what about my PostgreSQL/MySQL/Riak/MongoDB/database thing, and how do I do X, Y, Z with it to ensure scalability in my PaaS?” In almost every case it ends with a simple and unfortunate answer: “…when it comes to data, a PaaS doesn't really do a damn thing for ya…” This is obviously not very helpful. The entire reason to put a PaaS in place is to simplify life; the sad fact that it barely does a thing for the data tier undercuts exactly that.

Now, hold on a second before you start screaming at me about “but a PaaS does X, Y and Z and isn't even supposed to touch that aspect of things…” and let me elaborate a bit. The panel at Deploycon says “…Developer Expectations,” and when things get simplified the way a PaaS does, developers assume that if it does all this fancy magic for an application, it ought to simplify the data side of things too! Right? Well, no, and it isn't going to for the foreseeable future. But that doesn't change the fact that developers often have that expectation.

Now, I could write at length about all the reasons that PaaS doesn't really do anything for the data tier. I could wax poetic about how a distributed database (re: Riak, Cassandra, etc.) just doesn't lend itself to a cookie-cutter approach to deployment under a PaaS, or how an RDBMS has umpteen different configurations for stability, scaling, hot-swappable services, and other such complexities around the data tier. But instead I'm going to skip all that, maybe cover some of it another day, and jump right into the things that are actually moving forward to fill this gap.

BOSH, Cloud Foundry, OpenShift & fixing the data tier…

The most obvious reason there isn't a simple turnkey solution to the data side of things in a PaaS ecosystem is that data is complex and extremely diverse. There are distributed key/value stores (Riak, Cassandra), sort-of-distributed databases (Mongo), graph databases (Neo4j), the age-old RDBMS (DB2, SQL Server, Oracle's stuff, etc.) plus the million solutions around that, and insanely fast in-memory key/value databases like Redis. Expanding just slightly, you have software that works around these systems, such as Hadoop and Riak CS, and the list goes on. All of it is focused on the data tier and maintaining some combination of the three guarantees of the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem), atomicity, and other key capabilities.

All PaaS systems, public and private, often have some sort of plug-in style architecture for data. Whether it's Apprenda, which is closed to community and closed source, or an ongoing community-open PaaS like OpenShift or Cloud Foundry, things still fall almost entirely to the developers or the database team to build an architecture around the data. When looking at solutions to simplify data in PaaS systems, we have no idea what the closed source players are up to in this regard. With the ones that are open source, or largely public and involved in the community PaaSes, like EngineYard, Heroku, Cloudbees and others, we can really see the directions and efforts around creating real PaaS-style solutions to the data tier problem.

BOSH, Vagrant, etc…  One of the best solutions I've seen so far is the ability of BOSH, which was created by the Cloud Foundry team while at VMware, to spool up an environment that includes such things as a Riak cluster (or another cluster). Currently Brian McClain & Dr Nic have worked to put together such BOSH + Vagrant scripts and get things rolling. I'll be spending some considerable time on just that myself. Beyond that, it's a good start toward enabling data tier back ends.
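To give a feel for the shape of that workflow, here's a rough sketch; the repository, IP and manifest names are placeholders of mine, not Brian's or Dr Nic's actual scripts:

[sourcecode language=”bash”]
# Rough sketch of a BOSH + Vagrant flow; the repo, IP and manifest names
# below are placeholders, not the actual scripts mentioned above.
git clone git://github.com/example/bosh-vagrant-riak.git
cd bosh-vagrant-riak
vagrant up                               # boot the VM that hosts the BOSH director
bosh target http://192.168.50.4:25555    # point the BOSH CLI at that director
bosh deployment riak-cluster.yml         # pick a manifest describing the Riak cluster
bosh deploy                              # let BOSH spool up the cluster
[/sourcecode]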

How do we close the gap between absurdly simple application deployment and the still arduous, difficult data tier deployment? For the next several years I think we'll have cumbersome deployment practices around the data tier. There won't be anything as elegantly simple as Cloud Foundry's single-line deployment or AppFog's one-click deployment of a web application. The best we can do at this time is to streamline around pieces and architectures, and at least get them into something like a simple three-step deployment.

Please drop a comment or two on how you think we might simplify the data side of the PaaS toolchain. Also drop a few tweets into the twitterverse; I'm sure it'll be exploding as usual. I'm @adron, ping me.

Cheers, happy data architecting.

** The Deploycon panel will be at 4:30pm in Santa Clara on April 2nd. Come check it out.

Who is Node PDX for? Who should attend? What’s going on? Where are we at?

Node PDX is a conference for programmers (ticket link below!!), from newcomers to industry stalwarts, from the simply curious to seasoned pros of JavaScript. Our goal is to have great speakers and great topics, and to encourage and support intelligent, forward-thinking conversation. On top of all that, we'll strive to move the industry forward as a whole in whatever way we can. Node PDX is for those who want to help do just that!

What is Node PDX? What is Portland?

Node PDX is a conference held yearly in Portland, Oregon. It's driven by the passionate tech industry located in Portland and around the country and the world. At the conference we'll dive into technologies surrounding JavaScript, Node.js, NPM, hardware and beyond. Other topics may include patterns, design practices, new libraries, and technology add-ons.

Portland is a coder’s paradise! Nuff’ said. 🙂

Where is Node PDX?

PDX is the airport code for Portland, Oregon. The conference isn't at the airport, but if you fly into our fair city for Node PDX, take the MAX Red Line light rail downtown and enjoy our coffee, beer, some of the finest food, and the most amazing food cart collections you've ever seen in your life. Promise!

If you’ve never heard of Portland, Oregon here’s a documentary starter…

When is Node PDX?

First off, it's going to be every year. This year, the grand year of two thousand and thirteen, it's going to be on the 16th and 17th of May. That's a Thursday and Friday in the third week of the fifth month of 2013. Come for Thursday, come for Friday, or come for both and enjoy the weekend afterwards. However you pick your days, we hope you'll join us for Node PDX 2013 on May 16-17, 2013.

Why Node PDX?

This is easy. Node PDX is a conference that Troy Howard and I put together out of two frustrations: we wanted a conference without the high prices, and we wanted to contribute to the local JavaScript and Node.js community, and any other interested coder community. Thus, Node PDX was born in a few quick weeks of hectic hustle. This year we're back at it with Luc Perkins hacking on the program along with us.

Node PDX, How is This Happening?

Mostly magic, with the help of some druids and unicorns. For some of the heavy lifting we're looking for sponsors, but we're throwing an interesting little twist into the fray.

Last year we worked hard to be inclusive and bring together great talks. We did this in various ways. First, we used a GitHub repository for the site and pull requests as the method for submitting talks. We figured, what better way to submit a talk to a developers' conference than to throw down a commit and a pull request on GitHub. Needless to say, it worked great!

In 2012, to ensure we had proper party groove and vittles for the attendees, we had volunteers throw down some chef skills, plus some great sponsors including New Relic, Mozilla and others. This year we're aiming for a slightly different angle and putting a little more crowdsourcing into the ingredient list.

So here’s how you can get involved!

  • First and foremost: come to Node PDX – get your tickets here!!  Oh yeah, there's a 200-ticket limit, and only so many early birds, so do NOT wait!!  🙂
  • Second, throw a talk into the mix! We’ve got two tracks lined up so we’re looking for a lot of material, beginner to advanced!

That’s it! To keep up with the latest, subscribe to my blog here and/or read up on the Node PDX Site! We’re looking forward to seeing you in Portland (even if we see you in Portland every day!)  Cheers!

Distributed Coding Prefunc: Chicago Boss, Rails Based Erlang Power

Troy Howard and I sat down on a Friday night to do some straight-up thrashing of Erlang and Chicago Boss. It seemed like a framework with a lot of promise, so we wanted to hack at it a bit and see what we could come up with.

First things first: with me running OS X and Troy hacking on the latest Ubuntu, we dove in. I first downloaded from the main Chicago Boss site, but immediately decided not to use that and went straight for a fork and clone of the repo.

[sourcecode language=”bash”]
git clone git@github.com:Adron/ChicagoBoss.git
cd ChicagoBoss
make
make app PROJECT=bigonion
cd ../bigonion
./init-dev.sh
[/sourcecode]

The ‘./init-dev.sh’ script launches the server so you can view whatever is available on the site. When I went to localhost on port 8001 it showed some general mess that's kind of nondescript, but at least one can tell the server is running if this works.

[Screenshot: Hitting http://localhost:8001]

[Screenshot: Boardwalk Explosion!]

Oh dear, then the explosion…

At this point Troy started running into a number of problems on Ubuntu. Ranch wasn't building based on what was included. He fought through attempting to manually install the dependency, but no go. He had originally downloaded Erlang R14B, which should have worked. However, it did not; it caused all sorts of madness.

I spooled up my Ubuntu VM and went through the same steps above, even did the quick intro to set up a hello world, and it worked. We then checked our Erlang versions and I was running R15B01. Troy installed that version and lo 'n' behold, magic of the druids! Mayor Daley kicked this thing and made it run. So if you are running anything besides R15B01 on Ubuntu, we can't attest to this stuff working. Just sayin'.

So Back to the Chicago Boss

So go back to the bigonion project we created above and throw in a controller file. Be sure to use the exact naming, because this is seriously like Rails: follow the convention or you'll spend the rest of your life in the jail of convention-less hell!

Add a file at /src/controller/bigonion_greeting_controller.erl with the following code.

[sourcecode language=”erlang”]
%% Chicago Boss controller: maps GET /greeting/hello to a response.
-module(bigonion_greeting_controller, [Req]).
-compile(export_all).

hello('GET', []) ->
    {output, "<strong>Daley will tell you how to run the Big Onion!</strong>"}.
[/sourcecode]

Now execute the code, or if you still have the server running from ‘./init-dev.sh’, just navigate to ‘http://localhost:8001/greeting/hello’ and you'll get a hello world reality check about Chicago history. 😉
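If you'd rather check from a terminal, a quick curl against the new route does the same sanity check:

[sourcecode language=”bash”]
# Sanity check the new controller route from the command line.
curl http://localhost:8001/greeting/hello
# => <strong>Daley will tell you how to run the Big Onion!</strong>
[/sourcecode]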

On to the next thing: let's get a functional test in there. Add a file called ‘src/test/functional/bigonion_test.erl’ with the following code snippet of a test in it.

[sourcecode language=”erlang”]
%% Functional test: request /greeting/hello and assert on the response.
-module(bigonion_test).
-compile(export_all).

start() ->
    boss_web_test:get_request("/greeting/hello", [],
        [fun boss_assert:http_ok/1,
         fun(Res) ->
             boss_assert:tag_with_text("strong",
                 "Daley will tell you how to run the Big Onion!", Res)
         end], []).
[/sourcecode]

Now compile the project and run the functional tests.

[sourcecode language=”bash”]
./rebar compile
./rebar boss c=test_functional
[/sourcecode]

…which left me with

[sourcecode language=”bash”]
$ ./rebar boss c=test_functional
==> bigonion (boss)
FATAL: Config file "boss.test.config" not found.
[/sourcecode]

Joy, another completely random error. But hey, that's software development, right? So that's coming later… for now, I'm out. Keep hacking the Erlang bits. If you know what happened with that last command to run the test above, please leave a comment and help out! 🙂
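P.S. One hedged, totally untested guess before I go: the runner is literally asking for a “boss.test.config” the generated project doesn't have, so cloning the existing config under that name might get past it:

[sourcecode language=”bash”]
# Untested guess: give the test runner the config file it's asking for
# by cloning the project's existing boss.config.
cp boss.config boss.test.config
./rebar boss c=test_functional
[/sourcecode]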

Getting Github : JavaScript Libraries Spilled EVERYWHERE! Series #003

This is an ongoing effort putting together some JavaScript app code on client and server; it started with entries #001 and #002 of this blog series.

This how-to is going to kind of go all over the place. My goal is to get GitHub data; the question, however, is how and with what. I knew there were some libraries available, so writing my own code to pull straight off the API seemed like unnecessary work.

The GitHub API documentation is located at http://developer.github.com/v3/ with the list of client libraries, for ease of access, listed at http://developer.github.com/v3/libraries/. I forked and cloned gh3 first, and npm installed the octonode and node-github libraries.

Node.js Based Github Libraries

The two Node-based projects install via npm, as things go with Node, and were super easy. The first one I gave a test drive is the https://github.com/ajaxorg/node-github project. I forked it and dove right in.

[sourcecode language=”bash”]
$ npm install github
npm http GET https://registry.npmjs.org/github
npm http 200 https://registry.npmjs.org/github
npm http GET https://registry.npmjs.org/github/-/github-0.1.8.tgz
npm http 200 https://registry.npmjs.org/github/-/github-0.1.8.tgz
$
[/sourcecode]

After that quick install I took a stab at the test code they have in the README.md.

[sourcecode language=”javascript”]
// Minimal node-github usage from the README: list who a user follows.
var GitHubApi = require("github");

var github = new GitHubApi({
    // required
    version: "3.0.0",
    // optional
    timeout: 5000
});

github.user.getFollowingFromUser({
    user: "adron"
}, function(err, res) {
    console.log(JSON.stringify(res));
});
[/sourcecode]

This worked all well and good, so I moved on to some other examples. The following example, however, needed authentication. To authenticate you'll need to add the little snippet below with a username and password. There's also an OAuth token method you can use, which I sketch briefly after the next example; to check out other auth methods, see the documentation.

[sourcecode language=”javascript”]
// Same client as before, this time authenticating with basic auth
// before requesting an organization.
var GitHubApi = require("github");

var github = new GitHubApi({
    version: "3.0.0",
    timeout: 5000
});

github.authenticate({
    type: "basic",
    username: "adron",
    password: "yoTurkiesGetYourOwn"
});

github.orgs.get({
    org: "Basho"
}, function(err, res) {
    console.log(res);
});
[/sourcecode]
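As a quick aside, the OAuth token method mentioned above looks roughly like this; a sketch based on my reading of the library docs, so double check the exact option names before relying on it:

[sourcecode language=”javascript”]
// Sketch of the OAuth variant: assumes the library's "oauth" type and a
// token you've already generated. Verify option names against the docs.
github.authenticate({
    type: "oauth",
    token: "yourGeneratedTokenGoesHere"
});
[/sourcecode]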

Back to the basic auth example: the result is perfect for putting together a good display page or something similar for the organizations.

[sourcecode language=”bash”]
$ node adron_test.js
{ login: 'basho',
  id: 176293,
  url: 'https://api.github.com/orgs/basho',
  repos_url: 'https://api.github.com/orgs/basho/repos',
  events_url: 'https://api.github.com/orgs/basho/events',
  members_url: 'https://api.github.com/orgs/basho/members{/member}',
  public_members_url: 'https://api.github.com/orgs/basho/public_members{/member}',
  avatar_url: 'https://secure.gravatar.com/avatar/ce5141b78d2fe237e8bfba49d6aff405?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-org-420.png',
  name: 'Basho Technologies',
  company: 'Basho',
  blog: 'http://basho.com/blog/',
  location: 'Cambridge, MA',
  email: null,
  public_repos: 105,
  public_gists: 0,
  followers: 0,
  following: 0,
  html_url: 'https://github.com/basho',
  created_at: '2010-01-04T19:05:19Z',
  updated_at: '2013-03-17T20:29:09Z',
  type: 'Organization',
  total_private_repos: YYY,
  owned_private_repos: XXX,
  private_gists: 0,
  disk_usage: 788016,
  collaborators: 0,
  billing_email: 'not_a_valid_address@basho.com',
  plan: { name: 'platinum', space: 62914560, private_repos: billions },
  meta: { 'x-ratelimit-limit': '5000', 'x-ratelimit-remaining': 'azillion' } }
[/sourcecode]

Now at this point there are a few significant problems. Setting up integration-style tests for this library gets tricky, because you need to authenticate, or at least I do for the data that I want. That doesn't bode well for handing integration tests to Travis CI or the like. So even though this library works, and would run server side rather than client side, having it as an untested part of the code base bothers me a bit. What's a good way to set up tests that verify things are working? I'll get that figured out properly at some point, and it'll have to be another blog entry, maybe.
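One direction I'm leaning, and to be clear this is my own idea rather than anything the library prescribes: stub out the GitHub API with a mocking library like nock, so the tests never need real credentials. A minimal sketch, assuming ‘npm install nock’ and the canned reply being made up:

[sourcecode language=”javascript”]
// Sketch: fake the GitHub API with nock so tests need no credentials.
// Assumes `npm install nock`; the canned reply below is made up.
var nock = require("nock");
var assert = require("assert");
var GitHubApi = require("github");

// Intercept GET /orgs/basho and return canned JSON instead of hitting GitHub.
nock("https://api.github.com")
    .get("/orgs/basho")
    .reply(200, { login: "basho", name: "Basho Technologies" });

var github = new GitHubApi({ version: "3.0.0" });

github.orgs.get({ org: "basho" }, function (err, res) {
    assert.ifError(err);
    assert.equal(res.login, "basho");
    console.log("stubbed orgs.get responded:", res.name);
});
[/sourcecode]

For now though, let's jump into the client side library and see how it functions.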

Client Side JavaScript Github

For the client side I started testing with the gh3 library. It has two dependencies: jQuery and Underscore.js. jQuery is likely already in your projects; Underscore.js is also pretty common, but sometimes you'll find you need to go download it. Once I had the additional libraries installed, I gave the default sample a shot.

[sourcecode language=”html”]
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>gh3 Sample</title>
</head>
<body>
    <ul id="user"></ul>
</body>
<script src="js/jquery-1.7.1.min.js"></script>
<script src="js/underscore-min.js"></script>
<script src="js/gh3.js"></script>
<script>
    // Fetch a GitHub user via gh3 and list each property in the #user list.
    var adron = new Gh3.User("adron"),
        userInfos = $("#user");

    adron.fetch(function (err, resUser) {
        if (err) {
            throw "outch …";
        }
        console.log(adron, resUser);
        _.each(_.keys(resUser), function (prop) {
            userInfos.append(
                $('<li>').append(prop + " : " + resUser[prop])
            );
        });
    });
</script>
</html>
[/sourcecode]

This worked pretty seamlessly. It also got me thinking: what do I really want to do with the GitHub library? If it's a server side service, I'd obviously want to use the Node.js libraries. However, if it's client side data I want, is it even ideal for the server side to pull the data at all? Cross-site scripting and related issues come into play with a client side script, but even in spite of that, this might be just what I needed. That left me with some solid things to think about, but I was done for now… so until the next entry, cheers!

Distributed Coding Prefunc: Rebar Multinode Riak Core

Before diving into this entry, you might want to check out some of my other entries on getting Erlang installed with appropriate testing frameworks. Moving on…

At Basho we're always trying to make it easier to do big things. A short time ago we pushed forward on Rebar, Riak Core and getting things put together to make it simpler to kick-start work on distributed systems like the Riak database itself. There's way more that is possible, which I'll get into in just a minute. Before diving into some of those things, here are a few quick links and some context on what exactly Rebar and Riak Core are.

Riak Core
Github: https://github.com/basho/riak_core

Riak Core has been available for quite some time, and we've been hustling for a while to put together a robust array of material around it. One excellent place to start learning about Riak Core is the “Introducing Riak Core” article published on the Basho blog a while back. Riak Core, or riak_core, is the underpinning that Riak is built on. It provides many features to get you started building distributed systems; a few of the key ones are the ability to track and manage the nodes, clusters and related pieces of the distributed architecture within a system.

Rebar
Github: https://github.com/basho/rebar
Wiki: https://github.com/basho/rebar/wiki

Rebar is an Erlang build tool; among many other things, it helps you put together projects based on Riak Core.
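For a taste of what that looks like in practice, pulling riak_core into a project is typically just a dependency entry in rebar.config; a minimal sketch, with the branch choice here being mine:

[sourcecode language=”erlang”]
%% Minimal rebar.config sketch: declare riak_core as a project dependency.
%% The branch pinned here is illustrative; pin whatever fits your project.
{deps, [
    {riak_core, ".*",
        {git, "git://github.com/basho/riak_core.git", {branch, "master"}}}
]}.
[/sourcecode]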

Rebar Riak Core
Github: https://github.com/basho/rebar_riak_core

The Rebar Riak Core project repository provides templates to help you start writing things like the Riak database itself: it sets up a Riak Core project via template scripting, with an N-node cluster devrel, vnodes, etc. Once you're up and running, it can be used to help develop distributed, scalable and fault tolerant applications.

For more on the Rebar Riak Core check out the README.md in the github repository. There are some great examples of how to get a multinode devrel running in a few steps.

Rebar Riak Core Quick Start

The quickest way to get started with the Riak Core and Rebar scripts is to grab the prebuilt binary, or you can clone and build Rebar yourself if you'd like all the things. To download the executable and have it ready, use wget (or your preferred download method).

[sourcecode language=”bash”]
wget http://cloud.github.com/downloads/basho/rebar/rebar && chmod u+x rebar
[/sourcecode]

To get the cloned repository built and ready for use:

[sourcecode language=”bash”]
$ git clone git://github.com/rebar/rebar.git
$ cd rebar
$ ./bootstrap
[/sourcecode]

Now the easiest path is to grab the Riak Core templates with a quick git clone. After cloning the repo, copy the templates to the rebar templates directory (note that you'll need to create it initially), then create a working directory for the project and navigate into it.

[sourcecode language=”bash”]
git clone git://github.com/rzezeski/rebar_riak_core.git
mkdir -p ~/.rebar/templates
cp rebar_riak_core/* ~/.rebar/templates
mkdir projectNameHere
cd projectNameHere
[/sourcecode]

Now that a template is available, run the following command to create the Erlang Project.

[sourcecode language=”bash”]
rebar create template=riak_core_multinode appid=rabbits nodeid=rabbits
[/sourcecode]

You're now ready to go to work using Rebar and the project you've created from the template. I followed the try-try-try example repo for the sample above; check it out for a great walkthrough that dives deeper into Riak Core, each small element of the project and the files created, with a multi-node project as the sample.
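For reference, the devrel flow from that walkthrough runs along these lines (a sketch; the rabbits script names follow from the appid above, so adjust to taste):

[sourcecode language=”bash”]
# Sketch of the multinode devrel flow from the try-try-try walkthrough;
# the "rabbits" script names follow from appid=rabbits above.
make devrel                                        # build the dev1..devN releases
for d in dev/dev*; do $d/bin/rabbits start; done   # start every node
for d in dev/dev{2,3}; do
  $d/bin/rabbits-admin join rabbits1@127.0.0.1     # join nodes into one cluster
done
dev/dev1/bin/rabbits-admin ringready               # confirm the ring has settled
[/sourcecode]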

So what to do now?

This is where it's time to throw some creativity around and get real solutions to real problems. Building distributed systems is becoming more and more paramount to effective use of infrastructure and systems, and Riak Core is an ideal place to start building out your own. Here are a few ideas the team was brainstorming on. Over the coming weeks we'll be putting together material that outlines ways to not only get started, but to implement systems like these.

Distributed Web Caching Tier

Caching tiers often come up in conversation, whether related to distributed systems or not, and they usually end up on the distributed topic anyway. The recurring question is, “how do I create a caching tier that can be distributed and provide real session and state management, cached elements, live data and other needs?” Riak Core is a great place to start developing a custom distributed caching tier, one that could even extend to Riak KV (the Riak database implemented on Riak Core), Redis, RabbitMQ or many other solutions, pulling them together to provide the appropriate cache at the appropriate tiers of an application architecture.

In House Cluster Monitoring & Smart Resolver

One thing Riak Core could be used for to great effect is a multi-node, clustered, geographically dispersed monitoring system for any multi-data center application. This could be built out for almost any environment, whether with custom specifics or a completely generic rack of pizza box servers. Because of the distributed concepts behind Riak Core, it would provide an ideal basis for monitoring, and for re-launching or otherwise dealing with systems that need high uptime and need to be recovered as fast as possible when they go down.

Logging, Web, Server, and Business Analytics

In any situation where analytics are collected, there are often dozens if not thousands of servers, various systems and even numerous devices emitting data via services or other mediums. Riak Core is a great place to lay the groundwork for a distributed system that maintains a massive store of managed data for fast analytics searches. This could be the groundwork for biotech research analytics, market data analysis or a dozen other things that need highly available systems storing vast data with map reduction or other search capabilities. Think Business Intelligence (BI) with serious technological power.

Multi-node Project

Of course, as with the example I used to create the first sample above, dive into the try-try-try tutorial for a great multi-node how-to. If you have any questions, jump in and ping me on Twitter @adron or ping @basho, join the mailing list, or hit the #riak IRC channel on freenode.