OSCON: What it is.

This is the first of a few articles I'm going to write over the next couple of weeks related to the O'Reilly Open Source Conference, more colloquially called OSCON. Before getting into the conference itself, let's sync up on what this conference has been, what it was intended to be, what it is today, and its roots in open source.

OSCON was inaugurated in 1999, and Portland, Oregon has generally been the accepted home of OSCON. There have been OSCON events in other locations, but the sentiment remains: OSCON is a Portland conference, and it's been a bit rough going when other cities host it.

OSCON has been a conference centered on the open source community since day one. It consistently held that course even when open source was regularly lamented, insulted, and cursed by the software industry. Microsoft, the biggest of the big software companies in OSCON's early days, relentlessly attacked open source; Steve Ballmer declared "Linux is a cancer" back in 2001.

Jim Allchin attacked open source as “the worst”.

Even Bill Gates, the founder himself, went on record saying open source would make it so "nobody can ever improve the software".

Microsoft execs weren't the only ones, just some of the richest, most prominent, and loudest in berating the licensing model. Many corporations and others attacked it as communist, among other things. But OSCON continued onward every year with solid turnout in Portland, and the community continued to grow. Considering where we are now, that might seem obvious, but way back then it wasn't at all obvious that open source licenses and related open models would become the way a vast percentage of software is developed, as they are today.

But here we are!

OSCON started in those earlier days when open source was more often maligned than celebrated, at least in the business world and in the places where the vast majority of us were, or would have been, employed. From the start the conference aimed high and achieved a lot of victories in bringing together key people within the industry to grow open source development from multiple angles. As time went on OSCON expanded, as did its host's library of open source books, covering the tools, options, and solutions being created under open source licenses and the plethora of development paradigms.

Fast forward to today and OSCON is still that stalwart conference bringing people together, from those who were there in the early days to people who have only just joined the open source communities. This gathering of minds has a very low barrier to entry with its hallway pass, scaling all the way up to the standard, more expensive fare that covers the whole conference: specific and special gatherings, presentations, demos, and related activities.

Stay tuned, subscribe to the blog, and in my next post I'll take you on a whirlwind tour of more OSCON events, The New Stack's birthday at the conference, and more.

Cassie Schema Migrator >> CaSMa

A few weeks back I started working on a schema migration tool for Apache Cassandra and DataStax Enterprise. Just for context, here are short definitions of each of the elements behind CaSMa.

  • Apache Cassandra
    • Definition: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
    • History: Avinash Lakshman, one of the authors of Amazon’s Dynamo, and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google code in July 2008. In March 2009 it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. Facebook developers named their database after the Trojan mythological prophet Cassandra, with classical allusions to a curse on an oracle.
  • DataStax Enterprise
    • Definition: DataStax Enterprise, routinely just referred to as DSE, is an extended version of Apache Cassandra with multi-model capabilities around graph, search, and analytics, plus other features like security capabilities and a 2x speed improvement in the core data engine.
    • History: DataStax was formed in 2009 by Jonathan Ellis and Matt Pfeil and originally named Riptano. In 2011 Riptano changed its name to DataStax. For more history check out the Wikipedia page or company page for a timeline of events.
  • Schema Migration
    • Definition: In software engineering, schema migration (also database migration, database change management) refers to the management of incremental, reversible changes to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database's schema to some newer or older version. Migrations are generally performed programmatically by using a schema migration tool. When invoked with a specified desired schema version, the tool automates the successive application or reversal of an appropriate sequence of schema changes until it is brought to the desired state.
    • Additional references and related materials:

Over the next dozen weeks or so, as I work on this application via the DataStax Devs Twitch stream (next coding session events list), I'll also be posting blog posts in parallel about schema migration and my intent to expand the notion of schema migration specifically for multi-model databases and larger scale NoSQL systems, namely Apache Cassandra and DataStax Enterprise. Here's a shortlist of the next three episodes:

The other important pieces include the current code base on GitHub, the continuous integration build, and the tasks and issues.

Alright, now that all the collateral and context is listed, let's get into, at a high level, what this is all about.

CaSMa’s Mission

Schema migration is a powerful tool for getting a project on track, consistently deployed, and developed against the core database(s). However, it's largely entrenched in the relational database realm. That means it's almost entirely focused on schemas built around primary and foreign keys, the complexities of many-to-many relationships, indexes, and the other pieces that need to be built consistently for a relational database. Many of those things also need to be built for a distributed columnar store, key-value store, graph database, time series database, or a million other possibilities. However, in our current data schema world, that tooling isn't always readily available.
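To make that concrete, here's a rough sketch of what a single migration file for Cassandra could look like. This is purely illustrative: the exported version/up/down shape, the file name, and the table are assumptions of mine, not CaSMa's actual format.

```javascript
// Hypothetical migration file: 20190601_create_users.js
// Assumed convention: export a version plus up/down functions that receive
// a client exposing execute(cql) and return a promise.
'use strict';

module.exports = {
  version: 20190601,
  description: 'create users table',

  // Move the keyspace forward to this version.
  up: function (client) {
    return client.execute(
      'CREATE TABLE IF NOT EXISTS users (' +
      ' id uuid PRIMARY KEY,' +
      ' email text,' +
      ' created_at timestamp' +
      ')'
    );
  },

  // Revert the keyspace back below this version.
  down: function (client) {
    return client.execute('DROP TABLE IF EXISTS users');
  }
};
```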

The mission of CaSMa is to resolve this gap around schema migration, first and foremost for Apache Cassandra, prospectively in turn for DataStax Enterprise, and then onward for other database systems. From there the mission continues into multi-model systems that can and ought to take advantage of schema migration for graph and related schema modeling. At some point the mission will expand to include other schema, data, and state management focused around software development and the data needs within that state.
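As a hedged sketch of what applying those versioned migrations against Cassandra might look like, here's a minimal runner built on the Node.js cassandra-driver. The schema_migrations table, keyspace name, and file conventions are my own assumptions for illustration, not how CaSMa is actually implemented.

```javascript
// Minimal migration runner sketch using the DataStax Node.js cassandra-driver.
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'casma_demo' // hypothetical keyspace
});

// Track which migration versions have already been applied.
async function appliedVersions() {
  await client.execute(
    'CREATE TABLE IF NOT EXISTS schema_migrations (version bigint PRIMARY KEY)'
  );
  const result = await client.execute('SELECT version FROM schema_migrations');
  return new Set(result.rows.map(row => row.version.toString()));
}

// Apply each pending migration in ascending version order.
async function migrate(migrations) {
  const applied = await appliedVersions();
  const ordered = migrations.slice().sort((a, b) => a.version - b.version);
  for (const migration of ordered) {
    if (applied.has(String(migration.version))) continue;
    await migration.up(client);
    await client.execute(
      'INSERT INTO schema_migrations (version) VALUES (?)',
      [migration.version],
      { prepare: true }
    );
  }
}

// Example usage, assuming the hypothetical migration file from above:
// migrate([require('./migrations/20190601_create_users')])
//   .then(() => client.shutdown());
```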

As progress continues I'll publish additional posts here on the different data model concepts and the nature of the various multi-model database options. These modeling options will put us in a position to work consistently, in context, and seamlessly alongside ongoing development efforts. In addition to all this, there will be the weekly Twitch sessions where I'll get into coding and review the coding I've done off camera. Check those out on the DataStax Devs Channel.

If you’d like to get into the project and help out just ping me via Twitter @Adron or message me here.

Coding on Orchestrate.io & Orchestrate.js & Orchestrate.NET

First context, then I’ll dive in.

Orchestrate

http://orchestrate.io/

Orchestrate is a service that provides a simple API to access a multitude of database types, all in one location. Key-value, graph, and events, the database types I've been using, are just a few of those already available, and there are many more on the way. Having these databases available via an API, instead of going through the arduous process of setting up and maintaining each database for each type of data structure, is a massive time saver! On top of a clean API and a solid database platform and infrastructure, Orchestrate has a number of client drivers that provide easy-to-use wrappers, available for a number of languages. Below I've written about two of them that I've been involved with in some way over the last couple of months.

Orchestrate.NET

https://github.com/RobertSmith/Orchestrate.NET

I'm currently using this library for a demonstration application built against the Deconstructed.io services (follow us on Twitter ya! @BeDeconstructed), a startup I'm co-founding. I'm not sure exactly what the app will be, but being .NET it'll be something enterprisey. Because: .NET is Enterprise! For more on this project check out the Deconstructed.io Blog.

Some of the latest updates with this library.

But there's still a bit of work to do on the library, so consider this a call out: anybody who has a cycle they'd like to throw in on the project, let us know. We'd happily take a few more pull requests! The main two things we'd like to have done real soon are…

Orchestrate.js

https://github.com/orchestrate-io/orchestrate.js

With the latest fixes, additions, and updates, the orchestrate.js client driver is getting more feature rich by the day. In addition, @housejester has created an orchestrate-brain project for Hubot that uses Orchestrate.js. If you're not familiar with Hubot, be sure to check out the company robot that can dramatically improve and reduce employee efficiency! Keep an eye on that project for more great things, or create a Hubot to keep a robotic eye on the project.
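If you haven't tried the driver yet, basic key-value usage looks roughly like the sketch below. This is from memory, so treat the collection, key, and response shape as assumptions rather than gospel.

```javascript
// A quick sketch of basic key-value calls with orchestrate.js.
// The API token, collection, and key are placeholders.
var db = require('orchestrate')('YOUR_API_KEY');

// Store a JSON document under a collection/key pair.
db.put('users', 'adron', { name: 'Adron', location: 'Portland' })
  .then(function () {
    // Read it back; the parsed JSON comes back on the response body.
    return db.get('users', 'adron');
  })
  .then(function (res) {
    console.log(res.body.name); // 'Adron'
  })
  .fail(function (err) {
    console.error(err);
  });
```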

Here are a few key things to note that have been added to help in day-to-day coding on the project.

  • The .travis.yml file has been added for the Travis Continuous Integration build. This build runs against Node.js v0.10 and v0.8.
  • Testing is done with mocha, expect.js, and nock. To get the tests up and running, clone the repo and then build with the make file. The tests run in TDD format (see the sketch just after this list).
  • Promises are provided via the kew library.
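For anyone new to mocha's TDD interface, here's a small, hedged example of what a test in that setup could look like, using expect.js for assertions and nock to stub the HTTP layer. The route, response, and collection names are made up for illustration and aren't the project's actual tests.

```javascript
// test/keyvalue.js - run with mocha --ui tdd (or via the project's make file).
var expect = require('expect.js');
var nock = require('nock');
var db = require('orchestrate')('FAKE_API_KEY');

suite('key-value get', function () {
  setup(function () {
    // Intercept the HTTP request so no real API key or network is needed.
    // The path here is an assumption about the Orchestrate REST API.
    nock('https://api.orchestrate.io')
      .get('/v0/users/adron')
      .reply(200, { name: 'Adron' });
  });

  teardown(function () {
    nock.cleanAll();
  });

  test('returns the stored document', function (done) {
    db.get('users', 'adron')
      .then(function (res) {
        expect(res.body.name).to.be('Adron');
        done();
      })
      .fail(done);
  });
});
```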

If you're opening the project in WebStorm, it's great to set up the mocha tests with the integrated mocha test runner as shown below. After you've cloned the project and run 'npm install', follow these steps to add Mocha testing to the project. We've already set up exclusions in the .gitignore for the .idea directory and files that WebStorm uses.

First add a configuration by clicking on Edit Configurations.

Edit Configurations

Next click on the + to add a new configuration to run. Select the Mocha option from the list of configurations.

Mocha & Other Configurations in WebStorm

On the next screen set a name for the configuration. Set the test directory to the path for the test directory in the project. Then finally set the User interface option for Mocha to TDD instead of the default BDD.

Edit Configuration Dialog

Last but not least run the tests and you’ll see the list of green lights light up the display with positive results.

Test Build

OSCON : Conversations, Deployments, Architecture, Docker and the Future?

I wrote about my first day of OSCON “OSCON : Day 1, Windows Just Doesn’t Do Cloud Foundry… but, there’s a fix for that…“. The rest of the week was most excellent. I caught up with friends and past coworkers. I heard about people working on some amazing new projects. Some things I will try to write up in the coming days, as I’m sure some of it will be making the tech news (if not the regular people news too).

Conversations

I had some great conversations about the direction of enterprise and PaaS uptake. It's great to hear that there is finally some movement in that space. As one would expect, there is still a lot of distance for the enterprise to cover, but they'll get there, or fall apart in the meantime.

There were also tons of conversations about the Indiegogo Ubuntu Edge mobile device. The device looks great and sounds like a solid idea. The questions arise from the fact that they're working to make it a purely crowdfunded project. This wouldn't be a concern if they were trying to raise just a few million in capital, but they're aiming for $32 million! Overall though, with 128 GB of storage, dual LTE antennas for Europe and the US, a top tier screen in quality and design, a metal body, and multiple other features, this phone is ahead of anything out there. I hope it's successful, but I must admit my own hesitance. What's your take on the device?

Deployments

Over the course of the conference I talked to and worked with a number of other individuals playing around with Cloud Foundry and OpenShift. The primary thing we worked on was strategies for deploying these PaaS technologies.

We also worked with Iron Foundry to extend Cloud Foundry to support .NET. Whether you love .NET or hate it, wherever you fall on that spectrum, it still has an absolutely huge user base. That's primarily because .NET spent the last decade and then some going head to head against Java in the enterprise, and we all know the enterprise is slow to shift anything. So for now and the foreseeable future, .NET is an extremely large part of the development world, and having it work in your PaaS is fundamental to gaining significant enterprise share. Cloud Foundry is the only open source, internally usable PaaS on the market today. There are closed source options available, but those obviously don't come up at OSCON.

While at OSCON, I also got to discuss the architecture and deployment of Riak with a number of people. The usage of Riak continues to grow, and the environments, use cases, and tooling that people are using Riak with and for are always an interesting space for me. I also got to discuss deployment of Cassandra and even some side by side Neo4j, Redis, and Riak deployments. People have used an interesting mix of NoSQL solutions out there to pull their respective data together for their needs.

Among all these deployments, conversations regularly returned to a known topic of mine: cloud computing and who is capable of what, where, and when. AWS is still the easy leader in cloud computing, not just in customers but in technology. This also brought up the concerns and apathy some have around OpenStack (hat tip to Ben Kepes for the write up) working more homogeneously with AWS. Whatever the case might be, the path for OpenStack needs to be clarified regularly. I imagine the next movement is going to be away from worrying so much about infrastructure and toward the portability and development of applications.

Another growing topic of discussion was building applications for, on, and with Windows Azure. Microsoft has actually become dramatically more involved in open source, in an honest way and with more integrity. I'm honestly amazed at how far they've come from the declaration years ago that "open source is a cancer" and the all too famous "Linux is communism". Whatever that was supposed to mean, they didn't seem to get it back then. Now, however, they regularly contribute to open source projects on CodePlex, but also on GitHub and other places. Microsoft even contributed to the Linux kernel a few months ago.

That leads me to the next topic that came up a number of times…

Architecture

There's been a lot of discussion about architecture around PaaS, containers (more on that in a moment), distributed systems in general, and distributed databases. As I wrote about recently in "Architectural PaaS Cracks or Crack PaaS", the world of distributed systems and distributed databases has more than a few issues when working together in a PaaS environment. This brought up discussion about what solutions exist today, solutions I look forward to writing about and building in the coming months.

The most immediate solution for scalable data sources is still to run your operational data sources, such as Neo4j, Redis, Riak, or other databases, autonomously but residing close to your PaaS system. The current public PaaS providers do exactly this, and in some cases extend it to offer the databases and data sources as services through add-ons. These are great solutions for now, but they require time, effort, and custom development work when set up internally.

This leads me to the last topic…

The Story of a Container – Docker

Well, not just Docker, but containers in general and Docker specifically. First some context about what a container is.

Container – In this particular context I'm writing about a container, or more specifically a runtime container, that isolates resources for applications or services. Containers are common in PaaS technologies to help isolate specific services or applications when they're on a single physical machine or instance. Among the PaaS systems that came up at OSCON: dotCloud comes from the same team that created Docker, Cloud Foundry has Warden, and OpenShift has gears and Red Hat Enterprise Linux OS-specific containers.

I've studied Warden a little in the past, while I was working with AppFog and Tier 3 around Cloud Foundry, and it's a great piece of technology. However, the star at OSCON was clearly Docker. I jumped into a number of conversations around Docker, and those conversations kept heading in the direction of containers becoming the key to PaaS tooling, systems growth, and increasing capabilities. That leads me back to my previous blog entry "Architectural PaaS Cracks or Crack PaaS" and one of the key solutions to the data tier issue.

Containers, A Solution for Scaling the Data Tier

One of the issues that comes up when trying to scale any distributed database in a PaaS environment is how to provide multi-tenancy without spinning up a new instance for every single node of that distributed database. Here's an example diagram of the requirements behind a scalable distributed database.

Masterless, Distributed Cluster of Nodes

In a default configuration you'd want each node running on a physical machine or dedicated virtual instance, for performance reasons as well as load balancing, security, data integrity, and a host of others. This is the natural starting state of a highly available distributed database or distributed system.

Trying to deploy something like this into a PaaS environment is tricky. Take into account that in application or service terms there is no such thing as an instance, and especially not a physical server. The real division between processes and resources is the container. These containers are what actually need to run the distributed system nodes, and that becomes possible if a distributed system node can be deployed to and executed from within a container.

Enter Docker

After reviewing Docker, the capabilities around it, and the requirements of a distributed database, it looks like an ideal marriage of the two technologies. Docker already has Redis and other database technologies running on it. The container technology around Docker looks like an ideal fit for extending distributed systems to run autonomously of a single physical machine or a single instance per node. This would enable nodes to be deployed as resources become available, providing a more seamless, PaaS-style deployment for systems like Cassandra, Riak, and related distributed systems. Could this be the next evolution of affordable distributed systems, containers to the rescue?
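As a small thought experiment on what that could look like programmatically, here's a hedged sketch that launches a containerized database node through the Docker remote API, using the dockerode Node.js client as an assumed dependency. The image choice and naming scheme are placeholders, not a real multi-tenancy implementation.

```javascript
// Sketch: spin up one containerized database node per tenant via dockerode.
var Docker = require('dockerode');
var docker = new Docker({ socketPath: '/var/run/docker.sock' });

// Launch a Redis node for a given tenant; the image and naming convention
// are illustrative placeholders.
function launchNode(tenantId, callback) {
  docker.createContainer({
    Image: 'redis',
    name: 'data-node-' + tenantId
  }, function (err, container) {
    if (err) { return callback(err); }
    container.start(function (startErr) {
      callback(startErr, container);
    });
  });
}

launchNode('tenant-42', function (err, container) {
  if (err) { return console.error(err); }
  console.log('node running in container', container.id);
});
```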

I’ll be reporting back on my progress, this could be cool!

Stay tuned for a write up on Docker in the near future. For more information now check out http://www.docker.io.

OS Bridge Day 1… Coffee, Missing Angular JS, Distributed Systems, Lego, Hardware, Terraformer…

OS Bridge Day 1 kicked off. I had more than a few goals to achieve for the day.

  1. Give my presentation “Data and Applications Across the Void :: Distributing Systems“, the first with this layout, covering key topics and concepts around distributed systems.
  2. Meet Jason Denizac @_jden for coffee at Public Domain and catch up.
  3. Attend Beer Um’ Tuesday Too (i.e. B.U.T.T.), the almost unknown yet known beer meetup from the mind of genius Jerry Sievert @jerrysievert, and march over with a contingent from OS Bridge.
  4. Attend the following: Kicking Impostor Syndrome in the Head, Test Driven Development with Angular JS and Terraformer.
  5. Plot next steps involving Bosh, Cloud Foundry, Riak and OpenShift.

Upon arriving I checked in and got the super sweet water bottle that the OS Bridge team got as speaker gifts. Gotta say, good job: something a bit different, something that's quality, and something worth keeping! I dig it. I immediately washed it out and carried it around for thirst quenching the rest of the day.

Kicking Impostor Syndrome in the Head

This talk tackled ideas around how to be more inclusive and let people actually gain buy-in and confidence in the work they're doing. This is a hugely important set of ideas that most of the large corporate world has no clue about, hence the dramatically lower productivity, individual leadership, pride, and happiness of people working in large corporate enterprises and especially government. This is a space that should be an extremely high priority for those businesses to study.

Mistakes…

Denise Paolucci did a great job engaging the crowd and relaying the ideas of how to improve work environments to really bring out the best in people. Simply put, it occurred to me this could be summarized as, "Don't be a dick, kick ass, and build the whole team to do just that!"

The talk included ideas such as making it safe to fail: don't scapegoat someone over an idea that doesn't work, but try a new path and move toward succeeding. Don't set people up to fail, because that drags everybody down. Document things even when everybody supposedly knows them. The list goes on, but that's a good base for the ideas.

Check out Dreamwidth Studios for more of Denise's work.

Test Driven Development with Angular JS

This session was presented by Joe Eames @josepheames. I really wanted to check this out, as I've been keen on AngularJS the last couple of months but haven't been able to work with it as much as I'd like. So any exposure is good exposure in my book. This is when the bad news kicked in: I had to run off and take care of some minor priorities. Errands, ugh.

For those who, like me, either weren't at OS Bridge or missed this session, the recording will be put up at some point, so keep an eye out for the videos being posted. For an immediate fix, Joe has a podcast at JavaScript Jabber. He's also got a site about doing TDD with JavaScript at Test Driven JS.

The standard mode of arrival at OS Bridge.

DIY Electric Vehicles

My friend, beverage connoisseur, and JavaScripting genius Jerry Sievert @jerrysievert strolled by and mentioned DIY Electric Vehicles, DIY Electric Cars, DIY Electric Bikes and DIY DIY DIY DIY Stuffs. So I packed up and headed to this workshop, without having originally planned to attend anything in this slot.

This was a solid session with an introduction to electric vehicles: what they look like, how they work, what types of batteries are good for this use, and coverage of Benjamin Kero's @bkero DIY Electric Bike. Really cool stuff, and something I really want to expand on by connecting even more tech, similar to this plus something like Helios Bars.

Next up…

Terraformer

Terraformer is a project kicked off by Jerry Sievert @jerrysievert that provides a pretty solid mapping toolkit. For more information on this project, check out these links:

Jerry showing off other cool Terraformer features.
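For a quick, hedged taste of the toolkit, here's a tiny sketch using the terraformer npm package. The method names are from memory, so double check them against the project docs before relying on them.

```javascript
// Sketch: basic geometry checks with Terraformer (API calls assumed, verify against docs).
var Terraformer = require('terraformer');

// A point roughly in downtown Portland, wrapped as a Terraformer primitive.
var point = new Terraformer.Primitive({
  type: 'Point',
  coordinates: [-122.6764, 45.5165]
});

// A rough bounding polygon around the city center.
var polygon = new Terraformer.Primitive({
  type: 'Polygon',
  coordinates: [[
    [-122.70, 45.49],
    [-122.70, 45.54],
    [-122.65, 45.54],
    [-122.65, 45.49],
    [-122.70, 45.49]
  ]]
});

console.log(polygon.contains(point)); // true, the point falls inside the polygon
console.log(point.within(polygon));   // true, same relationship from the point's side
```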

Hacker Lounge

OS Bridge is fairly well known for its awesome Hacker Lounge, which runs during and after all the sessions. Before many arrived, early in the morning just before the first keynote, I snapped a wide angle shot of the Hacker Lounge…

Hacker Lounge, unoccupied.

…and here’s a few shots of the Hacker Lounge in full effect.

A wide angle of activity a la the Hacker Lounge. Click for full size image.

…the Lego table for solutions…

Lego table!

…and hardware hacking.

Hardware hacking, a little soldering brings together different worlds.

That’s it for day one. Happy hacking.