A few days ago Troy Howard, Jeremiah Peschka and I all traveled via Amtrak Cascades up to Seattle. The mission was simple: Jeremiah was presenting “Riak in a .NET World”, I was handling logistics and Troy was handling video.
So I took the video that Troy shot, edited it, put together a soundtrack for it and let Jeremiah’s big data magic shine. He covers the basics around RDBMSes, SQL Server in this case, though it largely applies to any RDBMS. These basics bring us up to where and why an architecture needs to shift from an RDBMS solution to a distributed solution like Riak. After stepping through some of the key reasons to move to Riak, Jeremiah walks through a live demo of using CorrugatedIron, the .NET Client for Riak (Github repo). During the walkthrough he covers the specific characteristics of how CorrugatedIron interacts with Riak through indexes and buckets, and during puts and pulls of data.
Toward the end of the video you’ll also see Joseph Blomstedt @jtuple, Troy Howard @thoward37, Jeremiah Peschka @peschkaj, Clive Boulton @iC and Richard Turner @bitcrazed. Also note, I’ve enabled download for this specific video since it is a large one (1.08GB total), so you may want to download and watch it if you don’t have a super reliable high speed internet connection.
Whew, it’s been a total blast working at Basho. I’ve accomplished a ton of things. Riak is a solid distributed database system and I’m glad to have worked with the team on advocating its use, teaching distributed systems ideas and concepts and generally spreading the knowledge. I’ve seen some truly great things that people are hacking together, setting up for projects and redesigning old systems to utilize newer, better, faster and more capable distributed systems concepts and ideas. Here are some of the things I’m happy to have contributed to in my time at Basho.
I helped negotiate and get an effort started that came to fruition with Tier 3 releasing a Riak CS backed object store for their customers. A very cool feature added to their already formidable enterprise cloud offerings. Read more about that here “Tier 3 Object Storage” and here “Tier 3 Launches Global Object Storage“. Their implementation is really nice, including many geographic regions of accessibility, S3 API compatibility and high end storage capabilities that offer a bigger punch of performance than your average object storage in the cloud!
I got both the Seattle Riak (up to 52 members now!) and Portland Riak (up to 74 members now!!) groups started. You should join; they’re a good time, good conversation and great information.
I partnered with Troy Howard @thoward37 to run the second year of Node PDX. Basho was excellent enough to contribute not just a few bucks but also sent Chris Meiklejohn @cmeik out to speak at the conference.
I got to work directly with a number of people at Windows Azure, AWS and EngineYard in deploying Riak, testing out how the respective images (Azure VM Depot & AWS AMI) and deployments (Riak EDS) would work. In the end, this has been a great opportunity to learn more about the latest and greatest of each of these services. I’ve been impressed as they’ve each been doing a seriously kick ass job lately!
…and there has been a whole lot more. Suffice it to say, Basho has provided me with some sweet opportunities to work on some extremely interesting data projects from a very data sciency point of view (yeah I know sciency ain’t a word). There may be more Riak work and Riak meetups and Riak hacks and Riak who knows what coming from me, but the meetups & such are now in the hands of the core Riak crew and…
Where Am I Headed?
Right now, I’m moving 20 blocks away from where I currently live, setting up a couch to hack on and grabbing a beer. I’ve got a few personal projects I’ve been wanting to work on. Then I’m taking a few weeks to do some side projects that have been on the burner. Keep an eye out; I’ll be kicking off one, maybe two of these open source projects in the next few days. As @tsantero tweeted…
i wish i had the time to work on even 5% of the ideas in my notebook :
…I’m going to attack my own notebook of ideas. Maybe I’ll even work on that Riak CS Video object store that Tom and I spoke about 10 months ago? Either way, whatever the projects are, I’ll have them posted right here. Until then…
This week I’ve traveled to Philadelphia to meet with a number of the Basho team, to work together and receive training with the trainers on the best ways to approach content on Riak and, more generally, the best ways we can all brainstorm approaches to specific topics. Some of those topics include things like:
Access Patterns around Log Storage & Analysis
LevelDB and Bitcask Backends
Out of the options we discussed in training today I ran with benchmarking. It is always near and dear to many of the customers, clients and curious folks that I talk to. I dove in to see what exactly we offer with basho_bench (docs info, github repo) in detail and functionality, but also dove into what other benchmarks are out there that others may have run in the past.
What exactly is basho_bench? The basho_bench project is a code repo on Github that offers a set of benchmarking tests that are run against a Riak cluster. There are a few prerequisites to the quick steps below. The prerequisites are:
git clone git://github.com/basho/basho_bench.git
Once that is done building, review the directory structure that is in the basho_bench directory. The following should be available in the directory.
FAQ deps rebar
LICENSE ebin rebar.config
Makefile examples src
README.org include tests
The examples directory has several default config files available to run with basho_bench for testing. If there is a devrel setup with the default 127.0.0.1 IP usage, just run the following command to begin generating stats. If the cluster being tested is not a devrel with 127.0.0.1 then give the configuration section of the docs a read for information on how to point basho_bench at an alternative cluster.
The reason I post both is that ‘make results’ doesn’t seem to always work to build the results and the manual execution will actually get the results built. With the results built, check the tests directory in the basho_bench directory for the summary.png file. If you open the file it should look something like this.
From here you can now run basho_bench and get the results that are specific to basho_bench. However, this leads me to a more abstract topic: why do benchmarking in the first place?
Why Benchmark? How to Benchmark!
The definition of benchmark:
noun
1. a standard of excellence, achievement, etc., against which similar things must be measured or judged: The new hotel is a benchmark in opulence and comfort.
2. any standard or reference by which others can be measured or judged: The current price for crude oil may become the benchmark.
3. Computers. an established point of reference against which computers or programs can be measured in tests comparing their performance, reliability, etc.
4. Surveying. Usually, bench mark. a marked point of known or assumed elevation from which other elevations may be established. Abbreviation: BM
adjective
5. of, pertaining to, or resulting in a benchmark: benchmark test, benchmark study.
While basho_bench provides an interesting baseline test that shows various pieces of data to work with, it shows nothing by default that is specific to YOUR use case. The default basho_bench run is not a model of your production environment, and it is not your dev, user acceptance testing or test criteria. It is an example. To truly get effective numbers that really encompass the needs of your project you will need to provide custom configuration for basho_bench or write your own specific benchmark.
The reason behind this, with Riak as with other NoSQL solutions, is that you’re working toward a goal that is very data specific and unknown. Your use case has its own domain logic and criteria, and a custom benchmark can provide real data related to that domain logic and criteria.
In the end, even though basho_bench is a great tool to get started and do basic tests, and a great project to get ideas from, it is not the panacea benchmark. You’ll need to create the specific benchmark for your use case yourself.
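To give an idea of what "write your own specific benchmark" can look like, here is a minimal sketch in Python. It is not basho_bench and it does not talk to Riak: InMemoryStore is a hypothetical stand-in for whatever client you really use, and the read ratio, value size and operation count are made-up knobs that you would replace with numbers from your own access patterns.

```python
import random
import statistics
import time


class InMemoryStore:
    """Hypothetical stand-in for a real cluster client; swap in your own."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def run_benchmark(store, operations=1000, read_ratio=0.8, value_size=512, seed=42):
    """Run a mixed get/put workload and return per-operation latencies in ms."""
    rng = random.Random(seed)  # seeded so repeated runs use the same workload
    payload = b"x" * value_size
    latencies = {"get": [], "put": []}
    for _ in range(operations):
        key = "key-%d" % rng.randrange(operations)
        op = "get" if rng.random() < read_ratio else "put"
        start = time.perf_counter()
        if op == "get":
            store.get(key)
        else:
            store.put(key, payload)
        latencies[op].append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(samples):
    """Return count, mean and 95th percentile for a list of latencies."""
    if not samples:
        return {"count": 0, "mean_ms": 0.0, "p95_ms": 0.0}
    ordered = sorted(samples)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "count": len(ordered),
        "mean_ms": statistics.mean(ordered),
        "p95_ms": ordered[p95_index],
    }


if __name__ == "__main__":
    results = run_benchmark(InMemoryStore())
    for op, samples in results.items():
        print(op, summarize(samples))
```

The point of the sketch is the shape: a workload that mirrors your own mix of reads and writes, timing around each operation, and a summary that looks at percentiles and not just the average.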
Two things have worked together that made me want to write up the new Riak 1.4 features. With Riak 1.4 hitting the streets and the work I’ve been doing with CorrugatedIron there are a few features that are going to add icing to the cake. If you want to dive more into the release, check out the release notes. If you’re interested in the .NET Client CorrugatedIron, check it out here or check out the code on github. Now on to the client APIs.
…the command attaches to the named pipe to communicate with the running Erlang nodes. Now when you hit Ctrl-C it kills just the pipe versus killing the pipe and the Riak node that you’re on. This is something that has bitten me in the keister more than a few times, bringing down a node or two while trying to view what is going on with a node. This leads me to the next enhancement.
If you’re using riak_kv_bitcask_backend, riak_kv_eleveldb_backend or riak_kv_memory_backend, the riak-admin transfers command now shows per-transfer progress and displays long node names better, giving you a better idea of what is going where. The way this is reported depends slightly on the specific backend. For the Bitcask or in-memory backend, progress is calculated from the keys already transferred out of the total keys, whereas the LevelDB backend calculates it based on bytes transferred. Because of this, the LevelDB calculation can get slightly off over time.
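The difference between the two progress calculations can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Riak’s actual code: percent_complete is a hypothetical helper, the numbers are made up, and the idea that the byte total is an estimate that can go stale is my assumption for why the byte-based figure might drift.

```python
def percent_complete(done, total):
    """Return progress as a percentage, clamped to the range 0-100."""
    if total <= 0:
        return 0.0
    return max(0.0, min(100.0, 100.0 * done / total))


# Key-based progress (Bitcask / memory style): keys sent out of total keys.
key_progress = percent_complete(2500, 10000)

# Byte-based progress (LevelDB style): bytes sent out of a total size.
# Assumption: if the total is only an estimate, the raw ratio can overshoot,
# which is why this sketch clamps the result.
byte_progress = percent_complete(900_000, 800_000)

print("keys:  %.1f%%" % key_progress)
print("bytes: %.1f%%" % byte_progress)
```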
Protocol Buffers & Multiple Interface Binding
Protocol Buffers can now bind to multiple ports and interfaces, so clients such as CorrugatedIron for .NET (http://corrugatediron.org/) and Riakjs (http://riakjs.com/) can now connect to Protocol Buffers on more than a single configured address. For more on Riak configuration around the binding, check out the Basho Docs (http://docs.basho.com/riak/latest/references/Configuration-Files/). This also brings interface binding to feature parity with the HTTP interfaces. The pb_port and pb_ip settings are replaced by a single pb setting, which is now a list of IP and port pairs.
Timeouts for clients can now be specified in milliseconds. This can be used for object manipulation around fetch, store and delete, as well as listing buckets or keys. This takes care of some timeout issues that may have been occurring during certain types of requests. This will come in handy for asynchronous clients and will be pivotal for anyone going the synchronous route.
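As a rough picture of that change, here is a small Python sketch that turns an old-style single IP and port into a list of pairs and renders it as an Erlang-style term. Both function names are hypothetical, the addresses are examples, and the exact shape of the rendered term is my assumption based on the description above, so check the configuration docs before copying anything into a real config file.

```python
def merge_pb_listeners(pb_ip, pb_port, extra_listeners=()):
    """Combine the legacy single IP/port with any additional (ip, port) pairs."""
    listeners = [(pb_ip, pb_port)]
    for pair in extra_listeners:
        if pair not in listeners:  # skip duplicates, keep the original order
            listeners.append(pair)
    return listeners


def render_pb_setting(listeners):
    """Render the pairs as an Erlang-style term (assumed shape, verify in docs)."""
    pairs = ", ".join('{"%s", %d}' % (ip, port) for ip, port in listeners)
    return "{pb, [%s]}" % pairs


if __name__ == "__main__":
    listeners = merge_pb_listeners("127.0.0.1", 8087, [("10.0.0.5", 8087)])
    print(render_pb_setting(listeners))
```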
Bucket Properties for Protocol Buffers
If you need to reset a bucket to its defaults, this is now possible. Besides a reset to defaults, all bucket properties are now usable over Protocol Buffers. This can definitely help client usage of Protocol Buffers in a dramatic way.
List-buckets Streaming – Realtime
Listing keys or buckets via a streaming request will send bucket names to the client as they are received. This removes the need to wait for all nodes to respond to a request. This can help with response time and timeouts from the client point of view. This gives the ability to use the streaming features with Node.js, C#, Java and other languages and frameworks that support realtime streaming data feeds.
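To make the difference concrete, here is a small Python simulation of blocking versus streaming listing. It does not use any Riak client: node_responses is a hypothetical iterable where each item stands for the bucket names one node sends back, and the de-duplication is just a choice I made for the sketch.

```python
def list_buckets_blocking(node_responses):
    """Wait for every node's response, then return all bucket names at once."""
    names = set()
    for response in node_responses:
        names.update(response)
    return sorted(names)


def list_buckets_streaming(node_responses):
    """Yield each new bucket name as soon as the response containing it arrives."""
    seen = set()
    for response in node_responses:
        for name in response:
            if name not in seen:
                seen.add(name)
                yield name


if __name__ == "__main__":
    # Simulated responses from three nodes (made-up bucket names).
    responses = [["users", "logs"], ["logs", "videos"], ["sessions"]]
    for bucket in list_buckets_streaming(responses):
        print("received:", bucket)  # the caller can act on each name immediately
    print("all at once:", list_buckets_blocking(responses))
```

With the streaming version the caller can start working on the first bucket name while later nodes are still answering, which is where the response time and timeout benefits come from.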
…these are the features that have jumped out at me, so until next release.
The Rails 2013 Conference kicked off for me with a short bike ride through town to the conference center. The Portland conference center is one of the most connected conference centers I’ve seen; light rail, streetcar, bus, bicycle boulevards, trails & of course pedestrian access are all available. I personally have no idea if you can drive to it, but I hear there is parking & such for drivers.
Rails Conf, however, clearly places itself in the category of a conference of people that give a shit! This is evident in so many things among the community, from the inclusive nature that creates one of the most diverse groups of developers to the fact that they handed out 7 day transit passes when you picked up your Rails Conf pass!
The keynote was by DHH (obviously, right?). He laid out where the Rails stack is, covered some roadmap topics & drew out how much the community had grown. Overall, Rails is now in a state of maintaining and growing the ideal. Considering its inclusive nature I hope to see it continue to grow and to increase the options out there for people getting into software development.
I also met a number of people while at the conference. One person I ran into again was Travis, who lives out yonder in Jacksonville, Florida and works with Hashrocket. Travis & I, besides the pure metal, have Jacksonville as common stomping ground. Last year I’d met him while the Hashrocket crew were in town. We discussed Portland, where to go and how to get there, plus what Hashrocket has been up to in regards to use around Mongo, other databases and how Ruby on Rails was treating them. The conclusion: all good on the dev front!
One of these days though, the Hashrocket crew is just gonna have to move to Portland. Sorry Jacksonville, we’ll visit one day. 😉
For the latter half of the conference I actually dove out and headed down for some client discussions in the country of Southern California. Nathan Aschbacher headed up Basho attendance at the conference from this point on. Which reminds me, I’ve gotta get a sitrep with Nathan…
RICON East (May 13th & 14th)
Ok, so I didn’t actually attend RICON East (sad face), I had far too many things to handle over here in Portlandia – but I watched over 1/3rd of the talks via the 1080p live stream. The RICON Conferences are a conference series focused on distributed systems. Riak is of course a distributed database, falling into that category, but RICON is by no means merely about Riak. At RICON the talks range from competing products to academic heavy hitting talks about how, where and why distributed systems are the future of computing. They may touch on things you may be familiar with such as:
PaaS (Platform as a Service)
Existing databases and how they may fit into the fabric of distributed systems (such as PostgreSQL)
How to scale distributed across AWS Cloud Services, Azure or other cloud providers
As the videos are posted online I’ll be providing some blog entries around the talks. It will, however, be extremely difficult to choose the first to review. Just as at RICON back in October of 2012, every single talk was far above the median!
Two talks immediately stand out. The first was Christopher Meiklejohn’s @cmeik talk, doing a bit o’ proofs in realtime, off the cuff and all. It was merely a 5 minute lightning talk, but holy shit this guy can roll through and hand off intelligence via a talk so fast it blew my mind!
The other talk was Kyle’s, AKA @aphyr, who went through network partitions with databases, basically destroying any comfort you might have with your database being effective at getting reads in a partition event. Kyle knows his stuff, that is without doubt.
There are many others, so subscribe, keep reading and I’ll be posting them in the coming weeks.
Node PDX 2013 (May 16th & 17th)
Holy moley we did it, again! Thanks to EVERYBODY out there in the community for helping us pull together another kick ass Node PDX event! That’s two years in a row now! My cohorts Troy Howard @thoward37 and Luc Perkins @lucperkins hustled like crazed worker bees to get everything together and ready – as always, a lot comes together at the last minute and we don’t get a wink of sleep until it’s all done and everybody has had a good time!
Polyglot Conference was held in Vancouver again this year, with clear intent to expand to Portland and Seattle in the coming year or two. I’m super stoked about this and will definitely be looking to help out – if you’re interested in helping let me know and I’ll get you in contact with the entire crew that’s been handling things so far!
The biggest problem with this conference is that it’s technically only one day. I hope that we can extend it to two days for next year – and hopefully even have the Seattle and Portland branches go with an extended two day itinerary.
This year the break out sessions that I attended included “Dev Tools”, “How to Be a Better Programmer”, “Go (Language) Noises” and other great sessions, and I threw down a session of my own on “Distributed Systems”. Overall, great time and great sessions! I had a blast and am looking forward to next year.
By the way, I’m not sure if I’ve mentioned this at the beginning of this blog entry, but this is only THE BEGINNING OF SUMMER IN CASCADIA! I’ll have more coverage of these events and others coming up, the roadmap includes OS Bridge (where I’m also speaking) and Portland’s notorious OSCON.
Until the next conference, keep hacking on that next bad ass piece of software, cheers!