Riak in a .NET World

Jeremiah's Demo Works, IT WORKS IT WORKS!

A few days ago, Troy Howard, Jeremiah Peschka and I traveled up to Seattle on the Amtrak Cascades. The mission was simple: Jeremiah was presenting “Riak in a .NET World”, I was handling logistics and Troy was handling video.

So I took the video that Troy shot, edited it, added a soundtrack and let Jeremiah’s big data magic shine. He covers the basics of RDBMSes, using SQL Server as the example, though most of it applies to any RDBMS. These basics lead up to where and why an architecture needs to shift from an RDBMS to a distributed solution like Riak. After stepping through some of the key reasons to move to Riak, Jeremiah walks through a live demo of CorrugatedIron, the .NET client for Riak (Github repo). Along the way he covers how CorrugatedIron interacts with Riak through indexes and buckets, and what happens when putting and getting data.

Toward the end of the video you’ll also see Joseph Blomstedt @jtuple, Troy Howard @thoward37, Jeremiah Peschka @peschkaj, Clive Boulton @iC and Richard Turner @bitcrazed. Also note that I’ve enabled downloads for this video because it is quite large (1.08GB total), so you may want to download it and watch it locally if you don’t have a reliable high-speed internet connection.

For more on Jeremiah’s work, check out http://www.brentozar.com/articles/riak/ and contact him at http://www.brentozar.com/contact/

Introducing Junction

Today I’ve officially kicked off a new project from my notebook of projects: a Riak admin, data manipulation, reporting and news tool for Windows 8. If you want to jump right to the project, here’s the Github Pages Site and the Github Junction Repo, and eventually I’ll have it listed in the Windows 8 Store for download. Yes, it’ll be free as in beer, it’ll all be Apache 2.0 licensed, and the project is open to contributors and anyone else who wants to jump in. There’s also a quick intro to how I set up the “Windows 8 Logos, Badges & Splash Screens of Riak”.

Now that I’ve provided the links, here’s a quick intro to each of the application sections: what the application is for, where the workflow for contributions will live and what the next steps are. Trust me, I roll easy. I’ll be working as hard as I can to make pull requests easy peasy, keep the issues scoped to workable contributions and generally make this a good OSS project.

Riak Junction Application rocking on the Windows 8 desktop with a full tile!

Junction Divisions

The Junction application should be split into several key components, or divisions of functionality. I’ve broken each out below with a basic description. If you’d rather watch a video where I outline each division, play the video below for a quick 5-minute intro to the application and the idea behind it all.

A quick run-through of the first sample UI.

Call the Doctor! (Administration & Maintenance)

This part of the application would provide an interface for all the general administration and maintenance needs of individual nodes and of the overall cluster: the ability to add and remove nodes and to generally administer everything that is available via the riak-admin command line interface.
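
As a rough sketch of what this division would wrap, here are the riak-admin calls for checking on a cluster and for staging node joins and leaves. This assumes Riak 1.2 or later (the release that introduced staged clustering) with riak-admin on the PATH. The `run` helper, the DRY_RUN switch, the function names and the node name in the usage example are all hypothetical additions for illustration, not part of riak-admin.

[sourcecode language="bash"]
#!/usr/bin/env bash
# Sketch of riak-admin calls an admin screen could wrap.
# Assumes Riak 1.2+ (staged clustering) with riak-admin on the PATH.
# The run helper and DRY_RUN switch are hypothetical: DRY_RUN=1 prints
# each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Cluster overview: membership and ring status
cluster_status() {
  run riak-admin member-status
  run riak-admin ring-status
}

# Run on the new node: stage a join to an existing member, then show
# the plan. Nothing changes until the plan is committed.
stage_join() {
  run riak-admin cluster join "$1" && run riak-admin cluster plan
}

# Stage a node leaving (this node if no name is given), then show the plan
stage_leave() {
  run riak-admin cluster leave "$@" && run riak-admin cluster plan
}

# Apply the staged changes once the plan has been reviewed
commit_changes() {
  run riak-admin cluster commit
}
[/sourcecode]

For example, `DRY_RUN=1 stage_join riak@10.0.0.1` prints the join and plan commands without touching a cluster. Keeping the commit as a separate step mirrors how staged clustering works: nothing changes until someone reviews the plan and commits it, which is also a natural confirmation point for a GUI.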

Time Travel That Data (Performance Benchmarking)

This section of the application will provide the ability to benchmark the timing of data in and out of a cluster. In addition, it should show standard benchmarks similar to those offered by the basho_bench project.
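
A minimal sketch of the kind of in/out timing this division could start from, using curl against Riak's HTTP interface. It assumes curl is installed and a node is reachable at RIAK_URL (Riak's default HTTP port is 8098). The "bench" bucket and the function names are hypothetical, and client-observed latency from a simple loop like this is far cruder than what basho_bench measures.

[sourcecode language="bash"]
#!/usr/bin/env bash
# Rough client-side timing of data in and out of a Riak node over HTTP.
# Assumes curl and a node at RIAK_URL (8098 is Riak's default HTTP port).
# The "bench" bucket and the function names are hypothetical.

RIAK_URL="${RIAK_URL:-http://127.0.0.1:8098}"

# time_put_get [N]: write then read N keys, printing "op seconds" per request
time_put_get() {
  local i url t n="${1:-10}"
  for ((i = 1; i <= n; i++)); do
    url="$RIAK_URL/buckets/bench/keys/k$i"
    t=$(curl -s -o /dev/null -w '%{time_total}' -X PUT -H 'Content-Type: text/plain' -d "value $i" "$url")
    echo "put $t"
    t=$(curl -s -o /dev/null -w '%{time_total}' "$url")
    echo "get $t"
  done
}

# mean_by_op: read "op seconds" lines on stdin, print mean seconds per op
mean_by_op() {
  awk '{ sum[$1] += $2; n[$1]++ }
       END { for (op in sum) printf "%s %.4f\n", op, sum[op] / n[op] }' | sort
}
[/sourcecode]

Running `time_put_get 100 | mean_by_op` prints one mean latency for puts and one for gets.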

Love of the Data (Reporting)

This division of the application would be focused on reporting. I’m not sure exactly what that would entail, but something with charts, graphs and pulling together trends of some sort. If you have ideas and want to work on this part of the application, weigh in!

Golfing With Your Data (Query, Put, Deletes, Etc. Handling the CRUD)

The application will have an interface for adding and removing data, as well as for viewing the data that is available within a cluster. The primary means of implementing this part of the application will be the CorrugatedIron project, a library available via NuGet that @peschkaj and @TheColonial have put together.
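
CorrugatedIron handles these operations from .NET. To see the same CRUD operations against a node directly, here is a sketch using curl and Riak's HTTP interface (Riak also speaks Protocol Buffers). It assumes a node at RIAK_URL, JSON values and bucket/key names that need no URL-encoding. The function names and the users/alice bucket and key are hypothetical.

[sourcecode language="bash"]
#!/usr/bin/env bash
# CRUD against Riak's HTTP interface with curl.
# Assumes a node at RIAK_URL, JSON values, and bucket/key names that need
# no URL-encoding. Function names are hypothetical.

RIAK_URL="${RIAK_URL:-http://127.0.0.1:8098}"

# key_url BUCKET KEY: URL of one object
key_url() {
  printf '%s/buckets/%s/keys/%s' "$RIAK_URL" "$1" "$2"
}

# riak_put BUCKET KEY JSON: store a JSON value under BUCKET/KEY
riak_put() {
  curl -sf -X PUT -H 'Content-Type: application/json' -d "$3" "$(key_url "$1" "$2")"
}

# riak_get BUCKET KEY: print the stored value (fails on 404)
riak_get() {
  curl -sf "$(key_url "$1" "$2")"
}

# riak_delete BUCKET KEY: delete the object
riak_delete() {
  curl -sf -X DELETE "$(key_url "$1" "$2")"
}
[/sourcecode]

For example: `riak_put users alice '{"name":"Alice"}' && riak_get users alice`, then `riak_delete users alice` to clean up.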

News! News! News! (News… RSS Feed Reader)

The idea is that this will provide a quick and easy way to get familiar with Windows 8 dev and with the project overall. I’m aiming to consume the Basho blog feed and surface its key highlights in the application, with future abilities around mining other RSS feeds or such and having those fed into a ?? Riak cluster? Again, everything is open to change, addition or removal! So jump into the project and let me know your thoughts.

Cheers & Happy Hacking!

Farewell Basho, It’s Been Swell Yo!

Whew, it’s been a total blast working at Basho, and I’ve accomplished a ton. Riak is a solid distributed database system, and I’m glad to have worked with the team on advocating its use, teaching distributed systems ideas and concepts and generally spreading the knowledge. I’ve seen some truly great things that people are hacking together, setting up for projects and redesigning old systems around newer, better, faster and more capable distributed systems concepts and ideas. Here are some of the things I’m happy to have contributed to in my time at Basho.

…and there has been a whole lot more. Suffice it to say, Basho has given me some sweet opportunities to work on some extremely interesting data projects from a very data sciency point of view (yeah, I know sciency ain’t a word). There may be more Riak work, Riak meetups, Riak hacks and Riak who-knows-what coming from me, but the meetups and such are now in the hands of the core Riak crew and…

Where Am I Headed?

Right now, I’m moving 20 blocks from where I currently live, setting up a couch to hack on and grabbing a beer. I’ve got a few personal projects I’ve been wanting to work on, and then I’m taking a few weeks for some side projects that have been on the back burner. Keep an eye out: I’ll be kicking off one, maybe two, of these open source projects in the next few days. As @tsantero tweeted…

…I’m going to attack my own notebook of ideas. Maybe I’ll even work on that Riak CS video object store that Tom and I talked about 10 months ago? Either way, whatever the projects are, I’ll post them right here. Until then…

Cheers & Happy Hacking!

Philadelphia Riak Training & Presentations

R Graphs

This week I traveled to Philadelphia to meet up with a number of the Basho team, work together and receive training from our trainers on the best ways to approach Riak content and, more generally, on how we can all brainstorm approaches to specific topics. Some of those topics include:

  • Access Patterns around Log Storage & Analysis
  • Bloom Filters
  • CRDTs
  • Consensus Protocols
  • Erasure Coding
  • LevelDB and Bitcask Backends
  • MDC Repl

Out of the options we discussed in training today, I ran with benchmarking. It is always near and dear to many of the customers, clients and curious folks I talk to. I dove in to see exactly what basho_bench (docs info, github repo) offers in detail and functionality, and also dug into other benchmarks that others may have run in the past.

basho_bench

What exactly is basho_bench? The basho_bench project is a code repo on Github that offers a set of benchmarking tests to run against a Riak cluster. There are a few prerequisites for the quick steps below:

  1. Make sure you have a cluster or a devrel (Basho docs on devrel) set up that you can point basho_bench at.
  2. Erlang R15B03 should be installed (OS-X Compiler battles and fixes).
  3. Make sure R is installed.
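
Before building, a quick pre-flight check can confirm the prerequisites above. This is a sketch that assumes the tools are on the PATH and that the cluster answers Riak's HTTP `/ping` endpoint (which replies "OK"). 8098 is Riak's default HTTP port, while each devrel node listens on its own port (check the node's etc/app.config), so pass that port instead. The function names are illustrative, not part of basho_bench.

[sourcecode language="bash"]
#!/usr/bin/env bash
# Pre-flight checks for the basho_bench prerequisites listed above.
# Assumes the tools are on the PATH, curl is installed, and the cluster
# speaks Riak's HTTP API (GET /ping answers "OK"). 8098 is Riak's default
# HTTP port; devrel nodes each listen on their own port.

# have TOOL: succeeds if TOOL is on the PATH
have() {
  command -v "$1" >/dev/null 2>&1
}

# riak_ping [HOST] [PORT]: succeeds if a Riak node answers /ping with OK
riak_ping() {
  local host="${1:-127.0.0.1}" port="${2:-8098}" reply
  # -s: silent, -f: fail on HTTP errors, --max-time: don't hang on dead hosts
  reply=$(curl -sf --max-time 3 "http://${host}:${port}/ping") || return 1
  [ "$reply" = "OK" ]
}

# check_prereqs [HOST] [PORT]: report tools and cluster reachability
check_prereqs() {
  local tool missing=0
  for tool in git make erl R Rscript; do
    if have "$tool"; then echo "found:   $tool"; else echo "missing: $tool"; missing=1; fi
  done
  # Print the installed OTP release so it can be compared with R15B03
  if have erl; then
    erl -noshell -eval 'io:format("OTP release: ~s~n", [erlang:system_info(otp_release)]), halt().'
  fi
  if riak_ping "$@"; then echo "Riak node is up"; else echo "no Riak node answering /ping"; missing=1; fi
  return "$missing"
}
[/sourcecode]

Running `check_prereqs 127.0.0.1 8098` lists what was found or is missing, prints the OTP release, and returns non-zero if anything is missing or no node answers.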

Now to get basho_bench set up.

[sourcecode language="bash"]
git clone https://github.com/basho/basho_bench.git
cd basho_bench
make all
[/sourcecode]

Once the build is done, review the directory structure in the basho_bench directory. The following should be available:

[sourcecode language="bash"]
$ ls
FAQ deps rebar
LICENSE ebin rebar.config
Makefile examples src
README.org include tests
basho_bench priv
[/sourcecode]

The examples directory has several default config files to run with basho_bench. If there is a devrel set up on the default 127.0.0.1 address, just run the following command to begin generating stats. If the cluster being tested is not a devrel on 127.0.0.1, give the configuration section of the docs a read for information on how to point basho_bench at a different cluster.

[sourcecode language="bash"]
./basho_bench examples/http.config
[/sourcecode]

You’ll see something like this printed to the screen.

[sourcecode language="bash"]
$ ./basho_bench examples/http.config
10:34:41.736 [debug] Lager installed handler lager_console_backend into lager_event
10:34:41.752 [debug] Lager installed handler {lager_file_backend,"/Users/adronhall/Codez/basho_bench/tests/20130808_103441/error.log"} into lager_event
10:34:41.752 [debug] Lager installed handler {lager_file_backend,"/Users/adronhall/Codez/basho_bench/tests/20130808_103441/console.log"} into lager_event
10:34:41.757 [debug] Lager installed handler error_logger_lager_h into error_logger
10:34:41.758 [info] Application lager started on node nonode@nohost
10:34:41.767 [info] Est. data size: 488.28 KB
10:34:41.807 [debug] Supervisor sasl_safe_sup started alarm_handler:start_link() at pid
10:34:41.808 [debug] Supervisor sasl_safe_sup started overload:start_link() at pid
10:34:41.809 [debug] Supervisor sasl_sup started supervisor:start_link({local,sasl_safe_sup}, sasl, safe) at pid
10:34:41.810 [debug] Supervisor sasl_sup started release_handler:start_link() at pid
10:34:41.811 [info] Application sasl started on node nonode@nohost

….AND A WHOLE LOT MORE HERE….

10:35:42.954 [info] {{{put_re,{"localhost",4567,"/","{\"this\":\"is_json_%%V\"}"},[{'Content-Type','application/json'}]},{put_re,{"localhost",4567,"/","{\"this\":\"is_json_%%V\"}"},[{'Content-Type','application/json'}]}},{put,{conn_failed,{error,econnrefused}}}}: 1701
10:35:42.955 [info] Application basho_bench exited with reason: stopped
10:35:42.955 [info] Test completed after 1 mins.
$
[/sourcecode]

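Note the econnrefused errors at the end of the log: it shows the stock examples/http.config sending its requests to localhost:4567 rather than to Riak's default HTTP port (8098), so with nothing listening there the requests fail. After letting the test run for its designated minute, run the following command to get some pretty graphs with R.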

[sourcecode language="bash"]
make results
[/sourcecode]

…or maybe run…

[sourcecode language="bash"]
priv/summary.r -i tests/current
[/sourcecode]

I post both because ‘make results’ doesn’t always seem to build the results, while running the script manually does. With the results built, check the tests directory in the basho_bench directory for the summary.png file. If you open the file, it should look something like this.

Default empty http.config results from basho_bench.
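
Since ‘make results’ is hit or miss, the two commands can be chained into one fallback. A plain-text pass over the run's summary.csv also gives a quick read without opening the graph, and it makes a run full of failed requests, like the econnrefused errors in the log above, obvious at a glance. This sketch assumes it runs from the basho_bench directory, that the graph lands in tests/current/summary.png, and that summary.csv has a header row followed by the columns elapsed, window, total, successful and failed, one row per reporting window. Check the header of your own file before trusting the totals. The function names are hypothetical.

[sourcecode language="bash"]
#!/usr/bin/env bash
# Build the graphs with a fallback, then summarize the run as text.
# Assumes it runs from the basho_bench directory, that tests/current points
# at the latest run and receives summary.png, and that summary.csv has a
# header row followed by: elapsed, window, total, successful, failed
# (one row per reporting window). Check your own summary.csv header first.

# build_results: try `make results`; if it fails or leaves no graph,
# run the R script directly
build_results() {
  make results && [ -f tests/current/summary.png ] || priv/summary.r -i tests/current
}

# summarize_run [CSV]: total the per-window counts from summary.csv
summarize_run() {
  local csv="${1:-tests/current/summary.csv}"
  [ -r "$csv" ] || { echo "no such file: $csv" >&2; return 1; }
  awk -F',' 'NR > 1 { total += $3; ok += $4; failed += $5; last = $1 + 0 }
    END { printf "elapsed=%gs total=%d successful=%d failed=%d\n", last, total, ok, failed }' "$csv"
}
[/sourcecode]

Run `build_results && summarize_run` after a test; a failed count close to the total points at a connection or configuration problem rather than at Riak's performance.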

From here you can run basho_bench and get results that are specific to basho_bench. However, this leads me to a more abstract topic: why benchmark in the first place?

Why Benchmark? How to Benchmark!

The definition of benchmark:

bench·mark

[bench-mahrk]
noun
1. a standard of excellence, achievement, etc., against which similar things must be measured or judged: The new hotel is a benchmark in opulence and comfort.
2. any standard or reference by which others can be measured or judged: The current price for crude oil may become the benchmark.
3. Computers. an established point of reference against which computers or programs can be measured in tests comparing their performance, reliability, etc.
4. Surveying. Usually, bench mark. a marked point of known or assumed elevation from which other elevations may be established. Abbreviation: BM
adjective
5. of, pertaining to, or resulting in a benchmark: benchmark test, benchmark study.

While basho_bench provides an interesting baseline test with various pieces of data to work with, by default it shows nothing that is specific to YOUR use case. A default basho_bench run does not represent your production environment, your dev or user acceptance testing environment or your test criteria; it is an example. To get numbers that truly reflect the needs of your project, you will need to provide a custom configuration for basho_bench or write your own specific benchmark.

The reason is that with Riak, as with other NoSQL solutions, you’re working toward a goal that is very data-specific and that a generic test can’t know about. Each use case has its own domain logic and criteria, and a custom benchmark can provide real data related to that domain logic and those criteria.

In the end, even though basho_bench is a great tool for getting started and running basic tests, and a great project to get ideas from, it is not a panacea benchmark. You’ll need to create the specific benchmark for your use case yourself.
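
As a starting point for a use-case-specific run, here is a sketch that writes a custom config instead of reusing the bundled examples. The option and driver names follow the Riak HTTP example configs in basho_bench's examples directory. That is an assumption worth checking against your own checkout, since drivers and options change between versions. The workload itself (a read-heavy 4:1 get/update mix, 5 KB values, 16 workers, 10 minutes) and the function name are hypothetical and should be replaced with numbers from your own application.

[sourcecode language="bash"]
#!/usr/bin/env bash
# Write a use-case-specific basho_bench config.
# Option and driver names are assumed to match the Riak HTTP examples in
# your basho_bench checkout; compare with examples/ before using.
# The workload numbers below are hypothetical.

# write_bench_config OUT [HOST] [PORT]
write_bench_config() {
  local out="$1" host="${2:-127.0.0.1}" port="${3:-8098}"
  cat > "$out" <<EOF
% Hypothetical read-heavy workload; duration is in minutes.
{mode, max}.
{duration, 10}.
{concurrent, 16}.
{driver, basho_bench_driver_http_raw}.
{http_raw_ips, ["${host}"]}.
{http_raw_port, ${port}}.
{key_generator, {int_to_str, {uniform_int, 100000}}}.
{value_generator, {fixed_bin, 5000}}.
{operations, [{get, 4}, {update, 1}]}.
EOF
}
[/sourcecode]

Usage: `write_bench_config usecase.config 10.0.0.5 8098 && ./basho_bench usecase.config`. The operation weights, value size and key space are the settings that most directly encode a use case, so those are the values to derive from real traffic.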

Happy hacking (and benchmarking).