Junction Two Weeks on Tuesday Bi-weekly Review : Issue #002

It’s time for another Tuesday Bi-weekly Review! We’ve been making progress and have tackled a few elements of the project so far. The first big task was to get more information out there for the community and the team working on the project. I’ve spent some time with the contributors on GitHub and through other channels making more information available about the project’s intent and how people can contribute. So if you’re interested in helping with an entire domain space or merely a small element of the application, ping me and I’ll work with you to make contributing as easy as possible. With that, let’s jump into what’s what and what’s new. Cheers!

We Have a Build Server, More on This Soon, but for now…

I’ll have a post on how to set up TeamCity, along with a quick tour of what is set up for the Junction Project. So stay tuned: this coming week I’ll post that, other TeamCity tutorials related to the project, and any other news as it happens. For a quick sneak peek, feel free to take a look at the build server located at: http://teamcity.cascadiahacks.org/. Just log in with “guest” and no password.

More Items Listed and Working on First Feature Commits and Comments For…

We also started a conversation among a few of us: “What would teams that use Riak like to see in a Riak admin application?” Jump in and add your two cents, whether or not you’re diving into the project.

Until later, happy coding!

It’s Happening Again, Seattle Code Camp!

I’ve got two presentations happening this year at Seattle Code Camp! Are you signed up? If not, hit this and get signed up ASAP:  https://seattlecodecamp2013.eventbrite.com/

My two presentations are:

Distributed Databases – An Introduction to Riak

Presenter: Adron Hall

I’ll dive in with a quick definition and context of what distributed databases are. From there we’ll quickly move into what Riak is and how its architecture lends itself to being one of the premier distributed database solutions on the market today. We’ll walk through everything from vector clocks to consistent hashing, and the clusters and rings that manage the world of distributed systems. Then we’ll dive into a use case, putting and getting data in a walkthrough implementation of Riak.

…and…

Developer Workflow: From Angular.js, Riak, Testing and Vagrant Dev Environments

Presenter: Adron Hall

Each developer has to come up with a workflow that works well for them. Sometimes a lot of the workflow is dictated, but a lot is still left up to the individual. With many modern tools you have a selection of everything from the text editor to the IDE to the operating system distribution itself. In this presentation I’m going to walk through some of the tooling that helps keep all of these things under control during the course of programming efforts. …and yes, this will go beyond just the IDE (or text editor, etc.).

…and others to check out!

Much Ado About Hadoop

By now you’ve heard the words “Big Data” and “Hadoop”, but you’re not sure what they mean, much less how to get started. You’re struggling with storing a lot of data, rapidly processing a huge volume of data, or maybe you’re just curious. There is a bewildering array of options and use cases within the Hadoop ecosystem. Every day I help customers understand their data problems, understand where Hadoop fits into their environment, and determine how they can use Hadoop to solve their problem. This session provides an introduction to what Hadoop is, when it’s appropriate to use Hadoop, and guidance on how to get started.

Unit Testing Web Development

Presenter: Mark Michaelis

When it comes to testing, Web Development is fraught with challenges, whether from variations in browser behavior, the lack of compilation in JavaScript, or the traditional coupling between the UI and the code. In this session we walk through the complexities surrounding the testing of web projects and cover how to overcome them. This includes leveraging everything from source code analysis and JavaScript unit testing to UI and performance testing. Don’t miss this session to learn a multitude of ways to significantly improve the quality of your web development.

Riak in a .NET World

Developers have a lot of choices when it comes to storing data. In this session, we’ll introduce .NET developers to Riak, a distributed key-value database. Through a combination of concepts and practical examples, attendees will learn when Riak might be appropriate, how to get started with Riak using CorrugatedIron (a full-featured .NET client for Riak), and how to solve data modeling problems they’re likely to encounter. This talk is for developers who are interested in backing their applications with a fault-tolerant, distributed database.

Introduction to Ember.js

Presenter: Jon Cortez

Ember.js is an open-source client-side JavaScript web application framework based on the Model-View-Controller (MVC) software architectural pattern. It is designed to help developers build scalable Single Page Applications (SPAs) by incorporating common idioms and best practices into a framework that provides a rich object model, declarative two-way data binding, computed properties, automatically-updating templates, and a router for managing application state. In this session, you will learn the key concepts of Ember.js and how to use it to create a simple Single Page Application.

Think Like a Dev: Cognitive Pitfalls in Software Development

Presenter: Michael Ibarra

Our own minds are often working against us. What makes estimating so hard? Is there real value in planning poker? How effective are weekly retrospectives, really? Let’s explore how our minds may be working against us in ways we might not realize. We’ll examine the sources of some common cognitive biases, how they apply to our work efforts, and discuss some “strategery” for overcoming them.

Building a Server Appliance in Node.js

Presenter: Eugenio Pace

Auth0 is a server/service that drastically simplifies authentication, identity federation & SSO scenarios for web & mobile apps. It’s our first big project on Node. One of the reasons we decided to build it entirely on Node is the ability to package it and deploy it anywhere: as a service in the public cloud, as a virtual appliance on a private cloud, or as an appliance on-premises. In this session we’ll show how we built it, how we use JS for extensibility and easy customization, what worked well, what didn’t, the tools we used, and more.

Hope to see you there. Cheers!

Junction Two Weeks on Tuesday Bi-weekly Review : Issue #001

Every two weeks I intend to post an update on the Junction Project: who has joined, what was worked on, where we are, and any other news related to the project. This is the first “Junction Two Weeks on Tuesday Review”, so enjoy!  🙂

  • Two weeks ago today I wrote the entry “Introducing Junction” to kick off the project. Everything is hosted on GitHub: the site via GitHub Pages at http://adron.github.io/junction/ and the git repository at https://github.com/Adron/junction. The video in which I describe each section of the application at a high level is located here: http://vimeo.com/adronhall/junction.
  • Clive Boulton @cliveb, Jared Wray @jaredwray, Kristen Mozian @kmozian and OJ Reeves @OJ joined the project to help out.
  • Issues, as stories and tasks, were added to get the project started. Here’s a first draft of the things we’re all working on. If you’d like to jump in, feel free to ping me and I’ll add you to the project; you can also submit a PR (pull request) or talk to me about organizing a hackathon to help move the project forward.

GitHub Issues – Working Items

The easiest way to view these is to log into the Huboard Kanban Board and see what is in progress and who’s working on what. Currently I’ve outlined the big items that we’re working on and would love a fellow coder to jump in on. If you’re interested, ping me @adron or just jump into the issues list on GitHub (or view by milestone, i.e. functional area) and comment on the issue you want to dive into, and I’ll add you so you can get started!

For the “Call the Doctor (Administration and Maintenance)” part of the application there are a number of questions to answer. How should we connect to Riak to ensure a secure SSH connection? Should we even use SSH? Is there another secure way to connect to and administer the Riak cluster?

In the “Golfing With Your Data (Query, Put, Deletes, Etc. Handling the CRUD)” part, one could dive into creating a functional query space to pull data out of a Riak cluster. A lot of UI work needs to be done in this space, so if you’re up for putting together some awesome Windows 8 interfaces, I’d love to hear from you!

Review Summary

At this point we’re moving forward, and we always welcome new participants, so reach out if you’re up for helping out! Until the next two weeks are up, see ya at the Junction!

Philadelphia Riak Training & Presentations

R Graphs

This week I traveled to Philadelphia to meet with members of the Basho team, work together, and receive training from the trainers on the best ways to approach content on Riak, and more generally to brainstorm how we can all approach specific topics. Some of those topics include:

  • Access Patterns around Log Storage & Analysis
  • Bloom Filters
  • CRDTs
  • Consensus Protocols
  • Erasure Coding
  • LevelDB and Bitcask Backends
  • MDC Repl

Out of the options we discussed in training today, I ran with benchmarking. It is near and dear to many of the customers, clients and curious folks I talk to. I dug into exactly what basho_bench (docs info, github repo) offers in detail and functionality, and also looked at other benchmarks that others have run in the past.

basho_bench

What exactly is basho_bench? The basho_bench project is a code repo on GitHub that offers a set of benchmarking tests to run against a Riak cluster. There are a few prerequisites for the quick steps below:

  1. Make sure you have a cluster or a devrel (Basho docs on devrel) set up that you can point basho_bench at.
  2. Erlang R15B03 should be installed (OS X Compiler battles and fixes).
  3. Make sure R is installed.

Now, to get basho_bench set up:

[sourcecode language="bash"]
git clone https://github.com/basho/basho_bench.git
cd basho_bench
make all
[/sourcecode]

Once the build finishes, review the contents of the basho_bench directory. The following should be available:

[sourcecode language="bash"]
$ ls
FAQ deps rebar
LICENSE ebin rebar.config
Makefile examples src
README.org include tests
basho_bench priv
[/sourcecode]

The examples directory has several default config files to run with basho_bench. If you have a devrel set up on the default 127.0.0.1 address, just run the following command to begin generating stats. If the cluster being tested is not a devrel on 127.0.0.1, read the configuration section of the docs for how to point basho_bench at a different cluster.

[sourcecode language="bash"]
./basho_bench examples/http.config
[/sourcecode]

You’ll see something like this printed to the screen.

[sourcecode language="bash"]
$ ./basho_bench examples/http.config
10:34:41.736 [debug] Lager installed handler lager_console_backend into lager_event
10:34:41.752 [debug] Lager installed handler {lager_file_backend,"/Users/adronhall/Codez/basho_bench/tests/20130808_103441/error.log"} into lager_event
10:34:41.752 [debug] Lager installed handler {lager_file_backend,"/Users/adronhall/Codez/basho_bench/tests/20130808_103441/console.log"} into lager_event
10:34:41.757 [debug] Lager installed handler error_logger_lager_h into error_logger
10:34:41.758 [info] Application lager started on node nonode@nohost
10:34:41.767 [info] Est. data size: 488.28 KB
10:34:41.807 [debug] Supervisor sasl_safe_sup started alarm_handler:start_link() at pid
10:34:41.808 [debug] Supervisor sasl_safe_sup started overload:start_link() at pid
10:34:41.809 [debug] Supervisor sasl_sup started supervisor:start_link({local,sasl_safe_sup}, sasl, safe) at pid
10:34:41.810 [debug] Supervisor sasl_sup started release_handler:start_link() at pid
10:34:41.811 [info] Application sasl started on node nonode@nohost

….AND A WHOLE LOT MORE HERE….

10:35:42.954 [info] {{{put_re,{"localhost",4567,"/","{\"this\":\"is_json_%%V\"}"},[{'Content-Type','application/json'}]},{put_re,{"localhost",4567,"/","{\"this\":\"is_json_%%V\"}"},[{'Content-Type','application/json'}]}},{put,{conn_failed,{error,econnrefused}}}}: 1701
10:35:42.955 [info] Application basho_bench exited with reason: stopped
10:35:42.955 [info] Test completed after 1 mins.
$
[/sourcecode]

After letting the test run for its designated minute, run the following command to generate some pretty graphs with R.

[sourcecode language="bash"]
make results
[/sourcecode]

…or maybe run…

[sourcecode language="bash"]
priv/summary.r -i tests/current
[/sourcecode]

I post both because ‘make results’ doesn’t always seem to build the results, whereas running the script manually does. With the results built, check the tests directory inside the basho_bench directory for the summary.png file. If you open the file, it should look something like this.

Default empty http.config results from basho_bench.

From here you can run basho_bench and get results from its built-in example workloads. However, this leads to a more abstract question: why benchmark in the first place?

Why Benchmark? How to Benchmark!

The dictionary definition of benchmark:

bench·mark

[bench-mahrk] 
noun
1. a standard of excellence, achievement, etc., against which similar things must be measured or judged: The new hotel is a benchmark in opulence and comfort.
2. any standard or reference by which others can be measured or judged: The current price for crude oil may become the benchmark.
3. Computers. an established point of reference against which computers or programs can be measured in tests comparing their performance, reliability, etc.
4. Surveying. Usually, bench mark. a marked point of known or assumed elevation from which other elevations may be established. Abbreviation: BM
adjective
5. of, pertaining to, or resulting in a benchmark: benchmark test, benchmark study.

While basho_bench provides an interesting baseline test with various pieces of data to work with, by default it shows nothing specific to YOUR use case. Its default configuration is not your production environment, your dev or user acceptance testing environment, or your test criteria; it is an example. To get numbers that truly reflect your project’s needs, you will need to provide a custom configuration for basho_bench or write your own benchmark.

The reason is that with Riak, as with other NoSQL solutions, you’re working toward a goal that is very data-specific and not known in advance. Your use case has its own domain logic and criteria, and a custom benchmark can provide real data related to them.

In the end, even though basho_bench is a great tool for getting started and running basic tests, and a great project to get ideas from, it is not a panacea. You’ll need to create the specific benchmark for your use case yourself.
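To make the idea of a use-case-specific benchmark concrete, here is a minimal Python sketch of the core of any custom benchmark: drive an operation with your own keys and payload size, time each call, and report latency percentiles rather than a single average. It is not basho_bench and does not talk to Riak. `fake_put` is a hypothetical stand-in (an in-memory dict write) that you would replace with a call through your own Riak client, and the key set, value size and iteration count are assumptions to tune for your data.

```python
import math
import random
import statistics
import time
from typing import Callable, Dict, List


def percentile(sorted_samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted list (pct between 0 and 100)."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def run_benchmark(operation: Callable[[str, bytes], None], keys: List[str],
                  value_size: int, iterations: int, seed: int = 42) -> Dict[str, float]:
    """Call `operation` `iterations` times with randomly chosen keys, timing each call."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    value = b"x" * value_size          # payload size is part of YOUR workload
    latencies_ms: List[float] = []
    for _ in range(iterations):
        key = rng.choice(keys)
        start = time.perf_counter()
        operation(key, value)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    return {
        "ops": float(iterations),
        "mean_ms": statistics.mean(latencies_ms),
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "max_ms": latencies_ms[-1],
    }


store: Dict[str, bytes] = {}


def fake_put(key: str, value: bytes) -> None:
    """Hypothetical stand-in for a real Riak write; replace with your client call."""
    store[key] = value


if __name__ == "__main__":
    user_keys = [f"user:{i}" for i in range(1000)]
    results = run_benchmark(fake_put, user_keys, value_size=512, iterations=10_000)
    for name, val in results.items():
        print(f"{name:>8}: {val:.4f}")
```

Percentiles show the tail behavior that an average hides, which is why the sketch reports p95, p99 and max alongside the mean.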

Happy hacking (and benchmarking).

Consistent Hashing – Learning About Distributed Databases :: Issue 002

One of the core tools in the distributed database’s belt is consistent hashing. This is especially true in Riak, where it sits at the core of a Riak cluster. Hashing uses a hash function, an algorithm that maps data of variable length to data of a fixed length; in other words, arbitrary things such as the names of things get mapped to integers. Consistent hashing is a special kind of hashing that provides the pattern for mapping keys, and all the related functionality, around a cluster ring in Riak.
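As a small illustration of “variable length in, fixed length out”, the Python sketch below hashes strings of any length with SHA-1 and always gets back a 160-bit integer. SHA-1 is chosen here because Riak’s ring is built on the 160-bit SHA-1 keyspace; the helper name `hash_to_int` is just for illustration.

```python
import hashlib


def hash_to_int(data: str) -> int:
    """Map a string of any length to a fixed-size 160-bit integer with SHA-1."""
    digest = hashlib.sha1(data.encode("utf-8")).digest()  # always 20 bytes
    return int.from_bytes(digest, "big")


for name in ["a", "riak", "a much longer name for a thing stored in the database"]:
    # Every output is exactly 40 hex digits (160 bits), whatever the input length.
    print(f"{name!r} -> {hash_to_int(name):040x}")
```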

Consistent hashing was originally devised by David Karger, a professor of computer science at MIT (Massachusetts Institute of Technology). He’s also known for Karger’s Algorithm, a Monte Carlo method that computes the minimum cut in a connected graph (graph theory related stuff). Along with these developments he’s been part of many other efforts and contributed to computer science in many ways.

Remapping, Mapping and Keeping Distributed (& Available)

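One key property of consistent hashing is that it minimizes the number of keys that must be remapped when buckets are added or removed. With a regular hash table that places keys with something like hash(key) mod n, changing the number of buckets remaps most of the keys.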
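A quick way to see the problem with a regular hash is to count how many keys change buckets when one bucket is added under naive `hash(key) mod n` placement. This Python sketch is an illustration only: MD5 is used purely as a stable hash, and the key names are hypothetical. Going from 4 to 5 buckets, a key stays put only when its hash leaves the same remainder mod 4 and mod 5, so with a well-distributed hash roughly 80% of keys move, whereas an ideal consistent hash would move only about the 1/5 of keys that the new bucket takes over.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Stable integer hash (Python's built-in hash() of str is randomized per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def mod_n_bucket(key: str, n: int) -> int:
    """Naive placement: bucket index = hash(key) mod n."""
    return stable_hash(key) % n


keys = [f"key-{i}" for i in range(10_000)]
before = {k: mod_n_bucket(k, 4) for k in keys}
after = {k: mod_n_bucket(k, 5) for k in keys}  # one bucket added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.1%} of keys changed buckets going from 4 to 5")
```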

Consistent hashing is based on mapping each object to a point on a circle. The system also maps each storage bucket to pseudo-randomly distributed points on the edge of this circle.

To place an object, the system hashes its key to a point on the edge of the circle, then walks around the circle until it reaches the first bucket point. As a result, each bucket holds the objects whose points lie between its own point and the preceding bucket point.

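When a bucket disappears for any reason, only the objects that mapped to it are remapped; they fall through to the next bucket along the circle, while every other object stays put. When a bucket appears, such as becoming available again or being added, the reverse happens: it takes over only the objects between its point and the preceding bucket point.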
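The circle walk described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not Riak’s implementation: buckets and keys are placed with SHA-1, each bucket gets 50 pseudo-random points (a hypothetical `points_per_bucket` setting that evens out the load), and “walking the circle” means moving to the next point with a larger position, wrapping back to the start. Removing a bucket then shows that only the keys it held move.

```python
import bisect
import hashlib
from typing import Dict, List


def stable_hash(value: str) -> int:
    """Map a string to a point on a 2**160 circle (SHA-1 output space)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring: each bucket gets several pseudo-random points."""

    def __init__(self, buckets: List[str], points_per_bucket: int = 50) -> None:
        self.points_per_bucket = points_per_bucket
        self._points: List[int] = []        # sorted positions on the circle
        self._owner: Dict[int, str] = {}    # position -> bucket name
        for bucket in buckets:
            self.add(bucket)

    def add(self, bucket: str) -> None:
        for i in range(self.points_per_bucket):
            point = stable_hash(f"{bucket}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = bucket

    def remove(self, bucket: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != bucket]
        self._owner = {p: b for p, b in self._owner.items() if b != bucket}

    def lookup(self, key: str) -> str:
        """Hash the key onto the circle, then walk to the first bucket point after it."""
        idx = bisect.bisect_right(self._points, stable_hash(key))
        if idx == len(self._points):        # walked past the top: wrap around
            idx = 0
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["A", "B", "C", "D"])
before = {k: ring.lookup(k) for k in keys}

ring.remove("C")                            # bucket C disappears
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on C move; every other key keeps its bucket.
print(f"{len(moved)} of {len(keys)} keys moved")
```

Because the points are derived deterministically from the bucket name, calling `ring.add("C")` again restores exactly the original mapping, which is the “similar process” for a bucket that reappears.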

The Basho docs briefly describe it this way:

Consistent hashing is a technique used to limit the reshuffling of keys when a hash-table data structure is rebalanced (when slots are added or removed). Riak uses consistent hashing to organize its data storage and replication. Specifically, the vnodes in the Riak Ring responsible for storing each object are determined using the consistent hashing technique.
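To connect that description to vnodes, here is a simplified Python sketch of how a fixed-partition ring can pick the vnodes for an object. The assumptions matter: the 160-bit SHA-1 ring split into 64 equal partitions with three replicas mirrors Riak’s defaults (`ring_creation_size` 64, `n_val` 3), but the sketch hashes the string "bucket/key" rather than the Erlang term Riak actually hashes, does not reproduce riak_core’s exact partition-boundary arithmetic, and assigns partitions to nodes with a naive round-robin instead of Riak’s claim algorithm. The node names are hypothetical.

```python
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 160                  # SHA-1 output space
NUM_PARTITIONS = 64                   # Riak's default ring_creation_size
N_VAL = 3                             # replicas per object (Riak's default n_val)
PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS


def key_position(bucket: str, key: str) -> int:
    """Simplified: hash 'bucket/key' with SHA-1 (real Riak hashes an Erlang term)."""
    return int.from_bytes(hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest(), "big")


def preference_list(bucket: str, key: str, nodes: List[str]) -> List[Tuple[int, str]]:
    """Return (partition index, owning node) for the N_VAL partitions storing this object."""
    first = key_position(bucket, key) // PARTITION_WIDTH
    result: List[Tuple[int, str]] = []
    for offset in range(N_VAL):
        partition = (first + offset) % NUM_PARTITIONS   # walk the ring, wrapping around
        owner = nodes[partition % len(nodes)]           # simplified round-robin claim
        result.append((partition, owner))
    return result


nodes = ["riak1", "riak2", "riak3", "riak4", "riak5"]
for partition, node in preference_list("users", "adron", nodes):
    print(f"partition {partition:2d} -> vnode on {node}")
```

Because the number of partitions is fixed, adding or removing a physical node only changes which node owns some partitions; no key has to be rehashed, which is the limited reshuffling the docs describe.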

NOTES: This is by no means a single-blog-entry topic; this is merely a cursory look at consistent hashing. In this entry I aimed to provide a basic description and coverage of the actions around consistent hashing. For more information, and to dive even deeper into consistent hashing, I’ve included a few links with extensive information on the topic: