Tag Archives: basho

Riak in a .NET World

Jeremiah's Demo Works, IT WORKS IT WORKS!

Jeremiah’s Demo Works, IT WORKS IT WORKS!

A few days ago Troy Howard, Jeremiah Peschka and I all traveled via Amtrak Cascades up to Seattle. The mission was simple, Jeremiah was presenting “Riak in a .NET World”, I was handling logistics and Troy was handling video.

So I took the video that Troy shot, I edited it, put together some soundtrack to it and let Jeremiah’s big data magic shine. He covers the basics around RDBMSes, SQL Server in this case but easily it applies to any RDBMS in large part. These basics bring us up to where and why an architecture needs to shift from an RDBMS solution to a distributed solution like Riak. After stepping through some of the key reasons to move to Riak, Jeremiah walks through a live demo of using CorrugatedIron, the .NET Client for Riak (Github repo). During the walk through he covers the specific characteristics of how CorrugatedIron interacts with Riak through indexs, buckets and during puts and pulls of data.

Toward the end of the video Joseph Blomstedt @jtuple, Troy Howard @thoward37, Jeremiah Peschka @peschkaj, Clive Boulton @iC and Richard Turner @bitcrazed. Also note, I’ve enabled download for this specific video since it is actually a large video (1.08GB total). So you may want to download and watch it if you don’t have a super reliable high speed internet connection.

Also for more on Jeremiah’s work check out http://www.brentozar.com/articles/riak/  and contact him at http://www.brentozar.com/contact/

Farewell Basho, It’s Been Swell Yo!

Whew, it’s been a total blast working at Basho. I’ve accomplished a ton of things. Riak is a solid distributed database system and I’m glad to have worked with the team on advocating its use, teaching distributed systems ideas and concepts and generally spreading the knowledge. I’ve seen some truly great things that people are hacking together, setting up for projects and redesigning old systems to utilize newer, better, faster and more capable distributed systems concepts and ideas. Some of the things I’m happy to have contributed to in my time at Basho.

…and there has been a whole lot more. Suffice it to say, Basho has provided me with some sweet opportunities to work on some extremely interesting data projects from a very data sciency point of view (yeah I know sciency aint a word). There may be more Riak work and Riak meetups and Riak hacks and Riak who knows what coming from me, but the meetups & such are now at the hands of the core Riak crew and…

Where Am I Headed?

Right now, I’m moving 20 blocks away from where I currently live, setting up a couch to hack on and grabbing a beer. I’ve got a few personal projects I’ve been wanting to work on. Then I’m taking a few weeks to do some side projects that have been on the burner. Keep an eye out, I’ll be kicking off one, maybe two of these open source projects in the next few days. As @tsantero twitted…

…I’m going to attack my own notebook of ideas. Maybe I’ll even work on that Riak CS Video object store that Tom and I spoke about 10 months ago? Either way, whatever the projects are, I’ll have them posted right here. Until then…

Cheers & Happy Hacking!

Philadelphia Riak Training & Presentations

R Graphs

R Graphs

This week I’ve traveled to Philadelphia to meet with a number of the Basho team to work together and receive training with the trainers or the best ways to approach content on Riak and more generally the best ways we can all brainstorm up to approach specific topics. Some of those topics include things like:

  • Access Patterns around Log Storage & Analysis
  • Bloom Filters
  • CRDTs
  • Consensus Protocols
  • Erasure Coding
  • LevelDB and Bitcask Backends
  • MDC Repl

Out of the options we discussed in training today I ran with benchmarking. It is always near and dear to many of the customers’, clients’ and curious’ that I talk to. I dove in to see what exactly we offer with basho_bench (docs info, github repo) in detail and functionality, but also dove into other benchmarks are out there that others may have run in the past.


What exactly is basho_bench? The basho_bench project is a code repo on Github that offers a set of benchmarking tests that are run against a Riak cluster. There are a few prerequisites to the quick steps below, the prerequisites are:

  1. Make sure you have a cluster or a devrel (Basho docs on devrel) setup that you can point basho_bench at.
  2. Erlang R15B03 should be installed (OS-X Compiler battles and fixes).
  3. Make sure R is installed.

Now to get basho_bench setup.

git clone git://github.com/basho/basho_bench.git
cd basho_bench
make all

Once that is done building, review the directory structure that is in the basho_bench directory. The following should be available in the directory.

$ ls
FAQ		deps		rebar
LICENSE		ebin		rebar.config
Makefile	examples	src
README.org	include		tests
basho_bench	priv

The examples directory has several default config files available to run with basho_bench for testing. If there is a devrel setup with the default IP usage, just run the following command to begin generating stats. If the cluster being tested is not a devrel with then give the configuration section of the docs a read for information on how to point basho_bench at an alternative cluster.

./basho_bench examples/http.config

You’ll see something akin to this spit out onto the screen.

$ ./basho_bench examples/http.config
10:34:41.736 [debug] Lager installed handler lager_console_backend into lager_event
10:34:41.752 [debug] Lager installed handler {lager_file_backend,"/Users/adronhall/Codez/basho_bench/tests/20130808_103441/error.log"} into lager_event
10:34:41.752 [debug] Lager installed handler {lager_file_backend,"/Users/adronhall/Codez/basho_bench/tests/20130808_103441/console.log"} into lager_event
10:34:41.757 [debug] Lager installed handler error_logger_lager_h into error_logger
10:34:41.758 [info] Application lager started on node nonode@nohost
10:34:41.767 [info] Est. data size: 488.28 KB
10:34:41.807 [debug] Supervisor sasl_safe_sup started alarm_handler:start_link() at pid
10:34:41.808 [debug] Supervisor sasl_safe_sup started overload:start_link() at pid
10:34:41.809 [debug] Supervisor sasl_sup started supervisor:start_link({local,sasl_safe_sup}, sasl, safe) at pid
10:34:41.810 [debug] Supervisor sasl_sup started release_handler:start_link() at pid
10:34:41.811 [info] Application sasl started on node nonode@nohost


10:35:42.954 [info]   {{{put_re,{"localhost",4567,"/","{\"this\":\"is_json_%%V\"}"},[{'Content-Type','application/json'}]},{put_re,{"localhost",4567,"/","{\"this\":\"is_json_%%V\"}"},[{'Content-Type','application/json'}]}},{put,{conn_failed,{error,econnrefused}}}}: 1701
10:35:42.955 [info] Application basho_bench exited with reason: stopped
10:35:42.955 [info] Test completed after 1 mins.

After letting the test run for it’s designated minute, run the following command to get some pretty graphs with R.

make results

…or maybe run…

priv/summary.r -i tests/current

The reason I post both is that ‘make results‘ doesn’t seem to always work to build the results and the manual execution will actually get the results built. With the results built, check the tests directory in the basho_bench directory for the summary.png file. If you open the file it should look something like this.

Default empty http.config results from basho_bench. (Click for full size image)

Default empty http.config results from basho_bench. (Click for full size image)

From here you can now run basho_bench and get the results that are specific to basho_bench. However, this now leads me to a higher abstract topic of why do benchmarking in the first place.

Why Benchmark? How to Benchmark!

The definition for benchmark,


1.a standard of excellence, achievement, etc., against which similar things must be measured or judged: The new hotel is a benchmark in opulence and comfort.
2.any standard or reference by which others can be measured or judged: The current price for crude oil may become the benchmark.
3.Computers. an established point of reference against which computers or programs can be measured in tests comparing their performance, reliability, etc.
4.Surveying. Usually, bench mark. a marked point of known or assumed elevation from which other elevations may be established. Abbreviation:  BM
5.of, pertaining to, or resulting in a benchmark: benchmark test,benchmark study.

While basho_bench provides an interesting baseline test that shows various pieces of data to work with, it shows nothing by default that is specific to YOUR use case. The basho_bench is not ideal for your production environment, it is not your dev or user acceptance testing or test criteria, it is an example. To truly get effective numbers that really encompass your needs for your project you will need to provide custom configuration for basho_bench or write your own specific benchmark.

The reason behind this is, with Riak as with other NoSQL solutions, is that you’re working toward a goal that is very data specific and unknown. It has specific domain logic and criteria that is specific to the use case, a custom benchmark can provide real data related to that domain logic and criteria.

In the end, even though basho_bench is a great tool to get started, do basic tests, and a great project to get ideas from it is not the panacea benchmark. You’ll need to create the specific benchmark for your use case yourself.

Happy hacking (and benchmarking).

Riak 1.4 – A Few Notes, Notices & Thoughts…

The release notes for Riak 1.4 can be found via github

Two things have worked together that made me want to write up the new Riak 1.4 features. With Riak 1.4 hitting the streets and the work I’ve been doing with CorrugatedIron there are a few features that are going to add icing the cake. If you want to dive more into the release, check out the release notes. If you’re interested in the .NET Client CorrugatedIron, check it out here or check out the code on github. Now on to the client APIs.

riak-attach changed to not nuke a node

So when issuing the attach command like…

riak attach

…the command attaches to the named pipe to communicate with the running erlang nodes. Now when you hit Ctrl-C it kills just the pipe versus killing the pipe and riak node that you’re on. This is something that has bit me in the keister more than a few times. Bringing down a node or two while working on viewing what is going on with a node. This leads me to the next enhancement.

riak-admin transfers

If you’re using riak_kv_bitcask_backend, riak_kv_eleveldb_backend or riak_kv_memory_backend the riak-admin transfers command now shows per-transfer progress and displays long node names better. Giving you a better idea of what and where things are going. The way this is reported depends slightly on the specific back end. For bitcask or in memory back end the progress is calculated by the keys already transferred out of the total keys, where as the level DB back end calculates based on bytes transferred. Based on this the level DB calculation can get slightly off over time.

Protocol Buffers & Multiple Interface Binding

Protocol Buffers can now bind to multiple ports and interfaces, so clients such as CorrugatedIron for .NET (http://corrugatediron.org/), Riakjs (http://riakjs.com/) can now bind to the Protocol Buffers outside of the set configuration. For more on Riak configuration around the binding, check out the Basho Docs (http://docs.basho.com/riak/latest/references/Configuration-Files/). This also brings feature parity around interface binding equal to that of the HTTP interfaces. This changes the pb_port and pb_ip to a single pb setting which is now a list of IP and port pairs.

Total radness in paging, of 2i

Secondary indexes now have results available via pagination. Check out this PR for bunches more info.

Client-specified Timeouts

Milliseconds can now be assigned to a timeout value for clients. This can be used for object manipulation around fetch, store and delete, listing buckets or keys. This takes care of some time out issues that may have been occurring during certain types of requests. This will come in handy for asynchronous and pivotal if anyone goes the synchronous route.

Bucket Properties for Protocol Buffers

If you’re needing to reset a bucket to it’s defaults, this is now possible. Besides a reset to defaults all bucket properties are now usable for protocol buffer usage. This can definitely help client usage of protocol buffers in a dramatic way.

List-buckets Streaming – Realtime

Listing keys or buckets via a streaming request will send bucket names to the client as received. This prevents any need to wait for a request from all nodes to respond. This can help with response time and time outs from the client point of view. This gives the ability to use the streaming features with Node.js, C#, Java and other languages and frameworks that support realtime streaming data feeds.

…these are the features that have jumped out at me, so until next release.

Conference Recap – The awe inspiring quality & number of conferences in Cascadia!

Rails 2013 Conf (April 29th-May 1st)

The Rails 2013 Conference kicked off for me, with a short bike ride through town to the conference center. The Portland conference center is one of the most connected conference centers I’ve seen; light rail, streetcar, bus, bicycle boulevards, trails & of course pedestrian access is all available. I personally have no idea if you can drive to it, but I hear there is parking & such for drivers.



Rails Conf however clearly places itself in the category of a conference of people that give a shit! This is evident in so many things among the community, from the inclusive nature creating one of the most diverse groups of developers to the fact they handed out 7 day transit passes upon picking up your Rails Conf Pass!



The keynote was by DHH (obviously right?). He laid out where the Rails stack is, some roadmap topics & drew out how much the community had grown. Overall, Rails is now in the state of maintain and grow the ideal. Considering its inclusive nature I hope to see it continue to grow and to increase options out there for people getting into software development.

Railsconf 2013

Railsconf 2013

I also met a number of people while at the conference. One person I ran into again was Travis, who lives out yonder in Jacksonville, Florida and works with Hashrocket. Travis & I, besides the pure metal, have Jacksonville as common stomping ground. Last year I’d met him while the Hash Rocket Crew were in town. We discussed Portland, where to go and how to get there, plus what Hashrocket has been up to in regards to use around Mongo, other databases and how Ruby on Rails was treating them. The conclusion, all good on the dev front!

One of these days though, the Hashrocket crew is just gonna have to move to Portland. Sorry Jacksonville, we’ll visit one day. 😉

For the later half of the conferene I actually dove out and headed down for some client discussions in the country of Southern California. Nathan Aschbacher headed up Basho attendance at the conference from this point on. Which reminds me, I’ve gotta get a sitrep with Nathan…

RICON East (May 13th & 14th)



Ok, so I didn’t actually attend RICON East (sad face), I had far too many things to handle over here in Portlandia – but I watched over 1/3rd of the talks via the 1080p live stream. The basic idea of the RICON Conferences, is a conference series focused on distributed systems. Riak is of course a distributed database, falling into that category, but RICON is by no means merely about Riak at all. At RICON the talks range from competing products to acedemic heavy hitting talks about how, where and why distributed systems are the future of computing. They may touch on things you may be familiar with such as;

  • PaaS (Platform as a Service)
  • Existing databases and how they may fit into the fabric of distributed systems (such as Postgresql)
  • How to scale distributed across AWS Cloud Services, Azure or other cloud providers


As the videos are posted online I’ll be providing some blog entries around the talks. It will however be extremely difficult to choose the first to review, just as RICON back in October of 2012, every single talk was far above the modicum of the median!

Two immediate two talks that stand out was Christopher Meiklejohn’s @cmeik talk, doing a bit o’ proofs and all, in realtime off the cuff and all. It was merely a 5 minute lightnight talk, but holy shit this guy can roll through and hand off intelligence via a talk so fast in blew my mind!

The other talk was Kyle’s, AKA @aphry, who went through network partitions with databases. Basically destroying any comfort you might have with your database being effective at getting reads in a partition event. Kyle knows his stuff, that is without doubt.

There are many others, so subscribe keep reading and I’ll be posting them in the coming weeks.

Node PDX 2013 (May 16th & 17th)

Horse_js and other characters, planning some JavaScript hacking!

Horse_js and other characters, planning some JavaScript hacking!

Holy moley we did it, again! Thanks to EVERYBODY out there in the community for helping us pull together another kick ass Node PDX event! That’s two years in a row now! My fellow cohort of Troy Howard @thoward37 and Luc Perkins @lucperkins had hustled like some crazed worker bees to get everything together and ready – as always a lot always comes together the last minute and we don’t get a wink of sleep until its all done and everybody has had a good time!

Node PDX Sticker Selection was WICKED COOL!

Node PDX Sticker Selection was WICKED COOL!

Node PDX, it’s pretty self descriptive. It’s a one Node.js conference that also includes topics on hardware, javascript on the client side and a host of other topics. It’s also Portland specific. We have Portland Local Roasted Coffee (thanks Ristretto for the pour over & Coava for the custom roast!), Portland Beer (thanks brew capital of the world!), Portland Food (thanks Nicolas’!), Portland DJs (thanks Monika Mhz!), Portland Bands and tons of Portland wierdness all over the place. It’s always a good time! We get the notion at Node PDX, with all the Portlandia spread all over it’s one of the reasons that 8-12 people move to and get hired in Portland after this conference every year (it might become a larger range, as there are a few people planning to make the move in the coming months!).

A wide angle view of Holocene where Node PDX magic happened!

A wide angle view of Holocene where Node PDX magic happened!

The talks this year increased in number, but maintained a solid range of topics. We had a node.js disco talk, client side JavaScript, sensors and node.js, and even heard about people’s personal stories of how they got into programming JavaScript. Excellent talks, and as with RICON, I’ll be posting a blog entry and adding a few penny thoughts of my own to each talk.

Polyglot Conference 2013 (May 24th Workshops, 25th Conference)

Tea & Chris kick off Polyglot Conference 2013!

Tea & Chris kick off Polyglot Conference 2013!

A smiling crowd!

A smiling crowd!

Polyglot Conference was held in Vancouver again this year, with clear intent to expand to Portland and Seattle in the coming year or two. I’m super stoked about this and will definitely be looking to help out – if you’re interested in helping let me know and I’ll get you in contact with the entire crew that’s been handling things so far!

Polyglot Conference itself is a yearly conference held as an open spaces event. The way open space conferences work is described well on Wikipedia were it is referred to as Open Spaces Technology.

The crowds amass to order the chaos of tracks.

The crowds amass to order the chaos of tracks.

The biggest problem with this conference, is that it’s technically only one day. I hope that we can extend it to two days for next year – and hopefully even have the Seattle and Portland branches go with an extended two day itenerary.

A counting system...

A counting system…

This year the break out sessions that that I attended included “Dev Tools”, “How to Be a Better Programmer”, “Go (Language) Noises”, other great sessions and I threw down a session of my own on “Distributed Systems”. Overall, great time and great sessions! I had a blast and am looking forward to next year.

By the way, I’m not sure if I’ve mentioned this at the beginning of this blog entry, but this is only THE BEGINNING OF SUMMER IN CASCADIA! I’ll have more coverage of these events and others coming up, the roadmap includes OS Bridge (where I’m also speaking) and Portland’s notorious OSCON.

Until the next conference, keep hacking on that next bad ass piece of software, cheers!

Backup Riak – Learning About Distributed Databases :: Issue 001

I’ve got more than a few series in the queue, so why not another one eh! The intent is, I’ll grab a specific topic to break down and add details to related to distributed systems, primarily around Riak. I will however diverge into other distributed databases too, but I’ll primarily be sticking to Riak. Without more introduction, the first topic is…

Backing Up and Recovery of Riak (Nodes)

I’ve been asked approximately 423,983,321.7 zillion times how this is done. So here’s a quick summary and respective links to the best ways to backup Riak, how to recover nodes.

When backing up Riak there are two key things that need copied to the backup storage; the ring and data directories. Each of these things are specific based on the backend used with Riak. In addition to the core backup containing the ring and data, another good thing to backup is the configuration directory. When recovering this comes in useful.

For the locations of data, it depends slightly based on the operating system being used. The two big variances are OS-X and Linux Distros. On OS-X the data path, ring data and configuration are located at the locations listed below:

  • Bitcask data: ./data/bitcask
  • LevelDB data: ./data/leveldb
  • Ring data: ./data/riak/ring
  • Configuration: ./etc

For each specific distro, there are slight variations on where the locations are, for a full list check out the Basho Riak docs on backups. But on Linux distros the paths are as follows:

Debian and Ubuntu

  • Bitcask data: /var/lib/riak/bitcask
  • LevelDB data: /var/lib/riak/leveldb
  • Ring data: /var/lib/riak/ring
  • Configuration: /etc/riak

Fedora and RHEL

  • Bitcask data: /var/lib/riak/bitcask
  • LevelDB data: /var/lib/riak/leveldb
  • Ring data: /var/lib/riak/ring
  • Configuration: /etc/riak

Other Operating System Paths


  • Bitcask data: /var/db/riak/bitcask
  • LevelDB data: /var/db/riak/leveldb
  • Ring data: /var/db/riak/ring
  • Configuration: /usr/local/etc/riak


  • Bitcask data: /var/db/riak/bitcask
  • LevelDB data: /var/db/riak/leveldb
  • Ring data: /var/db/riak/ring
  • Configuration: /opt/local/etc/riak


  • Bitcask data: /opt/riak/data/bitcask
  • LevelDB data: /opt/riak/data/leveldb
  • Ring data: /opt/riak/ring
  • Configuration: /opt/riak/etc

When backing things up, it’s important to note that each node could have slightly inconsistent data. The data however is rebuilt by the Riak read-repair system once it is recovered and brought into use.

Backup Jobs

One of the easiest ways to backup Riak is to setup a cron job with your choice of cp, rsync or tar. Then just get those files onto whatever your choice of backup medium. An example tar cron job to backup a Bitcask backend is shown below (snagged from the documentation) just to give you an idea of where to start.

tar -czf /mnt/riak_backups/riak_data_`date +%Y%m%d_%H%M`.tar.gz /var/lib/riak/bitcask /var/lib/riak/ring /etc/riak

For a leveldb back end the most important thing to note is that the node must be stopped. The basic workflow of backing up a node in this manner is to stop the node, backup the data, ring and configuration and then start the node back up.

Backup Recovery / Restoring

When recovering data on a node that is replacing an existing node that has the same name (fully qualified or IP) then follow the steps below:

  1. Install Riak
  2. Restore the old node’s configuration, data & ring.
  3. Start the node

Once you’ve got the node started back up it’s a good idea to do a ping or status against the node to verify it is in a good state.

If node names have been changed there are additional steps.

  1. Mark the original instance down
    riak-admin down 
  2. Join the restored cluster  
    riak-admin join 
  3. Replace the original with 
    riak-admin cluster force-replcae  
  4. Get the cluster plan built 
    riak-admin cluster plan
  5. Commit the changes 
    riak-admin cluster commit
  6. Change the -name setting in the vm.args configuration file to match the new name.
  7. Change & verify that the IP reflects the instances IP in the app.config for http and protocol buffer interfaces.

Cluster Backups via Riak Enterprise Multi-Data Center (MDC)

In the above sections I wrote about the traditional backup approaches. This is very similar to the way RDBMS are backed up. However, with a distributed system like Riak there is another great alternative if you’re utilizing multiple datacenters and Enterprise Riak. In this version of Riak, which is basically Riak with additional features and capabilities, one of the possible backup scenarios is to use the Multi-Data Center, or MDC, to replicate a duplicate cluster and use it as an active, real-time and always ready backup.

One workflow that is an exceptionally effective way to provide backups is to setup the “backup” cluster beside the current operative cluster. As an example, if your cluster is operational in AWS and it is running in X region and Y zone then you’d want to put the backup cluster in that same region and zone. Once you’ve setup Riak Enterprise and MDC, then just setup a full sync. Once the full sync is done you can then remove the backup cluster and it provides a point in time backup of the data.

riak-repl start-fullsync

It’s easy to schedule full sync operations to low usage periods and it is also possible to pause and resume full sync operations.

riak-repl resume-fullsync<br />riak-repl pause-fullsync

The variations on backing up data with Riak Enterprise and MDC are pretty expansive. Doing a point in time, maintaining a secondary live copy of the data, using the replication as a data dump to another cluster or even just using the MDC replication to dump all of the data to a single instance.

File System Snapshots

One other technique that is extremely efficient, fast and thorough is snapshotting the file system. The backup workflow for snapshots is extremely easy. First stop Riak, then snapshot, then start Riak again. Of all the methods, snapshotting is one of the easiest of the options. Just like setting up a cron job, automating snapshots based on some pre-defined schedule and meshing that with automated start and stop of Riak provides a very thorough backup.

With these options, have fun strategizing your stratagems into strategies for backups.


One of the oldest, tried and true backups is the old diskette. The bestest way to backup with diskettes is to backup each node on three diskettes each. The send one of each diskettes to a geographically dispersed to a bank lock box or other secure facility. Do this for each node, and if need be use as many diskettes for each node as needed. A particularly useful method is to use the sharded zip strategy to stripe a backup across many diskettes. Once each lock box has a copy of the node for each node in the cluster, you’ll have one of the most secure backups in existence. Nothing compares to the diskette backup!


  1. Basho Docs – Backups
  2. Basho Docs – MDC Full Sync

Light up a Riak Cluster with AWS, A Few Notes…

I wanted to write up an intro to getting Riak installed on AWS, even though the steps are absurdly simple and already available on the Basho Docs site, there’s a few extra notes that can be very helpful for a few specific points during the process.

Start off by logging into AWS. At this point you can take two different paths that are almost identical. You can follow the path of using the pre-built AWS Marketplace image of Riak, or just start form scratch. The difference is a total of about 2 steps; installing & setting some security port connections. I’m going to step through without using the prebuilt image in these instructions.

Security Group

First thing you’ll need to get a security group with the correct permissions setup. For that, you’ll need to make a security group.

NOTE: No, I didn’t mean to misspell Riak, but it’s in there now.  😉

Before adding the ports, go to the security group details tab and copy the security group id. I’ve pointed it out in the image above.

Now add the following three and assign the security group to the ports; 4369, 8099 & 6000-7999. For the source set it to the security group id. Once you get all three added the list should look like this (below). For each rule click the Add Rule button and remember to click the Apply Rule Changes. I often forget this because the screen on some of the machines I use only shows to the bottom of the Add Rule button, so you’ll have to scroll down to find the Apple Rule Changes button.

Now add the standard port 22 for SSH. Next get the final two of 8087 and 8098 setup and we’re ready for moving on to creating the virtual machines.

Server Virtual Machines

For creating virtual machines I just clicked on Launch Instance and used the classic wizard. From there you get a selection of items. I’ve used the AWS image to do this, but would actually suggest using a CentOS image of your choice or Red Hat Enterprise Linux (RHEL). Another great option is to use the Ubuntu 12.04 LTS. Really though, use whatever Linux version or distro you like, there are 1-2 step instructions for installing Riak on almost every distro out.

Next just launch a single instance. We’ll be able to launch duplicates of these further along in the process. I’ve selected a “Micro” here but I’m not intending to do anything with a remotely heavy load right now. At some point, I’ll upgrade this cluster to larger instances when I start putting it under a real load. I’ll have another blog entry to describe exactly how I do this too.

Keep hitting continue until you get to the key pair selection. Pick the key pair you want, either making a new one for this cluster or use one you already have. Either way works fine.

Continue again until you can select the security group that we created above.

Now keep hitting that continue button, until you get to launch, and launch this thing. Once the instance is launched launch your preferred SSH connection tooling. The easiest way I’ve found for getting the most current private IP to connect to with the appropriate command is to right click on the instance in the AWS Console and click on Connect. There you’ll find the command to connect via SSH.

Paste that in and hit enter in your SSH App, you’ll see something akin to this.

$ cd Codez/working-content/
$ ssh -i riaktionz.pem root@ec2-54-245-201-97.us-west-2.compute.amazonaws.com
The authenticity of host 'ec2-54-245-201-97.us-west-2.compute.amazonaws.com (' can't be established.
RSA key fingerprint is 31:18:ac:1a:ac:fc:6e:6d:55:e8:8a:83:9a:8f:c7:5f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-54-245-201-97.us-west-2.compute.amazonaws.com,' (RSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".

Enter yes to continue connecting. For some instance types, like Ubuntu you’ll have to do some teaks to log into as “ubuntu” vs. “root” and the same goes for the AWS image or others. I’ll leave that to you, dear reader to get connected via ole’ SSH.

One of the other things, that you may have to do some tweaking about and googling, is figuring out the firewall setups on the various virtual machine images. For the RHEL you’ll want to turn off the firewall or open up the specific connection ports and such. Since the AWS firewall does this, it isn’t particularly important for the OS to continue running its firewall service. In this case, I’ve turned off the OS firewall and just rely on the AWS firewall. To turn off the RHEL firewall, execute the following commands.

[root@ip-x-x-x-x]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
[root@ip-x-x-x-x]# service iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@ip-x-x-x-x]# chkconfig iptables off

Now is a perfect time to start those other instances. Navigate into the AWS Console again and right click on the virtual machine instance you’ve created. On that menu select Launch More Like This.

Go through and check the configuration on each of these, make sure the firewall is turned off, etc. Then move on to the next step and install Riak and cluster them. So it’s time to get to the distributed, massively complex, extensive list of steps to install & cluster Riak. Ok, so that’s sarcasm.  😉

Step 1: Install Riak

Install Riak on each of the instances.

package=basho-release-6-1.noarch.rpm && \
wget http://yum.basho.com/gpg/$package -O /tmp/$package && \
sudo rpm -ivh /tmp/$package
sudo yum install riak

NOTE: For other installation methods, such as directly downloading the RPM or other Linux OSes, check out the http://docs.basho.com/riak/latest/tutorials/installation/Installing-on-RHEL-and-CentOS/.

Step 2: Setup the Cluster

On the first instance, get the IP. You won’t need to do anything to this instance, just keep the IP handy. Then move on to the second instance and run the cluster command.

sudo riak-admin cluster join riak@<ip_of_the_first_node>

Do this on each of the instances you’ve added, using that first node. When you’ve added them all, on that last instance (or really any of them) then run the plan. This will get you a display plan of what will take place when the cluster is committed.

sudo riak-admin cluster plan

If that looks all cool. Commit the plan.

sudo riak-admin cluster commit

Get a check of the cluster.

sudo riak-admin member_status

That’s it all done. You know have a Riak Cluster. For more operations to try out your cluster, check out this list of base API Operations.