Backup Riak – Learning About Distributed Databases :: Issue 001

I’ve got more than a few series in the queue, so why not another one eh! The intent is, I’ll grab a specific topic to break down and add details to related to distributed systems, primarily around Riak. I will however diverge into other distributed databases too, but I’ll primarily be sticking to Riak. Without more introduction, the first topic is…

Backing Up and Recovery of Riak (Nodes)

I’ve been asked approximately 423,983,321.7 zillion times how this is done. So here’s a quick summary and respective links to the best ways to backup Riak, how to recover nodes.

When backing up Riak there are two key things that need copied to the backup storage; the ring and data directories. Each of these things are specific based on the backend used with Riak. In addition to the core backup containing the ring and data, another good thing to backup is the configuration directory. When recovering this comes in useful.

For the locations of data, it depends slightly based on the operating system being used. The two big variances are OS-X and Linux Distros. On OS-X the data path, ring data and configuration are located at the locations listed below:

  • Bitcask data: ./data/bitcask
  • LevelDB data: ./data/leveldb
  • Ring data: ./data/riak/ring
  • Configuration: ./etc

For each specific distro, there are slight variations on where the locations are, for a full list check out the Basho Riak docs on backups. But on Linux distros the paths are as follows:

Debian and Ubuntu

  • Bitcask data: /var/lib/riak/bitcask
  • LevelDB data: /var/lib/riak/leveldb
  • Ring data: /var/lib/riak/ring
  • Configuration: /etc/riak

Fedora and RHEL

  • Bitcask data: /var/lib/riak/bitcask
  • LevelDB data: /var/lib/riak/leveldb
  • Ring data: /var/lib/riak/ring
  • Configuration: /etc/riak

Other Operating System Paths

Freebsd

  • Bitcask data: /var/db/riak/bitcask
  • LevelDB data: /var/db/riak/leveldb
  • Ring data: /var/db/riak/ring
  • Configuration: /usr/local/etc/riak

SmartOS

  • Bitcask data: /var/db/riak/bitcask
  • LevelDB data: /var/db/riak/leveldb
  • Ring data: /var/db/riak/ring
  • Configuration: /opt/local/etc/riak

Solaris

  • Bitcask data: /opt/riak/data/bitcask
  • LevelDB data: /opt/riak/data/leveldb
  • Ring data: /opt/riak/ring
  • Configuration: /opt/riak/etc

When backing things up, it’s important to note that each node could have slightly inconsistent data. The data however is rebuilt by the Riak read-repair system once it is recovered and brought into use.

Backup Jobs

One of the easiest ways to backup Riak is to setup a cron job with your choice of cp, rsync or tar. Then just get those files onto whatever your choice of backup medium. An example tar cron job to backup a Bitcask backend is shown below (snagged from the documentation) just to give you an idea of where to start.

[sourcecode language=”bash”]tar -czf /mnt/riak_backups/riak_data_`date +%Y%m%d_%H%M`.tar.gz /var/lib/riak/bitcask /var/lib/riak/ring /etc/riak
[/sourcecode]

For a leveldb back end the most important thing to note is that the node must be stopped. The basic workflow of backing up a node in this manner is to stop the node, backup the data, ring and configuration and then start the node back up.

Backup Recovery / Restoring

When recovering data on a node that is replacing an existing node that has the same name (fully qualified or IP) then follow the steps below:

  1. Install Riak
  2. Restore the old node’s configuration, data & ring.
  3. Start the node

Once you’ve got the node started back up it’s a good idea to do a ping or status against the node to verify it is in a good state.

If node names have been changed there are additional steps.

  1. Mark the original instance down[sourcecode language=”bash”]riak-admin down [/sourcecode]
  2. Join the restored cluster  [sourcecode language=”bash”]riak-admin join [/sourcecode]
  3. Replace the original with [sourcecode language=”bash”]riak-admin cluster force-replcae  [/sourcecode]
  4. Get the cluster plan built [sourcecode language=”bash”]riak-admin cluster plan[/sourcecode]
  5. Commit the changes [sourcecode language=”bash”]riak-admin cluster commit[/sourcecode]
  6. Change the -name setting in the vm.args configuration file to match the new name.
  7. Change & verify that the IP reflects the instances IP in the app.config for http and protocol buffer interfaces.

Cluster Backups via Riak Enterprise Multi-Data Center (MDC)

In the above sections I wrote about the traditional backup approaches. This is very similar to the way RDBMS are backed up. However, with a distributed system like Riak there is another great alternative if you’re utilizing multiple datacenters and Enterprise Riak. In this version of Riak, which is basically Riak with additional features and capabilities, one of the possible backup scenarios is to use the Multi-Data Center, or MDC, to replicate a duplicate cluster and use it as an active, real-time and always ready backup.

One workflow that is an exceptionally effective way to provide backups is to setup the “backup” cluster beside the current operative cluster. As an example, if your cluster is operational in AWS and it is running in X region and Y zone then you’d want to put the backup cluster in that same region and zone. Once you’ve setup Riak Enterprise and MDC, then just setup a full sync. Once the full sync is done you can then remove the backup cluster and it provides a point in time backup of the data.

[sourcecode language=”bash”]riak-repl start-fullsync[/sourcecode]

It’s easy to schedule full sync operations to low usage periods and it is also possible to pause and resume full sync operations.

[sourcecode language=”bash”]riak-repl resume-fullsync<br />riak-repl pause-fullsync[/sourcecode]

The variations on backing up data with Riak Enterprise and MDC are pretty expansive. Doing a point in time, maintaining a secondary live copy of the data, using the replication as a data dump to another cluster or even just using the MDC replication to dump all of the data to a single instance.

File System Snapshots

One other technique that is extremely efficient, fast and thorough is snapshotting the file system. The backup workflow for snapshots is extremely easy. First stop Riak, then snapshot, then start Riak again. Of all the methods, snapshotting is one of the easiest of the options. Just like setting up a cron job, automating snapshots based on some pre-defined schedule and meshing that with automated start and stop of Riak provides a very thorough backup.

With these options, have fun strategizing your stratagems into strategies for backups.

Diskettes

One of the oldest, tried and true backups is the old diskette. The bestest way to backup with diskettes is to backup each node on three diskettes each. The send one of each diskettes to a geographically dispersed to a bank lock box or other secure facility. Do this for each node, and if need be use as many diskettes for each node as needed. A particularly useful method is to use the sharded zip strategy to stripe a backup across many diskettes. Once each lock box has a copy of the node for each node in the cluster, you’ll have one of the most secure backups in existence. Nothing compares to the diskette backup!

References:

  1. Basho Docs – Backups
  2. Basho Docs – MDC Full Sync

Distributed Coding Prefunc: Rebar Multinode Riak Core

Before diving into this entry, you might want to check out some of my other getting Erlang installed with appropriate testing frameworks entries. Moving on…

At Basho we’re are always trying to make it easier to do big things. A short time ago we pushed forward on Rebar, Riak Core and getting things put together to make it simpler to get kick started working on distributed systems like the Riak Database & distributed system itself. There’s way more that is possible, which I’ll get into in just a minute. Before diving into some of those things, here’s a few quick links & context of what exactly Rebar and Riak Core are.

Riak Core
Github: https://github.com/basho/riak_core

Riak Core has been available for quite some time. We’ve also been hustling for a while getting together a robust array of material around Riak Core. One excellent place to get started on learning about Riak Core is the “Introducing Riak Core” blog article published on the Basho blog a while back. To describe Riak Core, or riak_core, it is the underpinnings of what Riak is built on. It provides many features to get you started building distributed systems. A few of the key features are being able to track and manage the nodes, clusters and related pieces of the distributed architecture within a system.

Rebar
Github: https://github.com/basho/rebar
Wiki: https://github.com/basho/rebar/wiki

Rebar is an Erlang build tool that helps you in putting together projects based on Riak Core.

Rebar Riak Core
Github: https://github.com/basho/rebar_riak_core

The Rebar Riak Core project repository template helps you start writing things like the Riak Database itself. It’s based on setting up Riak Core via template scripting an N Node Cluster devrel, vnodes, etc. Once you’re up and running it can be used to help develop distributed, scalable and fault tolerant applications.

For more on the Rebar Riak Core check out the README.md in the github repository. There are some great examples of how to get a multinode devrel running in a few steps.

Rebar Riak Core Quick Start

The quickest way to get started using the Riak Core and Rebar scripts is to get the prebuilt binaries or you can just clone and install the Rebar scripts if you’d like all the things. To get the binaries and executables you can download and have them ready by wget (or use your preferred method to download).

[sourcecode language=”bash”]
wget http://cloud.github.com/downloads/basho/rebar/rebar && chmod u+x rebar
[/sourcecode]

To get the cloned repository and ready for use.

[sourcecode language=”bash”]
$ git clone git://github.com/rebar/rebar.git
$ cd rebar
$ ./bootstrap
[/sourcecode]

Now the easiest way possible is to use the Riak Core Templates with a quick git clone. After cloning the repo, copy them to the rebar templates directory (note that you’ll need to create this initially) and then create a working directory to put the project in and navigate into that directory.

[sourcecode language=”bash”]
git clone git://github.com/rzezeski/rebar_riak_core.git
mkdir -p ~/.rebar/templates
cp rebar_riak_core/* ~/.rebar/templates
mkdir projectNameHere
cd projectNameHere
[/sourcecode]

Now that a template is available, run the following command to create the Erlang Project.

[sourcecode language=”bash”]
rebar create template=riak_core_multinode appid=rabbits nodeid=rabbits
[/sourcecode]

You’re now ready to go to work using Rebar and the template you’ve created. I followed the try-try-try example repo in the example above to get started, check it out for a great walk through that dives in deeper to Riak Core, each small element of the project and files created, and a multi-node project as the sample.

So what to do now?

This is where it is time to throw around some creativity to get real solutions to real problems. Building distributed systems is becoming more and more paramount to effective usage of infrastructure and systems. Using Riak Core to get started building out your distributed system is an ideal place to start. These are a few ideas that the team was brainstorming on. Over the coming weeks we’ll be putting together material to outline ways to not only get started, but to implement systems like this.

Distributed Web Caching Tier

Caching tiers often come up in conversations, whether related to distributed systems or not, and often end up on the distributed topic. The question resounds, “how do I create a caching tier that can be distributed and provide real session, state management, cached elements, live data and other needs?” Well Riak Core is a great place to start developing a custom distributed caching tier that could even extend to use Riak KV (the Riak Database implemented on Riak Core), Redis, Rabbit MQ or many other solutions by pulling them together to provide appropriate cache at the appropriate tiers of an application architecture.

In House Cluster Monitoring & Smart Resolver

One of the things that the Riak Core would be used to great effect for is a multi-node, clustered and geographically dispersed monitoring system for any multi-data center application. This could be built out and used for almost any actual environment, with custom specifics or a completely generic situation of pizza box servers. Because fo the distributed concepts behind Riak Core it would provide an ideal basis for monitoring – and re-launching or otherwise dealing with systems that need high uptime and recovered as fast as possible if they go down.

Logging, Web, Server, and Business Analytics

In any situation where analytics are collected there are often dozens if not thousands of servers, various systems and even numerous devices that may be emitting data via services or other mediums. Riak Core is a great place to lay the groundwork for a distributed system that could maintain a massive store of managed data for fast searching of analytics. This could be the groundwork for biotech research analytics, analysis of market data or a dozen other things that need highly available systems storing vast data with map reduction or other search capabilities. Think Business Intelligence (BI) with serious technological power.

Multi-node Project

Of course, as the example I used to create the first sample above, dive into the try-try-try tutorial for some great multi-node how to. If you have any questions please jump in ping me on twitter @adron or ping @basho, join the mailing list, the IRC #riak channel on freenode.

Light up a Riak Cluster with AWS, A Few Notes…

I wanted to write up an intro to getting Riak installed on AWS, even though the steps are absurdly simple and already available on the Basho Docs site, there’s a few extra notes that can be very helpful for a few specific points during the process.

Start off by logging into AWS. At this point you can take two different paths that are almost identical. You can follow the path of using the pre-built AWS Marketplace image of Riak, or just start form scratch. The difference is a total of about 2 steps; installing & setting some security port connections. I’m going to step through without using the prebuilt image in these instructions.

Security Group

First thing you’ll need to get a security group with the correct permissions setup. For that, you’ll need to make a security group.

NOTE: No, I didn’t mean to misspell Riak, but it’s in there now.  😉

Before adding the ports, go to the security group details tab and copy the security group id. I’ve pointed it out in the image above.

Now add the following three and assign the security group to the ports; 4369, 8099 & 6000-7999. For the source set it to the security group id. Once you get all three added the list should look like this (below). For each rule click the Add Rule button and remember to click the Apply Rule Changes. I often forget this because the screen on some of the machines I use only shows to the bottom of the Add Rule button, so you’ll have to scroll down to find the Apple Rule Changes button.

Now add the standard port 22 for SSH. Next get the final two of 8087 and 8098 setup and we’re ready for moving on to creating the virtual machines.

Server Virtual Machines

For creating virtual machines I just clicked on Launch Instance and used the classic wizard. From there you get a selection of items. I’ve used the AWS image to do this, but would actually suggest using a CentOS image of your choice or Red Hat Enterprise Linux (RHEL). Another great option is to use the Ubuntu 12.04 LTS. Really though, use whatever Linux version or distro you like, there are 1-2 step instructions for installing Riak on almost every distro out.

Next just launch a single instance. We’ll be able to launch duplicates of these further along in the process. I’ve selected a “Micro” here but I’m not intending to do anything with a remotely heavy load right now. At some point, I’ll upgrade this cluster to larger instances when I start putting it under a real load. I’ll have another blog entry to describe exactly how I do this too.

Keep hitting continue until you get to the key pair selection. Pick the key pair you want, either making a new one for this cluster or use one you already have. Either way works fine.

Continue again until you can select the security group that we created above.

Now keep hitting that continue button, until you get to launch, and launch this thing. Once the instance is launched launch your preferred SSH connection tooling. The easiest way I’ve found for getting the most current private IP to connect to with the appropriate command is to right click on the instance in the AWS Console and click on Connect. There you’ll find the command to connect via SSH.

Paste that in and hit enter in your SSH App, you’ll see something akin to this.

[sourcecode language=”bash”]
$ cd Codez/working-content/
$ ssh -i riaktionz.pem root@ec2-54-245-201-97.us-west-2.compute.amazonaws.com
The authenticity of host ‘ec2-54-245-201-97.us-west-2.compute.amazonaws.com (54.245.201.97)’ can’t be established.
RSA key fingerprint is 31:18:ac:1a:ac:fc:6e:6d:55:e8:8a:83:9a:8f:c7:5f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘ec2-54-245-201-97.us-west-2.compute.amazonaws.com,54.245.201.97′ (RSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".
[/sourcecode]

Enter yes to continue connecting. For some instance types, like Ubuntu you’ll have to do some teaks to log into as “ubuntu” vs. “root” and the same goes for the AWS image or others. I’ll leave that to you, dear reader to get connected via ole’ SSH.

One of the other things, that you may have to do some tweaking about and googling, is figuring out the firewall setups on the various virtual machine images. For the RHEL you’ll want to turn off the firewall or open up the specific connection ports and such. Since the AWS firewall does this, it isn’t particularly important for the OS to continue running its firewall service. In this case, I’ve turned off the OS firewall and just rely on the AWS firewall. To turn off the RHEL firewall, execute the following commands.

[sourcecode language=”bash”]
[root@ip-x-x-x-x]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
[root@ip-x-x-x-x]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@ip-x-x-x-x]# chkconfig iptables off
[root@ip-x-x-x-x]#
[/sourcecode]

Now is a perfect time to start those other instances. Navigate into the AWS Console again and right click on the virtual machine instance you’ve created. On that menu select Launch More Like This.

Go through and check the configuration on each of these, make sure the firewall is turned off, etc. Then move on to the next step and install Riak and cluster them. So it’s time to get to the distributed, massively complex, extensive list of steps to install & cluster Riak. Ok, so that’s sarcasm.  😉

Step 1: Install Riak

Install Riak on each of the instances.

[sourcecode language=”bash”]
package=basho-release-6-1.noarch.rpm && \
wget http://yum.basho.com/gpg/$package -O /tmp/$package && \
sudo rpm -ivh /tmp/$package
sudo yum install riak
[/sourcecode]

NOTE: For other installation methods, such as directly downloading the RPM or other Linux OSes, check out the http://docs.basho.com/riak/latest/tutorials/installation/Installing-on-RHEL-and-CentOS/.

Step 2: Setup the Cluster

On the first instance, get the IP. You won’t need to do anything to this instance, just keep the IP handy. Then move on to the second instance and run the cluster command.

[sourcecode language=”bash”]
sudo riak-admin cluster join riak@<ip_of_the_first_node>
[/sourcecode]

Do this on each of the instances you’ve added, using that first node. When you’ve added them all, on that last instance (or really any of them) then run the plan. This will get you a display plan of what will take place when the cluster is committed.

[sourcecode language=”bash”]
sudo riak-admin cluster plan
[/sourcecode]

If that looks all cool. Commit the plan.

[sourcecode language=”bash”]
sudo riak-admin cluster commit
[/sourcecode]

Get a check of the cluster.

[sourcecode language=”bash”]
sudo riak-admin member_status
[/sourcecode]

That’s it all done. You know have a Riak Cluster. For more operations to try out your cluster, check out this list of base API Operations.

Not So Versus, Riak Versus Redis

Recently a friend of mine and fellow coder, harkening back to my Russell Investments Enterprise Developer days posed the discussion of Redis versus Riak. Well, first off, I thought I myself needed to write down a line by line comparison. I’ve worked with both in various ways but I’d never really thought about them lined up side by side. Often I think of the two pieces of technologies as complementary in various ways. But before I dive into that, let’s take a look at the stats side by side of each.

Company/Maintainer/Builder Basho Technologies @basho Salvatore Sanfilippo @antirez w/ VMware
Official Product Name Riak Redis
License Apache (link) BSD (link)
Storage Type Key Value Key Value
Protocols HTTP/RESTful & Custom Binaries  Telnet like / Proprietary
Replication/Clustering Masterless Master / Slave Replication
Language/Framework Erlang / C C / C++
Best Use Dynamo style architecture & concepts. Primarily used for extreme high availability. Best known and used for extremely fast access to quickly changing data and known size.
Key Feature Fault Tolerant Crazy Fast

Redis is primarily something you’re going to use to move data in and out at crazy fast speeds. However, when you need to store data, it isn’t ideal. There are ways, but it tends to work better handing off to something else to store the data. However Riak on the other hand isn’t always the fastest database, but it’ll withstand serious hits and still maintain integrity of data. It is also tunable for writes, reads and other characteristics that enable tuning and also integrity of the data among nodes. The more nodes in Riak you have the higher available iOPs and fault tolerance. The other thing that Riak can do over time, is truly scale from a horizontal and vertical perspective. Grow Riak tall and wide, it’ll give you linear performance and integrity improvements.

The thing I’ve seen over and over, is Riak as a store and Redis as a cache or other temporal specific data store for websites or other high transaction systems. Overall each serves a very specific purpose but work well in conjunction with each other when you’re rolling together an extremely high performance architecture with an extremely highly available back end data store.

If you’re in Seattle and up for lunch this Wednesday, join me for the Riak Nerd Lunch.

A Few Notes on Riak 1.3 RC

Full context – Riak 1.3 RC came out just a couple dozen hours ago. RC stands for release candidate, which in turn basically means that version 1.3 is complete and any other additions will be for quick fixes or any issues that crop up. I’ve just started rolling a few new systems myself with this new version and hope you’ll join me in taking a hack at it. Let’s jump into a few reason why you’d want to leap into 1.3. You can read about the features below via the release notes also, but I’ve turned them into smaller bit size chunks below.

  • Giddyup in action!
    Giddyup in action!

    The first thing with the latest v1.3 has been the massive effort put into testing via the riak_test and the giddyup repos. Ongoing there will be a much easier way to move forward in features & quality. This is one of the reasons I love working for Basho, the whole team isn’t about smoke and mirrors with testing, they readily and diligently work on testing. Which to add context, remember we’re talking about distributed systems here, which aren’t exactly the easiest thing to test. One doesn’t just merely walk in and write unit tests and assume a distributed systems is tested. This moves us forward, and those that want to contribute and get involved more heavily in Riak now have a platform to dive in confidently when using these testing repositories.

  • Active Anti-Entropy – Alright, now we’re getting to the features with bad ass sounding names. Also referred to as AAE, this feature grabs bad replica data and begins a correction through read repair to protect data. It’s one more layer of protection against any type of data loss, disaster, bit rot, etc).
  • MapReduce Sink Backpressure – This one reminds me of tuning when setting up forced induction, AKA a turbo on a car. But I digress, I’ve snagged a description from the release notes for this feature, “Riak Pipe brought inter-stage backpressure to Riak KV’s MapReduce system. However, prior to Riak 1.3, that backpressure did not extend to the sink. It was assumed that the Protocol Buffers or HTTP endpoint could handle the full output rate of the pipe. With Riak 1.3, backpressure has been extended to the sink so that those endpoint processes no longer become overwhelmed. This backpressure is tunable via a soft cap on the size of the sink’s buffer, and a period at which a worker should check that cap. These can be configured at the Riak console by setting application environment variables” ….suffice it to say this helps out with map reduce in certain situations.
  • Additional IPv6 Support – Riak Handoff and Protocol Buggers listen ala IPv6 now. Nuff’ said.
  • Luke removal – Luke is completely and utterly gone now. Dead. Don’t look for Luke here.
  • Riaknostic – This is now part of the default featureset instead of separate tooling.
  • SmartOS 1.8 Packages – They’re available.
  • Health Check – This is a pretty awesome system that’s been added. Basically it watches the system and enables and disables services based on conditions. It’s super easy, just flick the switch in the app.config.
    [sourcecode language=”yml”]
    {enable_health_checks, true}
    [/sourcecode]
  • Reset Bucket Properties – A quickie definition from the release notes “The HTTP interface now supports resetting bucket properties to their default values. Bucket properties are stored in Riak’s ring structure that is gossiped around the cluster. Resetting bucket properties for buckets that are no longer used or that are using the default properties can reduce the amount of gossiped data.”

There were also a lot of PRs and more that you can check out on Github. These are the main key features that are now available and ready for use in 1.3. Check em’ out, feel free to contact me or any of the team to ask questions, let us know your 2 cents or otherwise banter about. Cheers! Sometime in the coming days I’ll have a quick start, akin to what’s in the docs, but with some specific ops on some IaaS Providers. So keep reading, coming up soon.

Happy hacking!  \m/   \m/