Category Archives: Basho

Introducing Junction

Today I’ve officially kicked off a new project from my notebook of projects: a Riak administration, data manipulation, reporting and news tool for Windows 8. If you want to jump right to the project, here’s the Github Pages Site, the Github Junction Repo and eventually I’ll have it listed in the Windows 8 Store for download. Yes, it’ll be free as in beer, it’ll all be Apache 2.0 licensed and the project is open to contributors and anyone else who wants to jump into things. There’s also a quick intro for how I set up the “Windows 8 Logos, Badges & Splash Screens of Riak”.

So now that I’ve provided the links, here’s a quick intro to each of the application sections: what this application is for, where the workflow for contributions will live and what the next steps are. Trust me, I roll easy. I’ll be working as hard as I can to make pull requests easy peasy, keep the issues down to workable contributions and generally keep up the whole “this is a good OSS project” thing.

Riak Junction Application rocking on the Windows 8 desktop with a full tile!

Junction Divisions

The Junction application should be split into several key components, or divisions of application functionality. I’ve broken each out with a basic description. If you’d rather just watch a video where I outline each division, play the one below for a quick 5-minute intro to the application and the idea behind it all.

A quick run through of the first sample UI.

Call the Doctor! (Administration & Maintenance)

This part of the application would provide an interface for all the general administration and maintenance needs around individual nodes and the overall cluster: the ability to add and remove nodes, and to generally administer everything that is available via the riak-admin command line interface.

Time Travel That Data (Performance Benchmarking)

This section of the application will provide the ability to benchmark the timing of data in and out of a cluster. In addition, it should offer standard benchmarking similar to what the basho_bench project offers.

Love of the Data (Reporting)

This division of the application would be focused on reporting. I’m not sure what exactly that would entail, but something with charts, graphs and pulling together trending points of some sort. If you have ideas and want to work on this part of the application, weigh in!

Golfing With Your Data (Query, Put, Deletes, Etc. Handling the CRUD)

The application will have an interface providing access to add and remove data, as well as to view the data available within a cluster. The primary means of implementing this part of the application will be the CorrugatedIron Project, a library available via NuGet that @peschkaj and @TheColonial have put together.

News! News! News! (News…  RSS Feed Reader)

The idea is that this will provide a quick and easy way to get familiar with Windows 8 dev and the project overall. I’m aiming to eat the Basho blog feed and surface it as key highlights in the application, with future abilities around mining other RSS feeds and feeding those into… a Riak cluster, perhaps? Again, everything is open to change, addition or removal! So jump into the project and let me know your thoughts.

Cheers & Happy Hacking!

Farewell Basho, It’s Been Swell Yo!

Whew, it’s been a total blast working at Basho. I’ve accomplished a ton of things. Riak is a solid distributed database system and I’m glad to have worked with the team on advocating its use, teaching distributed systems ideas and concepts and generally spreading the knowledge. I’ve seen some truly great things that people are hacking together, setting up for projects and redesigning old systems to utilize newer, better, faster and more capable distributed systems concepts and ideas. Here are some of the things I’m happy to have contributed to in my time at Basho.

…and there has been a whole lot more. Suffice it to say, Basho has provided me with some sweet opportunities to work on extremely interesting data projects from a very data-sciency point of view (yeah, I know “sciency” ain’t a word). There may be more Riak work, Riak meetups, Riak hacks and Riak who-knows-what coming from me, but the meetups & such are now in the hands of the core Riak crew and…

Where Am I Headed?

Right now, I’m moving 20 blocks away from where I currently live, setting up a couch to hack on and grabbing a beer. I’ve got a few personal projects I’ve been wanting to work on, so I’m taking a few weeks to do some side projects that have been on the burner. Keep an eye out, I’ll be kicking off one, maybe two of these open source projects in the next few days. As @tsantero tweeted…

…I’m going to attack my own notebook of ideas. Maybe I’ll even work on that Riak CS Video object store that Tom and I spoke about 10 months ago? Either way, whatever the projects are, I’ll have them posted right here. Until then…

Cheers & Happy Hacking!

OSCON : Conversations, Deployments, Architecture, Docker and the Future?

I wrote about my first day of OSCON “OSCON : Day 1, Windows Just Doesn’t Do Cloud Foundry… but, there’s a fix for that…“. The rest of the week was most excellent. I caught up with friends and past coworkers. I heard about people working on some amazing new projects. Some things I will try to write up in the coming days, as I’m sure some of it will be making the tech news (if not the regular people news too).

Conversations

I had some great conversations about the direction of enterprise and PaaS uptake. It’s great to hear that there is finally some movement in that space. As one would expect, the enterprise still has a lot of distance to cover to catch up, but they’ll get there, or fall apart in the meantime.

There were also tons of conversations about the Indiegogo Ubuntu Edge mobile device. The device looks great and sounds like a solid idea. The questions arise from the fact that they’re working to make this a purely crowdfunded project. That wouldn’t be a concern if they were trying to raise just a few million in capital, but they’re aiming for $32 million! Still, with 128 GB of storage, dual LTE antennas for Europe and the US, a top-tier screen in quality and design, a metal body and multiple other features, this phone is ahead of anything out there. I hope it’s successful, but I must admit my own hesitance. What’s your take on the device?

Deployments

Over the course of the conference I talked to and worked with a number of other individuals playing around with Cloud Foundry and also OpenShift. The primary aspect that we worked on was strategies around deployment of these PaaS Technologies.

We also worked with Iron Foundry to extend Cloud Foundry to support .NET. Love .NET or hate it, wherever you fall on that spectrum, it still has an absolutely huge user base, primarily because .NET spent the last decade and then some going head to head against Java in the enterprise, and we all know the enterprise is slow to shift anything. So for now and the foreseeable future, .NET is an extremely large part of the development world, and having it work in your PaaS is fundamental to gaining significant enterprise share. Cloud Foundry is the only open source, internally usable PaaS on the market today. There are closed source options available, but those obviously don’t come up much at OSCON.

While at OSCON, I also got to discuss architecture and deployment of Riak with a number of people. The usage of Riak continues to grow and the environments, use cases and tooling that people are using Riak with and for is always an interesting space for me. I also got to discuss deployment of Cassandra and even some Neo4j, Redis and Riak side by side deployments. People have used an interesting mix of NoSQL solutions out there to pull their respective data together for their needs.

Among all these deployments, conversations regularly returned to a known topic of mine. Cloud computing and who is capable of what, where and when. AWS is still an easy leader in cloud computing, not just in customers but in technology. This also brought up the concerns and apathy that some have around OpenStack (hat tip to Ben Kepes for the write up) working more homogeneously with AWS. Whatever the case might be, the path for OpenStack needs to be clarified regularly. I imagine the next movement is going to be away from being too concerned with infrastructure and increased concern with portability of applications and development of applications.

Another growing topic of discussion was building applications for, on and with Windows Azure. Microsoft has actually become dramatically more involved in open source, in an honest and more integrity-based way. I’m honestly amazed at how far they’ve come from the declarations years ago that “open source is a cancer” and the all too famous “linux is communism“. Whatever that was supposed to mean, they didn’t seem to get it back then. Now, however, they regularly contribute to open source projects on CodePlex, but also GitHub and other places. Microsoft even contributed to the Linux kernel a few months ago.

That leads me to the next topic that came up a number of times…

Architecture

There’s been a lot of discussion about architecture around PaaS, containers (more on that in a moment), distributed systems in general and distributed databases. As I wrote recently in “Architectural PaaS Cracks or Crack PaaS”, the world of distributed systems and distributed databases has more than a few issues when working together in a PaaS environment. This brought up the discussion of what solutions exist today, solutions I look forward to writing about and building in the coming months.

The most immediate solution for scalable data sources is still to run operational data sources such as Neo4j, Redis, Riak or another database autonomously, but residing close to your PaaS system. The current public PaaS providers do exactly this, and in some cases extend it to offer the databases and data sources as services through add-ons. These are great solutions today, but they require time, effort and custom development work when set up internally.

This leads me to the last topic…

The Story of a Container – Docker

Well, not just Docker, but containers in general and Docker specifically. First some context about what a container is.

Container – In this particular context I’m writing about a container, or more specifically a runtime container, that isolates resources for applications or services. Containers are common in PaaS technologies to help isolate specific services or applications when they’re on a single physical machine or instance. Among the PaaS systems that came up at OSCON: dotCloud comes from the same team that created Docker, Cloud Foundry has Warden, and OpenShift has gears along with Red Hat Enterprise Linux OS-specific containers.

I’ve studied Warden a little in the past while I was working with AppFog and Tier 3 around Cloud Foundry. Warden is a great piece of technology. However, the star at OSCON was clearly Docker. I jumped into a number of conversations around Docker, and those conversations kept taking the same direction: containers becoming the key to PaaS tooling and systems growth and increasing capabilities. That leads me back to my previous blog entry “Architectural PaaS Cracks or Crack PaaS” and one of the key solutions to the data tier issue.

Containers, A Solution for Scaling the Data Tier

One of the issues that comes up when trying to scale any distributed database in a PaaS Environment is how to provide multi-tenancy without spooling up new instances for each and every single installation of a node within that distributed database. Here’s an example diagram of the requirements behind a scalable distributed database.

Masterless, Distributed Cluster of Nodes

In a default configuration you’d want each node to be running on a physical machine or dedicated virtual instance. This is for performance reasons as well as reasons for load balancing, security, data integrity and a host of others. This is the natural beginning state of a highly available distributed database or distributed system.

Trying to deploy something like this into a PaaS environment is tricky. Take into account that in application and service speak there is no such thing as an instance, and especially not anything like a physical server. The real division between processes and resources is the container. These containers are what actually need to run the distributed system’s nodes, which becomes possible if a distributed system node can be deployed to and executed from within a container.

Enter Docker

After reviewing Docker, the capabilities around it and the requirements of a distributed database, it looks like an ideal marriage of the two technologies. Already Docker has Redis and other database technologies running on it. The Container technology around Docker looks like an ideal fit to extend distributed systems to run autonomously of a single physical machine or single instance per node. This would enable nodes to be deployed as resources are available to provide a more seamless and PaaS style deployment for systems like Cassandra, Riak and related distributed systems. Could this be the next evolution of affordable distributed systems, containers to the rescue?
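To make that concrete, here’s roughly what running two isolated database nodes side by side on one host looks like with Docker (a sketch; the redis image and the node1/node2 names are illustrative assumptions, not a finished recipe):

# each container gets its own isolated process tree and filesystem,
# while the host maps a distinct port to each node
docker run -d -p 6379:6379 --name node1 redis
docker run -d -p 6380:6379 --name node2 redis

That per-container isolation, without a dedicated virtual instance per node, is exactly the property a PaaS needs from its data tier.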

I’ll be reporting back on my progress, this could be cool!

Stay tuned for a write up on Docker in the near future. For more information now check out http://www.docker.io.

Riak 1.4 – A Few Notes, Notices & Thoughts…

The release notes for Riak 1.4 can be found via github

Two things have worked together to make me want to write up the new Riak 1.4 features: Riak 1.4 hitting the streets, and the work I’ve been doing with CorrugatedIron. There are a few features that are going to add icing to the cake. If you want to dive more into the release, check out the release notes. If you’re interested in the .NET client CorrugatedIron, check it out here or check out the code on github. Now on to the features.

riak-attach changed to not nuke a node

So when issuing the attach command like…

riak attach

…the command attaches to the named pipe to communicate with the running Erlang node. Now when you hit Ctrl-C it kills just the pipe instead of killing both the pipe and the Riak node you’re attached to. This is something that has bitten me in the keister more than a few times, bringing down a node or two while I was just trying to view what was going on.
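A quick sketch of the new behavior from a node’s shell (illustrative, not a transcript):

riak attach
# poke around the console, then hit Ctrl-C; only the pipe dies
riak ping
# pong - the node is still running

This leads me to the next enhancement.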

riak-admin transfers

If you’re using riak_kv_bitcask_backend, riak_kv_eleveldb_backend or riak_kv_memory_backend, the riak-admin transfers command now shows per-transfer progress and displays long node names better, giving you a better idea of what is going where. The way progress is reported depends slightly on the specific backend: for the Bitcask and memory backends it is calculated from the keys already transferred out of the total keys, whereas the LevelDB backend calculates it from bytes transferred. Because of that, the LevelDB calculation can drift slightly over time.

Protocol Buffers & Multiple Interface Binding

Protocol Buffers can now bind to multiple ports and interfaces, so clients such as CorrugatedIron for .NET (http://corrugatediron.org/) and Riakjs (http://riakjs.com/) can connect on whichever interfaces you configure. For more on Riak configuration around the binding, check out the Basho Docs (http://docs.basho.com/riak/latest/references/Configuration-Files/). This brings interface binding to feature parity with the HTTP interfaces. The change replaces the pb_port and pb_ip settings with a single pb setting, which is a list of IP and port pairs.
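If I’m reading the release notes right, the resulting app.config entry looks something like this (the addresses and port here are example values, not defaults you should assume):

{riak_api, [
    %% previously: {pb_ip, "127.0.0.1"} and {pb_port, 8087}
    {pb, [ {"127.0.0.1", 8087},
           {"10.0.1.10", 8087} ]}
]}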

Total radness in paging of 2i

Secondary indexes now have results available via pagination. Check out this PR for bunches more info.
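Over HTTP the pagination surfaces as a max_results query parameter, with a continuation token handed back for fetching the next page. A sketch (the bucket, index, value and token are all hypothetical):

curl 'http://localhost:8098/buckets/users/index/last_name_bin/smith?max_results=100'
# the response includes a "continuation" value; pass it back for the next page
curl 'http://localhost:8098/buckets/users/index/last_name_bin/smith?max_results=100&continuation=g20AAAADb2Jq'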

Client-specified Timeouts

A timeout value, in milliseconds, can now be assigned to requests by clients. This can be used for object manipulation around fetch, store and delete, and for listing buckets or keys. It takes care of some timeout issues that may have been occurring during certain types of requests, and will come in handy for asynchronous requests and be pivotal if anyone goes the synchronous route.
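For example, a key listing with a client-specified timeout over HTTP would look something like this (a sketch; 5000 is the timeout in milliseconds and the bucket name is hypothetical):

curl 'http://localhost:8098/buckets/users/keys?keys=true&timeout=5000'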

Bucket Properties for Protocol Buffers

If you need to reset a bucket to its defaults, that is now possible. Beyond the reset to defaults, all bucket properties are now usable via Protocol Buffers. This can definitely help client usage of Protocol Buffers in a dramatic way.

List-buckets Streaming – Realtime

Listing keys or buckets via a streaming request will send bucket names to the client as they’re received. This prevents any need to wait for all nodes to respond, which helps with response times and timeouts from the client’s point of view. It also gives the ability to use the streaming features with Node.js, C#, Java and other languages and frameworks that support realtime streaming data feeds.
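Via the HTTP interface the streaming variant is just a query parameter, with results coming back chunked as each node answers (a sketch):

curl 'http://localhost:8098/buckets?buckets=stream'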

…these are the features that have jumped out at me, so until next release.

Conference Recap – The awe-inspiring quality & number of conferences in Cascadia!

Rails 2013 Conf (April 29th-May 1st)

The Rails 2013 Conference kicked off for me with a short bike ride through town to the conference center. The Portland conference center is one of the most connected conference centers I’ve seen; light rail, streetcar, bus, bicycle boulevards, trails & of course pedestrian access are all available. I personally have no idea if you can drive to it, but I hear there is parking & such for drivers.

Streetcars

Rails Conf, however, clearly places itself in the category of conferences run by people that give a shit! This is evident in so many things across the community, from the inclusive nature that creates one of the most diverse groups of developers, to the fact they handed out 7-day transit passes when you picked up your Rails Conf pass!

Bikes!

The keynote was by DHH (obviously, right?). He laid out where the Rails stack is, covered some roadmap topics & drew out how much the community has grown. Overall, Rails is now in a state of maintaining and growing the ideal. Considering its inclusive nature, I hope to see it continue to grow and to increase options for people getting into software development.

Railsconf 2013

I also met a number of people while at the conference. One person I ran into again was Travis, who lives out yonder in Jacksonville, Florida and works with Hashrocket. Travis & I, besides the pure metal, have Jacksonville as common stomping ground. Last year I’d met him while the Hash Rocket Crew were in town. We discussed Portland, where to go and how to get there, plus what Hashrocket has been up to in regards to use around Mongo, other databases and how Ruby on Rails was treating them. The conclusion, all good on the dev front!

One of these days though, the Hashrocket crew is just gonna have to move to Portland. Sorry Jacksonville, we’ll visit one day. 😉

For the latter half of the conference I actually dove out and headed down for some client discussions in the country of Southern California. Nathan Aschbacher headed up Basho attendance at the conference from that point on. Which reminds me, I’ve gotta get a sitrep with Nathan…

RICON East (May 13th & 14th)

RICON East

Ok, so I didn’t actually attend RICON East (sad face), I had far too many things to handle over here in Portlandia, but I watched over 1/3rd of the talks via the 1080p live stream. The basic idea of the RICON conferences is a conference series focused on distributed systems. Riak is of course a distributed database, falling into that category, but RICON is by no means merely about Riak. At RICON the talks range from competing products to academic, heavy-hitting talks about how, where and why distributed systems are the future of computing. They may touch on things you’re familiar with, such as:

  • PaaS (Platform as a Service)
  • Existing databases and how they may fit into the fabric of distributed systems (such as PostgreSQL)
  • How to scale distributed across AWS Cloud Services, Azure or other cloud providers
RICON East

As the videos are posted online I’ll be providing some blog entries around the talks. It will however be extremely difficult to choose which to review first; just as at RICON back in October of 2012, every single talk was far above the median!

Two talks immediately stood out. One was Christopher Meiklejohn’s (@cmeik) talk, doing a bit o’ proofs and all, in realtime and off the cuff. It was merely a 5-minute lightning talk, but holy shit this guy can roll through and hand off intelligence via a talk so fast it blew my mind!

The other talk was Kyle’s, AKA @aphyr, who went through network partitions with databases, basically destroying any comfort you might have with your database being effective at getting reads during a partition event. Kyle knows his stuff, that is without doubt.

There are many others, so subscribe, keep reading, and I’ll be posting them in the coming weeks.

Node PDX 2013 (May 16th & 17th)

Horse_js and other characters, planning some JavaScript hacking!

Holy moley we did it, again! Thanks to EVERYBODY out there in the community for helping us pull together another kick ass Node PDX event! That’s two years in a row now! My fellow cohorts Troy Howard (@thoward37) and Luc Perkins (@lucperkins) hustled like crazed worker bees to get everything together and ready. As always, a lot comes together at the last minute and we don’t get a wink of sleep until it’s all done and everybody has had a good time!

Node PDX Sticker Selection was WICKED COOL!

Node PDX is pretty self-descriptive. It’s a Node.js conference that also includes topics on hardware, JavaScript on the client side and a host of other things. It’s also Portland specific. We have Portland local roasted coffee (thanks Ristretto for the pour over & Coava for the custom roast!), Portland beer (thanks brew capital of the world!), Portland food (thanks Nicolas’!), Portland DJs (thanks Monika Mhz!), Portland bands and tons of Portland weirdness all over the place. It’s always a good time! With all the Portlandia spread all over it, the conference is one of the reasons that 8-12 people move to and get hired in Portland every year after the event (that range may grow, as there are a few people planning to make the move in the coming months!).

A wide angle view of Holocene where Node PDX magic happened!

The talks this year increased in number but maintained a solid range of topics. We had a Node.js disco talk, client-side JavaScript, sensors and Node.js, and we even heard people’s personal stories of how they got into programming JavaScript. Excellent talks, and as with RICON, I’ll be posting a blog entry with a few penny thoughts of my own on each talk.

Polyglot Conference 2013 (May 24th Workshops, 25th Conference)

Tea & Chris kick off Polyglot Conference 2013!

A smiling crowd!

Polyglot Conference was held in Vancouver again this year, with clear intent to expand to Portland and Seattle in the coming year or two. I’m super stoked about this and will definitely be looking to help out – if you’re interested in helping let me know and I’ll get you in contact with the entire crew that’s been handling things so far!

Polyglot Conference itself is a yearly conference held as an open spaces event. The way open space conferences work is described well on Wikipedia, where it is referred to as Open Space Technology.

The crowds amass to order the chaos of tracks.

The biggest problem with this conference is that it’s technically only one day. I hope that we can extend it to two days next year, and hopefully even have the Seattle and Portland branches go with an extended two-day itinerary.

A counting system…

This year the breakout sessions I attended included “Dev Tools”, “How to Be a Better Programmer”, “Go (Language) Noises” and other great sessions, and I threw down a session of my own on “Distributed Systems”. Overall, a great time and great sessions! I had a blast and am looking forward to next year.

By the way, I’m not sure if I’ve mentioned this at the beginning of this blog entry, but this is only THE BEGINNING OF SUMMER IN CASCADIA! I’ll have more coverage of these events and others coming up, the roadmap includes OS Bridge (where I’m also speaking) and Portland’s notorious OSCON.

Until the next conference, keep hacking on that next bad ass piece of software, cheers!

Backup Riak – Learning About Distributed Databases :: Issue 001

I’ve got more than a few series in the queue, so why not another one, eh! The intent is that I’ll grab a specific topic, break it down and add details related to distributed systems, primarily around Riak. I will diverge into other distributed databases too, but I’ll primarily be sticking to Riak. Without more introduction, the first topic is…

Backing Up and Recovery of Riak (Nodes)

I’ve been asked approximately 423,983,321.7 zillion times how this is done. So here’s a quick summary of the best ways to back up Riak and recover nodes, with the respective links.

When backing up Riak there are two key things that need to be copied to backup storage: the ring and data directories. Where these live is specific to the backend used with Riak. In addition to the core backup of the ring and data, it’s also a good idea to back up the configuration directory, which comes in useful when recovering.

The locations of the data depend slightly on the operating system being used, the two big variances being OS X and the Linux distros. On OS X the data path, ring data and configuration are located at the paths listed below:

  • Bitcask data: ./data/bitcask
  • LevelDB data: ./data/leveldb
  • Ring data: ./data/riak/ring
  • Configuration: ./etc

For each specific distro there are slight variations on the locations; for a full list check out the Basho Riak docs on backups. On Linux distros the paths are as follows:

Debian and Ubuntu

  • Bitcask data: /var/lib/riak/bitcask
  • LevelDB data: /var/lib/riak/leveldb
  • Ring data: /var/lib/riak/ring
  • Configuration: /etc/riak

Fedora and RHEL

  • Bitcask data: /var/lib/riak/bitcask
  • LevelDB data: /var/lib/riak/leveldb
  • Ring data: /var/lib/riak/ring
  • Configuration: /etc/riak

Other Operating System Paths

Freebsd

  • Bitcask data: /var/db/riak/bitcask
  • LevelDB data: /var/db/riak/leveldb
  • Ring data: /var/db/riak/ring
  • Configuration: /usr/local/etc/riak

SmartOS

  • Bitcask data: /var/db/riak/bitcask
  • LevelDB data: /var/db/riak/leveldb
  • Ring data: /var/db/riak/ring
  • Configuration: /opt/local/etc/riak

Solaris

  • Bitcask data: /opt/riak/data/bitcask
  • LevelDB data: /opt/riak/data/leveldb
  • Ring data: /opt/riak/ring
  • Configuration: /opt/riak/etc

When backing things up, it’s important to note that each node could have slightly inconsistent data. That data, however, is rebuilt by Riak’s read-repair system once the node is recovered and brought back into use.

Backup Jobs

One of the easiest ways to back up Riak is to set up a cron job with your choice of cp, rsync or tar, then get those files onto whatever your choice of backup medium is. An example tar command to back up a Bitcask backend is shown below (snagged from the documentation) just to give you an idea of where to start.

tar -czf /mnt/riak_backups/riak_data_`date +%Y%m%d_%H%M`.tar.gz /var/lib/riak/bitcask /var/lib/riak/ring /etc/riak
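Dropped into a crontab for a nightly 2am run, that same command looks like this (note the backslash-escaped % signs, since cron treats a bare % as a newline):

0 2 * * * tar -czf /mnt/riak_backups/riak_data_`date +\%Y\%m\%d_\%H\%M`.tar.gz /var/lib/riak/bitcask /var/lib/riak/ring /etc/riak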

For a LevelDB backend the most important thing to note is that the node must be stopped. The basic workflow for backing up a node in this manner is to stop the node, back up the data, ring and configuration, and then start the node back up.
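In shell form, using the Debian/Ubuntu paths from above, a sketch of that workflow:

riak stop
tar -czf /mnt/riak_backups/riak_leveldb_`date +%Y%m%d_%H%M`.tar.gz /var/lib/riak/leveldb /var/lib/riak/ring /etc/riak
riak start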

Backup Recovery / Restoring

When recovering data onto a node that replaces an existing node with the same name (fully qualified or IP), follow the steps below:

  1. Install Riak
  2. Restore the old node’s configuration, data & ring.
  3. Start the node

Once you’ve got the node started back up, it’s a good idea to run a ping or status against it to verify it’s in a good state.
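Both checks are one-liners from the node itself; ping should answer pong, and status dumps the node’s statistics, including ring membership:

riak ping
riak-admin status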

If node names have been changed there are additional steps.

  1. Mark the original instance down
    riak-admin down <node_name>
  2. Join the restored node to the cluster
    riak-admin cluster join <cluster_node>
  3. Replace the original node with the restored one
    riak-admin cluster force-replace <old_node_name> <new_node_name>
  4. Get the cluster plan built
    riak-admin cluster plan
  5. Commit the changes
    riak-admin cluster commit
  6. Change the -name setting in the vm.args configuration file to match the new name.
  7. Change & verify that the IP reflects the instance’s IP in app.config for the HTTP and Protocol Buffers interfaces.
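Pulled together with hypothetical names, where riak@10.0.1.1 is the dead original, riak@10.0.1.5 is its restored replacement (these commands run from the restored node) and riak@10.0.1.2 is any running member of the cluster:

riak-admin down riak@10.0.1.1
riak-admin cluster join riak@10.0.1.2
riak-admin cluster force-replace riak@10.0.1.1 riak@10.0.1.5
riak-admin cluster plan
riak-admin cluster commit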

Cluster Backups via Riak Enterprise Multi-Data Center (MDC)

In the sections above I wrote about the traditional backup approaches, very similar to the way an RDBMS is backed up. With a distributed system like Riak, however, there is another great alternative if you’re utilizing multiple data centers and Riak Enterprise. In this version of Riak, which is basically Riak with additional features and capabilities, one of the possible backup scenarios is to use Multi-Data Center (MDC) replication to maintain a duplicate cluster as an active, real-time and always-ready backup.

One exceptionally effective backup workflow is to set up the “backup” cluster beside the currently operating cluster. As an example, if your cluster is running in AWS in X region and Y zone, you’d want to put the backup cluster in that same region and zone. Once you’ve set up Riak Enterprise and MDC, just kick off a full sync. Once the full sync is done you can detach the backup cluster, and it provides a point-in-time backup of the data.

riak-repl start-fullsync

It’s easy to schedule full-sync operations for low-usage periods, and it’s also possible to pause and resume them.

riak-repl pause-fullsync
riak-repl resume-fullsync
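Scheduling the sync for a low-traffic window can be as simple as a cron entry on a cluster node (a sketch, assuming a 3:30am lull):

30 3 * * * riak-repl start-fullsync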

The variations on backing up data with Riak Enterprise and MDC are pretty expansive: taking a point-in-time backup, maintaining a secondary live copy of the data, using the replication as a data dump to another cluster, or even just using MDC replication to dump all of the data to a single instance.

File System Snapshots

One other technique that is extremely efficient, fast and thorough is snapshotting the file system. The backup workflow for snapshots is extremely easy: stop Riak, snapshot, then start Riak again. Of all the methods, snapshotting is one of the easiest options. Just like setting up a cron job, automating snapshots on a pre-defined schedule and meshing that with automated stops and starts of Riak provides a very thorough backup.
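As a sketch, on a node whose data directory lives on an LVM logical volume (the riak_vg/riak_lv names are hypothetical; ZFS and friends follow the same stop, snapshot, start pattern):

riak stop
lvcreate --snapshot --size 10G --name riak_snap /dev/riak_vg/riak_lv
riak start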

With these options, have fun strategizing your stratagems into strategies for backups.

Diskettes

One of the oldest, tried and true backups is the old diskette. The bestest way to back up with diskettes is to back up each node on three diskettes each, then send one of each to a geographically dispersed bank lock box or other secure facility. Do this for each node, and if need be use as many diskettes per node as needed. A particularly useful method is to use the sharded zip strategy to stripe a backup across many diskettes. Once each lock box has a copy of every node in the cluster, you’ll have one of the most secure backups in existence. Nothing compares to the diskette backup!

References:

  1. Basho Docs – Backups
  2. Basho Docs – MDC Full Sync

Distributed Coding Prefunc: Rebar Multinode Riak Core

Before diving into this entry, you might want to check out some of my other entries on getting Erlang installed along with appropriate testing frameworks. Moving on…

At Basho we’re always trying to make it easier to do big things. A short time ago we pushed forward on Rebar and Riak Core, putting things together to make it simpler to get kick-started building distributed systems like the Riak database itself. There’s way more that is possible, which I’ll get into in just a minute. Before diving into some of those things, here are a few quick links & some context on what exactly Rebar and Riak Core are.

Riak Core
Github: https://github.com/basho/riak_core

Riak Core has been available for quite some time, and we’ve been hustling for a while putting together a robust array of material around it. One excellent place to start learning about Riak Core is the “Introducing Riak Core” article published on the Basho blog a while back. Riak Core, or riak_core, is the underpinning that Riak is built on. It provides many features to get you started building distributed systems, among them the ability to track and manage the nodes, clusters and related pieces of a distributed architecture within a system.

Rebar
Github: https://github.com/basho/rebar
Wiki: https://github.com/basho/rebar/wiki

Rebar is an Erlang build tool that helps you put together Erlang projects, such as ones based on Riak Core.

Rebar Riak Core
Github: https://github.com/basho/rebar_riak_core

The Rebar Riak Core repository provides templates that help you start writing things like the Riak database itself. It sets up Riak Core via template scripts: an N-node cluster devrel, vnodes, etc. Once you’re up and running, it can be used to help develop distributed, scalable and fault-tolerant applications.

For more on the Rebar Riak Core templates, check out the README.md in the github repository. There are some great examples of how to get a multinode devrel running in a few steps.

Rebar Riak Core Quick Start

The quickest way to get started using the Riak Core and Rebar scripts is to grab the prebuilt binary, or you can clone and build Rebar yourself if you’d like all the things. To download the executable and make it ready to run via wget (or use your preferred method to download):

wget http://cloud.github.com/downloads/basho/rebar/rebar && chmod u+x rebar

To clone and build the repository instead:

$ git clone git://github.com/rebar/rebar.git
$ cd rebar
$ ./bootstrap

Now the easiest way possible is to use the Riak Core templates via a quick git clone. After cloning the repo, copy the templates to the rebar templates directory (note that you’ll need to create this initially), then create a working directory to put the project in and navigate into it.

git clone git://github.com/rzezeski/rebar_riak_core.git
mkdir -p ~/.rebar/templates
cp rebar_riak_core/* ~/.rebar/templates
mkdir projectNameHere
cd projectNameHere

Now that a template is available, run the following command to create the Erlang Project.

rebar create template=riak_core_multinode appid=rabbits nodeid=rabbits

You’re now ready to go to work using Rebar and the project you’ve created from the template. I followed the try-try-try example repo for the steps above; check it out for a great walkthrough that dives deeper into Riak Core, each small element of the project and the files created, with a multi-node project as the sample.
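From here the generated Makefile can stand up a local multi-node cluster. A sketch of the next steps, assuming the generated project follows the multinode template’s devrel conventions covered in the try-try-try walkthrough (the rabbits script names come from the appid used above):

make devrel
for d in dev/dev*; do $d/bin/rabbits start; done
for d in dev/dev{2,3}; do $d/bin/rabbits-admin join rabbits1@127.0.0.1; done
dev/dev1/bin/rabbits ping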

So what to do now?

This is where it’s time to throw around some creativity and get to real solutions for real problems. Building distributed systems is becoming more and more paramount to effective usage of infrastructure and systems, and using Riak Core is an ideal place to start building out your own distributed system. Below are a few ideas the team was brainstorming on. Over the coming weeks we’ll be putting together material that outlines ways to not only get started, but to implement systems like these.

Distributed Web Caching Tier

Caching tiers often come up in conversations, whether related to distributed systems or not, and often end up on the distributed topic. The question resounds, “how do I create a caching tier that can be distributed and provide real session and state management, cached elements, live data and other needs?” Well, Riak Core is a great place to start developing a custom distributed caching tier, one that could even pull together Riak KV (the Riak database implemented on Riak Core), Redis, RabbitMQ or many other solutions to provide the appropriate cache at the appropriate tiers of an application architecture.

In House Cluster Monitoring & Smart Resolver

One thing Riak Core could be used for to great effect is a multi-node, clustered and geographically dispersed monitoring system for any multi-data center application. This could be built out for almost any environment, with custom specifics or the completely generic situation of pizza-box servers. Because of the distributed concepts behind Riak Core, it would provide an ideal basis for monitoring, and for re-launching or otherwise dealing with systems that need high uptime and must be recovered as fast as possible when they go down.

Logging, Web, Server, and Business Analytics

In any situation where analytics are collected there are often dozens if not thousands of servers, various systems and even numerous devices that may be emitting data via services or other mediums. Riak Core is a great place to lay the groundwork for a distributed system that maintains a massive store of managed data for fast searching of analytics. This could be the groundwork for biotech research analytics, analysis of market data or a dozen other things that need highly available systems storing vast data with map reduce or other search capabilities. Think Business Intelligence (BI) with serious technological power.

Multi-node Project

Of course, as with the example I used to create the first sample above, dive into the try-try-try tutorial for some great multi-node how-to. If you have any questions, jump in and ping me on Twitter @adron, ping @basho, join the mailing list, or hop into the #riak IRC channel on Freenode.