A few days ago Troy Howard, Jeremiah Peschka and I all traveled via Amtrak Cascades up to Seattle. The mission was simple: Jeremiah was presenting “Riak in a .NET World”, I was handling logistics and Troy was handling video.
So I took the video that Troy shot, edited it, put a soundtrack to it and let Jeremiah’s big data magic shine. He covers the basics of RDBMSes (SQL Server in this case, though most of it applies to any RDBMS). These basics bring us up to where and why an architecture needs to shift from an RDBMS solution to a distributed solution like Riak. After stepping through some of the key reasons to move to Riak, Jeremiah walks through a live demo of using CorrugatedIron, the .NET client for Riak (Github repo). During the walkthrough he covers the specifics of how CorrugatedIron interacts with Riak through indexes and buckets, and during puts and pulls of data.
Toward the end of the video Joseph Blomstedt @jtuple, Troy Howard @thoward37, Jeremiah Peschka @peschkaj, Clive Boulton @iC and Richard Turner @bitcrazed join in for questions and discussion. Also note, I’ve enabled download for this specific video since it is a large one (1.08GB total), so you may want to download it and watch it locally if you don’t have a super reliable high speed internet connection.
Whew, it’s been a total blast working at Basho. I’ve accomplished a ton of things. Riak is a solid distributed database system and I’m glad to have worked with the team on advocating its use, teaching distributed systems ideas and concepts and generally spreading the knowledge. I’ve seen some truly great things that people are hacking together, setting up for projects and redesigning old systems to utilize newer, better, faster and more capable distributed systems concepts and ideas. Here are some of the things I’m happy to have contributed to in my time at Basho:
I helped negotiate and get an effort started that came to fruition with Tier 3 releasing a Riak CS backed object store for their customers. A very cool feature added to their already formidable enterprise cloud offerings. Read more about that here “Tier 3 Object Storage” and here “Tier 3 Launches Global Object Storage”. Their implementation is really nice, including many geographic regions of accessibility, S3 API compatibility and high end storage capabilities that offer a bigger punch of performance than your average object storage in the cloud!
I got both the Seattle Riak (up to 52 members now!) and Portland Riak (up to 74 members now!!) groups started, which you should join; they’re a good time, good conversation and great information.
I partnered with Troy Howard @thoward37 to run the second year of Node PDX. Basho was excellent enough to contribute not just a few bucks but also sent Chris Meiklejohn @cmeik out to speak at the conference.
I got to work directly with a number of people at Windows Azure, AWS and EngineYard on deploying Riak, testing out how the respective images (Azure VM Depot & AWS AMI) and deployments (Riak EDS) would work. In the end, this has been a great opportunity to learn more about the latest and greatest of each of these services. I’ve been impressed, as they’ve each been doing a seriously kick ass job lately!
…and there has been a whole lot more. Suffice it to say, Basho has provided me with some sweet opportunities to work on some extremely interesting data projects from a very data sciency point of view (yeah, I know sciency ain’t a word). There may be more Riak work and Riak meetups and Riak hacks and Riak who-knows-what coming from me, but the meetups & such are now in the hands of the core Riak crew and…
Where Am I Headed?
Right now, I’m moving 20 blocks away from where I currently live, setting up a couch to hack on and grabbing a beer. I’ve got a few personal projects I’ve been wanting to work on. Then I’m taking a few weeks to do some side projects that have been on the back burner. Keep an eye out, I’ll be kicking off one, maybe two of these open source projects in the next few days. As @tsantero tweeted…
i wish i had the time to work on even 5% of the ideas in my notebook :
…I’m going to attack my own notebook of ideas. Maybe I’ll even work on that Riak CS Video object store that Tom and I spoke about 10 months ago? Either way, whatever the projects are, I’ll have them posted right here. Until then…
As I’ve started working on a Windows 8 project here at Basho, there are a few pieces of collateral that help bring some branding and appeal to the application’s appearance. The first two things to grab if you want to build a good looking Riak + Windows 8 application are the design assets.
Basho Design Assets : This includes several transparent images for the Basho & Riak, Riak CS and Riak Enterprise logos. Toward the bottom of the page there are also a number of Bashomen that you can download.
Windows 8 Store Design Assets : This page includes a lot of downloadable Photoshop & other design assets for putting together a Windows 8 user interface and experience.
Windows 8 Design Guidelines : This page shows how the Windows 8 interactions are supposed to work, how to develop around them and what their best uses are.
To check out the design assets I’ve put together, I’ve created a github repository, junction_design_assets. In this repository are all of the Adobe Photoshop *.psd files for each logo, wide logo, small logo, store logo, badge logo and splash screen image for the upcoming application. I’ve also attached an Apache 2.0 license for use by others. All of the images have a transparent background and are set to display against a black or other dark background.
Here’s a quick intro to where all of these design assets go, via Visual Studio 2012. The first step was to create a good splash screen, as defined by the Windows 8 guidelines, at sizes of 1116x540px, 868x420px and 620x300px.
Setting up the Splash Screen (Click for full size image)
A: Open the Package.appxmanifest file and the designer will display a screen used for editing the file, as shown above.
B: In this case I changed the background color to black since I created all of the images to display on a dark background. One can’t get any darker than the darkest black, and this is that black!
C: When you click on the ellipsis to select each of the images, it will upload the image to the Assets folder within the project. With that in mind, DO NOT manually put the images into the project. Use this screen or you’ll end up with all sorts of headaches.
Also important is the default SplashScreen.png that is put into the Splash screen: text box. Don’t fill this in first; instead, click each ellipsis under each splash screen image size and select an image for each size. The odd thing (one of those completely non-intuitive things) is that the Assets\SplashScreen.png file doesn’t actually exist in the Assets directory, but is instead created somehow through pixie dust magic and added to the project. This will be the same for other sections of the Package.appxmanifest file settings and the logo images that are to be added. So remember NOT to change this, but to instead upload all of the other images required first.
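For reference, behind the designer the manifest editor is just writing entries like these into the manifest’s XML (a rough sketch from memory of the Windows 8 manifest schema; the display name and asset file names here are this project’s assumptions, not values from the official docs):

```xml
<VisualElements DisplayName="Junction"
                Logo="Assets\Logo.png"
                SmallLogo="Assets\SmallLogo.png"
                Description="Junction"
                ForegroundText="light"
                BackgroundColor="#000000">
  <!-- Splash screen image selected via the ellipsis, shown on the black background. -->
  <SplashScreen Image="Assets\SplashScreen.png" BackgroundColor="#000000" />
  <DefaultTile WideLogo="Assets\WideLogo.png" />
</VisualElements>
```

If the designer ever gets confused, opening the manifest with the XML editor and checking these paths is a quick sanity check.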
The second step then was to get a good logo as defined by the guidelines at 270x270px, 210x210px, 150x150px and 120x120px. For this I went with the Riak Graphic Logo with images as shown at the different sizes.
Adding the Logo (Click for full size image)
Next up is the Wide Logo…
Wide Logo (Click for full size image)
The Small Logo…
Small Logo (Click for full size image)
…and the Store Logo.
Store Logo (Click for full size image)
This is the first step, and it’s where I’m at so far with this application. The application is also available under an Apache 2.0 license with the code on github. If you’d like to jump into the code and help me build this application, please feel free to reach out to me on Twitter @adron or hit me up on Github @adron.
This week I’ve traveled to Philadelphia to meet with a number of the Basho team to work together and train with the trainers on the best ways to approach content on Riak, and more generally the best ways we can all brainstorm to approach specific topics. Some of those topics include things like:
Access Patterns around Log Storage & Analysis
Bloom Filters
CRDTs
Consensus Protocols
Erasure Coding
LevelDB and Bitcask Backends
MDC (Multi-Datacenter) Replication
Out of the options we discussed in training today, I ran with benchmarking. It is always near and dear to many of the customers, clients and curious folks I talk to. I dove in to see what exactly we offer with basho_bench (docs info, github repo) in detail and functionality, but also looked into other benchmarks that others may have run in the past.
basho_bench
What exactly is basho_bench? The basho_bench project is a code repo on Github that offers a set of benchmarking tests to run against a Riak cluster. There are a couple of prerequisites to the quick steps below: a working Erlang installation to build and run the tool, and R if you want to generate the summary graphs.
[sourcecode language=”bash”]
git clone git://github.com/basho/basho_bench.git
cd basho_bench
make all
[/sourcecode]
Once that is done building, review the directory structure that is in the basho_bench directory. The following should be available in the directory.
[sourcecode language=”bash”]
$ ls
FAQ deps rebar
LICENSE ebin rebar.config
Makefile examples src
README.org include tests
basho_bench priv
[/sourcecode]
The examples directory has several default config files available to run with basho_bench for testing. If there is a devrel setup with the default 127.0.0.1 IP usage, just run the following command to begin generating stats. If the cluster being tested is not a devrel with 127.0.0.1 then give the configuration section of the docs a read for information on how to point basho_bench at an alternative cluster.
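Assuming a devrel listening on 127.0.0.1 and the stock HTTP example config, the run looks something like this (the paths and commands follow the basho_bench README, so adjust if the repo layout has changed):

```shell
# Run the benchmark using the example HTTP configuration; raw stats are
# written under tests/, with tests/current pointing at the latest run.
./basho_bench examples/http.config

# Build the summary graph from the latest run (requires R). The Makefile
# target wraps: Rscript --vanilla priv/summary.r -i tests/current
make results
```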
The reason I mention both commands is that ‘make results‘ doesn’t always seem to work to build the results, while manually executing the summary script (Rscript --vanilla priv/summary.r -i tests/current) will actually get the results built. With the results built, check the tests directory in the basho_bench directory for the summary.png file. If you open the file it should look something like this.
Default empty http.config results from basho_bench. (Click for full size image)
From here you can now run basho_bench and get results specific to basho_bench. However, this leads me to a more abstract topic: why do benchmarking in the first place?
Why Benchmark? How to Benchmark!
The definition of benchmark:

bench·mark [bench-mahrk], noun

1. a standard of excellence, achievement, etc., against which similar things must be measured or judged: The new hotel is a benchmark in opulence and comfort.
2. any standard or reference by which others can be measured or judged: The current price for crude oil may become the benchmark.
3. Computers. an established point of reference against which computers or programs can be measured in tests comparing their performance, reliability, etc.
4. Surveying. Usually, bench mark. a marked point of known or assumed elevation from which other elevations may be established. Abbreviation: BM.
5. adjective. of, pertaining to, or resulting in a benchmark: benchmark test, benchmark study.
While basho_bench provides an interesting baseline test that produces various pieces of data to work with, it shows nothing by default that is specific to YOUR use case. basho_bench is not your production environment, it is not your dev or user acceptance testing criteria; it is an example. To get numbers that truly encompass the needs of your project, you will need to provide custom configuration for basho_bench or write your own specific benchmark.
The reason is that with Riak, as with other NoSQL solutions, you’re working toward a goal that is very data specific and initially unknown. Each use case has its own domain logic and criteria, and a custom benchmark can provide real data related to that domain logic and criteria.
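As a sketch of what that looks like in practice, a custom basho_bench configuration might shape the key space, value sizes and operation mix after the application’s real workload. The terms below follow the example configs shipped in the repo, but the driver choice, generators and numbers are illustrative assumptions, not recommendations:

```erlang
%% Run flat out for 10 minutes with 5 concurrent workers.
{mode, max}.
{duration, 10}.
{concurrent, 5}.
%% Protocol Buffers driver pointed at a local node.
{driver, basho_bench_driver_riakc_pb}.
{riakc_pb_ips, [{127,0,0,1}]}.
%% Keys and values shaped like the application's real data.
{key_generator, {int_to_bin_bigendian, {uniform_int, 100000}}}.
{value_generator, {fixed_bin, 4096}}.
%% A read-heavy mix: four gets for every update.
{operations, [{get, 4}, {update, 1}]}.
```

The point is that every one of these knobs should be derived from measurements of your own application, not copied from an example.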
In the end, even though basho_bench is a great tool to get started, run basic tests with, and borrow ideas from, it is not a panacea benchmark. You’ll need to create the specific benchmark for your use case yourself.
Over the last couple of years, two prominent open source PaaS solutions have come onto the market: Cloud Foundry & OpenShift. There’s been a lot of talk about these plays, and the talk has slowly but steadily turned into traction. Large enterprises are picking these up and giving their developers and operations staff a real chance to make changes, sometimes disruptive in a very good way.
However, with all the grandeur I’m going to hit on the negatives. These are the missing parts, the serious pain points beyond just some little deployment nuisance. Then a last note on why, even amidst the pain points, you still need to make real movement with PaaS tooling and technologies.
Negative: The Data Story is Lacking
Both Cloud Foundry and OpenShift have a way to plug into databases easily.
OpenShift has what are called Cartridges which provide the ability to add databases and other services into the system. For more information about the cartridges check out Red Hat’s OpenShift Documentation and also the forums.
Cloud Foundry and OpenShift, however, have distinctive weak spots when it comes to services that go beyond a mere single-instance database. In the case of a true distributed database such as Cassandra, HBase or Riak, it is inordinately difficult to integrate in a way that any PaaS inter-operates with well. In some cases it’s pointless to even try.
The key problem is that both of the PaaS systems assume the mantle of master while subjugating the distributed database to a lower tier of coordination. The way to resolve this at the moment is to do an autonomous installation of Riak, Cassandra, Neo4j or another database that may be distributed, hot swappable, or otherwise spread across multiple machines or instance points, then create a bound connection between it and the PaaS-hosted application. This is the big negative in PaaS systems and tooling right now: the data story just doesn’t extend well to the latest in data and database technologies. I’ll elaborate more on this below.
Negative: Deployment is Sometimes Easy, Maintenance is Sometimes Hard
Cloud Foundry is extremely rough to deploy unless you use Bosh to deploy to either VMware virtualized instances or AWS. You could, if resources were available, get Bosh to deploy your Cloud Foundry environment anywhere you wanted; however, that’s not easy to do. Bosh is still a bit of a black box. I myself, along with others in the community, am working to document Bosh, but it is slow going.
OpenShift is dramatically easier to deploy, but is missing a few key pieces once deployed that draw some additional operational overhead. One of those is that OpenShift requires more networking management to handle routing between various parts of the PaaS Ecosystem.
Overall, this boils down to what you need between the two PaaS tool chains. If you want Cloud Foundry’s automatic routing and management between nodes, it is a viable route; but if your team wants to manage the networking tier more autonomously from the PaaS environment, then maybe OpenShift is the way to go. In the end, it’s bumpy territory to determine which you may or may not want based on that.
Negative: Full Spectrum Polyglot, Missing Some
Cloud Foundry has a wider selection of languages and frameworks, with community involvement from groups like Iron Foundry. OpenShift, I’m sure, will be getting to parity in the coming months. I have no doubt that both of these PaaS ecosystems will expand to new languages and frameworks over time. Being polyglot, after all, is a no brainer these days!
Why PaaS Is, IMHO, Still Vitally Important
First, toss out the idea that huge, web scale Facebooks and Googles need to be built. Think about what the majority of developers out there in the world work on: tons and tons of legacy or greenfield enterprise applications. Sometimes a developer is lucky enough to work on a full vertical mix of things for a small business, but generally, the standard developer in the world is working on an enterprise app.
PaaS tooling takes the vast majority of that enterprise app maintenance from the operational side and tosses it out. Instead of managing a bunch of servers with a bunch of different apps, the operations team manages an ecosystem that has a bunch of apps. For the enterprises that have enough foresight, and have managed their IT assets well enough to be able to implement and use PaaS tooling, this is HUGE!
For companies working to stay relevant in the enterprise, for companies looking to make inroads into the enterprise and especially for enterprises that are looking to maintain, grow or struggling to keep ahead of the curve – PaaS tooling is something that is a must have.
Just ask a dev: do they want to spend a few hours configuring and testing a server, or do they want to deploy their application and focus on building more value into that application?
…having spent a few years as the developer myself, I’ll hedge on the side of adding value.
What’s Next?
So what’s next? Two major things in my opinion.
1. Fill the data gap. Most of the PaaS tooling needs to bridge the gap in the data story. I’m doing my part with testing, development and efforts to get real options built into these environments, but this often leads back to the data story of PaaS being weak. What’s the solution here? Talks and planning sessions are ongoing, and we’ll eventually get a solid solution around the data side.
2. Fix deployments & deployment management. Bosh isn’t straightforward or obvious in what it does, and Cloud Foundry is easily the hardest thing to deploy, with many dependencies. OpenShift is easier to deploy, but neither of them actually has a solid management story over time. Bosh does some impressive updates of Cloud Foundry, and OpenShift has some upgrade methods, but over time and during day to day operations there haven’t been any clear cut wins for viewing, monitoring and managing nodes and data within these environments.