I’ll be presenting on the AWS Toolkit for Visual Studio 2010 in the very near future (check out the SAWSUG Meetup on October 12th, which is this Wednesday). I’ll be covering a number of features of the new AWS Toolkit for Visual Studio. My slides are available below, with links to the Google Docs and SlideShare versions.
Direct link to Google Docs Presentation or the SlideShare Presentation.
The code for the presentation is available on GitHub under AWS-Toolkit-Samples. Beware: this code will change over time, though the core will stay the same.
Ok. First a few facts.
AWS has had a data center problem that has been ongoing for a couple of days.
AWS has NOT been forthcoming with much useful information.
AWS still has many data centers and cloud regions up and running, able to keep their customers’ sites live.
Many people have NOT built their architecture to be resilient in the face of an issue like this downtime. It all comes back to the mantra of “keep a backup,” and many companies have NOT done that.
Now a few personal observations and attitudes toward this whole thing.
If your site is down because you had a single point of failure in one zone of one AWS region, that’s your bad architectural design, plain and simple. You NEVER build a site like that if you actually expect to stay up 99.99%, or even 90%, of the time. Anyone in the cloud business, SaaS, PaaS, or otherwise, should know better than that. Every time I hear someone from one of these companies whining about how it was AWS’s responsibility, I ask: is the auto manufacturer responsible for the 32k dead in 2010? How about the 50k dead in the year of peak automobile deaths? Nope, those deaths are the responsibility of the drivers. When you get behind the wheel you need to, you MUST, know what power you wield. You might laugh, you might jest that I use this analogy, but I wanted to use an example à la Frédéric Bastiat (if you don’t know who he is, check him out: Frédéric Bastiat). Cloud computing, and its use, is the user’s responsibility: build your system well.
One of the common things I keep hearing over and over about this is, “…we could have made our site resilient, but it’s expensive…” OK, let me think for a second. Ummm, I call bullshit. Here’s why. If you’re a startup of the most modest means, you probably need at least $100-300 of services (EC2, S3, etc.) running to make sure your site can handle even basic traffic at a reasonable business level (i.e. 24/7, some traffic peaks, etc.). With just $100 one can set up multiple EC2 instances in DIFFERENT regions, load balance between them, and make sure they’re using a sensible storage medium (RDS, S3, SimpleDB, Database.com, SQL Azure, and the list goes on and on). There is zero reason that a business should have its data stored ON the flippin’ EC2 instance. If it is, please go RTFM on how to build an application for the Internets. K Thx. Awesomeness!! 🙂
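To make the point concrete, here’s a minimal sketch of the failover idea: run instances in more than one region, health-check them, prefer your primary region, and fall back to any other healthy one. This is an illustrative assumption, not code from the presentation; the region names and the pluggable `is_healthy` check are hypothetical stand-ins for whatever DNS or load-balancer health checks you’d actually use.

```python
import urllib.request

def pick_region(regions, is_healthy, preferred="us-east-1"):
    """Prefer the primary region; fail over to any other healthy region.

    regions    -- list of region names running a copy of the site
    is_healthy -- callable(region) -> bool, e.g. an HTTP health check
    Returns the region to route traffic to, or None if everything is down.
    """
    alive = [r for r in regions if is_healthy(r)]
    if preferred in alive:
        return preferred
    return alive[0] if alive else None

def http_health_check(url, timeout=2):
    """One possible is_healthy building block: does the endpoint answer 200?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False  # down or unreachable -- treat as unhealthy
```

For example, `pick_region(["us-east-1", "us-west-1"], lambda r: r != "us-east-1")` returns `"us-west-1"` when the primary is down. The key design point is the same one made above: because state lives in S3/RDS/etc. rather than on the instance, swapping which region serves traffic loses nothing.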
Now there are some situations, like when Windows Azure went down (yeah, the WHOLE thing) for an hour or two a few months after it was released. It was, however, still in “beta” at the time. If ALL of AWS went down, then the people who have not built a resilient system could legitimately complain right along with everyone who did. But companies such as Netflix, AppHarbor, and thousands of others have had no downtime from this data center problem AWS is having. Unless you’re deliberately running a single instance to keep your bill around $15 a month, I see ZERO reason you should still be whining. Roll your site up somewhere else, get your act together, and ACT. Get it done.
I’m honestly not trying to defend AWS either. On that note, their response time and responses have been absolutely horrible. There has been almost ZERO social media, forum, or other communication that remotely resembles educated technical people working on the problem. In addition, Amazon has allowed the media to run wild with absolutely inane and sensational headlines about the whole incident. Not only are they not providing information to their customers, but whatever internal mechanisms are supposed to manage this type of incident from a public relations standpoint are acting painfully slowly. From a technology company, especially one of Amazon’s capabilities and technical prowess (generally, they’re YEARS ahead of others), this is absolutely unacceptable, and disrespectful on a personal level.