Foursquare Outage A Cloud(y) Issue

Even the big guys have issues.

Foursquare, the gamelike location-based social network that so many people appear to love, dropped off the face of the map yesterday for an extended period of time. Stuff happens, we all know that. Foursquare is a startup that is likely subject to the same kind of technical growing pains that Twitter has experienced over the last two years, so that's not terribly surprising. What struck me as interesting is that the Foursquare outage appears to be the result of a power failure in the Amazon EC2 (cloud) facility in northern Virginia. That's right. A power failure in an Amazon datacenter. *** GASP ***.

If you've spent at least 8 seconds reading any hosting-related blogs or websites in the last year, you've heard about cloud computing. It's that bulletproof, instantly scalable, wonderfully elastic and adaptable service that allows you to spin up all kinds of virtual machines on demand and never goes down because there's so much redundancy built into the cloud. The Cloud is all-knowing. The Cloud is all-powerful. The Cloud doesn't have a curfew and can stay out as late as it wants - even on a school night.

Well, evidently The Cloud, or at least a portion of Amazon's cloud, tripped and skinned its knee yesterday. Twice. According to the Amazon AWS status report, some virtual machine instances went down in a single EC2 availability zone twice on May 4, both times due to "localized power distribution failure". Each event took about an hour to fix, then another 2-3 hours to recover from as instances were spun back up.

Before you accuse me of delighting in the misfortune of The Cloud, I am fully aware that the outages were isolated to a single EC2 availability zone and that replacement instances were available to all customers during both outages. I'm also well aware that any customer with the foresight and budget to load balance instances across multiple availability zones would not have been impacted by these outages. The point is, even the much-revered cloud computing model requires a good bit of planning and implementation that isn't necessarily cheap. Contrary to popular belief, cloud hosting is not the love child of bulletproof and economical. Cloud computing is a complex construct that has lots of excellent attributes, but it requires a reasonable amount of time and money to realize the benefits of those attributes.
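To make the multiple-zone point concrete, here is a minimal sketch in Python. It is a toy model, not Amazon's API: the zone names ("zone-a", "zone-b") and both helper functions are made up for illustration. All it does is count how many instances are still standing after one zone loses power.

```python
from collections import Counter
from itertools import cycle


def place_instances(count, zones):
    """Assign `count` instances to `zones` round-robin; return a list of zone names."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(count)]


def surviving_instances(placement, failed_zone):
    """Count the instances that are not in the failed zone."""
    return sum(1 for zone in placement if zone != failed_zone)


# Hypothetical zone names, for illustration only.
single_zone = place_instances(4, ["zone-a"])
multi_zone = place_instances(4, ["zone-a", "zone-b"])

print(Counter(multi_zone))                         # how the instances are spread
print(surviving_instances(single_zone, "zone-a"))  # everything was in zone-a
print(surviving_instances(multi_zone, "zone-a"))   # half the fleet is still up
```

Notice what the sketch implies about budget: to keep serving at full capacity through a zone failure, the surviving half has to be big enough to carry the whole load, which means paying for more instances than you need on a good day.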

My guess in this instance is that Foursquare was not load balancing between zones and was therefore caught in a single-point-of-failure scenario. Ouch.

We're asked on a regular basis what our plans are regarding cloud computing and hosting. In future blog entries I'm sure I'll talk about cloud computing, clustering and what our plans are. For now I'll just leave this little story to stand on its own merits as an illustration of how even The Cloud has issues that require time and money to address.

Founded in 1996, MacConnect is the first and largest Mac-Centric ISP on the planet. Providing world class hosting solutions that are as easy as your Mac, MacConnect is the first choice for any Mac user in need of web, email and application hosting. Find us online at MacConnect.com



Tags: Blogs, All Things infrastructure

Joe Ryan (http://www.joeryan.com)
