
On Jun 30, 2012 12:25 AM, "joel jaeggli" <joelja@bogus.com> wrote:
On 6/30/12 12:11 AM, Tyler Haske wrote:
I am not a computer science guy, but I have been around a long time. Data centers and clouds are like software. Once they reach a certain size, it's impossible to keep the bugs out. You can test and test your heart out, and something will still slip by. You can say the same thing about nuclear reactors, Apollo moon missions, the Northeast power grid, and most other technology disasters.
How to run a datacenter 101: have more than one location, preferably far apart. It being Amazon, I would expect more. :/
there are 7 regions in ec2: three in north america, two in asia, one in europe and one in south america.
us east coast, the one currently being impacted, is further subdivided into 5 availability zones.
us east 1d appears to be the only one currently being impacted.
distributing your application is left as an exercise to the reader.
+1

Sorry to be the Monday morning quarterback, but the sites that went down learned a valuable lesson in single point of failure analysis. A highly redundant and professionally run data center is still a single point of failure.

Geo-redundancy is key. In fact, I would take distributed data centers over RAID, UPS, or any other "fancy pants" © mechanisms any day.

And AWS East also seems to be cursed. I would run out of West for a while. :-)

I would also look into clouds of clouds. ... Who knows. Amazon could have an Enron moment, at which point a corporate entity with a tax ID is now a single point of failure.

Pay your money, take your chances.

CB
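
To put rough numbers on the geo-redundancy point, here is a back-of-the-envelope sketch. It assumes that sites fail independently of each other and that the application can run from any single surviving site. Both are strong assumptions, not claims about any real provider. The 99.9% per-site figure and the function names are hypothetical, picked only for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` sites is up.

    Assumes failures are independent and every site has the same
    availability; correlated failures make the real number worse.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # All sites must be down at once for the service to be down.
    return 1.0 - (1.0 - site_availability) ** sites


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_site = 0.999  # hypothetical figure for one well-run data center
    for n in (1, 2, 3):
        a = combined_availability(per_site, n)
        print(f"{n} site(s): availability {a:.9f}, "
              f"about {expected_downtime_hours(a):.4f} hours down per year")
```

Under those assumptions a second independent site buys far more than polishing a single one, which is the argument above. The independence assumption is where it breaks down in practice: shared control planes, shared software bugs, or a shared corporate entity all correlate failures across sites.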