
scott,
This was not a cascading failure. It was a simple power outage.
Cascading failures involve interdependencies among components.
Not always. Cascading failures can also occur when there is zero dependency between components. The simplest form of this is where one environment fails over to another, but the target environment is not capable of handling the additional load and then "fails" itself as a result (in some form or other, but frequently different to the mode of the original failure).
indeed. and that is an interdependency among components. in particular, it is a capacity interdependency.
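to make the capacity case concrete, here is a minimal python sketch. it is a toy model, not a description of amazon or of any real site. the function name `simulate_failover`, the site names, and the numbers are all made up for illustration, and it assumes the load from a failed site is split evenly across whatever sites are still up.

```python
def simulate_failover(capacities, loads, failed):
    """Return the set of sites that are down after `failed` goes down.

    capacities and loads are dicts keyed by (hypothetical) site name.
    Assumption: orphaned load is spread evenly over the surviving sites.
    """
    down = {failed}
    loads = dict(loads)            # work on a copy, leave the caller's dict alone
    orphaned = loads.pop(failed)   # load that now needs a new home

    while orphaned > 0:
        survivors = [site for site in capacities if site not in down]
        if not survivors:
            break                  # nothing left to fail over to
        share = orphaned / len(survivors)
        for site in survivors:
            loads[site] += share
        orphaned = 0
        # any survivor pushed past its capacity fails too and sheds its load
        for site in survivors:
            if loads[site] > capacities[site]:
                down.add(site)
                orphaned += loads.pop(site)
    return down
```

with three sites of capacity 100 each carrying 70, losing one pushes the other two to 105 and all three end up down. with each carrying 40, the survivors land at 60 and only the original site is lost. the sites share no hardware or software; the only thing linking them is headroom.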
Whilst the Amazon outage might have been a "simple" power outage, it's likely that at least some of the resulting website outages were caused not just by the direct Amazon outage, but also by the flow-on effect of their redundancy attempting (but failing) to kick in, potentially making the problem worse than the Amazon outage alone would have.
i think you over-estimate these websites. most of them simply have no redundancy (and obviously have no tested, effective redundancy) and were simply hoping that amazon didn't really go down that much. hope is not the best strategy, as it turns out. i suspect that randy is right though: many of these businesses do not promise perfect uptime and can survive these kinds of failures with little loss to business or reputation. twitter has branded its early failures with a whale that not only didn't hurt it but helped endear the service to millions. when your service fits these criteria, why would you bother doing the complicated systems and application engineering necessary to actually have functional redundancy? it simply isn't worth it.
Scott