
On Mon, 2 Jul 2012, david raistrick wrote:
On Mon, 2 Jul 2012, James Downs wrote:
back-plane / control-plane was unable to cope with the requests. Netflix uses Amazon's ELB to balance the traffic and no back-plane meant they were unable to reconfigure it to route around the problem.
Someone needs to define back-plane/control-plane in this case. (and what wasn't working)
Amazon resources are controlled (from a consumer viewpoint) by API - that API is also used by amazon's internal toolkits that support ELB (and RDS..). Those (http accessed) API interfaces were unavailable for a good portion of the outages.
It seems like if you're going to outsource your mission critical infrastructure to "cloud" you should probably pick at least 2 unrelated cloud providers and if at all possible, not outsource the systems that balance/direct traffic...and if you're really serious about it, have at least two of these setup at different facilities such that if the primary goes offline, the secondary takes over. If a cloud provider fails, you redirect to another. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Senior Network Engineer | therefore you are Atlantic Net | _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________