Jon Lewis wrote:
It seems like if you're going to outsource your mission-critical infrastructure to "cloud", you should probably pick at least 2 unrelated cloud providers and, if at all possible, not outsource the systems that balance/direct traffic...and if you're really serious about it, have at least two of these set up at different facilities such that if the primary goes offline, the secondary takes over. If a cloud provider fails, you redirect to another.
Really, you need at least three independent providers: one primary (A), one backup (B), and one "witness" to monitor the other two for failure. The witness site can be low-powered, since it is not in the data plane of the applications and only participates in the control plane. In the event of a loss of communication, the majority clique wins, and the isolated environments shut themselves down. This is how any sane clustering setup has protected against "split brain" scenarios for decades.

Doing it the right way makes the cloud far less cost-effective and far less "agile". Once you get it all set up just so, change becomes very difficult. All the monitoring and fail-over/fail-back operations are generally application-specific and provider-specific, so there's a lot of lock-in. Tools like RightScale are a step in the right direction, but they don't really touch the application layer, and then you also have to worry about the availability of yet another provider!

-- RPM
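
For illustration, the majority rule described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the site names, the `links` reachability map, and the functions `has_quorum`, `resolve`, and `active_site` are all hypothetical, and it assumes each site already knows which peers it can currently reach. It is not how any particular clustering product implements quorum.

```python
from typing import Dict, Optional, Sequence, Set

# Hypothetical site names: primary, backup, and the low-powered witness.
SITES: Set[str] = {"A", "B", "witness"}


def has_quorum(site: str, reachable: Set[str], all_sites: Set[str] = SITES) -> bool:
    """True if `site` sees a strict majority of all sites, counting itself."""
    visible = (reachable & all_sites) | {site}
    return len(visible) > len(all_sites) // 2


def resolve(links: Dict[str, Set[str]]) -> Dict[str, str]:
    """Each site decides on its own: stay up with quorum, shut down without."""
    return {
        site: "stay_up" if has_quorum(site, peers) else "shut_down"
        for site, peers in links.items()
    }


def active_site(links: Dict[str, Set[str]],
                preference: Sequence[str] = ("A", "B")) -> Optional[str]:
    """Pick the data-plane site: the first preferred site that holds quorum.

    The witness is never a candidate, since it only takes part in the
    control plane.
    """
    for site in preference:
        if has_quorum(site, links.get(site, set())):
            return site
    return None


# Primary A loses connectivity; B and the witness still see each other.
partitioned = {"A": set(), "B": {"witness"}, "witness": {"B"}}
decisions = resolve(partitioned)
serving = active_site(partitioned)
```

In the partitioned example, A sees only itself (1 of 3), so it shuts down, while B and the witness form the majority and B takes over. With only two sites, neither side of a partition could reach a majority, which is why the third vote is needed. The sketch assumes symmetric reachability; real clusters usually add fencing on top of the vote.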