On 12 Mar 2000, Paul Vixie wrote:
That being said, if anyone has better ideas on how to provide for high availability to millions of web sites worldwide, please let me know.
TCP performance is affected by congestion symmetry, since TCP uses the spacing of ACK packets to control the spacing of data packets. While there's no way to guarantee congestion symmetry, one of the leading indicators of whether you will have congestion symmetry is "whether you have path symmetry." Furthermore, the leading indicator of whether you have path symmetry is "whether the outbound flow's first hop is the same as the incoming flow's last hop."
Thus http://www.vix.com/pub/vixie/ifdefault/. Try it. If you don't know how to apply patches to your kernel, then have a consultant do it. I wrote this for a pornography distributor whose pageviews-per-second went up by a factor of 1.7 peak and 1.25 average just as a result of using "interface defaults" rather than speaking BGP and trying to run defaultless.
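For readers who have not looked at the patch, the idea of an "interface default" can be sketched roughly as follows. This is only an illustration of the principle stated above (send replies out the interface the request arrived on, so the outbound first hop matches the inbound last hop). It is not the actual kernel patch, and the class names, interface names, and addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    default_gateway: str  # hypothetical per-interface default next hop


class InterfaceDefaultRouter:
    """Toy model: replies leave via the interface the flow arrived on."""

    def __init__(self, interfaces):
        self.interfaces = {i.name: i for i in interfaces}
        # remote address -> name of the interface its packets arrived on
        self.flow_iface = {}

    def packet_in(self, remote_addr, iface_name):
        # Remember the inbound interface for this remote host.
        if iface_name not in self.interfaces:
            raise KeyError(f"unknown interface: {iface_name}")
        self.flow_iface[remote_addr] = iface_name

    def next_hop(self, remote_addr, fallback):
        # Use the inbound interface's own default gateway; for hosts we
        # have never heard from, fall back to a configured interface.
        name = self.flow_iface.get(remote_addr, fallback)
        iface = self.interfaces[name]
        return iface.name, iface.default_gateway


router = InterfaceDefaultRouter([
    Interface("eth0", "192.0.2.1"),
    Interface("eth1", "192.0.2.129"),
])
router.packet_in("198.51.100.7", "eth1")
print(router.next_hop("198.51.100.7", fallback="eth0"))
```

No routing table is consulted for known flows, which is the point: the host does not need to speak BGP or carry full routes to get path symmetry on the first hop.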
That doesn't solve one of the growing uses of such systems, which is so-called "geographical redundancy". More and more, it simply isn't acceptable to have a single location with a bunch of network links and an attempt to optimize how those links are used. A single location is a single location no matter what you do with it. You need multiple locations, with reasonably robust and somewhat (although not necessarily completely) transparent failover between them. In these cases, any best-path benefits are secondary.

The ways to do this sort of thing are very limited. Whatever you do, you end up needing either "smart" (i.e., sometimes lame) DNS servers, or BGP routes originated from multiple locations, with the same IP address actually going to different machines depending on which route is used. The latter is lame even more often; it is not as bad if one facility is normally an unused backup, but that introduces lots of other issues.

If you have a better solution for this, I'm sure the world would love to hear it. Yes, many or most or all of the current implementations do, or can be configured to do, some questionable things. However, your solution doesn't address the whole "distributed" aspect of it.
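To make the "smart" DNS option concrete, here is a minimal sketch of the selection logic such a server might apply, assuming a health check supplied from elsewhere. The function name, the TTL value, the hostnames, and the addresses are all made up for illustration; this does not describe any particular product.

```python
def answer_for(name, sites, is_up, ttl=30):
    """Return (ttl, addresses) for one query.

    sites:  hostname -> list of site addresses, in order of preference
    is_up:  callable(address) -> bool, an assumed external health check
    ttl:    kept short so that clients notice a failover reasonably soon
    """
    candidates = sites[name]
    live = [addr for addr in candidates if is_up(addr)]
    # Hand out the most preferred live site. If every check fails, return
    # all sites rather than nothing, since the checker itself may be wrong.
    chosen = live[:1] if live else list(candidates)
    return ttl, chosen


sites = {"www.example.com": ["192.0.2.10", "198.51.100.10"]}
down = {"192.0.2.10"}  # pretend the primary location has failed
print(answer_for("www.example.com", sites, lambda addr: addr not in down))
```

The "sometimes lame" part shows up outside this function: any client or cache that holds an answer past its TTL keeps going to the dead location, and the server can do nothing about it.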