On Mon, Jun 04, 2007 at 07:29:03AM +0000, Paul Vixie wrote:
> > If you're load-balancing N nodes, and 1 node dies, the distribution hash is re-calced and TCP sessions to all N are terminated simultaneously.
> i could just say that since i'm serving mostly UDP i don't care about this, but then i wouldn't have a chance to say that paying the complexity and bug and training cost of an extra in-path powered box 24x365.24 doesn't weigh well against the failure rate of the load balanced servers. somebody could drop an anvil on one of my servers twice a day (so, 730 times per year) and i would still come out ahead, given that most TCP traffic comes from web browsers and many users will click "Reload" before giving up.
It depends on the lifetime of those TCP sessions. If you were load-balancing the increasingly common video-over-HTTP, the disruption would be unacceptable. You also ignore the "thundering herd" problem that arises when all of your active clients suddenly re-request within a very short time-window. If I have 1000 active flows lasting 10 seconds each, that's 100 new flows per second in steady state, so I can expect a peak rate of about 200 new flows per second. Kill all 1000 in one go and they'll re-request within roughly a second, for a peak rate of 5 times that. That's a significant difference to plan for, and very different from the load you expect after an extended outage or an initial switch-on. The problem also gets worse the longer the TCP sessions live.
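To make that arithmetic concrete, here's a minimal back-of-the-envelope sketch; the flow counts, the 2x peak-to-mean ratio, and the one-second retry window are illustrative assumptions, not measurements:

    # Back-of-the-envelope model of the "thundering herd" after a hash recalc.
    # All numbers are illustrative assumptions, not measured values.

    active_flows = 1000        # concurrent TCP flows behind the balancer
    mean_lifetime_s = 10.0     # average flow duration in seconds
    peak_to_mean = 2.0         # assumed ratio of peak to average arrival rate
    retry_window_s = 1.0       # assumed window in which killed clients reconnect

    # Steady state (Little's law): arrival rate = concurrency / lifetime.
    avg_rate = active_flows / mean_lifetime_s          # 100 new flows/s
    normal_peak = peak_to_mean * avg_rate              # ~200 new flows/s

    # A hash recalc kills every flow at once; clients retry almost immediately.
    herd_peak = active_flows / retry_window_s          # ~1000 new flows/s

    print(f"average arrival rate : {avg_rate:.0f} flows/s")
    print(f"normal peak rate     : {normal_peak:.0f} flows/s")
    print(f"post-recalc herd     : {herd_peak:.0f} flows/s "
          f"({herd_peak / normal_peak:.0f}x the normal peak)")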
> then there's CEF which i think keeps existing flows stable even through an OSPF recalc.
No CEF table I've used does that. Also, if you restrict yourself to CEF, you have to accept a decrease in the number of nodes you can balance vs something like quagga on *nix. The limits are anywhere from just 6 ECMP routes to 32 (though of course you could do staggered load-balancing using multiple CEF devices). I'm open to correction on the 32, but it's the highest I've yet come across. The routes get distributed across the slots of the CEF table as evenly as possible, but when one disappears the hashing completely changes (at least it does for me operationally, as verified with "show ip cef exact-route").

Interestingly, there is a CEF table state that /could/ enable this functionality: the "punt" state promises to have an unswitchable packet punted out of the CEF table and passed back up to higher-level software switching. If the CEF slots occupied by a now-down node could be forced into the punt state, then only traffic toward that node would be affected. But despite questions to Cisco dev teams and much experimentation, I can't see a reliable way to get a CEF table entry into the punt state (unlike, say, the "glean" state, which isn't good enough).
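A quick simulation of why the rehash hurts: with a naive hash-mod-N scheme (standing in here for the real, undocumented CEF hash; the flow identifiers are synthetic), withdrawing one of N next-hops remaps far more than the 1/N of flows that actually had to move. Every moved flow is a reset TCP session:

    # Illustrate flow churn when an ECMP next-hop set shrinks from 8 to 7.
    # hash-mod-N is an assumption, not Cisco's actual CEF distribution.
    import hashlib

    def bucket(flow_id: str, n_paths: int) -> int:
        """Map a flow to one of n_paths next-hops via a stable hash."""
        digest = hashlib.md5(flow_id.encode()).digest()
        return int.from_bytes(digest[:4], "big") % n_paths

    flows = [f"10.0.{i % 250}.{i % 200}:{1024 + i}->80" for i in range(10_000)]

    before = {f: bucket(f, 8) for f in flows}   # 8 healthy next-hops
    after = {f: bucket(f, 7) for f in flows}    # one next-hop withdrawn

    moved = sum(1 for f in flows if before[f] != after[f])
    print(f"{moved / len(flows):.0%} of flows changed next-hop")
    # Expect roughly 7/8 (~88%) of flows to move, not just the 1/8 that
    # were pointed at the dead next-hop.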
> finally, there's the fact that we see less than one server failure per month among the 100 or so servers we've deployed behind OSPF ECMP.
Failure rates can and should be low indeed, but that's not where I see the primary utility of high-availability load-balancers. If I have 20 web-servers in a load-balanced cluster and I need to upgrade them to the latest version of Apache for security reasons, I want to do it one by one without losing a single HTTP session. This *is* possible with many load-balancers (plug: including Apache's own load-balancing proxy), because they can drain a node of new sessions while existing ones finish (see the sketch below), but with OSPF I'm forced to drop *all* sessions to the cluster 20 times (or yes, I could do 10 nodes at a time, but you get the picture).

I *like* OSPF ECMP load-balancing, it's *great*, and I use it in production, even load-balancing a tonne of https traffic, but in my opinion you are over-stating its abilities. It is not close to the capabilities of a good intelligent load-balancer. It is, however, extremely cost-effective and good enough for a lot of usage, as long as it's deployed with some operational and engineering care.
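To illustrate what that drain step buys you, here is a minimal toy sketch. The LoadBalancer class and backend names are invented for illustration; real balancers (Apache's mod_proxy balancer included) expose the same idea through their own interfaces:

    # Toy model of connection draining during a rolling upgrade.
    # Interface and backend names are hypothetical, for illustration only.
    import itertools

    class LoadBalancer:
        def __init__(self, backends):
            self.backends = list(backends)
            self.draining = set()
            self._rr = itertools.cycle(self.backends)

        def pick(self):
            """Round-robin over backends, skipping any that are draining."""
            for _ in range(len(self.backends)):
                b = next(self._rr)
                if b not in self.draining:
                    return b
            raise RuntimeError("no backends available")

        def drain(self, backend):
            """Stop sending *new* sessions to backend; existing ones finish."""
            self.draining.add(backend)

        def restore(self, backend):
            self.draining.discard(backend)

    lb = LoadBalancer(["web1", "web2", "web3"])
    lb.drain("web1")                      # take web1 out for its upgrade
    assert all(lb.pick() != "web1" for _ in range(100))
    lb.restore("web1")                    # upgraded node rejoins the rotation

OSPF ECMP has no equivalent of drain(): withdrawing the route is the only knob, and that resets the hash for everyone.

--
Colm MacCárthaigh Public Key: colm+pgp@stdlib.net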