two replies here. i (paul@vix.com) said:
> quagga ospf6d works great, and currently lacks only a health check API.

Donald Stahl <don@calis.blacksun.org> answered:
> Health checks are unfortunately the most important aspect of a LB for some people.

understood.

> Can you elaborate on where you use ECMP and specifics about your implementation that might interest people?

i could, but joe abley already did, and i wouldn't want to plagiarize him. plz see <http://www.isc.org/pubs/tn/index.pl?tn=isc-tn-2004-1.html>.

---

Colm MacCarthaigh <colm@stdlib.net> answered:
> If you're load-balancing N nodes, and 1 node dies, the distribution hash is re-calced and TCP sessions to all N are terminated simultaneously.
i could just say that since i'm serving mostly UDP i don't care about this, but then i wouldn't have a chance to say that paying the complexity and bug and training cost of an extra in-path powered box 24x365.24 doesn't weigh well against the failure rate of the load balanced servers. somebody could drop an anvil on one of my servers twice a day (so, 730 times per year) and i would still come out ahead, given that most TCP traffic comes from web browsers and many users will click "Reload" before giving up. then there's CEF which i think keeps existing flows stable even through an OSPF recalc. finally, there's the fact that we see less than one server failure per month among the 100 or so servers we've deployed behind OSPF ECMP. i know a lot of people who get paid well for building and selling and supporting Extra Powered Boxes, and a lot of other people who will never get fired for buying one... but that doesn't make it right.
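
for anyone who wants to see the re-hash effect colm describes, here is a minimal sketch. it assumes a naive "hash the flow, take it modulo the number of servers" distribution, which is only an illustration: real routers use their own implementation-specific ECMP hashes, and (as noted above) CEF may keep existing flows stable. the names `pick_server` and `moved_fraction`, the server labels, and the sample addresses are all made up for the example.

```python
import hashlib

def pick_server(flow, servers):
    """Map a flow tuple to a server by hashing it modulo the server count."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

def moved_fraction(flows, before, after):
    """Fraction of flows that sat on a surviving server but were remapped anyway."""
    survivors = set(after)
    on_survivor = [f for f in flows if pick_server(f, before) in survivors]
    moved = [f for f in on_survivor
             if pick_server(f, before) != pick_server(f, after)]
    return len(moved) / len(on_survivor)

# hypothetical flows: (source address, source port, destination address, destination port)
flows = [("192.0.2.%d" % (i % 250 + 1), 1024 + i, "198.51.100.1", 80)
         for i in range(10000)]

before = ["s1", "s2", "s3", "s4"]
after = ["s1", "s2", "s3"]  # s4 has died

frac = moved_fraction(flows, before, after)
print(f"{frac:.0%} of flows on surviving servers were remapped")
```

under this naive scheme, going from 4 servers to 3 remaps about two thirds of the flows that were on servers which never failed, which is the disruption being described. whether that matters depends on the traffic mix and on how rarely servers actually die, which is the point of the reply above.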