Re: NANOG 40 agenda posted

4 Jun 2007

two replies here.  i (paul@vix.com) said:
...
...
> quagga ospf6d works great, and currently lacks only a health check API.

Donald Stahl <don@calis.blacksun.org> answered:
...
> Health checks are unfortunately the most important aspect of a LB for some
> people.

understood.
...
> Can you elaborate on where you use ECMP and specifics about your
> implementation that might interest people?

i could, but joe abley already did, and i wouldn't want to plagiarize him.
plz see <http://www.isc.org/pubs/tn/index.pl?tn=isc-tn-2004-1.html>.

---

Colm MacCarthaigh <colm@stdlib.net> answered:
...
> If you're load-balancing N nodes, and 1 node dies, the distribution hash
> is re-calced and TCP sessions to all N are terminated simultaneously.

i could just say that since i'm serving mostly UDP i don't care about this,
but then i wouldn't have a chance to say that paying the complexity and bug
and training cost of an extra in-path powered box 24x365.24 doesn't weigh
well against the failure rate of the load balanced servers.  somebody could
drop an anvil on one of my servers twice a day (so, 730 times per year) and
i would still come out ahead, given that most TCP traffic comes from web
browsers and many users will click "Reload" before giving up.  then there's
CEF which i think keeps existing flows stable even through an OSPF recalc.
finally, there's the fact that we see less than one server failure per month
among the 100 or so servers we've deployed behind OSPF ECMP.
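
for readers who want to see the effect colm describes, here is a toy sketch
of it, not a model of any particular router.  it assumes the simplest
possible scheme: hash the flow's 5-tuple and take it modulo the number of
live next hops.  the function names, addresses and flow counts are all made
up for illustration, and real forwarding planes (CEF included) may keep
existing flows more stable than this.

```python
import hashlib


def pick_node(flow, nodes):
    """Map a flow tuple to a node: hash the tuple, take it modulo len(nodes).

    This is an assumed, simplified stand-in for an ECMP distribution hash.
    """
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def remapped_fraction(n_nodes, n_flows=10000):
    """Fraction of flows on surviving nodes that move when the last node dies."""
    nodes = [f"server{i}" for i in range(n_nodes)]
    dead = nodes[-1]
    survivors = nodes[:-1]

    # Hypothetical client flows: (src ip, src port, dst ip, dst port, proto).
    flows = [
        (f"10.0.0.{i % 250}", 1024 + i, "192.0.2.1", 80, "tcp")
        for i in range(n_flows)
    ]

    on_survivors = 0
    moved = 0
    for flow in flows:
        before = pick_node(flow, nodes)
        if before == dead:
            continue  # these sessions are lost no matter how we hash
        on_survivors += 1
        if pick_node(flow, survivors) != before:
            moved += 1  # healthy server, but the flow now lands elsewhere
    return moved / on_survivors


if __name__ == "__main__":
    for n in (4, 10):
        print(n, "nodes:", round(remapped_fraction(n), 3))
```

under this naive modulo scheme, most flows that were on healthy servers get
sent somewhere else when one node drops out, and the share grows with N.
that is the disruption being weighed here against how rarely a server
actually fails.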

i know a lot of people who get paid well for building and selling and
supporting Extra Powered Boxes, and a lot of other people who will never
get fired for buying one... but that doesn't make it right.

Paul Vixie