scg@gibbard.org (Steve Gibbard) writes:
On Sat, 3 May 2008, Mike Lewinski wrote:
David Coulson wrote:
Depends - It doesn't help if the DNS server is dead, but the front-end is still advertising the routes.
Possibly a good argument for allowing the DNS servers to originate the routes for them...? I've seen configuration where the routes were
Running Quagga or something similar on the anycasted server to announce the routes is the standard way of setting up anycast. That way, if the server fails completely, the route goes away.
that's what joe said to do in <http://www.isc.org/pubs/tn/isc-tn-2004-1.txt>.
A common improvement on that is to run a script on the server that checks to make sure the name server process is running and responding correctly, and kills BGP if it isn't. That covers cases where named has problems that don't take down the whole server.
in ISC-TN-2004-1 [ibid], appendix D, joe suggests bringing up and down the interface BIND listens on (which presumes that it's a dedicated loopback like lo1 whose address is covered by a quagga route advertisement rule). note that joe's example brings up the interface before starting the name server program, and brings it down if the name server program exits. this presumes that the name server will start very quickly, and that while running, it is healthy. since i've seen name server programs be unhealthy while running, and/or take a long time to start, i'm now considering an outboard shell script that runs some kind of DNS query and decides, based on the result, whether to bring the dedicated loopback interface up or down.
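To make the outboard health check concrete, here is a minimal sketch in Python rather than shell. It is not the script from ISC-TN-2004-1 and not anything the posters describe running. Everything specific in it is an assumption for illustration: that named also answers on 127.0.0.1 (so the probe does not depend on the loopback being up), that example.com is a zone the server holds, that an SOA query with at least one answer means "healthy", and that the host is Linux with iproute2 (`ip link set dev lo1 up|down`).

```python
import socket
import struct
import subprocess

LOOPBACK_IF = "lo1"        # assumption: the dedicated loopback named in the text
PROBE_ADDR = "127.0.0.1"   # assumption: named also listens here, independent of lo1
PROBE_NAME = "example.com" # assumption: a zone this server should answer for


def build_query(name, txid, qtype=6):
    """Build a minimal DNS query packet (default QTYPE 6 = SOA, class IN)."""
    # Header: id, flags (plain query, RD clear), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def response_ok(txid, data):
    """True if data is a NOERROR response to our query with at least one answer."""
    if len(data) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F
    return rid == txid and is_response and rcode == 0 and ancount > 0


def probe(server, name, txid=0x1234, timeout=2.0):
    """Send one UDP query; a timeout or socket error counts as unhealthy."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(build_query(name, txid), (server, 53))
            data, _ = s.recvfrom(4096)
    except OSError:
        return False
    return response_ok(txid, data)


def interface_command(healthy, ifname=LOOPBACK_IF):
    """Return the Linux iproute2 command matching the health state."""
    return ["ip", "link", "set", "dev", ifname, "up" if healthy else "down"]


def run_once():
    """Probe the name server and set the loopback state accordingly."""
    healthy = probe(PROBE_ADDR, PROBE_NAME)
    subprocess.run(interface_command(healthy), check=True)
    return healthy
```

Calling `run_once()` from cron or a small loop would pull the loopback, and with it the covered route, whenever the probe fails. A real deployment would probably want several consecutive failures before acting, so that one lost UDP packet does not flap the route.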
... The right solution is to design the anycast servers to be as sure as possible that the route will go away when you want it gone, but to have multiple non-interdependent anycast clouds in the NS records for each zone. If the local node in one cloud does fail improperly, something will still be responding on the other cloud's IP address.
the need for multiple independent anycast clouds is an RFC 2182 topic, but joe's innovation both in ISC-TN-2004-1 and in his earlier ISC-TN-2003-1 (see <http://www.isc.org/pubs/tn/isc-tn-2003-1.txt>) is that if each anycast cluster is really several servers, each using OSPF ECMP, then you can lose a server and still have that cluster advertising the route upstream, and only when you lose all servers in a cluster will that route be withdrawn.
Note that any of these failure scenarios is still preferable to what you get with unicast servers. With unicast, if the server has trouble, the route always stays up, and the traffic always ends up in a black hole.
here, the real problem is the route staying up, which also blackholes anycast. the only things DNS anycast universally buys you are DDoS resilience and hot swap. anything else anycast can do (high availability, low avg. RTT, etc) can also be engineered using a unicast design, though probably at higher TCO. -- Paul Vixie