On Wed, Oct 6, 2021 at 10:45 AM Michael Thomas <mike@mtcc.com> wrote:
So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine they are sub-optimal (fsvo). I can certainly understand the DNS servers not giving answers they think are unreachable, but there is always the problem that the servers themselves may be partitioned rather than the routes. At a minimum, I would think they'd need some consensus protocol that establishes that it's broken across multiple servers.
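Here is a minimal sketch of the kind of cross-server agreement suggested above. It assumes each DNS server reports one boolean "can I reach the origin" probe result, and the function name and threshold are hypothetical. It is a simple majority vote, not a real consensus protocol such as Raft or Paxos, and it does not describe what the operator in question actually deployed.

```python
def origin_declared_down(probe_results: list[bool], threshold: float = 0.5) -> bool:
    """Return True only if more than `threshold` of the probes failed.

    Each entry is one server's (hypothetical) health-check result:
    True means that server could reach the origin.
    """
    if not probe_results:
        return False  # no evidence at all, so don't act on it
    failures = sum(1 for ok in probe_results if not ok)
    return failures / len(probe_results) > threshold


print(origin_declared_down([False, True, True]))   # False: one server may just be partitioned
print(origin_declared_down([False, False, True]))  # True: a majority agrees
```

The point of the vote is that a single partitioned server cannot trigger a withdrawal by itself.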
But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick, so using it as a trigger to withdraw routes seems really strange and fraught with unintended consequences. Why is it a good idea to withdraw the route if it doesn't seem reachable from the DNS server? Give answers that are reachable, sure, but actually make a topology decision? Yikes. And what happens to the cached answers that still point to the supposedly dead route? They're going to fail until the TTL expires anyway, so why is it preferable to withdraw the route too?
My guess is that their post, while clearer than most, doesn't go into enough detail, but is it just me, or does this seem like a really weird thing to do?
Mike
Hi Mike,

You're kinda thinking about this from the wrong angle. It's not that the route is withdrawn if it doesn't seem reachable from the DNS server. It's that your DNS server is geolocating requests to the nearest content delivery cluster, where the CDN cluster is likely fetching content from a core datacenter elsewhere. You don't want the DNS server at that remote/edge site to give back A records for a CDN node that is isolated from the rest of the network and can't reach the datacenter to fetch the necessary content. Otherwise, you'll have clients that reach the page and can load its static elements, but all the dynamic elements hang, waiting for a fetch from the origin that will never complete. Not a very good end-user experience.

So, the idea is that if the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering queries for A records with the local CDN node's address, and let a different site respond to the client's DNS request. In particular, you really don't want the client to even send the request to the isolated edge CDN node; you want anycast to find the next-best edge site. So, once the DNS servers fail the "can-I-reach-my-datacenter" health check, they stop announcing the Anycast service address to the local routers. That way, they drop out of the Anycast pool, and normal Internet routing ensures that client DNS requests are sent to the next-nearest edge CDN cluster for resolution and data retrieval.

This works fine for ensuring that one or two edge sites that get isolated due to fiber cuts don't end up pulling client requests into them and leaving users hanging, waiting for data that will never arrive. However, it fails big-time if *all* sites fail their "can-I-reach-the-datacenter" check simultaneously.
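Here is a toy model of the behaviour described above. It makes some simplifying assumptions. Each site's routing cost is a single hypothetical `distance` number, while real anycast selection is ordinary BGP best-path selection. The site names are made up. The `always_announce` flag is a hypothetical stand-in for the "really core" fallback sites described in the next paragraph. A site announces the DNS anycast prefix only while its health check passes, and clients land on the nearest site still announcing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    distance: int                  # hypothetical routing cost from the client's network
    can_reach_origin: bool         # outcome of the "can-I-reach-my-datacenter" check
    always_announce: bool = False  # hypothetical flag for a "really core" fallback site

    def announces_anycast(self) -> bool:
        # Edge sites withdraw the anycast DNS prefix while isolated from the origin;
        # fallback sites keep announcing regardless of the health check.
        return self.always_announce or self.can_reach_origin


def anycast_target(sites: list[Site]) -> Optional[str]:
    """Return the nearest site still announcing the prefix, or None if none are."""
    candidates = [s for s in sites if s.announces_anycast()]
    if not candidates:
        return None  # every site withdrew: DNS is unreachable everywhere
    return min(candidates, key=lambda s: s.distance).name


edge_only = [
    Site("edge-a", distance=10, can_reach_origin=False),  # isolated by a fiber cut
    Site("edge-b", distance=20, can_reach_origin=True),
]
print(anycast_target(edge_only))  # edge-b

all_down = [Site("edge-a", 10, False), Site("edge-b", 20, False)]
print(anycast_target(all_down))   # None

with_core = all_down + [Site("core-1", 50, False, always_announce=True)]
print(anycast_target(with_core))  # core-1
```

The second case is the "all sites fail at once" scenario: nothing announces the prefix, so no DNS server is reachable at all. The third case shows how an always-announcing core site keeps at least a sub-optimal answer available.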
When I was involved in the decision making on a design like this, a choice was made to have a set of "really core" sites in the middle of the network always announce the anycast prefixes as a fallback, so even if the routing to reach them wasn't optimal, users would still get *some* level of reply back. In this situation, that would have ensured that at least some DNS servers were reachable; but it wouldn't have fixed the "oh crap we pushed 'no router bgp' out to all the routers at the same time" type of problem. But that isn't really the core of your question, so we'll just quietly push that aside for now. ^_^;

Point being: it's useful and normal for edge sites that may become isolated from the rest of the network to stop announcing the Anycast service address for DNS to local peers and transit providers at that site for as long as they are isolated. This prevents users from being directed to CDN servers that can't fetch content from the origin servers in the datacenter. It's just generally assumed that not every site will become "isolated" at the same time like that. :)

I hope this helps clear up the confusion.

Thanks!

Matt