On Sat, Oct 9, 2021 at 1:40 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
Christopher Morrow wrote:
means their DNS servers were serving the zone, even after they recognize their zone data were too old, that is, expired.
that's not what this means. I think Mr. Petach previously described this,
He wrote:
So, the idea is that if the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering queries for A records with the local CDN node's address, and let a different site respond back to the client's DNS request.
which may be performed by standard DNS with a short expire period, after which name servers will return SERVFAIL and other name servers in other edge nodes with different IP addresses are tried.
(Apologies for the delayed response--I had back-to-back board meetings the past two days which had me completely tied up.)
That is one way in which it *could* be done--but is by no means the ONLY way in which it can be done.
With an anycast setup using the same IP addresses in every location, returning SERVFAIL doesn't have the same effect, however, because failing over from anycast address 1 to anycast address 2 is likely to be routed to the same pop location, where the same result will occur. You don't really want to hunt among different *IP addresses*, you want to hunt to a different *location*. This is why withdrawing the BGP announcement from that location works more effectively, because it allows the clients to continue querying the same IP address, but get routed to the next most proximal location.
If you simply return SERVFAIL and have the client pick a different IP address from the list of NS entries, it falls into one of two situations:
a) the new IP address is also anycasted, and is therefore likely to pick the same pop that is unhealthy, with similar results, or
b) the new IP address is *not* anycasted, but is served from a single geographical location, which means answers given back by that DNS server are unlikely to be geolocated with any accuracy, and therefore the content served is also unlikely to be geographically relevant or correct.
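
To make the difference between hunting among addresses and hunting among locations concrete, here is a toy model rather than a description of any real deployment. The `Pop` class, the `distance` metric, and the names `near`, `far`, `a.ns`, and `b.ns` are all made up for illustration, and "routing" is reduced to picking the closest pop that still announces the prefix.

```python
from dataclasses import dataclass

@dataclass
class Pop:
    name: str
    distance: int            # hypothetical routing cost as seen from one client
    healthy: bool = True     # would this pop answer, or return SERVFAIL?
    announcing: bool = True  # is this pop still announcing the anycast prefixes?

def route(pops):
    """Pick the nearest pop that still announces the prefix (None if no pop does)."""
    candidates = [p for p in pops if p.announcing]
    return min(candidates, key=lambda p: p.distance, default=None)

def query(pops, ns_addresses):
    """Try each NS address in turn. Every address is anycast from every pop,
    so each attempt is routed the same way regardless of which address is used."""
    for addr in ns_addresses:
        pop = route(pops)
        if pop is not None and pop.healthy:
            return addr, pop.name
    return None  # every attempt ended in SERVFAIL or was unroutable

pops = [Pop("near", distance=1), Pop("far", distance=5)]
ns = ["a.ns", "b.ns"]

# Case 1: the nearest pop is unhealthy but keeps announcing and returns SERVFAIL.
pops[0].healthy = False
servfail_result = query(pops, ns)

# Case 2: the same pop withdraws its announcement instead.
pops[0].announcing = False
withdraw_result = query(pops, ns)
```

In this sketch `servfail_result` is `None`, because both NS addresses land on the same unhealthy pop, while `withdraw_result` is `("a.ns", "far")`: the client keeps using the same address and is simply routed to the next closest location.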
It may be that Facebook uses all four name server IP addresses in each edge node. But that effectively kills the essential redundancy DNS gets from having two or more name servers (at separate locations), and the natural consequence is, as you can see, mass disaster.
Even if the four anycasted nameserver IP addresses weren't completely overlapping (let's assume as a hypothetical that a.ns is served out of EU pops, b.ns is served out of NA pops, c.ns is served out of SA pops, and d.ns is served out of APAC pops), if all sites run the same healthcheck code, then if the underlying healthcheck fails, *every site* will decide it is unhealthy, and stop answering requests; so, all the EU sites fail health check and stop serving a.ns; all the North America sites fail health check, and stop serving b.ns...and so forth.
You followed the best practices, you had different NS entries that were on different subnets, that were geographically dispersed around the globe, that were redundant for each other. But because they all used the same fundamental health check, they all *independently* decided they were unhealthy and needed to stop giving out DNS answers, and instead let one of the other healthier sites take over.
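
The hypothetical above can be sketched in a few lines. This is only an illustration of the shared-dependency argument: the site names, the `NS_SITES` layout, and the `site_is_healthy` check are invented for the example and are not taken from any real configuration.

```python
# Hypothetical layout: each NS name is served only from pops in one region.
NS_SITES = {
    "a.ns": ["eu-1", "eu-2"],
    "b.ns": ["na-1", "na-2"],
    "c.ns": ["sa-1"],
    "d.ns": ["apac-1", "apac-2"],
}

def site_is_healthy(site, core_reachable, local_failures):
    """The same check runs at every site: the site must be up itself
    AND must be able to reach the core datacenters."""
    return core_reachable and site not in local_failures

def answering_nameservers(core_reachable, local_failures=frozenset()):
    """Return the NS names that still have at least one site answering."""
    return [
        ns for ns, sites in NS_SITES.items()
        if any(site_is_healthy(s, core_reachable, local_failures) for s in sites)
    ]

# A purely local failure is absorbed by the redundancy, as intended.
one_site_down = answering_nameservers(True, {"eu-1"})

# A failure of the shared dependency makes every site withdraw at once.
core_down = answering_nameservers(False)
```

Here `one_site_down` still lists all four NS names, while `core_down` is an empty list: the geographic and subnet diversity of the NS entries does not help when every site applies the same check against the same dependency.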
but: 1) dns server in pop serves some content (ttls aren't important right now)
You MUST distinguish TTL and EXPIRE. They are different.
TTL and EXPIRE are irrelevant here. The only thing changing those values would do is change how long it took for caching resolvers to reflect the loss of connectivity at the DNS layer. Once the underlying layer 3 connectivity had broken, DNS answers became meaningless. No matter what records were returned, or cached, you couldn't reach the servers. Yes, yes, as an academic exercise you can point out that there's a difference in how and when those DNS records stop being used, and you're right about that--but in terms of this particular failure, this particular post-mortem we're beating to a horse-shaped pulp, it's entirely meaningless. ^_^;
there's not a lot of magic here... and it's not about the zone data really at all.
Statement of Petach: "the edge CDN node loses connectivity to the core datacenters, the DNS servers should stop answering" means, with DNS terminology, zone data is expired, which has nothing to do with TTL.
As you're using my words, I'm going to have to point out that "the DNS servers should stop answering" does not require that any change happens *at the DNS layer* -- in this case, the change can happen at the routing layer, ensuring that even if some caching resolver out there is completely defiant of your expire time, you *will not answer* because the query packets can never reach you in the first place.
Masataka Ohta
Thanks! Matt