(I'm going to hate myself in the morning, but)

On Fri, Oct 8, 2021 at 10:22 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
> William Herrin wrote:
>> https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
>> our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers
>> The end result was that our DNS servers became unreachable even though they were still operational.
> means their DNS servers were serving the zone, even after they recognize their zone data were too old, that is, expired.
That's not what this means. I think Mr. Petach previously described this, but:

1) the DNS server in the POP serves some content (TTLs aren't important right now)
2) the DNS server uses some quagga/gated/bird/etc. to announce locally: "Hey, foo/32 here!" (imagine this triggers an 'aggregate route' or 'network statement' (pick your vendor solution) to appear in the global table)
3) the DNS server also pings a set of backend servers
4) when 3 fails for X period of time, it tells quagga/bird/etc. to stop announcing the /32

Then the local POP no longer sources the aggregate (/24 or /23 or whatever), so traffic SHOULD (externally) flow toward another copy of the /23 or /24 or whatever.

There's not a lot of magic here, and it's not about the zone data really at all.
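For anyone who wants steps 3 and 4 spelled out, here is a rough Python sketch of just the decision logic. It is not anyone's actual implementation: `AnnounceController`, `fail_window`, and the "announce"/"withdraw" action strings are names I made up for illustration, and it deliberately does not talk to quagga/bird at all. Re-announcing once the backends answer again is also my assumption; the steps above only describe the withdraw side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnounceController:
    """Hypothetical helper: decides when the /32 should be announced or withdrawn."""
    fail_window: float                      # the "X period of time" from step 4, in seconds
    announced: bool = True                  # step 2: we start out announcing the /32
    failing_since: Optional[float] = None   # time of the first failure in the current run

    def observe(self, backend_ok: bool, now: float) -> Optional[str]:
        """Feed in one backend health-check result (step 3); return an action or None."""
        if backend_ok:
            self.failing_since = None
            if not self.announced:
                # Assumption: bring the route back once the backends are reachable.
                self.announced = True
                return "announce"
            return None
        if self.failing_since is None:
            self.failing_since = now
        if self.announced and now - self.failing_since >= self.fail_window:
            # Step 4: checks have failed for the whole window, stop announcing.
            self.announced = False
            return "withdraw"
        return None


# Simulated (timestamp, backend_ok) health-check results.
ctl = AnnounceController(fail_window=30.0)
checks = [(0, True), (10, False), (20, False), (40, False), (50, False), (60, True)]
actions = [ctl.observe(ok, t) for t, ok in checks]
print(actions)
```

In a real deployment the returned action would be handed to whatever routing daemon sources the /32. The point is only that the decision depends on backend reachability and a timer, not on zone data.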