On Thu, Oct 7, 2021 at 9:52 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
> But, this time, the reality strikes back.

Not really. Or at all. Facebook the external service was down hard as soon as the cross-datacenter connections all failed. Whether or not the BGP routes for the external DNS were withdrawn had no impact on the outage.

Facebook's _internal_ DNS, while not anycasted, followed a similar logic: if a data center is isolated and its DNS servers' data goes stale, they stop serving potentially wrong answers. Since the routing failure isolated all of the data centers, this left no usable _INTERNAL_ DNS, on which more or less everything else depends.

I didn't work for the DNS team when I worked as a production engineer for Facebook, but I worked close enough to understand what happened from the posted description.

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/
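
P.S. For anyone who wants the "stop serving when isolated and stale" logic spelled out, here is a minimal sketch. It is not Facebook's code or configuration: the class name, the staleness threshold, and the sync mechanism are all hypothetical, chosen only to illustrate the behavior described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StaleAwareResolver:
    """Hypothetical per-datacenter DNS server (illustration only)."""

    # Assumed threshold; the real value, if any, is not in the posted description.
    max_staleness_s: float = 300.0
    records: Dict[str, str] = field(default_factory=dict)
    # Time of the last successful sync with the rest of the network.
    last_sync: float = 0.0

    def sync(self, records: Dict[str, str], now: float) -> None:
        """Record a successful refresh of zone data from upstream."""
        self.records = dict(records)
        self.last_sync = now

    def is_usable(self, now: float) -> bool:
        """Data counts as fresh only if a sync succeeded recently enough."""
        return (now - self.last_sync) <= self.max_staleness_s

    def resolve(self, name: str, now: float) -> Optional[str]:
        """Answer only while data is fresh; otherwise refuse rather than risk a wrong answer."""
        if not self.is_usable(now):
            return None
        return self.records.get(name)


def any_usable(servers, now: float) -> bool:
    """True if at least one datacenter can still answer queries."""
    return any(s.is_usable(now) for s in servers)
```

With one isolated data center, the others keep answering. When every data center is isolated at once, each server independently makes the same "safe" decision, `any_usable` goes false, and nothing that depends on name resolution can proceed.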