> In Facebook's case, it was combined with a poor understanding
> of short vs. long expiration periods to cause the disaster.

Still, no. 

The CAUSE of the outage was all of FB's datacenters being completely disconnected from their backbone, and thus from the internet. DNS breaking was a direct RESULT of that. Even if FB's DNS had still been happily handing out answers pointing to IPs that were unreachable, they would still have been horked.
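
For what it's worth, that dependency is what you'd expect from a health-check-driven anycast design: an edge DNS node withdraws its service prefix when it can't reach the backbone, so a total backbone outage takes DNS down as a consequence. A minimal sketch of that control loop, not FB's actual code (the probe targets, ports, and BGP hook below are invented for illustration):

"""Sketch of a health-check-driven anycast DNS edge node (illustrative only)."""

import socket
import time

# Hypothetical backbone probe targets (host, port) -- placeholders, not real names.
BACKBONE_TARGETS = [("dc1.backbone.example", 443), ("dc2.backbone.example", 443)]

def backbone_reachable(timeout: float = 2.0) -> bool:
    """Treat the backbone as up if any probe target accepts a TCP connection."""
    for host, port in BACKBONE_TARGETS:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue
    return False

def set_anycast_announcement(up: bool) -> None:
    """Stand-in for telling the local BGP speaker to announce or withdraw the prefix."""
    print("announce anycast DNS prefix" if up else "withdraw anycast DNS prefix")

def control_loop(interval: float = 10.0) -> None:
    # If the whole backbone disappears, every edge node sees backbone_reachable()
    # go False at roughly the same time and withdraws its announcement, so the
    # zone goes dark: DNS failure follows the backbone failure, it doesn't cause it.
    while True:
        set_anycast_announcement(backbone_reachable())
        time.sleep(interval)

if __name__ == "__main__":
    control_loop()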

Could their DNS design possibly have contributed to some delay in the RESTORATION phase? Perhaps. But with the volume of traffic they handle, that was certainly going to take a while anyway.
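
And on the "short/long expiration period" angle: a record's TTL only bounds how long resolvers keep serving a cached answer after the authoritative servers vanish, and in this outage either flavor of answer pointed at unreachable addresses anyway. A toy illustration with made-up TTL values:

# Toy numbers, not FB's actual TTLs. Say the anycast routes are withdrawn at t=0;
# the TTL only changes how long resolvers keep handing out the cached answer,
# which still points at IPs nobody can reach.
for label, ttl_seconds in [("short TTL (60s)", 60), ("long TTL (1 day)", 86_400)]:
    print(f"{label}: cached answers expire by t={ttl_seconds}s after withdrawal")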


On Fri, Oct 8, 2021 at 5:17 AM Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
Sabri Berisha wrote:

> Let's for a moment contemplate the sheer magnitude of
> their operation. With almost 3 billion users worldwide, can you imagine the
> number of DNS queries they have to process? Their scale is unprecedented.
That's what I predicted about 20 years ago, which is why
I proposed having anycast name servers and analyzed the
implications.

As such, I'm sure that anycast route withdrawal which ignores
RFC 3258 is poor engineering.

Scalable solutions can be constructed only with careful
theoretical analysis; compared with that, random hacks which
may work 99% of the time are just harmful.

In Facebook's case, it was combined with a poor understanding
of short vs. long expiration periods to cause the disaster.

                                        Masataka Ohta