On Wed, 6 Oct 2021, Michael Thomas wrote:
> So if I understand their post correctly, their DNS servers have the ability to withdraw routes if they determine the routes are sub-optimal (fsvo). I can certainly understand the DNS servers not giving answers they think are unreachable, but there is always the problem that the servers themselves may be partitioned, not the routes. At a minimum, I would think they'd need some consensus protocol that says that it's broken across multiple servers.
> But I just don't understand why this is a good idea at all. Network topology is not DNS's bailiwick so using it as a trigger to withdraw routes seems
Everything I've seen posted about this (whether from Facebook directly, or from others) is so vague about what happened that I think everyone's just making assumptions based on their own experiences, or best guesses as to what really happened.

In that vein, imagine you have dozens of small sites acting as anycast origins for DNS. Each regularly runs some network health tests to determine whether its links to the rest of the (region|backbone|world|etc.) are working within defined parameters. If the health test fails, the site needs to be removed from anycast until the network health issue is resolved.

You're big, like automating things, and feel the need for speed, so when the health test fails, rather than trigger an alarm that your NOC may or may not act on in a timely manner, the local anycast origin routes are automatically suppressed from propagating beyond the site.

Just suppose you pushed out a new network health test that was guaranteed to fail in every POP...and you pushed it out to every POP. All of a sudden, your anycast routes aren't advertised anywhere.

Is this what happened? I really have no clue. It sounds like something like this might have happened. Unless someone at Facebook shares an actual detailed account of what they broke, most of us will never know what really happened.

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
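To make the hypothetical failure mode in the reply concrete, here is a minimal Python sketch. It is not anything Facebook has described. The `Pop` class, the check functions, and the use of a single boolean `advertising` flag as a stand-in for suppressing the site's anycast routes are all illustrative assumptions; a real site would stop propagating its BGP announcements rather than flip a flag.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pop:
    """Hypothetical anycast DNS site (POP)."""
    name: str
    links_ok: bool = True     # actual state of the site's links to the backbone
    advertising: bool = True  # stand-in for "anycast routes propagate beyond the site"


HealthCheck = Callable[[Pop], bool]


def apply_health_check(pop: Pop, check: HealthCheck) -> bool:
    # No NOC in the loop: the check result alone decides whether routes are advertised.
    pop.advertising = check(pop)
    return pop.advertising


def link_check(pop: Pop) -> bool:
    # A correct test: passes only if the site's links really are healthy.
    return pop.links_ok


def broken_check(pop: Pop) -> bool:
    # Hypothetical bad test: fails no matter what the links look like.
    return False


def advertising_pops(pops: list[Pop]) -> list[str]:
    # Names of the sites still announcing the anycast prefix.
    return [p.name for p in pops if p.advertising]


pops = [Pop("pop-a"), Pop("pop-b", links_ok=False), Pop("pop-c")]

# Normal operation: only the site with bad links withdraws.
for p in pops:
    apply_health_check(p, link_check)
print(advertising_pops(pops))  # ['pop-a', 'pop-c']

# Push the broken test to every POP: every site withdraws.
for p in pops:
    apply_health_check(p, broken_check)
print(advertising_pops(pops))  # []
```

Each site acts only on its own test result, so nothing in this loop notices that every site has withdrawn at once.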