This bit posted by Randy might get lost in the other thread, but
it appears that their DNS withdraws the BGP routes for prefixes
that it can't reach or that are flaky. Apparently that goes for
the prefixes that the name servers themselves are on, too. This
caused internal outages as well, since it seems they use their
front-facing DNS internally just like everybody else.
Sounds like they might consider having at least one split-horizon server internally. Lots of fodder here.
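For anyone skimming, the failure mode as described can be sketched in a few lines of Python. This is a toy model only, not their actual health-check code: the node list, the `backbone_ok` flag, and the documentation prefixes are all made up for illustration.

```python
def announced_prefixes(nodes):
    """Return the prefixes that are still announced.

    Toy rule: each DNS node withdraws its own prefix from BGP
    as soon as its health check against the backbone fails.
    """
    return [node["prefix"] for node in nodes if node["backbone_ok"]]


# Hypothetical DNS nodes, using documentation prefixes (RFC 5737).
nodes = [
    {"prefix": "192.0.2.0/24", "backbone_ok": True},
    {"prefix": "198.51.100.0/24", "backbone_ok": True},
]

# One node loses the backbone: only that node's prefix is withdrawn.
nodes[0]["backbone_ok"] = False
one_down = announced_prefixes(nodes)

# Every node loses the backbone at once: every prefix is withdrawn.
for node in nodes:
    node["backbone_ok"] = False
all_down = announced_prefixes(nodes)
```

With a single node down the other prefix is still announced, which is the point of the check. With every node down at the same time, nothing is left announcing the name servers, inside or outside.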
Mike
On Tue, Oct 5, 2021 at 1:26 PM Michael Thomas <mike@mtcc.com> wrote:
On 10/5/21 12:17 AM, Carsten Bormann wrote:
> On 5. Oct 2021, at 07:42, William Herrin <bill@herrin.us> wrote:
>> On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <mike@mtcc.com> wrote:
>>> They have a monkey patch subsystem. Lol.
>> Yes, actually, they do. They use Chef extensively to configure
>> operating systems. Chef is written in Ruby. Ruby has something called
>> Monkey Patches.
> While Ruby indeed has a chain-saw (read: powerful, dangerous, still the tool of choice in certain cases) in its toolkit that is generally called “monkey-patching”, I think Michael was actually thinking about the “chaos monkey”,
> https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey
> https://netflix.github.io/chaosmonkey/
No, chaos monkey is a purposeful thing to induce corner case errors so
they can be fixed. The earlier outage involved a config sanitizer that
screwed up and then pushed the bad config out. I can't get my head around why
anybody thought that was a good idea vs rejecting it and making somebody
fix the config.
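A rough sketch of what I mean by rejecting instead of sanitizing, in Python. The schema, `ConfigError`, `validate_or_reject`, and `push` are hypothetical names for illustration, not anything from their tooling.

```python
class ConfigError(ValueError):
    """Raised when a config fails validation; nothing is deployed."""


REQUIRED_KEYS = {"hostname", "asn"}  # hypothetical schema


def validate_or_reject(config):
    """Return the config unchanged if valid, otherwise raise ConfigError.

    The config is never rewritten or "repaired" here.
    """
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ConfigError(f"missing keys: {sorted(missing)}")
    asn = config["asn"]
    if not isinstance(asn, int) or not (1 <= asn <= 4294967295):
        raise ConfigError(f"invalid ASN: {asn!r}")
    return config


def push(config, deploy):
    """Deploy only a config that passed validation as written."""
    deploy(validate_or_reject(config))
```

A bad config stops at `validate_or_reject` with an error somebody has to go fix, and `deploy` is never called with a machine-edited guess.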
Mike