On Jun 24, 2019, at 8:50 PM, Ross Tajvar <ross@tajvar.io> wrote:
Maybe I'm in the minority here, but I have higher standards for a T1 than any of the other players involved. Clearly several entities failed to do what they should have done, but Verizon is not a small or inexperienced operation. Taking 8+ hours to respond to a critical operational problem is what stood out to me as unacceptable.
Are you talking about a press response or a technical one? The impacts I saw were for around 2h or so based on monitoring I’ve had up since 2007. Not great but far from the worst as Tom mentioned. I’ve seen people cease to announce IP space we reclaimed from them for months (or years) because of stale config. I’ve also seen routes come back from the dead because they were pinned to an interface that was down for 2 years but never fully cleaned up. (Then the telco looped the circuit, interface came up, route in table, announced globally — bad day all around).
And really - does it matter if the protection *was* there but something broke it? I don't think it does. Ultimately, Verizon failed to implement correct protections on their network. And then failed to respond when it became a problem.
I think it does matter. As I said in my other reply, people do things like drop ACLs to debug. Perhaps that’s unsafe, but it is something you do to debug. Not knowing what happened, I dunno. It is also 2019 so I hold networks to a higher standard than I did in 2009 or 1999.

- Jared