Maybe I'm in the minority here, but I have higher standards for a T1 than any of the other players involved. Clearly several entities failed to do what they should have done, but Verizon is not a small or inexperienced operation. Taking 8+ hours to respond to a critical operational problem is what stood out to me as unacceptable.

And really - does it matter if the protection *was* there but something broke it? I don't think it does. Ultimately, Verizon failed to implement correct protections on their network, and then failed to respond when it became a problem.

On Mon, Jun 24, 2019, 8:06 PM Tom Beecher <beecher@beecher.cc> wrote:
Disclaimer: I am a Verizon employee via the Yahoo acquisition. I do not work on 701. My comments are my own opinions only.
Respectfully, I believe Cloudflare’s public comments today have been a real disservice. This blog post, and your CEO on Twitter today, took every opportunity to say “DAMN THOSE MORONS AT 701!”. They’re not.
You are 100% right that 701 should have had some sort of protection mechanism in place to prevent this. But do we know they didn’t? Do we know it was there and just set up wrong? Did another change at another time break what was there? I used 701 many jobs ago and they absolutely had filtering in place; it saved my bacon when I screwed up once and started readvertising a full table from a 2nd provider. They smacked my session down and I got a nice call about it.
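To make the protection being described concrete, here is a minimal illustrative sketch, in Python rather than router configuration, of a customer prefix filter combined with a max-prefix limit. The prefixes, the limit value, and the function names are all hypothetical; real deployments implement this with prefix-lists and maximum-prefix settings on the BGP session itself.

import ipaddress

# Hypothetical prefixes the customer has registered (e.g. built from IRR data).
ALLOWED = [ipaddress.ip_network(p) for p in ("192.0.2.0/24", "198.51.100.0/22")]
MAX_PREFIXES = 50  # hypothetical per-session limit

def accept(prefix: str) -> bool:
    """Accept an announcement only if it falls within a registered prefix."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(allowed) for allowed in ALLOWED)

def process_update(announced: list[str]) -> list[str]:
    """Apply the prefix filter, then enforce the max-prefix limit."""
    accepted = [p for p in announced if accept(p)]
    if len(accepted) > MAX_PREFIXES:
        # A real router would tear the BGP session down at this point.
        raise RuntimeError("max-prefix limit exceeded: session shut down")
    return accepted

# A re-advertised full table is mostly rejected by the filter; anything that
# still slips through is capped by the max-prefix limit.
print(process_update(["192.0.2.0/24", "203.0.113.0/24"]))  # ['192.0.2.0/24']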
You guys have repeatedly accused them of being dumb without, from the sounds of it, even speaking to anyone there yet. Shouldn’t we be working from facts?
Should they have been easier to reach once an issue was detected? Probably. They’re certainly not the first vendor to have a slow response time though. Seems like when an APAC carrier takes 18 hours to get back to us, we write it off as the cost of doing business.
It also would have been nice, in my opinion, to take a harder stance on the BGP optimizer that generated the bogus routes, and the steel company that failed BGP 101 and just gladly reannounced one upstream to another. 701 is culpable for their mistakes, but there doesn’t seem to be much appetite to shame the other contributors.
You’re right to use this as a lever to push for proper filtering, RPKI, and best practices. I’m 100% behind that. We can all be a hell of a lot better at what we do. This stuff happens more than it should, but less than it could.
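As a rough illustration of what RPKI adds on top of static filters, the sketch below implements the valid / invalid / not-found outcome of route origin validation (RFC 6811) against a hypothetical ROA set. The prefixes and AS numbers are made up; in practice this runs on routers fed by a validator, not in application code.

import ipaddress

# Hypothetical ROAs: (prefix, maxLength, authorized origin AS)
ROAS = [(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]

def rov_state(prefix: str, origin_as: int) -> str:
    """Classify a route per RFC 6811: valid, invalid, or not-found."""
    net = ipaddress.ip_network(prefix)
    covering = [r for r in ROAS if net.subnet_of(r[0])]
    if not covering:
        return "not-found"
    for roa_prefix, max_len, roa_as in covering:
        if origin_as == roa_as and net.prefixlen <= max_len:
            return "valid"
    return "invalid"

print(rov_state("192.0.2.0/24", 64500))     # valid: matches the ROA
print(rov_state("192.0.2.0/25", 64500))     # invalid: more specific than maxLength
print(rov_state("198.51.100.0/24", 64500))  # not-found: no covering ROA

A network that drops invalids at its edge would discard more-specifics that exceed a ROA's maxLength, no matter who announced them.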
But this industry is one big ass glass house. What’s that thing about stones again?
On Mon, Jun 24, 2019 at 18:06 Justin Paine via NANOG <nanog@nanog.org> wrote:
FYI for the group -- we just published this: https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-pa...
_________________
*Justin Paine*
Director of Trust & Safety
PGP: BBAA 6BCE 3305 7FD6 6452 7115 57B6 0114 DE0B 314D
101 Townsend St., San Francisco, CA 94107
On Mon, Jun 24, 2019 at 2:25 PM Mark Tinka <mark.tinka@seacom.mu> wrote:
On 24/Jun/19 18:09, Pavel Lunin wrote:
Hehe, I haven't seen this text before. Can't agree more.
Get your tie back on, Job, nobody listened again.
More seriously, I see no difference between prefix hijacking and so-called BGP optimisation based on completely fake announcements made on behalf of other people.
If your upstream, or any other party your company pays money to, does this dirty thing, now is the right moment to go explain to them that you consider it dangerous for your business and are looking for better partners among those who know how to run the Internet without breaking it.
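The reason optimizer-generated routes behave exactly like hijacks comes down to longest-prefix matching: a fabricated more-specific attracts the traffic no matter how the legitimate aggregate is announced. A small sketch with hypothetical prefixes and next hops:

import ipaddress

# Hypothetical RIB: the owner's aggregate plus a leaked, fabricated more-specific.
RIB = {
    ipaddress.ip_network("192.0.2.0/24"): "legitimate origin",
    ipaddress.ip_network("192.0.2.0/25"): "leaked optimizer route",
}

def best_route(dest: str) -> str:
    """Pick the longest (most specific) prefix covering the destination."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, via) for net, via in RIB.items() if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(best_route("192.0.2.10"))   # 'leaked optimizer route' -- traffic diverted
print(best_route("192.0.2.200"))  # 'legitimate origin' (outside the fake /25)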
We struggled with a number of networks running these optimizers on eBGP sessions they had with networks that shared their routing data with BGPmon. It set off all sorts of alarms, and troubleshooting was hard when another network thinks you are de-aggregating massively and yet you know you aren't.
Each case took nearly 3 weeks to figure out.
BGP optimizers are the bane of my existence.
Mark.