Centurylink having a bad morning?
Hello,
Woke up this morning to a bunch of reports of connectivity issues; had to shut down some Level3/CTL connections to get it to return to normal.
As of right now their support portal won't load: https://www.centurylink.com/business/login/
Just wondering what others are seeing.
It's dead, Jim. I also haven't been able to get in since around 7 EST. On Sun, Aug 30, 2020 at 8:19 AM Drew Weaver via NANOG <nanog@nanog.org> wrote:
-- Sincerely, Jason W Kuehl Cell 920-419-8983 jason.w.kuehl@gmail.com
Same. Also, as reported on the outages list, what’s even worse is that they appear to be continuing to propagate advertisements from circuits whose sessions have been turned down. I validated that ours still were via a couple of looking glass portals. Down Detector shows nearly every major service provider impacted. They’re not reachable, so who knows if they’re even working on it. I feel like they’ve been cutting heavily on the network ops side in recent years…
From: NANOG <nanog-bounces+dhubbard=dino.hostasaurus.com@nanog.org> on behalf of Drew Weaver via NANOG <nanog@nanog.org> Reply-To: Drew Weaver <drew.weaver@thenap.com> Date: Sunday, August 30, 2020 at 8:23 AM To: "nanog@nanog.org" <nanog@nanog.org> Subject: Centurylink having a bad morning?
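The looking-glass check described above boils down to reading the AS paths seen for your prefix. A minimal Python sketch, assuming you have pasted the AS paths from a few looking glasses into a list by hand: it flags any path in which AS3356 still sits directly in front of your origin ASN. The origin ASN 64500 and every sample path are hypothetical (documentation-range ASNs), and the check only means something once every session to AS3356 is actually down.

```python
# Hypothetical data: origin AS 64500 and the transit ASNs in the sample paths
# are from the documentation range; only AS3356 (Level3/CenturyLink) is real.
MY_ASN = 64500
LEVEL3_ASN = 3356


def collapse_prepends(hops):
    """Remove consecutive duplicate ASNs (AS-path prepending)."""
    out = []
    for asn in hops:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def stale_paths(as_paths, origin_asn=MY_ASN, upstream_asn=LEVEL3_ASN):
    """Return the AS paths in which upstream_asn is the origin's direct neighbor.

    Assumes plain space-separated ASNs (no AS_SET braces). If every session
    to upstream_asn has already been torn down, any path returned here is an
    announcement that upstream is still carrying on its own.
    """
    stale = []
    for path in as_paths:
        hops = collapse_prepends([int(asn) for asn in path.split()])
        if len(hops) >= 2 and hops[-1] == origin_asn and hops[-2] == upstream_asn:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Hypothetical paths copied from looking glasses after shutting sessions.
    seen_at_looking_glasses = [
        "64496 3356 64500",
        "64497 64501 64500",
        "64499 3356 64500 64500 64500",
    ]
    for path in stale_paths(seen_at_looking_glasses):
        print("still announced via AS3356:", path)
```

Collapsing prepends first means a path like "64499 3356 64500 64500 64500" is still recognized as AS3356 being adjacent to the origin.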
Yeah, I am still seeing them announce our IPs even though we’ve shut down our sessions with them.
From: NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> On Behalf Of David Hubbard Sent: Sunday, August 30, 2020 8:28 AM To: nanog@nanog.org Subject: Re: Centurylink having a bad morning?
Their route-servers just can’t keep up. They are not getting our community-string announcements at all.
From: NANOG <nanog-bounces+romeo.czumbil=tierpoint.com@nanog.org> On Behalf Of Drew Weaver Sent: Sunday, August 30, 2020 8:38 AM To: 'David Hubbard' <dhubbard@dino.hostasaurus.com>; 'nanog@nanog.org' <nanog@nanog.org> Subject: RE: Centurylink having a bad morning?
Is this what happens when your entire network is database driven?
Well, at least it looks like the issue is starting to resolve and stuff is coming back up. On Sun, Aug 30, 2020 at 8:21 AM Matt Hoppes <mattlists@rivervalleyinternet.net> wrote:
Well, when I tried calling I got a fast busy, so that's nice. On Sun, Aug 30, 2020 at 8:33 AM David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
We've been on hold for more than an hour trying to get an update. We see the same behavior where they continue to announce our blocks despite all the interfaces to them being hard down. Scott Helms On Sun, Aug 30, 2020 at 8:58 AM Jason Kuehl <jason.w.kuehl@gmail.com> wrote:
Multiple BGP sessions with Level3 (DIA) started flapping at approx 03:00 Pacific:
Aug 30 03:05:13 rtr02 Rib: %BGP-3-NOTIFICATION: sent to neighbor 4.35.X.Y (AS 3356) 4/0 (Hold Timer Expired Error/Unspecified) 0 bytes
Aug 30 03:05:13 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state Established event HoldTime new state Idle
Aug 30 03:07:37 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state OpenConfirm event RecvKeepAlive new state Established
Aug 30 03:15:38 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state Established event HoldTime new state Idle
Aug 30 03:17:15 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state OpenConfirm event RecvKeepAlive new state Established
Aug 30 03:19:55 rtr02 Rib: %BGP-3-NOTIFICATION: sent to neighbor 4.35.X.Y+52091 (proto) 6/7 (Cease/connection collision resolution) 0 bytes
Aug 30 03:20:11 rtr02 Rib: %BGP-3-NOTIFICATION: received from neighbor 4.35.X.Y (AS 3356) 4/0 (Hold Timer Expired Error/Unspecified) 0 bytes
Aug 30 03:20:11 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state Established event RecvNotify new state Idle
And incoming traffic from AS3356 and AS209 both dropped to very low volumes.
On Sun, Aug 30, 2020 at 5:58 AM Jason Kuehl <jason.w.kuehl@gmail.com> wrote:
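For anyone wanting to tally flaps like these from their own logs, here is a minimal Python sketch. It assumes syslog lines in exactly the format quoted above (the regex is fitted to these lines only and is an assumption about the format, not a vendor-documented grammar), keeps only the %BGP-5-ADJCHANGE messages, and counts how often the session fell to Idle and came back to Established.

```python
import re
from collections import Counter

# Regex fitted only to the "%BGP-5-ADJCHANGE" lines quoted in this thread;
# other router platforms or software versions may word these differently.
ADJCHANGE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ Rib: %BGP-5-ADJCHANGE: "
    r"peer (?P<peer>\S+) \(AS (?P<asn>\d+)\) "
    r"old state (?P<old>\w+) event (?P<event>\w+) new state (?P<new>\w+)"
)

# The log excerpt from the message above, one entry per line.
LOG = """\
Aug 30 03:05:13 rtr02 Rib: %BGP-3-NOTIFICATION: sent to neighbor 4.35.X.Y (AS 3356) 4/0 (Hold Timer Expired Error/Unspecified) 0 bytes
Aug 30 03:05:13 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state Established event HoldTime new state Idle
Aug 30 03:07:37 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state OpenConfirm event RecvKeepAlive new state Established
Aug 30 03:15:38 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state Established event HoldTime new state Idle
Aug 30 03:17:15 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state OpenConfirm event RecvKeepAlive new state Established
Aug 30 03:19:55 rtr02 Rib: %BGP-3-NOTIFICATION: sent to neighbor 4.35.X.Y+52091 (proto) 6/7 (Cease/connection collision resolution) 0 bytes
Aug 30 03:20:11 rtr02 Rib: %BGP-3-NOTIFICATION: received from neighbor 4.35.X.Y (AS 3356) 4/0 (Hold Timer Expired Error/Unspecified) 0 bytes
Aug 30 03:20:11 rtr02 Rib: %BGP-5-ADJCHANGE: peer 4.35.X.Y (AS 3356) old state Established event RecvNotify new state Idle
"""


def adjacency_changes(lines):
    """Return (timestamp, peer, asn, old_state, event, new_state) tuples.

    Lines that are not ADJCHANGE messages (e.g. NOTIFICATIONs) are skipped.
    """
    changes = []
    for line in lines:
        m = ADJCHANGE.match(line.strip())
        if m:
            changes.append((m["ts"], m["peer"], int(m["asn"]),
                            m["old"], m["event"], m["new"]))
    return changes


def flap_summary(lines):
    """Count how many times the session moved into each new state."""
    return Counter(change[5] for change in adjacency_changes(lines))


if __name__ == "__main__":
    for ts, peer, asn, old, event, new in adjacency_changes(LOG.splitlines()):
        print(f"{ts}  AS{asn} {peer}: {old} -> {new} ({event})")
    summary = flap_summary(LOG.splitlines())
    print(f"dropped to Idle {summary['Idle']} times, "
          f"re-Established {summary['Established']} times")
```

Against the excerpt above it reports three drops to Idle and two re-establishments between 03:05:13 and 03:20:11. Counting state transitions rather than NOTIFICATION messages matters here: the 03:15:38 drop has no NOTIFICATION line, while the 03:19:55 Cease is a connection-collision resolution rather than a session loss.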
Now if you call into CL you get a message stating their technicians are working on an IP outage. On Sun, Aug 30, 2020 at 6:56 AM Chase Christian <madsushi@gmail.com> wrote:
Started about 5:05am Central; started clearing up for me about 7:15am. My route from ATT in Chicago is still going through NYC to get back to Chicago, but at least packet loss isn't 70-100% anymore. I also tried turning down sessions and was still seeing stale announcements on other LGs. On 08/30/2020 07:27 AM, David Hubbard wrote:
I don’t think it is anywhere near back to normal, but I can get to 2-3 more sites than I could when I got called in.
From: NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> On Behalf Of Andy Brezinsky Sent: Sunday, August 30, 2020 8:48 AM To: nanog@nanog.org Subject: Re: Centurylink having a bad morning?
I'm over in MA in a CL building; it's very much still broken. I shut down the interfaces to CL and am now just using Comcast. On Sun, Aug 30, 2020 at 9:20 AM Andy Brezinsky <andy@mbrez.com> wrote:
Yep Regards Ray Ludendorff On Aug 30, 2020, at 16:16, Jason Kuehl <jason.w.kuehl@gmail.com> wrote:
They’re not reachable so who knows if they’re even working on it.
Gonna go out on a limb here and assume that a lot of phones were ringing and people are in fact working on whatever it is. :) On Sun, Aug 30, 2020 at 8:34 AM David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
Bigleaf SD-WAN is seeing the outage between gateways. https://status.bigleaf.net/incidents/31r4wts0jlrr Robert DeVita Managing Director Mejeticks c. 469-441-8864 e. radevita@mejeticks.com
From: NANOG <nanog-bounces+radevita=mejeticks.com@nanog.org> on behalf of Tom Beecher <beecher@beecher.cc> Sent: Sunday, August 30, 2020 8:14:27 AM To: David Hubbard <dhubbard@dino.hostasaurus.com> Cc: nanog@nanog.org <nanog@nanog.org> Subject: Re: Centurylink having a bad morning?
The CL portal loads for me, and I can log in, but it is slower than usual. Not seeing traffic issues on our CL circuits. -mel via cell On Aug 30, 2020, at 5:23 AM, Drew Weaver via NANOG <nanog@nanog.org> wrote:
BGP sessions randomly flapping or having routing issues in different cities since ~5AM EST. On Sun, Aug 30, 2020 at 8:42 AM Mel Beckman <mel@beckman.org> wrote:
Saw the flapping in Cleveland but not in Cincinnati or Ashburn…
From: Tomas Lynch <tomas.lynch@gmail.com> Sent: Sunday, August 30, 2020 8:45 AM To: Mel Beckman <mel@beckman.org> Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@nanog.org Subject: Re: Centurylink having a bad morning?
Flapping in Miami, Dallas, Atlanta, Los Angeles, Seattle and San Jose. It is also affecting some data centers in Europe, but I haven't seen flaps there, just suboptimal routing. On Sun, Aug 30, 2020 at 8:53 AM Drew Weaver <drew.weaver@thenap.com> wrote:
Reporting from Europe: any IP with them in the path is unreachable from various providers. I guess they wanted to try IPv6-only. :P IPv6 is working fine; IPv4 not at all. Antonis
On 30 Aug 2020, at 14:58, Tomas Lynch <tomas.lynch@gmail.com> wrote:
Latest updates from my tickets:
08/30/2020 14:28:20 GMT - The IP NOC confirmed a routing issue and commenced with troubleshooting efforts. Routing configuration adjustments have been made and service affecting alarms are beginning to clear.
08/30/2020 11:38:15 GMT - The IP NOC is engaged in cooperative escalated investigations to isolate and troubleshoot the fault at this time.
08/30/2020 11:03:09 GMT - On August 30, 2020 at 10:00 GMT, CenturyLink identified a Market Wide service impact. As this network fault is impacting multiple clients, the event has increased visibility with CenturyLink leadership. As such, client trouble tickets associated to this fault have been automatically escalated to higher priority. The NOC is engaged and investigating in order to isolate the cause. Please be advised that updates for this event will be relayed at a minimum of hourly unless otherwise noted. The information conveyed hereafter is associated to live troubleshooting effort and as the discovery process evolves through to service resolution, ticket closure, or post incident review, details may evolve.
On Sun, Aug 30, 2020 at 7:30 AM Antonios Chariton <daknob.mac@gmail.com> wrote:
How is that acceptable behavior? It's not. The best part: their RCA will be terrible. "Bad Regex", or the best I ever got was "Bad cable", just two words... My contract is ending soon... On Sun, Aug 30, 2020 at 10:29 AM Antonios Chariton <daknob.mac@gmail.com> wrote:
Reporting from Europe, any IP with them in the path is unreachable from various providers. I guess they wanted to try IPv6-only.. :P IPv6 is fine, working fine, IPv4 not at all..
Antonis
On 30 Aug 2020, at 14:58, Tomas Lynch <tomas.lynch@gmail.com> wrote:
Flapping in Miami, Dallas, Atlanta, Los Angeles, Seattle and San Jose. It is also affecting some data centers in Europe too. but haven't seen flaps there, just suboptimal routing.
On Sun, Aug 30, 2020 at 8:53 AM Drew Weaver <drew.weaver@thenap.com> wrote: Saw the flapping in Cleveland but not in Cincinnatti or Ashburn…
From: Tomas Lynch <tomas.lynch@gmail.com> Sent: Sunday, August 30, 2020 8:45 AM To: Mel Beckman <mel@beckman.org> Cc: Drew Weaver <drew.weaver@thenap.com>; nanog@nanog.org Subject: Re: Centurylink having a bad morning?
BGP sessions randomly flapping or having routing issues in different cities since ~5AM EST
On Sun, Aug 30, 2020 at 8:42 AM Mel Beckman <mel@beckman.org> wrote:
The CL portal loads for me, and I can log in, but it is slower than usual. Not seeing traffic issues on our CL circuits.
-mel via cell
On Aug 30, 2020, at 5:23 AM, Drew Weaver via NANOG <nanog@nanog.org> wrote:
Hello,
Woke up this morning to a bunch of reports of issues with connectivity had to shut down some Level3/CTL connections to get it to return to normal.
As of right now their support portal won’t load: https://www.centurylink.com/business/login/
Just wondering what others are seeing.
-- Sincerely, Jason W Kuehl Cell 920-419-8983 jason.w.kuehl@gmail.com
I just brought one of my sessions back up to attempt to avoid the blackholing. It should be a full feed, but I'm getting all of 850 v4 routes and 106 v6.
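A received-prefix count that far below a full table is a quick signal that a session is not carrying what it should. The sketch below flags that per address family before trusting the session with traffic; the floor values and function names are hypothetical placeholders to tune against your own view of the table, not published table sizes.

```python
# Illustrative minimums for a "full table"; placeholders, not published table sizes.
FULL_TABLE_FLOOR = {"ipv4": 700_000, "ipv6": 80_000}


def looks_like_full_feed(received: dict[str, int],
                         floor: dict[str, int] = FULL_TABLE_FLOOR) -> dict[str, bool]:
    """Per address family, report whether the received prefix count meets the floor."""
    return {afi: received.get(afi, 0) >= minimum for afi, minimum in floor.items()}


def trust_session(received: dict[str, int]) -> bool:
    """Only send traffic over the session if every address family looks complete."""
    return all(looks_like_full_feed(received).values())


if __name__ == "__main__":
    # Counts from the report above: a supposed full feed with 850 v4 and 106 v6 routes.
    reported = {"ipv4": 850, "ipv6": 106}
    print(looks_like_full_feed(reported))  # {'ipv4': False, 'ipv6': False}
    print(trust_session(reported))         # False
```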
I believe from this moment forward things are converging back to normal. Kind regards, Job
I've been burned before. I'll wait at least an hour before turning my links back on.
-- Sincerely, Jason W Kuehl Cell 920-419-8983 jason.w.kuehl@gmail.com
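Waiting out a quiet period before re-enabling links is essentially a hold-down timer. A minimal sketch, assuming you keep your own log of flap timestamps; the one-hour window mirrors the message above, and the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Mirrors the "at least an hour" above; an operator choice, not a standard value.
HOLD_DOWN = timedelta(hours=1)


def safe_to_reenable(flap_times: list[datetime], now: datetime,
                     hold_down: timedelta = HOLD_DOWN) -> bool:
    """True only if no flap was observed within `hold_down` before `now`."""
    if not flap_times:
        return True
    return now - max(flap_times) >= hold_down


if __name__ == "__main__":
    # Hypothetical flap log, for illustration only.
    flaps = [datetime(2020, 8, 30, 14, 50), datetime(2020, 8, 30, 15, 20)]
    print(safe_to_reenable(flaps, datetime(2020, 8, 30, 16, 0)))   # False (40 min quiet)
    print(safe_to_reenable(flaps, datetime(2020, 8, 30, 16, 30)))  # True (70 min quiet)
```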
I am seeing some odd traffic deflections in the EU via 3356, but nothing in the US so far. Some scattershot, oddball reports are landing in our NOC, but nothing conclusive.
Finally got through on their support line and spoke to level 1 support. The only thing the tech could say was that it was an issue with BGP route reflectors and that it started about 3 AM (Pacific). They were still trying to isolate the issue. I've tried failing over my circuits with no luck: the traffic just dies because L3 won't stop advertising my routes.
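Reports in this thread mix time zones (GMT ticket entries alongside Pacific and Eastern local times), so normalizing them to UTC makes them comparable. This sketch assumes a fixed daylight-saving offset for 30 Aug 2020 (PDT = UTC-7) instead of a tz database, and shows that "about 3 AM Pacific" lines up with the 10:00 GMT start in the ticket updates quoted at the top of this section, plus the elapsed time until alarms began clearing.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset in effect on 30 Aug 2020 (daylight saving time). Using a fixed
# offset is an assumption that avoids depending on the system tz database.
PDT = timezone(timedelta(hours=-7), "PDT")
UTC = timezone.utc


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC."""
    return local.astimezone(UTC)


# "It started about 3am (Pacific)" from the support call.
reported_start = to_utc(datetime(2020, 8, 30, 3, 0, tzinfo=PDT))
# Entries from the ticket updates, already given in GMT.
ticket_start = datetime(2020, 8, 30, 10, 0, tzinfo=UTC)
alarms_clearing = datetime(2020, 8, 30, 14, 28, 20, tzinfo=UTC)

print(reported_start.isoformat())      # 2020-08-30T10:00:00+00:00
print(reported_start == ticket_start)  # True
print(alarms_clearing - ticket_start)  # 4:28:20
```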
How is that acceptable behaviour? I shall remember never to sign a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.
I’m not defending them, but I am sure it isn’t intentional.
Exactly. And asking that they somehow prove this won't happen again is impossible. - Mike Bolitho
An outage is what it is. I am not worried about outages; we have multiple transits to deal with that.
The huge problem here is that they keep announcing prefixes after peers and customers have withdrawn them. That kills all the effort and money I put into having redundancy. It is sabotage of my network after I cut the ties. I do not want to be a customer of an operator whose system will do that. Luckily we do not currently have a contract, and now they will have to convince me it is safe for me to sign one with them. If that is impossible, I guess I won't be getting a contract with them.
But I disagree that it would be impossible. They need to write a good report explaining exactly what went wrong and how they changed the design so that something like this cannot happen again. The basic design of BGP is such that this should not happen easily, if at all. They did something unwise. Did they build a route reflector on top of a database or something?
Regards,
Baldur
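Baldur's point that BGP's basic design should prevent this can be illustrated with a toy model: when a session goes down, every route learned over it is purged, and any prefix left with no remaining path must be withdrawn from downstream peers. The class, peer names, and documentation prefixes are hypothetical; real implementations also track path attributes, best-path selection, and per-neighbor Adj-RIB-Out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRouteReflector:
    """Toy model of per-peer route bookkeeping; not any vendor's implementation."""
    # Adj-RIB-In, simplified: peer name -> prefixes learned from that peer.
    adj_rib_in: dict[str, set[str]] = field(default_factory=dict)

    def receive(self, peer: str, prefix: str) -> None:
        self.adj_rib_in.setdefault(peer, set()).add(prefix)

    def reachable(self) -> set[str]:
        """Prefixes with at least one remaining path (a stand-in for the Loc-RIB)."""
        return set().union(*self.adj_rib_in.values())

    def session_down(self, peer: str) -> set[str]:
        """Purge everything learned from `peer`; return prefixes to withdraw downstream."""
        before = self.reachable()
        self.adj_rib_in.pop(peer, None)
        return before - self.reachable()


if __name__ == "__main__":
    rr = ToyRouteReflector()
    rr.receive("customer-a", "192.0.2.0/24")
    rr.receive("customer-a", "198.51.100.0/24")
    rr.receive("transit-b", "198.51.100.0/24")
    # 192.0.2.0/24 has no other path, so it must be withdrawn; 198.51.100.0/24 stays.
    print(rr.session_down("customer-a"))  # {'192.0.2.0/24'}
```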
If you have to have connectivity to them, you could always instruct them not to announce your routes beyond their AS (effectively paid peering) and announce through more reliable ASes such as 2914 and 1299. Many people do this. Otherwise, cut ties with them and save yourself the headaches.
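One generic way to ask an upstream not to announce your routes beyond its own AS is the well-known NO_EXPORT community from RFC 1997 (65535:65281), assuming the provider honors it. Strictly, NO_EXPORT keeps the route inside the provider's AS, so the provider's own customers would not receive it either; "customers only" behavior usually relies on provider-specific communities, which are not modeled here. The sketch shows the export decision a receiving router makes; confederations are ignored and the function names are illustrative.

```python
# Well-known communities from RFC 1997, as 32-bit values (high 16 bits = 65535).
NO_EXPORT = 0xFFFFFF01     # 65535:65281
NO_ADVERTISE = 0xFFFFFF02  # 65535:65282


def community(asn: int, value: int) -> int:
    """Encode an ASN:value pair as a 32-bit standard community."""
    return (asn << 16) | value


def may_advertise(communities: set[int], to_ebgp_peer: bool) -> bool:
    """Export check a receiving router applies (confederations ignored)."""
    if NO_ADVERTISE in communities:
        return False  # not to any peer at all
    if to_ebgp_peer and NO_EXPORT in communities:
        return False  # keep the route inside this AS
    return True


if __name__ == "__main__":
    tagged = {community(65535, 65281)}  # the customer tags its routes with NO_EXPORT
    print(may_advertise(tagged, to_ebgp_peer=False))  # True: iBGP inside the provider
    print(may_advertise(tagged, to_ebgp_peer=True))   # False: not to other ASes
```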
This is what happens when the design of 'god power' automation tools doesn't take the concept of blast radius into account. It might be more inconvenient to internally partition automated change-management systems, but doing so can also limit the effect of automation tools gone awry. https://www.ibm.com/garage/method/practices/manage/practice_limited_blast_ra... https://principlesofchaos.org/
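Limiting blast radius in automation usually means rolling a change out in growing waves and halting at the first sign of trouble, so a bad change reaches a handful of devices rather than the whole network. A minimal sketch, assuming caller-supplied apply and health-check callbacks (hypothetical; no particular automation framework is implied) and illustrative wave sizes.

```python
from typing import Callable, Sequence


def staged_rollout(devices: Sequence[str],
                   apply_change: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   wave_sizes: Sequence[int] = (1, 5, 25)) -> tuple[list[str], bool]:
    """Apply a change in growing waves, halting after the first wave with an unhealthy device.

    Returns (devices that received the change, whether the rollout completed).
    Once the listed wave sizes are used up, the remaining devices form one final wave.
    """
    changed: list[str] = []
    remaining = list(devices)
    sizes = list(wave_sizes)
    while remaining:
        size = sizes.pop(0) if sizes else len(remaining)
        wave, remaining = remaining[:size], remaining[size:]
        for device in wave:
            apply_change(device)
            changed.append(device)
        if not all(healthy(device) for device in wave):
            return changed, False  # stop: the damage is limited to `changed`
    return changed, True


if __name__ == "__main__":
    routers = [f"rr{i}" for i in range(40)]
    done, ok = staged_rollout(routers, lambda d: None, lambda d: d != "rr3")
    print(len(done), ok)  # 6 False: halted after the second wave
```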
https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
I definitely found Mr. Prince's writing about yesterday's events fascinating. Verizon makes a mistake with BGP filters that allows a secondary mistake from leaked "optimizer" routes to propagate, and Mr. Prince takes every opportunity to lob large chunks of granite about how terrible they are. L3 allows an erroneous flowspec announcement to cause massive global connectivity issues, and Mr. Prince shrugs and says "Incidents happen." On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <hank@interall.co.il> wrote:
Sounds like Flowspec possibly blocking tcp/179 might be the cause.
But that is Cloudflare speculation.
Regards, Hank Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer
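If the trigger was a flowspec rule matching TCP/179 (Cloudflare's speculation, as Hank notes), a guard at rule acceptance could catch it. This toy check uses a simplified, hypothetical rule representation rather than the actual flowspec NLRI encoding, and it only flags rules that explicitly match port 179; a real guard would also need to consider broader rules covering router addresses.

```python
from dataclasses import dataclass
from typing import Optional

TCP = 6
BGP_PORT = 179


@dataclass(frozen=True)
class FlowspecRule:
    """Simplified, hypothetical flowspec rule: None means "match any"."""
    protocol: Optional[int] = None
    dst_ports: Optional[range] = None
    src_ports: Optional[range] = None
    action: str = "discard"  # e.g. "discard", "rate-limit", "accept"


def hits_bgp(rule: FlowspecRule) -> bool:
    """True if the rule explicitly matches TCP port 179 as source or destination."""
    if rule.protocol not in (None, TCP):
        return False
    return any(ports is not None and BGP_PORT in ports
               for ports in (rule.dst_ports, rule.src_ports))


def accept_rule(rule: FlowspecRule) -> bool:
    """Refuse any non-accept action that would touch BGP control traffic."""
    return rule.action == "accept" or not hits_bgp(rule)


if __name__ == "__main__":
    print(accept_rule(FlowspecRule(protocol=TCP, dst_ports=range(179, 180))))  # False
    print(accept_rule(FlowspecRule(protocol=17, dst_ports=range(53, 54))))     # True
    print(accept_rule(FlowspecRule(src_ports=range(1, 1024))))                 # False
```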
At this point you don't even know whether it's a human error (example: generating a flowspec rule for port TCP/179), a filtering issue (example: accepting a flowspec rule for port TCP/179), or a software issue (example: a certain flowspec update crashes the BGP daemon). In the third scenario, I think at least some portion of the blame shifts from the carrier to its vendors, assuming the thing that crashed was not a home-grown BGP implementation.
With the route optimizer incidents (because let's face it, Honest Networker is on the money as usual: https://honestnetworker.net/2020/08/06/as10990-routing/), there is really no excuse for any tier-1 carrier. They should at the very least have strict prefix-list based filtering in place for customer-facing EBGP sessions. In those cases it's much easier to say who's not taking care of their proverbial lawn.
Best regards, Martijn
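Strict prefix-list filtering on customer-facing eBGP sessions means accepting only prefixes inside the space registered for that customer, up to an agreed maximum length; anything else, such as leaked more-specifics of other networks' prefixes, is rejected. A minimal sketch using Python's `ipaddress` module; the customer name, prefixes (documentation ranges), and length limits are hypothetical.

```python
import ipaddress

# Hypothetical allow-list: customer -> [(registered prefix, maximum accepted length)].
CUSTOMER_PREFIXES = {
    "customer-a": [("192.0.2.0/24", 24), ("2001:db8::/32", 48)],
}


def accept_announcement(customer: str, announced: str) -> bool:
    """Accept only prefixes inside the customer's registered space, up to max length."""
    net = ipaddress.ip_network(announced)
    for allowed, max_len in CUSTOMER_PREFIXES.get(customer, []):
        allowed_net = ipaddress.ip_network(allowed)
        # Compare only same-family networks; subnet_of() raises TypeError otherwise.
        if (net.version == allowed_net.version
                and net.subnet_of(allowed_net)
                and net.prefixlen <= max_len):
            return True
    return False


if __name__ == "__main__":
    print(accept_announcement("customer-a", "192.0.2.0/24"))     # True
    print(accept_announcement("customer-a", "192.0.2.0/25"))     # False: too specific
    print(accept_announcement("customer-a", "198.51.100.0/24"))  # False: not theirs
```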
Maybe we are idealizing these so-called tier-1 carriers, and we, the tier-N networks, should treat them as what they really are: just another AS. Accept that they are going to fail and do our best to mitigate the impact on our own networks, i.e. more peering.
You're preaching to the choir here. ;)
That's all we can do. Thankfully I work for an org that understands this and has *at least* two fully redundant circuits. Sometimes a third smaller carrier if we can prove that it is diverse, but that isn't the case very often. - Mike Bolitho On Mon, Aug 31, 2020 at 7:35 AM Tomas Lynch <tomas.lynch@gmail.com> wrote:
Maybe we are idealizing these so-called tier-1 carriers and we, tier-ns, should treat them as what they really are: another AS. Accept that they are going to fail and do our best to mitigate the impact on our own networks, i.e. more peering.
On Mon, Aug 31, 2020 at 9:54 AM Martijn Schmidt via NANOG <nanog@nanog.org> wrote:
At this point you don't even know whether it's a human error (example: generating a flowspec rule for port TCP/179), a filtering issue (example: accepting a flowspec rule for port TCP/179), or a software issue (example: certain flowspec update crashes the BGP daemon). And in the third scenario I think that at least some portion of the blame shifts from the carrier to its vendors, assuming the thing that crashed was not a home-grown BGP implementation.
With the route optimizer incidents - because let's face it, Honest Networker is on the money as usual https://honestnetworker.net/2020/08/06/as10990-routing/ - there is really no excuse for any tier-1 carrier, they should at the very least have strict prefix-list based filtering in place for customer-facing EBGP sessions. In those cases it's much easier to state who's not taking care of their proverbial lawn.
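The strict prefix-list filtering Martijn describes can be illustrated with a minimal Python sketch under a simplified model. Each entry permits one covering prefix within an allowed mask-length range, in the spirit of `ge`/`le` on router prefix lists, and anything unmatched is implicitly denied. `PrefixListEntry` and `filter_customer_routes` are hypothetical names. Real routers express this in vendor configuration rather than Python. The prefixes are documentation ranges.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class PrefixListEntry:
    """One hypothetical 'permit' line: a covering prefix plus an allowed mask-length range."""
    prefix: Network
    ge: int
    le: int

    def matches(self, route: Network) -> bool:
        # Address families must match before subnet_of() is called (it raises TypeError otherwise).
        return (
            route.version == self.prefix.version
            and route.subnet_of(self.prefix)
            and self.ge <= route.prefixlen <= self.le
        )


def filter_customer_routes(announced: list[str], prefix_list: list[PrefixListEntry]):
    """Split a customer's announcements into (accepted, rejected); unmatched routes are implicitly denied."""
    accepted, rejected = [], []
    for text in announced:
        route = ipaddress.ip_network(text)
        if any(entry.matches(route) for entry in prefix_list):
            accepted.append(route)
        else:
            rejected.append(route)
    return accepted, rejected


# Hypothetical customer registered for one IPv4 /24 and one IPv6 /32 (more-specifics up to /48).
customer_list = [
    PrefixListEntry(ipaddress.ip_network("198.51.100.0/24"), ge=24, le=24),
    PrefixListEntry(ipaddress.ip_network("2001:db8::/32"), ge=32, le=48),
]
accepted, rejected = filter_customer_routes(
    ["198.51.100.0/24", "198.51.100.0/25", "203.0.113.0/24", "2001:db8:1::/48"],
    customer_list,
)
print([str(r) for r in accepted])  # ['198.51.100.0/24', '2001:db8:1::/48']
print([str(r) for r in rejected])  # ['198.51.100.0/25', '203.0.113.0/24']
```

The thread does not specify how a transit provider would build such a list (IRR data, customer tickets, or something else). That choice is operational.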
Best regards, Martijn
On 8/31/20 3:25 PM, Tom Beecher wrote:
https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
I definitely found Mr. Prince's writing about yesterday's events fascinating.
Verizon makes a mistake with BGP filters that allows a secondary mistake from leaked "optimizer" routes to propagate, and Mr. Prince takes every opportunity to lob large chunks of granite about how terrible they are.
L3 allows an erroneous flowspec announcement to cause massive global connectivity issues, and Mr. Prince shrugs and says "Incidents happen."
On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <hank@interall.co.il> wrote:
On 30/08/2020 20:08, Baldur Norddahl wrote:
https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
Sounds like Flowspec possibly blocking tcp/179 might be the cause.
But that is Cloudflare speculation.
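Cloudflare's theory is explicitly speculative, but a minimal Python sketch can make it concrete. It shows why a flowspec discard rule that matches TCP port 179 without a narrow destination would be so damaging: it catches BGP traffic itself. The match logic heavily simplifies the RFC 8955 components:

- destination prefix
- IP protocol
- the "port" component, which matches either source or destination port

The action is assumed to be discard. `Packet`, `FlowspecRule` and `forwarded` are hypothetical names, and the sketch says nothing about what actually happened inside AS3356.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst: str
    protocol: int  # IANA protocol number; 6 = TCP
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class FlowspecRule:
    """Simplified flowspec match; unset fields are wildcards and the action is assumed to be discard."""
    dst_prefix: Optional[str] = None
    protocol: Optional[int] = None
    port: Optional[int] = None  # like the 'port' component: matches source OR destination port

    def matches(self, pkt: Packet) -> bool:
        if self.dst_prefix is not None and ipaddress.ip_address(pkt.dst) not in ipaddress.ip_network(self.dst_prefix):
            return False
        if self.protocol is not None and pkt.protocol != self.protocol:
            return False
        if self.port is not None and self.port not in (pkt.src_port, pkt.dst_port):
            return False
        return True


def forwarded(packets, rules):
    """Return only the packets that no discard rule matches."""
    return [p for p in packets if not any(r.matches(p) for r in rules)]


bgp_keepalive = Packet(dst="192.0.2.1", protocol=6, src_port=51234, dst_port=179)
bgp_reply = Packet(dst="198.51.100.1", protocol=6, src_port=179, dst_port=51234)
web = Packet(dst="192.0.2.80", protocol=6, src_port=40000, dst_port=443)
packets = [bgp_keepalive, bgp_reply, web]

narrow = FlowspecRule(dst_prefix="203.0.113.0/24", protocol=6, port=179)  # scoped to one target prefix
broad = FlowspecRule(protocol=6, port=179)  # no destination: also matches router-to-router BGP

print(len(forwarded(packets, [narrow])))  # 3
print(len(forwarded(packets, [broad])))   # 1
```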
Regards, Hank Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer
An outage is what it is. I am not worried about outages. We have multiple transits to deal with that.
It is the continued announcement of prefixes after peers and customers have withdrawn them that is the huge problem here. That is killing all the effort and money I put into having redundancy. It is sabotage of my network after I cut the ties. I do not want to be a customer of a provider whose system will do that. Luckily we do not currently have a contract, and now they will have to convince me it is safe for me to make a contract with them. If that is impossible, I guess I won't be getting a contract with them.
But I disagree that it would be impossible. They need to write a good report explaining exactly what went wrong and how they changed the design, so that something like this cannot happen again. The basic design of BGP is such that this should not happen easily, if at all. They did something unwise. Did they build a route reflector based on a database or something?
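Baldur argues that BGP's basic design should not allow this. A toy Python model can illustrate the point under loud simplifications:

- each peer's routes live in their own table
- a withdrawal or a session teardown removes those routes
- a prefix is advertised onward only while some remaining peer still supplies a path

`ToyBgpSpeaker` is hypothetical. It ignores route reflection, Graceful Restart and most of the best-path algorithm, and it does not model whatever went wrong inside AS3356. The flush in `session_down` is the step that the reported behavior (prefixes still announced after sessions were shut down) appears to have bypassed.

```python
from collections import defaultdict


class ToyBgpSpeaker:
    """Hypothetical toy model: one Adj-RIB-In per peer, best path = shortest AS path."""

    def __init__(self):
        self.adj_rib_in = defaultdict(dict)  # peer name -> {prefix: AS path tuple}

    def receive_update(self, peer, prefix, as_path):
        self.adj_rib_in[peer][prefix] = tuple(as_path)

    def receive_withdraw(self, peer, prefix):
        self.adj_rib_in[peer].pop(prefix, None)

    def session_down(self, peer):
        # Every route learned over a closed session is discarded.
        # (This toy ignores Graceful Restart, which can retain stale routes for a bounded time.)
        self.adj_rib_in.pop(peer, None)

    def best_paths(self):
        # Only AS-path length is compared; the real decision process has many more steps.
        best = {}
        for routes in self.adj_rib_in.values():
            for prefix, path in routes.items():
                if prefix not in best or len(path) < len(best[prefix]):
                    best[prefix] = path
        return best

    def advertised_prefixes(self):
        return set(self.best_paths())


speaker = ToyBgpSpeaker()
speaker.receive_update("customer", "198.51.100.0/24", [64500])
speaker.receive_update("transit-b", "198.51.100.0/24", [64510, 64500])
print(speaker.best_paths())           # {'198.51.100.0/24': (64500,)}
speaker.session_down("customer")
print(speaker.best_paths())           # {'198.51.100.0/24': (64510, 64500)}
speaker.session_down("transit-b")
print(speaker.advertised_prefixes())  # set()
```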
Regards,
Baldur
On Sun, Aug 30, 2020 at 5:13 PM Mike Bolitho <mikebolitho@gmail.com> wrote:
Exactly. And asking that they somehow prove this won't happen again is impossible.
- Mike Bolitho
On Sun, Aug 30, 2020, 8:10 AM Drew Weaver <drew.weaver@thenap.com> wrote:
I’m not defending them but I am sure it isn’t intentional.
*From:* NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> *On Behalf Of *Baldur Norddahl *Sent:* Sunday, August 30, 2020 9:28 AM *To:* nanog@nanog.org *Subject:* Re: Centurylink having a bad morning?
How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.
On Sun, Aug 30, 2020 at 15:14, Joseph Jenkins <joe@breathe-underwater.com> wrote:
Finally got through on their support line and spoke to Level 1 support. The only thing the tech could say was that it was an issue with BGP route reflectors and that it started at about 3 AM (Pacific). They were still trying to isolate the issue. I've tried failing over my circuits with no luck: the traffic just dies, as L3 won't stop advertising my routes.
On Sun, Aug 30, 2020 at 5:21 AM Drew Weaver via NANOG <nanog@nanog.org> wrote:
Hello,
Woke up this morning to a bunch of reports of issues with connectivity had to shut down some Level3/CTL connections to get it to return to normal.
As of right now their support portal won’t load: https://www.centurylink.com/business/login/
Just wondering what others are seeing.
At the end of the day, the business needs to decide whether to take on that cost. All you can do is document and talk about the risks. Save that email for the "I told you so" moment. On Mon, Aug 31, 2020 at 10:50 AM Mike Bolitho <mikebolitho@gmail.com> wrote:
That's all we can do. Thankfully I work for an org that understands this and has *at least* two fully redundant circuits. Sometimes a third smaller carrier if we can prove that it is diverse, but that isn't the case very often.
- Mike Bolitho
-- Sincerely, Jason W Kuehl Cell 920-419-8983 jason.w.kuehl@gmail.com
Not everyone will peer with you, notably, AS3356 (unless you're big enough, which few can say.) On 8/31/20 4:33 PM, Tomas Lynch wrote:
Maybe we are idealizing these so-called tier-1 carriers and we, tier-ns, should treat them as what they really are: another AS. Accept that they are going to fail and do our best to mitigate the impact on our own networks, i.e. more peering.
On Thu, Sep 3, 2020 at 2:37 PM Mark Tinka <mark.tinka@seacom.com> wrote:
On 31/Aug/20 17:57, Bryan Holloway wrote:
Not everyone will peer with you, notably, AS3356 (unless you're big enough, which few can say.)
I think Tomas meant more diverse peering, not peering with CL.
Oh, yes! Let's not start another "what's a tier one" war!
Mark.
On 4/Sep/20 23:41, Tomas Lynch wrote:
Oh, yes! Let's not start another "what's a tier one" war!
Oh no, let's :-). We get that over here in Africa as well. Local operators either calling themselves Tier 1, or being called a Tier 1. Nonsensical. Years back, our Marketing team asked me to comment on the use of "Tier" for our literature. You can probably imagine what I said :-). For me, it's simple - you are present in X cities or Y cities. Tier is useless because the Internet does not come from a single country or a single operator. And saying a network is "big" or "small" is subjective to everyone's perspective, so that doesn't help either. So you're present here, and present there. That's it. It's 2020 :-). Mark.
I find it most useful as a warning beacon. If anyone is talking about how they are or want "Tier 1", then I need to back away slowly. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Mark Tinka via NANOG" <nanog@nanog.org> To: "Tomas Lynch" <tomas.lynch@gmail.com> Cc: "NANOG" <nanog@nanog.org> Sent: Saturday, September 5, 2020 5:26:21 AM Subject: Re: Centurylink having a bad morning?
[ off list ]
Oh, yes! Let's not start another "what's a tier one" war!
Oh no, let's :-).
We get that over here in Africa as well. Local operators either calling themselves Tier 1, or being called a Tier 1. Nonsensical.
Years back, our Marketing team asked me to comment on the use of "Tier" for our literature. You can probably imagine what I said :-).
For me, it's simple - you are present in X cities or Y cities. Tier is useless because the Internet does not come from a single country or a single operator. And saying a network is "big" or "small" is subjective to everyone's perspective, so that doesn't help either.
So you're present here, and present there. That's it.
spoken like a tier two randy
The more diversified your peering, the better you are. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Mark Tinka" <mark.tinka@seacom.com> To: nanog@nanog.org Sent: Thursday, September 3, 2020 1:35:46 PM Subject: Re: Centurylink having a bad morning? On 31/Aug/20 17:57, Bryan Holloway wrote:
Not everyone will peer with you, notably, AS3356 (unless you're big enough, which few can say.)
I think Tomas meant more diverse peering, not peering with CL. Mark.
Unless a certain Tier 1 is also a CDN. On 9/5/20 5:12 PM, Mike Hammett via NANOG wrote:
The more diversified your peering, the better you are.
Hey Tomas! I would like to buy you a very large beer mug! They are just another AS! For example... What gives them the theoretical right not to publish information such as their AS-SET on PeeringDB, which would let their paid peering partners filter the routes they announce? And I'm not talking specifically about 209/3356/3549... It applies to all of them! And the same goes for some IXP route servers! Some popular IXPs do not keep their information up to date, so those who peer with them cannot check whether the routes they re-announce are correct. Is there any RFC that says: "All other ASNs must trust all the routes that Tier 1s and IXP route servers announce, and are not allowed to check whether they are correct"? On Mon, Aug 31, 2020 at 11:36, Tomas Lynch <tomas.lynch@gmail.com> wrote:
Maybe we are idealizing these so-called tier-1 carriers and we, tier-ns, should treat them as what they really are: another AS. Accept that they are going to fail and do our best to mitigate the impact on our own networks, i.e. more peering.
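The AS-SET Douglas wants published is the input peers use to build filters. It is expanded recursively into member ASNs, and their registered route objects then become a prefix list. Tools such as bgpq4 do this against live IRR data. Below is a minimal Python sketch of just the expansion step. It assumes an in-memory dictionary of hypothetical AS-SET names and documentation-range ASNs instead of a real IRR query.

```python
from __future__ import annotations


def is_asn(member: str) -> bool:
    """'AS64500' is an ASN; 'AS-EXAMPLE' (or any other name) is treated as a nested set."""
    return member.upper().startswith("AS") and member[2:].isdigit()


def expand_as_set(root: str, sets: dict[str, list[str]]) -> set[int]:
    """Iteratively expand an AS-SET into member ASNs, tolerating nested and cyclic references."""
    asns: set[int] = set()
    seen: set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # nested sets may reference each other; expand each one only once
        seen.add(name)
        for member in sets[name]:  # an unknown set name raises KeyError instead of passing silently
            if is_asn(member):
                asns.add(int(member[2:]))
            else:
                stack.append(member)
    return asns


# Hypothetical registry data; the customer set points back at its parent to show cycle handling.
registry = {
    "AS-EXAMPLE": ["AS64500", "AS-EXAMPLE-CUSTOMERS"],
    "AS-EXAMPLE-CUSTOMERS": ["AS64501", "AS64502", "AS-EXAMPLE"],
}
print(sorted(expand_as_set("AS-EXAMPLE", registry)))  # [64500, 64501, 64502]
```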
On 31/Aug/20 16:33, Tomas Lynch wrote:
Maybe we are idealizing these so-called tier-1 carriers and we, tier-ns, should treat them as what they really are: another AS. Accept that they are going to fail and do our best to mitigate the impact on our own networks, i.e. more peering.
Bingo! Mark.
I also found quizzical the part where they mention that a lot of hosting companies only have one uplink, and I found it a bit puzzling that he comes pretty close to implying that it's Centurylink's customers' fault for not having multiple paths to Cloudflare that don't touch Centurylink. It could have just been poorly written. From: NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> On Behalf Of Tom Beecher Sent: Monday, August 31, 2020 9:26 AM To: Hank Nussbacher <hank@interall.co.il> Cc: NANOG <nanog@nanog.org> Subject: Re: Centurylink having a bad morning? https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/ I definitely found Mr. Prince's writing about yesterday's events fascinating. Verizon makes a mistake with BGP filters that allows a secondary mistake from leaked "optimizer" routes to propagate, and Mr. Prince takes every opportunity to lob large chunks of granite about how terrible they are. L3 allows an erroneous flowspec announcement to cause massive global connectivity issues, and Mr. Prince shrugs and says "Incidents happen."
There are a number of enterprise end-user customers of 3356 that have on-premises server rooms/hosting for their stuff. They spend a lot of money every month on a 'redundant' metro Ethernet circuit that takes diverse fiber paths from their business-park office building to the local clink/level3 POP. But all that last-mile redundancy and failover ability doesn't do much for them when 3356 breaks its network at the BGP level. On Mon, Aug 31, 2020 at 9:36 AM Drew Weaver <drew.weaver@thenap.com> wrote:
I also found quizzical the part where they mention that a lot of hosting companies only have one uplink, and I found it a bit puzzling that he comes pretty close to implying that it's Centurylink's customers' fault for not having multiple paths to Cloudflare that don't touch Centurylink. It could have just been poorly written.
*From:* NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> *On Behalf Of *Tom Beecher *Sent:* Monday, August 31, 2020 9:26 AM *To:* Hank Nussbacher <hank@interall.co.il> *Cc:* NANOG <nanog@nanog.org> *Subject:* Re: Centurylink having a bad morning?
https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
I definitely found Mr. Prince's writing about yesterday's events fascinating.
Verizon makes a mistake with BGP filters that allows a secondary mistake from leaked "optimizer" routes to propagate, and Mr. Prince takes every opportunity to lob large chunks of granite about how terrible they are.
L3 allows an erroneous flowspec announcement to cause massive global connectivity issues, and Mr. Prince shrugs and says "Incidents happen."
On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <hank@interall.co.il> wrote:
On 30/08/2020 20:08, Baldur Norddahl wrote:
https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
Sounds like Flowspec possibly blocking tcp/179 might be the cause.
But that is Cloudflare speculation.
Regards, Hank
Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer
An outage is what it is. I am not worried about outages. We have multiple transits to deal with that.
It is the continued announcement of prefixes after withdrawal by peers and customers that is the huge problem here. That is killing all the effort and money I put into having redundancy. It is sabotage of my network after I cut the ties. I do not want to be a customer of a provider that has a system that will do that. Luckily we do not currently have a contract, and now they will have to convince me it is safe for me to make a contract with them. If that is impossible, I guess I won't be getting a contract with them.
But I disagree that it would be impossible. They need to write a good report explaining exactly what went wrong and how they changed the design so that something like this cannot happen again. The basic design of BGP is such that this should not happen easily, if at all. They did something unwise. Did they build a route reflector based on a database or something?
Regards,
Baldur
On Sun, Aug 30, 2020 at 5:13 PM Mike Bolitho <mikebolitho@gmail.com> wrote:
Exactly. And asking that they somehow prove this won't happen again is impossible.
- Mike Bolitho
On Sun, Aug 30, 2020, 8:10 AM Drew Weaver <drew.weaver@thenap.com> wrote:
I’m not defending them, but I am sure it isn’t intentional.
From: NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> On Behalf Of Baldur Norddahl Sent: Sunday, August 30, 2020 9:28 AM To: nanog@nanog.org Subject: Re: Centurylink having a bad morning?
How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.
On Sun, Aug 30, 2020 at 15:14, Joseph Jenkins <joe@breathe-underwater.com> wrote:
Finally got through on their support line and spoke to level 1. The only thing the tech could say was that it was an issue with BGP route reflectors and that it started about 3 a.m. Pacific. They were still trying to isolate the issue. I've tried failing over my circuits and no go; the traffic just dies, as L3 won't stop advertising my routes.
On Sun, Aug 30, 2020 at 5:21 AM Drew Weaver via NANOG <nanog@nanog.org> wrote:
Hello,
Woke up this morning to a bunch of reports of issues with connectivity; had to shut down some Level3/CTL connections to get it to return to normal.
As of right now their support portal won’t load: https://www.centurylink.com/business/login/
Just wondering what others are seeing.
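Hank's quoted note passes along Cloudflare's speculation that Flowspec blocking tcp/179 may have been the cause. The reason that would be so damaging is that BGP itself runs over TCP port 179, so a discard rule matching that port also matches the routers' own session traffic, and the sessions drop once the hold timer expires. The following is a toy Python model of that interaction, not router configuration and not a claim about the rule 3356 actually pushed. The classes, field names, and port numbers are hypothetical, and only a small subset of Flowspec match components (protocol, port, destination port) is modelled; the `port` field follows the Flowspec "Port" component, which matches either the source or the destination port.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

TCP = 6          # IP protocol number for TCP
BGP_PORT = 179   # BGP sessions run over TCP port 179


@dataclass(frozen=True)
class Packet:
    protocol: int
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class FlowspecRule:
    """Toy model of a few Flowspec match components (hypothetical, not router syntax).

    `port` matches the source OR destination port; `dst_port` matches only the
    destination port. A field left as None is not checked.
    """
    protocol: Optional[int] = None
    port: Optional[int] = None
    dst_port: Optional[int] = None
    action: str = "discard"

    def matches(self, pkt: Packet) -> bool:
        if self.protocol is not None and pkt.protocol != self.protocol:
            return False
        if self.port is not None and self.port not in (pkt.src_port, pkt.dst_port):
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        return True


def session_survives(rules: Sequence[FlowspecRule], packets: Sequence[Packet]) -> bool:
    """False if any discard rule matches the session's traffic in either direction."""
    return not any(r.action == "discard" and r.matches(p) for r in rules for p in packets)


# One hypothetical BGP session: router A uses ephemeral port 50123, router B listens on 179.
session = [Packet(TCP, 50123, BGP_PORT), Packet(TCP, BGP_PORT, 50123)]

too_broad = FlowspecRule(protocol=TCP, port=BGP_PORT)  # also hits the routers' own BGP
narrow = FlowspecRule(protocol=TCP, dst_port=80)       # unrelated web traffic only

print(session_survives([too_broad], session))  # False
print(session_survives([narrow], session))     # True
```

The narrow rule leaves the session alone, while the broad one matches both directions. Even a destination-port-only rule on 179 would catch one direction of every session, which is enough to stall it.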
On Mon, Aug 31, 2020 at 3:52 PM Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
There are a number of enterprise end-user-type customers of 3356 that have on-premises server rooms/hosting for their stuff. And they spend a lot of money every month for a 'redundant' metro ethernet circuit that takes diverse fiber paths from their business park office building to the local clink/level3 POP. But all that last-mile redundancy and failover ability doesn't do much for them when 3356 breaks its network at the BGP level.
There is a lot of stuff that fails in an ugly way when a network breaks and doesn't withdraw; in many (most?) ways it acts just like a hijack... W
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf
Hopefully those customers learned the difference between redundancy and diversity this weekend. :) On Mon, Aug 31, 2020 at 3:48 PM Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
There are a number of enterprise end-user-type customers of 3356 that have on-premises server rooms/hosting for their stuff. And they spend a lot of money every month for a 'redundant' metro ethernet circuit that takes diverse fiber paths from their business park office building to the local clink/level3 POP. But all that last-mile redundancy and failover ability doesn't do much for them when 3356 breaks its network at the BGP level.
On Mon, Aug 31, 2020 at 4:36 PM Tom Beecher <beecher@beecher.cc> wrote:
Hopefully those customers learned the difference between redundancy and diversity this weekend. :)
I'm unclear how either solves things for many customers... If they had CenturyLink and AcmeNetworkWidgets, and announced the same network through both, and their connection to CL went down, *but CL continues to announce / doesn't withdraw*, they are still stuck, yes? (Unless they can deaggregate, that is...) What am I missing? W
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf
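Warren's parenthetical ("Unless they can deaggregate") and Tom's "deagg strategically" both rely on longest-prefix match: a router forwards on the most specific covering route, whatever the AS path of a less specific one looks like. Here is a minimal Python sketch of that lookup using the standard `ipaddress` module. The table is hypothetical: 2001:db8::/32 is the IPv6 documentation prefix standing in for the customer's block, the /32 plays the route 3356 never withdrew, and the two /33 halves are assumed to be announced only via the second upstream ("Acme"). A hypothetical IPv4 /23 split into two /24s would behave the same way.

```python
import ipaddress

# Hypothetical routing table of a distant network during the incident.
routes = {
    ipaddress.ip_network("2001:db8::/32"): "AS3356 (stale, never withdrawn)",
    # Deaggregated halves announced only via the second upstream ("Acme").
    ipaddress.ip_network("2001:db8::/33"): "Acme",
    ipaddress.ip_network("2001:db8:8000::/33"): "Acme",
}


def lookup(addr: str) -> str:
    """Longest-prefix match: among covering routes, the most specific one wins."""
    ip = ipaddress.ip_address(addr)
    covering = [net for net in routes if ip in net]
    if not covering:
        raise LookupError(f"no route to {addr}")
    return routes[max(covering, key=lambda net: net.prefixlen)]


print(lookup("2001:db8::1"))       # Acme
print(lookup("2001:db8:ffff::1"))  # Acme
```

Deaggregation only helps if other networks accept the more-specifics. Announcements longer than an IPv4 /24 or an IPv6 /48 are widely filtered, so a customer holding a single /24 has nothing left to split.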
In this specific event, 3356 not withdrawing routes is certainly a head-scratcher, and I'm sure for many it's the thing we're most looking forward to a definitive answer on. However, if a network only has 3356 as its upstream, it is 100% at the mercy of 3356 at all times. Having a redundant AND diverse connection to a second upstream ASN at least provides you some options. In this case, for example, let's say that at all times you did a +2 prepend to both 3356 and Acme. The 3356 event happens, and you shut down your session to them. Some percentage of your traffic that would have been faceplanting in/through 3356 now works via Acme. Then you notice the non-withdrawal issue. You can then remove one prepend toward Acme, or perhaps deaggregate strategically, to try to get more traffic away from the trouble. A redundant path to a different upstream at least gives you some options to work around the problem that you otherwise would not have. It wouldn't be perfect, but options > no options. On Mon, Aug 31, 2020 at 5:08 PM Warren Kumari <warren@kumari.net> wrote:
On Mon, Aug 31, 2020 at 4:36 PM Tom Beecher <beecher@beecher.cc> wrote:
Hopefully those customers learned the difference between redundancy and diversity this weekend. :)
I'm unclear how either solves things for many customers...
If they had CenturyLink and AcmeNetworkWidgets, and announce the same network through both -- and their connection to CL went down, *but CL continues to announce / doesn't withdraw* they are still stuck, yes? (Unless they can deaggregate that is...) What am I missing?
W
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf
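Tom's prepend strategy can be sketched the same way. This minimal Python model covers only the AS-path-length step of BGP best-path selection; real routers compare LOCAL_PREF and other attributes first and have their own tie-breakers. ASNs 64500 and 64501 come from the documentation range and stand in for the customer and "Acme", and the function names are hypothetical. The key assumption is the one Tom describes: the stale 3356 route stays frozen with whatever prepends were last announced, while the Acme announcement can still be changed.

```python
from typing import Dict, List

ORIGIN = 64500  # hypothetical documentation ASN standing in for the customer
ACME = 64501    # hypothetical documentation ASN for the second upstream, "Acme"
LEVEL3 = 3356


def path_via(upstream: int, prepends: int) -> List[int]:
    """AS path seen by a network directly attached to `upstream`.

    The origin ASN appears once, plus once more per prepend.
    """
    return [upstream] + [ORIGIN] * (1 + prepends)


def best(paths: Dict[str, List[int]]) -> str:
    """Shortest AS path wins; ties go to the first entry listed.

    Only the AS-path-length step is modelled here.
    """
    return min(paths, key=lambda name: len(paths[name]))


# Steady state: +2 prepend toward both upstreams, so the paths tie.
steady = {"AS3356": path_via(LEVEL3, 2), "Acme": path_via(ACME, 2)}

# During the incident the stale 3356 route keeps its +2 prepend,
# but the Acme announcement can still be changed: drop one prepend.
incident = {"AS3356": path_via(LEVEL3, 2), "Acme": path_via(ACME, 1)}

print(best(steady))    # AS3356 (a tie, decided here by list order)
print(best(incident))  # Acme
```

Carrying the +2 prepend in steady state is what creates the headroom: with no prepends to remove, the Acme path could not be made shorter than the stale one. Networks that prefer 3356 for reasons earlier in the decision process, such as local preference, will still follow the stale route, which fits Tom's point that it wouldn't be perfect.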
We’re bailing out a customer in exactly this same boat as we speak. There are so many. Ms. Benjamin PD Cannon, ASCE 6x7 Networks & 6x7 Telecom, LLC CEO ben@6by7.net "The only fully end-to-end encrypted global telecommunications company in the world.” FCC License KJ6FJJ
On Aug 31, 2020, at 12:52 PM, Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
There are a number of enterprise end-user-type customers of 3356 that have on-premises server rooms/hosting for their stuff. And they spend a lot of money every month for a 'redundant' metro ethernet circuit that takes diverse fiber paths from their business park office building to the local clink/level3 POP. But all that last-mile redundancy and failover ability doesn't do much for them when 3356 breaks its network at the BGP level.
Eric Kuhnke <eric.kuhnke@gmail.com> writes:
There are a number of enterprise end-user-type customers of 3356 that have on-premises server rooms/hosting for their stuff. And they spend a lot of money every month for a 'redundant' metro ethernet circuit that takes diverse fiber paths from their business park office building to the local clink/level3 POP. But all that last-mile redundancy and failover ability doesn't do much for them when 3356 breaks its network at the BGP level.
Well, many of us are paying for redundant power supplies or redundant REs, even if that doesn't make any difference when the chassis is on fire. I guess most people know that, and still buy those redundant components. It's always about cost versus risk. I certainly hope nobody here believes single-homed is completely failsafe. (The lack of withdrawal making multihomed customers fail too is more unexpected, and a new factor to consider in the future.) Bjørn
On Mon, Aug 31, 2020 at 11:28 PM Bjørn Mork <bjorn@mork.no> wrote:
Well, many of us are paying for redundant power supplies or redundant REs, even if that doesn't make any difference when the chassis is on fire. I guess most people know that, and still buy those redundant components.
I buy it so I can walk the machine from an old UPS to a new UPS. Those instances occur with much more frequency than chassis fires. ;) -A
As a coincidence... I was *thinking* of moving a 90TB SAN (with mechanical drives) to another rack that way... skateboard, long fibers and long power cords =D Beats installing a Cisco 12k solo with 2x4's to align the mounting holes... ----- Alain Hebert ahebert@pubnix.net PubNIX Inc. 50 boul. St-Charles P.O. Box 26770 Beaconsfield, Quebec H9W 6G7 Tel: 514-990-5911 http://www.pubnix.net Fax: 514-990-9443 On 2020-09-01 16:16, Aaron C. de Bruyn via NANOG wrote:
On Mon, Aug 31, 2020 at 11:28 PM Bjørn Mork <bjorn@mork.no> wrote:
Well, many of us are paying for redundant power supplies or redundant REs, even if that doesn't make any difference when the chassis is on fire. I guess most people know that, and still buy those redundant components.
I buy it so I can walk the machine from an old UPS to a new UPS. Those instances occur with much more frequency than chassis fires. ;)
-A
On Tue, Sep 1, 2020 at 11:53 PM Alain Hebert <ahebert@pubnix.net> wrote:
As a coincidence... I was *thinking* of moving a 90TB SAN (with mechanical drives) to another rack that way... skateboard, long fibers and long power cords =D
well, what you REALLY need is one of these: https://www.cru-inc.com/products/wiebetech/hotplug_field_kit_product/ and 2-3 UPS... swap to the UPS, then just roll the stack over, plug to utility and done. (minus network transfer)
We once moved a 3U server 30 miles between data centers this way. We plugged the redundant PSU into a UPS, and two people carried both out and put them in a vehicle.
On Sep 1, 2020, at 11:58 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Tue, Sep 1, 2020 at 11:53 PM Alain Hebert <ahebert@pubnix.net> wrote:
As a coincidence... I was *thinking* of moving a 90TB SAN (with mechanical drives) to another rack that way... skateboard, long fibers and long power cords =D
well, what you REALLY need is one of these: https://www.cru-inc.com/products/wiebetech/hotplug_field_kit_product/
and 2-3 UPS... swap to the UPS, then just roll the stack over, plug to utility and done. (minus network transfer)
Shawn L via NANOG wrote on 02/09/2020 12:15:
We once moved a 3U server 30 miles between data centers this way. Plugged the redundant PSU into a UPS, and 2 people carried it out and put it in a vehicle.
Hopefully none of these server moves that people have been talking about involved spinning disks. If they did, kit damage is one of the likely outcomes - you seriously do not want to bump active spindles: www.google.com/search?q=disk+platter+damage&tbm=isch SSDs are a different story. In that case it's just a bit odd as to why you wouldn't want to power down a system to physically move it - in the sense that if your service delivery model can't withstand periodic maintenance and loss of availability of individual components, rethinking the model might be productive. Nick
On 9/2/20 1:49 PM, Nick Hilliard wrote:
Shawn L via NANOG wrote on 02/09/2020 12:15:
We once moved a 3U server 30 miles between data centers this way. Plugged the redundant PSU into a UPS, and 2 people carried it out and put it in a vehicle.
Hopefully none of these server moves that people have been talking about involved spinning disks. If they did, kit damage is one of the likely outcomes - you seriously do not want to bump active spindles:
www.google.com/search?q=disk+platter+damage&tbm=isch
SSDs are a different story. In that case it's just a bit odd as to why you wouldn't want to power down a system to physically move it - in the sense that if your service delivery model can't withstand periodic maintenance and loss of availability of individual components, rethinking the model might be productive.
Nick
If it's your server, moving beyond (very) local facilities, and time is not of the essence, then sure: power down. If you're law-enforcement mid-raid, or trying to preserve your Frogger high-score, well, ...
If the client pays me a shit ton of money to make sure the server won't turn off, and pays for the hardware to make it happen, I'd think about it. It's like a colo move on hard mode. It's extremely stupid, and I would advise against doing it. Hell, even when I migrated an e911 server, we had a 20-minute outage to move the physical server. If that server can't be shut off, something was built wrong. On Wed, Sep 2, 2020 at 9:33 AM Bryan Holloway <bryan@shout.net> wrote:
If it's your server, moving beyond (very) local facilities, and time is not of the essence, then sure: power down.
If you're law-enforcement mid-raid, or trying to preserve your Frogger high-score, well, ...
-- Sincerely, Jason W Kuehl Cell 920-419-8983 jason.w.kuehl@gmail.com
While conserving connectivity? 😂 ________________________________ From: Shawn L via NANOG <nanog@nanog.org> Sent: Wednesday, September 2, 2020 13:15 To: nanog Subject: Re: Centurylink having a bad morning? We once moved a 3U server 30 miles between data centers this way. Plugged the redundant PSU into a UPS, and 2 people carried it out and put it in a vehicle.
That is what the 5G router is for... On Wed, Sep 2, 2020 at 19:47, Michael Hallgren <mh@xalto.net> wrote:
While conserving connectivity? 😂
https://www.youtube.com/watch?v=vQ5MA685ApE On Wed 02 Sep 2020 20:40:35 GMT, Baldur Norddahl wrote:
That is what the 5G router is for...
On Wed, Sep 2, 2020 at 12:00 AM Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Tue, Sep 1, 2020 at 11:53 PM Alain Hebert <ahebert@pubnix.net> wrote:
As a coincidence... I was *thinking* of moving a 90TB SAN (with mechanical drives) to another rack that way... skateboard, long fibers and long power cords =D
well, what you REALLY need is one of these: https://www.cru-inc.com/products/wiebetech/hotplug_field_kit_product/
Yeah, no... actually, hell no!

That setup scares me, and I'm surprised that it can be sold at all, even with many warning labels and disclaimers... After the first time I saw it (I suspect also due to Chris!), I tried doing something similar: I cut the ends off a power cord, attached alligator clips, and moved a lamp from one outlet to another. This was all on the same circuit (no UPS, no difference in potential, etc.), so it didn't need anything to switch between supplies. I checked with a multimeter before making the connections (to triple-check) that I had live and neutral correct, had an in-circuit GFCI, and was wearing rubber gloves. It *worked*, but having a plug with live, exposed pins is not something I want to repeat....

On a related note, my wife once spent much time trying to explain to one of her clients why they cannot just plug the input of their power strip into the output of the same power strip and get free 'lectricity... "But power comes out of the socket!!!", "Well, yes, but it has to get into the power strip", "Yah! That's why I plug the plug into it..."... I think that eventually she just demonstrated (again!) that it doesn't work, and then muttered something about "Magic"...

W
and 2-3 UPS... swap to the UPS, then just roll the stack over, plug to utility and done. (minus network transfer)
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf
On Wed, Sep 2, 2020 at 11:02, Warren Kumari <warren@kumari.net> wrote:
well, what you REALLY need is one of these: https://www.cru-inc.com/products/wiebetech/hotplug_field_kit_product/
Yeah, no... actually, hell no!
That setup scares me, and I'm surprised that it can be sold at all, even with many warning labels and disclaimers... After the first time I saw it (I suspect also due to Chris!) I tried doing something similar -- I cut the ends off a power cord, attached alligator clips ...
It's called a suicide cord for a reason. It's a bad idea that comes up regularly when people try to backfeed a generator into their house. Andy
Not being intentional isn't really an excuse... Outages are generally not intentional, but we still like to use services that stay up most of the time. On Sun, Aug 30, 2020 at 11:11 AM Drew Weaver <drew.weaver@thenap.com> wrote:
I’m not defending them but I am sure it isn’t intentional.
From: NANOG <nanog-bounces+drew.weaver=thenap.com@nanog.org> On Behalf Of Baldur Norddahl Sent: Sunday, August 30, 2020 9:28 AM To: nanog@nanog.org Subject: Re: Centurylink having a bad morning?
How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.
On Sun, Aug 30, 2020 at 15:14, Joseph Jenkins <joe@breathe-underwater.com> wrote:
Finally got through on their support line and spoke to level 1. The only thing the tech could say was that it was an issue with BGP route reflectors and that it started about 3 AM (Pacific). They were still trying to isolate the issue. I've tried failing over my circuits and no go; the traffic just dies, as L3 won't stop advertising my routes.
I saw customers behind AS209’s ILEC Residential network flapping all morning. It’s stable now. So LVLT 3356 is definitely feeding parts of 209 now. On Sun, Aug 30, 2020 at 8:57 AM Ross Tajvar <ross@tajvar.io> wrote:
Not being intentional isn't really an excuse... Outages are generally not intentional, but we still like to use services that stay up most of the time.
Once upon a time, Baldur Norddahl <baldur.norddahl@gmail.com> said:
How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.
Umm, then I guess you won't sign a contract with anybody? I sure wouldn't agree to that. I don't personally write the routing software, so I can't guarantee there isn't a bug in there (actually, since it is software, I can guarantee there ARE bugs in there). We'll see if/when they issue an RFO, but software has bugs, and configuration errors have entirely unexpected consequences. It's possible some poor design issue was exposed, or it could be some basically unforeseeable incident. -- Chris Adams <cma@cmadams.net>
On Sun, Aug 30, 2020 at 5:21 PM Chris Adams <cma@cmadams.net> wrote:
Once upon a time, Baldur Norddahl <baldur.norddahl@gmail.com> said:
How is that acceptable behaviour? I shall remember never to make a contract with these guys until they can prove that they won't advertise my prefixes after I pull them. Under any circumstances.
Umm, then I guess you won't sign a contract with anybody? I sure wouldn't agree to that. I don't personally write the routing software, so I can't guarantee there isn't a bug in there (actually, since it is software, I can guarantee there ARE bugs in there).
We'll see if/when they issue an RFO, but software has bugs, and configuration errors have entirely unexpected consequences. It's possible some poor design issue was exposed, or it could be some basically unforeseeable incident.
Not really the point. BGP is designed such that if I take down the link, the prefixes MUST be withdrawn within a reasonable time. The self-healing aspect of the internet depends entirely on this. Clearly they have some kind of system that does not respect that by design. I am guessing they have something homebrewed going on with their route reflectors? It is like a plane. It is impossible to prove, or even design, a plane that can never fall out of the sky. But now we have had a plane that crashed in a very bad way, so that plane (Centurylink) is grounded until they can prove that something like this cannot happen again. Which means they need to redesign whatever the hell they have going on here. Regards, Baldur
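Baldur's point about withdrawal can be illustrated with a toy model. Per RFC 4271, when a BGP session closes, the routes learned from that peer are removed, best paths are recomputed, and anything no longer reachable is withdrawn from the other peers (BGP Graceful Restart, RFC 4724, is a deliberate exception that keeps stale routes for a bounded time). The sketch below is a hypothetical, heavily simplified speaker: the class and method names are made up for illustration, there are no path attributes and no policy, and it says nothing about how AS3356's route reflectors or any vendor implementation actually work.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySpeaker:
    """Hypothetical, heavily simplified BGP speaker: reachability only."""

    # Adj-RIB-In: peer name -> set of prefixes learned from that peer
    adj_rib_in: dict[str, set[str]] = field(default_factory=dict)

    def loc_rib(self) -> set[str]:
        # Here the Loc-RIB is just every prefix learned from any live session.
        prefixes: set[str] = set()
        for learned in self.adj_rib_in.values():
            prefixes |= learned
        return prefixes

    def receive(self, peer: str, prefix: str) -> None:
        self.adj_rib_in.setdefault(peer, set()).add(prefix)

    def session_down(self, peer: str) -> list[str]:
        """Drop everything learned from `peer` and return the prefixes
        that must now be withdrawn from all other peers."""
        before = self.loc_rib()
        self.adj_rib_in.pop(peer, None)
        return sorted(before - self.loc_rib())


rr = ToySpeaker()
rr.receive("customer-a", "192.0.2.0/24")
rr.receive("customer-b", "198.51.100.0/24")
rr.receive("peer-x", "198.51.100.0/24")  # also learned via another path

print(rr.session_down("customer-a"))  # ['192.0.2.0/24']
print(rr.session_down("customer-b"))  # [] (still reachable via peer-x)
```

A real speaker would also send an UPDATE with a new best path when a prefix stays reachable through a different peer; this sketch only tracks whether a prefix is reachable at all. The failure described in this thread is the case where the `session_down` step effectively never propagates.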
On Sun, 30 Aug 2020 at 20:00, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Not really the point. BGP is designed such that if I take down the link, the prefixes MUST be withdrawn within reasonable time. The self healing aspect of the internet entirely depends on this. Clearly they have some kind of system that does not respect that by design. I am guessing they have something homebrewn going on with their route reflectors?
Add scale, and BGP implementations can take a lot of time, hours of it. The best thing you can do is add contractual obligations so that people at your provider who agree with you have some ammo. Instant is not on the table, I'm sure that is obvious; after that, it's less than obvious what is good enough.
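A back-of-the-envelope way to see how scale turns into delay: if a withdrawal sits behind a backlog of queued updates that are processed at a constant rate, it waits roughly backlog / rate. The function name and both input numbers below are hypothetical, chosen only for round arithmetic; they are not measurements of AS3356 or of any BGP implementation, and real implementations batch, prioritize and pace updates in ways this ignores.

```python
def withdrawal_delay_seconds(queued_updates: int, updates_per_second: float) -> float:
    """Time before a withdrawal at the back of a FIFO update queue is sent,
    assuming a constant processing rate (a large simplification)."""
    if updates_per_second <= 0:
        raise ValueError("updates_per_second must be positive")
    return queued_updates / updates_per_second


# Purely hypothetical inputs: a full-table-sized backlog and a slow rate.
delay = withdrawal_delay_seconds(queued_updates=900_000, updates_per_second=250)
print(f"{delay:.0f} s = {delay / 3600:.1f} h")  # 3600 s = 1.0 h
```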
It is like a plane. It is impossible to prove or even design a plane that can never fall out of the sky. But now we had a plane that crashed in a very bad way, so that plane (Centurylink) is grounded until they can prove that something like this can not happen again. Which means they need to redesign whatever the hell they have going on here.
Nothing ever works like this; it's naive to think any RCA leads to something being fixed so that it can never happen again. The only thing that can be affected is the frequency of an event; removing it is not on the cards. And usually, affecting frequency is mostly about belief, not something provable. In addition to MTBF, questions should be raised about MTTR; provable MTTR efforts are far more likely to exist than provable MTBF efforts. But if we buy into the notion that it will never happen again, because we is good, then no MTTR focus is needed; why fix something that will never happen? What if this outage took 5 min to solve? -- ++ytti
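The MTBF/MTTR argument can be made concrete with the standard steady-state availability formula, A = MTBF / (MTBF + MTTR). The sketch assumes a hypothetical failure rate of one such outage per year and compares a roughly five-hour repair with a five-minute one; the figures are illustrative and not taken from any RFO.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time up = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical: one such outage per year, so MTBF is about 8760 h.
slow = availability(8760, 5)       # roughly five-hour repair
fast = availability(8760, 5 / 60)  # five-minute repair
print(f"{slow:.5f} {fast:.7f}")    # 0.99943 0.9999905
```

With MTBF held fixed, cutting MTTR alone moves availability from roughly "three nines" to roughly "five nines", which is the lever that is easier to demonstrate than a promise that the failure will never recur.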
On 8/30/20 8:14 AM, Drew Weaver via NANOG wrote:
Woke up this morning to a bunch of reports of issues with connectivity had to shut down some Level3/CTL connections to get it to return to normal.
Just to confirm we're seeing this on AS3356 and not AS209, correct? We have links to both and shut down AS3356 which seems to have cleared "most" of the problems. -- inoc.net!rblayzor XMPP: rblayzor.AT.inoc.net PGP: https://pgp.inoc.net/rblayzor/
AS3356 is the one I've seen all the chatter about this morning. On Sun, Aug 30, 2020 at 10:03 AM Robert Blayzor <rblayzor.bulk@inoc.net> wrote:
On 8/30/20 8:14 AM, Drew Weaver via NANOG wrote:
Woke up this morning to a bunch of reports of issues with connectivity had to shut down some Level3/CTL connections to get it to return to normal.
Just to confirm we're seeing this on AS3356 and not AS209, correct?
We have links to both and shut down AS3356 which seems to have cleared "most" of the problems.
-- inoc.net!rblayzor XMPP: rblayzor.AT.inoc.net PGP: https://pgp.inoc.net/rblayzor/
AS3356 is the Level3 internet. On Sun, Aug 30, 2020 at 8:09 AM Ian Bowers <iggdawg@gmail.com> wrote:
AS3356 is the one I've seen all the chatter about this morning.
participants (44)
- Aaron C. de Bruyn
- Alain Hebert
- Alarig Le Lay
- Andrew Koch
- Andy Brezinsky
- Antonios Chariton
- Baldur Norddahl
- Ben Cannon
- Bjørn Mork
- Bryan Holloway
- Chase Christian
- Chris Adams
- Christopher Morrow
- David Hubbard
- Douglas Fischer
- Drew Weaver
- Eric Kuhnke
- Hank Nussbacher
- Ian Bowers
- Jared Geiger
- JASON BOTHE
- Jason Kuehl
- Job Snijders
- Joseph Jenkins
- K. Scott Helms
- Mark Tinka
- Martijn Schmidt
- Matt Hoppes
- Mel Beckman
- Michael Hallgren
- Mike Bolitho
- Mike Hammett
- Nick Hilliard
- Randy Bush
- Ray Ludendorff
- Robert Blayzor
- Robert DeVita
- Romeo Czumbil
- Ross Tajvar
- Saku Ytti
- Shawn L
- Tom Beecher
- Tomas Lynch
- Warren Kumari