Shaw routing issue 12 Aug 2014 - Test - lists.nanog.org

Leah Ungstad

12 Aug 2014

8:40 p.m.

Hi Nanog, anyone know what's up with a nationwide (Canadian) routing issue on Shaw?

http://www.theregister.co.uk/2014/08/12/nationwide_outage_at_canadian_isp_sh...
https://community.shaw.ca/docs/DOC-3455

thanks Leah

Pete Lumbis

13 Aug

3:08 p.m.

Maybe related to the 512k route issue? http://www.bgpmon.net/what-caused-todays-internet-hiccup/

I've seen people reboot to recover from TCAM exception without adjusting TCAM size only to run into the issue all over again. It's a fun way to watch the problems roll around the network.

On Tue, Aug 12, 2014 at 4:40 PM, Leah Ungstad <leah.ungstad@gmail.com> wrote:

Hi Nanog, anyone know what's up with a nationwide (Canadian) routing issue on Shaw?

http://www.theregister.co.uk/2014/08/12/nationwide_outage_at_canadian_isp_sh... https://community.shaw.ca/docs/DOC-3455

thanks Leah
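
For readers unfamiliar with the "512k" figure in this message: it refers to forwarding hardware whose TCAM allocation for IPv4 holds at most 512 × 1024 = 524,288 routes. The following is a minimal Python sketch of a headroom check against that boundary. The function name, the 5% warning margin, and the sample route counts are illustrative assumptions, not figures from Shaw or from any vendor tool.

```python
# Illustrative sketch only: names, margin, and sample counts are assumptions.
TCAM_IPV4_LIMIT = 512 * 1024  # 524,288 entries, the "512k" boundary


def tcam_status(route_count: int, limit: int = TCAM_IPV4_LIMIT,
                warn_margin: float = 0.05) -> str:
    """Classify a route count against a fixed TCAM allocation."""
    if route_count > limit:
        return "exception"  # the table no longer fits in hardware
    if route_count > limit * (1 - warn_margin):
        return "warning"    # within the assumed margin of the limit
    return "ok"


if __name__ == "__main__":
    # Hypothetical table sizes below, near, and above the boundary.
    for count in (480_000, 510_000, 530_000):
        print(count, tcam_status(count))
```

A reboot alone does not change `limit`, which is why the same exception can recur as soon as the table refills.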

Hugo Slabbert

3:24 p.m.

Outside looking in, but we did get a maintenance notice from Shaw in June for "Core Router reboot to resolve fully utilized IPv4 table"; let's hope for their sake they recarved TCAM while they're at it and that they don't have too many of those hiding around the network.

--
Hugo

On Wed 2014-Aug-13 11:08:55 -0400, Pete Lumbis <alumbis@gmail.com> wrote:

Maybe related to the 512k route issue? http://www.bgpmon.net/what-caused-todays-internet-hiccup/

I've seen people reboot to recover from TCAM exception without adjusting TCAM size only to run into the issue all over again. It's a fun way to watch the problems roll around the network.

On Tue, Aug 12, 2014 at 4:40 PM, Leah Ungstad <leah.ungstad@gmail.com> wrote:

Hi Nanog, anyone know what's up with a nationwide (Canadian) routing issue on Shaw?

http://www.theregister.co.uk/2014/08/12/nationwide_outage_at_canadian_isp_sh... https://community.shaw.ca/docs/DOC-3455

thanks Leah
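
"Recarving" TCAM, as mentioned in this message, means changing how a fixed pool of hardware entries is divided between protocols. The Python sketch below shows only the arithmetic of such a split. The pool size of 1,024k slots and the cost of two slots per IPv6 route are assumptions chosen for illustration; they are not taken from Shaw's equipment, and real platforms differ.

```python
# Illustrative arithmetic only: pool size and slot costs are assumptions.
TOTAL_SLOTS = 1024 * 1024   # assumed size of the shared TCAM pool
IPV6_SLOT_COST = 2          # assumed: one IPv6 route consumes two slots


def recarve(ipv4_routes: int, total_slots: int = TOTAL_SLOTS,
            ipv6_slot_cost: int = IPV6_SLOT_COST) -> dict:
    """Return the capacities left after reserving `ipv4_routes` IPv4 entries."""
    if not 0 <= ipv4_routes <= total_slots:
        raise ValueError("IPv4 allocation must fit in the pool")
    remaining = total_slots - ipv4_routes
    return {"ipv4": ipv4_routes, "ipv6": remaining // ipv6_slot_cost}


if __name__ == "__main__":
    print(recarve(512 * 1024))  # a 512k IPv4 carve
    print(recarve(768 * 1024))  # a larger IPv4 carve leaves less for IPv6
```

The trade-off is the point of the sketch: every entry granted to IPv4 is taken from what remains for everything else in the pool.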

Geoffrey Keating

10:06 p.m.

Pete Lumbis <alumbis@gmail.com> writes:

Maybe related to the 512k route issue? http://www.bgpmon.net/what-caused-todays-internet-hiccup/

I've seen people reboot to recover from TCAM exception without adjusting TCAM size only to run into the issue all over again. It's a fun way to watch the problems roll around the network.

In this case, it would probably have "helped" in the same way as rebooting or waving a rubber chicken or whatever sometimes "helps": the route issue was caused initially by a problem at Verizon that caused them to deaggregate, which they fixed, so by the time someone had identified the problem, paged someone, gotten them to the data center, had a teleconference, rebooted the device, waited for it to come back up... Verizon would have fixed it, so when it came back up it'd be back under 512k again.
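
To illustrate why deaggregation inflates the table: announcing the component /24s of an aggregate multiplies the number of routes every receiver has to hold, and withdrawing them shrinks the table again. The sketch below uses Python's standard `ipaddress` module. The baseline table size and the prefixes are hypothetical and are not the prefixes involved in the Verizon event.

```python
import ipaddress

TCAM_IPV4_LIMIT = 512 * 1024  # 524,288 entries


def deaggregate(prefix: str, new_prefixlen: int = 24) -> list:
    """Split one aggregate into more-specific prefixes of the given length."""
    net = ipaddress.ip_network(prefix)
    return list(net.subnets(new_prefix=new_prefixlen))


def table_size_after(baseline: int, aggregates: list,
                     new_prefixlen: int = 24) -> int:
    """Route count once the more-specifics are announced on top of the baseline."""
    extra = sum(len(deaggregate(p, new_prefixlen)) for p in aggregates)
    return baseline + extra


if __name__ == "__main__":
    baseline = 500_000                             # hypothetical table size
    aggregates = ["10.0.0.0/9", "172.16.0.0/12"]   # hypothetical aggregates
    during = table_size_after(baseline, aggregates)
    after = table_size_after(baseline, [])         # more-specifics withdrawn
    for label, size in (("during leak", during), ("after fix", after)):
        state = "over limit" if size > TCAM_IPV4_LIMIT else "within limit"
        print(label, size, state)
```

With these made-up numbers the table exceeds the limit only while the more-specifics are announced, which matches the point that a slow reboot would appear to "help" simply because the leak had already been withdrawn.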

Pete Lumbis

14 Aug

1:07 a.m.

Yep. Most of the time I've seen this it's two data centers, both go TCAM exception. You reboot DC1, when it comes back up you reboot DC2. This means no iBGP learned routes so DC1 is fine. DC2 is fine, until the iBGP peer comes back and then it starts all over again.

On Wed, Aug 13, 2014 at 6:06 PM, Geoffrey Keating <geoffk@geoffk.org> wrote:

Pete Lumbis <alumbis@gmail.com> writes:

Maybe related to the 512k route issue? http://www.bgpmon.net/what-caused-todays-internet-hiccup/

I've seen people reboot to recover from TCAM exception without adjusting TCAM size only to run into the issue all over again. It's a fun way to watch the problems roll around the network.

In this case, it would probably have "helped" in the same way as rebooting or waving a rubber chicken or whatever sometimes "helps": the route issue was caused initially by a problem at Verizon that caused them to deaggregate, which they fixed, so by the time someone had identified the problem, paged someone, gotten them to the data center, had a teleconference, rebooted the device, waited for it to come back up... Verizon would have fixed it, so when it came back up it'd be back under 512k again.
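
The two-data-center cycle described at the top of this message can be sketched as a small simulation. This is a toy model under loud assumptions: each site contributes a disjoint set of 300,000 routes (a made-up figure), the exception state is treated as sticky until reboot, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

TCAM_LIMIT = 512 * 1024  # 524,288 entries


@dataclass
class Router:
    name: str
    local_routes: int            # routes learned locally, hypothetical count
    up: bool = True
    in_exception: bool = False   # assumed sticky until the router reboots

    def install(self, learned_from_peer: int) -> None:
        """Program the table; exceeding the limit latches the exception."""
        if self.up and self.local_routes + learned_from_peer > TCAM_LIMIT:
            self.in_exception = True

    def start_reboot(self) -> None:
        """Take the router down; a reboot clears the exception state."""
        self.up = False
        self.in_exception = False


def sync(a: Router, b: Router, session_up: bool = True) -> None:
    """Exchange iBGP routes only if both routers and the session are up."""
    exchanging = a.up and b.up and session_up
    a.install(b.local_routes if exchanging else 0)
    b.install(a.local_routes if exchanging else 0)


def run_rolling_reboot() -> list:
    dc1, dc2 = Router("DC1", 300_000), Router("DC2", 300_000)
    log = []

    def step(label: str, session_up: bool = True) -> None:
        sync(dc1, dc2, session_up)
        log.append((label, dc1.in_exception, dc2.in_exception))

    step("initial")
    dc1.start_reboot()
    step("DC1 rebooting")
    dc1.up = True
    dc2.start_reboot()
    step("DC1 up, DC2 rebooting")
    dc2.up = True
    step("DC2 up, iBGP not yet established", session_up=False)
    step("iBGP re-established")
    return log


if __name__ == "__main__":
    for label, dc1_exc, dc2_exc in run_rolling_reboot():
        print(f"{label}: DC1 exception={dc1_exc}, DC2 exception={dc2_exc}")
```

In this model both sites look healthy right up until the iBGP session returns, at which point the combined table overflows again, because nothing about the TCAM allocation changed.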

Leah Ungstad

7:46 p.m.

Thanks for the info Pete, Geoffrey & Hugo!

LU

On Wed, Aug 13, 2014 at 6:07 PM, Pete Lumbis <alumbis@gmail.com> wrote:

Yep. Most of the time I've seen this it's two data centers, both go TCAM exception. You reboot DC1, when it comes back up you reboot DC2. This means no iBGP learned routes so DC1 is fine. DC2 is fine, until the iBGP peer comes back and then it starts all over again.

On Wed, Aug 13, 2014 at 6:06 PM, Geoffrey Keating <geoffk@geoffk.org> wrote:

Pete Lumbis <alumbis@gmail.com> writes:

Maybe related to the 512k route issue? http://www.bgpmon.net/what-caused-todays-internet-hiccup/

I've seen people reboot to recover from TCAM exception without adjusting TCAM size only to run into the issue all over again. It's a fun way to watch the problems roll around the network.

In this case, it would probably have "helped" in the same way as rebooting or waving a rubber chicken or whatever sometimes "helps": the route issue was caused initially by a problem at Verizon that caused them to deaggregate, which they fixed, so by the time someone had identified the problem, paged someone, gotten them to the data center, had a teleconference, rebooted the device, waited for it to come back up... Verizon would have fixed it, so when it came back up it'd be back under 512k again.
