RE: BGP and The zero window edge
Ben's blog details an experiment in which he advertises routes and then withdraws them, but some of them remain stuck for days. I'd like to get to the bottom of this problem. Has anyone else seen this before or can provide data to analyze? On or off list.
Regards,
Jakob.
-----Original Message-----
Date: Wed, 21 Apr 2021 07:31:10 -0400
From: "Jean St-Laurent" <jean@ddostest.me>
Nice article explaining a specific BGP corner case in which routes are not removed when the TCP window reaches 0.
https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window
The proposed solution is a new RFC for BGP with the suggestion to introduce a new timer. Fascinating!
Jean St-Laurent /CISSP
ddosTest me security inc
site: https://ddostest.me
Dear Jakob, group,
On Wed, Apr 21, 2021 at 08:59:06PM +0000, Jakob Heitz (jheitz) via NANOG wrote:
Ben's blog details an experiment in which he advertises routes and then withdraws them, but some of them remain stuck for days.
I'd like to get to the bottom of this problem.
I think there are *two* problems:
1) some BGP implementations (or multi-node BGP configurations) sometimes end up getting stuck in one way or another.
2) other BGP nodes are not able to disconnect/reconnect to systems suffering from instantiations of problem #1.
While on the one hand it is important to follow up on each and every instantiation of problem #1, I personally think it also is worthwhile exploring whether the BGP FSM itself can be redefined in a way that encourages BGP protocol implementations to be more robust and rely less on the remote peer behaving correctly. Once Problem #2 is addressed, finding and isolating instances of Problem #1 will become much easier.
Has anyone else seen this before or can provide data to analyze? On or off list.
From the BGP Default-Free Zone perspective it is hard to differentiate between an entire (multi-vendor) Autonomous System being stuck, or just one router.
To test individual router implementations this tool is useful: https://github.com/benjojo/bgp-zerowindow-test - but please keep in mind that the "TCP Recv Wind == 0" trick is just one way to easily get a BGP peer to manifest the problematic behavior.
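For readers who want to see the underlying TCP behavior without a BGP speaker, here is a minimal sketch. It is not the tool linked above, and the function name and buffer sizes are assumptions chosen for illustration. One side accepts a connection and never reads from it; the other side keeps writing until the kernel refuses more data. The zero window itself is only visible in a packet capture; what the sender sees locally is a full send buffer.

```python
import socket


def fill_until_stalled(chunk=b"\x00" * 4096, bufsize=4096):
    """Write to a peer that never reads; return the bytes accepted before the stall."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Small receive buffer, set before listen() so the accepted socket starts small too.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()  # deliberately never calls recv()

    sender.setblocking(False)
    sent = 0
    try:
        while True:
            sent += sender.send(chunk)
    except BlockingIOError:
        # The local send buffer is full because the peer is not draining data.
        pass
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent
```

Calling `fill_until_stalled()` returns how many bytes the kernel accepted before the connection stopped making progress; the exact number depends on the operating system's buffer handling.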
From a BGP protocol perspective BGP nodes shouldn't inspect the TCP receive window, but rather focus on whether all locally available signals indicate that the remote peer is still progressing data.
Kind regards, Job
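The idea of watching locally available signals, rather than the TCP receive window, can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions: the class name, the `stall_limit` parameter and the decision rule are hypothetical and are not taken from any BGP implementation or specification. The class records when queued outbound data last made progress and reports a stall once data has been pending for too long.

```python
import time


class SendProgressWatchdog:
    """Tracks whether queued outbound data is still draining toward the peer."""

    def __init__(self, stall_limit, clock=time.monotonic):
        self.stall_limit = stall_limit  # seconds without progress (hypothetical knob)
        self.clock = clock
        self.pending = 0                # bytes queued but not yet written
        self.last_progress = clock()

    def queued(self, nbytes):
        """Call when the BGP layer hands nbytes to the output queue."""
        if self.pending == 0:
            # Nothing was waiting before, so start timing from now.
            self.last_progress = self.clock()
        self.pending += nbytes

    def wrote(self, nbytes):
        """Call when the socket accepted nbytes from the output queue."""
        if nbytes > 0:
            self.pending = max(0, self.pending - nbytes)
            self.last_progress = self.clock()

    def should_reset(self):
        """True when data is pending and nothing has drained for stall_limit seconds."""
        stalled_for = self.clock() - self.last_progress
        return self.pending > 0 and stalled_for >= self.stall_limit
```

An idle session with nothing queued never trips the check; only a peer that stops accepting data that is actually waiting does.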
I'd like to get some data on what actually happened in the real cases and analyze it. If it's a Cisco router at fault, then we have a bug to fix. Even if it's not a Cisco, there may be ways we can help to avoid the situation. However, before we start on solutions, I'd like to get a good understanding of what actually happened. TCP zero window is possible, but many other things could cause it too. Anyone?
Regards,
Jakob.
On Wed, Apr 21, 2021 at 09:22:57PM +0000, Jakob Heitz (jheitz) wrote:
I'd like to get some data on what actually happened in the real cases and analyze it.
[snip]
TCP zero window is possible, but many other things could cause it too.
Indeed. There could be a number of reasons that caused it.
Switching away from TCP win=0 towards "Zombie Routes":
*RIGHT NOW* (at the moment of writing), there are a number of zombie routes visible in the IPv6 Default-Free Zone:
One example is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d15::/48
    2a0b:6b86:d15::/48 via:
        BGP.as_path: 204092 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
    (first announced April 15th, last withdrawn April 15th, 2021)
Another one is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d24::/48
    2a0b:6b86:d24::/48 via:
        BGP.as_path: 201701 9002 6939 42615 212232
        BGP.as_path: 34927 9002 6939 42615 212232
        BGP.as_path: 207960 34927 9002 6939 42615 212232
        BGP.as_path: 44103 50673 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
        BGP.as_path: 3280 34927 9002 6939 42615 212232
        BGP.as_path: 206628 34927 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
    (first announced March 24th, last withdrawn March 24th, 2021)
Just now, I literally rebooted the BGP speaker behind lg.ring.nlnog.net to ensure that those routes are not stuck in the BGP looking glass itself.
2a0b:6b86:d24::/48 was first announced on March 24th, 2021, and withdrawn at the end of March 24th, 2021 by the originator, and now almost a month later, this prefix still is visible in the default-free zone despite WITHDRAW messages having been sent and the AS 212232 operator confirming they are not announcing that IP prefix anywhere.
I checked the AS 6939 Looking glass, but the d24::/48 route is not visible in the http://lg.he.net/ web interface. This leads me to believe the route got stuck somewhere along the way in any of 201701, 204092, 206628, 207910, 207960, 208627, 3280, 34927, 35280, 44103, 50673, 57199, and/or 9002.
This implies there indeed might be multiple reasons a BGP route gets stuck ('stuck' as in - a WITHDRAW was not generated, or was ignored). Perhaps on any one of these edges there is a very high Out Queue for one reason or another:
    34927 9002
    206628 34927
    44103 50673
    207960 34927
    3280 34927
    9002 6939
    201701 9002
    208627 207910
I'm not sure all of these sightings of stuck routes can be pinpointed to one specific BGP vendor (or one bug).
Kind regards,
Job
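The list of suspect ASNs above can be derived mechanically from the looking-glass output. The sketch below assumes that every AS to the left of 6939 in an observed path is a candidate, and that this reasoning is applied to both prefixes (the message only states that the d24::/48 route was checked at AS 6939). The helper name `upstream_of` is made up for this illustration.

```python
# AS paths as shown by the looking glass for the two zombie prefixes.
D15_PATHS = [
    "204092 57199 35280 6939 42615 42615 212232",
    "208627 207910 57199 35280 6939 42615 42615 212232",
    "208627 207910 57199 35280 6939 42615 42615 212232",
]
D24_PATHS = [
    "201701 9002 6939 42615 212232",
    "34927 9002 6939 42615 212232",
    "207960 34927 9002 6939 42615 212232",
    "44103 50673 9002 6939 42615 212232",
    "208627 207910 34927 9002 6939 42615 212232",
    "3280 34927 9002 6939 42615 212232",
    "206628 34927 9002 6939 42615 212232",
    "208627 207910 34927 9002 6939 42615 212232",
]


def upstream_of(paths, asn):
    """Collect every ASN that appears to the left of `asn` in any path."""
    suspects = set()
    for path in paths:
        hops = path.split()
        if asn in hops:
            suspects.update(hops[: hops.index(asn)])
    return suspects


suspects = upstream_of(D15_PATHS + D24_PATHS, "6939")
# Sorted as strings, which is the order used in the message above.
print(", ".join(sorted(suspects)))
```

This yields thirteen ASNs, the same set as listed in the message.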
I'm not sure if this is helpful to this discussion or not, but I recently became aware of a bug in a virtual router using DPDK+VPP which sounds like it could possibly produce a similar issue to what is being described, without the TCP window being a factor.
The system used the same process to read and process the messages coming in to the netlink socket. When a large BGP update was being processed it was possible that the netlink buffer was being filled while previous updates were being processed. This caused some route updates to not be processed, not applied to the VPP FIB, and so they became stuck.
The particular vendor I spoke to about this issue resolved this by giving priority to reading and storing the messages for processing, and asynchronously processing those messages in batches.
I can share additional details off-list if anyone thinks this could be related to the problem.
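The fix described in this message (read and store first, process asynchronously in batches) can be sketched as a two-thread pipeline. This is a generic illustration, not the vendor's code: `run_pipeline`, `source` and `apply_batch` are hypothetical names, and a plain Python iterable stands in for the netlink socket.

```python
import queue
import threading


def run_pipeline(source, apply_batch, batch_size=64):
    """Drain `source` on one thread and apply the updates in batches on another."""
    backlog = queue.Queue()  # unbounded, so the reader never waits on slow processing
    done = object()          # sentinel marking the end of input

    def reader():
        for msg in source:   # stands in for recv() on the netlink socket
            backlog.put(msg)
        backlog.put(done)

    def worker():
        batch = []
        while True:
            item = backlog.get()
            if item is not done:
                batch.append(item)
            # Flush when the batch is full, the backlog is momentarily empty,
            # or the input has ended.
            if batch and (item is done or len(batch) >= batch_size or backlog.empty()):
                apply_batch(batch)
                batch = []
            if item is done:
                return

    threads = [threading.Thread(target=reader), threading.Thread(target=worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the reader only enqueues, a slow `apply_batch` delays processing but does not cause the kernel-side buffer to overflow while earlier updates are still being handled.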
On 22/04/2021 02:24, Job Snijders via NANOG wrote:
[...]
I'm not sure all of these sightings of stuck routes can be pinpointed to one specific BGP vendor (or one bug).
I would guess that all the stuck route sightings stem from one undiscovered bug in a TCP library that several BGP vendors have in common.
-Hank
Kind regards,
Job
On Thu, Apr 22, 2021 at 01:24:54AM +0200, Job Snijders via NANOG wrote: [...]
Another one is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d24::/48
    2a0b:6b86:d24::/48 via:
        BGP.as_path: 201701 9002 6939 42615 212232
        BGP.as_path: 34927 9002 6939 42615 212232
        BGP.as_path: 207960 34927 9002 6939 42615 212232
        BGP.as_path: 44103 50673 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
        BGP.as_path: 3280 34927 9002 6939 42615 212232
        BGP.as_path: 206628 34927 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
    (first announced March 24th, last withdrawn March 24th, 2021)
[...]
I checked the AS 6939 Looking glass, but the d24::/48 route is not visible in the http://lg.he.net/ web interface. This leads me to believe the route got stuck somewhere along the way in any of 201701, 204092, 206628, 207910, 207960, 208627, 3280, 34927, 35280, 44103, 50673, 57199, and/or 9002.
9002. Hit by Juniper PR1562090, route stuck in DeletePending. Workaround applied, sessions with 6939 restarted, route is gone.
On Thu, Apr 22, 2021 at 02:29:31PM +0300, Alexandre Snarskii wrote:
9002. Hit by Juniper PR1562090, route stuck in DeletePending. Workaround applied, sessions with 6939 restarted, route is gone.
Thank you for the details and for clearing the issue.
Kind regards,
Job
Job Snijders via NANOG writes:
*RIGHT NOW* (at the moment of writing), there are a number of zombie routes visible in the IPv6 Default-Free Zone:
[Reversing the order of your two examples]
Another one is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d24::/48
    2a0b:6b86:d24::/48 via:
        BGP.as_path: 201701 9002 6939 42615 212232
        BGP.as_path: 34927 9002 6939 42615 212232
        BGP.as_path: 207960 34927 9002 6939 42615 212232
        BGP.as_path: 44103 50673 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
        BGP.as_path: 3280 34927 9002 6939 42615 212232
        BGP.as_path: 206628 34927 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
    (first announced March 24th, last withdrawn March 24th, 2021)
So that one was resolved at AS9002, see Alexandre's followup (thanks!). AS9002 had also been my guess when I read this, because it's the leftmost common AS in the paths observed.
One example is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d15::/48
    2a0b:6b86:d15::/48 via:
        BGP.as_path: 204092 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
    (first announced April 15th, last withdrawn April 15th, 2021)
Applying the same logic, I'd suspect that the withdrawal is stuck in AS57199 in this case. I'll try to contact them.
Here's a (partial) RIPE RIS BGPlay view of the last lifecycle of the 2a0b:6b86:d15::/48 beacon:
https://stat.ripe.net/widget/bgplay#w.resource=2a0b:6b86:d15::/48&w.ignoreReannouncements=true&w.starttime=1618444740&w.endtime=1618542000&w.rrcs=0,1,2,4,10,12,20,21&w.instant=null&w.type=bgp
Cheers,
--
Simon.
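The "leftmost common AS" reasoning used in this message can be written down in a few lines. This is a sketch under one assumption: the suspect is the first AS, reading from the left of the first observed path, that also occurs in every other observed path. The function name is hypothetical.

```python
D24_PATHS = [
    "201701 9002 6939 42615 212232",
    "34927 9002 6939 42615 212232",
    "207960 34927 9002 6939 42615 212232",
    "44103 50673 9002 6939 42615 212232",
    "208627 207910 34927 9002 6939 42615 212232",
    "3280 34927 9002 6939 42615 212232",
    "206628 34927 9002 6939 42615 212232",
    "208627 207910 34927 9002 6939 42615 212232",
]
D15_PATHS = [
    "204092 57199 35280 6939 42615 42615 212232",
    "208627 207910 57199 35280 6939 42615 42615 212232",
    "208627 207910 57199 35280 6939 42615 42615 212232",
]


def leftmost_common_as(paths):
    """Return the first AS of the first path that occurs in every path, or None."""
    hop_lists = [path.split() for path in paths]
    common = set(hop_lists[0]).intersection(*hop_lists[1:])
    for asn in hop_lists[0]:
        if asn in common:
            return asn
    return None


print(leftmost_common_as(D24_PATHS))  # 9002
print(leftmost_common_as(D15_PATHS))  # 57199
```

The heuristic only narrows the search: the AS it points at is where all observed paths converge, which makes it the most economical place to start asking, not a proof of where the withdrawal was lost.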
On Thu 22 Apr 2021 01:24:54 GMT, Job Snijders via NANOG wrote:
One example is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d15::/48
    2a0b:6b86:d15::/48 via:
        BGP.as_path: 204092 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
    (first announced April 15th, last withdrawn April 15th, 2021)
On the AS204092 side, the route is one week and two days old (so 2021-04-16). So we never received the withdrawal.
asbr01#sh bgp ipv6 uni 2a0b:6b86:d15::/48
BGP routing table entry for 2A0B:6B86:D15::/48, version 88407242
BGP Bestpath: deterministic-med: med
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     129 130 145 167
  Refresh Epoch 1
  57199 35280 6939 42615 42615 212232
    2A0B:CBC0:1::BD (FE80::66D1:54FF:FEEF:9893) from 2A0B:CBC0:1::BD (80.67.167.5)
      Origin IGP, metric 10, localpref 100, valid, external, best
      Community: 24115:6939 35280:10 35280:1040 35280:2080 35280:3120 35280:20000 35280:21000 35280:21150 57199:35280 57199:65535 64496:100 64496:57199 64999:24115
      unknown transitive attribute: flag 0xE0 type 0x20 length 0x30
        value 0000 5E33 0000 03E9 0000 0001 0000 5E33 0000 03EA 0000 0002 0000 5E33 0000 03EB 0000 0005 0000 5E33 0000 03EC 0000 1B1B
      path 7F1E8D0F3B58 RPKI State valid
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  57199 35280 6939 42615 42615 212232, (received-only)
    2A0B:CBC0:1::BD (FE80::66D1:54FF:FEEF:9893) from 2A0B:CBC0:1::BD (80.67.167.5)
      Origin IGP, metric 4294967295, localpref 100, valid, external
      Community: 24115:6939 35280:10 35280:1040 35280:2080 35280:3120 35280:20000 35280:21000 35280:21150 57199:35280 57199:65535 64999:24115
      unknown transitive attribute: flag 0xE0 type 0x20 length 0x30
        value 0000 5E33 0000 03E9 0000 0001 0000 5E33 0000 03EA 0000 0002 0000 5E33 0000 03EB 0000 0005 0000 5E33 0000 03EC 0000 1B1B
      path 7F1E8D0EF088 RPKI State valid
      rx pathid: 0, tx pathid: 0
asbr01#sh ipv6 route 2a0b:6b86:d15::/48
Routing entry for 2A0B:6B86:D15::/48
  Known via "bgp 204092", distance 20, metric 10, type external
  Route count is 1/1, share count 0
  Routing paths:
    FE80::66D1:54FF:FEEF:9893, GigabitEthernet0/0/0.24
      MPLS label: nolabel
      Last updated 1w2d ago
asbr01#
--
Alarig
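The date arithmetic behind "1w2d ago (so 2021-04-16)" can be checked with a short helper. This is a simplified sketch: it only understands week, day and hour suffixes, the function name is made up, and the capture date of 2021-04-25 is an assumption inferred from the message's own arithmetic, not something stated in the output.

```python
import re
from datetime import date, timedelta

# Suffixes handled by this simplified parser (other age formats are not covered).
_UNITS = {"w": "weeks", "d": "days", "h": "hours"}


def parse_age(text):
    """Convert an age string such as '1w2d' into a timedelta."""
    parts = re.findall(r"(\d+)([wdh])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported age format: {text!r}")
    return timedelta(**{_UNITS[unit]: int(number) for number, unit in parts})


capture_date = date(2021, 4, 25)  # assumed date of the CLI capture
print(capture_date - parse_age("1w2d"))  # 2021-04-16
```

A last update on 2021-04-16, one day after the beacon was withdrawn, is consistent with the conclusion that no withdrawal ever arrived on this session.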
On Wed, Apr 21, 2021 at 08:59:06PM +0000, Jakob Heitz (jheitz) via NANOG wrote:
Has anyone else seen this before or can provide data to analyze? On or off list.
- https://labs.ripe.net/author/romain_fontugne/bgp-zombies/
- https://www.slideshare.net/atendesoftware/bgp-zombie-routes
kind regards,
--
Pawel Malachowski
participants (8)
- Alarig Le Lay
- Alexandre Snarskii
- Hank Nussbacher
- Jakob Heitz (jheitz)
- Job Snijders
- Pawel Malachowski
- Philip Loenneker
- Simon Leinen