<Keepalives are temporarily in throttle due to closed TCP window>
I am having difficulty maintaining my BGP session from my 6509 with a Sup720-3BXL to a 7206 VXR NPE-400. The session bounces every 3 minutes. I do have other IBGP sessions that are established with no problems; however, this is the only IBGP peer that is bouncing regularly.

cr1.AUSTTXEE#show ip bgp neighbors 67.214.64.100
<SNIP>
BGP state = Established, up for 00:02:54
Last read 00:00:53, last write 00:02:54, hold time is 180, keepalive interval is 60 seconds
Keepalives are temporarily in throttle due to closed TCP window
Neighbor capabilities:
Route refresh: advertised and received(new)
Address family IPv4 Unicast: advertised and received
Message statistics:

What exactly does the message mean, and how do I stabilize this? Any help will be appreciated.

Michael Ruiz
Network Engineer
Office 210-448-0040
Cell 512-744-3826
mruiz@telwestservices.com <mailto:mruiz@telwestservices.com>

"I don't measure a man's success by how high he climbs but how high he bounces when he hits bottom." - General George S. Patton Jr.

How am I doing? Please email my Director of Engineering Jared Martin with any feedback at: jmartin@telwestservices.com
On Mon, 14 Sep 2009, Michael Ruiz wrote:
I am having difficulty maintaining my BGP session from my 6509 with a Sup720-3BXL to a 7206 VXR NPE-400. The session bounces every 3 minutes. I do have other IBGP sessions that are established with no problems; however, this is the only IBGP peer that is bouncing regularly.
What exactly does the message mean, and how do I stabilize this? Any help will be appreciated.
This is most likely an MTU problem. Your SYN/SYN+ACK goes thru, but then the first full-size MSS packet is sent, and it's not getting to the destination. 3 minutes is the hold timer for keepalives, which are not getting thru either because of the stalled TCP session.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Mikael Abrahamsson:
What exactly does the message mean, and how do I stabilize this? Any help will be appreciated.
This is most likely an MTU problem.
Does IOS enable PMTUD for BGP sessions by default these days? The 476 (or something like that) MTU is unlikely an issue. There could be a forwarding bug which causes drops dependent on packet size, though.
I am not sure. I think it is, but I went ahead and put in the command manually. Here is more of the configuration to do with TCP information.

ip tcp selective-ack
ip tcp window-size 65535
ip tcp synwait-time 10
ip tcp path-mtu-discovery
On Tue, Sep 15, 2009 at 12:28:02PM -0500, Michael Ruiz wrote:
Here is more of the configuration to do with TCP information.
ip tcp selective-ack
ip tcp window-size 65535
ip tcp synwait-time 10
ip tcp path-mtu-discovery
Every time I turn those on (plus timestamping), it breaks something. The last time I tried it broke ftp based transfers of new IOS, had to disable or use tftp to get a non-corrupted image (SRA). The time before that, it occasionally caused bgp keepalives to be missed and thus dropped the session (SXF). It may work now, or there may be more subtle Cisco bugs lurking, who knows. :)

You can confirm what MSS is actually being used in show ip bgp neighbor, under the "max data segment" line. I believe in modern code there is a way to turn on pmtud for all bgp neighbors (or individual ones) which may or may not depend on the global ip tcp path-mtu-discovery setting. I don't recall off the top of my head, but you should be able to confirm what size messages you're actually trying to send.

FWIW I've run extensive tests on BGP with > 9000 byte MSS (though numbers that large are completely irrelevant, since bgp's maximum message size is 4096 bytes) and never hit a problem. I once saw a bug where Cisco miscalculated the MSS when doing tcp md5 (off by the number of bytes that the tcp option would take, I forget which direction), but I'm sure that's fixed now too. :)

--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
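For reference, the checks and the per-neighbor knob Richard is describing look roughly like this on IOS trains that have them; the neighbor address and the 4410 byte segment size are from later in this thread, the AS number is a placeholder since it isn't given, and the exact syntax should be verified against your own release:

cr1.AUSTTXEE#show ip bgp neighbors 67.214.64.100 | include max data segment
Datagrams (max data segment is 4410 bytes):

cr1.AUSTTXEE(config)#router bgp <your-AS>
cr1.AUSTTXEE(config-router)#neighbor 67.214.64.100 transport path-mtu-discovery

Whether the per-neighbor form exists, and whether it depends on the global ip tcp path-mtu-discovery setting, varies by release, exactly as Richard says.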
Every time I turn those on (plus timestamping), it breaks something. The last time I tried it broke ftp based transfers of new IOS, had to disable or use tftp to get a non-corrupted image (SRA). The time before that, it occasionally caused bgp keepalives to be missed and thus dropped the session (SXF). It may work now, or there may be more subtle Cisco bugs lurking, who knows. :)
I tried that, no dice. I thought it would actually work.
You can confirm what MSS is actually being used in show ip bgp neighbor, under the "max data segment" line. I believe in modern code there is a way to turn on pmtud for all bgp neighbors (or individual ones) which may or may not depend on the global ip tcp path-mtu-discovery setting. I don't recall off the top of my head, but you should be able to confirm what size messages you're actually trying to send.

FWIW I've run extensive tests on BGP with > 9000 byte MSS (though numbers that large are completely irrelevant, since bgp's maximum message size is 4096 bytes) and never hit a problem. I once saw a bug where Cisco miscalculated the MSS when doing tcp md5 (off by the number of bytes that the tcp option would take, I forget which direction), but I'm sure that's fixed now too. :)
Below is a snapshot of the neighbor in question.

Datagrams (max data segment is 4410 bytes):
Rcvd: 6 (out of order: 0), with data: 4, total data bytes: 278
Sent: 6 (retransmit: 5), with data: 2, total data bytes: 4474

Could there be a problem with the total data bytes exceeding the size of the max data segment? Below is the router (7206 NPE-400) I am trying to establish a session with.

BGP neighbor <snip>
Description: cr1.AUSTTXEE
Member of peer-group TelWest-iBGP for session parameters
BGP version 4, remote router ID 67.214.64.97
BGP state = Established, up for 00:00:02
Last read 00:00:02, hold time is 180, keepalive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received(old & new)
Address family IPv4 Unicast: advertised and received
Message statistics:
<snip>
Datagrams (max data segment is 4410 bytes):
Rcvd: 4 (out of order: 0), with data: 1, total data bytes: 64
Sent: 5 (retransmit: 0, fastretransmit: 0), with data: 3, total data bytes: 259
cr2.CRCHTXCB#
On Tue, Sep 15, 2009 at 03:10:52PM -0500, Michael Ruiz wrote:
Below is a snapshot of the neighbor in question.
Datagrams (max data segment is 4410 bytes):
Rcvd: 6 (out of order: 0), with data: 4, total data bytes: 278
Sent: 6 (retransmit: 5), with data: 2, total data bytes: 4474
Could there be a problem with the total data bytes exceeding the size of the max data segment?
The maximum BGP message size is 4096 and there is no padding, so you would need a heck of a lot of overhead to get another 300+ bytes on there. I'd say the answer is no, unless you're running this over MPLS over GRE over MPLS over IPSec over MPLS over... well... you get the picture. :)

It's possible that your link isn't actually capable of passing 4096-ish byte packets for whatever reason. A quick way to validate or eliminate that theory is to do some pings from the router with different size payloads, sourced from your side of the /30 and pinging the far side, and using the df-bit to prevent fragmentation. Failing that, make sure you aren't doing anything stupid with your control plane policers, maybe try turning those off to see if there is an improvement.

--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
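A sketch of the test Richard describes, using the peer address from the original post as the target; substitute the actual /30 link addresses (and a source keyword for the near side) for the segment you want to test, and note that on older IOS the size and df-bit options may only be reachable through the interactive extended ping:

cr1.AUSTTXEE#ping 67.214.64.100 size 1500 df-bit
cr1.AUSTTXEE#ping 67.214.64.100 size 4470 df-bit

If the 1500-byte ping succeeds but the 4470-byte one times out, something in the path is dropping large packets without returning the ICMP fragmentation-needed message that path MTU discovery depends on.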
And more specifically, possibly an interface MTU (or ip mtu, I forget which). If there is a mismatch between ends of a link, in one direction, MTU-sized packets get sent, and the other end sees those as "giants".

I've seen situations where the MTU is calculated incorrectly, when using some technology that adds a few bytes (e.g. VLAN tags, MPLS tags, etc.). On Cisco boxes, when talking to other Cisco boxes, even.

Take a look at the interfaces over which the peering session runs, at both ends. I.e., is this the only BGP session *over that interface*, for the local box? (It might not be the end you think it's at, BTW.)

Oh, and if you find something, please, let us know. War stories make for great bar BOFs at NANOG meetings. :-)

Brian
Take a look at the interfaces over which the peering session runs, at both ends. I.e., is this the only BGP session *over that interface*, for the local box?
You are going to find this even more strange. I have two routers that communicate over the same transport medium and are actually in the same rack. One router is a Cisco 7606, which has an IBGP session established with my Cisco 6509. Both have Sup720-3BXLs and 1 Gig of memory. Ironically, from the 6509's perspective, I cannot seem to maintain a session with my 7206 VXR, which has two directly connected DS-3s. In order for my 6509 to establish an IBGP session with my 7606, it has to go through the 7206 VXR. Crazy, right? Yeah, I can already tell this is going to be a *War Story* as you said. :)
On Tue, Sep 15, 2009 at 05:39:33PM -0300, Brian Dickson wrote:
And more specifically, possibly an interface MTU (or ip mtu, I forget which).
If there is a mismatch between ends of a link, in one direction, MTU-sized packets get sent, and the other end sees those as "giants".
Well, if the interface or ip mtu was smaller on one end, this would result in a lower mss negotiation and you would just have smaller but working packets. The bad situation is when there is a layer 2 device in the middle which eats the big packets and doesn't generate an ICMP needfrag. For example, if there was a 1500-byte-only ethernet switch in the middle of this link, it would drop anything > 1500 bytes and prevent path mtu discovery from working, resulting in silent blackholing.

I was assuming that wasn't the case here based on the 4474 mtu (was assuming sonet links or something), but looking at the original message he doesn't say what media or what might be in the middle, so it's possible 4474 is a manually configured mtu causing blackholing.
I've seen situations where the MTU is calculated incorrectly, when using some technology that adds a few bytes (e.g. VLAN tags, MPLS tags, etc.).
Even when things are working as intended, different vendors mean different things when they talk about MTU. For example, Juniper and Cisco disagree as to whether the mtu should include layer 2 or .1q tag overhead, resulting in inconsistent MTU numbers which are not only different between the vendors, but which can change depending on what type of trunk you're running between the devices. Enabling > 1500 byte MTUs is a dangerous game if you don't know what you're doing, or if you're connected to other people who are sloppy and don't fully verify their MTU settings on every link.
War stories make for great bar BOFs at NANOG meetings. :-)
Never ending supply of those things. :)

--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
I was assuming that wasn't the case here based on the 4474 mtu (was assuming sonet links or something), but looking at the original message he doesn't say what media or what might be in the middle, so it's possible 4474 is a manually configured mtu causing blackholing.
Here is the network architecture from the Cisco 6509 to the 7206 VXR. The 6509 has a successful BGP session established with another router, a Cisco 7606 w/ Sup720-3BXL. The 7606 and 7206 VXR are connected together by a Cisco 3550 switch. In order for the 6509 to establish the IBGP session to the 7606, it has to pass through two DS-3s, go through the 7206 VXR, out the Fast E, through the Cisco 3550, and then to the 7606. I checked the MTUs on the 3550s and I am seeing the Fast E interfaces are still showing 1500 bytes. Would increasing the MTU size on the switches cause any harm?
I checked the MTUs on the 3550s and I am seeing the Fast E interfaces are still showing 1500 bytes. Would increasing the MTU size on the switches cause any harm?
The 3550s are very limited with respect to MTU - the standard model can only do up to 1546 bytes, while I believe the -12G model can do up to 2000 bytes. In any case - you won't get a 4470 byte packet through a 3550. Also, changing the MTU on the 3550 requires a reboot.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
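For reference, the 3550 knob Steinar is referring to is global rather than per-interface; a rough sketch (the prompt is just a stand-in for the switch hostname, and as he notes nothing takes effect until the reload):

3550(config)#system mtu 1546
3550(config)#end
3550#show system mtu
3550#reload

Even at 1546 bytes, a 4470 byte packet from the DS3-facing router still won't fit through the switch.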
On Wed, Sep 16, 2009 at 01:18:20PM -0500, Michael Ruiz wrote:
Here is the network architecture from the Cisco 6509 to the 7206 VXR. The 6509 has a successful BGP session established with another router, Cisco 7606 w/ Sup720-3bxls. The 7606 and 7206 VXR are connected together by a Cisco 3550 switch. In order for the 6509 to establish the IBGP session to the 7606, it has to pass through two DS-3s, go through the 7206 VXR, out the Fast E, through the Cisco 3550, and then to the 7606. I checked the MTUs on the 3550s and I am seeing the Fast E interfaces are still showing 1500 bytes. Would increasing the MTU size on the switches cause any harm?
As other people have said, this definitely sounds like an MTU problem. Basically you're trying to pass 4470 byte BGP packets over a link that drops anything bigger than 1500. The session will establish because all the setup packets are small, but the tcp session will stall as soon as you try to send routes across it.

What should be happening here is the 6509 will generate a 4470 byte packet because it sees the directly connected interface as a DS3 and doesn't know the path is incapable of supporting > 1500 bytes end to end. The layer 3 device on the mtu choke point, in this case the faste interface on the 7206vxr, should be configured to a 1500 byte mtu. This will cause the 7206vxr to generate an ICMP needfrag when the 4470 byte packet comes along, and cause path mtu discovery to lower the MSS on the IBGP session. Either a) you have the mtu misconfigured on that 7206vxr port, b) your router is misconfigured not to generate the icmp, c) something in the middle is misconfigured to filter this necessary icmp packet, or d) some other screwup probably related to one of the above.

Generally speaking, increasing the MTU size on a switch can never hurt anything, but having an insufficiently large MTU on the switch is what will break you the most (as is happening here). The problem occurs when you increase the MTU on the layer 3 routers to something beyond what the layer 2 link in the middle is capable of supporting. Layer 3 devices will either fragment (deprecated) or generate ICMP NeedFrags which will cause path MTU discovery to shrink the MSS. Layer 2 devices are incapable of doing this, so you MUST NOT set the layer 3 MTU above what the layer 2 link is capable of handling.

Now that said, increasing the mtu on the 3550 won't work here because 3550 MTU support is terrible. The only option you have is to configure the MTU of all interfaces to 1546 with the "system mtu 1546" command, followed by a reload. This is not big enough to pass your 4470 byte packets, and will also break any MTU dependent configuration you might be running. For example, after you do this, any OSPF speakers on your 3550 will have to have their MTUs adjusted as well, or OSPF will not come back up due to the interface mismatch. For more details see:

http://www.cisco.com/en/US/products/hw/switches/ps700/products_configuration_example09186a008010edab.shtml#c4

Your best bet (in order of most preferable to least) is to a) fix whatever is breaking path mtu discovery on the 7206vxr in the first place, b) force the mss of the ibgp session to something under 1460, or c) lower the mtu on the ds3 interface to 1500.

--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
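Hedged sketches of options (b) and (c) from that list, on the 6509 side: the interface name is a placeholder since the DS3 interface isn't identified in the thread, 1400 is just an example value under 1460, and the global ip tcp mss command only exists in some IOS releases, so verify both against your own code:

! option (b): cap the MSS the router offers on its own TCP sessions
ip tcp mss 1400
!
! option (c): lower the DS3 interface MTU to match the weakest link in the path
interface Serial1/0
 mtu 1500

Option (a), finding and fixing whatever is eating the ICMP needfrags, remains the preferable long-term fix.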
RAS wrote: [ lots of good stuff elided for brevity ]
c) lower the mtu on the ds3 interface to 1500.
This will have another benefit, if it is done to all such interfaces on the two devices. (Where by "all such interfaces", I mean "everything with a settable MTU > 1500".)

Configuring one common MTU size on the interfaces means the buffer pool on the box will switch from several pools of varying sizes to one pool. The number of buffers per pool gets pro-rated by speed * buffer size, so high-speed, high-MTU interfaces get a gigantic chunk of buffer space. Once you reduce things to one pool, you significantly reduce the likelihood of buffer starvation.

Note that the discussion on benefits to buffers is old info, and may even be moot these days, but buffers are fundamental enough that I doubt it. However, the original problem, iBGP not working, will definitely be resolved by this.

Note also, changing this often won't take effect until a reboot, and/or may result in buffer re-carving with an attendant "hit" of up to 30 seconds of no forwarding packets (!!). You've been warned... In other words, plan this carefully, and make sure you have remote hands available or are on site. This qualifies as "deep voodoo". ;-)

Brian
Either a) you have the mtu misconfigured on that 7206vxr
That part is where I am at a loss. How is it that the 6509 can establish an IBGP session with the 7606 when it has to go through the 7206 VXR? The DS-3s are connected to the 7206 VXR. To add more depth to the story: I have 8 IBGP sessions that are connected to the 7206 VXR that have been up and running for over a year. Some of the sessions traverse the DS-3s and/or GigE long haul connections. There are a total of 10 core routers that are a mixture of Cisco 7606s, 6509s, and 7206 VXRs w/ NPE-400s or G1s. Only this one IBGP session out of 9 routers is not being established. Since I have a switch between the 7606 and 7206, I plan to put a packet capture server on it and see what I can see.
On Wed, Sep 16, 2009 at 06:47:10PM -0500, Michael Ruiz wrote:
Either a) you have the mtu misconfigured on that 7206vxr
That part is where I am at a loss. How is it that the 6509 can establish an IBGP session with the 7606 when it has to go through the 7206 VXR? The DS-3s are connected to the 7206 VXR. To add more depth to the story: I have 8 IBGP sessions that are connected to the 7206 VXR that have been up and running for over a year. Some of the sessions traverse the DS-3s and/or GigE long haul connections. There are a total of 10 core routers that are a mixture of Cisco 7606s, 6509s, and 7206 VXRs w/ NPE-400s or G1s. Only this one IBGP session out of 9 routers is not being established. Since I have a switch between the 7606 and 7206, I plan to put a packet capture server on it and see what I can see.
And is that the one that traverses the 3550 with the 1500 byte MTU? Re-read what we said. You should be able to test the MTU theory by disabling path-mtu-discovery, which will cause the MSS to fall back to the minimum 576.

--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
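A minimal sketch of that test on the 6509, assuming the neighbor address from the original post; clearing the session makes the peers renegotiate the (now much smaller) MSS:

cr1.AUSTTXEE(config)#no ip tcp path-mtu-discovery
cr1.AUSTTXEE(config)#end
cr1.AUSTTXEE#clear ip bgp 67.214.64.100

If the session then stays up and actually carries routes, that is a strong hint that large packets are being silently dropped somewhere between the two routers.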
participants (6)

- Brian Dickson
- Florian Weimer
- Michael Ruiz
- Mikael Abrahamsson
- Richard A Steenbergen
- sthaug@nethelp.no