We were using PMTUD. However:

1) The link was iBGP and was done via crossover, with both ends using the default MTU.
2) I tried disabling PMTUD, with no difference.
3) Cisco admitted it was a known bug, and downreving to 12.4(15)T resolved the issue.

----
Matthew Huff | One Manhattanville Rd
OTA Management LLC | Purchase, NY 10577
http://www.ox.com | Phone: 914-460-4039
aim: matthewbhuff | Fax: 914-460-4139
-----Original Message-----
From: Paul Cosgrove [mailto:paul.cosgrove@heanet.ie]
Sent: Tuesday, February 24, 2009 12:26 PM
To: Mills, Charles
Cc: Renaud RAKOTOMALALA; Matthew Huff; nanog@nanog.org
Subject: Re: Illegal header length in BGP error
Are you using PMTUD?
We saw this on a couple of our route reflectors and on one occasion picked it up in a capture. So I can say that the issue is due to bad packets being sent, rather than an inaccurate error. It can be reported differently according to where the corruption occurs (e.g. unsupported message type, update malformed etc.).
Two production BGP sessions were affected at different times, and one showed errors every few days, the other weeks apart. Both sessions were from route reflectors to other routers receiving full tables, and both traversed multiple hops. All other sessions of these routers were fine. Whilst investigating we identified that different MTUs were being used on the device interfaces at each end of the sessions. The session on which we saw most errors also had lower MTUs on intervening links, so PMTUD was suspected to be a factor.
I replaced one of the paths with a direct link, using identical MTUs, and that stopped the errors on that session (since PMTUD had nothing to do anymore). Just to be sure we recreated a multiple hop topology from our production route reflectors to isolated lab routers, with low intervening link MTUs and ACLs to keep out other unwanted traffic - which also produced the same error on those sessions (but only once each over three months).
After correcting all the MTUs in the production network the errors ceased completely. Our test routers shared these links, but also used an additional link with a low MTU which we deliberately did not fix; as it turned out we did not see the error again there either, so the trigger was not entirely clear.
One other thing to note is that, at the time, we were seeing some other problems with these production routers, which Cisco believed may have been due to SNMP polling of BGP stats. If you have been changing that recently I would also consider it a possibility.
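The MTU arithmetic behind the scenario above can be sketched as follows. This is only an illustration under stated assumptions: IPv4 with no IP or TCP options (so 40 bytes of headers), and hop MTU values that are made up for the example rather than taken from the network described in this thread. The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def describe_path(hop_mtus):
    """Summarise a path given the MTU of each link, sender side first."""
    first, last = hop_mtus[0], hop_mtus[-1]
    pmtu = min(hop_mtus)  # the smallest link MTU bounds the whole path
    return {
        "endpoints_match": first == last,
        "path_mtu": pmtu,
        "mss_from_endpoint": tcp_mss(first),  # what the sender would assume locally
        "mss_from_path": tcp_mss(pmtu),       # what actually fits end to end
        "pmtud_needed": pmtu < first,         # sender must learn a smaller size
    }


if __name__ == "__main__":
    # Hypothetical path: mismatched end interfaces with a low-MTU link in between.
    print(describe_path([4470, 1500, 9000]))
```

When every link and both end interfaces share one MTU, `pmtud_needed` is false and PMTUD has nothing to adjust, which matches the direct-link case described above.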
Paul.
Mills, Charles wrote:

I ran into exactly the same thing during a code upgrade a few weeks ago.
I wrote it off as a bug in BGP and backed off the code until a new release was out. I was also running 12.4(22)T on an NPE-G2.
Chuck
-----Original Message-----
From: Renaud RAKOTOMALALA [mailto:renaud@rakotomalala.com]
Sent: Tuesday, February 24, 2009 10:49 AM
To: Matthew Huff; 'nanog@nanog.org'
Subject: Re: Illegal header length in BGP error
Hello Matthew,
We changed the motherboard of one of our Ciscos from a 7206VXR (NPE-G1) to a 7206VXR (NPE-G2).
Due to an incompatibility with IOS 12.3(4r)T3 we upgraded the IOS to 12.4(12.2r)T. In the end we got the same problem as you between one of our 7200s on 12.3 and the new one on 12.4 ....
We solved the problem by moving the Cisco from 12.4(12.2r) to 12.4(4)XD10, and the BGP session came back alive ....
So now everything works fine between our 7200 (IOS 12.3) and the other 7200 on IOS 12.4(4)XD10.
I hope this helps ...
Cheers, Renaud
Matthew Huff wrote:
One of our upstream providers flapped this morning, and since then they are sending corrupted BGP data. I'm running 12.4(22)T on Cisco 7200s. I'm getting no BGP errors from that provider, and the number of routes and basic sanity checks look okay. However, when it tries to redistribute the BGP routes via iBGP to our other border routers, we get:
003372: Feb 24 09:17:13.963 EST: %BGP-5-ADJCHANGE: neighbor x.x.x.x Down BGP Notification sent
003373: Feb 24 09:17:13.963 EST: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.x 1/2 (illegal header length) 2 bytes
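For context on the "1/2" in the log above: in RFC 4271, NOTIFICATION error code 1 is a Message Header Error and subcode 2 is Bad Message Length, with the offending 2-byte Length field returned as data. The following is a minimal sketch of that header check, written from the RFC rather than from any router's implementation. The function name and return convention are hypothetical.

```python
import struct

BGP_HEADER_LEN = 19     # 16-byte marker + 2-byte length + 1-byte type
BGP_MAX_MSG_LEN = 4096  # maximum message size in RFC 4271
# Minimum total message length per type: OPEN, UPDATE, NOTIFICATION, KEEPALIVE
MIN_LEN_BY_TYPE = {1: 29, 2: 23, 3: 21, 4: 19}


def check_bgp_header(data):
    """Return None if the header passes, else (error_code, subcode, data)."""
    if len(data) < BGP_HEADER_LEN:
        raise ValueError("need at least 19 bytes to inspect a BGP header")
    marker, length, msg_type = struct.unpack("!16sHB", data[:BGP_HEADER_LEN])
    if marker != b"\xff" * 16:
        return (1, 1, b"")  # Connection Not Synchronized
    min_len = MIN_LEN_BY_TYPE.get(msg_type, BGP_HEADER_LEN)
    bad_length = length < min_len or length > BGP_MAX_MSG_LEN
    if msg_type == 4 and length != BGP_HEADER_LEN:
        bad_length = True  # a KEEPALIVE is exactly the bare header
    if bad_length:
        # Bad Message Length: the data field carries the erroneous length
        return (1, 2, struct.pack("!H", length))
    if msg_type not in MIN_LEN_BY_TYPE:
        return (1, 3, bytes([msg_type]))  # Bad Message Type
    return None


if __name__ == "__main__":
    keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
    oversized = b"\xff" * 16 + struct.pack("!HB", 5000, 2)
    print(check_bgp_header(keepalive))
    print(check_bgp_header(oversized))
```

A receiver that loses its place in the TCP byte stream, or that is handed corrupted bytes, would read an arbitrary value as the length and fail this check, which is consistent with the same corruption also being reported as a bad message type or a malformed update.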
All routers have identical hardware and IOS versions. My Google and Cisco search-fu leads me to the AS path length bug, but the interesting thing is that since we have "bgp maxas-limit 75" configured and a recent IOS, we haven't had the problem before when other people were reporting issues. I've also looked at the path MTU issue, and although we haven't had a problem before, I disabled BGP path MTU discovery, but have the same issues.
Anyone seeing something like this today, and/or does anyone have a suggestion on finding out more specific info (which AS path, for example, so I can filter it)?
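One rough way to look for a suspect AS path offline is sketched below. It assumes the prefixes and their AS paths have already been exported into a mapping of prefix to space-separated AS path string. That input format, the function name, and the sample data are all hypothetical, and counting whitespace-separated tokens is a simplification that does not reproduce how IOS applies "bgp maxas-limit".

```python
def long_as_paths(paths, limit=75):
    """Return (prefix, hop_count) pairs whose AS path exceeds `limit` entries.

    `paths` maps a prefix string to a space-separated AS path string.
    Results are sorted with the longest path first.
    """
    flagged = []
    for prefix, as_path in paths.items():
        hops = as_path.split()
        if len(hops) > limit:
            flagged.append((prefix, len(hops)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample using documentation prefixes and private-use ASNs.
    sample = {
        "192.0.2.0/24": "64500 64501",
        "198.51.100.0/24": " ".join(["64502"] * 80),  # heavily prepended path
    }
    for prefix, count in long_as_paths(sample):
        print(prefix, count)
```

Any prefix this flags is only a candidate for closer inspection or filtering; it does not by itself show that the path is what triggers the notification.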
This e-mail message and any files transmitted with it contain confidential information intended only for the person(s) to whom this email message is addressed. If you have received this e-mail message in error, please notify the sender immediately by telephone or e-mail and destroy the original message without making a copy. Thank you. Neither this information block, the typed name of the sender, nor anything else in this message is intended to constitute an electronic signature unless a specific statement to the contrary is included in
this message.