Persistent BGP peer flapping - do you care?
NANOG -- We are finalizing a revision of the BGP specification, which is now in last call. This revision is meant to match BGP as deployed. One part of the specification remains open: a fix for a problem called "persistent BGP flapping". We urgently need input from NANOG folks on what is deployed. Here is the description of "persistent BGP flapping" from the BGP specification:
If a BGP speaker detects an error, it shuts down the connection and changes its state to Idle. Getting out of the Idle state requires generation of the Start event. If such an event is generated automatically, then persistent BGP errors may result in persistent flapping of the speaker.
1) Do any of the ISPs see the persistent BGP peer flapping now? Does anyone from the ISP community ever experience anything like this BGP persistent peer flapping? The solution is to have an exponential backoff in the rate of sending the Opens. If you have seen this, how many routers did this persistent BGP peer flapping impact? (Can you give a % of your routers or a total number?) How often does this impact your routers?
2) Is this feature on in your machine by default? If not, do you configure the exponential rates?
3) Do you track if your routers are in this state? How do you track if your routers are in this state?
Sue Hares
This has been bandied about before, but one should note that the "drop the peer if an error is received" is only really effective if the session that initiated the error does not propagate it. Most Cisco routers running common IOS images not only do not drop the session, but pass along the bad prefix, which leads to the occasional bad route dropping peering sessions on most of the Enterasys(*) routers on the planet.
I guess the main question is what is considered an "error" - if the peer starts obviously misbehaving, then yes, drop the peer. But don't drop the peer due to an invalid prefix that most likely did not originate on that router - it would be much better for the 'net as a whole to just drop the bad prefix and carry on. Maybe an algorithm could be built in where the peer could be dropped if the number of bad prefixes exceeds a set threshold...
In short, the "drop the session when you get a bad prefix" only serves its intended purpose when every router that speaks BGP does this. If that can't be had, we should really revisit the spec in that regard.
-Chris
(*) among other vendors; it was a customer's Enterasys router that got my attention the last time it happened...
-Chris
-- --------------------------- Christopher A. Woodfield rekoil@semihuman.com PGP Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xB887618B
On Thu, Jan 17, 2002 at 03:10:00PM -0500, Christopher A. Woodfield wrote:
This has been bandied about before, but one should note that the "drop the peer if an error is received" is only really effective if the session that initiated the error does not propagate it. Most Cisco routers running common IOS images not only do not drop the session, but pass along the bad prefix, which leads to the occasional bad route dropping peering sessions on most of the Enterasys(*) routers on the planet.
Actually, my understanding (I haven't had this happen to me directly, so I can't say) is that Ciscos propagate the bad route and -then- drop the session. But regardless...
I guess the main question is what is considered an "error" - if the peer starts obviously misbehaving, then yes, drop the peer. But don't drop the peer due to an invalid prefix that most likely did not originate on that router - it would be much better for the 'net as a whole to just drop the bad prefix and carry on. Maybe an algorithm could be built in where the peer could be dropped if the number of bad prefixes exceeds a set threshold...
In short, the "drop the session when you get a bad prefix" only serves its intended purpose when every router that speaks BGP does this. If that can't be had, we should really revisit the spec in that regard.
RFC1771:
6. BGP Error Handling.
This section describes actions to be taken when errors are detected while processing BGP messages. When any of the conditions described here are detected, a NOTIFICATION message with the indicated Error Code, Error Subcode, and Data fields is sent, and the BGP connection is closed. If no Error Subcode is specified, then a zero must be used.
If the RFC states "drop the session when you get a bad prefix" then I would like to think "every router that speaks BGP does this" could be a safe expectation. I know, I know, there's a difference between "what the spec says" and "what the product does", but isn't this the point of RFC standards?
On the other hand, I suppose the argument could be made that the RFC doesn't actually say "the BGP session is closed without the invalid update being propagated to other peers". However, if your BGP engine can detect an invalid update (which it can, if it is closing the session), isn't it a given that it should know not to propagate said update?
-c
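For readers who want to see what the quoted section puts on the wire, here is a minimal sketch of building a NOTIFICATION message, assuming the RFC 1771 layout (16-octet all-ones marker, 2-octet length, 1-octet type, then Error Code, Error Subcode, and Data). The function name `build_notification` is made up for illustration; this is not taken from any router implementation.

```python
import struct

BGP_MARKER = b"\xff" * 16   # all ones when no authentication is in use
MSG_NOTIFICATION = 3        # BGP message type code for NOTIFICATION
ERR_UPDATE_MESSAGE = 3      # Error Code 3: UPDATE Message Error


def build_notification(error_code, error_subcode=0, data=b""):
    """Encode a BGP NOTIFICATION message (hypothetical helper).

    The subcode defaults to zero, matching "If no Error Subcode is
    specified, then a zero must be used."
    """
    # 19-octet common header + error code + error subcode + data
    length = 19 + 2 + len(data)
    header = BGP_MARKER + struct.pack("!HB", length, MSG_NOTIFICATION)
    return header + struct.pack("!BB", error_code, error_subcode) + data


msg = build_notification(ERR_UPDATE_MESSAGE)
```

After sending such a message the speaker closes the connection, which is the step the rest of this thread argues about.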
Chris:
Thanks for the input. This is the time to revisit the specification. Just to confirm your answer, I'll paraphrase it and let you know what happened.
The persistent BGP peer flapping happens when you (one of the paths):
1) Error causes stop (bad prefix --> drop connection)
2) BGP peer goes to IDLE state
3) Automatic restart happens (cisco doesn't utilize the backoff)
4) Open sent
5) active
6) error due to bad prefix still being sent
7) Idle Hold time (time delay here)
--> go back to #1
The specification says to slow down the cycle of re-establishment by increasing the time delay in step #7.
I think we are describing the same problem. Could you please confirm?
At 03:10 PM 1/17/2002 -0500, Christopher A. Woodfield wrote:
This has been bandied about before, but one should note that the "drop the peer if an error is received" is only really effective if the session that initiated the error does not propagate it. Most Cisco routers running common IOS images not only do not drop the session, but pass along the bad prefix, which leads to the occasional bad route dropping peering sessions on most of the Enterasys(*) routers on the planet.
Do the peering sessions drop once or repeatedly until the bad prefix gets cleared out?
I guess the main question is what is considered an "error" - if the peer starts obviously misbehaving, then yes, drop the peer. But don't drop the peer due to an invalid prefix that most likely did not originate on that router - it would be much better for the 'net as a whole to just drop the bad prefix and carry on. Maybe an algorithm could be built in where the peer could be dropped if the number of bad prefixes exceeds a set threshold...
The algorithms for what constitutes a "drop" can be an implementation detail or be specified as an optional portion of the next version of the BGP specification.
In short, the "drop the session when you get a bad prefix" only serves its intended purpose when every router that speaks BGP does this. If that can't be had, we should really revisit the spec in that regard.
The specification says "recommended" (should) now and, as we noted with cisco, not all vendors implement it. We are documenting existing practice, so recommended/should will remain.
If you think it is a very serious operational issue, you can always send input to the idr mailing list that the "should" needs to be "must" due to an operational issue.
Thanks again for answering the cry for help!
Sue
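The delay in step #7 of the cycle above can be sketched as an exponentially growing Idle Hold time. This is only an illustration under assumed values: the class name `IdleHoldBackoff`, the 60-second starting delay, the doubling factor, and the one-hour ceiling are all hypothetical, not numbers from the specification or from any vendor.

```python
class IdleHoldBackoff:
    """Exponential backoff for the delay before an automatic Start event.

    All defaults are illustrative assumptions, not values from the spec.
    """

    def __init__(self, initial=60.0, factor=2.0, ceiling=3600.0):
        self.initial = initial
        self.factor = factor
        self.ceiling = ceiling
        self.current = initial

    def next_delay(self):
        """Return the delay to wait now, then grow it for the next error."""
        delay = self.current
        self.current = min(self.current * self.factor, self.ceiling)
        return delay

    def reset(self):
        """Call once the session has stayed Established; start over."""
        self.current = self.initial


backoff = IdleHoldBackoff()
delays = [backoff.next_delay() for _ in range(4)]  # 60, 120, 240, 480 seconds
```

With a persistent error such as a bad prefix that is re-sent on every attempt, each pass through the cycle waits longer than the one before, up to the ceiling.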
See the "BGP Noise Tonight?" thread from the NANOG archives, October 2001. A bogus prefix (a leaked confederation string) originated from $NETWORK, and that network's peers/upstreams, whether or not they dropped the peer, propagated the prefix, in violation of RFC behavior. Said prefix propagated to most of the 'net, and every RFC-compliant router that got it dropped its peering session. The customer I was working with, for example, lost connectivity completely because all three of his upstream providers sent him the bad route.
If everyone had followed the RFC, the prefix would never have made it past the first peering sessions, and the damage would have been contained. But because most Cisco routers don't follow the RFC, it became a much more widespread operational issue as the bad prefix hit many, many RFC-compliant routers and caused many, many peers to drop unnecessarily.
We witnessed exactly the kind of looping you're talking about: a session gets the bad prefix, drops, reopens, gets the same bad prefix again, drops, and on and on. This could definitely benefit from the holddown timer you're talking about.
The problem here, ironically, is that it's not the Cisco networks that fail in this case, but the hardware that actually tries to do the right thing, which is fine on paper, but in this case turned out to be exactly the Wrong Thing.
My opinion would be to modify the spec as follows:
1. Upon receipt of an invalid prefix advertisement, notify the sender of the error and flush the advertisement. Do not drop the peering session unless a large number of bad prefixes are received (possibly as a percentage of total route updates received over a given amount of time).
2. Upon receipt of invalid BGP control/negotiation data (i.e. data that's not part of a prefix advertisement, such as keepalives, etc.), notify the sender of the error and drop the peering session.
I agree with your holddown timer proposal in cases of the peer being dropped due to errors, as the resultant loops can result in extreme prefix dampening. But my assertion is that BGP peering sessions should be a bit more robust and not drop everything at the first sign of trouble.
-Chris
-- --------------------------- Christopher A. Woodfield rekoil@semihuman.com PGP Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xB887618B
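The proposal above, to drop a session only when bad prefixes exceed a share of the updates received over a given time, could look roughly like the sketch below. Everything here is an assumption for illustration: the class name `BadPrefixMonitor`, the 300-second window, the 10% limit, and the minimum sample size are hypothetical, and no existing BGP implementation is being described.

```python
from collections import deque


class BadPrefixMonitor:
    """Sliding-window check on the fraction of invalid prefixes from a peer.

    Hypothetical sketch; window, fraction, and minimum count are assumed.
    """

    def __init__(self, window=300.0, max_bad_fraction=0.1, min_updates=20):
        self.window = window
        self.max_bad_fraction = max_bad_fraction
        self.min_updates = min_updates
        self.events = deque()  # (timestamp, is_bad) per received prefix

    def record(self, now, is_bad):
        """Record one prefix; return True if the session should be dropped."""
        self.events.append((now, is_bad))
        # Forget anything older than the window.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        total = len(self.events)
        bad = sum(1 for _, flag in self.events if flag)
        # Require a minimum sample so one early bad prefix cannot trip it.
        return total >= self.min_updates and bad / total > self.max_bad_fraction
```

A caller would flush each bad advertisement and notify the sender as in point 1, and tear the session down only when `record` returns True.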
### On Thu, 17 Jan 2002 17:00:06 -0500, "Christopher A. Woodfield"
### <rekoil@semihuman.com> casually decided to expound upon Susan Hares
### <skh@nexthop.com> the following thoughts about "Re: Persistent BGP peer
### flapping - do you care?":
CAW> I agree with your holddown timer proposal in cases of the peer being dropped due to
CAW> errors, as the resultant loops can result in extreme prefix dampening. But my
CAW> assertion is that BGP peering sessions should be a bit more robust and not drop
CAW> everything at the first sign of trouble.
Well, as I recall, the original intent to drop the entire session and thereby flush that peer from the table is because an invalid advertisement may be symptomatic of a larger scale table corruption on the part of the peer, thus all advertisements should be invalidated. Dropping the peer and thereby initiating a coldstart/reset was the conservative solution.
I think some form of peer damping with an exponential decay timer, much like route flap damping, would be a good thing. Simply reject the OPEN until the decay timer has expired.
As for propagation of the bad prefix... well, that soapbox has worn paint on top. If people aren't going to bother following specs in the first place, I'm not sure a new spec will solve anything.
--
/*===================[ Jake Khuon <khuon@NEEBU.Net> ]======================+
| Packet Plumber, Network Engineers /| / [~ [~ |) | | --------------- |
| for Effective Bandwidth Utilisation / |/ [_ [_ |) |_| N E T W O R K S |
+=========================================================================*/
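The peer damping idea above, a penalty that decays exponentially and an OPEN that is rejected until it has decayed far enough, can be sketched in the style of route flap damping. The class name `PeerDamping` and every number (penalty per flap, suppress and reuse thresholds, 15-minute half-life) are assumptions chosen for illustration only.

```python
class PeerDamping:
    """Per-peer penalty with exponential decay (hypothetical sketch)."""

    def __init__(self, penalty_per_flap=1000.0, suppress=2000.0,
                 reuse=750.0, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress    # start rejecting OPENs at this penalty
        self.reuse = reuse          # accept OPENs again below this penalty
        self.half_life = half_life  # seconds for the penalty to halve
        self.penalty = 0.0
        self.last = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Halve the penalty once per half_life of elapsed time.
        self.penalty *= 0.5 ** ((now - self.last) / self.half_life)
        self.last = now

    def flap(self, now):
        """Call when the session to this peer drops on an error."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress:
            self.suppressed = True

    def accept_open(self, now):
        """Return True if an OPEN from this peer may be accepted now."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return not self.suppressed
```

Using separate suppress and reuse thresholds keeps a peer from bouncing in and out of the suppressed state right at the limit.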
solution for customer data and trouble ticketing system. I have already used Request Tracker. thanks, arman
We've been using bugzilla for some time without many issues...
Regards,
Christopher J. Wolff, VP CIO
Broadband Laboratories
http://www.bblabs.com
-----Original Message-----
From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Arman
Sent: Friday, January 18, 2002 3:37 PM
Cc: nanog@merit.edu
Subject: recommendation for open source based solution for customer data and trouble ticketing system.
I have already used Request Tracker.
thanks, arman
1) Do any of the ISPs see the persistent bgp peer flapping now?
Almost every day, on different EBGP sessions (very few IBGP). The reasons are most of the time circuit problems or max prefixes. Most of the time, the problem is not in the BGP protocol itself.
The solution is to have an exponential backoff in the rate of sending the Opens.
According to my records (I have logged BGP flaps for 6 months), exponential backoff on all boxes would reduce flaps by 80%.
If you have seen this, how many routers did this persistent bgp peer flapping impact? (Can you give a % of your routers or a total number)? How often does this impact your routers?
Almost 50% of routers experienced a BGP flap in 6 months, but 80% of flaps are on 5% of all the boxes. I can give you more precise data privately if necessary :-)
2) Is this feature on in your machine by default? If not, do you configure the exponential rates?
Most boxes I operate do not provide this feature. Others have a constant timer before going from Idle to Active.
3)Do you track if your routers are in this state? How do you track if your routers are in this state?
No way to track this for the moment.
Vincent.
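For anyone who already logs session drops, one rough way to answer question 3 is to flag peers with many drops inside a time window. This is a sketch under assumptions: the function name `persistent_flappers`, the `(timestamp, peer)` record format, the one-hour window, and the five-flap threshold are all hypothetical, and the peer names in the example are made up.

```python
from collections import defaultdict


def persistent_flappers(events, window=3600.0, min_flaps=5):
    """Return peers with at least min_flaps drops within any window.

    events: iterable of (timestamp_seconds, peer) session-down records,
    in any order. The record format is an assumption for this sketch.
    """
    by_peer = defaultdict(list)
    for ts, peer in events:
        by_peer[peer].append(ts)

    flagged = set()
    for peer, times in by_peer.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans <= window seconds.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_flaps:
                flagged.add(peer)
                break
    return flagged


# Made-up example: "peer-a" drops every minute, "peer-b" twice, far apart.
log = [(i * 60.0, "peer-a") for i in range(5)] + [(0.0, "peer-b"), (10000.0, "peer-b")]
flapping = persistent_flappers(log)
```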
participants (7)
- Arman
- Christopher A. Woodfield
- Christopher J. Wolff
- Clayton Fiske
- Jake Khuon
- Susan Hares
- Vincent Gillet