On Wed, 12 Mar 2003, Jack Bates wrote:
traffic going to them. My router shows the last BGP peer reset at about that time, so this could be me sending the global table. His bandwidth then drops to 0 for almost exactly 30 minutes (MRTG isn't an exact graph). My guess (authoritative answer) is that the customer flapped their routes once too often and was suppressed by both of my providers, as I seem to recall the penalty heal rate is in 30-minute increments.
Were there more flaps than just that last one before everything became very quiet? A flap (up->down transition) has a penalty of 1000. By default (if dampening is enabled), the dampen threshold is 2000. Because the penalty decays between flaps, two flaps leave it just under the threshold, so you need at least three flaps to trigger dampening.
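To make the arithmetic concrete, here is a rough sketch of how the penalty accumulates and decays. The penalty of 1000 and the threshold of 2000 are the figures given above; the 15-minute half-life and the reuse limit of 750 are assumptions on my part (common defaults, but check what your upstreams actually run), and the function names are made up for illustration.

```python
import math

FLAP_PENALTY = 1000     # per flap, as stated above
SUPPRESS_LIMIT = 2000   # dampen threshold, as stated above
HALF_LIFE_MIN = 15.0    # ASSUMPTION: half-life in minutes, not given in this thread
REUSE_LIMIT = 750       # ASSUMPTION: reuse threshold, not given in this thread


def decay(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def flaps_until_suppressed(gap_minutes, max_flaps=100):
    """Count evenly spaced flaps until the penalty exceeds the suppress limit.

    Returns (flap_count, penalty), or None if max_flaps is reached first.
    """
    penalty = 0.0
    for flap in range(1, max_flaps + 1):
        penalty = decay(penalty, gap_minutes) + FLAP_PENALTY
        if penalty > SUPPRESS_LIMIT:
            return flap, penalty
    return None


def minutes_until_reuse(penalty):
    """Minutes for a penalty to decay down to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


if __name__ == "__main__":
    result = flaps_until_suppressed(gap_minutes=1)
    if result is not None:
        flaps, penalty = result
        print(flaps, round(penalty), round(minutes_until_reuse(penalty), 1))
```

Under these assumed values, three flaps a minute apart are enough to cross the threshold, and the resulting penalty takes roughly half an hour to decay back to the reuse limit, which is in the same range as the outage described. Flaps spaced a full half-life or more apart never reach the threshold in this model.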
First issue is, am I right? If I am, then I need to develop ways to limit the damage done to my customer.
Yell at your upstreams.
Is there a way to set up route suppression just under what most people use, so that I can have the client fix the problem and then clear the suppression on my network to let them come back up immediately, just under the suppress threshold?
Dampening doesn't work on direct eBGP sessions: when the session is lost the dampening info is removed from memory. So dampening your own customers doesn't really do anything. For this reason, it seems curious to me that both your upstreams use rather aggressive dampening. (See RIPE-229 for some considerations on good dampening practices.)
Opinions? Suggestions? Options?
If this happens again, you can simply reset your sessions to your upstreams (one at a time, of course) to get rid of the dampening IN THE NEXT HOP AS. However, if the trouble is further upstream, this only makes matters worse.