Route Suppression Problem

12 Mar 2003

      Unless useful to others, feel free to just reply off-list.

Background:

Tuesday (yesterday) morning around 1am, I got a phone call from one of my
transit customers (which seems more like a dream). Sadly, I didn't have the
router they are on logging to a server, so it's impossible for me to see
exactly what happened. Here's what I have. They received a minor spike in
traffic going to them. My router shows the last BGP peer reset at about that
time, so this could be me sending the global table. His bandwidth then drops
to 0 for almost exactly 30 minutes (MRTG isn't an exact graph). My guess
(authoritative answer) was that the customer flapped their routes once too
many times and was suppressed by both of my providers, as I seem to recall
the penalty heal rate is in 30 minute increments.
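
As a rough sketch of the arithmetic behind that guess: flap dampening is
usually described as a per-route penalty that is bumped on each flap and
decays exponentially with a configured half-life. The numbers below (1000
per flap, suppress at 2000, reuse at 750, 15 minute half-life) are commonly
cited vendor defaults and are only assumptions here; I don't know what my
providers actually run, and the function names are made up for illustration.

```python
import math

# Assumed, illustrative parameters (commonly cited defaults, NOT confirmed
# for the providers discussed in this post).
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15.0


def decayed(penalty, minutes, half_life=HALF_LIFE_MIN):
    """Penalty remaining after `minutes` of exponential decay."""
    return penalty * 0.5 ** (minutes / half_life)


def minutes_until_reuse(penalty, reuse=REUSE_LIMIT, half_life=HALF_LIFE_MIN):
    """Minutes until `penalty` decays to the reuse limit (0 if already there)."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)


def penalty_after_flaps(flap_times_min):
    """Accumulated penalty right after the last flap.

    `flap_times_min` is a list of flap times in minutes, in ascending order.
    """
    penalty = 0.0
    last = None
    for t in flap_times_min:
        if last is not None:
            penalty = decayed(penalty, t - last)  # decay since previous flap
        penalty += PENALTY_PER_FLAP
        last = t
    return penalty
```

Under those assumed values, three flaps in quick succession give a penalty
of 3000, which is over the suppress limit, and minutes_until_reuse(3000)
works out to two half-lives, i.e. 30 minutes. That would be consistent with
the outage I saw, but it proves nothing about what actually happened.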

First issue is, am I right? If I am, then I need to develop ways to limit
the damage done to my customer. Is there a way to set up route suppression
just under what most people use, so that I can have the client fix the
problem and then clear the suppress on my network to allow them to come back
up immediately, just under the suppress threshold? Another possibility,
although I've not seen reference to it: since the customer only transits
through my network and depends on my redundancy, is it possible to hold his
routes in the tables and keep advertising them out unless they are down for
a set time period (i.e., ignore flaps, but drop them if he's down 15-30
minutes)?
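
To make the second idea concrete, here is a toy model of the hold-down
behaviour I have in mind. It is not a feature I know any router to have;
the class name, method names and the 15 minute default are all hypothetical
and only illustrate the logic: keep advertising through short outages,
withdraw only after the session has been down continuously for the hold
period.

```python
class HoldDownAdvertiser:
    """Toy model: keep a customer route advertised through short outages.

    Purely hypothetical; times are plain numbers in minutes.
    """

    def __init__(self, hold_minutes=15.0):
        self.hold_minutes = hold_minutes
        self.down_since = None   # when the session went down, None if up
        self.advertised = False  # whether we currently advertise upstream

    def session_up(self, now):
        """Customer session established: advertise and clear the timer."""
        self.down_since = None
        self.advertised = True

    def session_down(self, now):
        """Customer session lost: start the hold timer if not running."""
        if self.down_since is None:
            self.down_since = now

    def tick(self, now):
        """Withdraw only once the outage has lasted `hold_minutes`."""
        if (self.down_since is not None
                and now - self.down_since >= self.hold_minutes):
            self.advertised = False
        return self.advertised
```

The obvious cost is that while the route is being held, traffic for the
customer still arrives at my router and gets dropped there, so this only
hides the flap from my upstreams; it does not keep the customer reachable.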

I've never seen this issue. I was aware suppression was possible when I
first started learning BGP, and so I have never risked bouncing my peers
more than three times in a day, and at that point usually quit playing until
the next week. When my peers flap due to DDoS attacks, either BGP never
stabilizes fully or my providers have protected my networks (though I
haven't seen how 69.8/18 will react in this scenario, since it doesn't have
a shorter prefix at the peer).

My customer is thinking of multi-homing again after this. Of course, it
wouldn't have saved the customer. The reason they left multi-homing is that
their network is in the same building and they only have one BGP router. I
don't think multiple paths would have saved them.

Opinions? Suggestions? Options?

-Jack

~We now return you to the 69/8 threads

Jack Bates
