On 2/21/2011 13:10, Chris Wallace wrote:
I am looking for some help with an issue we recently had with one of our BGP peers. I currently have two DIA providers, each terminated on its own edge router, and I run iBGP between the two edge routers to exchange routes. Last week, Provider A made a policy change "somewhere" in their network in the middle of the day, causing traffic to stop routing. Of course, this connection happens to be the preferred path for the majority of our inbound and outbound traffic. I never saw our physical link go down and never saw our peer drop, so BGP did not stop advertising routes, and most of our customers' traffic went nowhere. To fix the issue, I had to manually shut down the peer until Provider A confirmed that the change they made had been reverted. This isn't the first time we have seen this issue with our various providers. How can I prevent issues like this from happening in the future?
I had a provider like that a long time ago; it was an ATG T1 (which was fine) but when they were bought by Eschelon the exact problem you're describing would happen every other month like clockwork. The first time was forgivable. The second time I was annoyed. After the third I was angry, unplugged it, and told them to stuff it because apparently they didn't know how to deal with BGP.

You can't prevent it from happening. You can only come up with band-aids to notify you. Save yourself the headache and find a new provider that knows how to handle BGP.

What happens if the other circuit is not available (outage, planned maintenance, etc.) at the same time the problem one decides to black hole you? If you're facing the same repeating problem they are obviously not the best fit for you.

~Seth
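
For anyone who wants one of the "band-aids" mentioned above, here is a rough sketch of a notifier, not a fix. It assumes you can probe a handful of far-end addresses through the suspect provider's path, and it raises a flag only when most of them fail for several rounds in a row while the BGP session itself still looks healthy. Everything here is hypothetical and for illustration only: the `BlackholeDetector` class, the `ping_probe` helper, the thresholds, and the target addresses (documentation ranges). `ping_probe` assumes a Linux iputils-style `ping` that accepts `-c`, `-W`, and `-I`.

```python
import subprocess
from collections import deque


def ping_probe(target, source="192.0.2.10"):
    """Hypothetical probe: one ping sourced from an address that is
    routed via the provider under test. Returns True on a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-I", source, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class BlackholeDetector:
    """Suspects a black hole when at least `fail_ratio` of the targets
    fail to answer for `rounds` consecutive probe rounds."""

    def __init__(self, targets, probe=ping_probe, fail_ratio=0.8, rounds=3):
        self.targets = list(targets)
        self.probe = probe
        self.fail_ratio = fail_ratio
        # Keep only the outcome of the most recent `rounds` rounds.
        self.history = deque(maxlen=rounds)

    def run_round(self):
        """Probe every target once; return True if a black hole is suspected."""
        failed = sum(1 for t in self.targets if not self.probe(t))
        bad_round = failed / len(self.targets) >= self.fail_ratio
        self.history.append(bad_round)
        # Alert only once the window is full and every round in it was bad.
        return len(self.history) == self.history.maxlen and all(self.history)
```

Calling `run_round()` from cron or a loop and paging someone when it returns True is about as far as this goes. Requiring several targets and several consecutive bad rounds is a design choice to avoid paging on one unreachable host or one lost packet. Whether to go further and shut the peer down automatically is a separate decision, and it does nothing about the case where the other circuit is down at the same time.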