Re: problem at mae-west tonight?
On a regular basis, MAE West has failures that cause only some of the other routers to become unreachable. I believe all of the instances that I have heard of are related to problems with the Netedges, which show up mostly under high load. I've heard various people call it the "Sleeping Interface" problem. It usually goes away when you reset the Netedges, but under heavy load it can come back quickly. We saw it extensively when our 10 Mb connection became heavily loaded. Just yesterday we upgraded to a DS3, so we're hoping not to see it for a while. I know AGIS has also had this problem, and I believe Best has seen it as well. Maybe you haven't seen it because you have a colocated router at MAE-WEST.
Rob
We experienced the same thing with Netcom. Currently we are peered with over 40 networks through the RS, but I have only had this problem with Netcom.
Is it really a next-hop problem or a Netcom internal problem? Last time this happened, about 2 weeks ago, they cleared their RA session and did some other things and everything came up fine. I did not get details from the routing folks over there.
I don't quite see how and where the layer 2 topology comes into play here. Netcom should simply be seeing routes (through the RS) that state your MW IP address and the routes advertised from it. Is there some reason that your MW IP would be unreachable by Netcom? I am confused as to why this would ever happen in the MW scenario. Now the PB-NAP is a different story with the non-fully meshed scenario.
Please explain what you mean, Matt.
Rob
Exodus Communications Inc.
The problem I have with the route server this evening is that I announce my routes to the route server, and my policy configuration in the route server reflects that I peer with Netcom, and so the route server tells Netcom how to reach me. Unfortunately, packets leaving Netcom headed to me at layer 2 are going into a black hole. To fix this, I've had to dump my peering with the route server entirely, so that Netcom is only seeing my routes from AGIS (our transit provider) and not from the route server. Ugh. My fears about the route server not knowing the status of the layer 2 topology have come true, and there's no way to fix this that doesn't involve manual intervention.
-matthew kaufman matthew@scruz.net
Well, I run gated on a BSDI box for the Hooked MAE West router. I'm thinking about implementing a "pingnouse INTERVAL" option on the peer/group commands in gated, so it will periodically ping next hops received from the route servers and set the nouse bit if the nexthop is unreachable. Any better ideas?
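To make the idea concrete, here is a rough Python sketch of the logic such an option might follow. It is not gated code or gated configuration syntax: the class NextHopMonitor, the max_failures threshold, and the helper names are all made up for illustration, and the ping flags assume a Linux iputils ping (BSD flags differ).

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, Set


def ping_once(addr: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo with the system ping; True if it was answered.

    Assumes Linux iputils flags (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), addr],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class NextHopMonitor:
    """Hypothetical stand-in for the proposed 'pingnouse' behaviour."""

    def __init__(self, probe: Callable[[str], bool] = ping_once,
                 max_failures: int = 3) -> None:
        self.probe = probe                  # reachability test, injectable
        self.max_failures = max_failures    # consecutive misses before nouse
        self.failures: Dict[str, int] = {}  # next hop -> consecutive misses
        self.nouse: Set[str] = set()        # next hops currently marked nouse

    def check(self, next_hops: Iterable[str]) -> Set[str]:
        """Probe each distinct next hop once and update the nouse set."""
        for hop in set(next_hops):
            if self.probe(hop):
                # One answered probe clears the counter and the nouse mark.
                self.failures[hop] = 0
                self.nouse.discard(hop)
            else:
                self.failures[hop] = self.failures.get(hop, 0) + 1
                if self.failures[hop] >= self.max_failures:
                    self.nouse.add(hop)
        return set(self.nouse)

    def usable_routes(self, routes: Dict[str, str]) -> Dict[str, str]:
        """Return only the prefix -> next hop entries not marked nouse."""
        return {p: h for p, h in routes.items() if h not in self.nouse}


def run_forever(monitor: NextHopMonitor, routes: Dict[str, str],
                interval_s: float) -> None:
    """Re-check every interval_s seconds (the INTERVAL in the proposal)."""
    while True:
        monitor.check(routes.values())
        time.sleep(interval_s)
```

Requiring several consecutive misses before setting nouse is my own addition, so that one dropped ping under load does not withdraw a working path. Note that this only tests reachability from the local router, which is exactly the limitation a real third-party keepalive would have to address.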
It would be nice to come up with a good mechanism for doing 3rd party keepalives that cisco and other router vendors would be willing to implement.
Rob
participants (1)
- Rob Liebschutz