RE: IS-IS protocol implementation problem
No, I'm a single-AS hosting provider, no confederation. The more I think about it, the more I'm convinced that CEF simply stopped working; all my interfaces were active, and there were no apparent problems with my IGP, which is OSPF. I think that major BGP wigginess caused the CEF problem; thanks very much for your insight, I definitely need to think about it some more.

-----Original Message-----
From: smd@clock.org [mailto:smd@clock.org]
Sent: Monday, October 30, 2000 7:28 AM
To: nanog@merit.edu; rdobbins@netmore.net
Cc: neil@colt.net; sean@donelan.com
Subject: RE: IS-IS protocol implementation problem

| I had a bizarre event occur on Thursday night/Friday morning, and this is
| likely the culprit.

Some of your symptoms are consistent with a badly-broken sloshing IGP, notably the drop in traffic load and large numbers of dying TCPs passing through the afflicted network.

These are two sides of the same coin: a destination in your network, learned through (e)BGP, is mapped to a next-hop address (typically the interface across which you are talking (e)BGP) and propagated through their network via iBGP. The IGP is used so that each iBGP-talking router knows how to get to each next-hop address. A sloshing IGP will break connectivity between a given router and all the addresses associated with a broken next-hop.

A hypothesis: for each afflicted router, the failure of one next-hop address to be reachable will cause your ENTIRE network to be unreachable by sources relying upon traffic passing through that router. This may mean a sizeable proportion of their customer base simply could not reach you reliably enough to maintain a TCP connection in equilibrium, or at all. Frequent transition to slow-start due to loss/out-of-order packets *and* a reduction in the overall number of TCP "mice" would severely reduce traffic.

An interesting question, however, is why their iBGP TCP connections would appear to remain functional (you aren't losing eBGP routes) in this sort of mess. Did loopback addresses not come and go, while interface addresses did? (That would be interesting to consider in the face of possible aggregation of interface addresses into the IGP.) Is there significant partitioning because of, for example, AS confederating, mitigating the problem by removing iBGP's need to know about distant loopback addresses, but not distant next-hop addresses?

We are lucky to have what could be a very interesting case study in routing scalability trade-offs. What a pity nothing like outage@sprint.net exists any more, where we might find useful information from the victim provider. :-(

Sean.
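
The next-hop dependency described in the quoted message can be illustrated with a minimal sketch. It is not how any particular router implements the lookup; it only models the two-step resolution (BGP prefix to next-hop address, then next-hop address to an IGP route) with one level of recursion. The function names, the interface names (`pos1/0`, `pos2/0`) and all prefixes are hypothetical, chosen for illustration.

```python
import ipaddress


def longest_match(table, address):
    """Return the value stored under the most specific prefix covering `address`, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return None if best is None else best[1]


def resolve(destination, bgp_routes, igp_routes):
    """Resolve a destination to an outgoing interface via its BGP next-hop.

    Returns None if there is no BGP route, or if the IGP currently has
    no route to the BGP next-hop address.
    """
    next_hop = longest_match(bgp_routes, destination)
    if next_hop is None:
        return None
    return longest_match(igp_routes, next_hop)


# Hypothetical tables: two BGP prefixes share one next-hop, a third uses another.
bgp_routes = {
    "198.51.100.0/24": "192.0.2.1",
    "203.0.113.0/24": "192.0.2.1",
    "10.20.0.0/16": "192.0.2.2",
}
igp_routes = {
    "192.0.2.1/32": "pos1/0",
    "192.0.2.2/32": "pos2/0",
}

# Model the IGP losing its route to one next-hop address.
igp_after_flap = {p: i for p, i in igp_routes.items() if p != "192.0.2.1/32"}

# Every BGP prefix behind the lost next-hop becomes unresolvable at once.
unreachable = sorted(
    prefix
    for prefix, next_hop in bgp_routes.items()
    if longest_match(igp_after_flap, next_hop) is None
)
```

In this sketch the BGP table itself never changes; only the IGP entry for one next-hop disappears, yet both prefixes that depend on it stop resolving, which is the effect the hypothesis above relies on.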
participants (1): rdobbins@netmore.net