Routing problem with 206.117.0.0/16
There is currently a routing problem with parts of 206.117.0.0/16, apparently caused by a bad external announcement of some type. It does *not* appear to originate from 206.117.0.0 or its direct upstreams. The source network of the bad announcement has been identified and is attempting to fix it.

This address space encompasses part of Los Nettos (USC, ISI, Caltech, JPL, TRW, CenterGate Research, etc.) and includes GeekTools superWhois and SamSpade.org, which explains why many of you are getting failures in whois queries.

--
Rodney Joffe
CenterGate Research Group, LLC.
http://www.centergate.com
"Technology so advanced, even we don't understand it!"(SM)
A heavily flapping AS struck my curiosity: AS 10916.
Somehow, AS 6138 constantly appears and disappears out of the path leading through AS 1239 - the other path to them via AS 701 is barely flapping at all.
# sh ip bgp regexp 10916
[...]
   Network          Next Hop      Metric LocPrf Weight Path
*d 63.87.96.0/24    censored          95    110      0 13789 1239 10916 i
*>                  censored          98    100      0 701 10916 i
[...]
And every few minutes:
*d 63.87.96.0/24    censored          95    110      0 13789 1239 6138 10916 i
*>                  censored          98    100      0 701 10916 i
[...]
What could be the cause for an AS appearing/disappearing in a path every few minutes? Is it really AS 6138 that is flapping for 10916? For some reason they prefer the indirect route through 6138 to 1239 (SprintLink), instead of their direct connection to 1239. These are the times when such a peer should be shut down for the sanity of the rest of the network.
bye, Kai
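To pin down which AS is coming and going between two looks at the table, the two path strings can be compared mechanically. The following is only a minimal sketch: the helper names `parse_as_path` and `as_path_changes` are made up for illustration, and it assumes the path has already been cut out of the `sh ip bgp` line as a plain string ending in the origin code.

```python
def parse_as_path(path_field):
    """Turn a path string such as '13789 1239 10916 i' into a list of AS numbers.

    Assumption: the last token may be a BGP origin code (i, e or ?), which is dropped.
    """
    tokens = path_field.split()
    if tokens and tokens[-1] in ("i", "e", "?"):
        tokens = tokens[:-1]
    return [int(t) for t in tokens]


def as_path_changes(before, after):
    """Report which AS numbers appeared in or disappeared from a path."""
    before_set, after_set = set(before), set(after)
    return {
        "appeared": [asn for asn in after if asn not in before_set],
        "disappeared": [asn for asn in before if asn not in after_set],
    }


if __name__ == "__main__":
    # The two paths via AS 13789 quoted in the message above.
    first = parse_as_path("13789 1239 10916 i")
    second = parse_as_path("13789 1239 6138 10916 i")
    print(as_path_changes(first, second))
```

Run against periodic snapshots, this would show AS 6138 as the only element that changes, while the path via AS 701 stays constant.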
Easily identified reasons for such a flap (certainly not a complete catalog) that come to mind, knowing nothing else about them other than what you describe above:

(a) Router with insufficient memory for a full BGP table from that view (it fills up to memory capacity, collapses, BGP is reset, routes flap, wash, rinse, repeat)

(b) Link that both BGP and traffic pass through is insufficient for continued keepalives once traffic moves in that direction (line becomes preferred by a large amount of traffic, traffic floods line, BGP keepalives fail, BGP session fails, traffic moves away, wash, rinse, repeat - see RED discussion archives some months ago for more detailed discussions on traffic flow dampening with similar patterns.)

Quite a few people have problem type (a) happen more often than you might think - I've run across it several times in the dim past, either as a memory problem or with BGP implementations that choked on certain corrupted/unusual advertisements halfway through the table transfer. If it's a memory problem, it's often an ACL issue: someone removed the "sanity" ACL that would otherwise protect a smaller router from the failures that occur with a full table update. ("Gee, this ACL seems to be preventing a full table from being sent to the customer. I'm sure the customer really wanted a full table - I'll remove the list.")

This all being said, I'm willing to bet that neither (a) nor (b) is at the root of the problem here, but they're both possible.

Your question centered more on the path than on the cause, so I'll take a swing at it. Since you're looking at the insertion of an AS into a path, a possible situation might be that AS10916 peers with AS1239, and also peers with AS6138, which is also a transit customer of AS1239.

Sprint (AS1239) would prefer and re-advertise the route heard directly from its customer (AS10916) whenever that route is available, and that would override the announcement coming via AS6138, which would be less preferred. When (link 1) goes away, Sprint would prefer and re-announce the route from 10916 being heard via (link 2).

  701
     \
      \____________
       |          |      link 2
       1239---10916--------6138
      /    \________________/
     /         link 1
  13789

JT
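The path insertion described above can be reproduced with a toy model of AS1239's route selection. This is a sketch under stated assumptions, not a description of Sprint's actual policy: it assumes both routes are customer routes with equal local preference, so AS path length decides, and the function names are hypothetical.

```python
def best_path(candidates):
    """Pick the candidate with the shortest AS path (first one wins a tie)."""
    return min(candidates, key=len) if candidates else None


def path_exported_by_1239(direct_session_up):
    """AS path that AS1239 would announce onward for AS10916's prefix.

    Assumption: AS1239 always hears the prefix via its customer AS6138, and
    additionally hears it from AS10916 itself while the direct session is up.
    """
    heard = [[6138, 10916]]      # route learned via AS6138
    if direct_session_up:
        heard.append([10916])    # route learned directly from AS10916
    best = best_path(heard)
    return [1239] + best         # AS1239 prepends its own AS number on export


if __name__ == "__main__":
    print(path_exported_by_1239(direct_session_up=True))
    print(path_exported_by_1239(direct_session_up=False))
```

Each time the direct session drops and recovers, a downstream observer behind AS1239 would see AS6138 enter and leave the path, which matches the pattern in the original report.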
At Friday 04:59 PM 8/11/00, John Todd wrote:
  701
     \
      \____________
       |          |      link 2
       1239---10916--------6138
      /    \________________/
     /         link 1
  13789
JT
Thank you, Todd, this was the scenario I had in mind. Nice to see someone still excels in the field of ASCII art, too :) The POC of AS10916 has emailed me back since, and is looking into this issue.

I was also intrigued by:
(b) Link that both BGP and traffic pass through is insufficient for continued keepalives once traffic moves in that direction (line becomes preferred by a large amount of traffic, traffic floods line, BGP keepalives fail, BGP session fails, traffic moves away, wash, rinse, repeat - see RED discussion archives some months ago for more detailed discussions on traffic flow dampening with similar patterns.)
What can be done to prevent flapping in this situation, other than putting QoS mechanisms into place to prefer the BGP traffic over everything else? Is there a good and automated (Cisco-leaning, sorry) way to keep BGP sessions down if they have flapped too often?
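The feedback loop in (b), and the effect of protecting BGP packets from drops, can be illustrated with a small tick-based simulation. It is a deliberately crude sketch: the function name, the tick granularity, and the rule that every keepalive is lost while the link is congested are all assumptions made for illustration.

```python
def simulate(ticks, offered_load, capacity, hold_time=3, retry_delay=2,
             protect_bgp=False):
    """Return a list of (tick, 'down' | 'up') BGP session transitions.

    Assumptions: while the session is up, all offered traffic uses the link;
    if that exceeds capacity, every keepalive is lost unless protect_bgp is
    set; the session drops after hold_time consecutive lost keepalives and
    comes back retry_delay ticks later, once traffic has moved away.
    """
    up = True
    missed = 0      # consecutive keepalives lost while the session is up
    down_for = 0    # ticks spent with the session down
    transitions = []
    for t in range(ticks):
        if up:
            congested = offered_load > capacity
            keepalive_lost = congested and not protect_bgp
            missed = missed + 1 if keepalive_lost else 0
            if missed >= hold_time:
                up, missed, down_for = False, 0, 0
                transitions.append((t, "down"))
        else:
            down_for += 1
            if down_for >= retry_delay:
                up = True
                transitions.append((t, "up"))
    return transitions


if __name__ == "__main__":
    print(simulate(12, offered_load=150, capacity=100))
    print(simulate(12, offered_load=150, capacity=100, protect_bgp=True))
```

In this model the unprotected session cycles indefinitely, while the protected one never drops even though the link stays congested.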
Hi,

See the following comments.

----- Original Message -----
From: "Kai Schlichting" <kai@pac-rim.net>
To: "John Todd" <jtodd@loligo.com>
Cc: <nanog@merit.edu>
Sent: Saturday, August 12, 2000 5:22 AM
Subject: Re: flap flap: AS 10916
In response to your preferred solution, it's natural to turn on BGP dampening. See the following excerpt:

" BGP Dampening

Route flap dampening (introduced in Cisco Internetwork Operating System [Cisco IOS] Release 11.0) is a mechanism for minimising the instability caused by route flapping. Route flapping is BGP network prefixes being frequently added and removed. Whenever a network goes down, the rest of the Internet would like to know about it. Hence, BGP propagates that state change throughout the Internet. Yet, if this state change results from faulty circuits (frequently going up and down) or from mis-configured routing (redistributing the IGP into the EGP), the Internet would experience several hundred BGP state changes a second. For every state change, BGP must allocate time to process the work and pass on the change to all other BGP neighbours. This places a tremendous strain on the backbone routers. Hence, the tool to control and minimise the effect of route flaps - BGP Dampening.

The following are the commands used to control route dampening:

  bgp dampening [[route-map map-name] [half-life-time reuse-value suppress-value maximum-suppress-time]]
"

But I wonder why not turn to your "other than" solution? On a Cisco box, you can turn on the SPD feature. See the following excerpt:

" When a link goes to a saturated state, the router will drop packets. The problem is that the router will drop any type of packets - including routing protocol packets. Selective Packet Discard (SPD) will attempt to toss non-routing packets instead of routing packets when the link is overloaded. In releases 11.1CA and 11.1CC, the configuration command:

  ip spd enable

will switch on SPD. Selective Packet Discard is enabled by default on 11.2(5)P and more recent releases. "

regards,
Yu Ning
------------------------------------------------------
(Mr.) Yu(2) Ning(2)
ChinaNET(AS4134) Backbone Operation Center
Networking Dep., Datacom Bureau, China Telecom.
Beijing, P.R.C
+86-10-66418105/66418121/66418123(fax)
------------------------------------------------------
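How the four dampening parameters in the excerpt above interact can be sketched numerically. This is an illustration only: the figures used (15 minute half-life, reuse 750, suppress 2000, 60 minute maximum suppress time, 1000 penalty points per flap) are the commonly documented Cisco defaults and should be treated as assumptions here, and the function names are hypothetical.

```python
import math

HALF_LIFE_MIN = 15.0       # assumed default half-life, in minutes
REUSE = 750.0              # assumed default reuse threshold
SUPPRESS = 2000.0          # assumed default suppress threshold
MAX_SUPPRESS_MIN = 60.0    # assumed default maximum suppress time, in minutes
PENALTY_PER_FLAP = 1000.0  # assumed penalty added for each flap


def decay(penalty, minutes):
    """Exponential decay: the penalty halves every HALF_LIFE_MIN minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def penalty_after_flaps(flap_times_min):
    """Accumulated penalty right after the last flap in a sorted list of times."""
    # Capping the penalty keeps suppression from exceeding MAX_SUPPRESS_MIN.
    ceiling = REUSE * 2 ** (MAX_SUPPRESS_MIN / HALF_LIFE_MIN)
    penalty, last = 0.0, None
    for t in flap_times_min:
        if last is not None:
            penalty = decay(penalty, t - last)
        penalty = min(penalty + PENALTY_PER_FLAP, ceiling)
        last = t
    return penalty


def minutes_until_reuse(penalty):
    """Minutes of quiet needed before the penalty decays to the reuse threshold."""
    if penalty <= REUSE:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE)


if __name__ == "__main__":
    p = penalty_after_flaps([0, 5, 10])  # three flaps, five minutes apart
    print(p > SUPPRESS, round(minutes_until_reuse(p), 1))
```

Under these assumed values, two flaps five minutes apart stay below the suppress threshold, a third pushes the route over it, and the route then needs roughly 25 minutes of stability before it is advertised again.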
participants (4)
- John Todd
- Kai Schlichting
- Rodney Joffe
- Yu Ning