We see a number of sessions towards downstreams flapping, obviously caused by prefix 120.29.240.0/21, originated by AS45158, transited by AS4739 (see below).

Best regards,

Fredy Kuenzler
Init7 / AS13030

#sh ip bgp 120.29.240.0
Number of BGP Routes matching display condition : 4
Status codes: s suppressed, d damped, h history, * valid, > best, i internal
Origin codes: i - IGP, e - EGP, ? - incomplete
    Network          Next Hop        MED   LocPrf  Weight  Path
*>i 120.29.240.0/21  206.223.143.99  21    150     0       4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528 64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548 64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569 64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598 64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472 65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524 65528 65534 65609} ?
*i  120.29.240.0/21  206.223.143.99  21    150     0       4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528 64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548 64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569 64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598 64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472 65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524 65528 65534 65609} ?
*i  120.29.240.0/21  206.223.143.99  21    150     0       4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528 64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548 64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569 64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598 64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472 65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524 65528 65534 65609} ?
*   120.29.240.0/21  77.67.76.237    1546  150     0       3257 7473 7474 9300 45158 i
Last update to IP routing table: 0h25m47s, 1 path(s) installed:
Route is advertised to 1 peers:
 213.144.128.180(13030)
On Wed, Nov 17, 2010 at 10:19:55AM +0100, Fredy Kuenzler wrote:
We see a number of sessions towards downstreams flapping, obviously caused by prefix 120.29.240.0/21, originated by AS45158, transited by AS4739 (see below).
Same here, but in our case we are the downstream ourselves :)

JunOS logs:

Nov 17 01:28:54.955 2010 <gw> rpd[1391]: bgp_read_v4_update:8283: NOTIFICATION sent to <peer> (External AS 22822): code 3 (Update Message Error) subcode 1 (invalid attribute list)
Nov 17 01:28:54.982 2010 <gw> rpd[1391]: Received BAD update from <peer> (External AS 22822), aspath_attr():3055 PA4_TYPE_AS4PATH(17) => 50 times FLAPPED family inet-unicast(1), prefix 120.29.240.0/21
Nov 17 01:28:55.009 2010 <gw> rpd[1391]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer <peer> (External AS 22822) changed state from Established to Idle (event RecvUpdate)
-- In theory, there is no difference between theory and practice. But, in practice, there is.
On 17.11.2010 10:19, Fredy Kuenzler wrote:
We see a number of sessions towards downstreams flapping, obviously caused by prefix 120.29.240.0/21, originated by AS45158, transited by AS4739 (see below).
#sh ip bgp 120.29.240.0
Number of BGP Routes matching display condition : 4
Status codes: s suppressed, d damped, h history, * valid, > best, i internal
Origin codes: i - IGP, e - EGP, ? - incomplete
    Network          Next Hop        MED   LocPrf  Weight  Path
*>i 120.29.240.0/21  206.223.143.99  21    150     0       4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528 64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548 64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569 64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598 64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472 65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524 65528 65534 65609} ?
After some investigation I can post a summary of the incident.

At appx. 9:33 CET we saw the first flaps, affecting most of our downstreams, with Cisco and Juniper routers. Our backbone, based on Brocade XMR, was not affected, apart from the number of BGP updates which caused some CPU load.

Ironically the incident prefix got picked up by a Cisco edge router, and the session to the peer (AS4739) where the prefix got injected didn't crash either.

We filtered the evil prefix, and then the systems became stable again.

Meanwhile AS4739 shut down the BGP session with the originator AS45158 (thanks MMC).

The propagation itself of the originator is rather uncommon, I'd say, as we can see, it's a BGP confederation of not less than 77 private AS numbers. Don't know for what it should be useful...

We asked some customers what gear they are running, and here is a short compilation - all these systems were affected by the BGP flaps:

- Cisco 2821 - c2800nm-advipservicesk9-mz.124-20.T4
- Cisco 2821 - c2800nm-advipservicesk9-mz.124-24.T1.bin
- Cisco ASR1002F - asr1000rp1-adventerprisek9.03.01.01.S.150-1.S1.bin
- Juniper MX480 - junos 10.0R3.10

We couldn't observe flaps of Quagga. Also not one single iBGP session was affected within our Brocade / Cisco network.

Best regards,

Fredy Kuenzler
Init7 / AS13030
At 07:40 AM 11/17/2010, Fredy Kuenzler wrote:
On 17.11.2010 10:19, Fredy Kuenzler wrote:
We see a number of sessions towards downstreams flapping, obviously caused by prefix 120.29.240.0/21, originated by AS45158, transited by AS4739 (see below).
#sh ip bgp 120.29.240.0
Number of BGP Routes matching display condition : 4
Status codes: s suppressed, d damped, h history, * valid, > best, i internal
Origin codes: i - IGP, e - EGP, ? - incomplete
    Network          Next Hop        MED   LocPrf  Weight  Path
*>i 120.29.240.0/21  206.223.143.99  21    150     0       4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528 64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548 64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569 64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598 64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472 65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524 65528 65534 65609} ?
After some investigation I can post a summary of the incident.
At appx. 9:33 CET we saw the first flaps, affecting most of our downstreams, with Cisco and Juniper routers. Our backbone, based on Brocade XMR, was not affected, apart from the number of BGP updates which caused some CPU load.
Ironically the incident prefix got picked up by a Cisco edge router, and the session to the peer (AS4739) where the prefix got injected didn't crash either.
We filtered the evil prefix, and then the systems became stable again.
Meanwhile AS4739 shut down the BGP session with the originator AS45158 (thanks MMC).
The propagation itself of the originator is rather uncommon, I'd say, as we can see, it's a BGP confederation of not less than 77 private AS numbers. Don't know for what it should be useful...
It's not a confederation, but an AS_SET. Confederation paths are represented like this: (65251 65200 65202), whereas an AS_SET is an unordered list, used for loop detection when aggregating routes.

Hmm... It appears as though remove-private-as does NOT strip out private AS numbers from an AS_SET. Does anyone know why you wouldn't want to do that? The route would be filtered by anyone using one of the above ASes, which is exactly why it should be stripped. Does anyone have a customer router, or run a confederation, using one of those ASes who can verify?
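The notation difference, and what stripping private ASNs out of the set would leave behind, can be sketched in a few lines of Python. This is only an illustration: `parse_path` and `strip_private` are hypothetical helpers written for this example, and `strip_private` models the behaviour wished for above, not what any vendor's remove-private-as actually does. The path is the one from the first message in the thread.

```python
import re

PRIVATE_16BIT = range(64512, 65535)  # 64512..65534 inclusive

# Path as shown in the first message of the thread.
PATH = """4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528
64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548
64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569
64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598
64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472
65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524
65528 65534 65609}"""


def parse_path(text):
    """Split a displayed AS path into (kind, [asns]) segments.

    Bare numbers form an AS_SEQUENCE, {...} is an AS_SET and (...) is a
    confederation sequence, following the notation described above.
    """
    segments = []
    for token in re.findall(r"\{[^}]*\}|\([^)]*\)|\d+", text):
        if token[0] in "{(":
            kind = "AS_SET" if token[0] == "{" else "AS_CONFED_SEQUENCE"
            segments.append((kind, [int(n) for n in re.findall(r"\d+", token)]))
        elif segments and segments[-1][0] == "AS_SEQUENCE":
            segments[-1][1].append(int(token))
        else:
            segments.append(("AS_SEQUENCE", [int(token)]))
    return segments


def strip_private(segments):
    """Hypothetical: drop private 16-bit ASNs from every segment, sets included."""
    stripped = [(kind, [a for a in asns if a not in PRIVATE_16BIT])
                for kind, asns in segments]
    return [(kind, asns) for kind, asns in stripped if asns]


segments = parse_path(PATH)
print(len(segments[1][1]), "members in the AS_SET")
print(strip_private(segments))
```

Under these assumptions the set has 77 members and only one of them, 65609, survives the strip, because it lies outside the 64512-65534 range.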
We asked some customers what gear they are running, and here is a short compilation - all these systems were affected by the BGP flaps:
- Cisco 2821 - c2800nm-advipservicesk9-mz.124-20.T4
- Cisco 2821 - c2800nm-advipservicesk9-mz.124-24.T1.bin
- Cisco ASR1002F - asr1000rp1-adventerprisek9.03.01.01.S.150-1.S1.bin
- Juniper MX480 - junos 10.0R3.10
Bringing up an earlier thread: we rejected the route because we've set a maxas-limit of 50 and log it, so I woke up to find these messages in my email this morning rather than getting a phone call in the middle of the night:

Nov 17 02:32:46.133 CST: %BGP-6-ASPATH: Long AS path 4323 4739 45158 {64512,64514,64516,64519,64521,64522,64525,64526,64528,64529,64530,64535,64537,64538,64541,64542,64543,64544,64545,64546,64547,64548,64549,64552,64553,64556,64557,64560,64561,64562,64564,64565,64566,64568,64569,64570 received from 207.250.148.153: Prefixes: 120.29.240.0/21
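Why a limit of 50 catches this route depends on how the path is counted. The sketch below contrasts two counts: every ASN including AS_SET members, and the route-selection length from RFC 4271, where an AS_SET counts as 1. That the limit check uses the first count is an assumption inferred from the rejection logged above (the second count would be only 4); `exceeds_limit` is a hypothetical model, not the router's code, and the set members are placeholders standing in for the 77 real ones.

```python
AS_SET, AS_SEQUENCE = "AS_SET", "AS_SEQUENCE"


def total_asns(segments):
    """Count every ASN in the path, AS_SET members included."""
    return sum(len(asns) for _, asns in segments)


def decision_length(segments):
    """Path length for route selection per RFC 4271: an AS_SET counts as 1."""
    return sum(1 if kind == AS_SET else len(asns) for kind, asns in segments)


def exceeds_limit(segments, limit):
    """Assumed limit model: reject when the total ASN count is above the limit."""
    return total_asns(segments) > limit


path = [
    (AS_SEQUENCE, [4323, 4739, 45158]),
    (AS_SET, [64512 + i for i in range(77)]),  # placeholder values, 77 members
]

print(total_asns(path))         # 80
print(decision_length(path))    # 4
print(exceeds_limit(path, 50))  # True
```

With three ASNs in the sequence and 77 in the set, the total is 80, well above the configured 50, while the route-selection length stays at 4.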
We couldn't observe flaps of Quagga. Also not one single iBGP session was affected within our Brocade / Cisco network.
Best regards,
Fredy Kuenzler Init7 / AS13030
-James
On (2010-11-17 14:40 +0100), Fredy Kuenzler wrote:
We asked some customers what gear they are running, and here is a short compilation - all these systems were affected by the BGP flaps:
- Cisco 2821 - c2800nm-advipservicesk9-mz.124-20.T4
- Cisco 2821 - c2800nm-advipservicesk9-mz.124-24.T1.bin
- Cisco ASR1002F - asr1000rp1-adventerprisek9.03.01.01.S.150-1.S1.bin
- Juniper MX480 - junos 10.0R3.10
I think we really need a community tool to test BGP implementations against known/past bugs and unknown (fuzzed) bugs. A simple script which has two eBGP sessions to the network being tested, one injecting routes and another receiving them, should be enough.

There are quite a few BGP APIs out there with pretty ready infra, some even have test infra ready: https://github.com/jesnault/bgp4r

I'm happy to contribute test cases if such a project surfaces, but at the moment I'm not ready to start the project. Maybe next time my network blows up due to a BGP bug, I'll find time.

-- ++ytti
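One building block such a tool would need is a way to construct the path attributes it injects. The following Python sketch encodes an AS_PATH and an AS4_PATH attribute for a path shaped like the one in this thread (a two-AS sequence followed by a 77-member AS_SET). It is not taken from bgp4r or any other project: the helper names are made up for this example, the set members are placeholders except for 65609, and nothing here claims to reproduce the exact update that was on the wire or to identify which property of it tripped the affected routers.

```python
import struct

AS_SET, AS_SEQUENCE = 1, 2           # path segment type codes (RFC 4271)
ATTR_AS_PATH, ATTR_AS4_PATH = 2, 17  # path attribute type codes
AS_TRANS = 23456                     # 2-byte stand-in for 4-byte ASNs
FLAG_OPTIONAL, FLAG_TRANSITIVE, FLAG_EXT_LEN = 0x80, 0x40, 0x10


def encode_segments(segments, asn_bytes):
    """Encode (type, [asns]) segments with 2- or 4-byte AS numbers."""
    fmt = "!H" if asn_bytes == 2 else "!I"
    out = bytearray()
    for seg_type, asns in segments:
        if not 1 <= len(asns) <= 255:
            raise ValueError("a segment carries 1..255 AS numbers")
        out += struct.pack("!BB", seg_type, len(asns))
        for asn in asns:
            out += struct.pack(fmt, asn)
    return bytes(out)


def encode_attribute(flags, type_code, value):
    """Prepend the attribute header, using extended length above 255 bytes."""
    if len(value) > 255:
        header = struct.pack("!BBH", flags | FLAG_EXT_LEN, type_code, len(value))
    else:
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


# Placeholder set: 76 private ASNs plus 65609, 77 members in total.
set_members = [64512 + i for i in range(76)] + [65609]
as4_segments = [(AS_SEQUENCE, [4739, 45158]), (AS_SET, set_members)]
# The 2-byte view replaces anything above 65535 with AS_TRANS.
as2_segments = [(kind, [a if a <= 0xFFFF else AS_TRANS for a in asns])
                for kind, asns in as4_segments]

as_path = encode_attribute(FLAG_TRANSITIVE, ATTR_AS_PATH,
                           encode_segments(as2_segments, 2))
as4_path = encode_attribute(FLAG_OPTIONAL | FLAG_TRANSITIVE, ATTR_AS4_PATH,
                            encode_segments(as4_segments, 4))

print(len(as_path), len(as4_path))
```

With 4-byte AS numbers the attribute value is 320 bytes, so the AS4_PATH needs the extended-length header, while the 2-byte AS_PATH still fits in the short form. Boundary cases like this are the sort of thing a fuzzing harness would want to generate on purpose.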
* Saku Ytti:
I think we really need a community tool to test BGP implementations against known/past bugs and unknown (fuzzed) bugs.
Testing is the easy part. Meeting all the requirements for getting the fix rolled out on the (relevant parts of the) Internet is impossible because many ISPs have little experience upgrading their routers.
Hi,

(forgot list)

On Wed, 17 Nov 2010 14:40:14 +0100, Fredy Kuenzler <kuenzler@init7.net> wrote:
On 17.11.2010 10:19, Fredy Kuenzler wrote:
4739 45158 {64512 64514 64516 64519 64521 64522 64525 64526 64528 64529 64530 64535 64537 64538 64541 64542 64543 64544 64545 64546 64547 64548 64549 64552 64553 64556 64557 64560 64561 64562 64564 64565 64566 64568 64569 64570 64574 64575 64576 64577 64578 64580 64582 64583 64584 64588 64593 64598 64599 64601 64602 64605 64610 64611 64620 64621 65397 65398 65470 65471 65472 65473 65474 65479 65480 65484 65485 65490 65502 65505 65511 65514 65523 65524 65528 65534 65609} ?
The propagation itself of the originator is rather uncommon, I'd say, as we can see, it's a BGP confederation of not less than 77 private AS numbers. Don't know for what it should be useful...
One minor correction here: 65609 is not a private ASN; it's a reserved one in ASN32 space (65609 > 65535, which is 2^16-1).

On my Junipers, sh rou ... detail showed me the AS_SET with AS_TRANS in ASN16_PATH, and AS65609 in ASN32_PATH and ASN-MERGED_PATH. What surprised me a bit was that AS_TRANS was at the beginning of the AS_SET, while 65609 was listed at the end of the AS_SET, which may or may not be an issue of presentation only, and may or may not be a problem.

In the end it wouldn't surprise me if one or another implementation screwed up exactly because of ASN32 here.

my 2c,
-mc
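The substitution described here is easy to reproduce: in the 2-byte view every ASN above 65535 is replaced by AS_TRANS (23456). The short Python sketch below also shows one possible, unconfirmed explanation for the ordering: if the display simply sorts the unordered set numerically, AS_TRANS lands first in the 2-byte view and 65609 last in the 4-byte view. Whether the router actually sorts this way is an assumption, and the set is only an excerpt of the one in the thread.

```python
AS_TRANS = 23456  # placeholder used in 2-byte paths for 4-byte ASNs


def two_byte_view(asns):
    """Replace every ASN that does not fit in 16 bits with AS_TRANS."""
    return [asn if asn <= 0xFFFF else AS_TRANS for asn in asns]


# Excerpt of the AS_SET from the thread: four private ASNs plus 65609.
as4_set = [64512, 64514, 65528, 65534, 65609]
as2_set = two_byte_view(as4_set)

# Assumption: the display orders set members numerically.
print(sorted(as2_set))  # AS_TRANS is smaller than every private ASN
print(sorted(as4_set))  # 65609 is larger than every private ASN
```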
hey,

Looks like this broken update was around from 08:32:15 UTC until 09:47:44 UTC (this matches what we saw): http://www.ris.ripe.net/dashboard/120.29.240.0/21

Other providers, like Easynet, have also hit the news with unexpected trouble this morning. Go do a RIS search for 87.80.0.0/13: lots of prefix instability with exactly these start and end times.

I'm doing some more digging, but if any folks with already existing tools (hinthint Renesys :) want to help... Seems it was a bit more widespread than first thought.

-- tarko
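The RIS window can be lined up against the other timestamps posted in this thread with a little date arithmetic. The Python sketch below does that for the approximate 9:33 CET from the incident summary and the 02:32:46 CST Cisco log line. Reading CET as UTC+1 and CST as US Central Standard Time (UTC-6) is an assumption, as is treating "appx. 9:33" as exactly 9:33:00.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CET = timezone(timedelta(hours=1))   # assumed: Central European Time, UTC+1
CST = timezone(timedelta(hours=-6))  # assumed: US Central Standard Time, UTC-6

ris_start = datetime(2010, 11, 17, 8, 32, 15, tzinfo=UTC)
ris_end = datetime(2010, 11, 17, 9, 47, 44, tzinfo=UTC)
first_flaps = datetime(2010, 11, 17, 9, 33, 0, tzinfo=CET)  # "appx. 9:33 CET"
cisco_log = datetime(2010, 11, 17, 2, 32, 46, tzinfo=CST)   # %BGP-6-ASPATH line

print(ris_end - ris_start)      # 1:15:29
print(first_flaps - ris_start)  # 0:00:45
print(cisco_log - ris_start)    # 0:00:31
```

Under those assumptions the update was visible for about an hour and a quarter, and both reports fall within the first minute of the window.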
participants (7)
- Alexandre Snarskii
- Florian Weimer
- Fredy Kuenzler
- James Stahr
- mc
- Saku Ytti
- Tarko Tikan