On Mon, 31 Jan 2011, Per Carlson wrote:
Really? I've tried to duplicate the results in our lab, but I can't provoke any problems at those numbers. Is it the "other" multicast traffic that's interfering with ND?
It's a hold-queue problem. Normally IPv6 input is around 0.5% CPU on the RP, but because IPv6 is not supported by SPD (Selective Packet Discard), which also seems to cause problems for IPv4 BGP traffic, the hold-queue fills up even though we raised it a lot. Packets are then tail-dropped from the hold-queue and keepalives are lost. This has been through a full analysis by TAC; their suggestion was to filter unneeded IPv6 multicast, and doing so completely removed the symptom. We haven't had any major BGP session flaps since.
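As a rough illustration of the tail-drop effect described above, here is a toy model in Python. It is only a sketch and not how IOS implements the hold-queue: the function name, the queue limit, the service rate and the packet mix are all made up for illustration. It just shows that when a bounded FIFO is kept full by unwanted traffic, a keepalive arriving behind it is dropped, and that filtering the unwanted traffic lets the keepalive through.

```python
from collections import deque

def simulate_hold_queue(arrivals, queue_limit, service_per_tick):
    """Toy bounded FIFO with tail drop (hypothetical model, not IOS code).

    arrivals: one list of packet labels per tick, in arrival order.
    Returns (processed, dropped) lists of packet labels.
    """
    queue = deque()
    processed, dropped = [], []
    for tick_packets in arrivals:
        for pkt in tick_packets:
            if len(queue) >= queue_limit:
                dropped.append(pkt)      # queue full: tail drop
            else:
                queue.append(pkt)
        # The "CPU" drains a fixed number of packets per tick.
        for _ in range(min(service_per_tick, len(queue))):
            processed.append(queue.popleft())
    return processed, dropped

# Assumed traffic mix: five ND packets ahead of one BGP keepalive per tick.
arrivals = [["nd"] * 5 + ["keepalive"] for _ in range(3)]
processed, dropped = simulate_hold_queue(arrivals, queue_limit=4, service_per_tick=2)

# Same traffic with the ND packets filtered before they reach the queue.
filtered = [[p for p in tick if p != "nd"] for tick in arrivals]
processed_f, dropped_f = simulate_hold_queue(filtered, queue_limit=4, service_per_tick=2)
```

In the unfiltered run the keepalives end up in `dropped`; in the filtered run they end up in `processed_f`. Raising `queue_limit` only delays the point at which the queue fills if the arrival rate stays above the service rate, which matches what we saw when we raised the hold-queue.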
When pounding the CPU with ~30 times more Neighbour Solicitations (5000 pps) and flapping 1000 BGP IPv4 prefixes (out of 51000) every 5 seconds, I get the following load (worst case):
We're getting many tens of thousands of prefixes from AMSIX, and this peering router is in our BGP full mesh, so when peers go down there are a lot of paths to recalculate. (Most of our IPv4 iBGP full-mesh peers are in unique update groups on this router for some reason; that is also being analysed.)

Even though IOS12000 seems fairly complete as an IPv6 core router, we've been running into more and more problems like this. Cisco has implemented a lot of features but not all of them, and for IOS they probably never will, if I understand correctly. I guess XR is the way to go if one wants to keep it for a few more years...

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se