On Sun, Jan 30, 2011 at 6:24 PM, Fernando Gont <fernando@gont.com.ar> wrote:
Hi, Matthew,
On 30/01/2011 08:17 p.m., Matthew Petach wrote:
The problem I see is the opening of a new, simple DoS/DDoS scenario. By repeatedly sweeping a target's /64 you can cause EVERYTHING in that /64 to stop working by overflowing the ND cache, depending on the specific ND cache implementation, how big it is, etc.
That depends on the ND implementation being broken enough not to limit the number of neighbor cache entries that are in the INCOMPLETE state. (I'm not saying those broken implementations don't exist, though.)
Even without completely overflowing the ND cache, informal lab testing shows that a single laptop on a well-connected network link can send sufficient packets at a very-large-scale backbone router's connected /64 subnet to keep the router CPU at 90%, sustained, for as long as you'd like. So, while it's not a direct denial of service (the network keeps functioning, albeit under considerable pain), it's enough to impact the ability of the network to react to other dynamic loads. :/
This is very interesting data. Are you talking about Ciscos? Any specific model?
Uh, I've gotten into some trouble in the past for mentioning router vendors by name in public forums, so I'm going to avoid public mention of names; but it seems that others in this thread are able to speak up with specific details, if that helps answer your question in a slightly more roundabout way. ^_^;
I guess that a possible mitigation technique (implementation-based) would be to limit the number of addresses undergoing address resolution at any one time (i.e., once you have X ongoing ND resolutions, the router should not start ND for any further addresses). Note that addresses that the router had already resolved in the past would not suffer from this penalty, as their corresponding entries would be in states other than INCOMPLETE.
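A rough sketch of that idea in Python, purely for illustration: the class name, the `max_incomplete` knob (the "X" above) and the two-state model are assumptions made for this example, not any vendor's implementation, and a real neighbor cache has more states and timers than shown here.

```python
from enum import Enum


class State(Enum):
    INCOMPLETE = "INCOMPLETE"  # resolution in progress, no answer yet
    REACHABLE = "REACHABLE"    # stands in for every non-INCOMPLETE state


class NeighborCache:
    """Toy neighbor cache that caps the number of INCOMPLETE entries."""

    def __init__(self, max_incomplete):
        self.max_incomplete = max_incomplete  # hypothetical limit "X"
        self.entries = {}                     # address -> State

    def incomplete_count(self):
        return sum(1 for s in self.entries.values() if s is State.INCOMPLETE)

    def start_resolution(self, addr):
        """Return True if the router may resolve (or already knows) addr."""
        if addr in self.entries:
            return True   # already resolved or already being resolved
        if self.incomplete_count() >= self.max_incomplete:
            return False  # budget used up: do not start ND for a new address
        self.entries[addr] = State.INCOMPLETE
        return True

    def resolved(self, addr):
        """A neighbor answered: the entry leaves the INCOMPLETE state."""
        if addr in self.entries:
            self.entries[addr] = State.REACHABLE

    def resolution_failed(self, addr):
        """Resolution timed out: free the INCOMPLETE slot."""
        if self.entries.get(addr) is State.INCOMPLETE:
            del self.entries[addr]


def demo():
    cache = NeighborCache(max_incomplete=2)
    cache.start_resolution("2001:db8::1")
    cache.resolved("2001:db8::1")  # a real host, resolved before the sweep
    sweep = [cache.start_resolution(f"2001:db8::dead:{i:x}") for i in range(5)]
    known = cache.start_resolution("2001:db8::1")
    return sweep, known
```

In `demo()`, only the first two swept addresses get an INCOMPLETE slot and the rest are refused, while the previously resolved host is unaffected. The obvious cost is that a legitimate new host that shows up during the sweep is refused as well, until a slot frees up.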
Thoughts?
Thanks,
That's been one of the areas that's ripe for development, yes; have the control plane take some preferential actions to avoid harming established connectivity under stressful circumstances like that; potentially taking steps to avoid aging out older, potentially still valid entries if there may not be sufficient resources to safely re-learn them, for example.

Matt
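P.S. To make the "avoid aging out" idea concrete, here is a minimal Python sketch. The function name, the entry layout and the `under_pressure` flag are all made up for illustration; how a router would actually decide that it is under pressure is left open.

```python
def age_out(entries, now, max_age, under_pressure):
    """Return the entries that survive one aging pass.

    entries maps address -> (state, last_confirmed_time).
    Stale INCOMPLETE entries are always dropped. Stale resolved entries
    are kept while under_pressure is True, since there may not be enough
    resources to re-learn them safely.
    """
    kept = {}
    for addr, (state, last_confirmed) in entries.items():
        stale = (now - last_confirmed) > max_age
        if not stale:
            kept[addr] = (state, last_confirmed)
        elif state != "INCOMPLETE" and under_pressure:
            kept[addr] = (state, last_confirmed)  # hold on to a known neighbor
    return kept
```

The trade-off is that a host that really has gone away stays in the cache longer than it normally would while the pressure lasts.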