From: <cowie@renesys.com> <snip>
On the other hand, we also know (from private communications and from other mailing lists.. ahem) that a high rate and high src/dst diversity of scans cause some network devices to fail (devices that cache flows, or devices that suffer from CPU overload under such conditions).
Some BGP-speaking routers (not all, by any means, but some subpopulation) found themselves pegged at 100% CPU on Saturday. Just one example:
Was it not known that under certain conditions the router would flatline? What precautionary measures were put in place to limit the damage in such an event?
Whether you believe "anthropogenic" explanations for the instability depends on how fast you believe NEs can look, think, and type, compared to the speed with which the BGP announcement and withdrawal rates are observed to take off. For my part, I'd bet that the long slow exponential decay (with superimposed spiky noise) is people at work. But the initial blast is not.
When the crisis is on you, it's too late. You are either prepared and know exactly what to do at that critical moment or you don't. You either had a <5 minute response time to the crisis or you didn't. We also know (from private communications and from other mailing lists.. yes, I'm a thief :) that many NEs were caught with their pants down, a mistake they aren't apt to make again.

It comes down to one's outlook. Do you just configure and maintain, or do you strive to push the envelope? Do you truly know your network? Remember, it's a living, breathing thing. The complexity of its variables makes complete predictability impossible, so we must learn to understand it and how it reacts.

Then again, perhaps I'm a lunatic. :)

Jack Bates
BrightNet Oklahoma