On Wed, 15 Aug 2007, Stephen Wilcox wrote:
(Check slide 4.) The simple fact was that with something like 7 of 9 cables down, the redundancy is useless. Even if operators maintained N+1 redundancy, which is unlikely for many operators, that would imply 50% of capacity was actually in use with 50% spare. However, we see around 78% of capacity was lost. There was simply too much traffic and not enough capacity. IP backbones fail pretty badly when faced with extreme congestion.
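As a rough check on the figures quoted above, here is a minimal Python sketch. It assumes all nine cables carry equal capacity and that traffic before the failure ran at the 50% utilization implied by the N+1 case; the function names are illustrative only and come from neither the slides nor any tool.

```python
def surviving_fraction(total_cables: int, failed_cables: int) -> float:
    """Fraction of capacity left, ASSUMING every cable carries equal capacity."""
    return (total_cables - failed_cables) / total_cables


def offered_load_ratio(utilization: float, surviving: float) -> float:
    """Pre-failure traffic divided by post-failure capacity.

    A value above 1.0 means the surviving links are oversubscribed.
    """
    return utilization / surviving


# 7 of 9 cables down, as stated in the quoted message.
remaining = surviving_fraction(total_cables=9, failed_cables=7)
lost = 1 - remaining

# 50% utilization is the N+1 case from the quoted message (an assumption
# about operators, not a measurement).
ratio = offered_load_ratio(utilization=0.5, surviving=remaining)

print(f"capacity lost: {lost:.0%}")
print(f"offered load / remaining capacity: {ratio:.2f}")
```

Under those assumptions, the traffic offered before the failure is about 2.25 times what the two surviving cables can carry, so no amount of rerouting avoids congestion.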
Remember the end-to-end principle. IP backbones don't fail with extreme congestion; IP applications fail with extreme congestion. Should IP applications respond to extreme congestion conditions better? Or should IP backbones have methods to predictably control which IP applications receive the remaining IP bandwidth? That would be similar to the telephone network's special information tone: "All circuits are busy." Maybe we've found a new use for ICMP Source Quench. Even if the IP protocols recover "as designed," does human impatience mean there is a maximum recovery timeout period before humans start making the problem worse?