In article <99327.04417.24304@avi.netaxs.com> you wrote:
: Anyone who was in the know care to comment on what caused Above.net to
: seemingly go off the air for over an hour on 2/25/1999?
: Having an entire major AS just vanish seems to be an interesting
: operational item from the pov of people buying transit or having equipment
: colo'ed with them.
: /vijay

All of the people buying transit or having collocated equipment have
received 3 updates so far about what happened. Below is a summary.

A bunch of bad routes got injected into OSPF, and the Ciscos apparently
lack some of the damping of SPF calculations that was promised some time
ago. CPU went to 100%, all of it in the OSPF process, on all routers.

We were on top of it within minutes. It took rebooting some routers and
disabling OSPF on others to get the core back to normal.

It was the only network-wide outage in AboveNet's 3-year history. We are
doing some re-engineering to ensure that nothing of that scale can happen
again, and have put some fixes into place to prevent any kind of OSPF
confusion from lasting for more than a few minutes.

I hate link state protocols. At least it didn't take us 6 hours to clear
routes like the BGP deaggregation incident of a few years ago, or 18 hours
as has happened to other providers who got OSPF cranked in the last few
years.

Avi Freedman
VP, Engineering
AboveNet Communications