Re: Above.net off the air

27 Apr 1999

In article <99327.04417.24304@avi.netaxs.com> you wrote:

: Anyone who was in the know care to comment on what caused Above.net to
: seemingly go off the air for over an hour on 2/25/1999?

: Having an entire major AS just vanish seems to be an interesting
: operational item from the pov of people buying transit or having equipment
: colo'ed with them. 

: /vijay

All of the people buying transit or having collocated equipment have
received 3 updates so far about what happened.  Below is a summary.

A bunch of bad routes got injected into OSPF, and the Ciscos apparently
lack some of the damping of SPF calculations that was promised some
time ago.  CPU went to 100% on all routers, all of it in the OSPF process.
We were on top of it within minutes.
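
For anyone unfamiliar with what "damping of SPF calculations" buys you,
here is a minimal sketch, assuming an exponential hold-down scheme.  It is
not Cisco's implementation; the function name and all timer values (in
milliseconds) are hypothetical and chosen only for illustration.

```python
def spf_run_times(event_times, initial_delay=50, hold=200, max_wait=5000):
    """Return the times at which SPF would run for a series of LSA changes.

    Hypothetical damping scheme (all times in ms):
      - after a quiet period, run SPF `initial_delay` after the first change
      - while changes keep arriving, wait `hold` after the previous run,
        doubling the wait after each run, capped at `max_wait`
      - changes that arrive while a run is already scheduled are coalesced
    """
    runs = []            # times at which SPF actually ran
    current_hold = hold  # hold-down applied to the next back-to-back run
    pending = None       # time of the scheduled SPF run, if any

    for t in sorted(event_times):
        # Fire the scheduled run if its time has come.
        if pending is not None and pending <= t:
            runs.append(pending)
            pending = None

        if pending is not None:
            continue  # a run is already scheduled; this change rides along

        if not runs or t - runs[-1] >= current_hold:
            # Network was quiet: reset the backoff and react quickly.
            current_hold = hold
            pending = t + initial_delay
        else:
            # Still flapping: wait out the hold-down, then back off further.
            pending = runs[-1] + current_hold
            current_hold = min(current_hold * 2, max_wait)

    if pending is not None:
        runs.append(pending)
    return runs


if __name__ == "__main__":
    # 100 topology changes, one every 10 ms, as in a burst of bad routes.
    print(spf_run_times(range(0, 1000, 10)))
```

With these made-up timers, the 100 changes in the example trigger 4 SPF
runs instead of 100, which is the difference between a busy router and
one pinned at 100% CPU.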

It took rebooting some routers and disabling OSPF on others to get the
core back to normal.  It is the only network-wide outage AboveNet has
had in its 3-year history.

We are doing some re-engineering to ensure that nothing of that scale
can happen again, and have put some fixes into place to prevent any 
kind of OSPF confusion from lasting for more than a few minutes.  I hate 
link state protocols.

At least it didn't take us 6 hours to clear routes, as in the BGP
deaggregation incident of a few years ago, or 18 hours, as has happened
to other providers who got OSPF cranked in the last few years.

Avi Freedman
VP, Engineering
AboveNet Communications
