On Sat, 05 February 2000, Sean Donelan wrote:
> Since Lucent equipment was also involved in the 10 days of Worldcom problems, is there a common root cause between Worldcom's problems and Qwest's problems? Is there some lesson other providers should be learning from these events? Or is each service provider expected to learn and re-learn these lessons individually? Is there some network design decision engineers are getting wrong?
Lucent people told me that the Worldcom problem resulted from a software upgrade to Worldcom's Lucent switches that was done without a good fallback plan. Lucent engineers had recommended a different strategy, but Worldcom went ahead and did it their way. The upgrade then triggered some kind of cascading problem that either affected the old code or travelled through the network, or both. In other words, they created a problem as a side effect of the upgrade but didn't have a good strategy to contain or kill it once it propagated like some kind of living organism.

Seems to me that we *HAVE* seen this type of problem before in the Internet, with things like the AS7007 routes that seemed to hang around parts of the net for days. How do you plan to roll back to a known state when you can't simply backtrack or reverse your actions?

---
Michael Dillon
Director of Product Engineering, GTS IP Services
151 Shaftesbury Ave., London WC2H 8AL, UK
Phone: +44 (20) 7769 8489   Mobile: +44 (79) 7099 2658