RE: Router crash unplugs 1m Swedish Internet users
One router and it takes their entire network off-line... Maybe someone needs an Intro to Networks 101 class.
-jim

-----Original Message-----
From: Sean Donelan [mailto:sean@donelan.com]
Sent: Monday, June 23, 2003 4:24 PM
To: nanog@merit.edu
Subject: Router crash unplugs 1m Swedish Internet users

Has anyone heard what the cause of the outage was?

Router crash unplugs 1m Swedish Internet users
Saturday, 21 June 2003

The breakdown of one of Sweden's main Internet routers in Stockholm today unplugged more than 1 million of its Internet subscribers. Reports say that in total over 340,000 broadband and 700,000 dial-up customers across the country were affected by the incident. The router failure might also have caused disruptions to other Internet subscribers who use the services of providers operating on the Telia network.

http://www.abc.net.au/science/news/scitech/SciTechRepublish_885166.htm
On Mon, 23 Jun 2003, Jim Deleskie wrote:
One router and it takes their entire network off-line... Maybe someone needs an Intro to Networks 101 class.
Well, if memory errors corrupt the forwarding table on the line cards (or something similar) while the router still keeps its adjacencies up, then you can get these problems. I've seen it happen on route-cache boxes where certain entries in the IP forwarding table were corrupted and traffic was therefore misrouted. It could also be that they ran out of memory on the line cards, perhaps by injecting too many routes, and lost dCEF (I don't know whether the problem was on a GSR or a Juniper). Been there, done that.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
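The failure mode above (a healthy control plane with a damaged or incomplete forwarding table on a line card) can be illustrated with a toy Python model. This is a sketch under stated assumptions: the RIB/FIB split, the prefixes, and the function names are hypothetical and do not model any particular vendor's line-card architecture. It shows that a corrupted entry misdirects traffic and that a missing entry silently falls through to the default route, while nothing in the routing protocol changes.

```python
import ipaddress

# Hypothetical toy model: the control plane holds the authoritative routing
# table (the RIB), while each line card forwards from its own copy (the FIB).
# Adjacencies are maintained by the control plane, so a damaged FIB copy
# does not, by itself, bring any adjacency down.


def longest_prefix_match(fib, dst):
    """Return the next hop that fib selects for dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


def fib_mismatches(rib, fib):
    """Return prefixes whose line-card entry is missing or differs from the RIB."""
    return sorted(p for p in rib if fib.get(p) != rib[p])


rib = {
    "0.0.0.0/0": "upstream",
    "192.0.2.0/24": "customer-a",
    "198.51.100.0/24": "customer-b",
}

# Line card A: one entry silently corrupted (simulated memory error).
fib_corrupted = dict(rib)
fib_corrupted["192.0.2.0/24"] = "customer-b"

# Line card B: ran out of memory and never installed one prefix.
fib_truncated = {p: h for p, h in rib.items() if p != "198.51.100.0/24"}

# In both cases the routing protocol is still happy; only forwarding is wrong.
print(longest_prefix_match(fib_corrupted, "192.0.2.10"))    # customer-b, not customer-a
print(longest_prefix_match(fib_truncated, "198.51.100.7"))  # upstream, via the default
print(fib_mismatches(rib, fib_corrupted))
print(fib_mismatches(rib, fib_truncated))
```

Comparing each line card's copy against the control plane's table, as `fib_mismatches` does here, is one way such a problem could be noticed, since the adjacencies alone give no hint of it.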
I've seen a case where a single error in the configuration file of a $VENDOR_1 router was accepted (due to an 'undocumented feature'), and this caused the wholesale importation of BGP routes into the IGP, which caused most of their $VENDOR_2 hardware to spaz out. Locating the single error was a matter of hours, not minutes, so effectively a typo took out that ISP - and it's considered by most to be a relatively well-designed network.

-David Barak

--- Jim Deleskie <jdeleski@rci.rogers.com> wrote:
One router and it takes their entire network off-line... Maybe someone needs an Intro to Networks 101 class.
-jim
=====
David Barak
-fully RFC 1925 compliant-
I have also seen a variation of this where the boxes that got flooded by large IGP tables ran out of memory and did not recover (because there was no memory left) even after the broken router was withdrawn. Eventually the network was fixed by restarting every box in succession. I'm not sure whether there are safeguards for this now, or whether everybody buys all their IGP routers with 512M or more.

Pete
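The kind of safeguard asked about above can be sketched as a toy Python model: redistribution from BGP into the IGP that passes only prefixes inside an explicit allow-list and refuses the whole batch if it would exceed a hard prefix limit, instead of letting IGP routers exhaust their memory. This is a hypothetical illustration, not any vendor's feature: the function and exception names are invented, the synthetic "BGP table" is made up, and the configuration "typo" is modeled simply as a filter that accidentally matches everything, which is not necessarily what happened in either incident described in this thread.

```python
import ipaddress


class RedistributionLimitExceeded(Exception):
    """Raised instead of flooding the IGP when too many prefixes would be injected."""


def select_for_igp(bgp_prefixes, allowed_supernets, max_prefixes):
    """Return the BGP prefixes to redistribute into the IGP (hypothetical helper).

    A prefix is selected only if it falls inside one of allowed_supernets.
    If the selection is larger than max_prefixes, the whole batch is refused
    so that IGP routers are never asked to hold a full BGP table.
    """
    allowed = [ipaddress.ip_network(s) for s in allowed_supernets]
    selected = []
    for prefix in bgp_prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() needs both networks to be the same IP version.
        if any(net.version == a.version and net.subnet_of(a) for a in allowed):
            selected.append(prefix)
    if len(selected) > max_prefixes:
        raise RedistributionLimitExceeded(
            f"{len(selected)} prefixes selected, limit is {max_prefixes}"
        )
    return selected


# Synthetic stand-in for a large BGP table: 1024 /24s plus one of our own routes.
bgp_table = [f"10.{i}.{j}.0/24" for i in range(4) for j in range(256)]
bgp_table.append("203.0.113.0/25")

# Intended configuration: only our own aggregate goes into the IGP.
print(select_for_igp(bgp_table, ["203.0.113.0/24"], max_prefixes=100))

# A "typo" that makes the filter match everything: the limit still trips.
try:
    select_for_igp(bgp_table, ["0.0.0.0/0"], max_prefixes=100)
except RedistributionLimitExceeded as exc:
    print("refused:", exc)
```

Refusing the whole batch, rather than installing routes up to the limit, is a design choice in this sketch: a partial table could leave the IGP in an arbitrary half-populated state, whereas a refusal fails loudly and leaves the existing IGP untouched.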
On Mon, 23 Jun 2003, Jim Deleskie wrote:
One router and it takes their entire network off-line... Maybe someone needs an Intro to Networks 101 class.
No matter what kind of technology or design you have, there are always kinds of faults that may bring the entire system down. The problem is generally in recognizing when a fault has occurred, so that operation may be switched over to a backup.

In particular, the present Internet routing architecture is (mis)designed in such a way that it is incredibly easy for a local fault or human error to bring a significant portion of the network down. Even single-box _hardware_ faults may lead to global crashes.

A long, long time ago I had to track down a problem that left the US and the EU pretty much disconnected for several hours. It turned out to be a hardware problem in a 7000's SSE card, which happily handled packets originating and terminating in the router itself but silently dropped all transit packets. Voila! Neighbour boxes were convinced that this one was working, because all routing protocols were happy, and kept trying to send lots of traffic through it, which simply went into a black hole, to the mighty annoyance of everyone. I've got a speeding ticket showing over 100mph on the Dulles Hwy at 3am, too, as a memento of rushing to DC with a spare card...

So, in the absence of details, I would reserve judgement on the soundness of design practices.

--vadim
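The SSE story turns on the difference between the path used by packets to and from the router itself and the path used by transit packets. The toy Python model below sketches that distinction under stated assumptions: the class and function names are hypothetical, and the "transit probe" stands in for whatever end-to-end test an operator might run through a box, not for a specific protocol or vendor feature. It shows why neighbours keep choosing a router whose hellos succeed but whose transit path is dead, and how gating on a probe through the box would exclude it.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Toy model separating a router's local path from its transit path."""
    name: str
    delivers_local: bool = True    # packets originating/terminating on the router
    forwards_transit: bool = True  # packets merely passing through it


def adjacency_up(router):
    # Routing-protocol hellos terminate on the router itself, so they
    # exercise only the local path.
    return router.delivers_local


def transit_probe_ok(router):
    # A probe sent *through* the router, between hosts on either side of it,
    # exercises the transit path that customer traffic actually uses.
    return router.forwards_transit


def usable_next_hops(routers, require_transit_probe=False):
    """Names of the routers a neighbour would keep sending traffic to."""
    return [
        r.name
        for r in routers
        if adjacency_up(r) and (not require_transit_probe or transit_probe_ok(r))
    ]


# core1 has a fault like the one described: local packets fine, transit dropped.
routers = [Router("core1", forwards_transit=False), Router("core2")]

print(usable_next_hops(routers))                              # ['core1', 'core2']
print(usable_next_hops(routers, require_transit_probe=True))  # ['core2']
```

In this model, the routing-protocol view alone keeps `core1` in service; only a check that actually pushes packets through the box tells the two routers apart.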
participants (5)
- David Barak
- Jim Deleskie
- Mikael Abrahamsson
- Petri Helenius
- Vadim Antonov