Re: Netcom Outage (Was: My InfoWorld Column About NANOG)
Stephen Balbach <stephen@clark.net> wrote:
Having a fully meshed/redundant network should be the goal of any serious ISP. The only one that claims it with any substance IMO is UUNET.
The "full mesh" in this case is a figment of the imagination. Level-2 mapping of circuits over the same (non-meshed) physical wires does not make the network one teeny-weeny bit more reliable, and the added complexity brought by such mapping actually makes the system less robust. Other ISPs who happen to be in a position to control the physical routing of circuits use IP-level rerouting to attack the problem. IGP rerouting with modern link-state IGPs is sub-second, so redundancy of interior paths is easy to achieve (particularly if you use tricks like BGP confederations, which eliminate the need to recompute iBGP routing in case of IGP changes). The hard part is exterior routing, where topology changes require massive crunching of BGP tables. Multiplying paths actually makes the problem worse.
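To make the shared-fate point concrete, here is a minimal Python sketch (not anything from the thread itself). It assumes a hypothetical four-node physical chain A-B-C-D with a logical full mesh of six circuits mapped over it; the node names and helper functions are invented purely for illustration.

```python
from itertools import combinations

# Hypothetical physical plant: a plain chain, i.e. not meshed at all.
NODES = ["A", "B", "C", "D"]


def physical_path(src, dst):
    """Physical links a logical circuit src-dst rides on along the chain."""
    i, j = sorted((NODES.index(src), NODES.index(dst)))
    return [(NODES[k], NODES[k + 1]) for k in range(i, j)]


def surviving_circuits(cut):
    """Logical full-mesh circuits that still work after one physical cut."""
    full_mesh = list(combinations(NODES, 2))  # 6 circuits for 4 nodes
    return [c for c in full_mesh if cut not in physical_path(*c)]


def reachable(start, circuits):
    """Nodes reachable from start over the given (bidirectional) circuits."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for a, b in circuits:
            for x, y in ((a, b), (b, a)):
                if x == node and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen


if __name__ == "__main__":
    alive = surviving_circuits(("B", "C"))
    print("circuits left:", alive)
    print("reachable from A:", sorted(reachable("A", alive)))
```

In this toy case a single cut of the physical link B-C takes down four of the six logical circuits and leaves A and B cut off from C and D, exactly as the bare chain would be. The logical mesh adds circuits to manage without adding any independent paths.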
We are trying to build one and it's not easy. Having redundant links in place does not guarantee instant failover of traffic. Static routes, IGRP, iBGP, bridging, RIPv1 vs. RIPv2, etc. are some of the issues we are running into.
The Golden Rule of engineering (often forgotten in the US) -- the simpler the system is, the better it works. The root of many Internet woes is overly complicated router software; that complexity appears to be running out of control. An ISP engineer's nightmare is a Byzantine-mode failure -- when redundancy does not help because a problem in one place triggers failures in a lot of other places. Any RISKS reader knows that software problems in distributed systems often have that nature. Internet configurations are particularly prone to that, especially considering that they're in a state of constant flux. That's why draconian revision controls and a highly skilled backbone engineering staff (one actually able to understand the global consequences of all its actions) are vital to the operations of any serious ISP. Netcom seems to be at the stage of learning that the hard way.
As well as when an interface is down but still looks up to the router, etc. It can be done, but there are so many possible points of failure and unforeseen scenarios that it is very difficult to construct and certainly takes time to develop.
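One common way to catch a link that is dead while the interface still reports "up" is to require periodic hellos from the neighbor and declare it down after a dead interval, as link-state IGPs do. The following is only a rough Python sketch of that idea; the class name, function name, and the 40-second interval are assumptions chosen for illustration, not details from this thread or from any particular router implementation.

```python
class HelloMonitor:
    """Tracks when a neighbor was last heard from (hypothetical helper)."""

    def __init__(self, dead_interval=40.0):
        # Seconds of silence after which the neighbor is presumed dead.
        self.dead_interval = dead_interval
        self.last_hello = None

    def record_hello(self, now):
        """Call whenever a hello packet arrives from the neighbor."""
        self.last_hello = now

    def neighbor_alive(self, now):
        """True only if a hello was seen within the dead interval."""
        if self.last_hello is None:
            return False
        return (now - self.last_hello) < self.dead_interval


def link_usable(interface_up, monitor, now):
    """Trust the link only if the interface is up AND the neighbor talks."""
    return interface_up and monitor.neighbor_alive(now)


if __name__ == "__main__":
    mon = HelloMonitor(dead_interval=40.0)
    mon.record_hello(now=0.0)
    print(link_usable(True, mon, now=10.0))  # hello is recent
    print(link_usable(True, mon, now=50.0))  # interface "up", neighbor silent
```

Timestamps are passed in explicitly so the logic can be exercised without real clocks. The trade-off is the usual one: a shorter dead interval detects silent failures sooner but is more likely to flap on a congested link.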
That example is a perfect illustration of why many backbone engineers are sceptical about "advanced" level-2 technologies. Even good ol' Ethernet may have numerous quite interesting ways to fail, if not built properly (remember ol' MAE-E? :) --vadim
Other ISPs who happen to be in a position to control the physical routing of circuits use IP-level rerouting to attack the problem.
IGP rerouting with modern link-state IGPs is sub-second, so redundancy of interior paths is easy to achieve (particularly if you use tricks like BGP confederations, which eliminate the need to recompute iBGP routing in case of IGP changes).
Vadim, Can you be more specific about this? How do you avoid the recompute and also avoid persistent routing loops after a topology change?
The hard part is exterior routing, where topology changes require massive crunching of BGP tables. Multiplying paths actually makes the problem worse.
You bet! Erik
participants (2): avg@postman.ncube.com, Erik Sherk