Re: dual router vs. single "reliable" router

10 Apr 2003

On Thu, Apr 10, 2003 at 11:59:25AM -0500, Stephen Sprunk wrote:
...
> Nearly all the Cisco device failures I've seen were either software or human
> problems; actual hardware failure is _way_ down the list.  Also, I've
> observed significantly worse reliability among devices specifically designed
> to be highly reliable compared to devices simply designed to work.
>
> There are several networks out there using Cisco devices to achieve over six
> 9's availability, and the way they do that is by extensive procedure review
> and rigorous software testing.  Writing more reliable software is certainly
> doable, but more-reliable humans aren't likely and more-reliable hardware is
> unnecessary.  IMHO.

This is also my experience.  A forklift or ceiling tile taking out your
infrastructure happens nowhere near as often as you have to tell the junior
guys "nononononono, `debug all' is a _bad_ idea".  Until the software
reduces/eliminates pilot error to a significant degree, and is proven to
prevent forwarding issues (read: fib bugs) -- there is just no big motivation
to run a single box.  A single box has its applications in IP networks, but
more so at the access layer (customer edge) or the Internet edge (peering).

I'm just flinching thinking about using a single box in the core (per POP),
when a single command of any type can take out the whole box (`no ip routing'
immediately comes to mind).  There's just more software on IP boxes than on
telco gear.
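
For a sense of scale on the six 9's figure quoted above, here is a rough
back-of-the-envelope sketch in Python.  The function name is made up for
illustration, and it assumes a 365-day year and that "N nines" means an
availability of 1 - 10^-N:

```python
def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per 365-day year at an availability of N nines."""
    unavailability = 10.0 ** -nines          # six nines -> 1e-6
    seconds_per_year = 365 * 24 * 60 * 60    # 31,536,000 (assumes no leap year)
    return unavailability * seconds_per_year


for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_seconds_per_year(n):.1f} s/year")
```

Under those assumptions six 9's leaves roughly 31.5 seconds of outage per
year, which one mistyped command on a lone core box would use up quickly.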

The only work I've seen in the IETF on this topic is this draft:
http://www.ietf.org/internet-drafts/draft-kilsdonk-router-upgrade-01.txt
But it leaves a lot to be desired, IMO.

dre