On Thu, Apr 10, 2003 at 11:59:25AM -0500, Stephen Sprunk wrote:
> Nearly all the Cisco device failures I've seen were either software or human problems; actual hardware failure is _way_ down the list. Also, I've observed significantly worse reliability in devices specifically designed to be highly reliable than in devices simply designed to work.
>
> There are several networks out there using Cisco devices to achieve over six nines (99.9999%) of availability, and they do it through extensive procedure review and rigorous software testing. Writing more reliable software is certainly doable, but more reliable humans aren't likely, and more reliable hardware is unnecessary. IMHO.
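For scale, six nines leaves very little room: about 31.5 seconds of downtime per year. Below is a minimal Python sketch of that arithmetic. It assumes a 365-day year and treats availability simply as the fraction of time the service is up. The function name is only for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year (ignores leap years)


def downtime_seconds_per_year(nines: int) -> float:
    """Maximum downtime per year permitted by an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. six nines -> 0.000001 of the year
    return SECONDS_PER_YEAR * unavailability


for n in range(3, 7):
    print(f"{n} nines: {downtime_seconds_per_year(n):.1f} s/year")
```

Each additional nine cuts the budget by a factor of ten. Three nines allows nearly 8.8 hours a year, while six nines allows roughly half a minute, which is less than one router reload.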
This is also my experience. A forklift or ceiling tile takes out your infrastructure far less often than you have to tell the junior guys "nononononono, `debug all' is a _bad_ idea".

Until the software reduces or eliminates pilot error to a large degree, and is proven to prevent forwarding issues (read: FIB bugs), there is just no big motivation to run a single box. A single box has its applications in IP networks, but more so in the access layer (customer edge) or at the Internet edge (peering). I flinch just thinking about using a single box in the core (per POP), when a single command of any type can take out the whole box (`no ip routing' immediately comes to mind). There's simply more software on IP boxes than in telco technology.

The only IETF work I've seen on this topic is this draft:

http://www.ietf.org/internet-drafts/draft-kilsdonk-router-upgrade-01.txt

But it leaves a lot to be desired, IMO.

dre