Dave, I have summarized and enclosed below our trouble tickets written over the last 5 days for the T3 network. The only problem that affected more than one peer site was a crash of CNSS40 at Cleveland yesterday afternoon due to a hardware problem. While a CNSS crash of this type would not normally cause users a loss of connectivity, thanks to backbone redundancy, this crash did result in a connectivity loss since the Ann Arbor interconnect E131, which is homed into the Cleveland POP, became reachable via the safety net T1 links only. The interconnect gateway was switched over to Houston during this time, which resulted in a 25 minute outage of the interconnect. The last T3 router crash we had was several weeks ago, and the last hardware-induced crash was several months ago. While this is an undesirable event, I suspect that SURAnet's use of the T3 backbone may actually reduce your dependency on the interconnect gateways.

We have installed new software (build 64 and a new rcp_routed) across the T3 system during the last two weeks, with additional performance and reliability enhancements including improved aggregation of interior routing updates, faster convergence time, and reduced CPU utilization. Operationally, we have begun the on-call schedule for the new NNAF engineering group, which has resulted in more detailed NSR reports on these scheduled and unscheduled events. This event, coupled with the numerous scheduled software installations, may have given you the false impression that there was an increase in problems.

My general conclusion is that while we still have some problems with the current T3 adapter technology, this has been manageable and the T3 backbone is still very reliable. We will monitor network reliability carefully for changes as we add additional traffic to the T3 system. The T1 network is not nearly as reliable, and we are busy working on those problems.

Mark

Peer network router problems:
18410 - SURAnet sura7.sura.net
18429 - BARRnet equipment move
18441 - Pittsburgh power failure

Backbone router hardware problems:
18423 - cisco serial interface, Xlink (Germany)
18426 - cnss40 spare T3 card removed from backplane to reduce frequency of black links. We have been getting about 3 black links per month on one interface on this cnss.
18432, 18491, 18509 - enss129 FDDI card hang, manual reset
18465 - enss129 FDDI card replacement, scheduled maintenance
18504 - cnss40 crash resulting in interconnect switchover

Routing configuration problems:
18503 - Missing ibgp line in cnss48 and 49 for new enss164 at IBM Watson

Scheduled backbone router software upgrades:
18414 - new rcp_routed on enss131 for better route aggregation
18416 - new build 64 on enss163 for performance improvements
18424 - new rcp_routed on cnss83
18425 - new rcp_routed on enss135, 137, 139
18430 - new rcp_routed on enss129, 132
18431 - new build 67 on enss135, 129, 132 (to fix FDDI bug in build 64)

Site maintenance tickets, no downtime:
18451, 18452, 18453, 18454, 18455, 18456, 18457 - Perform spare parts inventory at the Cleveland and Hartford POPs, and at ENSS sites 128, 135, 129, 132, 133, 134.

=======================================================================
Date: Wed, 12 Feb 92 18:18:11 EST
To: nwg@merit.edu
From: oleary@sura.net
Subject: T3 backbone stability

We are cutting over to use the T3 backbone now, and I am concerned by the several recent messages about ENSS and CNSS problems.
Could someone at Merit summarize some of the recent outages and whether there have been any trends, or could some of the other midlevels provide us with some insight as to how (in)stability has affected your connections? We are going with the emerging standard of sending T3 stuff to the T3 and T1 stuff to the T1, explicitly importing T3 routes and defaulting everything else to the T1.

thanks,

dave o'leary
SURAnet
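[Archive note] The selection policy Dave describes above (explicitly imported T3 routes, with everything else defaulting to the T1) can be sketched roughly as follows. This is a minimal modern Python illustration, not the actual router configuration used at the time; the prefixes and next-hop names are hypothetical.

    # Minimal sketch, assuming made-up prefixes and gateway names:
    # send traffic for explicitly imported T3 prefixes to the T3,
    # and let everything else fall through to the T1 default.
    from ipaddress import ip_address, ip_network

    # Prefixes explicitly imported from the T3 backbone (illustrative values).
    T3_ROUTES = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]

    T3_NEXT_HOP = "t3-enss"      # hypothetical T3 ENSS gateway
    T1_NEXT_HOP = "t1-default"   # hypothetical T1 default gateway

    def select_next_hop(destination: str) -> str:
        """T3 stuff to the T3, everything else to the T1 default."""
        addr = ip_address(destination)
        if any(addr in prefix for prefix in T3_ROUTES):
            return T3_NEXT_HOP
        return T1_NEXT_HOP

    print(select_next_hop("192.0.2.17"))   # -> t3-enss
    print(select_next_hop("203.0.113.5"))  # -> t1-default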
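[Archive note] As a rough illustration of the improved aggregation of interior routing updates that Mark's report mentions, the sketch below collapses contiguous interior prefixes into a single announcement. The prefixes are made up, and this uses a modern Python library rather than the rcp_routed implementation.

    # Minimal sketch of prefix aggregation: four adjacent /24s behind one
    # ENSS collapse into a single /22, so fewer routing updates are sent.
    from ipaddress import ip_network, collapse_addresses

    # Hypothetical interior prefixes behind one ENSS.
    prefixes = [
        ip_network("10.1.0.0/24"),
        ip_network("10.1.1.0/24"),
        ip_network("10.1.2.0/24"),
        ip_network("10.1.3.0/24"),
    ]

    aggregated = list(collapse_addresses(prefixes))
    print(aggregated)  # [IPv4Network('10.1.0.0/22')] -- four updates become one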