Dave, I have summarized and enclosed below our trouble tickets written over the last 5 days for the T3 network. The only problem that affected more than one peer site was a crash of CNSS40 at Cleveland yesterday afternoon due to a hardware problem. While a CNSS crash of this type would not normally cause users a loss of connectivity, thanks to backbone redundancy, this crash did result in a connectivity loss since the Ann Arbor interconnect E131, which is homed into the Cleveland POP, became reachable via the safety net T1 links only. The interconnect gateway was switched over to Houston during this time, which resulted in a 25 minute outage of the interconnect. The last T3 router crash we had was several weeks ago, and the last hardware-induced crash was several months ago. While this is an undesirable event, I suspect that SURAnet's use of the T3 backbone may actually reduce your dependency on the interconnect gateways.

We have installed new software (build 64 and a new rcp_routed) across the T3 system during the last two weeks, with additional performance and reliability enhancements including improved aggregation of interior routing updates, faster convergence time, and reduced CPU utilization. Operationally, we have begun the on-call schedule for the new NNAF engineering group, which has resulted in more detailed NSR reports on these scheduled and unscheduled events. This event, coupled with the numerous scheduled software installations, may have given you the false impression that there was an increase in problems.

My general conclusion is that while we still have some problems with the current T3 adapter technology, this has been manageable and the T3 backbone is still very reliable. We will monitor network reliability carefully for changes as we add additional traffic to the T3 system. The T1 network is not nearly as reliable, and we are busy working on those problems.

Mark

Peer network router problems:
18410 - SURAnet sura7.sura.net
18429 - BARRnet equipment move
18441 - Pittsburgh power failure

Backbone router hardware problems:
18423 - cisco serial interface, Xlink (Germany)
18426 - cnss40 spare T3 card removed from backplane to reduce frequency of black links. We have been getting about 3 black links per month on one interface on this cnss.
18432, 18491, 18509 - enss129 FDDI card hang, manual reset
18465 - enss129 FDDI card replacement, scheduled maintenance
18504 - cnss40 crash resulting in interconnect switchover

Routing configuration problems:
18503 - Missing ibgp line in cnss48 and 49 for new enss164 at IBM Watson

Scheduled backbone router software upgrades:
18414 - new rcp_routed on enss131 for better route aggregation
18416 - new build 64 on enss163 for performance improvements
18424 - new rcp_routed on cnss83
18425 - new rcp_routed on enss135, 137, 139
18430 - new rcp_routed on enss129, 132
18431 - new build 67 on enss135, 129, 132 (to fix FDDI bug in build 64)

Site maintenance tickets, no downtime:
18451, 18452, 18453, 18454, 18455, 18456, 18457 - Perform spare parts inventory at the Cleveland and Hartford POPs, and at ENSS sites 128, 135, 129, 132, 133, 134.

=======================================================================
Date: Wed, 12 Feb 92 18:18:11 EST
To: nwg@merit.edu
From: oleary@sura.net
Subject: T3 backbone stability

We are cutting over to use the T3 backbone now, and I am concerned by the several recent messages about ENSS and CNSS problems.
Could someone at Merit summarize some of the recent outages and whether there have been any trends, or could some of the other midlevels provide us with some insight as to how (in)stability has affected your connections? We are going with the emerging standard of sending T3 stuff to the T3 and T1 stuff to the T1, explicitly importing T3 routes and defaulting everything else to the T1.

thanks,

dave o'leary
SURAnet
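[Archive note] The selection policy Dave describes above (explicitly imported T3 routes, with everything else defaulting to the T1) can be sketched roughly as follows. This is a minimal modern Python illustration, not the actual router configuration used at the time; the prefixes and next-hop names are hypothetical.

    # Minimal sketch, assuming made-up prefixes and gateway names:
    # send traffic for explicitly imported T3 prefixes to the T3,
    # and let everything else fall through to the T1 default.
    from ipaddress import ip_address, ip_network

    # Prefixes explicitly imported from the T3 backbone (illustrative values).
    T3_ROUTES = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]

    T3_NEXT_HOP = "t3-enss"      # hypothetical T3 ENSS gateway
    T1_NEXT_HOP = "t1-default"   # hypothetical T1 default gateway

    def select_next_hop(destination: str) -> str:
        """T3 stuff to the T3, everything else to the T1 default."""
        addr = ip_address(destination)
        if any(addr in prefix for prefix in T3_ROUTES):
            return T3_NEXT_HOP
        return T1_NEXT_HOP

    print(select_next_hop("192.0.2.17"))   # -> t3-enss
    print(select_next_hop("203.0.113.5"))  # -> t1-default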
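[Archive note] As a rough illustration of the improved aggregation of interior routing updates that Mark's report mentions, the sketch below collapses contiguous interior prefixes into a single announcement. The prefixes are made up, and this uses a modern Python library rather than the rcp_routed implementation.

    # Minimal sketch of prefix aggregation: four adjacent /24s behind one
    # ENSS collapse into a single /22, so fewer routing updates are sent.
    from ipaddress import ip_network, collapse_addresses

    # Hypothetical interior prefixes behind one ENSS.
    prefixes = [
        ip_network("10.1.0.0/24"),
        ip_network("10.1.1.0/24"),
        ip_network("10.1.2.0/24"),
        ip_network("10.1.3.0/24"),
    ]

    aggregated = list(collapse_addresses(prefixes))
    print(aggregated)  # [IPv4Network('10.1.0.0/22')] -- four updates become one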