June Backbone Engineering Report
Hi. This appeared in the Internet Monthly Report just sent around, but
since some have indicated that they would also like to see a separate
posting to the regional-techs list, here it is....

Mark

ANSNET/NSFNET Backbone Engineering Report
June 1992

Jordan Becker, ANS                  Mark Knopper, Merit
becker@ans.net                      mak@merit.edu

T3 Backbone Status
==================

The T3 backbone continued to run very reliably during June. With the
completion of the RS/960 DS3 interface upgrade in May, the cutover of
additional traffic from the T1 to the T3 network resumed in June and is
proceeding as quickly as possible. The number of networks configured and
announced to the T3 network continues to increase. Midlevel traffic cut
over from the T1 to the T3 backbone included NorthWestNet,
Sprint/International Connections Manager, and Alternet. The T3 backbone
is now carrying nearly double the packet load of the T1 backbone.

With the upgrade complete and the T3 network stable, several performance
and functional enhancements were made during June, including
improvements to the routing daemon and the SNMP daemon. A remaining
problem on the T3 network is FDDI adapter performance and stability.
Given the complexity of the T3 adapter upgrade, we chose to defer the
FDDI upgrade until August to ensure operational stability.

Statistics on network traffic and configured networks
=====================================================

The total inbound packet count for the T3 network was 10,736,059,912, up
29% from April. 220,593,003 of these packets entered from the T1
network. The total inbound packet count for the T1 network was
5,761,976,518, down 16.7% from May. 536,009,585 of these packets entered
from the T3 network. The combined total inbound packet count for the T1
and T3 networks (less cross-network traffic) was 15,741,433,842, up 0.9%
from April.

Currently there are 5801 IP networks configured in the policy routing
database for the T1 network, and 3966 for the T3 network.
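The combined figure can be cross-checked from the other four numbers: it
is the sum of the two inbound counts minus the cross-network packets,
which would otherwise be counted once by each backbone. A quick
arithmetic check (Python used purely for illustration; all figures are
taken from the report):

```python
# Cross-check of the June traffic totals quoted above.
t3_inbound = 10_736_059_912   # total packets entering the T3 network
t1_inbound = 5_761_976_518    # total packets entering the T1 network
t1_to_t3   = 220_593_003      # of the T3 count, packets that came from T1
t3_to_t1   = 536_009_585      # of the T1 count, packets that came from T3

# Cross-network packets are seen by both backbones, so subtract each once.
combined = t3_inbound + t1_inbound - t1_to_t3 - t3_to_t1
print(combined)  # 15741433842, matching the combined total in the text
```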
The number of networks actually announced to the backbone varies; it is
currently 2750 for T3 and 4425 for T1.

NOC Problem Reports
===================

The number of problem reports that result in NOC trouble tickets (all
priority classes combined) remains constant at 10-20 per week for the T3
network, and at 15-20 per week for the T1 network.

T1 Backbone Status
==================

The T1 backbone's reliability is not as good as the T3 backbone's, due
largely to increased route processing load on the RCP nodes. These
machines still carry the full load of routes, and they are experiencing
some congestion and performance problems. Improvements have been made to
the routing software to accommodate protocol upgrades (i.e., BGP2).

T3 Routing Daemon Software Status
=================================

Work on the rcp_routed software in June emphasized correcting software
problems involving routing instability, and monitoring and correcting
routing table integrity problems. Many bug fixes were applied to the
routing daemon over the last three months.

Monitoring of routing integrity consists of collecting the full netstat
table to find route flapping problems within the backbone and within
peer networks, BGP disconnect problems, and external network metric
problems. Additional work is underway to collect full routing tables
from backbone nodes and process them with a relational database system.
This system generates reports on the statistical use of primary routes,
the reliability of network announcements to the backbone, and long-term
statistics on inter-domain routing announcements and growth.

A number of improvements and bug fixes have been made to the T3 routing
software over the last two months.
Highlights included:

- fix to allow an ENSS that is isolated from the backbone to stop
  announcing default to peers
- better handling of router adapter failures
- preventing overruns of external BGP messages sent to external peer
  routers
- gracefully dropping bogus external routes to backbone ENSS nodes
- correct response to an external metric selection problem for nets
  announced at the same metric from multiple peers
- fix for an interaction problem between BGP and EGP for peers in the
  same Autonomous System
- route table hashing efficiency improvements
- two routes with the same AS path are now both installed to allow
  backup
- BGP-2 PDU size increased from 1024 to 4096 bytes
- when a route learned from BGP and a route learned from EGP have the
  same metric, the BGP route is now preferred
- better handling of a next hop behind a peer router and a shared
  network
- BGP update packet format fix
- fix to BGP 1-2 version negotiation
- eliminated the chance of BGP disconnects during IGP transitions
- eliminated BGP disconnects when a peer router is too busy
- better response to route instabilities upon failure of a T1
  interconnect or an ENSS
- automatic restart of the routing daemon in the event of a crash

As a result of the monitoring and analysis effort, along with the
software changes themselves, reliability and route integrity have
improved dramatically on the T3 network over the last month.

RS/960 DS3 On-Card Memory Problem
=================================

A batch of bad memory chips has been found to cause memory parity errors
on a few interfaces. Five of these cards have been replaced as the
problems were identified. Diagnostic microcode has been developed to
detect the problem in advance, and nodes are being scheduled for
diagnostics to be run over the next few weeks during scheduled routing
configuration update windows.
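Two of the tie-breaking fixes above (preferring BGP over EGP at equal
metric, and lower metric winning otherwise) can be illustrated with a
small sketch. This is a hypothetical illustration only, not the actual
rcp_routed code; the function name and data layout are invented:

```python
# Hypothetical sketch of two route tie-breaking rules described above.
# This is NOT the actual rcp_routed implementation.
def prefer(route_a, route_b):
    """Return the preferred of two routes to the same destination.
    Each route is a dict with 'protocol' ('BGP' or 'EGP') and an
    integer 'metric'."""
    # Rule 1: a lower metric always wins.
    if route_a["metric"] != route_b["metric"]:
        return route_a if route_a["metric"] < route_b["metric"] else route_b
    # Rule 2: at equal metrics, a BGP-learned route beats an EGP one.
    if route_a["protocol"] == "BGP" and route_b["protocol"] == "EGP":
        return route_a
    if route_b["protocol"] == "BGP" and route_a["protocol"] == "EGP":
        return route_b
    return route_a  # otherwise keep the existing route

best = prefer({"protocol": "EGP", "metric": 10},
              {"protocol": "BGP", "metric": 10})
# best["protocol"] is "BGP": at equal metric, the BGP route is preferred
```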
DSU Synchronization and CRC/alignment Problem
=============================================

A problem that causes logical link failures has been traced to a clock
synchronization problem on the T3 Technologies DSUs during clock
master/slave transitions. This problem occurs very infrequently and has
been reproduced using a newly installed circuit on the T3 research
network. Enhanced instrumentation has been added to detect the problem,
and work is in progress to correct it.

End-To-End Packet Loss Analysis
===============================

Researchers at the University of Maryland recently conducted some
experiments and noticed periodic and random packet loss and packet
duplicates when using the T3 network. Two of the problems were traced to
a bridge device and an ethernet problem on the SURAnet ethernet. Peer
router problems causing some packet loss during routing updates at
NEARnet were identified and are being corrected. Some packet loss on the
T3 ENSS FDDI interface at Stanford was also identified; it is due to an
FDDI card output buffering problem and might be addressed prior to the
FDDI upgrade in August.

FDDI Adapter Upgrade
====================

Although the T3 adapters have been upgraded from the older technology to
the new RS960 adapter technology, the FDDI adapters in the ENSS nodes
have not yet been upgraded. The older FDDI adapters continue to suffer
from performance and reliability problems. The new RS960 FDDI adapter is
scheduled to be installed as part of a field trial on July 20th.
Following this field trial, we expect to upgrade the older FDDI
interfaces with the new RS960 interface adapters in early August. There
are currently five T3 ENSS sites using FDDI interfaces in production.

SNMP Daemon Changes
===================

A new version of the SNMP daemon for the T3 network was installed on
June 26.
This version supports MIB-II variables for the T/960 ethernet cards
(ifInUcastPkts, ifOutUcastPkts, and ifInErrors), and also includes
enhanced configuration support for monitoring T3 DSUs. A new SNMP client
has been implemented for the NOC to control the T1 Cylink ACSUs that are
part of the T3 backbone; this avoids use of a separate dial-in
connection to these CSUs. New SNMP variables have been added to further
monitor the DSU synchronization problem mentioned above.
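As an illustration of how the new per-interface counters might be used
by a monitoring station, an inbound error rate can be computed from two
successive samples of ifInUcastPkts and ifInErrors. The helper below is
hypothetical (not part of the SNMP daemon), and for brevity it ignores
32-bit counter wrap:

```python
# Hypothetical monitoring helper, not part of the SNMP daemon itself.
def error_rate(sample1, sample2):
    """Given two successive samples of an interface's MIB-II counters
    (dicts with 'ifInUcastPkts' and 'ifInErrors'), return inbound
    errors per unicast packet over the interval. Ignores counter wrap."""
    pkts = sample2["ifInUcastPkts"] - sample1["ifInUcastPkts"]
    errs = sample2["ifInErrors"] - sample1["ifInErrors"]
    return errs / pkts if pkts else 0.0

rate = error_rate({"ifInUcastPkts": 1_000_000, "ifInErrors": 10},
                  {"ifInUcastPkts": 2_000_000, "ifInErrors": 30})
# rate is 2e-05: 20 new errors over 1,000,000 new packets
```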
The new SNMP daemon still does not report the MIB-II variables
correctly. Here is an illustration (SNMP response from
enss139.t3.nsf.net):

...
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInOctets.1
Counter: 928252188
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInOctets.2
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInOctets.3     <= Ethernet interface
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInOctets.4     <= T3 interface
Counter: 1532701256
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInUcastPkts.1
Counter: 10148844
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInUcastPkts.2
Counter: 54451782
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInUcastPkts.3  <= No octets but pkts
Counter: 816175323
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInUcastPkts.4  <= No pkts but octets
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInNUcastPkts.1
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInNUcastPkts.2
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInNUcastPkts.3
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifInNUcastPkts.4
Counter: 1033010978
...
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutOctets.1
Counter: 928257896
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutOctets.2
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutOctets.3    <= No octets
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutOctets.4    <= Octets but no pkts
Counter: 2614141685
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutUcastPkts.1
Counter: 10148940
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutUcastPkts.2
Counter: 37243692
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutUcastPkts.3 <= No octets but pkts
Counter: 964648059
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutUcastPkts.4 <= No pkts but octets
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutNUcastPkts.1
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutNUcastPkts.2
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutNUcastPkts.3
Counter: 0
Name: mgmt.mib.interfaces.ifTable.ifEntry.ifOutNUcastPkts.4
Counter: 897328785
...

--
Jian Li                      Internet: jian@is.rice.edu
ONCS, Rice University        Bitnet:   jian@ricevm1.bitnet
P.O. Box 1892                Phone:    (713) 285-5328
Houston, Texas 77251         FAX:      (713) 527-6099
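The inconsistency in the listing above lends itself to a mechanical
check: an interface whose octet counter and unicast-packet counter
disagree about being zero is suspect. A minimal sketch, using
abbreviated inbound figures from the enss139 response (the table layout
here is invented for illustration):

```python
# Flag interfaces whose MIB-II octet and unicast-packet counters
# disagree, as in the enss139 listing above (inbound figures only).
counters = {
    3: {"ifInOctets": 0,             "ifInUcastPkts": 816_175_323},  # Ethernet
    4: {"ifInOctets": 1_532_701_256, "ifInUcastPkts": 0},            # T3
}

# An interface with packets but no octets (or octets but no packets)
# cannot be reporting both counters correctly.
suspect = [ifindex for ifindex, c in counters.items()
           if (c["ifInOctets"] == 0) != (c["ifInUcastPkts"] == 0)]
print(suspect)  # [3, 4]: both interfaces are flagged
```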