ANSNET/NSFNET Backbone Engineering Report
August 1992

Jordan Becker, ANS                    Mark Knopper, Merit
becker@ans.net                        mak@merit.edu

T3 Backbone Status
==================

The system software and routing software for the T3 routers have
stabilized. The new RS960 FDDI card has completed testing, and
deployment schedules are in progress. A new system software build with
support for 10,000 routes maintained locally on the smart-card
interfaces is being tested on the T3 Research Network.

Planning is now underway for dismantling the T1 backbone, which is
targeted for November. Steps to be completed before dismantling the T1
backbone include support for OSI CLNP transport over the T3 backbone
and deployment of the redundant backup circuit plan for the T3 ENSS
gateways at each regional network. Further activities in support of the
Phase IV upgrade to the T3 backbone are in progress.

Backbone Traffic and Routing Statistics
=======================================

The total inbound packet count for the T1 network during August was
3,903,906,145, down 17.9% from July. 298,961,253 of these packets
entered from the T3 network. The total inbound packet count for the T3
network was 13,051,979,670, up 1.3% from July. 129,835,094 of these
packets entered from the T1 network. The combined total inbound packet
count for the T1 and T3 networks (less cross-network traffic) was
16,527,089,468, down 3.1% from July (this arithmetic is cross-checked
in the sketch below).

Reports on T3 backbone byte counts for June, July, and August were
incorrect due to SNMP reporting problems; corrected reports will be
available soon on the nis.nsf.net machine. The byte totals for June,
July, and August are 2.279, 2.546, and 2.548 trillion bytes,
respectively.

As of August 31, the number of networks configured in the NSFNET Policy
Routing Database was 6360 for the T1 backbone and 5594 for the T3
backbone. Of these, 1233 networks were never announced to the T1
backbone and 1102 were never announced to the T3 backbone. For the T1,
the maximum number of networks announced to the backbone during the
month (from samples collected every 15 minutes) was 4866; on the T3 the
maximum was 4206. The average number of announced networks on 8/31 was
4817 on the T1 and 4161 on the T3.

New FDDI Interface Adapter for ENSS Nodes
=========================================

We have a new RS960 FDDI adapter for the RS/6000 router that provides
much improved reliability and performance. We had hoped that the new
RS960 FDDI interface adapter, targeted to replace the older "Hawthorne"
technology FDDI adapters in the T3 ENSS routers, would be ready for
deployment in early August. However, several serious bugs were
encountered during testing in late July, and the upgrade has been
delayed by more than a month. We have since corrected or worked around
all known bugs. We are re-running our full suite of regression tests,
and a full set of stress tests on the T3 test network, over the Labor
Day weekend. Pending successful completion of these tests, we expect
that the first set of FDDI adapter upgrades on the production T3 ENSS
nodes could begin during the week of 9/7.

We would like to begin planning for the installation of these new
interface adapters at ENSS128 (Palo Alto), ENSS135 (San Diego), ENSS129
(Champaign), and ENSS132 (Pittsburgh). We will develop plans for any
further FDDI deployments after these first four installations have been
completed successfully.
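As a cross-check on the combined packet totals reported under Backbone
Traffic and Routing Statistics above, the following minimal sketch
(Python, with the August figures copied directly from this report)
reproduces the "less cross-network traffic" arithmetic:

    # Cross-check of the combined August inbound packet total.
    # All figures are copied from the statistics section of this report.
    t1_inbound = 3_903_906_145   # packets entering the T1 network
    t3_inbound = 13_051_979_670  # packets entering the T3 network
    t3_to_t1   = 298_961_253     # T1 inbound packets arriving from the T3
    t1_to_t3   = 129_835_094     # T3 inbound packets arriving from the T1

    # Cross-network packets appear in the inbound counts of both
    # backbones, so each direction is subtracted once to avoid
    # double counting.
    combined = t1_inbound + t3_inbound - t3_to_t1 - t1_to_t3
    assert combined == 16_527_089_468  # the figure reported above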
Dismantling the T1 Backbone
===========================

The current target for dismantling the T1 backbone is November '92.
This can be accomplished once the remaining networks using the T1
backbone (ESnet, EASInet, the Mexican networks at Boulder, and CA*net)
have been cut over to the T3 backbone; an OSI CLNP transport capability
over the T3 backbone is in place; the T3 ENSS nodes are backed up by
additional T1 circuits terminating at alternate backbone POPs; and the
network-to-network source/destination pair statistics matrix is
available on the T3 backbone. These activities are described below.

Since the RCP nodes on the T1 backbone are experiencing further
congestion and performance problems due to the growth in networks, we
are planning to reduce the number of networks announced to the T1 nodes
by the T3 interconnect gateways. For those networks yet to cut over,
this will eliminate the use of the T3 to back up the T1 in the event of
a failure in the T1 network.

Remaining Network Cutovers
--------------------------

The ESnet cutover is waiting for a new version of software to be
configured for the ESnet router peers at FIX-West and FIX-East. The
Mexican autonomous system will be cut over soon, pending communication
with the folks in Mexico. We are developing a plan that will allow
EASInet to peer directly with the T3 network. The plan for CA*net is to
remove the RTs from the token ring on the NSS nodes at Seattle,
Princeton, and Ithaca, configure them to run the CA*net kernel and
gated, and peer directly across the ethernet to the T3 ENSS at these
sites.

OSI Support Plan
----------------

In order to dismantle the T1 backbone, we need to support the transport
of OSI (CLNP) packets across the T3 network. Because we would like to
dismantle the T1 backbone in late 1992, and the T3 backbone software
for OSI support is still in test, we will proceed with a phased
(multi-step) migration to OSI switching over the T3 network, so that
network stability is preserved as OSI software support is introduced.
The migration plan involves three steps:

1. Convert the RT/PC EPSP routers that reside on the shared ENSS LANs
   into OSI packet encapsulators. This would be done at the 8 or so
   sites where the regionals currently support OSI switching services.
   OSI traffic is encapsulated in an IP packet on the RT router and
   forwarded as an IP packet across the T3 network to a destination RT
   de-encapsulator (see the sketch after this list). This software
   already exists and can support the migration of OSI traffic off of
   the T1 backbone with no software changes required on the T3
   backbone. It is entering test now and could be running in production
   by early October.

2. Introduce new RS/6000 OSI encapsulator systems running the AIX 3.2
   operating system with native CLNP support. These machines will
   replace the RT OSI encapsulators on the shared ENSS LANs. As the
   CLNP software becomes more stable, the RS/6000 systems can begin to
   support non-encapsulated dynamic OSI routing. This step still
   requires no changes to the production T3 network software, and could
   occur sometime in mid-fall.

3. Deploy the AIX 3.2 operating system and native CLNP switching
   software on the T3 routers across the backbone. The experience
   gained in step 2 will facilitate this migration. This step is
   expected sometime in January 1993.
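To make step 1 above concrete, here is a minimal sketch of the
CLNP-in-IP encapsulation idea: the CLNP PDU rides as the payload of an
ordinary IPv4 packet, so the T3 backbone forwards it like any other IP
traffic. This is an illustration only, not the actual RT encapsulator
code; the use of IP protocol number 80 (ISO-IP) and the helper names
are assumptions.

    import socket
    import struct

    ISO_IP_PROTO = 80  # IANA protocol number for ISO-IP (CLNP); assumed here

    def encapsulate_clnp(clnp_pdu, src_ip, dst_ip):
        """Wrap a CLNP PDU in a minimal 20-byte IPv4 header (no options).

        Hypothetical illustration; a real encapsulator would also
        compute the IP header checksum, set identification, etc.
        """
        total_len = 20 + len(clnp_pdu)
        header = struct.pack(
            "!BBHHHBBH4s4s",
            (4 << 4) | 5,             # version 4, header length 5 words
            0,                        # type of service
            total_len,                # total length of IP datagram
            0,                        # identification
            0,                        # flags and fragment offset
            64,                       # time to live
            ISO_IP_PROTO,             # payload is a CLNP PDU
            0,                        # header checksum (omitted in sketch)
            socket.inet_aton(src_ip),
            socket.inet_aton(dst_ip),
        )
        return header + clnp_pdu

    # The destination RT de-encapsulator strips the 20-byte IP header and
    # hands the CLNP PDU (NLPID 0x81 in its first byte) to its OSI stack.
    packet = encapsulate_clnp(b"\x81" + b"rest of CLNP PDU",
                              "192.0.2.1", "192.0.2.2")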
T1 ENSS Backup Circuits
-----------------------

The T1 backbone currently provides backup connectivity in the event of
a problem with the T3 backbone. Since each T3 ENSS node is currently
singly connected to a CNSS at an MCI POP, the single T3 circuit and
CNSS node represent a single point of failure. As a backup plan, each
T3 ENSS will be connected to a new T1 circuit which terminates at a
CNSS in a different backbone POP. This will allow bypass recovery in
the event of a circuit or CNSS failure. We are executing a test plan on
the test network to measure internal routing convergence times and
end-user observations during a backup transition. These circuits are
being ordered now and are expected to be in place by late October.

Network Source/Destination Statistics
-------------------------------------

During the migration to the smart-card forwarding technology
(RS960/T960) we temporarily lost the ability to collect network
source/destination pair traffic statistics, because packets no longer
pass through the RS/6000 system processor where the statistics
collection application software ran. We are now testing new software
for near-term deployment that will restore collection of statistics for
each network source/destination pair: packets_in, packets_out,
bytes_in, and bytes_out. The statistics will be cached on the RS960 and
T960 interfaces and uploaded to the RS/6000 system for processing and
transmission to a central collection machine.

Increase Routing Table Sizes on T3 Network
==========================================

We continue to experience an increase in ANSNET/NSFNET advertised
networks (see Backbone Traffic and Routing Statistics, above). The
current on-card routing table size on the T3 router RS960 card
(T3/FDDI) and T960 card (T1/ethernet) supports 6,000 destination
networks with up to 4 alternate routes per destination; the on-card
tables are currently managing on the order of 12K routes (including
alternate routes to the same destination). We are now testing new
software for the RS960 and T960 interfaces, to be deployed on the T3
network shortly, that supports up to 10,000 destination networks with
up to 4 alternate routes per destination.

We also continue to work on support for on-card route caching, which
will significantly increase the upper limit on the number of routes to
unique destination networks that may be supported. This software will
be available with the AIX 3.2 operating system release of the router
software in early 1Q93.

Phase-IV T3 Network Upgrade Status
==================================

The scheduled upgrades to the T3 backbone discussed in the July report
are continuing on schedule and will allow the dismantling of the T1
backbone. The major features of this plan include:

1) T3 ENSS FDDI interface upgrades to the new RS960 card. This is
   currently being scheduled at 4 regional sites.

2) T3 ENSS backup connections. A T1 circuit will be installed at each
   T3 ENSS to allow a backup connection to a different CNSS. This will
   provide some redundancy in the case of a T3 circuit or primary CNSS
   failure. These circuits are scheduled for cut-in in October.

3) T3 DSU PROM upgrades. The new DSU firmware supports additional SNMP
   functions and fixes a few non-critical bugs. A problem was uncovered
   in testing this firmware, and a fix has since been provided.
   However, the testnet has been occupied with FDDI and other system
   testing since then, so the DSU upgrades that were scheduled to begin
   on 9/14 will be postponed until early October.

4) The existing set of CNSS routers in the Washington D.C. area will be
   moved to an MCI POP in downtown Washington D.C. on 9/12, for closer
   proximity to several ENSS locations. The tail circuits of the
   existing network attachments to this POP will be reduced to local
   access circuits only.

5) A new CNSS will be installed in Atlanta on 9/26 to reduce the GA
   Tech T3 tail to local access only, and to provide expansion
   capability in the southeast.

T3 Network Performance Enhancements
===================================

The general approach to engineering the T3 network has been to
prioritize enhancements that improve stability rather than performance.
Since the T3 network RS960 upgrade in May '92, the stability of the
network has become very good, and we have been able to spend more
resources on the performance of the network, which has also improved
significantly. With the upcoming deployment of the new RS960 FDDI
adapter, we expect to observe higher peak bandwidth utilization across
the T3 network and higher aggregate packet traffic rates. In
anticipation of this, we have conducted baseline performance
measurements on the T3 network that serve as a basis for continued
tuning and improvement over time.

T3 Network Delay
----------------

To analyze the delay across the T3 ANSNET, we start by measuring the
delay incurred by each T3 router hop, and then measure the circuit
propagation delay across all backbone circuits. We have MCI T3 circuit
route mileage figures which can be calibrated against PING measurements
to determine how much each hop through a T3 router adds to the round
trip time. A set of round trip delay measurements was made using a
special version of PING that records timestamps using the AIX system
clock with microsecond precision. The technical details of the
measurements may be described in a future report on the subject.

The end result is that the round trip transit delay through a T3 router
was measured to be about 0.33 ms (0.165 ms one way), with a maximum
variance across all samples on the same router of 0.03 ms. The T3
routers exhibit very little variance in delay at the current load on
the T3 network. The T3 router transit hop delay is therefore negligible
compared to the T3 circuit mileage propagation delay (see the sketch
following this section).

The round trip delay between the Washington POP and the San Francisco
POP can be 77 ms for packets traversing the southern route
(Washington->Greensboro->Houston->Los Angeles->San Francisco) or 67 ms
for packets traversing the northern route
(Washington->New York->Cleveland->Chicago->San Francisco).

During the timeframe of the "Hawthorne" technology routers, it was
appropriate to choose internal routing metrics that balanced load
across redundant T3 paths and minimized transit traffic on the routers.
With the RS960 technology, however, the requirement for load balancing,
minimizing transit traffic and hop count, and maintaining equal-cost
paths is no longer justified. With the introduction of the new Atlanta
CNSS, we will explore adjusting the internal T3 link metrics to
minimize ENSS<->ENSS round-trip latency. This will improve overall
network performance as perceived by end users.
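The sketch below decomposes one-way delay into router transit and fiber
propagation, using round numbers. The propagation speed (roughly 200
km/ms, about two-thirds of the speed of light) and the 7000 km route
length are illustrative assumptions; the 0.165 ms per-hop figure is the
measurement reported above.

    # Decompose one-way ANSNET delay into router hops vs. fiber mileage.
    FIBER_KM_PER_MS = 200.0   # assumed propagation speed in fiber
    HOP_DELAY_MS = 0.165      # measured one-way delay per T3 router hop

    def one_way_delay_ms(circuit_km, router_hops):
        """One-way delay: fiber propagation plus per-hop router transit."""
        return circuit_km / FIBER_KM_PER_MS + router_hops * HOP_DELAY_MS

    # A notional cross-country path: ~7000 km of fiber route, 8 router hops.
    print(7000 / FIBER_KM_PER_MS)     # 35.0 ms, dominated by mileage
    print(8 * HOP_DELAY_MS)           # 1.32 ms contributed by the routers
    print(one_way_delay_ms(7000, 8))  # ~36.3 ms one way (~73 ms round trip)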
The summary on T3 network latency is:

(1) Delays due to multiple hops in the T3 network are measurable, but
    not large enough to matter much. The observed T3 ANSNET one-way
    delay associated with a single T3 router hop is 0.165 ms (about
    1.3 ms of cross-country one-way delay over 8 router hops). This is
    negligible compared with the cross-country propagation delays
    (e.g. 35 ms one way): it would require the addition of 30 T3
    routers to a path to add 10 ms to the unloaded round trip time,
    given constant circuit mileage.

(2) For small packets, like the defaults for ping and traceroute, the
    round trip delay is mostly dependent on circuit mileage, and is
    relatively independent of bandwidth (for T1 and beyond, at least).

(3) All T3 links within the network are maintained at equal-cost link
    metrics regardless of physical mileage. This was designed during
    the timeframe when RS/6000 routers were switching packets through
    the system processor, and hop count and transit traffic through
    the router were important quantities to minimize. With the
    introduction of pure adapter-level switching (i.e. no RS/6000
    system processor involved in switching user datagrams), minimizing
    hop count and router transit traffic becomes less important, and
    minimizing overall ENSS<->ENSS delay becomes more important.

(4) The T3 ANSNET maintains two different physical circuit routes
    between Washington D.C. and Palo Alto. These are equal-cost paths
    and therefore split the traffic load between them, yet one of the
    physical routes is about 600 miles longer than the other. This can
    introduce asymmetric routes internal to the T3 network and
    sub-optimal latency. The T3 ANSNET circuits are physically diverse
    to avoid large-scale network failures in the event of a fiber cut,
    and compromising physical route diversity is not planned. However,
    some reduction of real T3 circuit mileage (and therefore about
    5 ms of delay) might be possible on the ANSNET with the
    installation of the Atlanta POP CNSS in September. ANS is
    conducting a review with MCI to determine whether the
    Washington->Greensboro->Houston->Los Angeles->Hayward physical
    route can be reduced in total circuit miles without compromising
    network route diversity. This might be possible as part of the
    plan to co-locate equipment within Atlanta.

T3 Network Throughput
---------------------

The RS960 adapter technology will support up to five T3 interfaces per
router, with an individual T3 interface operating at switching rates in
excess of 10K packets per second in each direction. The unit and system
tests performed prior to the April '92 network upgrade required the
CNSS routers to operate at aggregate switching rates above 50K packets
per second, and above 22 Mbps in each direction with an average packet
size of 270 bytes on a particular RS960 interface (the packet-rate
arithmetic is shown in the sketch below). The router has also been
configured and tested in the lab to saturate a full 45 Mbps T3 link.

The performance currently observed by individual end users on the T3
network is largely determined by their access to the network, which may
be via an ethernet or an FDDI local area network. Many users have
reported peak throughput of up to 10 Mbps across the T3 network using
ethernet access. Several of the T3 network attachments support an FDDI
local area network interface which, unfortunately, does not currently
yield more than 14 Mbps peak throughput across the T3 backbone.
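The packet-rate and bandwidth figures above are tied together by the
average packet size; the sketch below shows the arithmetic with the
numbers quoted in this section:

    # Relation between bandwidth, packet size, and packet rate.
    def packets_per_second(bits_per_second, packet_bytes):
        return bits_per_second / (packet_bytes * 8)

    # At the 270-byte average packet size used in the pre-upgrade tests,
    # 22 Mbps in one direction is roughly the quoted 10K packets per
    # second per RS960 T3 interface.
    print(packets_per_second(22e6, 270))   # ~10,185 pps
    print(packets_per_second(45e6, 270))   # ~20,833 pps to fill a full T3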
With the new RS960 FDDI adapter to be introduced in September,
end-to-end network throughput may exceed 22 Mbps in each direction
(limited by the T3 adapter). The initial RS960 FDDI card software will
support a 4000 byte MTU, which will be increased with subsequent
performance tuning. Further performance enhancements will be applied to
the T3 backbone in the fall and winter to move closer to peak 45 Mbps
switching rates for end-user applications.
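One way to see why the MTU matters for end-to-end throughput: larger
packets reduce the per-packet work (interrupts, header processing) that
end systems and routers must do to sustain a given bit rate. The sketch
below compares packet rates at 22 Mbps; the 1500 and 4352 byte values
are illustrative comparison points (typical ethernet and FDDI IP MTUs),
while 4000 bytes is the figure from this report:

    # Packet rate required to sustain a given throughput at various MTUs.
    def pps_for_throughput(mbps, mtu_bytes):
        return mbps * 1e6 / (mtu_bytes * 8)

    for mtu in (1500, 4000, 4352):
        print(mtu, round(pps_for_throughput(22.0, mtu)))
    # 1500 -> ~1833 pps; 4000 -> ~688 pps; 4352 -> ~632 pps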