RS/960 Upgrade for T3 Backbone - 5th Week of 5
Phase-III T3 Network Deployment - Step 4 Status Report
======================================================

Jordan Becker, ANS
Mark Knopper, Merit

Step 4 of the phase-III network deployment was successfully completed last Saturday, 5/16. All T3 nodes were back online within the scheduled maintenance window, with two exceptions: the Houston CNSS67 T1 concentrator, due to a software configuration problem, and ENSS129 (Champaign), due to a T3 circuit problem. The upgrade of all other nodes was completed by 10:00 EST on 5/16.

The following T3 backbone nodes are currently running with new T3 RS960 hardware and software in a stable configuration:

Seattle POP:        CNSS88, CNSS89, CNSS91
Denver POP:         CNSS96, CNSS97, CNSS99
San Fran. POP:      CNSS8, CNSS9, CNSS11
L.A. POP:           CNSS16, CNSS17, CNSS19
Chicago POP:        CNSS24, CNSS25, CNSS27
Cleveland POP:      CNSS40, CNSS41, CNSS43
New York City POP:  CNSS32, CNSS33, CNSS35
Hartford POP:       CNSS48, CNSS49, CNSS51
St. Louis POP:      CNSS80, CNSS81, CNSS83
Houston POP:        CNSS64, CNSS65, CNSS67
Regionals:          ENSS141 (Boulder), ENSS142 (Salt Lake), ENSS143 (U. Washington)
                    ENSS128 (Palo Alto), ENSS144 (FIX-W), ENSS135 (San Diego)
                    ENSS130 (Argonne), ENSS131 (Ann Arbor), ENSS132 (Pittsburgh),
                    ENSS133 (Ithaca), ENSS134 (Boston)
                    ENSS137 (Princeton), ENSS129 (Champaign), ENSS140 (Lincoln)
                    ENSS139 (Rice)

The CNSS32, CNSS48, and CNSS64 nodes are now running with mixed technology (three RS960 T3 interfaces and one Hawthorne T3 interface).

Step 4 Deployment Difficulties
==============================

The RS960 week 4 deployment was successful with two exceptions. The first problem involved the link between ENSS129 (Champaign) and CNSS81 (St. Louis POP). The second problem was the T1 concentrator in Houston (CNSS67), which affected ENSS174 (IBM Austin) and ENSS173 (ITESM).

On ENSS129, a jumper was initially found to be missing from the HSSI board, and the jumper was added. ENSS129 was then successfully brought online at 6:20 EST.
Later in the morning, the ENSS129<->CNSS81 link began to experience packet loss, and the DSUs reported coding violations and bipolar violations on the link. The problem turned out to be a bad DS3 radio circuit between ENSS129 and CNSS81; MCI swapped the radio channel to clear it.

At the Houston POP, CNSS64 and CNSS65 were upgraded and came up without problems, as did ENSS139 (Rice U.). However, CNSS67 (the T1 concentrator) had complex problems. CNSS67 was taken down at around 00:00; the mechanical modifications were completed and bootup began at 2:15 EST. The machine rebooted and we started troubleshooting. All RS960 cards and the planar board were replaced in various combinations. After considerable analysis of the ODM configuration database, we suspected ODM problems and chose to reinstall the system software from a tape built on another machine. The machine came right up on the first try at 14:00 EST with the original hardware installed (including the original new RS960 card). This was clearly a system software corruption problem, and no hardware failures are suspected.

At the St. Louis POP, CNSS80, CNSS81, and CNSS83 were all upgraded and came up without any problems at 4:45 EST, 3 hours ahead of schedule.

At the Denver POP, a loose serial port connector delayed the upgrade start by a few minutes, and we could not access the DSU through the out-of-band modem. An ODM adapter configuration problem, in which cards were coming up in the wrong slots, was also fixed. These problems were resolved, and the Denver maintenance was completed well within the scheduled window.

By 9:35 EST, routing was enabled and traffic began flowing over the southern route (through the Houston POP), even though the Houston T1 concentrator was still down. This allowed the RS960 hybrid link upgrade at CNSS24 in the Chicago POP and the scheduled RS960 card replacement in CNSS25 to begin on schedule.
When CNSS24 would not come up cleanly with the new RS960 card, the card was replaced without attempting to troubleshoot the system or the card. The removed RS960 card will be returned to IBM for failure analysis. The RS960 card in CNSS25 was replaced as scheduled because of the DMA underruns we observed last week. This installation went as planned, and the machine came up at 10:50 EST.

New T3 internal link metrics have been installed to support load balancing of traffic across the three hybrid technology links that now exist (CNSS64<->CNSS72, CNSS48<->CNSS72, CNSS32<->CNSS56).

Step 5 Deployment Scheduled for 5/22
====================================

Based on the successful completion of step 4, step 5 is currently scheduled to commence at 23:00 local time on 5/22. Step 5 is the final phase-III upgrade step and will complete the deployment. It will involve the following nodes/locations:

Greensboro POP:        CNSS72, CNSS73, CNSS75
Washington D.C. POP:   CNSS56, CNSS57, CNSS58, CNSS59
2nd Site Visit:        CNSS32 (New York POP), CNSS64 (Houston POP), CNSS48 (Hartford POP)
Regionals:             ENSS138 (Georgia Tech.), ENSS136 (College Park), ENSS145 (FIX-E)
Other Nodes Affected:  ENSS150 (Concert), ENSS151, ENSS153, ENSS166

Following the step 5 deployment, all T3 internal link metrics will be readjusted to their normal values, since no hybrid technology links will remain.