RS/960 upgrade status report for Week 2
Phase-III RS960 Deployment Status Report - Step 2
=================================================

Jordan Becker, ANS
Mark Knopper, Merit

Step 2 of the phase-III network upgrade was successfully completed last
Saturday 5/2. The following T3 backbone nodes are currently running
with new T3 hardware and software in a stable configuration:

    Seattle POP:    CNSS88, CNSS89, CNSS91
    Denver POP:     CNSS96, CNSS97, CNSS99
    San Fran. POP:  CNSS8, CNSS9, CNSS11
    L.A. POP:       CNSS16, CNSS17, CNSS19
    Regionals:      ENSS141 (Boulder), ENSS142 (Salt Lake),
                    ENSS143 (U. Washington), ENSS128 (Palo Alto),
                    ENSS144 (FIX-W), ENSS135 (San Diego)

CNSS8, CNSS16, CNSS96 are now running with mixed technology (e.g.
3xRS960 T3 interfaces, 1xHawthorne T3 interface).

Production traffic on the affected Bay Area ENSS nodes was cut over to
the T1 backbone at 2:00 AM EST on 5/2. Production traffic on ENSS135
was cut over two hours earlier. The San Francisco and Los Angeles POP
nodes were returned to full service by 10:50 AM EST on 5/2, well within
the planned maintenance window.

The maintenance in the Los Angeles POP was complicated by the curfew in
effect at that time. Normally a specially trained 2-3 person deployment
team is scheduled to perform these upgrades at each POP location.
Because of the circumstances in Los Angeles, a special IBM engineer
(Carl Kraft) from Gaithersburg, Maryland was deployed to the Los
Angeles POP to perform the upgrade by himself. Carl was able to upgrade
the node single-handedly on schedule.

Several new procedures were developed following the first deployment
step in Seattle and Denver on 4/25. These procedures helped to reduce
the installation window and the number of installation problems
relative to those experienced on 4/25.

The only problem experienced was the suspected failure of a single
RS960 adapter during the installation at ENSS128 (Palo Alto). This
problem was isolated to the adapter within several minutes, and the
adapter was swapped, resulting in successful operation of the node.
A subsequent failure analysis of the RS960 adapter has not reproduced
any problem, and the failure has been attributed to improper seating of
the adapter in ENSS128 during the initial installation.

Next Steps
==========

Based upon the successful completion of step 2 of the deployment,
step 3 is currently scheduled to commence at 23:00 local time on 5/8.
Step 3 will involve the following nodes/locations:

    Chicago POP:        CNSS24, CNSS25, CNSS27
    Cleveland POP:      CNSS40, CNSS41, CNSS43
    New York City POP:  CNSS32, CNSS33, CNSS35
    Hartford POP:       CNSS48, CNSS49, CNSS51
    San Fran. POP:      CNSS8 (Second visit to CNSS8->CNSS24 Interface)
    Regionals:          ENSS130 (Argonne), ENSS131 (Ann Arbor),
                        ENSS132 (Pittsburgh), ENSS133 (Ithaca),
                        ENSS134 (Boston), ENSS137 (Princeton)

    Other ENSS's Affected:
                        E152, E162, E154, E158, E167, E168, E171,
                        E172, E163, E155, E160, E161, E164, E169

The system software (build 2.78.22) required to support RS960
installations has been fully deployed to all T3 network nodes as of
early this week. New rcp_routed software has also been installed on all
T3 nodes, although this is not a prerequisite for any phase-III
deployment activities. The new rcp_routed software has enhancements
including support for externally administered inter-AS metrics, an
auto-restart capability, and a fix for the invalid acceptance by the
ENSS of a route to itself from a peer.

Following the step 3 deployment, selected T3 internal link metrics will
be adjusted to support load balancing of traffic across the 5 different
hybrid technology links that will exist. These link metrics have been
chosen through a calculation of traffic distributions on each link
based upon an AS<->AS traffic matrix.

This step of the deployment involves 4 POPs, and will complete the
coast-to-coast RS/960 path for a large proportion of the backbone
traffic.

During the step 3 deployment, the Ann Arbor ENSS will be isolated from
the T3 backbone.
Since the Merit/ANS NOC is located in Ann Arbor, and Merit's backup
connectivity to the backbone will be through the T1 network, we are
implementing a backup network management machine. The "rover"
monitoring tool is running on an unused RS/6000 CNSS at the Denver POP,
and its data collection capability will be used if there is any problem
with Merit's connection to the T3 backbone.

Also during this deployment the Princeton ENSS will be isolated from
the backbone. This means that both the Ann Arbor T1/T3 interconnect
gateway and its backup at Princeton will not be operational. Therefore
on Friday night we will run the Houston interconnect as primary, and
temporarily configure the San Diego interconnect gateway as secondary
with load sharing being handled as with the Ann Arbor/Houston
configuration.
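
The interconnect gateway choice for Friday night can be read as "take
the first two gateways that are not isolated". The Python sketch below
is only an illustration of that reasoning, not the procedure actually
used by the NOC. The function name select_gateways and the preference
ordering of the four sites are assumptions made for the example; only
the site names and the resulting Houston/San Diego pairing come from
the report above.

```python
# Hypothetical preference order for T1/T3 interconnect gateways.
# The ordering is an assumption; only the site names are from the report.
GATEWAY_PREFERENCE = ["Ann Arbor", "Houston", "Princeton", "San Diego"]


def select_gateways(preference, isolated):
    """Return (primary, secondary): the first two gateways not isolated."""
    available = [site for site in preference if site not in isolated]
    if len(available) < 2:
        raise ValueError("fewer than two interconnect gateways available")
    return available[0], available[1]


# Step 3 isolates the Ann Arbor and Princeton ENSS nodes.
primary, secondary = select_gateways(
    GATEWAY_PREFERENCE, {"Ann Arbor", "Princeton"}
)
print(primary, secondary)
```

With Ann Arbor and Princeton excluded, the sketch selects Houston as
primary and San Diego as secondary, matching the temporary
configuration described above.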