AS690 Gated/BGP4 Deployment Status

AS690 Gated Installation Summary - Sunday Feb. 13th 08:10EST
================================

The AS690 gated deployment this past evening was very successful. We are now running gated in production on ENSS205, ENSS194, ENSS160, ENSS131, ENSS139, CNSS120, and ENSS158. These gated nodes now interoperate with the rcp_routed IGP & IBGP2, external BGP2 and external BGP4 (3 peers at Rice U. thanks to Bill Manning), and several EGP peers. We are also monitoring the appropriate gated MIBs.

We found some minor problems along the way, some of which will have to be fixed before the next scheduled deployment, but we did not see anything that would result in operational problems or require us to back out to rcp_routed.

We would like to schedule the next deployment for Tuesday morning Feb 15th (05:00-08:00EST), and to deploy gated on ENSS136, ENSS145, and ENSS144 during this window. Once we successfully complete installation on these nodes, we would like to deploy across the rest of the AS690 system. This will require a bit of work on the Policy Routing Database to ensure that we don't have to make any manual corrections to the gated configuration files as we go along.

The problems observed during the Sunday morning gated deployment were:

1. Gated "passive" connection option not working.

   Rcp_routed does not actively try to establish sessions with external routers, and instead waits for them to establish the connection. Gated tries to actively connect to all configured external peers unless the "passive" configuration option is used. The passive configuration option in gated does not seem to be working, and we observed a few startup connection wars between ENSS131 and its BGP peers, and likewise with ENSS139. The problem occurs when gated and its external peers try to establish connections at the same time. This settles down after a couple of minutes, and the connections stay up once established. This will be fixed in gated.

2. Incomplete link state database after some gated initializations.

   We observed on ENSS194 and ENSS139 that after some initializations of gated we do not get all of the LSP packets, and the link state database is incomplete. In the case of ENSS194, it did not get all of its adjacencies when gated was first started. This worked itself out on its own after a few minutes. On ENSS139 we saw the same thing, only it did not work itself out until after we restarted gated. This most likely has something to do with the way rcp_routed establishes adjacencies. We did not see this on the testnet, and we have no easy way to debug this, but once stable, the gated systems seem to stay that way. For now, we will address this problem by restarting gated if it gets into this state following gated initialization, and migrate away from rcp_routed as soon as possible.

3. Gated crashes on dynamic reconfiguration with multiple EGP peers.

   We found a bug in the gated dynamic reconfiguration where gated will crash if we try to reconfigure on the fly on gated systems that support multiple EGP peers. Rcp_routed does not support the same level of dynamic reconfiguration as gated. We were able to reproduce this problem consistently on ENSS139 (Houston). This is a bug that we would like to fix before the next wave of gated deployment.
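
As an illustration of problem 1 only, the sketch below models who initiates a session when each side is either active or passive. It is a simplified model, not gated or rcp_routed code: the names `Outcome` and `startup_outcome` are hypothetical, and it assumes that a passive side never initiates a connection.

```python
from enum import Enum


class Outcome(Enum):
    """Possible startup results for one pair of peers (hypothetical model)."""
    NO_SESSION = "neither side initiates, so no session forms"
    SINGLE_CONNECT = "exactly one side initiates, so no collision is possible"
    POSSIBLE_COLLISION = "both sides initiate, so simultaneous opens can collide"


def startup_outcome(local_passive: bool, peer_passive: bool) -> Outcome:
    """Classify session startup from the passive/active setting of each side.

    Assumption: a passive side only waits for an incoming connection, and an
    active side always tries to connect.
    """
    # Count how many of the two sides will try to open the connection.
    initiators = int(not local_passive) + int(not peer_passive)
    if initiators == 0:
        return Outcome.NO_SESSION
    if initiators == 1:
        return Outcome.SINGLE_CONNECT
    return Outcome.POSSIBLE_COLLISION


if __name__ == "__main__":
    # Passive local side (the rcp_routed behaviour described above) with an
    # active external peer.
    print(startup_outcome(local_passive=True, peer_passive=False).value)
    # Local side that ignores "passive" (the gated behaviour observed) with an
    # active external peer.
    print(startup_outcome(local_passive=False, peer_passive=False).value)
```

In this model the connection wars seen on ENSS131 and ENSS139 correspond to the last case: with the passive option not taking effect, both gated and its external peers initiate at the same time.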
Jordan Becker