Announcing BGP troubleshooting work for ISPs
Dear all,

We (jointly with AT&T Research Lab) have recently developed a real-time troubleshooting tool that identifies significant and actionable BGP routing events (on the order of a few dozen) from the millions of BGP updates received from the border routers of a given ISP network. The goal is to help an ISP's network operators identify locally observed BGP events that are important (e.g., events that affect a large number of prefixes or shift a lot of traffic).

Our paper was published at NSDI:
http://www.eecs.umich.edu/~zmao/Papers/nsdi05-jian.pdf

The talk slides are at:
http://www.eecs.umich.edu/~zmao/talks/nsdi2005.ppt
http://www.eecs.umich.edu/~zmao/talks/nsdi2005.pdf

We had several interesting findings when we applied our tool to the AT&T data:

- We found that more than 15% of the updates are due to persistently flapping prefixes, even when flap damping was enabled. The reason is that flap damping is session-based: when a session is reset, the damping history is not retained. Moreover, damping is not implemented for iBGP sessions. There are three main causes of persistent flapping:
  (1) Conservative damping parameters
  (2) Protocol oscillations due to MED
  (3) Unstable interfaces or BGP sessions

- We found that eBGP session resets and hot-potato changes contribute to many routing disturbances, and that most of the routing events with a major impact on traffic shifts are also due to session resets and hot-potato events.

Please let us know if you have any comments or feedback. Unfortunately, the tool is not yet available, but a detailed description of how it works is available in the paper.

Thanks!
-Z. Morley Mao, Jian Wu, Jennifer Rexford, Jia Wang
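
To illustrate why session-based damping lets a persistently flapping prefix through after a session reset, here is a minimal Python sketch of penalty-based flap damping. It is not taken from the paper or from any router implementation: the class names (DampingState, Session) and the parameter values (penalty per flap, suppress and reuse limits, half-life) are assumptions chosen for illustration only.

```python
# Illustrative parameters (assumed for this sketch, not from the paper).
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0   # suppress the route once the penalty exceeds this
REUSE_LIMIT = 750.0       # release the route once the penalty decays below this
HALF_LIFE_S = 900.0       # the penalty halves every 15 minutes


class DampingState:
    """Damping history for one prefix, as kept by one BGP session."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponentially decay the penalty for the time elapsed since last use.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now):
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        return self.suppressed


class Session:
    """Hypothetical BGP session holding per-prefix damping history."""

    def __init__(self):
        self.history = {}  # prefix -> DampingState

    def flap(self, prefix, now):
        self.history.setdefault(prefix, DampingState()).flap(now)

    def is_suppressed(self, prefix, now):
        state = self.history.get(prefix)
        return state is not None and state.is_suppressed(now)

    def reset(self):
        # A session reset discards all damping history.
        self.history.clear()


if __name__ == "__main__":
    prefix = "192.0.2.0/24"
    session = Session()
    for t in (0, 60, 120):          # three flaps, one minute apart
        session.flap(prefix, t)
    print(session.is_suppressed(prefix, 180))   # suppressed before the reset
    session.reset()
    session.flap(prefix, 200)                   # the prefix flaps again
    print(session.is_suppressed(prefix, 200))   # history is gone, not suppressed
```

With these assumed values, three flaps in quick succession push the penalty over the suppress limit, but after reset() the same prefix starts again from a zero penalty, so its next flaps are propagated until the penalty builds up once more.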