Since when is BGP a bug-free protocol? Let's not forget the BGP best path selection algorithm itself is broken (there are circumstances under which it will NEVER converge on a best path; see the IETF draft on IDR route oscillation). Not to mention the various malformed AS-Path bugs which have shown up over the years. I took a vendor class once where they made us do a lab where we had to run BGP w/o an IGP; in a later revision of the class they removed that lab because they decided it was too much of a nightmare even for a lab environment.

-----Original Message-----
From: Iljitsch van Beijnum [mailto:iljitsch@muada.com]
Sent: Tuesday, September 03, 2002 10:39 AM
To: alex@yuriev.com
Cc: nanog@merit.edu
Subject: Re: AT&T NYC

On Tue, 3 Sep 2002 alex@yuriev.com wrote:
That is why their route is *nailed* via BGP to the router that *always* provides connectivity to them. If they have to move, BGP injectors are your friends. Takes seconds.
Talking about things that take seconds: would you mind sharing your BGP hold time values with us?
Iljitsch van Beijnum
Since when is BGP a bug-free protocol? Let's not forget the BGP best path selection algorithm itself is broken (there are circumstances under which it will NEVER converge on a best path; see the IETF draft on IDR route oscillation). Not to mention the various malformed AS-Path bugs which have shown up over the years. I took a vendor class once where they made us do a lab where we had to run BGP w/o an IGP; in a later revision of the class they removed that lab because they decided it was too much of a nightmare even for a lab environment.
BGP is not a bug-free protocol.
BGP is the easiest protocol to *debug* when the problem shows up.
BGP does not help to accidentally affect *unaffected* paths when a problem shows up.
It looks like everyone forgot the reason for this discussion to begin with. It is the outage caused by a mistake on a single router that affected parts of the network that were *NOT* affected by the original mess.
Please note that this discussion tends to get restarted whenever we have a real OSPF (or ISIS) caused mess.
Alex
You keep referring to the problem of OSPF causing the outage for AT&T and unaffected customers. The RFO that AT&T released simply states that OSPF network statements were removed. That can happen just as easily with static routes and BGP network/neighbor statements. OSPF did what it was instructed to do, just as BGP would have done if it were told to drop neighbors or networks.
-jf

On Tue, 3 Sep 2002 alex@yuriev.com wrote:
Since when is BGP a bug-free protocol? Let's not forget the BGP best path selection algorithm itself is broken (there are circumstances under which it will NEVER converge on a best path; see the IETF draft on IDR route oscillation). Not to mention the various malformed AS-Path bugs which have shown up over the years. I took a vendor class once where they made us do a lab where we had to run BGP w/o an IGP; in a later revision of the class they removed that lab because they decided it was too much of a nightmare even for a lab environment.
BGP is not a bug-free protocol.
BGP is the easiest protocol to *debug* when the problem shows up.
BGP does not help to accidentally affect *unaffected* paths when a problem shows up.
It looks like everyone forgot the reason for this discussion to begin with. It is the outage caused by a mistake on a single router that affected parts of the network that were *NOT* affected by the original mess.
Please note that this discussion tends to get restarted whenever we have a real OSPF (or ISIS) caused mess.
Alex
You keep referring to the problem of OSPF causing the outage for AT&T and unaffected customers. The RFO that AT&T released simply states that OSPF network statements were removed. That can happen just as easily with static routes and BGP network/neighbor statements.
OSPF did what it was instructed to do, just as BGP would have done if it were told to drop neighbors or networks.
OSPF network statements were removed on one router, according to the RFO, which I have received. Can you please explain to me why customers in other *cities*, who clearly were terminated into different routers, were affected? Since we know from our empirical observation that it did happen, it can be concluded that AT&T has a bad network design. It does not matter *why* customers who were not terminated into the affected routers could not use AT&T's network. What matters is that they *could not* use AT&T's network because AT&T's engineering made a choice of using a broken design.

This broken design is going to cost AT&T a couple of million. Hopefully, at some point a VP of Engineering for AT&T is going to realize that his job is going to be on the line if stuff like this keeps happening, at which point certain engineers within AT&T are going to get their heads handed back to them on a platter. Again, hopefully at that point, those who remain at AT&T will realize that their existing design is broken and that another outage is going to cost them their jobs, and will redo it. In the end we are going to have a lot more stability on the internet.

As for BGP doing the same thing: would you mind describing a configuration of BGP where deletion of a network statement in one router would cause unreachability across paths that do not *rely* on that network statement?
Alex
As for BGP doing the same thing: would you mind describing a configuration of BGP where deletion of a network statement in one router would cause unreachability across paths that do not *rely* on that network statement?
Since you have replaced OSPF/IS-IS/RIP or any other dynamic IGP with static and connected, an apples-to-apples comparison would be: "Can you describe a configuration where removing 1 static route on 1 router would cause unreachability for other paths?"
Alex
As for BGP doing the same thing: would you mind describing a configuration of BGP where deletion of a network statement in one router would cause unreachability across paths that do not *rely* on that network statement?
Since you have replaced OSPF/IS-IS/RIP or any other dynamic IGP with static and connected, an apples-to-apples comparison would be: "Can you describe a configuration where removing 1 static route on 1 router would cause unreachability for other paths?"
Again, this is fully dynamic routing.
Alex
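
The question argued over above (does removing one router's network statements hurt paths that do not rely on them?) can be made concrete with a toy model. The sketch below is only an illustration under loud assumptions: the routers `r1`, `r2`, `r3`, the links, and the documentation prefixes are hypothetical and say nothing about AT&T's actual network. It models just one mechanism, namely that a BGP route is usable only while its next hop is reachable through the IGP, and that a router with its network statements removed drops out of the IGP.

```python
from collections import deque

def igp_reachable(links, advertised, src):
    """Routers whose address src can reach through the toy IGP.

    A router takes part in the IGP only if it is in `advertised`
    (a stand-in for still having its network statements configured).
    """
    if src not in advertised:
        return set()
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        for a, b in links:
            if node in (a, b):
                peer = b if node == a else a
                if peer in advertised and peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
    return seen

def usable_prefixes(bgp_routes, links, advertised, src):
    """BGP prefixes usable at src: those whose next hop resolves in the IGP."""
    reachable = igp_reachable(links, advertised, src)
    return {p for p, next_hop in bgp_routes.items() if next_hop in reachable}

# Hypothetical data: each prefix is "nailed" to one router as its next hop.
bgp_routes = {"192.0.2.0/24": "r1", "198.51.100.0/24": "r2", "203.0.113.0/24": "r3"}
chain = [("r1", "r2"), ("r2", "r3")]   # r2 is the only way from r1 to r3
ring = chain + [("r1", "r3")]          # same routers plus a redundant link

# Simulate the mistake: r2's network statements are removed.
after_mistake = {"r1", "r3"}
chain_after = usable_prefixes(bgp_routes, chain, after_mistake, "r1")
ring_after = usable_prefixes(bgp_routes, ring, after_mistake, "r1")

print("chain:", sorted(chain_after))
print("ring: ", sorted(ring_after))
```

In the chain, the prefix behind the untouched router `r3` is lost as well, because the only path to it relied on `r2`; in the ring, only `r2`'s own prefix disappears. In this toy model the blast radius comes from which paths depend on the broken router rather than from which protocol carried the statement.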
On Tue, 3 Sep 2002, Frank Scalzo wrote:
Since when is BGP a bug-free protocol? Let's not forget the BGP best path selection algorithm itself is broken
Actually, the RFC says the route selection algorithm is a local matter, so if it's broken on your network, then strictly speaking, it's your own fault.
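
Both points in this exchange, that path selection can fail to converge and that the outcome depends on local policy, can be seen in a very small simulation. The sketch below is not the MED/route-reflection scenario that the IDR route oscillation draft describes. It swaps in a simpler policy-conflict example, the three-AS "BAD GADGET" from Griffin and Wilfong's work on BGP convergence. The AS numbers, the `PREFERRED_VIA` policy table, and the synchronous update rounds are all assumptions made for illustration.

```python
# Hypothetical setup: ASes 1, 2 and 3 each have a direct path to AS 0.
# Assumed local policy: each AS prefers the two-hop path through one
# particular neighbour over its own direct path.
PREFERRED_VIA = {1: 2, 2: 3, 3: 1}

def best_path(asn, current):
    """Pick asn's best permitted path given the neighbours' current paths."""
    via = PREFERRED_VIA[asn]
    # The preferred two-hop path exists only while that neighbour
    # is itself using its direct path to AS 0.
    if current[via] == (via, 0):
        return (asn, via, 0)
    return (asn, 0)  # otherwise fall back to the direct path

def run(max_rounds=20):
    """Re-run selection in synchronous rounds and report what happens."""
    state = {asn: (asn, 0) for asn in PREFERRED_VIA}
    history = [state]
    for round_no in range(1, max_rounds + 1):
        new_state = {asn: best_path(asn, state) for asn in state}
        if new_state == state:
            return "converged", round_no, history
        if new_state in history:
            # A previously seen state came back: the system is cycling.
            return "oscillating", round_no, history + [new_state]
        history.append(new_state)
        state = new_state
    return "undecided", max_rounds, history

status, rounds, history = run()
print(status, "after", rounds, "rounds")
for step in history:
    print(step)
```

Every AS applies a perfectly sensible local rule, yet the three rules together have no stable outcome: as soon as all neighbours go direct, everyone switches to the two-hop path, which makes those paths vanish again. Real BGP speakers update asynchronously, so the exact sequence differs, but the absence of a stable state is the same.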
participants (5)
- alex@yuriev.com
- bdragon@gweep.net
- Feger, James
- Frank Scalzo
- Iljitsch van Beijnum