On Tue, 26 June 2001, Clayton Fiske wrote:
Plus, a CRC error can occur between two valid, compliant, bug-free implementations. A bad route, by definition, can't. We're not talking about external faults here, but broken implementations. When one side of a protocol session simply breaks the rules, I don't think it's reasonable to say that the other side needs to be "fixed" to accept that breakage. Fix the broken side.
Uhm, let's see what you think of this press announcement. "Pardon us, we must shut down the Internet while we decide whose software needs to be fixed. Don't worry, as soon as it is fixed, the Internet will be rebooted." Yep, somebody's implementation was broken. One part of the response is to fix their implementation. While we were waiting for the fix, the rest of the Internet should not have been flapping.
[Note: I am not trying to bash on Cisco for this. Everyone has their bugs; this one is just good for illustrating the point.]
On Tue, Jun 26, 2001 at 07:29:24PM -0700, Sean Donelan wrote:
On Tue, 26 June 2001, Clayton Fiske wrote:
Plus, a CRC error can occur between two valid, compliant, bug-free implementations. A bad route, by definition, can't. We're not talking about external faults here, but broken implementations. When one side of a protocol session simply breaks the rules, I don't think it's reasonable to say that the other side needs to be "fixed" to accept that breakage. Fix the broken side.
Uhm, let's see what you think of this press announcement.
"Pardon us, we must shut down the Internet while we decide whose software needs to be fixed. Don't worry, as soon as it is fixed, the Internet will be rebooted."
That's funny, I thought it was pretty clear who was broken here. How about one that more accurately reflects the situation? "A major vendor has discovered a bug in their BGP implementation which can cause unnecessary instability in the event of a malformed announcement. While this has been a relatively rare occurrence thus far, they have already issued a patch to correct this behavior. Network providers are encouraged to apply this patch as soon as possible."
Yep, somebody's implementation was broken. One part of the response is to fix their implementation. While we were waiting for the fix, the rest of the Internet should not have been flapping.
I happen to have some Vendor X routers on my network, and I sure didn't notice The Internet flapping. I guess I'm in the small section that's not a downstream or direct peer of the offending network.
I don't object to the discussion of changing the RFC (whether I agree or not), and I accept that Vendor [everyone else except Cisco] having a knob for this would have prevented some routing disruptions for some networks. But then again, static routes would have prevented that too. That doesn't mean they're a good idea. What I object to is people using this particular case as justification for said discussion.
Suppose the bug in question had manifested itself differently. Suppose it thought the announcement was malformed when in fact it was correct. Suppose it behaved the same way, by passing on the announcement and then dropping the originating session. All of the same BGP sessions on the offending provider's [presumably] homogeneous network would still have dropped. Sure, the border router's session with Vendor X would have stayed up, but the border router probably wouldn't have had any routes left to feed to Vendor X, because its iBGP mesh would have been bouncing all over the place. Would it then be the fault of Vendor X for not having a knob? The damage would have been essentially the same: major routing disruptions for traffic transiting that network.
My point is that the nature of this bug was particularly nasty, and it bit several people when it was triggered. However, it was only triggered because of someone else's protocol violation. If we are going to decide that the RFC needs to change to allow for what -might- happen when one implementation's bug triggers another implementation's bug to wreak havoc, I think we're going to be here for a long time. Had the first router(s) to receive the malformed route behaved as the RFC dictates and dropped the offending session, the damage would have been limited to the offending router and its downstreams only.
I don't agree that this behavior makes The Internet more brittle. This is my final post on the subject. I will be happy to continue the discussion privately if anyone wishes. -c