E.B. Dreger wrote:
Date: Wed, 27 Jun 2001 13:21:20 -0400
From: Matt Levine <matt@deliver3.com>
Agreed, so throw the bad route to the bit bucket and leave the bgp session open, or at the very least (as others have suggested) give me an OPTION to do that. Bad enough we were only operating at 33% capacity; however, if we only had transit from the 4 that were giving us the bad route, we would have lost connectivity totally. While it
<imesho>
On the surface, this appears to be correct.
But let's ask ourselves _why_ those upstreams had bad routes. It's because _they_ did not filter at the edge. If bad routes leak, but are filtered before reaching the core, then they never make it to you.
IOW, your concern is a non-issue if the large providers apply similar filtering at the edge. You wouldn't be cutting yourself off because the provider in question would have filtered it long ago.
Correct. However, this means I have to place my complete trust in them to Do Things Right (well, them, and more importantly in this case, their vendors). As Saturday has demonstrated, this is not a safe assumption: there appears to be a significant number of boxes in the core which will propagate bad routing data, even while they are resetting the sessions it came from. (Note: I'm not saying it's Cisco. It might be; historically, Ciscos have done this before. But I have no direct evidence that they did, or didn't; only the inference that it had to be *something* used on a very widespread basis, given the number of peers that had the problem simultaneously. Oh, and I *do* know, from direct observation, that the Ciscos facing us were either causing this bug themselves (possible, but it doesn't seem terribly likely given the spread of them), or transiting the route to us when they should have been ditching it, along with the session.)
Do it at the edge, and the Internet does not become any more brittle.
The same with source-filtering IPs. Do it at the edge, and the problem goes away.

Now, *how* long has it taken to implement this?

Someone said, a few messages ago, that the purpose of a routing protocol is to avoid loops. I disagree. The purpose of a routing protocol is to propagate good, viable routing information. Thus, it MUST have a way to deal with bad routing information, but it SHOULD (IMO) have a way to deal with said information that is not necessarily fatal. We have quite clearly demonstrated that it is a non-trivial possibility that A) bad routes will manage to become widespread, through various bugs, and B) it is possible to have one or two bad routes in an otherwise useful table of 100,000 routes. When reality says the basis of your design theory is inaccurate, well, it's time to look at revamping the design to accommodate it, if that can be done without trashing the whole thing (sometimes even if it takes that, but I see no call for it in this case, as it's not that severe, and it is entirely fixable without tossing out everything that has worked so far).
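To make the two behaviours concrete, here is a minimal sketch in Python. It is a toy model, not how any vendor's BGP implementation works: the `Session` class, the `Policy` names, and `toy_validator` (which simply treats an empty AS path as "bad") are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    RESET_SESSION = "reset"    # strict: one bad route tears down the session
    DISCARD_ROUTE = "discard"  # lenient: drop only the bad route, keep the session


@dataclass
class Session:
    """Toy model of one peering session (hypothetical, for illustration)."""
    peer: str
    policy: Policy
    up: bool = True
    table: dict = field(default_factory=dict)      # prefix -> attributes
    discarded: list = field(default_factory=list)  # prefixes thrown away

    def receive(self, prefix, attrs, is_valid):
        """Process one announcement; is_valid is a caller-supplied check."""
        if not self.up:
            return  # session is down, nothing is accepted
        if is_valid(prefix, attrs):
            self.table[prefix] = attrs
            return
        if self.policy is Policy.RESET_SESSION:
            # Everything learned from this peer is lost with the session.
            self.up = False
            self.table.clear()
        else:
            # Bit-bucket the bad route; also forget any older route for the
            # same prefix so we do not keep forwarding on stale information.
            self.table.pop(prefix, None)
            self.discarded.append(prefix)


def toy_validator(prefix, attrs):
    # Assumption for this sketch only: an empty AS path counts as malformed.
    return len(attrs.get("as_path", [])) > 0


def feed(policy, updates):
    """Run a list of (prefix, attrs) updates through a fresh session."""
    session = Session("upstream-1", policy)
    for prefix, attrs in updates:
        session.receive(prefix, attrs, toy_validator)
    return session


updates = [
    ("10.0.0.0/8", {"as_path": [65001]}),
    ("192.0.2.0/24", {"as_path": []}),  # the one bad route
    ("198.51.100.0/24", {"as_path": [65001, 65002]}),
]
strict = feed(Policy.RESET_SESSION, updates)
lenient = feed(Policy.DISCARD_ROUTE, updates)
```

With these three updates, the strict session ends up down with an empty table, while the lenient one stays up with the two good routes and a record of the one it discarded. Scale the good routes up to 100,000 and the trade-off is the same.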
As for making money... if the general agreement is that "BGP death penalty" is correct, let the violators and bad BGP speakers face the consequences of spewing garbage.
When the violators are "Almost every major transit provider", this means you'll be off in a corner playing Internet by yourself. This isn't very attractive to most potential customers, no matter how RFC compliant you are. Again, Saturday showed that this is, in fact, the case. I would love to see the core problem fixed, and never *need* to invoke anything that ditches single bad routes because the only breakages occur when a peer goes completely nuts and spews garbage at me. Unfortunately, this hasn't been the case for a long time now, and doesn't appear terribly likely to be fixed tomorrow, given what the press releases have said about various vendors...

--
***************************************************************************
Joel Baker
System Administrator - lightbearer.com
lucifer@lightbearer.com
http://www.lightbearer.com/~lucifer