RE: Global BGP - 2001-06-23 - Vendor X's statement...
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jim,
Agreed, so throw the bad route to the bit bucket and leave the bgp session open, or at the very least (as others have suggested) give me an OPTION to do that. Bad enough we were only operating at 33% capacity, however, if we only had transit from the 4 that were giving us the bad route, we would have lost connectivity totally.
While it would've been really cool to post an outage notification bragging about our RFC compliance, and how it's everybody else's fault, I (personally) would have preferred to stay connected to the internet and not be losing revenue. Perhaps I just have my priorities wrong.
Matt
- --
Matt Levine
@Home: matt@deliver3.com
@Work: matt@eldosales.com
ICQ : 17080004
PGP : http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6C0D04CF
- -----Original Message-----
From: Jim Segrave [mailto:jes@nl.demon.net]
Sent: Wednesday, June 27, 2001 5:36 AM
To: Matt Levine
Cc: 'Chance Whaley'; nanog@merit.edu
Subject: Re: Global BGP - 2001-06-23 - Vendor X's statement...
On Tue 26 Jun 2001 (15:09 -0400), Matt Levine wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
What I would like is for my routers to not drop 4 of our 6 transit providers, RFC, standard, not standard, whatever. We've suggested to our vendor that there at least be some option to control this; we are not at the core, we are an end user. When following the RFC dictates that our routing equipment loses connectivity to the internet, then I say that there is a problem. It's really nice that they can say "it's not a bug, it's a feature", but this is a feature I'd at the very least like to have the ability to turn off.
Matt
So you'd prefer to propagate the error to all of your peers, who, in the interests of interoperability, are standards compliant? A bad announcement should be stopped as soon as it's discovered, not propagated because it's inconvenient to drop the session on someone's network.
- --
Jim Segrave jes@nl.demon.net
-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>
iQA/AwUBOzoVx8p0j1NsDQTPEQKGLQCfd9wIWwhyDYgD/5ObtpOMl4IZWZAAn3R2
HsLf2EeNXCn0R6ZChnKdBPpk
=9Sc3
-----END PGP SIGNATURE-----
Date: Wed, 27 Jun 2001 13:21:20 -0400
From: Matt Levine <matt@deliver3.com>
Agreed, so throw the bad route to the bit bucket and leave the bgp session open, or at the very least (as others have suggested) give me an OPTION to do that. Bad enough we were only operating at 33% capacity, however, if we only had transit from the 4 that were giving us the bad route, we would have lost connectivity totally. While it
<imesho>
On the surface, this appears to be correct.
But let's ask ourselves _why_ those upstreams had bad routes. It's because _they_ did not filter at the edge. If bad routes leak, but are filtered before reaching the core, then they never make it to you.
IOW, your concern is a non-issue if the large providers apply similar filtering at the edge. You wouldn't be cutting yourself off because the provider in question would have filtered it long ago.
Do it at the edge, and the Internet does not become any more brittle.
As for making money... if the general agreement is that "BGP death penalty" is correct, let the violators and bad BGP speakers face the consequences of spewing garbage.
</imesho>
Eddy
---------------------------------------------------------------------------
Brotsman & Dreger, Inc.
EverQuick Internet Division
Phone: +1 (316) 794-8922 Wichita/(Inter)national
Phone: +1 (785) 865-5885 Lawrence
---------------------------------------------------------------------------
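The edge-filtering argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `deliver`, the `(prefix, malformed)` tuples, and the documentation prefixes are all hypothetical, and the "customer" is assumed to reset its session (losing everything learned on it) the moment a malformed route arrives. It does not model any real router or BGP implementation.

```python
def deliver(updates, edge_filters):
    """Toy model: return (customer_session_up, routes_customer_learned).

    updates      -- list of (prefix, malformed) tuples, in arrival order
    edge_filters -- True if the provider drops malformed routes at its edge
    """
    session_up = True
    learned = []
    for prefix, malformed in updates:
        if malformed and edge_filters:
            continue              # bad route is dropped at the provider edge
        if not session_up:
            continue              # session already torn down, nothing arrives
        if malformed:
            session_up = False    # assumed strict customer: reset on bad route
            learned.clear()       # routes learned over that session are lost
        else:
            learned.append(prefix)
    return session_up, learned


updates = [
    ("192.0.2.0/24", False),
    ("203.0.113.0/24", True),     # the one malformed announcement
    ("198.51.100.0/24", False),
]
print(deliver(updates, edge_filters=True))
print(deliver(updates, edge_filters=False))
```

With filtering at the edge the downstream session never sees the bad route; without it, the single bad route costs the customer the whole session in this model.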
E.B. Dreger wrote:
<imesho>
On the surface, this appears to be correct.
But let's ask ourselves _why_ those upstreams had bad routes. It's because _they_ did not filter at the edge. If bad routes leak, but are filtered before reaching the core, then they never make it to you.
IOW, your concern is a non-issue if the large providers apply similar filtering at the edge. You wouldn't be cutting yourself off because the provider in question would have filtered it long ago.
Correct. However, this means I have to place my complete trust in them to Do Things Right (well, them, and more importantly in this case, their vendors). As Saturday has demonstrated, this is not a safe assumption, in that there appears to be some significant number of boxes in the core which will propagate bad routing data, even if they are also resetting the sessions which it came from (note: I'm not saying it's Cisco. It might be; historically, Ciscos have done this before. But I have no direct evidence that they did, or didn't; only the inference that it had to be *something* used on a very widespread basis, given the number of peers that had the problem simultaneously. Oh, and I *do* know, from direct observation, that the Ciscos facing us were either causing this bug themselves (possible, but it doesn't seem terribly likely given the spread of them), or transiting the route to us when they should have been ditching it, along with the session).
Do it at the edge, and the Internet does not become any more brittle.
The same with source-filtering IPs. Do it at the edge, and the problem goes away. Now, *how* long has it taken to implement this?
Someone said, a few messages ago, that the purpose of a routing protocol is to avoid loops. I disagree. The purpose of a routing protocol is to propagate good, viable routing information. Thus, it MUST have a way to deal with bad routing information, but it SHOULD (IMO) have a way to deal with said information that is not necessarily fatal. We have quite clearly demonstrated that it is a non-trivial possibility that A) bad routes will manage to become widespread, through various bugs, and B) it is possible to have one or two bad routes in an otherwise useful table of 100,000 routes.
When reality says the basis of your design theory is inaccurate, well, it's time to look at revamping the design to accommodate it, if that can be done without trashing the whole thing (sometimes even if it takes that, but I see no call for it in this case, as it's not that severe, and it is entirely fixable without tossing out everything that has worked so far).
As for making money... if the general agreement is that "BGP death penalty" is correct, let the violators and bad BGP speakers face the consequences of spewing garbage.
When the violators are "Almost every major transit provider", this means you'll be off in a corner playing Internet by yourself. This isn't very attractive to most potential customers, no matter how RFC compliant you are. Again, Saturday showed that this is, in fact, the case.
I would love to see the core problem fixed, and never *need* to invoke anything that ditches single bad routes because the only breakages occur when a peer goes completely nuts and spews garbage at me. Unfortunately, this hasn't been the case for a long time now, and doesn't appear terribly likely to be fixed tomorrow, given what the press releases have said about various vendors...
--
***************************************************************************
Joel Baker System Administrator - lightbearer.com
lucifer@lightbearer.com http://www.lightbearer.com/~lucifer
It seems that the right way to handle a malformed route or two depends on who's speaking and who's listening. If I'm a backbone provider and I hear a bad route from a customer, I'm going to drop that connection. I have no incentive to take any risks. This is just as the RFC currently reads. If I'm a customer, I really don't want to shut off the service that I'm paying for. If I'm not going to propagate the routes beyond my borders, why should I drop the whole session? The risk is entirely mine, and a partially corrupted table is better than no connectivity at all.
From this point of view, it seems that the RFC should be loosened to allow configuration of a BGP peer to continue the session and ignore the route.
Perhaps there should be wording to the effect that it is not acceptable practice to propagate routes from the offending router beyond your borders. Maybe there is even a way to phrase it that means "it's not OK to propagate routes from a suspect router back into the core of the Internet." In practice these words have meaning because "upstream" and "downstream" are defined by the flow of money, and economics suppresses loops.
Steve Schaefer
Dashbit - The Leader In Internet Topology
www.dashbit.com www.traceloop.com
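The per-peer option discussed in this thread (reset the session on a malformed route, or discard just that route and keep the session) could be modelled roughly as below. This is a minimal sketch, not any vendor's configuration or a real BGP implementation: `OnMalformed`, `Update`, `Session`, and `feed` are hypothetical names, and "malformed" is simply a flag on the update rather than the result of actual attribute parsing.

```python
from dataclasses import dataclass, field
from enum import Enum


class OnMalformed(Enum):
    RESET_SESSION = "reset"     # strict behaviour: tear the session down
    DISCARD_ROUTE = "discard"   # proposed option: drop only the bad route


@dataclass
class Update:
    prefix: str
    malformed: bool = False     # stand-in for a real validity check


@dataclass
class Session:
    peer: str
    on_malformed: OnMalformed
    up: bool = True
    routes: set = field(default_factory=set)

    def receive(self, update: Update) -> None:
        if not self.up:
            return                      # nothing is accepted on a dead session
        if update.malformed:
            if self.on_malformed is OnMalformed.RESET_SESSION:
                self.up = False
                self.routes.clear()     # all routes from this peer are lost
            return                      # either way the bad route is never installed
        self.routes.add(update.prefix)


def feed(policy: OnMalformed) -> Session:
    """Send two good routes and one bad one to a session with the given policy."""
    session = Session(peer="transit-1", on_malformed=policy)
    for u in (Update("192.0.2.0/24"),
              Update("203.0.113.0/24", malformed=True),
              Update("198.51.100.0/24")):
        session.receive(u)
    return session


strict = feed(OnMalformed.RESET_SESSION)
lenient = feed(OnMalformed.DISCARD_ROUTE)
print(strict.up, sorted(strict.routes))
print(lenient.up, sorted(lenient.routes))
```

In both modes the malformed route is never installed, so it cannot be propagated from this table; the only difference is whether the remaining good routes from that peer survive.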
On Wed, Jun 27, 2001 at 08:15:35PM +0000, E.B. Dreger wrote:
On the surface, this appears to be correct.
Indeed. But, why stop with this very superficial analysis? Why can't we dig deeper into such details as:
- who started announcing cruft, and to who?
- which vendor's hardware/software passed it along, and which dropped their BGP sessions, as they're currently required to?
- which providers were impacted, and to what extent?
and so on.
I'm sure most of us know the answers to these questions by now, and those who don't, should. Shame we're all forbidden from discussing things further in a truly open manner due to NDA. This was not the case in the not-so-distant past; hopefully the climate will change in time for future multi-provider incidents of operational concern.
-adam
participants (5)
- Adam Rothschild
- E.B. Dreger
- lucifer@lightbearer.com
- Matt Levine
- Steve Schaefer