RE: Global BGP - 2001-06-23 - Vendor X's statement...
On Tue, 26 June 2001, "Richard A. Steenbergen" wrote:
Killing 100,000 routes because you don't like one seems a bit excessive.
There are only two situations where you will have a route that "isn't liked": 1) the sender does something wrong, or 2) the receiver incorrectly handles something that is actually right. Either way, there is a fundamental flaw in someone's BGP implementation, and that needs to get resolved immediately. Flapping between the offender and its peer is probably an acceptable way to go about finding these; flapping over the entire internet because vendor C chooses to ignore the protocol spec and vendor F does not is probably not...
Flapping 100,000 routes because one route has some bits set in a way one implementation (not vendor) thinks is Ok, and another implementation (not vendor) thinks is wrong, isn't the best solution. It should flap that one route, not the other 100,000 routes. Even if that one route is never propagated beyond that point, the flapping of the 100,000 routes will be. The RFC 1771 solution requires the flapping of *ALL* routes, not just the bad ones. There will always be cases where Vendor A thinks they are correct and Vendor B thinks they are correct, and they differ. And you are correct, either the sender has done something wrong or the receiver has done something wrong, hence the Internet motto. The sender should conservatively send only "good" data. The receiver should liberally accept what it can, and only reject "bad" data. I don't think the receiver should be changed to understand the bad data, just not to reject "good" data. Under RFC 1771, the receiver is rejecting both "good" and "bad" data. It should be revised so when there are both "bad" routes and "good" routes, the receiver should accept the "good" routes and only reject the "bad" routes. If a TELNET implementation doesn't understand an escape code, it shouldn't terminate the entire TELNET session. There is a flaw in both the sender's implementation and RFC 1771's method of handling errors.
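To make the two receiver behaviours argued over above concrete, here is a toy sketch. It is not BGP code: `Route`, `receive_strict`, and `receive_lenient` are hypothetical names, and the `well_formed` flag stands in for whatever checks a real implementation applies. It only counts how many routes each policy accepts or gives up when one bad route arrives alongside 100,000 good ones.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    well_formed: bool  # assumption: stand-in for "passes the receiver's checks"


def receive_strict(routes: List[Route]) -> Tuple[List[Route], List[Route]]:
    """Session-reset behaviour as described in the thread: one bad route
    tears down the session, so every route learned over it is lost."""
    if any(not r.well_formed for r in routes):
        return [], list(routes)  # (accepted, lost)
    return list(routes), []


def receive_lenient(routes: List[Route]) -> Tuple[List[Route], List[Route]]:
    """Per-route behaviour proposed in the thread: keep the good routes,
    reject only the bad ones, and leave the session up."""
    accepted = [r for r in routes if r.well_formed]
    rejected = [r for r in routes if not r.well_formed]
    return accepted, rejected


# 100,000 good routes plus one bad one (labels are illustrative only).
routes = [Route(f"good-{i}", True) for i in range(100_000)]
routes.append(Route("bad-0", False))

strict_accepted, strict_lost = receive_strict(routes)
lenient_accepted, lenient_rejected = receive_lenient(routes)
```

Under these assumptions the strict policy loses all 100,001 routes, while the lenient one rejects a single route and keeps the rest.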
Sean,
It can be dangerous to look at a problem from the extreme case. To illustrate that, look at it from the other extreme for a second. Imagine a router which gets 99,999 bad routes from a peer and 1 good one. Should it try to send a notify with appropriate opcode for each of those 99,999 and keep the one good route? How likely is it that it can do that and forward the packets it's getting on its other interfaces?
Now, let's go back to your assumption that you get one bad route; sure, you can send the notify on that, maintain state that this part of the announcement should be disregarded, and keep going. Somewhere between one bad route out of 100k and one good route out of 100k there is a threshold that says the sensible thing to do is to shut down the session, having told the bgp peer why. What's the threshold on a Cisco GSR? How about on a Cisco 7200 series? A Juniper M5? Wait, that threshold should probably vary not just based on my hardware but on the peer's hardware (a 7200 is probably not going to be able to process all those notifies...). Hmm, now I need to signal both my capacity and current load on an ongoing basis so my peers know how many notifications I can handle before I fall over. Suddenly, I don't want to play any more.
One of the other posters noted that seeing bad data could be a bellwether of a router gone nuts. I'm not sure how often that is the case, but I agree that setting the threshold low makes sense. I might even agree that the simplest thing to do is set it to 1.
regards,
Ted
Sean writes much, which I have snipped to:
The receiver should liberally accept what it can, and only reject "bad" data.
I don't think the receiver should be changed to understand the bad data, just not to reject "good" data.
Under RFC 1771, the receiver is rejecting both "good" and "bad" data. It should be revised so when there are both "bad" routes and "good" routes, the receiver should accept the "good" routes and only reject the "bad" routes.
If a TELNET implementation doesn't understand an escape code, it shouldn't terminate the entire TELNET session.
There is a flaw in both the sender's implementation and RFC 1771's method of handling errors.
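The threshold argument in this message can be sketched as a single decision function. This is an illustration only: `handle_update` and its return strings are hypothetical, and nothing here corresponds to a real router's configuration. With `threshold=1` any bad route resets the session, which matches the "set it to 1" option; a larger threshold tolerates some bad routes before giving up on the peer.

```python
def handle_update(bad_count: int, threshold: int) -> str:
    """Decide what to do with a batch of routes from one peer.

    bad_count: number of routes in the batch that failed validation.
    threshold: number of bad routes at which the session is shut down
               (assumed to be at least 1).
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if bad_count >= threshold:
        return "reset-session"    # too much bad data: stop trusting the peer
    if bad_count > 0:
        return "drop-bad-routes"  # notify per bad route, keep the session up
    return "accept-all"
```

For example, `handle_update(1, 1)` resets the session, while `handle_update(1, 100)` drops only the one bad route.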
On 26 Jun 2001, Sean Donelan wrote:
There will always be cases where Vendor A thinks they are correct and Vendor B thinks they are correct, and they differ. And you are correct, either the sender has done something wrong or the receiver has done something wrong, hence the Internet motto.
But there should be no room for debate: one side is right and the other side is wrong. If there is really a grey area, the solution is to fix the wording of the standards document, not to try and overlook the problem. I agree that in this case it would have been possible to ignore the bad AS PATH and drop the route without disturbing the session originating the bad information. This is one specific example that could probably have been handled better with a non-fatal notification (with big red lights and buzzers). However, it was unacceptable for that router to propagate the bad information to others.
-- 
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
participants (3)
- hardie@equinix.com
- Richard A. Steenbergen
- Sean Donelan