RE: Global BGP - 2001-06-23 - Vendor X's statement...
On Tue, 26 June 2001, "Richard A. Steenbergen" wrote:
On 26 Jun 2001, Sean Donelan wrote:
There will always be cases where Vendor A thinks they are correct and Vendor B thinks they are correct, and they differ. And you are correct, either the sender has done something wrong or the receiver has done something wrong, hence the Internet motto.
But there should be no room for debate: one side is right and the other side is wrong. If there is really a grey area, the solution is to fix the wording of the standards document, not to try and overlook the problem.
I'm not proposing we overlook the problem. However, software is very bad at deciding who is right and who is wrong. Other than malware, most vendor software does not deliberately send bad data. The software, or rather the programmer who wrote the software, thought the program was sending correct data. Later, when humans looked at the data, they decided the data was wrong and fixed the software. What do we do between the time the software makes an error and the time the humans can intervene? Have the software, with no human oversight, nuke everything? The Blue Screen Of Death may be a very "safe" thing for software to do when it encounters an error. However, it is not a very good thing for system availability.

I agree error handling is "hard." Aborting the entire BGP session makes the Internet more brittle than necessary. In the hours/days between the software sending the data and the humans fixing it, the network was hurting a lot more than you would expect from a single bad route. The constant cycle of abort, reset, route flap was an amazing multiplier effect of one bad route.
I agree that in this case it would have been possible to ignore the bad AS PATH and drop the route without disturbing the session originating the bad information. This one specific example could probably have been handled better with a non-fatal notification (with big red lights and buzzers). However, it was unacceptable for that router to propagate the bad information to others.
I agree, you must have both sides (conservative send, and liberal receive). Sending bad data is not acceptable. Cisco should not send bad data. Crashing/aborting when you receive bad data isn't acceptable either. Bad data happens, Vendor X should not abort if it had other options.

Sometimes there is no alternative besides aborting. However, the RFC makes aborting a requirement. There are errors BGP implementations could recover from (with blinking red lights and loud buzzers). The RFC should give implementations the option of continuing. "I was following the standard" isn't a good reason to crash. If following the standard causes the Internet to flap like a hummingbird for a day, we need to get the standard changed (as well as fix the existing implementations). These are not mutually exclusive goals:

1) Modify the standard so an error does not have as much impact worldwide
2) Fix the current implementations

Yes, a pedestrian may have the right of way in the crosswalk. But proving your point by having the semi-truck flatten you isn't very smart.
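The strict-versus-lenient argument above can be made concrete with a small receiver-side sketch in Python. It assumes the RFC 1771 AS_PATH encoding (a one-octet segment type, a one-octet count of ASes, then that many two-octet AS numbers); `parse_as_path`, `handle_as_path` and the `lenient` flag are hypothetical names, not any vendor's code. The strict branch stands for the NOTIFICATION-and-close behaviour the RFC mandates; the lenient branch stands for what Richard and Sean describe: drop only the offending route, raise an alarm, and do not pass it on.

```python
import struct

AS_SET = 1       # RFC 1771 AS_PATH segment type
AS_SEQUENCE = 2  # RFC 1771 AS_PATH segment type


class MalformedASPath(ValueError):
    """Raised when an AS_PATH attribute value cannot be parsed."""


def parse_as_path(data: bytes) -> list[tuple[int, list[int]]]:
    """Parse a two-octet-AS AS_PATH value into (segment_type, [asn, ...]) pairs."""
    segments = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise MalformedASPath("truncated segment header")
        seg_type, seg_len = data[offset], data[offset + 1]
        offset += 2
        if seg_type not in (AS_SET, AS_SEQUENCE):
            raise MalformedASPath(f"unknown segment type {seg_type}")
        end = offset + 2 * seg_len  # seg_len counts ASes, each two octets
        if end > len(data):
            raise MalformedASPath("segment length overruns attribute")
        segments.append((seg_type, list(struct.unpack(f"!{seg_len}H", data[offset:end]))))
        offset = end
    return segments


def handle_as_path(data: bytes, lenient: bool) -> str:
    """Return the (hypothetical) action a receiver takes for one route's AS_PATH.

    strict  -> "reset-session": send NOTIFICATION and close; every route
               learned from this peer is withdrawn, then relearned later.
    lenient -> "drop-route": discard only this route, alarm loudly,
               and never propagate it to other peers.
    """
    try:
        parse_as_path(data)
    except MalformedASPath as err:
        if lenient:
            print(f"ALARM: dropping route with bad AS_PATH: {err}")
            return "drop-route"
        return "reset-session"
    return "accept"


good = bytes([AS_SEQUENCE, 2]) + struct.pack("!HH", 64512, 64513)
bad = bytes([AS_SEQUENCE, 3]) + struct.pack("!HH", 64512, 64513)  # claims 3 ASes, carries 2

print(handle_as_path(good, lenient=False))  # accept
print(handle_as_path(bad, lenient=False))   # reset-session
print(handle_as_path(bad, lenient=True))    # drop-route (after an ALARM line)
```

The difference is one of scope: with `lenient=False` a single malformed route tears down the session, so every route learned from that peer is withdrawn and later re-announced, which is the multiplier effect Sean describes; with `lenient=True` the damage is limited to one prefix.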
Date: 26 Jun 2001 19:23:42 -0700 From: Sean Donelan <sean@donelan.com>
[ heavy snipping throughout ]
I agree, you must have both sides (conservative send, and liberal receive).
Sending bad data is not acceptable. Cisco should not send bad data.
I think that everyone agrees here... the question is, what penalty to apply and with what scope when some router spews bad data?
Crashing/aborting when you receive bad data isn't acceptable either. Bad data happens, Vendor X should not abort if it had other options.
1. Flapping. If the route is bad, put the route in "time out corner".
2. AS-PATH filtering. If the as-path looks funny, kill the route.
3. Bogon/spoofing filtering. If the source IP is funny, block traffic from that IP.

Solutions to routing problems follow a "punishment fitting the crime" system. In this sense, I agree with your logic that penalizing a single route is the appropriate scope for bad BGP.

Heavy flapping is bad because of a two-word phrase: state maintenance. Any proposed solution should avoid intensive state maintenance, else it will be as much of a pain as flapping.

My gut feel is that I'd rather nuke the connection with a bad router, deducing "we don't trust this one". Looking at the above 1-3, however, this sort of behavior does not make sense:

1. If a route flaps, do we damp[en] all routes from it, because { one is | some are } "bad"? No.
2. When some idiot redistributes their upstreams' routes, do we kill their BGP session? I wish, but the answer is no.
3. When funky packets land, do we blackhole anything from the sending router? Nope; this would be increasingly dangerous as one got farther into the core.

The above are examples of layer-eight mistakes. If we consider bad data to be the result of a loose nut between the keyboard and the chair, then we should probably penalize on a per-route basis.

Up to this point, I agree with you, Sean [Donelan]. But the $100k question (100kroute question?) is: "Does bad data fit in this category, or does it mean that the router on the other end is so fscked that we kill the connection?"

It would seem that the RFCs imply the latter. If we suggest otherwise, I should think that we should argue on these grounds... this is where it is handy to have data that will either prove or disprove the claim that "bad data = bad router".

My $0.01 (only $0.01 because I'm at the edge),
Eddy
---------------------------------------------------------------------------
Brotsman & Dreger, Inc.
EverQuick Internet Division
Phone: +1 (316) 794-8922 Wichita/(Inter)national
Phone: +1 (785) 865-5885 Lawrence
---------------------------------------------------------------------------
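Eddy's "time out corner" for flapping routes is per-route flap damping (in the spirit of RFC 2439), and it also shows the state-maintenance cost he mentions: the router must keep a penalty and a timestamp for every prefix it damps. Below is a minimal Python sketch; the half-life, per-flap penalty and thresholds are assumed values chosen to resemble common vendor defaults, the maximum-suppress-time ceiling is left out, and `FlapDamper` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumed parameters (not from this thread), resembling common vendor defaults.
HALF_LIFE = 15 * 60   # seconds for the penalty to halve
FLAP_PENALTY = 1000   # added on every flap
SUPPRESS_LIMIT = 2000 # suppress the route above this penalty
REUSE_LIMIT = 750     # reuse it once the penalty decays below this


@dataclass
class DampState:
    """Per-route state: this table is the 'state maintenance' cost."""
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False


class FlapDamper:
    """Penalizes individual prefixes; other routes from the same peer are untouched."""

    def __init__(self) -> None:
        self.routes: dict[str, DampState] = {}

    def _decay(self, st: DampState, now: float) -> None:
        # Exponential decay: the penalty halves every HALF_LIFE seconds.
        st.penalty *= 0.5 ** ((now - st.last_update) / HALF_LIFE)
        st.last_update = now
        if st.suppressed and st.penalty < REUSE_LIMIT:
            st.suppressed = False

    def flap(self, prefix: str, now: float) -> bool:
        """Record one flap of prefix at time now; return True if it is now suppressed."""
        st = self.routes.setdefault(prefix, DampState(last_update=now))
        self._decay(st, now)
        st.penalty += FLAP_PENALTY
        if st.penalty > SUPPRESS_LIMIT:
            st.suppressed = True
        return st.suppressed

    def usable(self, prefix: str, now: float) -> bool:
        """True if prefix may be used/advertised at time now."""
        st = self.routes.get(prefix)
        if st is None:
            return True
        self._decay(st, now)
        return not st.suppressed


damper = FlapDamper()
for t in (0, 60, 120):  # three flaps of one prefix within two minutes
    print(t, damper.flap("192.0.2.0/24", t))    # False, False, True
print(damper.usable("192.0.2.0/24", 1000))      # False: still in the corner
print(damper.usable("192.0.2.0/24", 2000))      # True: penalty fell below REUSE_LIMIT
print(damper.usable("198.51.100.0/24", 2000))   # True: never flapped
```

The design choice mirrors Eddy's point: the penalty attaches to one prefix, so a single misbehaving route never costs the peer its whole session, at the price of keeping one `DampState` per damped prefix.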
On Wed, Jun 27, 2001 at 04:24:45PM +0000, E.B. Dreger wrote:
I think that everyone agrees here... the question is, what penalty to apply and with what scope when some router spews bad data?
How about if there was a tool you could run against a BGP speaker which sent a series of deliberately pathological and bogus updates, and logged the behaviour of the box under test?

I haven't heard anybody say that vendor X, Y or Z are refusing to fix bugs when they are pointed out to them (quite the contrary). The trick would seem to be to report the bugs before they are found in the wild.

What BGP acceptance tests do people currently run against prospective vendors' hardware?

Joe
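Joe's acceptance-test idea can be sketched as a generator of deliberately malformed UPDATE messages. The Python below only builds the bytes (marker, length, type, withdrawn-routes length, path attributes, NLRI, following RFC 1771 with two-octet AS numbers); it does not open a TCP session or perform the OPEN/KEEPALIVE exchange a speaker requires before it will process an UPDATE, and the helper functions and test-case names are hypothetical. Addresses come from the 192.0.2.0/24 documentation range and AS numbers from the private range.

```python
import struct

MARKER = b"\xff" * 16  # RFC 1771 header marker (all ones when no authentication is used)
UPDATE = 2             # BGP message type code for UPDATE
WELL_KNOWN = 0x40      # attribute flags: well-known, transitive


def path_attr(type_code: int, value: bytes, flags: int = WELL_KNOWN) -> bytes:
    """Encode one path attribute with a one-octet length (value must be < 256 octets)."""
    return struct.pack("!BBB", flags, type_code, len(value)) + value


def update_message(as_path: bytes, prefix: bytes, prefix_len: int, next_hop: bytes) -> bytes:
    """Build one UPDATE announcing prefix/prefix_len with the given raw AS_PATH value."""
    attrs = (
        path_attr(1, b"\x00")     # ORIGIN = IGP
        + path_attr(2, as_path)   # AS_PATH, possibly malformed on purpose
        + path_attr(3, next_hop)  # NEXT_HOP
    )
    nlri = bytes([prefix_len]) + prefix[: (prefix_len + 7) // 8]
    body = struct.pack("!H", 0) + struct.pack("!H", len(attrs)) + attrs + nlri  # no withdrawn routes
    return MARKER + struct.pack("!HB", 19 + len(body), UPDATE) + body  # length includes 19-octet header


def pathological_as_paths(asn: int = 64512) -> dict[str, bytes]:
    """Hypothetical test cases: one valid baseline plus a few malformed AS_PATH values."""
    two_asns = struct.pack("!HH", asn, asn + 1)
    return {
        "valid-baseline": bytes([2, 2]) + two_asns,            # AS_SEQUENCE of 2 ASes
        "length-overrun": bytes([2, 3]) + two_asns,            # claims 3 ASes, carries 2
        "unknown-segment-type": bytes([7, 2]) + two_asns,      # type 7 is undefined
        "trailing-octet": bytes([2, 2]) + two_asns + b"\x02",  # stray partial segment header
    }


for name, as_path in pathological_as_paths().items():
    msg = update_message(as_path, bytes([192, 0, 2, 0]), 24, bytes([192, 0, 2, 1]))
    print(f"{name:22} {len(msg):3} octets  {msg.hex()}")
```

Each case keeps ORIGIN and NEXT_HOP valid so that only the AS_PATH varies; recording how the box under test reacts to each one (NOTIFICATION and reset, silent drop, or acceptance and propagation) is the part a real harness would add.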
participants (3)
- E.B. Dreger
- Joe Abley
- Sean Donelan