On Tue, 26 June 2001, "Richard A. Steenbergen" wrote:
On 26 Jun 2001, Sean Donelan wrote:
There will always be cases where Vendor A thinks they are correct and Vendor B thinks they are correct, and they differ. And you are correct, either the sender has done something wrong or the receiver has done something wrong, hence the Internet motto.
But there should be no room for debate: one side is right and the other side is wrong. If there is really a grey area, the solution is to fix the wording of the standards document, not to try to overlook the problem.
I'm not proposing we overlook the problem. However, software is very bad at deciding who is right and who is wrong. Other than malware, most vendor software does not deliberately send bad data. The software, or rather the programmer who wrote the software, thought the program was sending correct data. Later, when humans looked at the data, humans decided the data was wrong and fixed the software.
What do we do between the time the software makes an error and the time the humans can intervene? Have the software, with no human oversight, nuke everything? The Blue Screen Of Death may be a very "safe" thing for software to do when it encounters an error. However, it is not a very good thing for system availability.
I agree error handling is "hard." But aborting the entire BGP session makes the Internet more brittle than necessary. In the hours/days between the software sending the data and the humans fixing it, the network was hurting a lot more than you would expect from a single bad route. The constant cycle of abort, reset, and route flap turned one bad route into an amazing multiplier effect.
I agree that in this case it would have been possible to ignore the bad AS PATH and drop the route without disturbing the session originating the bad information. This one specific example could probably have been handled better with a non-fatal notification (with big red lights and buzzers). However, it was unacceptable for that router to propagate the bad information to others.
I agree, you must have both sides (conservative send, and liberal receive). Sending bad data is not acceptable. Cisco should not send bad data. Crashing/aborting when you receive bad data isn't acceptable either. Bad data happens; Vendor X should not abort if it has other options.
Sometimes there is no alternative besides aborting. However, the RFC makes aborting a requirement. There are errors BGP implementations could recover from (with blinking red lights and loud buzzers). The RFC should give implementations the option of continuing.
"I was following the standard" isn't a good reason to crash. If following the standard causes the Internet to flap like a hummingbird for a day, we need to get the standard changed (as well as fix the existing implementations). These are not mutually exclusive goals.
1) Modify the standard so an error does not have as much impact worldwide
2) Fix the current implementations
Yes, a pedestrian may have the right of way in the crosswalk. But proving your point by having the semi-truck flatten you isn't very smart.
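To make the difference between "abort the session" and "drop the bad route and keep going" concrete, here is a rough Python sketch. It is not how any vendor's BGP implementation works and it does not follow the RFC's actual validation rules. The names Update, PeerSession, as_path_is_valid and replay are all made up for illustration, and the validity check (non-empty path, every AS number in 1..65535) is only a stand-in for whatever a real implementation would test.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("bgp_sketch")


@dataclass
class Update:
    """Hypothetical, heavily simplified stand-in for a BGP UPDATE."""
    prefix: str
    as_path: list


def as_path_is_valid(as_path):
    # Assumption: a stand-in check only, not the real RFC rules.
    return len(as_path) > 0 and all(0 < asn <= 65535 for asn in as_path)


@dataclass
class PeerSession:
    """Routes learned from one peer, under one of two error policies."""
    strict: bool
    routes: dict = field(default_factory=dict)
    resets: int = 0
    dropped: int = 0

    def receive(self, update):
        if as_path_is_valid(update.as_path):
            self.routes[update.prefix] = update.as_path
            return
        if self.strict:
            # Abort policy: tear down the session, losing every route
            # learned from this peer, good or bad.
            log.error("bad AS path for %s: resetting session", update.prefix)
            self.routes.clear()
            self.resets += 1
        else:
            # Recoverable policy: complain loudly, discard only the bad
            # route (and any older copy of it), keep the session up.
            log.error("bad AS path for %s: dropping route", update.prefix)
            self.routes.pop(update.prefix, None)
            self.dropped += 1


def replay(strict, updates):
    """Feed the same updates through a fresh session with the given policy."""
    session = PeerSession(strict=strict)
    for update in updates:
        session.receive(update)
    return session


# Documentation prefixes and AS numbers; the third update carries a bad path.
updates = [
    Update("192.0.2.0/24", [64496, 64497]),
    Update("198.51.100.0/24", [64496, 64498]),
    Update("203.0.113.0/24", [64496, 0]),
    Update("203.0.113.128/25", [64496, 64499]),
]

strict_session = replay(True, updates)
lenient_session = replay(False, updates)
```

With the strict policy, the single bad update wipes out the two good routes that arrived before it. With the lenient policy, only the bad route is lost and the error is still logged. The sketch leaves out the part that made the real event so painful: after a reset the peer reconnects and sends the same bad route again, so the strict case repeats indefinitely until a human steps in.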