RE: Global BGP - 2001-06-23 - Vendor X's statement...

27 Jun 2001

On Tue, 26 June 2001, "Richard A. Steenbergen" wrote:
...
On 26 Jun 2001, Sean Donelan wrote:
...
>> There will always be cases where Vendor A thinks they are correct and
>> Vendor B thinks they are correct, and they differ.  And you are
>> correct: either the sender has done something wrong or the receiver
>> has done something wrong, hence the Internet motto.
> But there should be no room for debate; one side is right and the
> other side is wrong. If there is really a grey area, the solution is to
> fix the wording of the standards document, not to overlook the
> problem.
I'm not proposing we overlook the problem.  However, software is very
bad at deciding who is right and who is wrong.  Other than malware, most
vendor software does not deliberately send bad data.  The software, or
rather the programmer who wrote the software, thought the program was
sending correct data.  Later, when humans looked at the data, they
decided the data was wrong and fixed the software.

What do we do between the time the software makes an error and the time
the humans can intervene?

Have the software, with no human oversight, nuke everything?  The Blue
Screen Of Death may be a very "safe" thing for software to do when it
encounters an error.  However, it is not very good for system availability.

I agree error handling is "hard."

Aborting the entire BGP session makes the Internet more brittle than
necessary. In the hours/days between the software sending the data and
the humans fixing it, the network was hurting a lot more than you would
expect from a single bad route.  The constant abort, reset, route-flap
cycle was an amazing multiplier of the damage from one bad route.
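A back-of-the-envelope sketch of that multiplier, in Python. The prefix count and the number of reset cycles are made-up illustrative values, not figures from the incident, and the model assumes every prefix on the session was a best path, so each reset withdraws all of them downstream and re-announces all of them when the session comes back.

```python
# Illustrative model only: the inputs are hypothetical, not measurements.
# Assumes every prefix learned on the session is a best path, so a reset
# withdraws all of them downstream and re-announces all of them once the
# session re-establishes (and the bad route kills it again).


def churn_with_session_reset(prefixes_on_session: int, reset_cycles: int) -> int:
    """Downstream route changes when one bad route keeps killing the session."""
    # Each abort/reset cycle: withdraw every prefix, then re-announce every prefix.
    return reset_cycles * 2 * prefixes_on_session


def churn_with_route_drop() -> int:
    """Downstream route changes when only the bad route is discarded."""
    # The session stays up, so only the single affected prefix changes.
    return 1


if __name__ == "__main__":
    reset = churn_with_session_reset(prefixes_on_session=100_000, reset_cycles=10)
    drop = churn_with_route_drop()
    print(f"reset: {reset} changes, drop: {drop} change, multiplier: {reset // drop}")
```

Even with modest assumed numbers, the reset approach turns one bad route into millions of updates for everyone downstream.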
...
> I agree that in this case it would have been possible to ignore the bad
> AS PATH and drop the route without disturbing the session originating the
> bad information. This one specific example could probably have been
> handled better with a non-fatal notification (with big red lights and
> buzzers). However, it was unacceptable for that router to propagate the
> bad information to others.
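A minimal sketch of the non-fatal handling described above: drop the route with the bad AS PATH, raise an alarm, keep the session up, and never pass the bad route on. The data model (`Route`, `Peer`, the `outbound` list) and the validity checks are hypothetical simplifications, not any vendor's implementation; only the two basic AS_PATH segment types are accepted, for brevity.

```python
from dataclasses import dataclass, field

# AS_PATH segment type codes defined for BGP-4.
AS_SET, AS_SEQUENCE = 1, 2


@dataclass
class Route:
    prefix: str
    as_path: list  # hypothetical parsed form: [(segment_type, [asn, ...]), ...]


@dataclass
class Peer:
    name: str
    rib_in: dict = field(default_factory=dict)  # prefix -> Route accepted from this peer
    alarms: list = field(default_factory=list)  # the "big red lights and buzzers"


def as_path_is_valid(as_path: list) -> bool:
    """Example sanity checks only; not the full set of rules in the RFC."""
    for seg_type, asns in as_path:
        if seg_type not in (AS_SET, AS_SEQUENCE):
            return False  # unknown segment type
        if not asns:
            return False  # empty segment
        if any(not 0 < asn < 65536 for asn in asns):
            return False  # outside the 2-byte AS number range
    return True


def handle_update(peer: Peer, route: Route, outbound: list) -> None:
    """Accept a valid route; drop an invalid one without resetting the session."""
    if not as_path_is_valid(route.as_path):
        # Discard only this route. If an earlier, valid route for the same
        # prefix was accepted and passed on, withdraw it downstream.
        if peer.rib_in.pop(route.prefix, None) is not None:
            outbound.append(("withdraw", route.prefix))
        peer.alarms.append(f"malformed AS_PATH for {route.prefix} from {peer.name}")
        return  # the bad route itself is never propagated
    peer.rib_in[route.prefix] = route
    outbound.append(("announce", route))


if __name__ == "__main__":
    peer, outbound = Peer("peer-b"), []
    handle_update(peer, Route("192.0.2.0/24", [(AS_SEQUENCE, [64500, 64501])]), outbound)
    handle_update(peer, Route("198.51.100.0/24", [(AS_SEQUENCE, [])]), outbound)
    print(len(outbound), "announced;", peer.alarms)
```

The session is never torn down here; the only side effects of a bad route are a local alarm and, at most, a withdrawal of that one prefix.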
I agree, you must have both sides (conservative send, and liberal receive).

Sending bad data is not acceptable.  Cisco should not send bad data.

Crashing/aborting when you receive bad data isn't acceptable either.  Bad
data happens; Vendor X should not abort if it had other options.

Sometimes there is no alternative to aborting.  However, the RFC makes
aborting a requirement.  There are errors BGP implementations could recover
from (with blinking red lights and loud buzzers).  The RFC should give
implementations the option of continuing.
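One way an implementation could expose that option, sketched in Python. The error names, the split into fatal and recoverable classes, and the `allow_continue` switch are all hypothetical. Framing errors stay fatal in this sketch because once a message length can't be trusted, the receiver has no way to find where the next message in the TCP stream begins; the default (`allow_continue=False`) keeps the always-abort behavior.

```python
from enum import Enum


class Action(Enum):
    RESET_SESSION = "send NOTIFICATION and close the session"
    DROP_ROUTE = "discard the affected route, alarm loudly, keep the session"


# Hypothetical error classes for illustration.
FATAL = {"bad_message_length", "connection_not_synchronized"}  # stream framing lost
RECOVERABLE = {"malformed_as_path", "invalid_origin"}  # confined to one route


def choose_action(error: str, allow_continue: bool = False) -> Action:
    """Return the action for an error; the default mirrors 'always abort'."""
    if error in FATAL:
        return Action.RESET_SESSION  # no safe way to keep reading the stream
    if error in RECOVERABLE and allow_continue:
        return Action.DROP_ROUTE
    # Unknown errors, or continuing not enabled: fall back to aborting.
    return Action.RESET_SESSION


if __name__ == "__main__":
    for err in ("malformed_as_path", "bad_message_length"):
        print(err, "->", choose_action(err, allow_continue=True).name)
```

Keeping abort as the fallback for anything unclassified means the option only relaxes behavior for errors someone has explicitly judged to be recoverable.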

"I was following the standard" isn't a good reason to crash.  If following
the standard causes the Internet to flap like a hummingbird for a day,
we need to get the standard changed (as well as fix the existing
implementations).

These are not mutually exclusive goals.

   1) Modify the standard so an error does not have as much impact worldwide
   2) Fix the current implementations

Yes, a pedestrian may have the right of way in the crosswalk.  But proving
your point by having the semi-truck flatten you isn't very smart.