Re: That pesky AS path corruption bug...
At Tuesday 01:26 PM 5/23/00 , Vijay Gill wrote:
This is a hack. We do not need more cruft added on; rather, what we need is correct behavior. The correct behavior being: if you see a corrupt or malformed update from a peer, send a notify and drop the session. Seems fairly simple to me.
The above suggestion of yours fails in the case of route servers.
Insist on correct behavior, not on cruftery.
...reading the host requirements RFC and its definition of the robustness principle: Why was the behavior above chosen over the more conceivable and robust "ignore (log) corrupted message, continue with regular operation"? Given route flap dampening, dropping the BGP session is hardly the desirable outcome here.
On that note: under what circumstances should or shouldn't the BGP session come back up without manual intervention?
bye,
Kai
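To make the two policies being argued over concrete, here is a minimal Python sketch of a receiver checking the fixed BGP message header and then either sending a NOTIFICATION and closing ("drop") or logging and carrying on ("ignore"). The function names (`check_header`, `build_notification`, `handle`) and the policy strings are hypothetical, made up for illustration. The sketch assumes an unauthenticated session, where the 16-byte marker is all ones, and it only covers header checks; a malformed AS path would be caught by deeper UPDATE parsing, which is not shown.

```python
import struct

MARKER = b"\xff" * 16          # all-ones marker (assumes no authentication)
HEADER_LEN = 19                # marker (16) + length (2) + type (1)
MAX_LEN = 4096                 # maximum BGP message size
VALID_TYPES = {1, 2, 3, 4}     # OPEN, UPDATE, NOTIFICATION, KEEPALIVE

# NOTIFICATION error code and subcodes for "Message Header Error"
MSG_HEADER_ERROR = 1
CONN_NOT_SYNCHRONIZED = 1
BAD_MESSAGE_LENGTH = 2
BAD_MESSAGE_TYPE = 3


def check_header(data: bytes):
    """Return None if the header looks sane, else (error_code, subcode)."""
    if len(data) < HEADER_LEN:
        raise ValueError("need at least 19 bytes to inspect a BGP header")
    if data[:16] != MARKER:
        return (MSG_HEADER_ERROR, CONN_NOT_SYNCHRONIZED)
    length, msg_type = struct.unpack("!HB", data[16:19])
    if not HEADER_LEN <= length <= MAX_LEN:
        return (MSG_HEADER_ERROR, BAD_MESSAGE_LENGTH)
    if msg_type not in VALID_TYPES:
        return (MSG_HEADER_ERROR, BAD_MESSAGE_TYPE)
    return None


def build_notification(code: int, subcode: int, data: bytes = b"") -> bytes:
    """Encode a NOTIFICATION (type 3) carrying the given error code/subcode."""
    length = HEADER_LEN + 2 + len(data)
    return MARKER + struct.pack("!HBBB", length, 3, code, subcode) + data


def handle(data: bytes, policy: str = "drop"):
    """Hypothetical dispatcher: 'drop' closes the session, 'ignore' logs only."""
    err = check_header(data)
    if err is None:
        return ("accept", None)
    if policy == "drop":
        return ("close", build_notification(*err))
    return ("log_and_continue", None)
```

With `policy="drop"` the caller would write the returned NOTIFICATION to the peer and tear down the TCP connection; with `policy="ignore"` the session stays up and the bad message is only logged.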
Kai Schlichting wrote:
At Tuesday 01:26 PM 5/23/00 , Vijay Gill wrote:
This is a hack. We do not need more cruft added on; rather, what we need is correct behavior. The correct behavior being: if you see a corrupt or malformed update from a peer, send a notify and drop the session. Seems fairly simple to me.
The above suggestion of yours fails in the case of route servers.
Insist on correct behavior, not on cruftery.
...reading the host requirements RFC and its definition of the robustness principle: Why was the behavior above chosen over the more conceivable and robust "ignore (log) corrupted message, continue with regular operation"? Given route flap dampening, dropping the BGP session is hardly the desirable outcome here. On that note: under what circumstances should or shouldn't the BGP session come back up without manual intervention?
Well, let's see... the corrupted message was delivered over a TCP session. That means the data received is what the router at the other end sent. Little likelihood of in-transit damage. So, we've got a router at the remote end which is generating mangled messages. Now, do you trust that the mangled message was the result of a single-event failure that won't recur, or did that remote router suffer some sort of serious brain cramp (software or hardware failure) which will result in additional bad messages? At some point it makes sense to cut one's losses, declare the remote device braindead, and route around it.
If a session goes down because of a BGP session problem (bad message), it is worthwhile to either not bring the session back automatically, or if automatic, implement a backoff mechanism as a form of local route flap damping. Indeed, based on Pete's posting, this is exactly what is supposed to happen.
If a session goes down because someone unplugged a cable and reconnected it, there should not be a need for manual intervention. Similarly, if you have a T1 get hit by lightning and the surge suppressors work right, you may well see the line go down, then come back up.
--
-----------------------------------------------------------------
Daniel Senie dts@senie.com
Amaranth Networks Inc. http://www.amaranth.com
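The distinction drawn above (retry promptly after a link-level drop, back off after a protocol error) could be sketched roughly as follows. This is only an illustration in Python: the class name `RetryPolicy`, the cause strings, and the 30-second base and one-hour cap are all assumptions chosen for the example, not values from any implementation or from the thread.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical per-peer retry timer with exponential backoff on errors."""
    base_delay: float = 30.0     # seconds; illustrative value only
    max_delay: float = 3600.0    # cap on the backoff; illustrative value only
    error_drops: int = 0         # consecutive drops caused by bad messages

    def next_delay(self, cause: str) -> float:
        """Return how long to wait before re-opening the session."""
        if cause == "link":
            # Cable pulled, T1 bounced: no reason to distrust the peer.
            return 0.0
        # Protocol error: double the wait for each consecutive occurrence.
        self.error_drops += 1
        delay = self.base_delay * 2 ** (self.error_drops - 1)
        return min(delay, self.max_delay)

    def session_stable(self) -> None:
        """Call once the session has stayed up long enough to trust again."""
        self.error_drops = 0
```

A caller would invoke `next_delay("protocol_error")` after sending a NOTIFICATION and `next_delay("link")` after a transport failure. How long a session must stay up before `session_stable()` is called is a further policy choice left open here.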
On Tue, 23 May 2000, Kai Schlichting wrote:
Insist on correct behavior, not on cruftery.
more conceivable and robust "ignore (log) corrupted message, continue with regular operation" ? Given route flap dampening, dropping the BGP
Byzantine failures perhaps? If I am getting some corrupt updates, how confident am I that the rest of the updates are valid? In any case, it would appear that the process on the other side has some issues since theoretically, a malformed update should never (for most values of never) arrive.
session is hardly the desirable outcome here. On that note: under what circumstances should or shouldn't the BGP session come back up without manual intervention?
A combination of BGP dampening and exponential backoff timing for retry on sessions that were dropped because of an error should take care of the rest.
/vijay