Re: Time to revise RFC 1771
On 6/26/2001 at 13:47:37 -0700, Clayton Fiske said:
On Tue, Jun 26, 2001 at 04:27:49PM -0400, Dave Israel wrote:
This ignores three basic facts:
1) Networks tend to be homogeneous in platform.
2) Platforms tend to accept their own implementation quirks.
3) Networks peer at borders.
Therefore, under the "drop the session" rule, my bad announcement reaches all my borders without trouble, and all my external peers who are not running forgiving or compatible implementations drop their connections to me, and all my traffic to and from them hits the floor.
In this case, vendor C's implementation was neither forgiving nor compatible. It still dropped the peer(s) in question. It just had the much more harmful quirk of forwarding the bad route to its peers before doing so. In this case, a homogeneous network would lose not only its border sessions but also every internal session over which the route was advertised.
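To make the two failure modes concrete, here is a minimal sketch (not any vendor's code) that floods one quirky announcement through a toy topology. The router names, vendor labels, and the `propagate` function are all hypothetical. It assumes routers of one vendor accept the quirk and re-advertise it, while every other router treats it as a malformed UPDATE and resets the session it arrived on, either without passing the route on ("strict", roughly what RFC 1771 prescribes) or after advertising it onward ("forward_then_drop", the vendor C behaviour described above). It ignores iBGP split horizon, best-path selection, and timing.

```python
from collections import deque


def propagate(vendors, links, origin, policy, tolerant_vendor="A"):
    """Flood one quirky announcement from `origin` and return the BGP
    sessions (frozensets of two router names) that end up torn down.

    Routers of `tolerant_vendor` accept the quirk and re-advertise it.
    Every other router treats it as a malformed UPDATE and resets the
    session it arrived on; under "strict" it never passes the route on,
    under "forward_then_drop" it advertises the route onward first.
    """
    if policy not in ("strict", "forward_then_drop"):
        raise ValueError(f"unknown policy: {policy!r}")

    neighbors = {router: set() for router in vendors}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    dropped = set()
    seen = {origin}  # routers that hold the route and will advertise it
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for peer in sorted(neighbors[sender]):
            session = frozenset((sender, peer))
            if session in dropped:
                continue  # session already reset; nothing is sent over it
            if vendors[peer] != tolerant_vendor:
                dropped.add(session)  # NOTIFICATION, session reset
                if policy == "strict":
                    continue  # bad route discarded, never re-advertised
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return dropped


# A homogeneous vendor-A network with two external vendor-B peers.
vendors = {"a1": "A", "a2": "A", "a3": "A", "x1": "B", "y1": "B"}
links = [("a1", "a2"), ("a2", "a3"), ("a2", "x1"), ("a3", "y1")]
print(sorted(sorted(s) for s in propagate(vendors, links, "a1", "strict")))
# [['a2', 'x1'], ['a3', 'y1']]

# A homogeneous vendor-C network hearing the route from one external peer.
vendors2 = {"e": "A", "c1": "C", "c2": "C", "c3": "C"}
links2 = [("e", "c1"), ("c1", "c2"), ("c2", "c3")]
print(len(propagate(vendors2, links2, "e", "strict")))             # 1
print(len(propagate(vendors2, links2, "e", "forward_then_drop")))  # 3
```

In the first case the announcement crosses the vendor-A network untouched and only the two external sessions die. In the second, strict handling costs one border session, while forward-then-drop costs every session the route crossed.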
I'm certainly not defending (or attacking) either vendor's implementation; in the current environment, I believe following the RFC is the correct course. I was more concerned with future implementations of BGP and how (I feel) they should handle problems like this. As we add more and more features to BGP, how we handle what appears to be a bad route (or bad NLRI) is going to become more important.
One CRC error does not make PPP drop. Why make one route cause a catastrophic loss of connectivity? Report the bad route, drop it, and move on; let layer 8 resolve it.
Because, arguably, we don't know that it's just one route. We just know that one route set off the alarm. Do you feel safe assuming that whatever bug caused one corrupted route left all the other routes alone?
No, but I feel secure that, if it corrupted a large enough number of routes, the effect will not be worse than dropping the session. Somebody mentioned what happens if there are 100,000 bad routes and 1 good one. You keep the good one and drop the 100,000 bad ones. Dropping routes is even easier than using them. Besides, which tends to be harder on a router: dropping bad routes, or tearing down and restarting a TCP session?
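The 100,000-to-1 arithmetic can be sketched as follows. This is a hypothetical illustration, not any implementation's code: a route counts as malformed if its prefix does not parse with Python's `ipaddress.ip_network`, and the helper `receive` models only what survives from one peer under each policy, a session reset (everything from the peer is lost) or a per-route discard (report the bad routes, keep the rest).

```python
import ipaddress


def is_well_formed(prefix):
    """Hypothetical stand-in for NLRI validation: the prefix must parse."""
    try:
        ipaddress.ip_network(prefix)
    except ValueError:
        return False
    return True


def receive(routes, policy):
    """Return (routes kept from this peer, whether the session stays up)."""
    if policy not in ("reset_session", "drop_route"):
        raise ValueError(f"unknown policy: {policy!r}")
    good = [r for r in routes if is_well_formed(r)]
    bad_count = len(routes) - len(good)
    if bad_count == 0:
        return good, True
    if policy == "reset_session":
        # Send a NOTIFICATION and close the session; closing it withdraws
        # every route learned from this peer, good ones included.
        return [], False
    # drop_route: report the bad routes, discard them, keep the session.
    print(f"discarding {bad_count} malformed route(s)")
    return good, True


routes = ["192.0.2.0/24"] + ["10.0.0.0/33"] * 100_000
print(receive(routes, "reset_session"))  # ([], False)
print(receive(routes, "drop_route"))
# discarding 100000 malformed route(s)
# (['192.0.2.0/24'], True)
```

Under a session reset the one good route goes down with the 100,000 bad ones; under per-route discard it survives and the session stays up. The sketch assumes the validity check catches every corrupted route, which is exactly the assumption questioned in the reply above.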
Plus, a CRC error can occur between two valid, compliant, bug-free implementations. A bad route, by definition, can't. We're not talking about external faults here, but broken implementations. When one side of a protocol session simply breaks the rules, I don't think it's reasonable to say that the other side needs to be "fixed" to accept that breakage. Fix the broken side.
A "bad route" can happen whenever one implementation differs from another. Both can be valid according to some definition of the standard. Determining who is wrong, and fixing it, takes time. If you're dropping a few of my routes during that time, that's unavoidable. If every customer of mine cannot reach every customer of yours while we fight over whose implementation is wrong and who needs to change what, then who wins? And how is this fight more legitimate than the one you have with your telco provider over how they built your circuit and where your errors are coming from?
This has got everyone's attention because of the unique way in which the breakage occurred. If all implementations were changed to drop the single bad route and keep the sessions intact, the damage would not have been what it was. If all implementations followed the current specs and dropped the session with the router that first originated the bad route, the damage would not have been what it was. To say that one way causes massive damage and the other doesn't is inaccurate. The damage was caused by the implementation in question doing something resembling the latter, but with harmful behavior thrown in.
I think the issue has gone beyond what happened, and into what will happen. It's a simple design philosophy question: Do you build protocols that are robust and resilient under stress, or do you build protocols that refuse to interoperate until everything completely agrees? Ideally, I can see the beauty of the second, but realistically, I think you need to be permissive.
--
Dave Israel
Senior Manager, IP Backbone
Intermedia Business Internet