On Fri, Jan 12, 2001 at 03:23:51PM -0500, Deepak Jain wrote:
> I think the argument is one of stability. BGP is supposed to be stable for days or weeks on end under normal conditions. Making your internal network too sensitive to external changes destabilizes both your network and the networks of those who connect to you.
If a BGP session with one peer resets once every three days, and you peer with them in a few places, you are talking at most about 5-10 minutes of degraded service while, say, a third of your packets to that peer are resent or dropped (assuming you peer in three places, etc.). 180 seconds is nothing for a router with many peering sessions and a reasonable traffic load.
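A rough back-of-envelope version of that arithmetic, as a sketch only. It assumes traffic to the peer is split evenly across the peering points and that each reset degrades just one site's share for the stated number of minutes; the function name and the even-split model are illustrative assumptions, not something from the original discussion.

```python
def degraded_fraction(sites: int, degraded_minutes: float, days_between_resets: float) -> float:
    """Fraction of traffic-time with one peer affected by that peer's periodic resets.

    Assumes (hypothetically) that traffic to the peer is split evenly across
    `sites` peering points and that a reset degrades only one site's share
    for `degraded_minutes`.
    """
    share_per_site = 1 / sites                          # e.g. 1/3 with three peering points
    minutes_between_resets = days_between_resets * 24 * 60
    return share_per_site * degraded_minutes / minutes_between_resets

# Worst case from the paragraph: three peering points, 10 minutes degraded,
# one reset every three days.
print(f"{degraded_fraction(3, 10, 3):.4%}")
```

With those inputs the result is well under a tenth of a percent of the traffic exchanged with that peer, which is the intuition behind "180 seconds is nothing."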
With regard to your earlier comments about busy routers "pausing" BGP, perhaps this is something that could be investigated at the vendor software level. I would think keepalives (of any variety) should rank fairly high on the food chain when it comes to CPU precedence. If that isn't the case already, why not? I don't know whether it's still true, but I recall dealing with some routers a few years back that got so bogged down with OSPF updates that they (or the routers at the other end) kept resetting perfectly stable links because keepalives weren't being processed in a timely manner. In the interest of stability, I would certainly want keepalives to be processed ahead of routing updates. After all, they don't even represent a significant percentage of the total CPU workload, even on a router with a fairly large number of links. And if your links keep resetting because of route churn, each reset generates more churn, and you've got a self-perpetuating problem.
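A toy single-CPU model of the keepalive-starvation problem described above, comparing first-come-first-served processing with giving keepalives strict priority over route updates. Everything here is hypothetical: the function, the one-second-per-update cost, treating keepalives as free, and the rule that the session drops when processed keepalives are more than the 180-second hold time apart are simplifying assumptions, not any vendor's actual scheduler.

```python
import heapq
import itertools

HOLD_TIME = 180.0  # seconds; the figure discussed above
KEEPALIVE, UPDATE = "keepalive", "update"

def hold_timer_expired(arrivals, update_cost, prioritize_keepalives, hold_time=HOLD_TIME):
    """Return True if keepalives are ever processed more than hold_time apart.

    arrivals: list of (arrival_time, kind) tuples sorted by arrival_time.
    Updates cost update_cost seconds of CPU each; keepalives are treated as free.
    """
    seq = itertools.count()  # tie-breaker: FIFO order within a priority class
    queue = []
    now, i, last_keepalive = 0.0, 0, 0.0
    while i < len(arrivals) or queue:
        # Enqueue everything that has arrived by the current time.
        while i < len(arrivals) and arrivals[i][0] <= now:
            _, kind = arrivals[i]
            rank = 0 if (prioritize_keepalives and kind == KEEPALIVE) else 1
            heapq.heappush(queue, (rank, next(seq), kind))
            i += 1
        if not queue:
            now = arrivals[i][0]  # CPU idle: jump ahead to the next arrival
            continue
        _, _, kind = heapq.heappop(queue)
        if kind == KEEPALIVE:
            if now - last_keepalive > hold_time:
                return True  # too long since the last keepalive: session torn down
            last_keepalive = now
        else:
            now += update_cost  # route processing burns CPU time

    return False

# A burst of 300 updates at t=0, each taking 1 s of CPU (exaggerated for effect),
# plus keepalives from the peer every 60 s.
burst = [(0.0, UPDATE)] * 300 + [(float(t), KEEPALIVE) for t in range(60, 361, 60)]
print(hold_timer_expired(burst, 1.0, prioritize_keepalives=False))
print(hold_timer_expired(burst, 1.0, prioritize_keepalives=True))
```

In FIFO mode the first keepalive is not looked at until the whole update backlog is cleared at t=300, so the session drops even though the link is fine; with keepalives jumping the queue, they are handled every 60 seconds and the session survives the same burst.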
> The bigger concern is: IF a peer is dropping a session that often, *what* is wrong with their router? I am very afraid of routers that *randomly* time out and re-peer for no good reason.
In that case, I would expect a NOC with proper monitoring of its peering sessions to notice and start investigating the problem.

-c