RE: BGP keepalive/holdtime at GigE exchange

12 Jan 2001

I think the argument is one of stability. BGP is supposed to be stable for
days/weeks on end normally. Making your internal network too sensitive to
external changes destabilizes your network and those who connect to you.

If a BGP session with one peer resets once every three days, and you peer
with them at a few places, at most you are talking about 5-10 minutes of
degraded service while, say, 1/3 of your packets are resent or dropped
(assuming you peer in three places, etc.). A 180-second hold time is
nothing for a router with many peering sessions and a reasonable traffic
load.
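The arithmetic behind that estimate can be sketched roughly as follows. It assumes traffic toward the peer is split evenly across peering points, that only one location fails, and that nothing notices the failure until the hold timer expires. The function name and the 300 Mb/s rate are hypothetical.

```python
def blackholed_megabits(traffic_mbps: float, peering_points: int, hold_time_s: float) -> float:
    """Worst-case traffic sent into a dead session before the hold timer fires.

    Hypothetical helper. Assumes traffic to the peer is spread evenly over
    `peering_points` locations and that exactly one location fails.
    """
    if peering_points < 1:
        raise ValueError("need at least one peering point")
    per_site_mbps = traffic_mbps / peering_points  # share carried by the failed site
    return per_site_mbps * hold_time_s             # megabits sent to the bit bucket


if __name__ == "__main__":
    # Hypothetical: 300 Mb/s toward one peer, split over three exchanges.
    for hold in (180, 90, 30):
        print(hold, blackholed_megabits(300, 3, hold))
```

With these made-up numbers, cutting the hold time from 180 to 30 seconds shrinks the worst-case loss from 18,000 to 3,000 megabits; whether that matters depends on how often sessions actually die.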

It's not exciting, but the other peer's customers are just as screwed. If
the whole fabric goes down, a good dampening policy on your
internal->BR routers will keep the instability from affecting your
core.
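A minimal sketch of the penalty-and-decay idea behind route flap dampening (RFC 2439). The constants mirror commonly cited Cisco defaults (15-minute half-life, suppress at 2000, reuse below 750, 1000 per flap) but should be treated as assumptions; the sketch also omits the maximum suppress time and attribute-change penalties.

```python
import math

# Assumed parameters, modelled on commonly cited Cisco defaults.
HALF_LIFE_S = 15 * 60   # penalty halves every 15 minutes
SUPPRESS_LIMIT = 2000   # stop using the route above this penalty
REUSE_LIMIT = 750       # use it again once the penalty decays below this
FLAP_PENALTY = 1000     # penalty added per flap


class DampenedRoute:
    """Toy RFC 2439-style dampening state for a single prefix."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # Exponential decay: penalty * 2^(-elapsed / half_life)
        elapsed = now - self.last_update
        self.penalty *= math.pow(2.0, -elapsed / HALF_LIFE_S)
        self.last_update = now

    def flap(self, now: float) -> None:
        """Record a withdraw/re-announce cycle at time `now` (seconds)."""
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def check(self, now: float) -> bool:
        """Return True if the route may be used at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed
```

Three flaps a minute apart push the penalty past the suppress limit, and with these parameters the route stays suppressed for roughly half an hour, which is what keeps a flapping exchange from churning the core.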

The bigger concern is IF a peer is dropping a session that often, *what*
is wrong with their router? I am very afraid of routers that *randomly*
timeout and re-peer with no good reason.

Most networks insert new routes at internal/CR/other routers that are
automatically distributed to their borders, this way internal route
changes do not require resetting of external peers to take effect. 

So maybe I am misunderstanding your concern, but why micromanage BGP timers
on your routers when a reasonably sized network may have more than 1000
external peering sessions, and each router on both sides has different
loading characteristics that are not stable?
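To put a rough number on what shorter timers cost across many sessions, one can count keepalives. This sketch assumes idle sessions (an upper bound, since sending an UPDATE also restarts the keepalive timer) and ignores TCP/IP framing; the helper name is hypothetical. A BGP KEEPALIVE is just the 19-octet message header.

```python
from typing import Tuple

BGP_KEEPALIVE_OCTETS = 19  # KEEPALIVE = bare 19-octet BGP header


def keepalive_load(sessions: int, keepalive_s: float) -> Tuple[float, float]:
    """Return (messages/s, BGP octets/s) one router sends across all sessions.

    Hypothetical helper. Assumes every session is idle and uses the same
    keepalive interval; TCP/IP overhead is not counted.
    """
    msgs_per_s = sessions / keepalive_s
    return msgs_per_s, msgs_per_s * BGP_KEEPALIVE_OCTETS
```

At 60-second keepalives, 1000 sessions mean about 17 messages per second out (and as many in); at 10 seconds, 100 per second. The bandwidth is trivial; the concern is the number of timer events a busy BR has to service on time.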

Inbound prefix limits are my personal interest in a lot of these
per-neighbor configs, and even then, a big customer signing on to or leaving
a peer causes the prefix limits to get hit or become meaningless; I only
recommend them for use with peers that have fat-finger engineers working at
4am. :)
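Per-neighbor inbound prefix limits boil down to a threshold check like the one below. The function, the 75% warning default, and the numbers in the usage note are illustrative assumptions; real routers differ in whether exceeding the limit tears the session down or only logs a warning.

```python
from enum import Enum


class PrefixAction(Enum):
    OK = "ok"
    WARN = "warn"
    TEARDOWN = "teardown"


def check_prefix_limit(received: int, max_prefixes: int, warn_pct: int = 75) -> PrefixAction:
    """Classify a neighbor's received-prefix count against a configured limit.

    Hypothetical helper: warn at `warn_pct` percent of the limit and
    tear the session down once the limit is exceeded.
    """
    if received > max_prefixes:
        return PrefixAction.TEARDOWN
    if received * 100 >= max_prefixes * warn_pct:
        return PrefixAction.WARN
    return PrefixAction.OK
```

For example, a peer that normally sends 600 prefixes against a limit of 1000 is fine, but if a big customer signs on with 500 more routes the count hits 1100 and the session drops, which is exactly the failure described above.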

Deepak Jain
AiNET

On Fri, 12 Jan 2001, Lane Patterson wrote:
...
Hmm, I know there are a lot of overburdened BRs out there, but
since this is set on a per-neighbor basis, there should at least
be room for some selective optimization.  It seems a bit crazy
to think that each time there's a BR maintenance/reboot at an IXP,
peers will continue to send to the bit bucket in the sky for 180+
seconds.
...
-----Original Message-----
From: Deepak Jain [mailto:deepak@ai.net]
Sent: Friday, January 12, 2001 11:48 AM
To: Lane Patterson
Cc: 'nanog@merit.edu'
Subject: RE: BGP keepalive/holdtime at GigE exchange
The problem I have seen with setting BGP timeouts that low is when peering
with overloaded or slow/old routers. Often they will "pause" their BGP
activity while they are actively peering or repeering across their
internal or external network. The low times will then cause more timeouts
before the fabric has stabilized.
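That failure mode can be modelled by checking a trace of message arrival times against the hold time. This is a simplified sketch assuming the hold timer is reset only by received KEEPALIVE or UPDATE messages; the function name and the 40-second stall in the usage note are hypothetical.

```python
from typing import Optional, Sequence


def first_hold_expiry(arrivals: Sequence[float], hold_time_s: float) -> Optional[float]:
    """Return the time the hold timer would expire for a trace of arrivals.

    `arrivals` are sorted times (seconds) at which KEEPALIVE/UPDATE messages
    reach us. Returns None if no gap in the trace exceeds the hold time.
    """
    for prev, nxt in zip(arrivals, arrivals[1:]):
        if nxt - prev > hold_time_s:
            return prev + hold_time_s  # session torn down at this moment
    return None
```

A peer sending keepalives every 10 seconds that stalls for 40 seconds while busy (arrivals at 0, 10, 20, 60, 70) gets torn down at t=50 with a 30-second hold time, but survives with the 180-second default.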
Deepak Jain
AiNET
On Fri, 12 Jan 2001, Lane Patterson wrote:
...
Hmm, many folks didn't seem to understand the context here.
fast-external-fallover doesn't apply if a peer BR across a GigE
exchange dies...you've still got link on your Gig port, so there
is no link level indication of failure.
Tweaking TCP timers is not the right approach...BGP explicitly
has a keepalive for this exact purpose, when peering dies but
your interface stays up.
The best non-radical suggestion so far is to simply tweak your
keepalive to 10 and holdtime to 30 seconds, to bring this in line
with the granularity of direct-connected peer interface or
IGP metrics.
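Because the hold time is negotiated when the session opens (each side uses the smaller of its own and its peer's proposed Hold Time, per RFC 1771), a per-neighbor setting of 30 seconds should take effect even if the peer keeps its defaults. A minimal sketch of that rule follows; the helper name is hypothetical, and deriving the keepalive as one third of the result follows the RFC's suggested maximum, whereas a real router may use its configured keepalive instead.

```python
from typing import Optional, Tuple


def negotiate_timers(local_hold_s: int, remote_hold_s: int) -> Tuple[int, Optional[int]]:
    """Return (negotiated hold time, keepalive interval) for one BGP session.

    Hypothetical helper. Each side uses the smaller of the two proposed
    Hold Times; 1 or 2 seconds is invalid. Keepalive is one third of the
    result, which is at least 1 because the hold time is then >= 3.
    """
    for hold in (local_hold_s, remote_hold_s):
        if hold < 0 or hold in (1, 2):
            raise ValueError(f"invalid hold time {hold}: must be 0 or >= 3")
    hold = min(local_hold_s, remote_hold_s)
    if hold == 0:
        return 0, None  # hold timer and keepalives disabled
    return hold, hold // 3
```

For instance, 30 against a 180-second default negotiates to (30, 10), and an IOS default (180) against a JunOS default (90) ends up at (90, 30).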
...
Do people do this?  Do people have problems doing this?
Do any folks go lower than this on their eBGP peers, and at what
tradeoff?
This is the old issue of finding the right operationally sane
timeouts, not too high, not too low.  The defaults clearly
seem too high, yet I haven't seen many cases where folks set 
these down :-)
Cheers,
-Lane
...
-----Original Message-----
From: Lane Patterson [mailto:lpatterson@equinix.com]
Sent: Thursday, January 11, 2001 10:08 PM
To: 'nanog@merit.edu'
Subject: FW: BGP keepalive/holdtime at GigE exchange
I am looking for operational BCP feedback on common practice for tweaking
down BGP holdtime/keepalive across GigE exchange points,
...
since a peer
...
could go down on the other side of the GigE switch without a
corresponding adjacency change seen on your BR. The thought is
to make down peers known as fast through a GigE exchange as they would
...
be over a POS private peer interface.
The current defaults are pretty gross, and much worse than the
ISIS hello and interface keepalive defaults of 10 seconds.
IOS 12.x: neighbor [ip-address | peer-group-name] timers keepalive holdtime
  holdtime: default 180 seconds
  keepalive: default 60 seconds
http://cco.cisco.com/univercd/cc/td/doc/product/software/ios121/121cgcr/ip_r/iprprt2/1rdbgp.htm#xtocid8553
JunOS 4.2:
  holdtime: default 90 seconds
  keepalive: default one third of holdtime
https://www.juniper.net/techpubs/software/junos42/swconfig-routing42/html/bgp-summary13.html#1015669
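Since the timers are set per neighbor, a selective rollout to exchange peers could be scripted. This hypothetical helper only formats the IOS 12.x `neighbor ... timers keepalive holdtime` command quoted above; the addresses in the usage note are documentation addresses, not real peers, and the keepalive-shorter-than-holdtime check is an assumed sanity rule rather than something IOS is known to enforce.

```python
from typing import Iterable, List


def ios_timer_lines(neighbors: Iterable[str], keepalive_s: int, holdtime_s: int) -> List[str]:
    """Render per-neighbor 'timers' commands in the IOS 12.x syntax.

    Hypothetical helper: emits only the timers lines; the enclosing
    'router bgp' block and neighbor definitions are assumed to exist.
    """
    if holdtime_s != 0 and holdtime_s < 3:
        raise ValueError("holdtime must be 0 or at least 3 seconds")
    # Assumed sanity rule: keepalives should arrive well inside the hold time.
    if holdtime_s and keepalive_s >= holdtime_s:
        raise ValueError("keepalive should be shorter than holdtime")
    return [f"neighbor {n} timers {keepalive_s} {holdtime_s}" for n in neighbors]
```

Calling `ios_timer_lines(["192.0.2.1", "192.0.2.2"], 10, 30)` yields `neighbor 192.0.2.1 timers 10 30` and `neighbor 192.0.2.2 timers 10 30`, leaving private POS peers and overloaded exchange peers on the defaults.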
Cheers,
-Lane
Lane Patterson <lane@equinix.com>
Equinix, Inc.