On 23 May 2009, at 0:58, Zaid Ali wrote:
From experience I found that you need to keep all the timers in sync with all your peers. Something like this for every peer in your bgp config.
neighbor xxx.xx.xx.x timers 30 60
30 60 isn't a good choice: the first keepalive comes in after 30.1 seconds, and then the session expires at 60.0 seconds while the second keepalive would only have arrived at 60.1 seconds. The other side will typically use hold timer / 3 as its keepalive interval. If you set the hold time to something not divisible by 3, then all 3 of those keepalives arrive within the hold time.

I often recommended 5 16 in the past, but that's a bit on the short side: some less robust BGP implementations are single-threaded and may not be able to send keepalives every 15 seconds when they're very busy. The minimum possible hold time is 3 seconds.

If you only change the setting at your end, you can change it to something higher when bad stuff happens. If the other end also sets it, you'll have to change it at both ends, as the hold time is negotiated and the lowest is used.

If you really want fast failover, terminate the fiber in the BGP router and make sure fast-external-fallover is on (I think it's the default).

For manual failover, simply shut down the BGP sessions on the router that you don't want to handle traffic at that time. If you have peergroups you can do "neighbor peergroup shutdown" for the fastest results. Shutting down interfaces is not such a good idea, because then the routing protocols have to time out.
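To make the keepalive arithmetic above concrete, here is a rough Python sketch. The function names and the 0.1 second delivery delay are illustrative assumptions, not anything taken from a router implementation; the "hold time / 3" keepalive rule is the typical behaviour described above, and the sketch assumes non-zero hold times.

```python
def keepalives_within_hold(keepalive, hold, delay=0.1):
    """Count keepalives that arrive strictly before the hold timer expires.

    Assumes the peer sends a keepalive every `keepalive` seconds after the
    last one we received, and that each takes `delay` seconds to arrive
    (0.1 s is an arbitrary illustrative value).
    """
    count = 0
    k = 1
    while k * keepalive + delay < hold:
        count += 1
        k += 1
    return count


def negotiated_timers(local_hold, peer_hold):
    """Return (hold, keepalive) after negotiation.

    The lower of the two hold times wins; the keepalive interval is
    assumed to be hold / 3 rounded down, as is typical.
    """
    hold = min(local_hold, peer_hold)
    return hold, hold // 3


print(keepalives_within_hold(30, 60))  # 1: losing a single keepalive drops the session
print(keepalives_within_hold(20, 60))  # 2: the third one lands just after expiry
print(keepalives_within_hold(5, 16))   # 3: all three fit inside the hold time
print(negotiated_timers(180, 16))      # (16, 5)
```

With 30 60 only one keepalive fits in the window, so one lost or late packet is enough to expire the session; with 5 16 there are three chances.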
Make sure that this is communicated to your peer as well, so that their timer settings match yours.
Zaid

----- Original Message -----
From: "Steve Bertrand" <steve@ibctech.ca>
To: "nanog list" <nanog@nanog.org>
Sent: Friday, May 22, 2009 3:45:20 PM GMT -08:00 US/Canada Pacific
Subject: Multi-homed clients and BGP timers
Hi all,
I've got numerous single-site 100Mb fibre clients who have backup SDSL links to my PoP. The two services terminate on separate distribution/access routers.
The CPE that peers to my fibre router sets a community, and my end sets the pref to 150 based on it. The CPE also sets a higher pref for prefixes from the fibre router. The SDSL router to CPE session leaves the default preference in place. Both of my PE routers send default-originate to the CPE. There is (generally) no traffic that should ever be on the SDSL link while the fibre is up.
Both of the PE routers then advertise the learnt client route up into the core:
*>i208.70.107.128/28 172.16.104.22 0 150 0 64762 i
* i                  172.16.104.23 0 100 0 64762 i
My problem is the noticeable delay for switchover when the fibre happens to go down (God forbid).
I would like to know if BGP timer adjustment is the way to adjust this, or if there is a better/different way. It's fair to say that the fibre doesn't 'flap'. Based on operational experience, if there is a problem with the fibre network, it's down for the count.
While I'm at it, I've got another couple of questions:
- whatever technique you might recommend to reduce the convergence time throughout the network, can the same principles be applied to iBGP as well?
- if I need to down core2, what is the quickest and easiest way to ensure that all gear connected to the cores will *quickly* switch to preferring core1?
Steve