Multi-homed clients and BGP timers
Hi all,

I've got numerous single-site 100Mb fibre clients who have backup SDSL links to my PoP. The two services terminate on separate distribution/access routers.

The CPE that peers to my fibre router sets a community, and my end sets the pref to 150 based on it. The CPE also sets a higher pref for prefixes from the fibre router. The SDSL router to CPE leaves the default preference in place. Both of my PE routers send default-originate to the CPE. There is (generally) no traffic that should ever be on the SDSL link while the fibre is up.

Both of the PE routers then advertise the learnt client route up into the core:

*>i208.70.107.128/28  172.16.104.22  0  150  0  64762 i
* i                   172.16.104.23  0  100  0  64762 i

My problem is the noticeable delay for switchover when the fibre happens to go down (God forbid). I would like to know if BGP timer adjustment is the way to adjust this, or if there is a better/different way. It's fair to say that the fibre doesn't 'flap'. Based on operational experience, if there is a problem with the fibre network, it's down for the count.

While I'm at it, I've got another couple of questions:

- whatever technique you might recommend to reduce the convergence throughout the network, can the same principles be applied to iBGP as well?
- if I need to down core2, what is the quickest and easiest way to ensure that all gear connected to the cores will *quickly* switch to preferring core1?

Steve
From experience I found that you need to keep all the timers in sync with all your peers. Something like this for every peer in your bgp config.
neighbor xxx.xx.xx.x timers 30 60

Make sure that this is communicated to your peer as well so that their timer settings are reflected the same.

Zaid
Zaid Ali wrote:
From experience I found that you need to keep all the timers in sync with all your peers. Something like this for every peer in your bgp config.
neighbor xxx.xx.xx.x timers 30 60
Make sure that this is communicated to your peer as well so that their timer settings are reflected the same.
Thankfully at this point, we manage all CPE of any clients who peer with us, and so far, the clients advertise our own space back to us. I'll go back to looking at adequate timer settings for my environment.

All it takes is a quick phone call to the client IT people to inform them that a change will be made, and when they prefer I do it (in the event something goes south). Also thankfully, I'm within a quick walk/drive to these sites, which I've found to be a comfort during the last year while I've walked the BGP learning curve (one of my clients in particular leaves me with quite a few resources (fibre connections, hardware) to *test* with between site and PoP ;)

Cheers, and thanks!

Steve
On May 22, 2009, at 5:15 PM, Steve Bertrand wrote:
neighbor xxx.xx.xx.x timers 30 60
Make sure that this is communicated to your peer as well so that their timer setting are reflected the same.
Thankfully at this point, we manage all CPE of any clients who peer with us, and so far, the clients advertise our own space back to us. I'll go back to looking at adequate timer settings for my environment.
All it takes is a quick phone call to the client IT people to inform them that a change will be made, and when they prefer I do it (in the event something goes south). Also thankfully, I'm within a quick walk/drive to these sites, which I've found to be a comfort during the last year while I've walked the BGP learning curve (one of my clients in particular leaves me with quite a few resources (fibre connections, hardware) for me to *test* with between site-and-PoP ;)
Of course, given that the lowest BGP holdtime is selected when the session is being established, you don't really need to change the CPE side; all you need to do is make the change on the network side and reset the session. And it's typically a good idea to set the keepalive interval to a higher frequency when employing lower holdtimes, such that transient keepalive loss (or loss of updates, which act as implicit keepalives) doesn't cause any unnecessary instability.

Also, there are usually global values you can set for all BGP neighbors in most implementations, as well as the per-peer configuration illustrated above. The former requires fewer configuration bits if you're comfortable with setting the values globally.

If you want to converge a little faster than BGP holdtimes allow here, and the fiber link is directly between the routers, you might look at something akin to Cisco's "bgp fast-external-fallover", which immediately resets the session if the link layer is reset or lost.
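In IOS-style syntax, the global and per-neighbor variants being discussed look roughly like this (a sketch only; the ASN, neighbor address, and timer values are illustrative, not taken from the thread):

```
router bgp 64600
 ! global default for all neighbors: 5s keepalive, 16s holdtime
 timers bgp 5 16
 ! per-neighbor override, for a peer that needs different values
 neighbor 172.16.104.22 timers 5 16
 ! reset eBGP sessions immediately when the directly connected
 ! link goes down, instead of waiting for the holdtime to expire
 ! (this is typically on by default)
 bgp fast-external-fallover
```

Since the lower of the two configured holdtimes wins during session negotiation, applying this on the network side alone is enough to bring the effective holdtime down once the session is reset.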
While I'm at it, I've got another couple of questions:
- whatever technique you might recommend to reduce the convergence throughout the network, can the same principles be applied to iBGP as well?
Depending on your definition of convergence, yes. If you're referring to update advertisements as opposed to session or router failures, though, MRAI tweaks and/or less iBGP hierarchy might be the way to go. Then again, there are lots of side effects with these as well..
- if I need to down core2, what is the quickest and easiest way to ensure that all gear connected to the cores will *quickly* switch to preferring core1?
Use your IGP mechanisms akin to IS-IS overload bit or OSPF stub router (max metric) advertisement. -danny
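The IGP mechanisms Danny mentions can be sketched in IOS-style syntax as follows (illustrative only; process IDs are assumptions):

```
! OSPF: advertise this router's links with maximum metric so
! neighbors route around it while the adjacencies stay up
router ospf 1
 max-metric router-lsa

! IS-IS equivalent: set the overload bit so other routers stop
! using this box for transit traffic
router isis
 set-overload-bit
```

Either one lets you drain traffic off core2 before maintenance, then remove the command to bring it back into the forwarding path.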
If you want to converge a little faster than BGP holdtimes allow here, and the fiber link is directly between the routers, you might look at something akin to Cisco's "bgp fast-external-fallover", which immediately resets the session if the link layer is reset or lost.
Also things to consider: BFD for BGP and UDLD will help identify link failures faster. (If all of your equipment supports it, YMMV, etc.)

Deepak
Danny McPherson wrote:
On May 22, 2009, at 5:15 PM, Steve Bertrand wrote:
neighbor xxx.xx.xx.x timers 30 60
Make sure that this is communicated to your peer as well so that their timer settings are reflected the same.
Thankfully at this point, we manage all CPE of any clients who peer with us, and so far, the clients advertise our own space back to us. I'll go back to looking at adequate timer settings for my environment.
Of course, given that the lowest BGP holdtime is selected when the session is being established, you don't really need to change the CPE side, all you need to do is make the change on the network side and reset the session. And it's typically a good idea to set the keepalive interval to a higher frequency when employing lower holdtimes such that transient keepalive loss (or updates, which act as implicit keepalives) don't cause any unnecessary instability.
Also, there are usually global values you can set for all BGP neighbors in most implementations, as well as the per-peer configuration illustrated above. The former requires less configuration bits if you're comfortable with setting the values globally.
I remember reading that the lowest value is implemented, but thanks for the reminder. In this case, since I *can* change it at the CPE, I may as well. That way, in the event that I move on (or get hit by a bus) and the next person moves the connection to a new router, the CPE will win.

Also... the global setting is a great idea. Unfortunately, connected to this router that handles these fibre connections are a couple of local peers that I don't want to change the 'defaults' for. I can't remember if timers can be set at a peer-group level, so I'll look that up and go from there. That will be my best option given what is connected to this router.
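For what it's worth, IOS does allow timers on a peer-group, so the managed CPE sessions can get aggressive timers while local peers keep the defaults. A sketch (the group name, ASN, and address are made up for illustration):

```
router bgp 64600
 ! hypothetical peer-group for the managed fibre CPE only
 neighbor FIBRE-CPE peer-group
 neighbor FIBRE-CPE remote-as 64762
 neighbor FIBRE-CPE timers 5 16
 ! members inherit the group's timers; local peers configured
 ! outside the group are left at the implementation default
 neighbor 172.16.104.22 peer-group FIBRE-CPE
```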
If you want to converge a little fast than BGP holdtimes here and the fiber link is directly between the routers, you might look at something akin to Cisco's "bgp fast-external-fallover", which immediately resets the session if the link layer is reset or lost.
Well, unfortunately, the local PUC owns the fibre, and they have a switch aggregating all of their fibre in a star pattern. They then trunk the VLANs to me across two redundant pairs. I'm in the process of persuading them to allow me to put my own gear in their location so I can manage it myself (no risk of port-monitor, no risk of their ops fscking up my clients, etc.). This way, they connect from their client-facing converter into whatever port in my switch I tell them.

With that said, and as I said before, L3 and below rarely fails. I'll look into fast-external-fallover. It may be worth it here.
While I'm at it, I've got another couple of questions:
- whatever technique you might recommend to reduce the convergence throughout the network, can the same principles be applied to iBGP as well?
Depending on your definition of convergence, yes. If you're referring to update advertisements as opposed to session or router failures, though, MRAI tweaks and/or less iBGP hierarchy might be the way to go. Then again, there are lots of side effects with these as well..
I suppose I might not completely understand what I am asking.

- pe1 has iBGP peerings with p1 and p2, and pe1 has p2 as its next hop in the FIB for prefix X (both cores have prefix X in their routing tables through a different edge device)
- p2 suddenly falls off the network

Perhaps it's late enough on Friday night after a long day for me to not be thinking correctly, but I can't figure out exactly what the delay would be for a client connected to pe1 to re-reach prefix X if p2 goes down hard.
- if I need to down core2, what is the quickest and easiest way to ensure that all gear connected to the cores will *quickly* switch to preferring core1?
Use your IGP mechanisms akin to IS-IS overload bit or OSPF stub router (max metric) advertisement.
I will certainly look into your suggestions. I have only a backbone area in OSPF carrying loopbacks and infrastructure, but I don't quite understand the entire OSPF protocol yet.

Thanks Danny,

Steve
Steve Bertrand wrote:
Well, unfortunately, the local PUC owns the fibre, and they have a switch aggregating all of their fibre in a star pattern. They then trunk the VLANs to me across two redundant pair. I'm in the process of persuading them to allow me to put my own gear in their location so I can manage it myself (no risk of port-monitor, no risk of their ops fscking up my clients etc). This way, they connect from their client-facing converter into whatever port in my switch I tell them.
Correct me if I'm wrong, but wasn't this exactly the type of situation that BFD was designed to detect and help with? Jack
Jack Bates wrote:
Steve Bertrand wrote:
Well, unfortunately, the local PUC owns the fibre, and they have a switch aggregating all of their fibre in a star pattern. They then trunk the VLANs to me across two redundant pair. I'm in the process of persuading them to allow me to put my own gear in their location so I can manage it myself (no risk of port-monitor, no risk of their ops fscking up my clients etc). This way, they connect from their client-facing converter into whatever port in my switch I tell them.
Correct me if I'm wrong, but wasn't this exactly the type of situation that BFD was designed to detect and help with?
I don't know, but I'm printing it[1] anyway to take home and read. It's been mentioned a few times, and clearly worth learning about.

Thanks,

Steve

[1] http://bgp.potaroo.net/ietf/all-ids/draft-ietf-bfd-v4v6-1hop-09.txt
For BFD to work, you need:

* ISR + 12.4(15)T (or later)
* 7200 with 12.4T or 12.2SRx
* 7600/6500/GSR + 12.2SRB (or later)
* ASR

A complete list is at the bottom of this document:
http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fs_bfd.html

You'll find some more BFD details and usage guidelines here:
http://www.nil.com/ipcorner/bfd/

Best regards
Ivan

http://www.ioshints.info/about
http://blog.ioshints.info/
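On platforms that do support it, wiring BFD into BGP is fairly compact. An IOS-style sketch (interface, address, and BFD intervals are illustrative; real intervals should match what both ends and the hardware can sustain):

```
! enable BFD on the interface facing the neighbor:
! transmit/receive every 300 ms, declare failure after 3 misses
interface GigabitEthernet0/1
 bfd interval 300 min_rx 300 multiplier 3

router bgp 64600
 ! tear down the BGP session as soon as BFD declares the peer dead,
 ! rather than waiting for the BGP holdtime
 neighbor 172.16.104.22 fall-over bfd
```

With those numbers, failure detection happens in roughly a second, independent of the BGP keepalive/holdtime settings.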
Correct me if I'm wrong, but wasn't this exactly the type of situation that BFD was designed to detect and help with?
I don't know, but I'm printing it[1] anyway to take home and read. It's been mentioned a few times, and clearly worth learning about.
Thanks,
Steve
[1] http://bgp.potaroo.net/ietf/all-ids/draft-ietf-bfd-v4v6-1hop-09.txt
I would agree, BFD is the ideal way to go. I've wanted our upstream provider to use BFD on our OSPF and iBGP links, but they said they're still testing it internally. They're quite gun-shy on implementing it because the existing configuration is stable -- they don't want a new protocol creating unnecessary failovers.

I'm just looking to cut failovers from the existing 12 to 45 seconds (depending on the direction) to a second or two.

Frank
If you want to converge a little faster than BGP holdtimes allow here, and the fiber link is directly between the routers, you might look at something akin to Cisco's "bgp fast-external-fallover", which immediately resets the session if the link layer is reset or lost.
For fast external fallover, your physical interface has to go down. Inside your network you could use BGP fast fallover (which drops the BGP session after the IGP route to the neighbor is lost); details are here:

http://www.nil.com/ipcorner/DesigningBGPNetworks/

Fast fallover with EBGP multihop is described here:

http://wiki.nil.com/EBGP_load_balancing_with_EBGP_session_between_loopback_interfaces

Ivan

http://www.ioshints.info/about
http://blog.ioshints.info/
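The internal fast-fallover Ivan describes can be sketched like this in IOS-style syntax (loopback address and ASN are hypothetical):

```
router bgp 64600
 ! iBGP neighbor reached via its loopback; drop the session as soon
 ! as the IGP route to that loopback disappears from the routing
 ! table, instead of waiting for the BGP holdtime to expire
 neighbor 10.0.0.2 remote-as 64600
 neighbor 10.0.0.2 update-source Loopback0
 neighbor 10.0.0.2 fall-over
```

This makes iBGP convergence track IGP convergence: once OSPF or IS-IS withdraws the route to the dead core's loopback, the session is torn down immediately.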
On 23 mei 2009, at 0:58, Zaid Ali wrote:
From experience I found that you need to keep all the timers in sync with all your peers. Something like this for every peer in your bgp config.
neighbor xxx.xx.xx.x timers 30 60
30 60 isn't a good choice because that means that after 30.1 seconds a keepalive comes in and then after 60.0 seconds the session will expire while the second one would be there in 60.1 seconds. The other side will typically use hold timer / 3 for their keepalive interval. If you set it to something not divisible by 3, then you get all 3 of those within the hold timer.

I often recommended 5 16 in the past, but that's a bit on the short side: some less robust BGP implementations work single-threaded and may not be able to send keepalives every 15 seconds when they're very busy. The minimum possible hold time is 3.

If you only change the setting at your end, you can change it to something higher when bad stuff happens; if the other end also sets it, then you'll have to change it at both ends, as the hold time is negotiated and the lowest is used.

If you really want fast failover, terminate the fiber in the BGP router and make sure fast-external-failover is on (I think it's the default).

For manual failover, simply shut down the BGP sessions on the router that you don't want to handle traffic at that time. If you have peer-groups you can do "neighbor peergroup shutdown" for the fastest results. Shutting down interfaces is not such a good idea; then the routing protocols have to time out.
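The arithmetic behind the 5/16 suggestion, as a config sketch (IOS-style; the neighbor address is illustrative):

```
router bgp 64600
 ! keepalive 5s, holdtime 16s: keepalives are expected at roughly
 ! 5, 10 and 15 seconds, so three of them fit inside the 16-second
 ! hold timer. A holdtime of 15 (exactly 3 x 5) would leave no
 ! slack for the third keepalive if it arrives slightly late.
 neighbor 192.0.2.1 timers 5 16
```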
* Iljitsch van Beijnum:
30 60 isn't a good choice because that means that after 30.1 seconds a keepalive comes in and then after 60.0 seconds the session will expire while the second one would be there in 60.1 seconds.
Wouldn't the underlying TCP retry sooner than that?
On May 25, 2009, at 11:33 AM, Florian Weimer wrote:
* Iljitsch van Beijnum:
30 60 isn't a good choice because that means that after 30.1 seconds a keepalive comes in and then after 60.0 seconds the session will expire while the second one would be there in 60.1 seconds.
Wouldn't the underlying TCP retry sooner than that?
I suspect that given update messages serve as implicit keepalives, it's extremely rare that an actual keepalive message is needed in global routing environments.

-danny
* Danny McPherson:
On May 25, 2009, at 11:33 AM, Florian Weimer wrote:
* Iljitsch van Beijnum:
30 60 isn't a good choice because that means that after 30.1 seconds a keepalive comes in and then after 60.0 seconds the session will expire while the second one would be there in 60.1 seconds.
Wouldn't the underlying TCP retry sooner than that?
I suspect that given update messages serve as implicit keepalives, it's extremely rare that an actual keepalive message is needed in global routing environments.
See the subject of this thread. 8-) I don't think we're talking about full tables here, so you actually have to rely on keepalives (plus TCP retransmits).
We have customers in the same way you do. We only use Cisco (both PoP routers and managed CPE) and use

neighbor xxx.xxx.xxx.xxx timers 5 15

on the PoP routers, with great success. We haven't found any drawbacks so far.

// OK
What's the BCP for BGP timers at exchange points? I imagine if everyone did something low like 5-15 rather than the default 60-180, the CPU usage increase could be significant given a high number of peers. Keeping in mind that "bgp fast-external-failover" is of no use at an exchange, since the fabric is likely to stay up when a peer has gone down, and BFD would need to be negotiated peer-by-peer, is there a recommendation other than the default 60-180?

Would going below 60-180 without first discussing it with your peers, tend to piss them off?

Chris
Hi Chris,

.-- My secret spy satellite informs me that at Mon, 25 May 2009, Chris Caputo wrote:
Would going below 60-180 without first discussing it with your peers, tend to piss them off?
60-180 is fairly conservative. 60-180 is the Cisco default, I believe; Juniper's defaults, however, are 30-90. I never pissed anyone off with that ;)

Cheers,
Andree
For those in multivendor environments, it's worth also being aware that since 7.6R1 JunOS sets the minimum BGP hold timer to 20 seconds. If I were creating a standard timer config to deploy consistently on customer peers (and needed something on the fast side in timer terms), I would need to take that into account.

(And yes, there is of course a way to override the 20s hold timer, but it's not a supported config last time I checked.)

j.
Steve Bertrand wrote:
My problem is the noticeable delay for switchover when the fibre happens to go down (God forbid).
I would like to know if BGP timer adjustment is the way to adjust this, or if there is a better/different way. It's fair to say that the fibre doesn't 'flap'. Based on operational experience, if there is a problem with the fibre network, it's down for the count.
Thanks to all for the great feedback. In summary, I've learnt:

- Even though BFD would be a fantastic solution and would require only minimal changes (to my strict uRPF setup), it's a non-starter, as I don't meet all of the requirements that Ivan pointed out
- fast-external-fallover is already enabled by default, but in order for this to be effective, the interface has to physically go into down state. In my case, although not impossible, that is extremely unlikely
- adjusting BGP timers is the best option, given it's really the only one left. Although I generally try to keep consistency among all equipment (if I set the timers at one end, I would set them the same at the other), Iljitsch recommended leaving the CPE end alone, so if something bad happens, access to the CPE would not be necessary to revert the change
- I'm going to set the timers to 5/16. I like the idea of the extra second on top of being divisible by three. That will ensure that at least three keepalives have a chance to make it before the session hold timer is reached

Cheers!

Steve
participants (14)

- Andree Toonk
- Chris Caputo
- Danny McPherson
- Deepak Jain
- Florian Weimer
- Frank Bulk
- Iljitsch van Beijnum
- Ivan Pepelnjak
- Ivan Pepelnjak
- Jack Bates
- John.Herbert@ins.com
- Olof Kasselstrand
- Steve Bertrand
- Zaid Ali