bfd-like mechanism for LANPHY connections between providers
Are there any transit providers out there that accept using BFD (or any similar mechanism) for eBGP peerings? If not, how do you handle the physical interface state issue when LAN PHY connections are used? Is anyone messing with the BGP timers? If yes, what about multiple LAN connections with a single BGP peering? -- Tassos
On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote:
Are there any transit providers out there that accept using BFD (or any similar mechanism) for eBGP peerings? If not, how do you handle the physical interface state issue when LAN PHY connections are used? Is anyone messing with the BGP timers? If yes, what about multiple LAN connections with a single BGP peering?
Well, first off, LAN PHY has a perfectly useful link state. That's pretty much the ONLY thing it has in the way of native OAM, but it does have that, and that's normally good enough to bring down your eBGP session quickly. Personally, I find the risk of false positives when speaking to other people's random bad BGP implementations to be too great if you go much below 30-second hold timers (and sadly, even 30 seconds is too low for some people). We (nLayer) are still waiting for our first customer to request BFD; we'd be happy to offer it (with reasonable timer values, of course). :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
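A minimal Junos-style sketch of what "messing with the BGP timers" on an eBGP session might look like (the group name, neighbor address, and the 30-second value are placeholders, not anyone's actual config):

    set protocols bgp group transit-example neighbor 192.0.2.1 hold-time 30

The keepalive interval then follows as one third of the negotiated hold time, so a 30-second hold timer implies 10-second keepalives.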
Richard A Steenbergen wrote on 16/03/2011 19:03:
Well, first off, LAN PHY has a perfectly useful link state. That's pretty much the ONLY thing it has in the way of native OAM, but it does have that, and that's normally good enough to bring down your eBGP session quickly. Personally, I find the risk of false positives when speaking to other people's random bad BGP implementations to be too great if you go much below 30-second hold timers (and sadly, even 30 seconds is too low for some people). We (nLayer) are still waiting for our first customer to request BFD; we'd be happy to offer it (with reasonable timer values, of course). :)
Link state is good for the local connection. If there are multiple intermediate optical points (not managed by either party), or a LAN switch (IX environment), you won't get any link notification for failures beyond what is connected locally to your interface, unless there is a mechanism to signal that to you. -- Tassos
We are going to turn up BFD with Level3 this Saturday. They require that you run a Juniper (per the SE). It sounds like it is fairly new, as there was no paperwork to request the service; we had to put it in the notes. We have many switches between us and Level3, so we don't get an "interface down" to drop the session in the event of a failure.
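As a rough illustration only (group name, neighbor address, and timer values are placeholders, not Level3's actual parameters), enabling BFD on a Junos eBGP neighbor looks something like:

    set protocols bgp group transit-example neighbor 192.0.2.1 bfd-liveness-detection minimum-interval 300
    set protocols bgp group transit-example neighbor 192.0.2.1 bfd-liveness-detection multiplier 3

Both sides need to agree to run BFD on the session; session tear-down then no longer depends on seeing the physical interface go down.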
On Wed, Mar 16, 2011 at 2:33 PM, Jensen Tyler <JTyler@fiberutilities.com> wrote:
We have many switches between us and Level3, so we don't get an "interface down" to drop the session in the event of a failure.
This is often my topology as well. I am satisfied with BGP's mechanism and default timers, and have been for many years. The reason for this is quite simple: failures are relatively rare, my convergence time to a good state is largely bounded by CPU, and I do not consider a slightly improved convergence time to be worth an atypical configuration. Case in point: Richard says that none of his customers have requested such a configuration to date, and you indicate that Level3 will provision BFD only if you use a certain vendor and that this is handled outside of their normal provisioning process. For an IXP LAN interface and associated BGP neighbors, I see much more advantage. I imagine this will become common practice for IXP peering sessions long before it is typical to use BFD on customer/transit-provider BGP sessions. -- Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
On Wed, Mar 16, 2011 at 02:55:14PM -0400, Jeff Wheeler wrote:
This is often my topology as well. I am satisfied with BGP's mechanism and default timers, and have been for many years. The reason for this is quite simple: failures are relatively rare, my convergence time to a good state is largely bounded by CPU, and I do not consider a slightly improved convergence time to be worth an atypical configuration. Case in point: Richard says that none of his customers have requested such a configuration to date, and you indicate that Level3 will provision BFD only if you use a certain vendor and that this is handled outside of their normal provisioning process.
There are still a LOT of platforms where BFD doesn't work reliably (without false positives), doesn't work as advertised, doesn't work under every configuration (e.g. on SVIs), or doesn't scale very well (i.e. it would fall over if you had more than a few neighbors configured). The list of caveats is huge, the list of vendors which support it well is small, and there should be giant YMMV stickers everywhere. But Juniper (M/T/MX series at any rate) is definitely one of the better options (though not without its flaws: the inability to configure it at the group level and selectively disable it per peer, and the lack of support at the group level when any IPv6 neighbor is configured, come to mind).

Running BFD with a transit provider is USUALLY the least interesting use case, since you're typically connected either directly or via a metro transport service which is capable of passing link state. One possible exception is when you need to bundle multiple links together, but link-agg isn't a good solution and you need to limit the number of eBGP paths to reduce load on the routers. The typical solution for this is loopback peering, but this kills your link-state detection mechanism for killing BGP during a failure, which is where BFD starts to make sense.

For IXes, where you have an active L2 switch in the middle and no link state, BFD makes the most sense. Unfortunately it's the area where we've seen the least traction among peers, with "zomg why are you sending me these udp packets" complaints outnumbering people interested in configuring BFD 10:1. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
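A hedged sketch of the loopback-peering arrangement described above (addresses, names, and TTL are placeholders): the eBGP session is moved off the link addresses onto loopbacks reached over the parallel links, at which point interface link state no longer tears the session down.

    # reachability to the peer loopback over the member links is assumed (static routes or similar)
    set protocols bgp group peer-example neighbor 198.51.100.1 local-address 198.51.100.2
    set protocols bgp group peer-example neighbor 198.51.100.1 multihop ttl 2

Whether BFD can then be layered onto such a multihop session is exactly the platform-dependent caveat territory described above.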
Correct me if I am wrong, but to detect a failure, BGP would by default wait for the "hold timer" to expire, then declare the peer dead and converge. So you would be looking at 90 seconds (the Juniper default?) plus CPU-bound convergence time to recover? Am I thinking about this right?
On Wed, Mar 16, 2011 at 4:42 PM, Jensen Tyler <JTyler@fiberutilities.com> wrote:
Correct me if I am wrong, but to detect a failure, BGP would by default wait for the "hold timer" to expire, then declare the peer dead and converge.
So you would be looking at 90 seconds (the Juniper default?) plus CPU-bound convergence time to recover? Am I thinking about this right?
This is correct. Note that 90 seconds isn't just a "Juniper default." This suggested value appeared in RFC 1267 §5.4 (BGP-3) all the way back in 1991.

In my view, configuring BFD for eBGP sessions is risking a reduced MTBF (from false positives) for rare reductions in MTTR. This is a risk/reward decision that IMO is still leaning towards "lots of risk" for "little reward." I'll change my mind about this when BFD works on most boxes and is part of the standard provisioning procedure for more networks. It has already been pointed out that this is not true today.

If your eBGP sessions are failing so frequently that you are very concerned about this 90 seconds, I suggest you won't reduce your operational headaches or customer grief by configuring BFD. This is probably an indication that you need to:
1) straighten out the problems with your switching network or transport vendor
2) get better transit
3) depeer some peers who can't maintain a stable connection to you; or
4) sacrifice something to the backhoe deity

Again, in the case of an IXP interface, I believe BFD has much more potential benefit. -- Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
I've used WANem (http://wanem.sourceforge.net/) for the last 2 years. Simple web interface, wide range of settings - it is enough for network engineers. On 17 March 2011 16:45, Loopback <loopback@digi-muse.com> wrote:
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
Network Nightmare (http://gigenn.net/). I used this device in the past to test an HP RGS deployment. You can simulate different connection rates and induce latency. Documentation is weak, but it does the job. On Thu, Mar 17, 2011 at 6:57 AM, Loopback <loopback@digi-muse.com> wrote:
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
On Thu, Mar 17, 2011 at 6:57 AM, Loopback <loopback@digi-muse.com> wrote:
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
We've used FreeBSD + dummynet on a multi-NIC box in bridging mode to do 'bump on the wire' WAN simulations involving packet loss, latency, and unidirectional packet flow variances. Works wonderfully, and the price is right. Matt
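A minimal sketch of the kind of dummynet setup described (the interface name, rule numbers, and values are placeholders; the exact bridging setup varies by FreeBSD release):

    # shape traffic crossing the box to T1-like speed, with added latency and 1% loss
    ipfw add 100 pipe 1 ip from any to any via em0
    ipfw pipe 1 config bw 1544Kbit/s delay 40ms plr 0.01

A single "via" rule matches both directions; configuring separate pipes per direction allows asymmetric delay or loss.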
On Sun, Mar 20, 2011 at 6:20 PM, Matthew Petach <mpetach@netflight.com> wrote:
On Thu, Mar 17, 2011 at 6:57 AM, Loopback <loopback@digi-muse.com> wrote:
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
Linux tc netem: http://www.linuxfoundation.org/collaborate/workgroups/networking/netem Has worked well for us. -- Tim:>
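A minimal netem sketch (interface name, rates, and values are placeholders; pairing netem with tbf is one common way to add a bandwidth cap):

    # add 40ms +/- 10ms of delay and 0.5% loss on egress
    tc qdisc add dev eth0 root handle 1: netem delay 40ms 10ms loss 0.5%
    # cap bandwidth to roughly T1 speed underneath the netem qdisc
    tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 1544kbit burst 32kbit latency 400ms

Netem only acts on egress, so a bridge or a second NIC is needed to impair both directions.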
Loopback <loopback@digi-muse.com> wrote:
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
I've used both Mini Maxwell and Maxwell Pro from InterWorking Labs with great success: http://minimaxwell.iwl.com/ http://maxwell.iwl.com/ The Maxwell Pro has a bunch of interesting capabilities - it can be used to actually modify packets as they pass through (like changing IP addresses and port numbers, or messing with other parts of selected packets). Obviously that goes beyond simple WAN emulation. Jim
On Fri, Mar 18, 2011 at 10:27:18AM -0700, Jim Logajan wrote:
> Loopback <loopback@digi-muse.com> wrote:
>> Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gbps. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
FreeBSD + DummyNet [http://www.freebsd.org/cgi/man.cgi?query=dummynet&sektion=4] -Alex
On Mar 16, 2011, at 1:42 PM, Jensen Tyler wrote:

Correct me if I am wrong, but to detect a failure, BGP would by default wait for the "hold timer" to expire, then declare the peer dead and converge.

Hence the case for BFD. There is a difference of several orders of magnitude between BFD keepalive intervals (measured in milliseconds) and BGP keepalives (measured in seconds), with generally configurable multipliers vs. the hold timer. With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases. For a provider to require a vendor instead of RFC compliance is sinful.

Sudeep

____________________________________________ Sudeep Khuraijam | I speak for no one but I
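As a rough worked example (the intervals are illustrative, not from this thread): with a 300 ms BFD transmit interval and a multiplier of 3, a dead neighbor is detected in roughly 3 x 300 ms = 900 ms, versus 90 seconds for the default BGP hold timer - about two orders of magnitude faster, before any route convergence time is added on top.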
On Wed, Mar 16, 2011 at 8:00 PM, Sudeep Khuraijam <skhuraijam@liveops.com> wrote:
There is a difference of several orders of magnitude between BFD keepalive intervals (measured in milliseconds) and BGP keepalives (measured in seconds), with generally configurable multipliers vs. the hold timer. With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases.
For eBGP peerings, your router must re-converge to a good state in under 9 seconds for BFD to deliver an order-of-magnitude improvement in time-to-repair, since BFD only removes the hold-time component, not the convergence time. This is typically not the case for transit/customer sessions. To make a risk/reward choice that is actually based in reality, you need to understand your total time to re-converge to a good state, and how much of that is BGP hold time. You should then consider whether changing BGP timers (with its own set of disadvantages) is more or less practical than using BFD.

Let's put it another way: if CPU/FIB convergence time were not a significant issue, do you think vendors would be working to optimize this process, that we would have concepts like MPLS FRR and PIC, and that each new router product line upgrade comes with a yet-faster CPU? Of course not. Vendors would just have said, "hey, let's get together on a lower hold time for BGP."

As I stated, I'll change my opinion of BFD when implementations improve. I understand the risk/reward situation. You don't seem to get this, and as a result, your overly-simplistic view is that "BGP takes seconds" and "BFD takes milliseconds."
For a provider to require a vendor instead of RFC compliance is sinful.
Many sins are more practical than the alternatives. -- Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
participants (12)
- Jeff Wheeler
- Jensen Tyler
- Jim Logajan
- Loopback
- Matthew Petach
- Mike Callagy
- Richard A Steenbergen
- Sergey Voropaev
- Sudeep Khuraijam
- Tassos Chatzithomaoglou
- Tim Durack
- Wilkinson, Alex