Guys, I'm looking for recommendations on BFD timers that we can use for a long haul circuit. RTT is roughly around 110 ms. In fact, this is an l2vpn ckt provided by a telco.

Can you please advise on the factors we can consider when setting the BFD timers (or any recommended values)? I have set a 10 ms dead time, but this is causing BFD to go down occasionally.

Thanks & Regards,
Harivishnu
Hey, On Thu, 16 Jul 2020 at 17:07, Harivishnu Abhilash <Harivishnu.Abhilash@mannai.com.qa> wrote: Classification: Top secret
Guys, I'm looking for recommendations on BFD timers that we can use for a long haul circuit. RTT is roughly around 110 ms. In fact, this is an l2vpn ckt provided by a telco.
Can you please advise on the factors we can consider when setting the BFD timers (or any recommended values)? I have set a 10 ms dead time, but this is causing BFD to go down occasionally.
RTT is immaterial. What you need to ask is what your vendor's SLA is regarding packet loss, as well as what their convergence budget is. You do not want BFD to declare the link down for problems the operator can re-route around.

Personally, I would encourage you to ignore BFD and ask for link-down propagation. In Junos, that propagation would be 'set interfaces X gigether-options asynchronous-notification'.

--
++ytti
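As a rough illustration of the convergence-budget point (assuming the 10 ms the OP mentions is the TX/RX interval and that the multiplier is the common default of 3 - neither detail is in the thread): in BFD asynchronous mode the detection time is roughly the agreed receive interval multiplied by the remote multiplier, so

  10 ms x 3   = 30 ms    detection time
  250 ms x 5  = 1.25 s   detection time

A 30 ms budget means any queueing burst or sub-second re-route inside the telco's underlay tears the session down, while 1.25 s rides those out and is still far quicker than IGP/BGP hold timers. Note that RTT never appears in the calculation, which is why it is largely immaterial here.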
On 16/Jul/20 05:51, Harivishnu Abhilash wrote:
Guys, I'm looking for recommendations on BFD timers that we can use for a long haul circuit. RTT is roughly around 110 ms. In fact, this is an l2vpn ckt provided by a telco.
Can you please advise on the factors we can consider when setting the BFD timers (or any recommended values)? I have set a 10 ms dead time, but this is causing BFD to go down occasionally.
We run different intervals and multipliers depending on whether the connection is LAN or WAN.

For LAN (so within the same data centre), intervals are set to 150ms and multipliers are set to 3.

For WAN (any backbone, regardless of latency), intervals are set to 250ms and multipliers are set to 5. Since our network spans multiple countries and continents, we wanted a uniform value for the WAN side of things, so we don't have too many customized configurations.

We found these settings to work well in mixed environments where implementations vary between CPU and line card processing, and also to strike a balance between accuracy and false positives. We've been running this on IOS XE, IOS XR and Junos platforms since 2014. The only issues we found were:

* BFD on LAGs on IOS XR platforms in a LAN environment doesn't work. A point-to-point mechanism is required, so we disabled it there. Junos and IOS XE have no problems running BFD on LAGs in LANs, so we have it on there. This is for within the data centre.

* BFDv6 on the MX does not run in hardware. Since IS-IS (for us) ties in BFD for link state event detection, a transient lack of CPU resources to service BFDv6 traffic will result in not only BFDv6 going down, but also the entire IS-IS protocol flapping on the assumption that a link event has occurred. So if you run BFDv6 alongside BFDv4, I recommend that you disable BFDv6 until Juniper introduce hardware support for it on the MX (and, I'm guessing, all other Junos platforms). We have had an ER out for this since 2019, and we are told it should be appearing sometime between Q4'20 - 1H'21.

* Syntax for BFD in Junos has changed to incorporate address families. So while the old syntax will commit, it will leave an annotation in the configuration about it not being supported anymore. I recommend you convert your Junos BFD configurations to IPv4 and IPv6 specificity, if you haven't already done so. I can't remember when this came into effect, but it was likely Junos 16. We are on Junos 17 now.

Our longest point-to-point circuit is 140ms (Cape Town - London). These settings have been running fine on there since Day 1 (IOS XR-to-IOS XR), and overall detection and re-convergence of IS-IS + LFA leaves us happy and sleeping well at night.

Mark.
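For anyone who wants to lab the WAN values above, a minimal sketch of how they might be expressed under IS-IS follows - the interface names, the IS-IS instance name and the per-address-family Junos hierarchy are my own placeholders, not something Mark posted:

Junos (address-family-specific form):

  set protocols isis interface ge-0/0/0.0 family inet bfd-liveness-detection minimum-interval 250
  set protocols isis interface ge-0/0/0.0 family inet bfd-liveness-detection multiplier 5

IOS XR:

  router isis CORE
   interface TenGigE0/0/0/0
    address-family ipv4 unicast
     bfd minimum-interval 250
     bfd multiplier 5
     bfd fast-detect ipv4

The same two numbers are applied everywhere on the WAN side, which is the point of the uniform policy described above.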
Hi Mark,

Thanks for the update. Do you have any backhauls running over an L2 xconnect? I'm facing this issue only on the backhaul link over an l2vpn ckt.

Ta,
On 17/Jul/20 02:37, Harivishnu Abhilash wrote:
Thanks for the update. Do you have any backhauls running over an L2 xconnect? I'm facing this issue only on the backhaul link over an l2vpn ckt.
Unfortunately not. All our backbones are either over dark fibre or EoDWDM. Mark.
Unfortunately not.
Fortunately .... very fortunately Mark.

L2VPNs running on someone's IP backbone, sold by many as "circuits", have many issues ... stability, MTU blackholes, random drops - and that is pretty much the same all over the world :(

A very unfortunate technology, just to mux more users and get more $$$ out of a single investment.

Cheers,
R.

On Fri, Jul 17, 2020 at 8:43 AM Mark Tinka <mark.tinka@seacom.com> wrote:
On 17/Jul/20 02:37, Harivishnu Abhilash wrote:
Thanks for the update. Do you have any backhauls running over an L2 xconnect? I'm facing this issue only on the backhaul link over an l2vpn ckt.
Unfortunately not. All our backbones are either over dark fibre or EoDWDM.
Mark.
On 17/Jul/20 11:50, Robert Raszuk wrote:
Fortunately .... very fortunately Mark.
Hehe, I meant in the context of not having a similar condition as the OP.
L2VPNs running on someone's IP backbone, sold by many as "circuits", have many issues ... stability, MTU blackholes, random drops - and that is pretty much the same all over the world :(
A very unfortunate technology, just to mux more users and get more $$$ out of a single investment.
Can't argue with you. I suppose a lot of customers go for it because they need an Ethernet service slower than 1Gbps, and 1Gbps via a DWDM service is pricier. Where I've seen it be popular is in intercontinental circuits that customers want in order to test a market with as little exposure as possible. Mark.
On 17/07/2020 10:57, Mark Tinka wrote:
I suppose a lot of customers go for it because they need an Ethernet service slower than 1Gbps, and 1Gbps via a DWDM service is pricier.
Where I've seen it be popular is in intercontinental circuits that customers want in order to test a market with as little exposure as possible.
The differentiation is: consumer vs. service provider. If you're a service provider, don't buy a consumer product and hope to sell it on at a similar (or higher) SLA rate to other consumers; that way lies ruin. -- Tom
Tom Hill wrote on 17/07/2020 16:06:
If you're a service provider, don't buy a consumer product and hope to sell it on at a similar (or higher) SLA rate to other consumers; that way lies ruin.
I was going to suggest that there wasn't much in the way of consumer grade international circuits, so why would you even bring this up? But then I lol'd. Nick
On 17/Jul/20 17:12, Nick Hilliard wrote:
I was going to suggest that there wasn't much in the way of consumer grade international circuits, so why would you even bring this up? But then I lol'd.
Now you have me wondering whether Tom was serious or not :-). It's time for my Friday wine, hehe. Mark.
On 17/Jul/20 17:06, Tom Hill wrote:
The differentiation is: consumer vs. service provider.
If you're a service provider, don't buy a consumer product and hope to sell it on at a similar (or higher) SLA rate to other consumers; that way lies ruin.
I don't know of "Consumers" that buy l2vpn's. Most consumers usually go for ADSL, FTTH or 4G... all carrying IP :-). We have several customers that buy EoMPLS circuits from us both within and outside of countries, and between continents. The reasons vary, but safe to say they've been happy. Of course, should the requirements get to 10Gbps or more, moving them over to DWDM makes plenty of commercial sense. In my experience, trying to provide EoMPLS transport to customers in the 6Gbps region and above, when your backbone consists mostly of N x 10Gbps links, is just asking for it. I'd recommend considering doing that only if one had N x 100Gbps everywhere, including router-switch 802.1Q trunks. Mark.
On 17/07/2020 16:40, Mark Tinka wrote:
I don't know of "Consumers" that buy l2vpn's. Most consumers usually go for ADSL, FTTH or 4G... all carrying IP :-).
We have several customers that buy EoMPLS circuits from us both within and outside of countries, and between continents. The reasons vary, but safe to say they've been happy.
Yes, I rather think that you've drawn comparison to "consumer" as being in a home somewhere. Someone that consumes a circuit, and someone that provides the service (or resells one). A business customer is a consumer in that case - I won't discriminate against what use someone has for wanting to consume bandwidth between countries, but I do think the specificity here is in whether you intend to just use it, or resell it, and that's where the difference comes in relation to Robert's point. -- Tom
On 17/Jul/20 18:42, Tom Hill wrote:
Yes, I rather think that you've drawn comparison to "consumer" as being in a home somewhere.
Someone that consumes a circuit, and someone that provides the service (or resells one). A business customer is a consumer in that case - I won't discriminate against what use someone has for wanting to consume bandwidth between countries, but I do think the specificity here is in whether you intend to just use it, or resell it, and that's where the difference comes in relation to Robert's point.
We see both use-cases, where businesses (enterprise) consume, and operators resell. Ultimately, it's about not boxing everything into a definition, especially if it meets your needs. Just like how our idea of a core or peering router vs. a vendor's idea of a core or peering router might differ :-). Mark.
Well, luckily we have MEF to set expectations about one's EPL/EVPL/EPLAN/EVPLAN performance (and formal SLA contracts describing every single aspect of the service and its performance).

Anyways, back in the days when I was designing these, when it was cool and demand was high, customers (other carriers) were getting MTU 9100 (to fit customers' MTU 9000), the whole CFM & LFM shebang (to the point made earlier in the thread that the link should go down on both ends - like it's the case with a wave) and sub-50ms convergence in case something went wrong inside our backbone.

We as a provider got more $$$ from a single investment in our wave/fiber, but our customers could enjoy p2p links on par with a wave for less $.

adam
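For readers who haven't touched CFM, the continuity-check piece adam refers to looks roughly like this on a Junos PE - a sketch only; the maintenance-domain/association names, level, MEP ID, interval and interface are illustrative, and a production service would normally also attach an action-profile so that loss of continuity brings the local attachment circuit down:

  set protocols oam ethernet connectivity-fault-management maintenance-domain CUST1 level 5
  set protocols oam ethernet connectivity-fault-management maintenance-domain CUST1 maintenance-association CUST1-MA continuity-check interval 100ms
  set protocols oam ethernet connectivity-fault-management maintenance-domain CUST1 maintenance-association CUST1-MA mep 100 interface ge-0/0/1.100 direction up

With CCMs running end to end across the L2VPN, a far-end or mid-path failure is signalled at both attachment circuits, which is what gives you the "link goes down on both ends, like a wave" behaviour.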
On 18/Jul/20 15:31, adamv0025@netconsultings.com wrote:
Well, luckily we have MEF to set expectations about one's EPL/EVPL/EPLAN/EVPLAN performance (and formal SLA contracts describing every single aspect of the service and its performance).
Anyways, back in the days when I was designing these, when it was cool and demand was high, customers (other carriers) were getting MTU 9100 (to fit customers' MTU 9000), the whole CFM & LFM shebang (to the point made earlier in the thread that the link should go down on both ends - like it's the case with a wave) and sub-50ms convergence in case something went wrong inside our backbone.
We as a provider got more $$$ from a single investment in our wave/fiber, but our customers could enjoy p2p links on par with a wave for less $.
It's probably worth noting that easily 90% of all remote peering circuits are running over an EoMPLS service, FWIW. Mark.
On 16/Jul/20 19:34, Mark Tinka wrote:
BFD on LAGs on IOS XR platforms in a LAN environment doesn't work. A point-to-point mechanism is required, so we disabled it there. Junos and IOS XE have no problems running BFD on LAGs in LANs, so we have it on there. This is for within the data centre.
So for the archives, I thought I'd update this comment in case anyone comes across this issue in the future.

If you want to run BFD on a LAG in IOS XR, you need to tell it to do so with the below command:

  conf
   bfd multipath include location <location-id>

This ensures that at least one physical line card (where a member link may live) hosts the BFD sessions. Without this, the router tries to host the session on the RP, which doesn't work. This also applies to fixed form factor IOS XR-based routers.

In previous versions of IOS XR, enabling BFD without this feature simply kept the BFD sessions down. At some point (not sure when, but we saw this in 6.7.1), IS-IS will remain down if BFD remains down, i.e., having this command is now mandatory.

The error log that signals you to this requirement is not particularly user-friendly:

  %L2-BFD-6-SESSION_NO_RESOURCES

We still won't be messing around with BFD on LAGs in IOS XR as a matter of course - too many moving parts compared to how it's done in Junos. But for anyone who may need this, here you go.

Mark.
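To put that in context, a minimal sketch of the surrounding configuration - the line card location, bundle number and IS-IS details below are illustrative, not from Mark's note, and the location you include must be one that actually hosts a bundle member:

  bfd
   multipath include location 0/0/CPU0
  !
  router isis CORE
   interface Bundle-Ether1
    address-family ipv4 unicast
     bfd minimum-interval 150
     bfd multiplier 3
     bfd fast-detect ipv4

Once committed, the sessions should come up hosted on the included line card rather than the RP; 'show bfd session' will confirm the state.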
participants (7):
- adamv0025@netconsultings.com
- Harivishnu Abhilash
- Mark Tinka
- Nick Hilliard
- Robert Raszuk
- Saku Ytti
- Tom Hill