BFD for routes learned through route servers at IXPs
From time to time, some IXP somewhere in the world has an issue on the forwarding plane. Whenever that happens, this topic comes back.

The failures are not big enough to drop the BGP sessions between IXP participants and the route servers, but they are enough to degrade traffic between participants. And then the problem comes: "How can I check whether my communication with the next hop of the routes I learn from the route servers is OK? And if it is not OK, how can I remove those routes from my FIB?"

Some other possible causes of this symptom are:
- ARP resolution issues (CPU protection and lunatic MikroTiks with a 30-second ARP timeout is a recipe for disaster)
- MAC-address learning limitations on the participants' transport links, which can be a pain in the a..rm.

So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08

If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.

Some questions come up:
A) Is there anything we can do to speed this up?
B) Is there any alternative to it?

P.S.1: I gave up on inventing crazy BGP filter policies to test next-hop reachability. Their effectiveness cannot even be compared to BFD, and they almost killed the processing capacity of my router.

P.S.2: IMHO, the biggest downside of these problems is that some participants abandon the route servers when the issues described above occur.
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
you have a better (== easier to implement and deploy) signaling path?

the draft passed wglc in 1948. it is awaiting two implementations, as is the wont of the idr wg.

randy
On Tue, Sep 15, 2020 at 9:40 PM Randy Bush <randy@psg.com> wrote:
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
you have a better (== easier to implement and deploy) signaling path?
the draft passed wglc in 1948. it is awaiting two implementations, as is the wont of the idr wg.
I think you also mean to say: "this is actually still a DRAFT and not an RFC, so really no BGP implementor is beholden to this document, unless they have coin bearing customers who wish to see this feature implemented"
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
you have a better (== easier to implement and deploy) signaling path?
the draft passed wglc in 1948. it is awaiting two implementations, as is the wont of the idr wg.
I think you also mean to say: "this is actually still a DRAFT and not an RFC, so really no BGP implementor is beholden to this document, unless they have coin bearing customers who wish to see this feature implemented"
if i had meant to say that, i probably would have. no one on this thread has called it anything other than a draft, so i am quite unsure what your point is; and i will not put words in your mouth.

sadly, these years, vendors do not seem to care a lot about drafts, rfcs, ... anything which sells.

randy
On Wed, Sep 16, 2020 at 4:55 PM Randy Bush <randy@psg.com> wrote:
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
you have a better (== easier to implement and deploy) signaling path?
the draft passed wglc in 1948. it is awaiting two implementations, as is the wont of the idr wg.
I think you also mean to say: "this is actually still a DRAFT and not an RFC, so really no BGP implementor is beholden to this document, unless they have coin bearing customers who wish to see this feature implemented"
if i had meant to say that, i probably would have. no one on this thread has called it anything other than a draft, so i am quite unsure what your point is; and i will not put words in your mouth.
I think the OP said: "At least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world."
it's not an rfc yet.
sadly, these years, vendors do not seem to care a lot about drafts, rfcs, ... anything which sells.
sure :(
Well... my idea with the initial mail was:

a) Check whether anything is hindering the evolution of this draft into an RFC.
b) Bet on trying to make possible something that nowadays could be considered impossible, like: "How do you enable the BFD capability on a route server with 2000 BGP sessions without crashing the box?"

And maybe:

c) How about suggesting a standard best practice for ARP timeouts at IXPs, and creating tools to measure each participant's ARP-timeout configuration and make that information available through standard protocols?

On Wed, Sep 16, 2020 at 6:14 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Wed, Sep 16, 2020 at 4:55 PM Randy Bush <randy@psg.com> wrote:
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
you have a better (== easier to implement and deploy) signaling path?
the draft passed wglc in 1948. it is awaiting two implementations, as is the wont of the idr wg.
I think you also mean to say: "this is actually still a DRAFT and not an RFC, so really no BGP implementor is beholden to this document, unless they have coin bearing customers who wish to see this feature implemented"
if i had meant to say that, i probably would have. no one on this thread has called it anything other than a draft, so i am quite unsure what your point is; and i will not put words in your mouth.
I think the OP said: "At least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world."
it's not an rfc yet.
sadly, these years, vendors do not seem to care a lot about drafts, rfcs, ... anything which sells.
sure :(
-- Douglas Fernando Fischer Engº de Controle e Automação
"How can I check whether my communication with the next hop of the routes I learn from the route servers is OK? And if it is not OK, how can I remove those routes from my FIB?"
Install a route optimizer that constantly pings next hops and, when the drop threshold is met, removes the routes. No one is going to open BFD to whole subnets, especially to parties they don't have peering agreements with, which makes this pointless.
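Ryan's ping-and-withdraw approach can be sketched roughly as follows. This is an illustrative assumption of how such a monitor might be structured, not any particular optimizer's actual behavior; the probe command, counters, and threshold are all placeholders, and the caller is assumed to do the actual route removal:

```python
import subprocess

def is_alive(next_hop: str, count: int = 3, timeout_s: int = 1) -> bool:
    """Probe a next hop with ICMP echo; True if at least one reply came back."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), next_hop],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def sweep(next_hops, fail_counts, drop_threshold=3, probe=is_alive):
    """One monitoring pass: bump per-next-hop failure counters and return
    the set of next hops whose routes should be withdrawn from the FIB."""
    to_withdraw = set()
    for nh in next_hops:
        if probe(nh):
            fail_counts[nh] = 0  # reset counter on any successful probe
        else:
            fail_counts[nh] = fail_counts.get(nh, 0) + 1
            if fail_counts[nh] >= drop_threshold:
                to_withdraw.add(nh)
    return to_withdraw
```

You would run `sweep` on a timer and feed `to_withdraw` into whatever mechanism removes the affected routes. Note the caveats raised later in the thread about ICMP being heavily deprioritised on IXP routers' control planes.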
- ARP resolution issues (CPU protection and lunatic MikroTiks with a 30-second ARP timeout is a recipe for disaster)

CoPP is always important, and it's not just MikroTiks with default low ARP timeouts:

Linux - 1 minute
Brocade - 10 minutes
Cumulus - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
HP - 25 minutes
- MAC-address learning limitations on the participants' transport links can be a pain in the a..rm.

As you said, this issue doesn't seem important enough to warrant significant action. For transport, colo a switch that can handle the BGP announcements, routes, and ARPs, then transport that across with only 2 MACs and internal point-to-point IP assignments.

Ryan

On Sep 15 2020, at 5:55 pm, Douglas Fischer <fischerdouglas@gmail.com> wrote:

From time to time, some IXP somewhere in the world has an issue on the forwarding plane. Whenever that happens, this topic comes back.
The failures are not big enough to drop the BGP sessions between IXP participants and the route servers.
But they are enough to degrade traffic between participants.
And then the problem comes: "How can I check whether my communication with the next hop of the routes I learn from the route servers is OK? And if it is not OK, how can I remove those routes from my FIB?"
Some other possible causes of this symptom are:
- ARP resolution issues (CPU protection and lunatic MikroTiks with a 30-second ARP timeout is a recipe for disaster)
- MAC-address learning limitations on the participants' transport links can be a pain in the a..rm.
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
Some questions come up: A) Is there anything we can do to speed this up? B) Is there any alternative to it?
P.S.1: I gave up on inventing crazy BGP filter policies to test next-hop reachability. Their effectiveness cannot even be compared to BFD, and they almost killed the processing capacity of my router.
P.S.2: IMHO, the biggest downside of these problems is that some participants abandon the route servers when the issues described above occur.
Ryan Hamel wrote on 16/09/2020 03:01:
Install a route optimizer that constantly pings next hops
or if you want a more reliable IXP experience, don't install a route optimiser, and if you do, don't make it ping next-hops:

- you're not guaranteed that the icmp reply back to the route optimiser will follow the forward path.
- you are guaranteed that icmp is heavily deprioritised on ixp routers.
- the busier the IXP, the busier the control planes of all the IXP routers you're going to ping, and the more likely they are to drop your ping packets. This will lead to greater route churn. If this approach is widely deployed it will lead to wider-scale routing oscillations due to control plane mismanagement.
- route optimisers are associated with serious bgp leakage issues. if you're doing this at an IXP, the danger is significantly magnified because bi-lat peering sessions rarely, if ever, implement prefix filtering.

It is true that IXPs occasionally see forwarding plane failures. These tend to be pretty unusual these days. Be careful about optimising for edge cases like this: you'll often end up introducing new failure modes which may be more serious and which may occur more regularly.

Nick
On 16/09/2020 04:01, Ryan Hamel wrote:
CoPP is always important, and it's not just MikroTiks with default low ARP timeouts.
Linux - 1 minute
Brocade - 10 minutes
Cumulus - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
Juniper - 20 minutes
HP - 25 minutes
-- Chriztoffer
On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen <chriztoffer.hansen@de-cix.net> wrote:
On 16/09/2020 04:01, Ryan Hamel wrote:
CoPP is always important, and it's not just MikroTiks with default low ARP timeouts.
Linux - 1 minute
Brocade - 10 minutes
Cumulus - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
Juniper - 20 minutes
HP - 25 minutes
IOS - 4 hours
Why are these considered (by Ryan) low values? Does low have a negative connotation here?

ARP timeout should be lower than MAC timeout, and MAC timeout is usually 300 seconds. Anything above 300 seconds is probably a poor default value, as defaults should interoperate in a somewhat sane manner. Of course operators are free to configure a very high ARP timeout, as long as they also remember to configure an equally higher MAC timeout.

-- ++ytti
About this comparison between CAM-table timeout and ARP-table timeout: I tend to partially agree with you...

Ethernet is a protocol used in so many different scenarios that we need to consider the different needs of each type of communication.

For example: I'm not a big fan of Mikrotik/RouterOS, but I know they are out there and, like it or not, I need to deal with them (as a peer or even as an operator). One of the most common uses of Mikrotik is for HotSpot/Captive Portal, and for that an ARP timeout of 30 seconds is very much OK: it is a good way to check whether the end user is still reachable on the network, and to do the billing based on that.

But 30 seconds for an IXP? It does not make any sense! Those packets are stealing CPU cycles from the control plane of every router in the LAN.

Another example: you suggested equalizing ARP timeout and MAC timeout. For a campus LAN, with frequent topology changes and hosts added and removed all the time, that is perfect. But for an IXP LAN: in an ideal scenario, how often should topology changes happen at an IXP? How often do hosts join or leave an IXP LAN?

Why should we spend CPU cycles on 576K ARP requests a day (2K participants, 5-minute ARP timeout) instead of 12K ARP requests a day (2K participants, 4-hour ARP timeout)? I would prefer to use those CPU cycles to process other things, like BGP messages, BFD, etc...

On Thu, Sep 17, 2020 at 2:54 AM, Saku Ytti <saku@ytti.fi> wrote:
On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen <chriztoffer.hansen@de-cix.net> wrote:
On 16/09/2020 04:01, Ryan Hamel wrote:
CoPP is always important, and it's not just MikroTiks with default low ARP timeouts.
Linux - 1 minute
Brocade - 10 minutes
Cumulus - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
Juniper - 20 minutes
HP - 25 minutes
IOS - 4 hours
Why are these considered (by Ryan) low values? Does low have a negative connotation here?
ARP timeout should be lower than MAC timeout, and MAC timeout is usually 300 seconds. Anything above 300 seconds is probably a poor default value, as defaults should interoperate in a somewhat sane manner. Of course operators are free to configure a very high ARP timeout, as long as they also remember to configure an equally higher MAC timeout.
-- ++ytti
-- Douglas Fernando Fischer Engº de Controle e Automação
On Thu, 17 Sep 2020 at 20:51, Douglas Fischer <fischerdouglas@gmail.com> wrote:
Why should we spend CPU cycles on 576K ARP requests a day (2K participants, 5-minute ARP timeout) instead of 12K ARP requests a day (2K participants, 4-hour ARP timeout)? I would prefer to use those CPU cycles to process other things, like BGP messages, BFD, etc...
I think this communication may not be very communicative. How many more BGP messages per day can we process if we do 12k ARP requests a day instead of 576k? How many more days of DFZ BGP UPDATE growth is that? -- ++ytti
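The back-of-envelope arithmetic behind those figures is simple: each neighbor entry is re-resolved once per timeout interval, so a router sends 86400/timeout ARP requests per neighbor per day. With 2,000 participants, a 5-minute timeout works out to 576K requests a day, and a 4-hour timeout to 12K:

```python
def arp_requests_per_day(participants: int, timeout_s: int) -> int:
    """A router refreshes each neighbor entry once per ARP-timeout interval,
    so it sends participants * (86400 / timeout_s) requests per day."""
    return participants * (86_400 // timeout_s)

print(arp_requests_per_day(2_000, 5 * 60))       # 5-minute timeout -> 576000
print(arp_requests_per_day(2_000, 4 * 60 * 60))  # 4-hour timeout   -> 12000
```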
If you look only at normal situations, 12K vs 576K may not represent very much. But if you look at the ARP-request graphs during a significant topology change at a big IXP, and also at the CPU-per-process graphs, maybe what I'm suggesting becomes more explicit. I'm talking about good boxes freezing because of that.

Of course CoPP exists to avoid that. But vanilla CoPP configurations combined with lunatic ARP timeouts cause many day-to-day problems...

So, in this case, the solution would be a BCP with some "MUST"s defining acceptable rates. And with that, everyone who doesn't like being woken up at dawn would become happy (at least for this reason).

On Thu, Sep 17, 2020 at 3:07 PM, Saku Ytti <saku@ytti.fi> wrote:
On Thu, 17 Sep 2020 at 20:51, Douglas Fischer <fischerdouglas@gmail.com> wrote:
Why should we spend CPU cycles on 576K ARP requests a day (2K participants, 5-minute ARP timeout) instead of 12K ARP requests a day (2K participants, 4-hour ARP timeout)? I would prefer to use those CPU cycles to process other things, like BGP messages, BFD, etc...
I think this communication may not be very communicative.
How many more BGP messages per day can we process if we do 12k ARP requests a day instead of 576k? How many more days of DFZ BGP UPDATE growth is that?
-- ++ytti
-- Douglas Fernando Fischer Engº de Controle e Automação
On 9/17/20 1:51 PM, Douglas Fischer wrote:
But 30 Seconds for an IXP? It does not make any sense! Those packets are stealing CPU cycles of the Control Plane of any router in the LAN.
Especially given how some exchanges lock the MAC address of participants. You could probably get away with ARP timeouts of a day, or even permanent entries with manual clearing when you see a peer go down. -Paul
Hello

ARP timeout should be lower than MAC timeout, but usually the default is the other way around, which is extremely stupid. To those who do not know why, let me give a simple example:

Router R1 is connected to switch SW1 with a connection to server SRV: R1 <-> SW1 <-> SRV
Router R2 is connected to switch SW2 with a connection to server SRV: R2 <-> SW2 <-> SRV

The server is using R1 as its default gateway. Traffic arrives from the internet through R2 towards the server, but the server sends replies back through the default gateway at R1. This is the usual case with redundant routers: only one will be used as the default gateway, but traffic may come in from both.

Initially all will be good. But SW2 only sees unidirectional traffic from R2: no traffic goes from SRV to R2, and thus, after some time, SW2 will expire the MAC learning for SRV. This has the unfortunate result that SW2 starts flooding traffic for SRV out of all ports.

Then, after more time has passed, R2 renews the ARP binding by sending an ARP query to SRV. The server sends back an ARP reply to R2. This packet from SRV to R2 passes through SW2 and thus renews the MAC binding at SW2 too. The flooding stops and all is well again, until the MAC binding expires and the story repeats.

If the MAC timeout is 5 minutes and the ARP timeout is 20 minutes, which is very usual, you will have flooding for 15 minutes out of every 20-minute interval! Stupid! Why have vendors not fixed their defaults for this case?

Regards,
Baldur

On Thu, Sep 17, 2020 at 7:51 AM Saku Ytti <saku@ytti.fi> wrote:
On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen <chriztoffer.hansen@de-cix.net> wrote:
On 16/09/2020 04:01, Ryan Hamel wrote:
CoPP is always important, and it's not just MikroTiks with default low ARP timeouts.
Linux - 1 minute
Brocade - 10 minutes
Cumulus - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
Juniper - 20 minutes
HP - 25 minutes
IOS - 4 hours
Why are these considered (by Ryan) low values? Does low have a negative connotation here?
ARP timeout should be lower than MAC timeout, and MAC timeout is usually 300 seconds. Anything above 300 seconds is probably a poor default value, as defaults should interoperate in a somewhat sane manner. Of course operators are free to configure a very high ARP timeout, as long as they also remember to configure an equally higher MAC timeout.
-- ++ytti
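Baldur's 15-out-of-every-20-minutes figure falls out of a one-line calculation: in his scenario the switch floods from the moment the MAC entry ages out until the router's next ARP refresh relearns it. A quick worked check of that worst case (an idealized model, ignoring any other traffic that might refresh the MAC entry):

```python
def flooded_fraction(mac_timeout_s: int, arp_timeout_s: int) -> float:
    """Fraction of each ARP interval spent flooding to all ports when the
    only traffic toward the host is the periodic ARP exchange itself."""
    return max(0, arp_timeout_s - mac_timeout_s) / arp_timeout_s

# 5-minute MAC ageing vs 20-minute ARP timeout, as in the example above
print(flooded_fraction(5 * 60, 20 * 60))  # 0.75 -> 15 of every 20 minutes
```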
Hi,

At some IXPs, getting BFD-protected BGP sessions with the route servers is possible. However, it is usually optional, so there is no way to discover which of your MLPA peering partners have their sessions protected that way and which don't.

You can also ask peers you have a session with to enable BFD there. If they run carrier-grade border routers connected to the IXP switches just with fibers, it works pretty well. So just try to talk with your peers about BFD.

--
S pozdravem/Best Regards,
Zbyněk Pospíchal

On 16 Sep 2020 at 2:55, Douglas Fischer wrote:
From time to time, some IXP somewhere in the world has an issue on the forwarding plane. Whenever that happens, this topic comes back.
The failures are not big enough to drop the BGP sessions between IXP participants and the route servers.
But they are enough to degrade traffic between participants.
And then the problem comes: "How can I check whether my communication with the next hop of the routes I learn from the route servers is OK? And if it is not OK, how can I remove those routes from my FIB?"
Some other possible causes of this symptom are:
- ARP resolution issues (CPU protection and lunatic MikroTiks with a 30-second ARP timeout is a recipe for disaster)
- MAC-address learning limitations on the participants' transport links can be a pain in the a..rm.
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
Some questions come up: A) Is there anything we can do to speed this up? B) Is there any alternative to it?
P.S.1: I gave up on inventing crazy BGP filter policies to test next-hop reachability. Their effectiveness cannot even be compared to BFD, and they almost killed the processing capacity of my router.
P.S.2: IMHO, the biggest downside of these problems is that some participants abandon the route servers when the issues described above occur.
On Wed, Sep 16, 2020 at 02:57, Douglas Fischer <fischerdouglas@gmail.com> wrote:
From time to time, some IXP somewhere in the world has an issue on the forwarding plane. Whenever that happens, this topic comes back.
The failures are not big enough to drop the BGP sessions between IXP participants and the route servers.
But they are enough to degrade traffic between participants.
And then the problem comes: "How can I check whether my communication with the next hop of the routes I learn from the route servers is OK? And if it is not OK, how can I remove those routes from my FIB?"
If the traffic is that important, then the public internet is the wrong way to transport it. The internet has convergence times of up to multiple minutes. Failures can occur everywhere, and reacting to these changes comes at a global cost.
Some other possible causes of this symptom are:
- ARP resolution issues (CPU protection and lunatic MikroTiks with a 30-second ARP timeout is a recipe for disaster)
- MAC-address learning limitations on the participants' transport links can be a pain in the a..rm.
IXPs can/do limit the MACs allowed on a participant port.
IXPs usually provide a sane config, which includes ARP timeouts (these can be checked, and an ARP sponge helps as well).
The same goes for all the other multicast/broadcast protocols.
So I was searching for how to solve that and I found a draft (8th revision) intended to address it: https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, an effective implementation will depend on new code in every BGP engine that wants to do that check. It is kind of frustrating... at least 10 years between the release of the RFC and the refresh of every router involved in IXPs in the world.
Some questions come up: A) Is there anything we can do to speed this up? B) Is there any alternative to it?
IXPs are not simple L2 switches anymore; forwarding is done with LACP/MPLS/VXLAN/... over multiple paths. When A and B can both reach a route server, that does not guarantee that A can reach B. Using BFD between members might help, or might not, as you cannot check the complete topology underneath. The IXP should use BFD, and maybe even compare interface counters on both sides of each link in its infrastructure.

@past dayjob: We monitored IXP health by pinging our peers/next-hops every X minutes and alerted the NOC when there were bigger changes, like 10% of the peers/next-hops that responded before no longer responding to ICMP.
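That "10% of previously responding peers went quiet" heuristic might look like the sketch below. The addresses, sweep mechanics, and threshold are illustrative assumptions, not the actual tooling described:

```python
def should_alert(previously_up: set, now_up: set, threshold: float = 0.10) -> bool:
    """Alert when more than `threshold` of the peers that answered ICMP on
    the previous sweep have stopped answering on the current one."""
    if not previously_up:
        return False  # nothing to compare against yet
    lost = previously_up - now_up
    return len(lost) / len(previously_up) > threshold

peers = {f"203.0.113.{i}" for i in range(1, 21)}  # 20 peers (documentation prefix)
print(should_alert(peers, peers - {"203.0.113.1"}))  # 1/20 = 5% lost -> False
print(should_alert(peers, peers - {"203.0.113.1", "203.0.113.2", "203.0.113.3"}))  # 15% -> True
```

Comparing against the previous sweep rather than a fixed baseline keeps the alert sensitive to sudden changes (a forwarding-plane event) rather than slow churn in the peer population.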
P.S.1: I gave up on inventing crazy BGP filter policies to test next-hop reachability. Their effectiveness cannot even be compared to BFD, and they almost killed the processing capacity of my router.
P.S.2: IMHO, the biggest downside of these problems is that some participants abandon the route servers when the issues described above occur.
Route servers have caused some issues in the past, like not propagating the revocation/timeout of prefixes. And some peers simply like a more direct relationship.
If the traffic is that important then the public internet is the wrong way to transport it.
Nonsense. That is usually something said by those who do not know how to use the Internet as a reliable transport between two endpoints. In your book, what is the Internet good for? Torrent and porn?
The internet has convergence times up to multiple minutes.
It does not matter how long it takes to "converge" any single path. Hint: consider using multiple disjoint paths, and you will see that for the vast majority of "Internet failures" the connectivity-restoration time would be very close to the RTT between your endpoints.

Rgs,
R.
On Sep 22, 2020, at 4:46 AM, Andy Davidson <andy@nosignal.org> wrote:
Hi,
Douglas Fisher wrote:
B) There is any other alternative to that?
Don't connect to IXPs with very, very large and complicated topologies. Connect to local IXPs whose design makes the kind of forwarding-plane failure that causes the problem you describe less likely.
Or don't use a route server except to bootstrap; I regularly see issues related to them. I get that it's not easy to peer at an IXP, but IXP peering isn't for everyone, whatever some people might make it sound like. This is why, back in the day, there was a push to require 24x7 staffing of the remote side, to ensure it was being monitored and supported. That may no longer apply to many people, but without active monitoring you won't know the state of the remote side.

- Jared
participants (14)

- Andy Davidson
- Baldur Norddahl
- Christopher Morrow
- Chriztoffer Hansen
- Douglas Fischer
- Jared Mauch
- Karsten Elfenbein
- Nick Hilliard
- Paul Timmins
- Randy Bush
- Robert Raszuk
- Ryan Hamel
- Saku Ytti
- Zbyněk Pospíchal