On Wed, 16 Sep 2020 at 02:57, Douglas Fischer <fischerdouglas@gmail.com> wrote:
From time to time, some IXP somewhere in the world has an issue on the forwarding plane. When that happens, this topic comes back.
The failures are not big enough to drop the BGP sessions between IXP participants and route-servers.
But they are enough to harm traffic between participants.
And then the problem comes: "How can I check if my communication with the NextHop of the routes that I learn from the route-servers is OK? If it is not OK, how can I remove it from my FIB?"
If the traffic is that important, then the public internet is the wrong way to transport it. The internet has convergence times of up to multiple minutes. Failures can occur anywhere. Reacting to these changes comes at a global cost.
Some other possible causes of this feeling are:
- ARP resolution issues (CPU protection and lunatic Mikrotiks with a 30-second ARP timeout are an explosive recipe)
- MAC address learning limitations on the transport link of the participants, which can be a pain in the a..rm.
IXPs can, and do, limit the MAC addresses allowed on a participant port. IXPs usually provide a sane config which includes ARP timeouts (these can be checked, and an ARP sponge helps as well). The same goes for all the other multicast/broadcast protocols.
So, I was searching on how to solve that and I found a draft (8th release) with the intention to solve that... https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
If I understood correctly, the effective implementation of it will depend on new code in any BGP engine that wants to do that check. It is kind of frustrating... At least 10 years from the release of the RFC until every router involved in IXPs around the world has been refreshed.
Some questions come up: A) Is there anything that we can do to rush this? B) Is there any other alternative to that?
IXPs are not simple L2 switches anymore; forwarding is done with LACP/MPLS/VXLAN/... over multiple paths. The fact that A and B can both reach a route-server does not guarantee that A can reach B. Using BFD between members might help or might not, as you cannot check the complete topology underneath. The IXP should use BFD and maybe even compare interface counters on both sides of a link in their infrastructure.
@past dayjob: We monitored IXP health by pinging our peers/next-hops every X minutes and alerted the NOC when there were bigger changes, for example when 10% of the peers/next-hops that had responded before stopped responding to ICMP.
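A minimal sketch of that kind of ping-based check, not the tooling we actually ran. It assumes the Linux iputils `ping` flags `-c` and `-W`, and the names `ping_once`, `poll`, `lost_fraction` and `should_alert` are hypothetical. The 10% threshold is just the example figure mentioned above.

```python
import subprocess
from typing import Callable, Iterable, Set


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request; True if the host replied.

    Assumption: Linux iputils ping, where -c is the packet count
    and -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def poll(next_hops: Iterable[str],
         probe: Callable[[str], bool] = ping_once) -> Set[str]:
    """Return the set of next-hops that answered the probe."""
    return {hop for hop in next_hops if probe(hop)}


def lost_fraction(previous: Set[str], current: Set[str]) -> float:
    """Fraction of previously responding next-hops that no longer respond."""
    if not previous:
        return 0.0
    return len(previous - current) / len(previous)


def should_alert(previous: Set[str], current: Set[str],
                 threshold: float = 0.10) -> bool:
    """Alert when at least `threshold` of the previous responders went silent."""
    return lost_fraction(previous, current) >= threshold
```

Run `poll()` from cron or a loop every X minutes, keep the previous result, and page the NOC when `should_alert(previous, current)` is true. Only hosts that responded before are counted, so peers that never answer ICMP do not trigger alerts.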
P.S.1: I gave up on inventing crazy BGP filter policies to test reachability of the NextHop. Their effectiveness can't even be compared to BFD, and they almost killed the processing capacity of my router.
P.S.2: IMHO, the biggest downside of those problems is that some participants abandon the route-servers when the issues described above occur.
Route-servers caused some issues in the past, like not propagating the withdrawal/timeout of prefixes. Some peers also prefer a more direct relationship.