
IMHO, the key info here is that a known set of subnets was affected. This rules out some stuff:

- LACP manages link bundling, as in “can this interface be added to the bundle?”. The effect of bundling should be to have multiple links to choose from when egressing a packet. RFC 7130 is a nice addition to bundles as it uses BFD to manage each link - meaning a bad member is removed quickly (LACP timers are not that fast and LACP itself is not designed to react fast).
- Hashing (which is used for load balancing traffic in hardware switches) is not managed by LACP - it’s always local to each device and, as said before, each side usually has a different view of the ideal hashing. A classical example is when there are many IPs behind one firewall doing NAT - you can’t rely on diversity of IPs and MACs to select an egress link, so you usually change the hashing to be per port.
- Link errors would affect random traffic to any destination / from any source.

So, none of the above technologies would affect traffic connectivity _selectively_. Perhaps a malformed bundle could blackhole traffic, but that wouldn’t be specific to certain subnets unless someone is *extremely* unlucky and _only_ his subnets hashed to the “bad bundle member” :)

It simply looks like a routing issue through this path. Perhaps the flapping of the BGP session re-advertised this path to some place that previously wasn’t using it, and apparently can't?

Pedro Martins Prado
pedro.prado@gmail.com / +353 83 036 1875
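To illustrate the hashing point above, here is a rough Python sketch of hash-based member selection on a two-link bundle. It is only a model: real switches use vendor-specific hardware hash functions over configurable header fields, and SHA-256 is used here purely as a stand-in. The function names, addresses (documentation ranges) and flow counts are all made up for the example.

```python
import hashlib
import ipaddress
from collections import Counter


def pick_member(fields, n_members):
    """Pick an egress member by hashing the selected header fields.

    SHA-256 is a stand-in for the (vendor-specific) hardware hash.
    """
    key = "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members


def spread(flows, n_members, use_ports):
    """Count how many flows land on each bundle member."""
    counts = Counter()
    for src, dst, sport, dport in flows:
        fields = (src, dst, sport, dport) if use_ports else (src, dst)
        counts[pick_member(fields, n_members)] += 1
    return counts


# Hypothetical traffic: many sources talking to hosts in one /24.
subnet = ipaddress.ip_network("192.0.2.0/24")
hosts = list(subnet.hosts())
flows = [
    (f"198.51.100.{i % 250 + 1}", str(hosts[i % len(hosts)]), 1024 + i, 443)
    for i in range(2000)
]
to_subnet = spread(flows, 2, use_ports=False)

# NAT case: one source IP and one destination IP, only the source port varies.
nat_flows = [("203.0.113.1", "192.0.2.10", 1024 + i, 443) for i in range(2000)]
nat_ip_only = spread(nat_flows, 2, use_ports=False)
nat_with_ports = spread(nat_flows, 2, use_ports=True)

print("one subnet, IP-only hash:", dict(to_subnet))
print("NAT, IP-only hash:       ", dict(nat_ip_only))
print("NAT, hash incl. ports:   ", dict(nat_with_ports))
```

In this model, traffic to a single subnet still spreads over both members, so one bad member would hurt a share of flows everywhere rather than a specific set of subnets. The NAT case shows the opposite problem: with an IP-only hash every flow picks the same member, and adding the L4 ports to the hash input restores the spread.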
On 28 Sep 2025, at 06:24, William Herrin via NANOG <nanog@lists.nanog.org> wrote:
On Sat, Sep 27, 2025 at 7:31 PM Bruce Wainer via NANOG <nanog@lists.nanog.org> wrote:
Excuse my ignorance about this IXP and your equipment, but is Micro-BFD (RFC 7130) supported? And if so, is it enabled or can you enable it? While configuration-wise it will use the single IP addresses of the aggregate, separate BFD instances are set up for each underlying link and will confirm whether Layer 3 is working on that point-to-point connection.
Hi Bruce,
I'm also not familiar with this particular IXP but generally with IXPs we're not talking about point to point connections. The multiple participants' routers are part of a shared layer-2 fabric (a switch or switches) over which they trade layer-3 packets directly with each other. The route advertisements may transit the route servers but the routed packets do not.
You can get into some really finicky errors where both participants successfully talk to the route server and thereby exchange routes, but for one reason or another can't get packets back and forth to each other. Bonded circuits (LAGs) add complexity which makes troubleshooting that much harder.
If it were me, I would have considered building this connection differently. For speed, I'd have chosen a 100G link instead of two 10G links. Had my objective been reliability, I'd have built that at layer 3 instead of layer 2 -- two routers each with its own 10G link, and then done some balancing of the advertised routes. But in all fairness to Andy, I don't have anywhere near complete information here and the details matter a lot.
Regards, Bill Herrin
-- 
William Herrin
bill@herrin.us
https://bill.herrin.us/
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/RXKZDTZB...