Sites unreachable while traversing Dallas IXP

Group, I've been peering with both Route Servers in the Dallas IX for over a month using a single 10G link with no issues. Due to capacity concerns I had to augment to a 20G LAG. To do this, I shut the existing link down (which dropped both eBGP sessions), used the existing IP space to create the LAG, and then added the second 10G link. The eBGP sessions reestablished over the LAG and traffic started flowing error-free. No configuration changes to routing policy at all. After a few days we started to get customer complaints about certain sites/domains being unreachable. I worked around the issue by not announcing the customer blocks to the route servers and changing the return path to traverse transit. This solved the issue, but I'm perplexed as to what could've caused it, and where to look to resolve it. If you guys could provide feedback and point me in the right direction, I'd appreciate it. TIA. ~Andy

I'm assuming you've tried the obvious "it's the cable, stupid" rule-outs, such as replacing the involved physical components like cables or SFPs. After that, the problem is likely LACP configuration. As you may know, LACP doesn't use a single "LACP algorithm" for distributing packets across links. Instead you configure one of the available hash-based distribution functions the two endpoints have in common. The hash uses packet header information to distribute outgoing traffic across the LAG. Common hash algorithms include options to balance traffic based on combinations of Layer 2, 3, and 4 addresses, such as source and destination MAC addresses, source and destination IP addresses, or source and destination TCP/UDP ports. The best choice depends on the specific network traffic and the desired distribution.

I've found that sometimes with LAGs between different equipment vendors, one or more of these algorithms aren't compatible, resulting in packets out of order or even dropped. For example, Cisco and Juniper have different implementations of LACP hashing with similar names. But under the covers, Juniper allows finer-grained control over the specific Layer 2, Layer 3, and Layer 4 fields used for hashing through the forwarding-options hash-key configuration, while Cisco offers just a few fixed hash modes (Layer 2, Layer 3, and Layer 4), with the specific details of the hashing algorithm being proprietary. In my own experience, packet loss on Cisco-Juniper LACP links has arisen from inconsistent or incompatible configurations.

You can troubleshoot by checking LACP status and interface counters on both sides, and by ensuring compatible settings like the LACP rate. I've even seen duplex flapping! Be sure to look at logs on both ends for hardware errors or weird messages. If the issue persists, try adjusting LACP parameters and testing with single active member links. Have you tried switching to a different algorithm?
-mel beckman

On Sep 25, 2025, at 8:22 PM, Andy Cole via NANOG <nanog@lists.nanog.org> wrote: [...]
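The hash-based member selection described above can be sketched in a few lines of Python. This is a toy illustration only: real devices compute the hash in hardware with vendor-specific functions, and the flow fields and interface names here are made up.

```python
import hashlib

def select_member(packet, members,
                  fields=("src_ip", "dst_ip", "src_port", "dst_port")):
    """Pick a LAG member by hashing the configured header fields.

    Each device hashes locally; the peer's field choice never affects
    this side's distribution.
    """
    key = "|".join(str(packet[f]) for f in fields).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return members[digest % len(members)]

# Made-up flow and member names, purely for illustration.
flow = {"src_ip": "203.0.113.10", "dst_ip": "198.51.100.7",
        "src_port": 51515, "dst_port": 179}
links = ["member-0", "member-1"]

# The same flow always hashes to the same member, preserving ordering.
assert select_member(flow, links) == select_member(flow, links)
assert select_member(flow, links) in links
```

Changing the `fields` tuple models switching the device between L2/L3/L4 hash modes: a different field set redistributes flows, but only on the device doing the hashing.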

Mel Beckman via NANOG wrote on 26/09/2025 06:33:
Instead you configure one of the available hash-based distribution functions the two endpoints have in common. [...] In my own experience, packet loss on Cisco-Juniper LACP links has arisen from inconsistent or incompatible configurations.

This analysis is flat-out wrong. The hashing algos on each side of a LAG bundle are entirely independent of each other, and there is no problem whatsoever with using different hashing algos on each side. Nick

Andy Cole via NANOG wrote on 26/09/2025 04:21:
No configuration changes to routing policy at all. After a few days we started to get customer complaints for certain sites/domains being unreachable. I worked around the issue by not announcing the customer blocks to the route servers and changed the return path to traverse transit. This solved the issue, but I'm perplexed as to what could've caused the issue, and where to look to resolve it. If you guys could provide feedback and point me in the right direction I'd appreciate it. TIA.
If this was confirmed working before upgrading to 2x10G, then that's useful data. The starting point here would be to check both 10G bearer circuits for errors and discards. Dallas-IX uses IXP Manager, so you should be able to log in and check for discards and errors on both ports at the remote side, in addition to checking the same on your local router (or switch).

If it's not traffic being dropped on the link, then it could be an issue relating to the hashing algo on one side of the LAG or the other. Try to get a repeat case with specific traffic, and then bring this up with the Dallas IX people. Is traffic using both links? Are either of them filling up? Does the problem go away if you disable one link, or the other? Make sure to rule out MTU problems on each bearer link too.

Also, be sure to rule out IPv6 routing. Sometimes web pages don't load properly because some of the assets are delivered over IPv6. Because IPv6 isn't as well monitored as IPv4 in general (cue outrage), and because everyone starts diagnostics with tools that default to IPv4, this can sometimes slip under the radar. Nick
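One way to rule out MTU problems on the bearer links is a don't-fragment ping at the boundary sizes. A small sketch, wrapping the Linux `ping` utility; the peer address is hypothetical, and the sizes assume a standard 1500-byte IXP MTU:

```python
import subprocess

def df_ping(dst, payload, count=3):
    """Send don't-fragment pings (Linux iputils flags); True if all answered."""
    r = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", str(count), dst],
        capture_output=True)
    return r.returncode == 0

# For a 1500-byte MTU: 1500 - 20 (IPv4 header) - 8 (ICMP header) = 1472.
assert 1500 - 20 - 8 == 1472
# df_ping("192.0.2.1", 1472)  # should succeed against a peer at MTU 1500
# df_ping("192.0.2.1", 1473)  # should fail with "frag needed" if MTU is 1500
```

If 1472 works but intermittently fails, repeat while disabling one member link at a time to isolate a bearer with a mismatched MTU.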

Nick,

From https://www.exam-labs.com/blog/configuring-lacp-between-cisco-ios-and-junipe...

Understanding LACP Failures and Common Pitfalls

Link aggregation, although a robust feature, is susceptible to various issues. To resolve these problems effectively, network administrators must first understand the potential causes of LACP failures. Here are some of the most common causes for LACP issues:

1. Mismatched Configurations: Often, the primary reason for LACP failure is a misalignment in configurations between the two devices. This might include differences in LACP modes (active vs. passive), inconsistent port-channel settings, or mismatched VLANs. Proper alignment of settings across both devices is crucial to the successful negotiation of LACP.

-mel

On Sep 26, 2025, at 7:10 AM, Nick Hilliard <nick@foobar.org> wrote: [...]

Mel Beckman wrote on 26/09/2025 15:32:
From https://www.exam-labs.com/blog/configuring-lacp-between-cisco-ios-and-junipe...
Understanding LACP Failures and Common Pitfalls
Link aggregation, although a robust feature, is susceptible to various issues. To resolve these problems effectively, network administrators must first understand the potential causes of LACP failures. Here are some of the most common causes for LACP issues:
1. Mismatched Configurations: Often, the primary reason for LACP failure is a misalignment in configurations between the two devices. This might include differences in LACP modes (active vs. passive), inconsistent port-channel settings, or mismatched VLANs. Proper alignment of settings across both devices is crucial to the successful negotiation of LACP.
Not sure what this has to do with your previous email, where you were specifically referring to hashing algos needing to be the same or "compatible" on each side of the link:
Instead you configure one of the available hash-based distribution functions the two endpoints have in common.
I’ve found that sometimes with LAGs between different equipment vendors one or more of these algorithms aren’t compatible, resulting in packets out of order or even dropped.
There's no such thing as the packet hashing algorithms needing to be "compatible" on each side of a LAG bundle. You can use whatever hashing algo you want on either side and it doesn't make the slightest difference at the other end. There's no concept of hashing algo compatibility, and hashing doesn't come into play with LACP or any other sort of negotiation.

With the link above, you're confusing LACP negotiation issues with packet loss. If you have an LACP negotiation failure, typically the LAG interface won't come up to start with on a Juniper / Cisco device. Sometimes - rarely - you can have failure modes where an individual bearer circuit won't be used correctly, which would cause 1/N * 100% packet loss, i.e. 50% if N=2.

Also, as this is an IXP link, it probably means that it's a single untagged VLAN, i.e. no mismatched VLANs. Although if there is a mismatched VLAN, then that VLAN will experience 100% packet loss. Nick
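The 1/N * 100% failure mode described above can be sketched with a toy simulation. CRC32 stands in for a hardware hash function, and flow IDs are synthetic; the point is only that an even per-flow hash sends roughly half of all flows into a dead member when N=2.

```python
import zlib

def member_for(flow_id, n_links):
    # CRC32 as a stand-in for a hardware hash function.
    return zlib.crc32(str(flow_id).encode()) % n_links

n_links, dead_link = 2, 1
flows = range(10_000)
dropped = sum(1 for f in flows if member_for(f, n_links) == dead_link)
loss = dropped / 10_000

# With one dead member out of two, roughly half of all flows blackhole:
# the 1/N * 100% loss pattern (about 50% for N=2).
assert 0.4 < loss < 0.6
```

Which specific flows blackhole depends entirely on the local hash, which is why the symptom presents as "some destinations unreachable" rather than uniform loss.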

Hi Andy, Andy Cole via NANOG wrote on 26/09/2025 04:21:
No configuration changes to routing policy at all. After a few days we started to get customer complaints for certain sites/domains being unreachable. I worked around the issue by not announcing the customer blocks to the route servers and changed the return path to traverse transit. This solved the issue, but I'm perplexed as to what could've caused the issue, and where to look to resolve it. If you guys could provide feedback and point me in the right direction I'd appreciate it. TIA.
Besides what others have mentioned, another thing that changed in moving from the physical interface to LACP is the MAC address used on the IXP LAN. I would check connectivity to all IPs on the peering LAN, as peers may receive your routes from the route server but not be able to contact you directly. You may have already done this, but your mention of removing the announcements to the route server to solve the issue rang the route-server-blackhole bell for me. It wouldn't be the first time I've seen this happen. Brian
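The peering-LAN sweep suggested above is easy to script. A rough sketch using the Python standard library and Linux `ping`; the /26 prefix here is a documentation placeholder, not the real Dallas IX subnet:

```python
import ipaddress
import subprocess

# Hypothetical /26 standing in for the real Dallas IX peering prefix.
peering_lan = ipaddress.ip_network("192.0.2.0/26")

def sweep(network, count=1, timeout=1):
    """Ping every host on the peering LAN; return the silent ones."""
    unreachable = []
    for host in network.hosts():
        r = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout), str(host)],
            capture_output=True)
        if r.returncode != 0:
            unreachable.append(str(host))
    return unreachable

# A /26 has 62 usable host addresses to check:
assert sum(1 for _ in peering_lan.hosts()) == 62
# unreachable = sweep(peering_lan)  # run from the router/host facing the IX
```

Any peer that answers a route-server BGP session but not a direct ping is a candidate for the blackhole scenario Brian describes.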

I think folks are mixing up concepts a bit in this thread. The *hashing algorithm* is not the same thing as the *load balancing* algorithm. Say I have a LACP bundle with 4 member links. The *load balancing* algorithm determines whether traffic is balanced per packet or per flow across the member links. The *hashing* algorithm is what is used to decide which link to use for each traffic element, with the goal being even distribution across all possible paths.

LACP establishment doesn't depend on these two things at all. There is also no requirement that both sides use the same LB or hashing algorithms. (There are cases where you absolutely DON'T want that to happen anyway.)

What can occur is that one side uses a given LB/hashing combo such that traffic hotspots onto one of the member links, running it over, and some stuff gets dropped. What can also occur is that one side does per-packet balancing, which can create all kinds of out-of-order packet problems.

On Fri, Sep 26, 2025 at 10:33 AM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:
Nick,
From https://www.exam-labs.com/blog/configuring-lacp-between-cisco-ios-and-junipe... [...]
-mel
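The per-packet vs. per-flow distinction above can be illustrated with a tiny, entirely synthetic simulation: two member links, one slower than the other, carrying six in-order packets of one flow.

```python
from itertools import cycle

# Six in-order packets belonging to a single flow.
packets = [("flowA", seq) for seq in range(6)]

# Per-packet (round-robin) spraying: consecutive packets of one flow
# alternate between the two member links.
members = {0: [], 1: []}
for pkt, link in zip(packets, cycle([0, 1])):
    members[link].append(pkt)

# If link 1 is slower, its packets arrive after all of link 0's:
arrival = members[0] + members[1]
seqs = [seq for _, seq in arrival]
assert seqs != sorted(seqs)  # per-packet balancing reordered the flow

# Per-flow hashing pins the whole flow to one member: order survives.
arrival = packets  # single path, FIFO delivery
assert [seq for _, seq in arrival] == sorted(seq for _, seq in arrival)
```

The reordered sequence is what makes TCP believe segments were lost, triggering retransmits and throughput collapse, which is why per-packet spraying is rarely used on LAGs.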

Hi,

Maybe you are now connected to two different remote devices in MLAG/vPC or ESI, which could have MTU or routing-table inconsistencies. So, as suggested in some previous mails, did you try deactivating the new link and keeping the BGP setup as it was originally?

Cordialement / Best regards, Pierre

On Fri, Sep 26, 2025 at 11:06 AM, Nick Hilliard via NANOG <nanog@lists.nanog.org> wrote:
Mel Beckman wrote on 26/09/2025 15:32: [...]
There's no such thing as the packet hashing algorithms needing to be "compatible" on each side of a LAG bundle. [...]
Nick

Another thing that changed moving from the physical interface to LACP is going to be the MAC used on the IXP LAN. [...]
This is an excellent reminder. Many IXPs put a MAC filter on each port that you have to have them change if you change your end. I'm guessing this isn't the case here, since the OP stated his BGP sessions came up and traffic was flowing, but it's possible and a good callout. On Fri, Sep 26, 2025 at 11:14 AM Brian Turnbow via NANOG < nanog@lists.nanog.org> wrote:
Hi Andy, [...]
Brian

Hi, On Fri, 26 Sept 2025 at 17:23, Tom Beecher <beecher@beecher.cc> wrote:
Another thing that changed moving from the physical interface to LACP is going to be the MAC used on the IXP LAN. [...]
This is an excellent reminder. Many IXPs put a MAC filter on each port that you have to have them change if you change your end.
I'm guessing this isn't the case here, since the OP stated his BGP sessions came up and traffic was flowing, but it's possible and a good callout.
Yes, it is not an ACL on his port, as no sessions would come up, but maybe a fabric forwarding issue. I have no idea what they use on the peering LAN, whether flat L2, EVPN, etc., but it would be something to check. It could even be some crazy member that fixed static ARP tables or L2 ACLs for "security purposes" ;-) Brian

On 26 September 2025 18:01:49 CEST, Brian Turnbow via NANOG <nanog@lists.nanog.org> wrote:
HI,
On Fri, 26 Sept 2025 at 17:23, Tom Beecher <beecher@beecher.cc> wrote:
I'm guessing this isn't the case here, since the OP stated his BGP sessions came up and traffic was flowing, but it's possible and a good callout.
Yes it is not an acl on his port, as no sessions would come up, but maybe a fabric forwarding issue.
What if one port was dropping all traffic, but BGP kept retrying until it got hashed onto the working port, or it was only L3 hashing and happened to put BGP sessions always on the working port? IMO, BGP being up isn't evidence of all links working and no traffic drops.
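The point above, that one long-lived TCP session exercises only a single member link, falls out directly from per-flow hashing. A toy sketch (CRC32 standing in for a hardware hash, addresses made up):

```python
import zlib

def member(src_ip, dst_ip, src_port, dst_port, n_links=2):
    # CRC32 over the 5-tuple fields, standing in for a hardware hash.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links

# A long-lived BGP session is one fixed 5-tuple, so every one of its
# packets lands on the same member link for the session's lifetime.
bgp = ("192.0.2.1", "192.0.2.254", 52000, 179)
chosen = {member(*bgp) for _ in range(100)}
assert len(chosen) == 1  # "BGP is up" only proves this one link works
```

A session reset picks a new ephemeral source port, which may rehash onto the other member, which is consistent with the retry-until-it-works behavior described above.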

Excuse my ignorance about this IXP and your equipment, but is micro-BFD (RFC 7130) supported? And if so, is it enabled, or can you enable it? While, configuration-wise, it will use the single IP address of the aggregate, separate BFD sessions are set up for each underlying link and will confirm whether Layer 3 is working on that point-to-point connection. Bruce Wainer On Thu, Sep 25, 2025 at 11:22 PM Andy Cole via NANOG <nanog@lists.nanog.org> wrote:
Group, I've been peering with both Route Servers in the Dallas IX for over a month using a single 10G link with no issues. [...]
~Andy

On Sat, Sep 27, 2025 at 7:31 PM Bruce Wainer via NANOG <nanog@lists.nanog.org> wrote:
Excuse my ignorance about this IXP and your equipment, but is Micro-BFD (RFC 7130) supported? And if so, is it enabled or can you enable it? While configuration wise it will use the single IP addresses of the aggregate, separate BFD instances are set up for each underlying link and will confirm whether Layer 3 is working on that point-to-point connection.
Hi Bruce,

I'm also not familiar with this particular IXP, but generally with IXPs we're not talking about point-to-point connections. The multiple participants' routers are part of a shared layer-2 fabric (a switch or switches) over which they trade layer-3 packets directly with each other. The route advertisements may transit the route servers, but the routed packets do not.

You can get into some really finicky errors where both participants successfully talk to the route server and thereby exchange routes, but for one reason or another can't get packets back and forth to each other. Bonded circuits (LAGs) add complexity which makes troubleshooting that much harder.

If it were me, I would have considered building this connection differently. For speed, I'd have chosen a 100G link instead of two 10G links. Had my objective been reliability, I'd have built that at layer 3 instead of layer 2 -- two routers, each with its own 10G link, and then done some balancing of the advertised routes. But in all fairness to Andy, I don't have anywhere near complete information here, and the details matter a lot.

Regards, Bill Herrin

-- William Herrin bill@herrin.us https://bill.herrin.us/

IMHO, the key info here is that a known set of subnets was affected. This rules out some stuff:

- LACP manages link bundling, as in "can this interface be added to the bundle?". The effect of bundling should be to have multiple links to choose from when egressing a packet. RFC 7130 is a nice addition to bundles, as it uses BFD to manage each link - meaning a bad member is removed quickly (LACP timers are not that fast, and LACP itself is not designed to react fast).
- Hashing (which is used for load-balancing traffic in hardware switches) is not managed by LACP - it's always local to each device, and as said before, each side usually has a different view of the ideal hashing. A classical example is when there are many IPs behind one firewall doing NAT - you can't rely on diversity of IPs and MACs to select an egress link, so you usually change the hashing to be per port.
- Link errors would affect random traffic to any destination / from any source.

So, none of the above technologies would affect traffic connectivity _selectively_. Perhaps a malformed bundle could blackhole traffic, but that wouldn't be specific to certain subnets unless someone is *extremely* unlucky and _only_ his subnets hashed to the "bad bundle member" :)

It simply looks like a routing issue through this path. Perhaps the flapping of the BGP session re-advertised this path to some place that previously wasn't using it, and apparently can't reach it?

Pedro Martins Prado pedro.prado@gmail.com / +353 83 036 1875
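The NAT example above, where many flows share one IP pair and therefore need per-port hashing, can be sketched in a few lines (CRC32 standing in for a hardware hash; all addresses are made up):

```python
import zlib

def pick(n_links, *fields):
    # Toy stand-in for a hardware hash over the chosen header fields.
    return zlib.crc32("|".join(map(str, fields)).encode()) % n_links

nat_ip, server = "198.51.100.1", "203.0.113.9"  # made-up addresses
flows = [(nat_ip, server, 40000 + i, 443) for i in range(1000)]

# L3-only hashing: every flow behind the NAT shares one (src, dst) IP
# pair, so all 1000 flows collapse onto a single member link.
l3_links = {pick(2, s, d) for s, d, sp, dp in flows}
assert len(l3_links) == 1

# Adding the L4 ports to the hash key restores diversity.
l4_links = {pick(2, s, d, sp, dp) for s, d, sp, dp in flows}
assert len(l4_links) == 2
```

This also illustrates Pedro's broader point: hashing shapes *which link* a flow takes, but a correctly formed bundle never makes a fixed set of subnets unreachable; that symptom points at routing.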
On 28 Sep 2025, at 06:24, William Herrin via NANOG <nanog@lists.nanog.org> wrote: [...]
participants (10)

- Andy Cole
- Brian Turnbow
- Bruce Wainer
- Mel Beckman
- nanog@immibis.com
- Nick Hilliard
- Pedro Prado
- Pierre LANCASTRE
- Tom Beecher
- William Herrin