
Andy Cole via NANOG wrote on 26/09/2025 04:21:
> No configuration changes to routing policy at all. After a few days we started to get customer complaints about certain sites/domains being unreachable. I worked around the issue by not announcing the customer blocks to the route servers and changed the return path to traverse transit. This solved the issue, but I'm perplexed as to what could've caused it, and where to look to resolve it. If you guys could provide feedback and point me in the right direction I'd appreciate it. TIA.
If this was confirmed working before upgrading to 2x10, that's useful data. The starting point would be to check both 10G bearer circuits for errors and discards. Dallas-IX runs IXP Manager, so you should be able to log in and check for discards and errors on both ports at the remote side, in addition to checking the same on your local router (or switch).

If it's not traffic being dropped on the link, it could be an issue with the hashing algorithm on one side of the LAG or the other. Try to get a repeatable case with specific traffic, then bring it up with the Dallas-IX people. Is traffic using both links? Is either of them filling up? Does the problem go away if you disable one link, or the other?

Make sure to rule out MTU problems on each bearer link too.

Also, be sure to rule out IPv6 routing. Sometimes web pages don't load properly because some of their assets are delivered over IPv6. Because IPv6 isn't as well monitored as IPv4 in general (cue outrage), and because everyone starts diagnostics with tools that default to IPv4, this can slip under the radar.

I've put quick Python sketches of the hash-bucket, MTU and IPv6 checks below.

Nick
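
For the hashing question: most LAG implementations hash on the L4 5-tuple, so fixing everything else and sweeping the source port is a cheap way to land probes on different member links. That's an assumption about Dallas-IX's hashing, not a fact, and 192.0.2.1/443 below are stand-ins for an affected far-side host; a minimal sketch:

    import socket, time

    def probe(dst: str, dport: int, sport: int) -> float | None:
        """TCP connect from a pinned source port; return handshake time or None."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(("", sport))   # pin the source port; it feeds the 5-tuple hash
            s.settimeout(3)
            t0 = time.monotonic()
            s.connect((dst, dport))
            return time.monotonic() - t0
        except OSError:
            return None
        finally:
            s.close()

    # Sweep source ports: if failures cluster on particular ports, suspect
    # one member link (or one side's hashing) rather than the whole LAG.
    for sport in range(40000, 40016):
        rtt = probe("192.0.2.1", 443, sport)
        print(sport, "FAIL" if rtt is None else f"{rtt * 1000:.1f} ms")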
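For the MTU check, a don't-fragment ping sweep from your side will show whether full-size frames make it across. This assumes Linux iputils ping (-M do sets DF), and 192.0.2.1 again stands in for the far-side peer:

    import subprocess

    def df_ping(host: str, payload: int) -> bool:
        """One ICMP echo with DF set (Linux iputils ping); True if it got a reply."""
        r = subprocess.run(
            ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return r.returncode == 0

    # Payload = MTU - 28 bytes (20 IP + 8 ICMP). Sweep the sizes you expect the
    # IX LAN to carry; repeat with one bearer disabled, then the other.
    for mtu in (1500, 4470, 9000):
        ok = df_ping("192.0.2.1", mtu - 28)
        print(f"{mtu}: {'ok' if ok else 'dropped or needs fragmentation'}")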
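And for ruling out IPv6, the quickest test is the same connection attempt over each address family: if v4 succeeds and v6 times out for a complained-about site, you have your answer. www.example.com is a placeholder for one of the affected domains:

    import socket

    def reachable(host: str, family: socket.AddressFamily, port: int = 443) -> bool:
        """TCP connect to the host's first address in the given family."""
        try:
            addr = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)[0][4]
            with socket.socket(family, socket.SOCK_STREAM) as s:
                s.settimeout(3)
                s.connect(addr)
            return True
        except OSError:
            return False

    host = "www.example.com"   # placeholder: use an affected domain
    print("IPv4:", reachable(host, socket.AF_INET))
    print("IPv6:", reachable(host, socket.AF_INET6))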