On Apr 29, 2020, at 7:59 PM, Kaiser, Erich <erich@gotfusion.net> wrote:
So it has been 3 weeks of major ICMP packet loss to any Google service over the Dallas Equinix IX. It is not affecting performance of service, but it is affecting us with customer complaints and service calls, due to some software using it for monitoring purposes and people using it for benchmark testing. I have been told by them that they know the cause now and know that a large ISP on the IX is causing the issue (Hmm, wonder who that is...), so why do they not shut down the peering with them and force the ISP to fix the issue? This issue is affecting everyone on the IX, not just us. Very, very frustrating. Hopefully this will reach someone over there that can do something about it...
Issues with the IXP ecosystem aren't new in the US. This is why some providers don't appear at them. The original case of one member hurting everyone was really the Gigaswitch HOLB (head-of-line blocking) issue, which was triggered by congested ports. (Waits for others who were more involved in this to crawl out of the woodwork :-)

This is why the majority of traffic volume for interconnection has generally been over private peering links (paid, SFI, or otherwise). If you tried to force it through an IXP ecosystem, the tens of Tbps wouldn't fit, even in each city. Things like CDNs, the Netflix OpenConnect, and similar have really shifted the demand off the interconnection points as much as feasible.

Sometimes an organization can't handle it or tries to cling to its old ways. Sometimes it takes organizational change or people change to improve the situation.

I know it can sound like a broken record, but upgrading to match the capacity demands really can make a difference to offload paths. It may also expose other weak points. My personal goal is to stop thinking about things in the 95/5 (95th-percentile) model and think more in terms of a peak model. 95/5 gets you so far, but the peaks are really where networks can shine or show their age.

I understand it's not always possible to upgrade links, or sometimes one party holds out on the other. It's certainly not the case at $dayjob, and I try to ensure the process works as well as it can here.

Sometimes it's best to just de-peer a network. You may find it works out better for all involved. At $nightJob I want to peer off as much traffic as possible, but if the network paths aren't there or are low-speed, it may not make sense. Evaluate your peers periodically to ensure you are getting what you expect.

- jared
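
For anyone wanting to see the 95/5 versus peak point in numbers, here is a minimal sketch in Python. It is not anything from a real monitoring system: the function names, the sample values, and the 10 Gbps port size are all made up for illustration, and the percentile method (sort, drop the top 5% of samples, take the highest one left) is just the common billing-style convention, assumed here.

```python
def percentile_95(samples_mbps):
    """Sort the samples, discard the top 5%, return the highest one left."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    n = len(ordered)
    # Integer form of ceil(0.95 * n) - 1, to avoid float rounding surprises.
    idx = (95 * n + 99) // 100 - 1
    return ordered[idx]


def compare_to_capacity(samples_mbps, capacity_mbps):
    """Return utilization as a fraction of capacity, by 95th percentile and by peak."""
    p95 = percentile_95(samples_mbps)
    peak = max(samples_mbps)
    return {
        "p95_util": p95 / capacity_mbps,
        "peak_util": peak / capacity_mbps,
    }


# Hypothetical data: 100 five-minute samples on a 10 Gbps port,
# 96 quiet samples at 4000 Mbps and 4 bursts at 9900 Mbps.
samples = [4000] * 96 + [9900] * 4
result = compare_to_capacity(samples, 10_000)
print(result)
```

With these made-up samples the 95th-percentile view shows a port that looks less than half full, while the peak view shows it brushing against line rate during the bursts. The bursts fall entirely inside the discarded top 5%, which is why a 95/5 view alone can hide the congestion.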