On Thu, 31 Aug 2023 at 23:56, Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
The best working theory that several people I know in the neteng community have come up with is that Cogent does not want to adversely impact all other customers on their router at some sites, where the site's upstreams and links to neighboring POPs are implemented as something like 4 x 10 Gbps, i.e. places where they have not upgraded that specific router to a full 100 Gbps upstream. Moving large flows of >2 Gbps could result in flat-topping a traffic chart on just 1 of those 10 Gbps circuits.
It is a very plausible theory, and everyone has this problem to a lesser or greater degree. There was a time when edge interfaces were much lower capacity than backbone interfaces, but I don't think that time will ever come back, so this problem is systemic.

Luckily there is quite a reasonable solution to the problem, called 'adaptive load balancing', where software monitors the balancing and biases the hash_result => egress_interface tables to improve it when dealing with elephant flows.

--
  ++ytti
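To make the 'adaptive load balancing' idea above concrete, here is a rough sketch in Python. It is not any vendor's implementation: the class name AdaptiveBalancer, its methods, the bucket count and the greedy move rule are all hypothetical choices made for illustration. The only assumption taken from the description is that flows hash into buckets, buckets map to egress interfaces, and software re-points buckets based on measured load.

```python
import zlib


class AdaptiveBalancer:
    """Toy model (hypothetical) of a hash_result => egress_interface table."""

    def __init__(self, links, buckets=64):
        self.links = list(links)
        # Static starting point: buckets spread round-robin over the links.
        self.table = [self.links[i % len(self.links)] for i in range(buckets)]
        self.bucket_bps = [0] * buckets  # measured traffic per hash bucket

    def bucket_for(self, flow):
        # Deterministic hash of the flow key (e.g. a 5-tuple) into a bucket.
        return zlib.crc32(repr(flow).encode()) % len(self.table)

    def observe(self, flow, bps):
        # Stand-in for the per-bucket counters the monitoring software reads.
        self.bucket_bps[self.bucket_for(flow)] += bps

    def link_load(self):
        load = {link: 0 for link in self.links}
        for bucket, link in enumerate(self.table):
            load[link] += self.bucket_bps[bucket]
        return load

    def rebalance(self, max_moves=1000):
        """Greedily re-point buckets from loaded links to the coldest link."""
        moves = 0
        while moves < max_moves:
            load = self.link_load()
            cold = min(load, key=load.get)
            moved = False
            # Try the hottest link first, then progressively cooler ones.
            for hot in sorted(load, key=load.get, reverse=True):
                gap = load[hot] - load[cold]
                # Only buckets smaller than the gap bring the pair closer.
                candidates = [
                    b for b, link in enumerate(self.table)
                    if link == hot and 0 < self.bucket_bps[b] < gap
                ]
                if candidates:
                    # Prefer the bucket that comes closest to halving the gap.
                    best = min(
                        candidates,
                        key=lambda b: abs(self.bucket_bps[b] - gap / 2),
                    )
                    self.table[best] = cold
                    moves += 1
                    moved = True
                    break
            if not moved:
                break
        return moves


if __name__ == "__main__":
    lb = AdaptiveBalancer(["xe-0", "xe-1", "xe-2", "xe-3"], buckets=8)
    lb.observe(("10.0.0.1", "192.0.2.1", 6, 40000, 443), 3000)  # elephant, Mbps
    lb.observe(("10.0.0.2", "192.0.2.9", 6, 40001, 443), 200)   # mouse
    print("before:", lb.link_load())
    print("moves:", lb.rebalance())
    print("after:", lb.link_load())
```

Note that the sketch only moves whole buckets, so packets of one flow stay on one interface and are not reordered. The flip side is that a single elephant flow larger than the gap between links cannot be split; the table can only move other traffic away from the interface carrying it.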