Hey Nimrod,
I was contacted by my NOC to investigate a LAG that was not distributing traffic evenly among its members, to the point where one member was congested while the overall utilization of the LAG was reasonably low. Looking at my NetFlow data, I was able to confirm that this was caused by a single large flow of ESP traffic. Fortunately, I was able to shift this flow to another path with enough headroom to accommodate it on a single member link.
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
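For anyone wondering why a single tunnel pins to one member, here is a minimal sketch, assuming a LAG that picks its egress member by hashing addresses, protocol and ports. The CRC32 hash, the four-member LAG, and the names NUM_MEMBERS and pick_member are illustrative assumptions, not any vendor's actual implementation.

```python
import zlib
from collections import Counter

NUM_MEMBERS = 4  # hypothetical LAG with four member links


def pick_member(src_ip, dst_ip, proto, sport=0, dport=0):
    """Map a flow to a LAG member by hashing its header fields.

    CRC32 stands in for the vendor hash; real hardware differs.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % NUM_MEMBERS


# Many TCP flows between the same two hosts: the varying source port
# gives the hash enough entropy to reach several members.
tcp_members = Counter(
    pick_member("192.0.2.10", "198.51.100.20", 6, sport, 443)
    for sport in range(49152, 49152 + 1000)
)

# One ESP tunnel (IP protocol 50): no ports to hash on, so every
# packet of the tunnel produces the same key and the same member.
esp_members = Counter(
    pick_member("192.0.2.10", "198.51.100.20", 50)
    for _packet in range(1000)
)

print("TCP flows per member:", dict(tcp_members))
print("ESP packets per member:", dict(esp_members))
```

The ESP counter ends up with a single key however many packets are sent, which is the congested-member symptom: the hash has nothing that varies within the tunnel, so the whole flow rides one link.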
This problem is called an elephant flow. Some vendors have a solution for this: they dynamically monitor utilisation and remap the hashResult => egressInt table to create a bias that offsets the elephant flow. One particular example:
https://www.juniper.net/documentation/en_US/junos/topics/reference/configura...

Ideally VPN providers would be defensive and would use SPORT for entropy, like MPLSoUDP does.

-- 
  ++ytti