On Thu, Apr 23, 2020 at 12:45 PM Sabri Berisha <sabri@cluecentral.net> wrote:
----- On Apr 23, 2020, at 8:06 AM, Nick Zurku <nzurku@teraswitch.com> wrote:
We’re having serious throughput issues with our AS20326 pushing packets to Comcast over v4. Our transfers either run at the full line speed of the Comcast customer's modem, or they seem capped at 200-300KB/s. This behavior appears almost stateful, as if the speed is decided when the connection starts: if a transfer starts fast it stays fast for its whole length, and if it starts slow it stays slow. Traces seem reasonable, and we've currently influenced the path onto GTT in both directions. If we prepend and reroute on our side, the exact same issue happens on another transit provider.
Have you tried running a test to see if there may be ECMP issues? I once wrote a rudimentary script, https://pastebin.com/TTWEj12T, that might help here. It was written to detect packet loss on multiple ECMP paths, but you might be able to modify it to measure throughput. 

The rationale behind my thinking is that if certain ECMP links are oversubscribed, the TCP sessions hashed onto those links will stay at "low" bandwidth for their whole lifetime. Sessions that win the ECMP lottery and pass through a non-congested path may show much better performance.
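To illustrate why a flow's speed can look "decided" at connection setup: a router doing ECMP typically hashes packet header fields (commonly the 5-tuple) to pick a next hop, so every packet of one TCP connection follows the same member link for the life of the connection. The sketch below is a toy model, not any vendor's real hash; the CRC32 hash, the four equal-cost paths, and path 2 being the congested one are all assumptions for illustration.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def ecmp_path(flow: FiveTuple, n_paths: int) -> int:
    """Pick an ECMP member for a flow.

    Hypothetical hash: CRC32 over the 5-tuple, modulo the number of paths.
    Real routers use vendor-specific hash functions and field selections.
    """
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.proto}|{flow.src_port}|{flow.dst_port}"
    return zlib.crc32(key.encode()) % n_paths


N_PATHS = 4
CONGESTED = {2}  # assumption: path 2 is the oversubscribed link


def classify(flows, n_paths=N_PATHS, congested=CONGESTED):
    """Map each flow's source port to 'slow' or 'fast' based on its hashed path."""
    return {
        f.src_port: ("slow" if ecmp_path(f, n_paths) in congested else "fast")
        for f in flows
    }


if __name__ == "__main__":
    # Same hosts and destination port; only the ephemeral source port differs,
    # which is exactly what changes between two otherwise identical transfers.
    flows = [
        FiveTuple("192.0.2.10", "198.51.100.20", 6, port, 443)
        for port in range(50000, 50008)
    ]
    for port, verdict in classify(flows).items():
        print(port, verdict)
```

Because the hash is deterministic, re-running a transfer with the same source port lands on the same path every time, while a new connection (new ephemeral port) rolls the dice again. That matches the "starts fast, stays fast; starts slow, stays slow" pattern described above.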

Thanks,

Sabri


And for a slightly more formal package to do this,
there's UDPing, developed by the amazing networking
team at Yahoo; it was written to identify intermittent 
issues affecting a single link in an ECMP or L2-hashed
aggregate link pathway.

https://github.com/yahoo/UDPing

It does have the disadvantage of being designed for
one-way measurement in each direction; that decision 
was intentional, to ensure each direction was measuring 
a completely known, deterministic pathway based on the
hash values in the packets, without the return trip potentially
obscuring or complicating identification of problematic links.

But if you have access to both the source and destination ends 
of the connection, it's a wonderful tool to narrow down exactly 
where the underlying problem on a hashed ECMP/aggregate 
link is.
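A minimal sketch of the receiving side of this kind of one-way test, assuming the sender transmits a fixed number of UDP probes from each of several source ports, with each probe carrying only a 4-byte sequence number. This is not UDPing's code or wire format; the payload layout and the function names (`encode_probe`, `decode_probe`, `loss_by_source_port`) are hypothetical. Since each source port hashes onto one deterministic path, loss broken out per source port points at the specific ECMP or LAG member that is dropping packets.

```python
from __future__ import annotations

import struct

# Hypothetical probe payload: one unsigned 32-bit sequence number, network byte order.
PROBE_FMT = "!I"
PROBE_LEN = struct.calcsize(PROBE_FMT)


def encode_probe(seq: int) -> bytes:
    """Build the payload for probe number `seq`."""
    return struct.pack(PROBE_FMT, seq)


def decode_probe(payload: bytes) -> int:
    """Extract the sequence number, ignoring any trailing padding."""
    (seq,) = struct.unpack(PROBE_FMT, payload[:PROBE_LEN])
    return seq


def loss_by_source_port(received, ports, sent_per_port: int) -> dict[int, float]:
    """Compute the loss fraction for each sender source port.

    received: iterable of (src_port, payload) pairs as seen by the receiver.
    ports: every source port the sender used, so a port with 100% loss
           still shows up in the result.
    sent_per_port: how many probes the sender sent from each port.
    """
    seen = {port: set() for port in ports}
    for src_port, payload in received:
        if src_port not in seen:
            continue  # stray traffic, not one of our probe streams
        seq = decode_probe(payload)
        if seq < sent_per_port:
            seen[src_port].add(seq)  # a set ignores duplicated probes
    return {port: 1 - len(seqs) / sent_per_port for port, seqs in seen.items()}
```

On a live receiver, the `(src_port, payload)` pairs would come from Python's `socket.recvfrom()`, which returns the data along with the sender's address and port. A port showing steady loss while its neighbors show none suggests a single bad member link rather than general congestion; running the same test in the other direction covers the return path separately.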

Matt