Comcast - Significant v4 vs v6 throughput differences, almost stateful.
Hello all,

I would appreciate it if someone from Comcast could contact me about this. We’re having serious throughput issues with our AS20326 pushing packets to Comcast over v4. Our transfers are either the full line speed of the Comcast customer modem, or they’re seemingly capped at 200-300 KB/s. This behavior appears to be almost stateful, as if the speed is decided when the connection starts: as long as it starts fast it will remain fast for the length of the transfer, and slow if it starts slow. Traces seem reasonable, and currently we’ve influenced the path onto GTT both ways. If we prepend and reroute on our side, the same exact issue will happen on another transit provider.

This issue does not affect v6, which is full speed on every attempt. This may be regionalized to the Comcast Pittsburgh market.

This is most widely affecting our Linux mirror repository server: http://mirror.pit.teraswitch.com/ Our colocation customers who host VPN systems have also recently noticed bottlenecks for their Comcast-connected employees.

-- Nick Zurku Systems Engineer TeraSwitch, Inc. nzurku@teraswitch.com
We have customers in CT with the same issues. When did this start?
On Thu, Apr 23, 2020 at 8:27 AM Dovid Bender <dovid@telecurve.com> wrote:
We have customers in CT with the same issues. When did this start?
Seems to have started 5 years ago, when we ran out of IPv4 and all comers needed to embrace IPv4 life-support mechanisms: https://www.arin.net/vault/announcements/2015/20150924.html

The e2e IPv6 internet being faster and more robust than life-supported, bot-ridden, and scarce IPv4 is... a feature, not a bug. https://www.internetsociety.org/blog/2015/04/facebook-news-feeds-load-20-40-...
We started getting the wave of complaints over the last two weeks or so. The initial few issues were reported perhaps up to a month ago, but were chalked up to being “an issue out on the internet.” Did your issues in CT start on a certain date?

-- Nick Zurku Systems Engineer TeraSwitch, Inc. Office: 412-945-7048 nzurku@teraswitch.com

On April 23, 2020 at 11:24:59 AM, Dovid Bender (dovid@telecurve.com) wrote:

We have customers in CT with the same issues. When did this start?
On Thu, Apr 23, 2020 at 8:06 AM Nick Zurku <nzurku@teraswitch.com> wrote:
We’re having serious throughput issues with our AS20326 pushing packets to Comcast over v4. Our transfers are either the full line-speed of the Comcast customer modem, or they’re seemingly capped at 200-300KB/s. This behavior appears to be almost stateful, as if the speed is decided when the connection starts. As long as it starts fast it will remain fast for the length of the transfer and slow if it starts slow.
Hi Nick,

That's actually kinda normal for TCP. The two most dominant factors in TCP throughput are the round-trip time (RTT) and how large the congestion window has grown prior to the first lost packet. Other factors (including later mild packet loss) tend to move the needle so slowly you might not notice it moving at all.

One of the interesting patterns with TCP is that the sender tends to shove out all the packets it can in the first few percent of the RTT and then sits idle. When the bandwidths are relatively fast, the receiver receives and acks them all in a short time window as well. As a result you get these high-bandwidth spurts where packet loss due to full buffers is likely, even though for most of the RTT no packets are being transmitted at all. It can take several minutes for packets to spread out within the RTT, and by then the congestion window (and hence throughput) is firmly established.

Regards,
Bill Herrin

-- William Herrin bill@herrin.us https://bill.herrin.us/
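[Editor's note: the congestion-window effect described above can be made concrete with the classic Mathis et al. TCP throughput model, rate ≤ (MSS/RTT) · C/√p. A minimal sketch follows; the RTT and loss figures are purely illustrative, not measurements from this thread.]

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Upper bound on steady-state TCP throughput in bytes/sec from the
    Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)), C ~ sqrt(3/2)."""
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes / rtt_s) * (c / math.sqrt(loss_rate))

# A flow whose window grows large before the first loss (low effective loss
# rate) settles at a far higher rate than one that loses a packet early,
# even though both traverse the "same" path.
fast = mathis_throughput(1460, 0.020, 0.0001)  # ~1 loss per 10,000 packets
slow = mathis_throughput(1460, 0.020, 0.01)    # ~1 loss per 100 packets
print(f"low-loss path: {fast / 1e6:.1f} MB/s")   # roughly 8.9 MB/s
print(f"lossy path:    {slow / 1e6:.3f} MB/s")   # roughly 0.89 MB/s
```

With the same 20 ms RTT, a hundredfold difference in loss rate yields a tenfold difference in the model's throughput ceiling, which is in the ballpark of the full-speed vs. 200-300 KB/s split reported above.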
----- On Apr 23, 2020, at 8:06 AM, Nick Zurku <nzurku@teraswitch.com> wrote:
We’re having serious throughput issues with our AS20326 pushing packets to Comcast over v4. Our transfers are either the full line-speed of the Comcast customer modem, or they’re seemingly capped at 200-300KB/s. This behavior appears to be almost stateful, as if the speed is decided when the connection starts. As long as it starts fast it will remain fast for the length of the transfer and slow if it starts slow. Traces seem reasonable and currently we’ve influenced the path onto GTT both ways. If we prepend and reroute on our side, the same exact issue will happen on another transit provider.
Have you tried running a test to see if there may be ECMP issues? I wrote a rudimentary script once, https://pastebin.com/TTWEj12T, that might help here. The script is written to detect packet loss on multiple ECMP paths, but you might be able to modify it for throughput.

The rationale behind my thinking is that if certain ECMP links are oversubscribed, the TCP sessions following those paths will stay "low" bandwidth. Sessions that win the ECMP lottery and pass through a non-congested ECMP path may show better performance.

Thanks,
Sabri
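[Editor's note: the "ECMP lottery" above works because routers hash the 5-tuple once per flow, so every packet of a session rides the same member link. A toy model illustrates this; the hash function and link count here are invented for illustration, as real routers use vendor-specific hashes.]

```python
import hashlib

# Pretend the router has four equal-cost member links.
LINKS = ["link-0", "link-1", "link-2", "link-3"]

def ecmp_link(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Deterministically map a flow's 5-tuple to one member link,
    the way per-flow ECMP hashing pins a session to a single path."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return LINKS[digest[0] % len(LINKS)]

# Retrying a transfer from a new ephemeral source port re-rolls the dice:
for port in (50000, 50001, 50002, 50003):
    print(port, "->", ecmp_link("192.0.2.10", port, "198.51.100.20", 443))
```

If one of the four links is congested, roughly a quarter of new sessions land on it and stay slow for their whole lifetime, which matches the almost-stateful fast-or-slow behavior described in the original report.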
On Thu, Apr 23, 2020 at 12:45 PM Sabri Berisha <sabri@cluecentral.net> wrote:
And for a slightly more formal package to do this, there's UDPing, developed by the amazing networking team at Yahoo; it was written to identify intermittent issues affecting a single link in an ECMP or L2-hashed aggregate link pathway. https://github.com/yahoo/UDPing

It does have the disadvantage of being designed for one-way measurement in each direction; that decision was intentional, to ensure each direction was measuring a completely known, deterministic pathway based on the hash values in the packets, without the return trip potentially obscuring or complicating identification of problematic links.

But if you have access to both the source and destination ends of the connection, it's a wonderful tool to narrow down exactly where the underlying problem on a hashed ECMP/aggregate link is.

Matt
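[Editor's note: the one-way, fixed-5-tuple probing Matt describes can be sketched in a few lines of Python. This is a loopback toy in the spirit of UDPing, not its actual implementation: sequence-numbered probes go from one fixed source port to one fixed destination port, so in a real network every probe would hash onto the same ECMP member, and gaps at the receiver reveal loss on that one path.]

```python
import socket
import struct
import threading

HOST, COUNT = "127.0.0.1", 100

# Receiver: bind a UDP socket and count which sequence numbers arrive.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind((HOST, 0))
port = recv_sock.getsockname()[1]
recv_sock.settimeout(2.0)
results = {}

def receiver():
    seen = set()
    try:
        while len(seen) < COUNT:
            data, _ = recv_sock.recvfrom(64)
            (seq,) = struct.unpack("!I", data)
            seen.add(seq)
    except socket.timeout:
        pass  # give up waiting; anything unseen counts as lost
    results["lost"] = COUNT - len(seen)

t = threading.Thread(target=receiver)
t.start()

# Sender: one fixed source port, so all probes share one 5-tuple
# (and hence, on a real ECMP path, one member link).
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.bind((HOST, 0))
for seq in range(COUNT):
    send_sock.sendto(struct.pack("!I", seq), (HOST, port))

t.join()
send_sock.close()
recv_sock.close()
print("lost probes:", results["lost"])
```

Running several instances of this across different source ports, as in Sabri's script, would walk the probes across the different ECMP hash buckets; on loopback, of course, there is only one path and no loss to find.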
participants (6)
- Ca By
- Dovid Bender
- Matthew Petach
- Nick Zurku
- Sabri Berisha
- William Herrin