Long-haul 100Mbps EPL circuit throughput issue
Hello NANOG,

We've been dealing with an interesting throughput issue with one of our carriers. Specs and topology:

100 Mbps EPL, fiber from a national carrier. We run MPLS to the CPE, providing the customer a VRF circuit back to our data center across our MPLS network. The circuit has about 75 ms of latency, since it spans around 5000 km.

Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco 2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <-> Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test machine in customer's VRF

We can fill the link with UDP traffic in iperf, but with TCP we can reach 80-90% and then the traffic drops to 50% and slowly increases back up to 90%.

Has anyone dealt with this kind of problem in the past? We've tried forcing the ports to 100-FD at both ends and policing the circuit on our side, and we've called the carrier and escalated to L2/L3 support. They also tried policing the circuit, but as far as I know they didn't modify anything else. I've asked our support team to have them look for underrun errors on their Cisco switch, and they can see some. They're pretty much in the same boat as us and aren't sure where to look either.

Thanks,
Eric
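For reference, the kind of iperf runs described above would look roughly like this. This is a sketch using iperf2 syntax; the server address (192.0.2.10), window size, and durations are placeholders, not the exact commands used in the thread:

    # UDP test: push a full 100 Mbit/s toward the far end
    iperf -c 192.0.2.10 -u -b 100M -t 30 -i 5

    # TCP test with the default window, then with a ~1 MB window and a few
    # parallel streams, to see whether the ramp-up/drop pattern changes
    iperf -c 192.0.2.10 -t 30 -i 5
    iperf -c 192.0.2.10 -w 1M -P 4 -t 30 -i 5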
hi eric

On 11/05/15 at 04:48pm, Eric Dugas wrote:

> Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco 2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <-> Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test machine in customer's VRF
>
> We can fill the link with UDP traffic in iperf, but with TCP we can reach 80-90% and then the traffic drops to 50% and slowly increases back up to 90%.
If I were involved with these tests, I'd start by looking for "not enough TCP send and receive buffers". For flooding at 100 Mbit/s, you'd need about 12 MB of buffer ... UDP does NOT care too much about data dropped due to the buffers, but TCP cares about "not enough buffers" .. somebody resend packet #1357902456 :-)

At least double or triple the buffers needed, to compensate for all kinds of network wackiness: data in transit, misconfigured hardware in the path, misconfigured iperfs, misconfigured kernels, interrupt handling, etc, etc.

- How many iperf flows are you running? Running dozens or hundreds of them affects throughput too.
- Does the same thing happen with socat?
- If iperf and socat agree on network throughput, it's the hardware somewhere.
- Slowly increasing throughput doesn't make sense to me ... it sounds like something is caching some of the data.

magic pixie dust
alvin
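For what it's worth, on a Linux test box the send/receive buffer ceilings being discussed live in the usual kernel sysctls. A sketch follows; the 16 MB values are illustrative, not a recommendation from the thread:

    # current ceilings for socket buffers
    sysctl net.core.rmem_max net.core.wmem_max
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

    # raise the maximums so TCP autotuning (or iperf -w) can actually grow
    # the window well past the ~1 MB bandwidth-delay product of this path
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"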
Along with the receive window/buffer sizing needed for your particular bandwidth-delay product, it appears you're also seeing TCP moving from slow start to a congestion avoidance mechanism (Reno, Tahoe, CUBIC, etc.).

Greg Foletta
greg@foletta.org
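If it helps, on Linux the congestion avoidance algorithm in use can be checked and swapped per box with sysctls along these lines. A sketch; which algorithms are available depends on the kernel and loaded modules:

    # which algorithm is in use, and which are available
    sysctl net.ipv4.tcp_congestion_control
    sysctl net.ipv4.tcp_available_congestion_control

    # e.g. set CUBIC (the usual Linux default) explicitly, then re-run the
    # TCP iperf test to see if the 90% -> 50% dip changes shape
    sysctl -w net.ipv4.tcp_congestion_control=cubic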
Eric,

I have seen that happen.

First, double-check that the gear is truly full duplex .... it seems like it may claim it is and you've just discovered it is not. That's always been an issue with manufacturers claiming gear is full duplex, and on short distances it's not so noticeable. Try to iperf in both directions at the same time and it becomes obvious.

Thank You
Bob Evans
CTO
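A quick way to sanity-check both points from the Linux test machines. A sketch; the interface name (eth0) and server address (192.0.2.10) are placeholders:

    # confirm the test hosts really negotiated 100 Mb/s full duplex
    ethtool eth0 | grep -E 'Speed|Duplex'

    # iperf2's -d runs both directions simultaneously; a duplex mismatch
    # usually shows up as one or both directions collapsing
    iperf -c 192.0.2.10 -d -t 30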
With a default window size of 64 KB and a delay of 75 msec, you should only get around 7 Mbps of throughput with TCP. You would need a window size of about 1 MB in order to fill up the 100 Mbps link.

1/0.75 = 13.333 (how many RTTs in a second)
13.333 * 65,535 * 8 = 6,990,225.24 (about 7 Mbps)

You would need to increase the window to 1,048,560 bytes in order to get around 100 Mbps.

13.333 * 1,048,560 * 8 = 111,843,603.84 (about 100 Mbps)

Pablo Lucena
Cooper General Global Services
Network Administrator
Office: 305-418-4440 ext. 130
plucena@coopergeneral.com
I realized I made a typo: 1/0.075 = 13.333, not 1/0.75 = 13.333.
switch.ch has a nice bandwidth delay product calculator:
https://www.switch.ch/network/tools/tcp_throughput/

Punching in the link specs from the original post gives pretty much exactly what you said, Pablo, including that it'd get ~6.999 Mbit/s with a default 64k window:

BDP (100 Mbit/sec, 75.0 ms) = 0.94 MByte
Required TCP buffer to reach 100 Mbps with RTT of 75.0 ms >= 915.5 KByte

Theo
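The same arithmetic can also be sanity-checked without the web calculator. A one-liner sketch using the 100 Mbit/s and 75 ms figures from the original post:

    # bandwidth-delay product in bytes: 937500 (about 0.94 MB)
    awk 'BEGIN { print 100e6 * 0.075 / 8 }'

    # throughput ceiling with a 64 KB window: 6990400 bit/s (about 7 Mbit/s)
    awk 'BEGIN { print 65535 * 8 / 0.075 }'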
On Thu, Nov 5, 2015 at 9:17 PM, Pablo Lucena <plucena@coopergeneral.com> wrote:
> With default window size of 64KB, and a delay of 75 msec, you should only get around 7Mbps of throughput with TCP.
Hi Pablo,

Modern TCPs support and typically use window scaling (RFC 1323). You may not notice it in packet dumps because the window scaling option is negotiated once for the connection, not repeated in every packet.

Regards,
Bill Herrin

--
William Herrin ................ herrin@dirtside.com  bill@herrin.us
Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>
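To confirm that on the test boxes, a sketch; the interface name (eth0) is a placeholder, and port 5001 assumes iperf2's default:

    # window scaling should be enabled (1) on both endpoints
    sysctl net.ipv4.tcp_window_scaling

    # the "wscale" option is only visible in the SYN/SYN-ACK of each connection
    tcpdump -nni eth0 'tcp[tcpflags] & (tcp-syn) != 0 and port 5001'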
> Modern TCPs support and typically use window scaling (RFC 1323). You may not notice it in packet dumps because the window scaling option is negotiated once for the connection, not repeated in every packet.
Absolutely. Most host OSes should support this by now. Some test utilities, however, like iperf (at least the versions I've used), default to a 16-bit window size. The goal of my response was to point out that TCP relies on windowing, unlike UDP, which explains the discrepancies.

This is a good article outlining these details:
https://www.edge-cloud.net/2013/06/measuring-network-throughput/
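One way to work around the small default and watch the window actually in use during a run. A sketch; the server address (192.0.2.10) is a placeholder:

    # override iperf's small default with an explicit ~1 MB socket buffer
    iperf -c 192.0.2.10 -w 1M -t 30

    # in a second shell, show the per-connection wscale/cwnd/rtt in use
    ss -ti dst 192.0.2.10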
Hi Eric,

Sounds like a TCP problem off the top of my head. However, just throwing it out there: we use a mix of wholesale access circuit providers and carriers for locations we haven't PoP'ed, and we are an LLU provider (a CLEC in US terms). For such issues I have been developing an app to test below TCP/UDP and for pseudowire testing, etc.:

https://github.com/jwbensley/Etherate

It may or may not shed some light when you have an underlying problem (although yours sounds TCP-related).

Cheers,
James.
participants (8)

- alvin nanog
- Bob Evans
- Eric Dugas
- Greg Foletta
- James Bensley
- Pablo Lucena
- Theodore Baschak
- William Herrin