Some interesting new developments on this, independent of the divergent network equipment discussion.
😊
Cogent had a field engineer at the east coast location where my local loop (10gig wave) meets their equipment, i.e. me -> patch cable -> loop provider's wave equipment -> wave -> patch cable -> Cogent equipment. On the other end, the geographically distant west coast side, it's Cogent equipment to my equipment in the same facility, with just a patch cable between them. They connected some model of EXFO's NetBlazer FTBx 8880-series testing device to a port on their east coast network device, without disconnecting my circuit. Originally they planned to have someone physically loop at their equipment at the far end, but I volunteered that my Arista gear supports a provider-facing loopback at the transceiver level if they wanted to try that, so that my loop, cabling, and transceiver would all be part of the test path.
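For what it's worth, the provider-facing loop I offered is the EOS traffic-loopback feature on the transceiver-side interface. I'm quoting this from memory, the interface name is just a placeholder, and the exact keywords (and which side you loop) vary by platform and EOS release, so treat it as a rough sketch rather than exact config:

    interface Ethernet1/1
       ! loop traffic arriving from the provider back toward them at the PHY
       traffic-loopback source network device phy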
One direction at a time, they broke the point-to-point config and created a new point-to-point between one side of my gear, set to loopback mode, and the NetBlazer. The tester was set to use five parallel streams. In the close direction, where the third-party wave is involved, they ran at the full 5 x 2 Gbps for thirty minutes with zero packets lost and no issues. My monitoring confirmed that rate of port input was occurring, although oddly not output; perhaps Arista doesn't "see"/count the frames it sends back out while in PHY loopback mode.
In the distant direction, across their backbone, their equipment at the remote end, and the fiber patch cable to me, they tested at 9.5 Gbps for thirty minutes through my device in loopback mode, configured as five 1.9 Gbps streams. The result: of 2.6B packets sent, only 334 were lost. The report has a "frame loss" and out-of-sequence section per stream. Zero out of sequence, but across the five streams the loss seconds / loss count were 3 / 26, 3 / 48, 1 / 5, 13 / 221, and 1 / 34. I'm not familiar with this tester, but I read that as how many of the total seconds experienced any loss, plus the number of packets lost. So the only one that stands out is the stream with thirteen seconds in which loss occurred, and even then the packet counts are minuscule: 334 lost out of roughly 2.6 billion sent is about 0.000013% loss. Again, my monitoring at the interface level showed this 9.5 Gbps of testing occurring for the thirty minutes the report covers.
So now I'm just completely confused. How is this device, traversing the same equipment, ports, and cables, able to achieve far greater average throughput, with almost no loss, over a very long duration? There are times I can achieve nearly the same, but never for a test longer than ten seconds; it just falls off from there. For example, I ran a five-parallel-stream TCP test with iperf just now and did achieve a net throughput of 8.16 Gbps with about 1,200 retransmits. Running the same five-stream test for a half hour like theirs, I got no better than 2.64 Gbps and 183,000 retransmits.
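For reference, the TCP tests were essentially the following (iperf3 syntax, placeholder address, and the exact options may differ slightly from what I actually typed):

    iperf3 -c 10.0.0.2 -P 5 -t 10      # five parallel TCP streams, ten seconds
    iperf3 -c 10.0.0.2 -P 5 -t 1800    # same five streams held for thirty minutes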
With iperf and UDP I can see loss at any transmit rate above ~140 Mbps within seconds, not a half hour. To rule out my gear, I can also run the same tests from the same systems (both VM and physical) using public addresses and traversing the internet, since these are publicly connected systems. I get far lower loss and much greater throughput on the internet path. For example, a simple ten-second test of a single 400 Mbit UDP stream: 5 packets lost across the internet, 491 across the P2P. A single-stream TCP test across the internet for ten seconds: 3.47 Gbps with 162 retransmits. Across the P2P, this time at least: 637 Mbps with 3,633 retransmits.
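Those ten-second comparisons were along these lines, run once against the public address and once across the P2P (again iperf3 syntax with placeholder addresses):

    iperf3 -c 10.0.0.2 -u -b 400M -t 10    # single 400 Mbit UDP stream, ten seconds
    iperf3 -c 10.0.0.2 -t 10               # single TCP stream, ten seconds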
David