On 31 May 2017 at 11:56, Saku Ytti <saku@ytti.fi> wrote:
Cool. Seems you're using AF_PACKET, which actually makes it unique. iperf/netperf etc. use UDP or TCP sockets, so UDP performance is just abysmal; you can't saturate a 1GE link with any reliability. So measuring, for example, packet loss is not possible at all.
I've been meaning to write an AF_PACKET-based UDP sender/receiver and have gotten pretty far with a friend of mine on a Rust version; we can congest 1GE (on minimum-size frames) on Linux reliably and actually tell if you're lossy. It has a server/client design, where the client asks the server, via JSON-based messages over a control channel, to receive or send, and what exactly. Alas, we're only 80% there, and seem to struggle to find time to polish it for an initial release.
We definitely need a tool like iperf which performs at least to 1GE, and AF_PACKET can do that; a UDP socket cannot. Alas, 10GE is still a pipe dream for anything as portable as iperf, as you'd need to use DPDK, netmap or equivalent, which will remove the NIC from userland. There are quite a few options for that use-case, but no good option for the use-case where you want at least 1GE but cannot remove the NIC from userland.
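To illustrate what sending over AF_PACKET rather than a UDP socket looks like, here is a minimal sketch in Python (chosen for brevity; it is not code from Etherate or from the Rust tool mentioned above). The names `build_frame` and `send_frames` are made up for this example, and the EtherType 0x88B5 (an IEEE local experimental value) is an assumption. Opening the socket needs Linux and CAP_NET_RAW, so only the frame builder can be exercised without privileges.

```python
import socket
import struct

# Assumption for this sketch: an IEEE 802 "local experimental" EtherType.
ETH_P_EXPERIMENTAL = 0x88B5
# A minimum Ethernet frame is 64 bytes on the wire including the 4-byte FCS,
# which the NIC appends, so userspace pads to 60 bytes.
MIN_FRAME_NO_FCS = 60


def build_frame(dst_mac: bytes, src_mac: bytes, payload: bytes,
                ethertype: int = ETH_P_EXPERIMENTAL) -> bytes:
    """Return an Ethernet II frame (without FCS), zero-padded to minimum size."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


def send_frames(ifname: str, frame: bytes, count: int) -> int:
    """Send `frame` `count` times on `ifname`; returns total bytes handed to the kernel.

    Requires Linux and CAP_NET_RAW (typically root).
    """
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))  # bind to the interface; protocol 0 is fine for Tx only
        sent = 0
        for _ in range(count):
            sent += sock.send(frame)
        return sent
```

A call such as `send_frames("eth0", build_frame(dst, src, b"test"), 1000)` would hand complete layer-2 frames to the driver, bypassing the kernel's IP/UDP stack; the interface name is a placeholder. A per-frame `send()` loop like this will not reach the rates discussed here on its own; it only shows the socket type involved.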
Hi Saku,
Yeah, AF_PACKET sockets are used, and you really need to be on a 4.x kernel for better performance (update your NIC firmware etc.). The problem with Etherate is that it uses Ethernet for both the test data and the control data, and since Ethernet is not loss-less it does some strange (read: lame) things, like sending some control or data frames three times to try to ensure the other side receives them when there is frame loss.
Yeah, 1G with large frames is do-able. 10G with large frames is also do-able with a fast CPU. Etherate is single-threaded though, so you’ll not get anywhere near 10G with 64-byte frames in Etherate.
I have started writing a multi-threaded version which will use TCP sockets to exchange control data but still use AF_PACKET sockets for data-plane traffic. 10G with 64-byte packets should be achievable (still writing it, so not 100% confirmed yet) when using the PACKET_MMAP Tx/Rx rings in AF_PACKET, which is what the new, aptly named EtherateMT (multi-threaded) uses. One can then use multiple threads (each on a different CPU core), each with its own Tx or Rx ring buffer, to push packets to the NIC, and we can use RSS on the NIC and also assign each NIC Tx queue to a separate core for processing NET_TX and NET_RX IRQs. So it might take 12 or 16 cores, but it should be do-able in EtherateMT, still with iperf-like portability, whereas DPDK can do this on a single core (pkt-gen and moon-gen etc.). However, EtherateMT would ideally use only kernel-native features (no third-party libraries required, and no custom kernel compilation to enable optional modules).
Yeah, Rust seems cool; it's on my "to-learn" list along with Go and seven thousand other things, so I'm writing in C for now.
Cheers,
James.
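For a rough sense of what "10G with 64 byte frames" asks of each core, the arithmetic can be sketched as below. This is a back-of-the-envelope calculation, not a measurement of Etherate or EtherateMT. It assumes standard Ethernet wire overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame) and an even split of the load across cores; the function names are made up for this example.

```python
# Per-frame overhead on the wire that is not part of the frame itself.
PREAMBLE_SFD_BYTES = 8     # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum idle time between frames, in byte times


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second at line rate (frame size includes the FCS)."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def per_core_pps(link_bps: float, frame_bytes: int, cores: int) -> float:
    """Frames per second each core must handle, assuming an even split."""
    return line_rate_pps(link_bps, frame_bytes) / cores


if __name__ == "__main__":
    for name, bps in (("1GE", 1e9), ("10GE", 10e9)):
        print(f"{name}, 64-byte frames: {line_rate_pps(bps, 64) / 1e6:.2f} Mpps")
    for cores in (12, 16):
        mpps = per_core_pps(10e9, 64, cores) / 1e6
        print(f"10GE over {cores} cores: {mpps:.2f} Mpps per core")
```

At 64 bytes, 10GE line rate works out to roughly 14.88 million frames per second, so with 12 or 16 cores each thread would need to sustain on the order of one million frames per second.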