On Sun, 9 Mar 2003, Richard A Steenbergen wrote:
On the send side, the transmitting application is guaranteed to fill the buffers immediately (ever seen a huge jump in speed at the beginning of a transfer? That's the local buffer being filled, and the application has no way to know whether this data is going out onto the wire or just into the kernel). The network must then drain the packets onto the wire, sometimes very slowly (think of a dialup user downloading from your GigE server).
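The point about the local buffer is easy to demonstrate: a successful send() only means the kernel accepted the data into the socket send buffer, not that anything reached the wire. A minimal sketch, assuming POSIX sockets and the Python standard library (buffer sizes and rounding behavior are OS-specific):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask for a large send buffer; the kernel may round or cap the value
# (Linux, for instance, doubles the requested size for bookkeeping).
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

# Any data written now is "sent" from the application's point of view
# as soon as it fits in this buffer, regardless of network conditions.
print(effective)
s.close()
```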
Actually this is often way too fast, as the congestion window doubles with each ACK. This means that with a large buffer (and therefore a large window) and a bottleneck somewhere along the path, you are almost guaranteed to see serious congestion in the early stages of the session, and lower levels of congestion periodically later on, whenever TCP probes for how large the congestion window can get without losing packets. This is the part of TCP I've never understood: why does it send large numbers of packets back-to-back? This is almost never a good idea.
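The exponential growth is easy to see in a toy model. This is only a sketch of slow start (cwnd counted in MSS units, doubling once per RTT until it hits a slow-start threshold), not actual kernel behavior, which also reacts to loss, delayed ACKs and pacing:

```python
# Toy model of TCP slow start: the congestion window (in segments)
# doubles each round trip until it reaches ssthresh.
def slow_start(cwnd=1, ssthresh=64, rtts=8):
    history = []
    for _ in range(rtts):
        history.append(cwnd)          # segments sent back-to-back this RTT
        cwnd = min(cwnd * 2, ssthresh)
    return history

print(slow_start())  # [1, 2, 4, 8, 16, 32, 64, 64]
```

After only a handful of round trips the sender is emitting dozens of segments in a burst, which is exactly where the early-session congestion comes from when a bottleneck link sits in the path.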
On the receive side, the socket buffers must be large enough to accommodate all the data received between the application's read() calls,
That's not true. It's perfectly acceptable for TCP to stall when the receiving application fails to read the data fast enough. (TCP then simply announces a window of 0 to the other side so the communication effectively stops until the application reads some data and a >0 window is announced.) If not, the kernel would be required to buffer unlimited amounts of data in the event an application fails to read it from the buffer for some time (which is a very common situation).
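The flow-control mechanism described above can be sketched in a few lines. This is a simplification I'm assuming for illustration (the advertised window is just receive-buffer capacity minus unread data; real stacks also apply window scaling and silly-window avoidance):

```python
# Toy model of TCP receive-side flow control: the window advertised to
# the sender shrinks as unread data accumulates in the receive buffer.
def advertised_window(buf_size, unread):
    return max(buf_size - unread, 0)

BUF = 65536
print(advertised_window(BUF, 0))      # application keeping up: full window
print(advertised_window(BUF, BUF))    # application stalled: window of 0
```

Once the window reaches 0 the sender must stop; when the application finally reads some data, a window update reopens the flow. Nothing forces the kernel to buffer without bound.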
locally. Jumbo frames help too, but their real benefit is not the simplistic "hey look, there's 1/3rd the number of frames/sec" view that many people take. The good stuff comes from techniques like page flipping, where the NIC DMAs data into a memory page that can be flipped through the system straight to the application, without any intermediate copies. Some day TCP may just be implemented on the NIC itself, with ALL work offloaded and the system doing nothing but receiving nice page-sized chunks of data at high speed.
Hm, I don't see this happening to a usable degree as TCP has no concept of records. You really want to use fixed size chunks of information here rather than pretending everything's a stream.
IMHO the 1500-byte MTU of Ethernet will continue to prevent good end-to-end performance like this for a long time to come. But alas, I digress...
Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND to support a per-neighbor MTU? That would make backward-compatible adoption of jumbo frames possible. (Maybe retrofit ND into v4 while we're at it.)

Iljitsch van Beijnum
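For what the proposal might look like in practice: Neighbor Discovery today advertises a single MTU for the whole link, so the following is purely a hypothetical sketch of a neighbor cache extended with a per-neighbor MTU, where jumbo-capable hosts talk at 9000 among themselves while legacy hosts still get 1500:

```python
# Hypothetical per-neighbor MTU cache (not part of actual IPv6 ND, which
# carries one MTU option per link). Addresses and MACs are made up.
neighbor_cache = {
    "fe80::1": {"lladdr": "00:11:22:33:44:55", "mtu": 9000},  # jumbo-capable
    "fe80::2": {"lladdr": "00:aa:bb:cc:dd:ee", "mtu": 1500},  # legacy host
}

def egress_mtu(next_hop, link_default=1500):
    # Fall back to the conservative link default when nothing is known,
    # which is what makes the scheme backward-compatible.
    return neighbor_cache.get(next_hop, {}).get("mtu", link_default)

print(egress_mtu("fe80::1"))  # 9000: jumbo frames to this neighbor
print(egress_mtu("fe80::3"))  # 1500: unknown neighbor, play it safe
```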