On Tue, Mar 11, 2003 at 12:41:15AM +0100, Iljitsch van Beijnum wrote:
On the receive side, the socket buffers must be large enough to accommodate all the data received between application read()'s,
That's not true. It's perfectly acceptable for TCP to stall when the receiving application fails to read the data fast enough. (TCP then simply announces a window of 0 to the other side so the communication effectively stops until the application reads some data and a >0 window is announced.) If not, the kernel would be required to buffer unlimited amounts of data in the event an application fails to read it from the buffer for some time (which is a very common situation).
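You can watch this behavior on a real stack with the Python sketch below, which was added here and is not from the thread. It connects two TCP sockets over loopback. The receiver never calls recv(), and a non-blocking sender keeps writing. The kernel cuts the sender off after a finite amount of data instead of buffering forever. Once the receiver reads, the window reopens and writes succeed again. Exact byte counts depend on the OS and its buffer autotuning, so none are assumed. The helper names are made up for the example.

```python
from __future__ import annotations

import select
import socket


def push_until_blocked(sock: socket.socket, chunk: int = 65536) -> int:
    """Write to a non-blocking socket until the kernel refuses more data."""
    payload = b"x" * chunk
    total = 0
    while True:
        try:
            total += sock.send(payload)
        except BlockingIOError:
            # The peer's receive buffer is full (it advertises a zero window)
            # and our own send buffer has filled up behind it.
            return total


def zero_window_demo() -> tuple[int, int]:
    """Return (bytes accepted before the stall, bytes accepted after a read)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        sender = socket.create_connection(listener.getsockname())
        receiver, _ = listener.accept()
        with sender, receiver:
            sender.setblocking(False)
            queued = push_until_blocked(sender)  # receiver has read nothing yet

            # The application finally reads everything that was queued,
            # so the receiving kernel can announce a >0 window again.
            receiver.settimeout(5.0)
            drained = 0
            while drained < queued:
                data = receiver.recv(1 << 16)
                if not data:
                    break
                drained += len(data)

            # Wait until the sender's buffer has room, then write again.
            select.select([], [sender], [], 5.0)
            return queued, push_until_blocked(sender)


if __name__ == "__main__":
    before, after = zero_window_demo()
    print(f"stalled after {before} bytes; {after} more accepted once the reader caught up")
```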
Ok, I think I was unclear. You don't NEED to have buffers large enough to accommodate all that data received between application read()'s, unless you are trying to achieve maximum performance. I thought that was the general framework we were all working under. :)
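For the maximum-performance case, the usual rule of thumb is that the receive buffer must hold at least one bandwidth-delay product (link rate times round-trip time). The sketch below assumes you know both numbers up front, and the function names are illustrative. Kernels may cap the request. On Linux the cap is net.core.rmem_max, and getsockopt reports back double the requested value. Windows above 64 KB only work if RFC 1323 window scaling is negotiated, which is why the option is set before connect().

```python
import socket


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: data in flight needed to fill the pipe."""
    return bandwidth_bps * rtt_ms // 8000  # bits/s * ms -> bytes


def socket_with_bdp_rcvbuf(bandwidth_bps: int, rtt_ms: int) -> socket.socket:
    """Create a TCP socket whose receive buffer is sized to one BDP.

    Call this before connect()/listen(): the window scale factor is chosen
    when the SYN is sent. The kernel may silently cap (or, on some systems,
    reject) the requested size.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes(bandwidth_bps, rtt_ms))
    return s
```

For example, a 1 Gbit/s path with a 100 ms RTT gives bdp_bytes(1_000_000_000, 100) == 12_500_000. That is about 12.5 MB of buffer, far above the 64 KB an unscaled TCP window can advertise.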
locally. Jumbo frames help too, but their real benefit is not the simplistic "hey look, there's 1/3rd the number of frames/sec" view that many people see. The good stuff comes from techniques like page flipping, where the NIC DMAs data into a memory page which can be flipped through the system straight to the application, without copying it along the way. Some day TCP may just be implemented on the NIC itself, with ALL work offloaded, and the system doing nothing but receiving nice page-sized chunks of data at high rates of speed.
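For reference, here is the "fewer frames per second" arithmetic. The sketch assumes standard Ethernet framing overhead: a 14-byte header, 4-byte FCS, 8-byte preamble/SFD and 12-byte inter-frame gap. It also assumes a 9000-byte jumbo MTU. That size is a common choice rather than a standard, and the reduction you get depends entirely on which jumbo size both ends agree on.

```python
# Bytes each frame costs on the wire beyond its MTU-sized payload:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12


def frames_per_second(line_rate_bps: int, mtu: int) -> float:
    """Maximum frames/sec at line rate when every frame carries `mtu` bytes."""
    return line_rate_bps / (8 * (mtu + ETH_WIRE_OVERHEAD))


if __name__ == "__main__":
    for mtu in (1500, 9000):  # 9000 is an assumed jumbo size
        print(f"MTU {mtu}: {frames_per_second(10**9, mtu):,.0f} frames/s at 1 Gbit/s")
```

At 1 Gbit/s this gives roughly 81,000 frames/s for a 1500-byte MTU versus roughly 13,800 for 9000 bytes. That is the per-frame (interrupt-count) view. It says nothing about data copies, which is what page flipping addresses.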
Hm, I don't see this happening to a usable degree as TCP has no concept of records. You really want to use fixed size chunks of information here rather than pretending everything's a stream.
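On the application side, sending fixed-size chunks over a TCP byte stream means the reader has to reassemble each record itself, because recv() may return any number of bytes. The sketch below is minimal. The 4096-byte record size (one typical page) and the function name are assumptions for illustration. recv_into() fills a preallocated buffer, which avoids one intermediate copy per read. It is still nowhere near the zero-copy page flipping discussed above.

```python
import socket

RECORD_SIZE = 4096  # hypothetical: one page per record


def recv_record(sock: socket.socket, size: int = RECORD_SIZE) -> bytearray:
    """Read exactly `size` bytes from a stream socket, however TCP splits them."""
    buf = bytearray(size)
    view = memoryview(buf)
    got = 0
    while got < size:
        n = sock.recv_into(view[got:], size - got)
        if n == 0:  # peer closed the connection mid-record
            raise EOFError(f"stream closed after {got} of {size} bytes")
        got += n
    return buf
```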
We're talking optimizations for high performance transfers... It can't always be a stream.
IMHO the 1500 byte MTU of ethernet will still continue to prevent good end to end performance like this for a long time to come. But alas, I digress...
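The well-known steady-state TCP throughput approximation of Mathis, Semke, Mahdavi and Ott (1997) puts numbers on the end-to-end argument:

rate ≈ (MSS / RTT) · C / √p, with C ≈ √(3/2) for periodic loss.

This is a model of Reno-style congestion avoidance, not something from this thread. It shows that, for a fixed RTT and loss rate, throughput scales linearly with segment size. The MSS values assume 40 bytes of IPv4+TCP headers without options, and the 9000-byte MTU is an assumption.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate steady-state TCP throughput (bits/s) per Mathis et al."""
    return 8 * mss_bytes / rtt_s * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        mss = mtu - 40  # IPv4 + TCP headers, no options
        bps = mathis_throughput_bps(mss, rtt_s=0.1, loss_rate=1e-6)
        print(f"MTU {mtu}: ~{bps / 1e6:.0f} Mbit/s at 100 ms RTT, 1e-6 loss")
```

With those inputs the model gives roughly 143 Mbit/s for a 1500-byte MTU and roughly 878 Mbit/s for 9000 bytes. That is the same 8960/1460 ratio as the segment sizes.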
Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND to support a per-neighbor MTU? This should make backward-compatible adoption of jumbo frames a possibility. (Maybe retrofit ND into v4 while we're at it.)
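The sketch below is a rough illustration of what a per-neighbor MTU could mean for the sending host. Everything in it is hypothetical. Neighbor Discovery today only carries a link-wide MTU option in Router Advertisements, so the per-neighbor announcement, the class and the method names are invented for illustration. The key property for backward compatibility is the default: a neighbor that never announces anything is assumed to handle only the standard 1500 bytes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ETHERNET_DEFAULT_MTU = 1500  # what every legacy neighbor is assumed to handle


@dataclass
class NeighborCache:
    link_mtu: int  # what our own interface supports, e.g. 9000
    advertised: dict[str, int] = field(default_factory=dict)

    def learn(self, neighbor: str, mtu: int) -> None:
        """Record an MTU a neighbor announced (hypothetical per-neighbor ND option)."""
        self.advertised[neighbor] = mtu

    def mtu_for(self, neighbor: str) -> int:
        # Neighbors that never announced anything get the standard Ethernet MTU,
        # so jumbo-capable and legacy hosts can share one segment.
        return min(self.link_mtu, self.advertised.get(neighbor, ETHERNET_DEFAULT_MTU))
```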
Not necessarily sure that's the right thing to do, but SOMETHING has got to be better than what passes for path MTU discovery now. :)

--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)