
On Mon, 10 Mar 2003, Richard A Steenbergen wrote:
On the receive side, the socket buffers must be large enough to accommodate all the data received between application read()s.
That's not true. It's perfectly acceptable for TCP to stall when the receiving application fails to read the data fast enough.
Ok, I think I was unclear. You don't NEED to have buffers large enough to accommodate all the data received between application read()s, unless you are trying to achieve maximum performance. I thought that was the general framework we were all working under. :)
You got me there. :-) It seemed that you were talking about more general requirements at this point, though, what with the upper and lower limits for kernel buffer space and all.
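
To pin down what "large enough" means for maximum performance: the receive buffer has to cover the bandwidth-delay product. A minimal sketch below, with made-up bandwidth and RTT figures (and the kernel is still free to clamp the request to its configured per-socket maximum):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);

        /* window needed = bandwidth (bytes/sec) * round-trip time (sec) */
        long bandwidth = 1000 * 1000 * 1000 / 8;  /* 1 Gb/s in bytes/sec */
        double rtt = 0.070;                       /* 70 ms RTT */
        int bufsize = (int)(bandwidth * rtt);     /* ~8.75 MB */

        /* ask for that much receive buffer before connecting; the kernel
           may clamp this to its own per-socket maximum */
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
            perror("setsockopt SO_RCVBUF");

        printf("requested %d byte receive buffer\n", bufsize);
        return 0;
    }
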
Hm, I don't see this happening to a usable degree, as TCP has no concept of records. You really want to use fixed-size chunks of information here rather than pretending everything's a stream.
We're talking optimizations for high performance transfers... It can't always be a stream.
Right. But TCP is a stream protocol. This has many advantages, nearly all of which are irrelevant for high-volume, high-bandwidth bulk data transfer. I can imagine a system that only works in one direction, where the data is split into fixed-size records (each ideally fitting into a single packet) and each record is acknowledged independently (but certainly not with an ack for every individual packet).

I would also want to take advantage of traffic classification mechanisms: first the data is flooded at maximum speed in the lowest possible traffic class, and everything that doesn't make it to the other end is then resent more slowly in a higher traffic class. If the network supports priority queuing, this would effectively sponge up all free bandwidth without impacting regular interactive traffic. If after a few retries some data still didn't make it, simply skip it for now (but keep a record of the missing bits) and keep going. Many applications can live with some lost data, and for the others it's probably more efficient to keep running at high speed and repair the gaps afterwards.
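
Just to make that a bit more concrete (none of this is a real protocol; the header layout, record size, and DSCP values below are invented for illustration), the sending side of such a scheme might look something like this over a connected UDP socket:

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define RECORD_SIZE 8192          /* fixed-size record, fits one jumboframe */

    struct record_hdr {
        uint32_t seq;                 /* record number, acked independently */
        uint8_t  attempt;             /* 0 = first pass, 1+ = retransmission */
    };

    /* Send one record on a connected SOCK_DGRAM socket. Retransmissions
       get a higher traffic class, so the first pass only soaks up
       otherwise unused bandwidth. */
    static void send_record(int s, uint32_t seq, uint8_t attempt,
                            const char *data, size_t len)
    {
        /* low-priority class on the first pass, higher class on retries
           (example DSCP values only: CS1 vs. CS2, shifted into the TOS byte) */
        int tos = (attempt == 0) ? (8 << 2) : (16 << 2);
        setsockopt(s, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));

        char buf[sizeof(struct record_hdr) + RECORD_SIZE];
        struct record_hdr hdr = { .seq = htonl(seq), .attempt = attempt };

        if (len > RECORD_SIZE)
            len = RECORD_SIZE;
        memcpy(buf, &hdr, sizeof(hdr));
        memcpy(buf + sizeof(hdr), data, len);
        send(s, buf, sizeof(hdr) + len, 0);
    }

The receiver would acknowledge record numbers in batches and report the gaps after each pass; anything still missing after a few passes just gets logged and skipped, as described above.
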
IMHO the 1500 byte MTU of Ethernet will continue to prevent good end-to-end performance like this for a long time to come. But alas, I digress...
Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND to support a per-neighbor MTU? This should make backward-compatible adoption of jumboframes a possibility. (Maybe retrofit ND into v4 while we're at it.)
Not necessarily sure that's the right thing to do, but SOMETHING has got to be better than what passes for path MTU discovery now. :)
We can't replace path MTU discovery (but hopefully people will start to realize ICMP messages were invented for some reason other than job security for firewalls). What we need is a way for 10/100 Mbps, 1500-byte hosts to live with 1000 Mbps, 9000-byte hosts on the same subnet. I thought IPv6 neighbor discovery supported this, because ND can communicate the MTU between hosts on the same subnet, but unfortunately this is a subnet-wide MTU and not a per-host MTU, which is what we really need.

Iljitsch
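
P.S. For what it's worth, an application can at least read back the path MTU the kernel has discovered for a connected socket. A Linux-specific sketch (IP_MTU and IP_MTU_DISCOVER are Linux socket options, not portable, and the answer is only as good as the ICMP the kernel actually gets to see):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>   /* IP_MTU, IP_MTU_DISCOVER, IP_PMTUDISC_DO on Linux/glibc */

    /* s must be a connected socket; IP_MTU has nothing to report otherwise */
    int report_path_mtu(int s)
    {
        int do_pmtu = IP_PMTUDISC_DO;
        int mtu = 0;
        socklen_t len = sizeof(mtu);

        /* set DF on outgoing packets so routers send ICMP "fragmentation
           needed" instead of fragmenting -- the very messages firewalls
           love to drop */
        setsockopt(s, IPPROTO_IP, IP_MTU_DISCOVER, &do_pmtu, sizeof(do_pmtu));

        if (getsockopt(s, IPPROTO_IP, IP_MTU, &mtu, &len) < 0) {
            perror("getsockopt IP_MTU");
            return -1;
        }
        printf("kernel's current path MTU estimate: %d bytes\n", mtu);
        return mtu;
    }
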