On Wed, 4 Feb 1998, Phil Howard wrote:
Marc Slemko fills my screen with:
The reason many customers cite for this change is that many ISPs have the MTU set to 576 "inside" their networks, so packets get fragmented.
I really don't buy that. Many or most backbone links have an MTU above 1500, and outside of low-speed dialup connections, MTUs below 1500 do exist, but they aren't that common.
My understanding of why a lower MTU is demonstrably better under Win95 is that the Win95 TCP stack is broken, and a lower MTU is a good workaround. Most of the people raving about it say they are getting 2-4x speed increases from changing their MTU from 1500 to 576. Something is very wrong there. I thought I had heard details about exactly what is broken in the Win95 TCP stack that causes this problem, but can't recall them at the moment. It could have no basis in reality and just be a rumour.
I have my MTU set to 552 and it helps quite a bit. It's not an issue of the Win95 stack being broken. I'm running Linux.
The reason I chose to use a low MTU, and sometimes I knock it down even further, is to be able to improve my telnet interactive response over a 33.6k link that I also run as many as 4 concurrent downloads on.
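To put rough numbers on that (a back-of-the-envelope sketch; the 4-deep queue is just illustrative):

    # Serialization delay on a 33.6k modem link; real modems add
    # compression and framing, so treat these as ballpark figures.
    LINK_BPS = 33600

    def serialization_ms(packet_bytes):
        """Milliseconds to clock one packet onto the wire."""
        return packet_bytes * 8.0 / LINK_BPS * 1000

    for mtu in (1500, 552, 296):
        # A telnet keystroke queued behind one full-sized packet per
        # concurrent download (4 of them here) waits roughly:
        print("MTU %4d: one packet = %6.1f ms, keystroke behind 4 = %6.1f ms"
              % (mtu, serialization_ms(mtu), 4 * serialization_ms(mtu)))

At 1500 bytes a single packet ties up the wire for ~357 ms, so a keystroke stuck behind four of them waits well over a second; at 552 bytes the wait drops to about half a second.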
Interactive use is a different issue. A lot of people are claiming that lowering their MTU in Win95 improves their speeds by several times when downloading over a single TCP connection. There appears to be something else going on with Win95 users who have trouble with this.
Here's what I suspect is happening:
With web surfing, a page loads with many images, each of which is often larger than a sliding window's worth of packets. The browser connects and issues a request for every image nearly concurrently, so for N images you get N sliding windows' worth of packets slammed at you. That takes up a _lot_ of buffer space in the dialup routers, with all these concurrent TCP connections sending data at the same time over a high-speed net to a low-speed final link.
I am doubtful. TCP's congestion control should deal with this reasonably. Interactive performance can certainly be helped with a lower MTU, but that is really another issue.
With this happening, buffer space is exhausted and packets are discarded.
If you set the MTU smaller, then the size of all those packets is smaller and the chance of being discarded due to memory exhaustion is reduced, even if you're the only one on that server with small packets.
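Rough arithmetic behind that claim (every figure below is made up for illustration; nothing was measured on a real terminal server):

    # Worst case data "in flight" toward one dialup port when a page
    # fetches N images in parallel, vs a hypothetical router buffer.
    N_CONNECTIONS = 8          # images loading in parallel (illustrative)
    WINDOW_BYTES = 8 * 1024    # a typical-for-the-era TCP window
    ROUTER_BUFFER = 32 * 1024  # invented per-port buffer figure

    in_flight = N_CONNECTIONS * WINDOW_BYTES
    print("up to %d bytes in flight vs %d bytes of buffer"
          % (in_flight, ROUTER_BUFFER))

    # The same window spread over smaller packets means each enqueued
    # packet needs less contiguous buffer and each drop wastes less:
    for mtu in (1500, 576):
        mss = mtu - 40  # 20 bytes IP + 20 bytes TCP header
        packets = -(-WINDOW_BYTES // mss)  # ceiling division
        print("MTU %d: one window = %d packets of <= %d data bytes"
              % (mtu, packets, mss))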
Anything is possible. I am doubtful of that being the case. It is more likely that servers are retransmitting too quickly for the situation and causing excess traffic. This would be helped by a smaller MTU, yes.
There are all sorts of people spouting all sorts of lies around Windows newsgroups about why small MTUs are good; I think novice users are simply getting drawn in by supposed experts.
Such as?
That you need to lower the MTU because the modem will fragment any packet that is too big, since the modem can only handle packets of a certain size.
I assert that small MTUs are good because in real-life practice they actually do improve effective responsiveness and reduce the lag-inducing loss of packets.
Of course, if something is broken, it needs fixing.
Like maybe a lack of buffer space in the terminal server?
I doubt it. Personally, aside from a gain for interactive work over a loaded low bandwidth connection, especially with some sort of priority queueing used, I find no gain from a smaller MTU on a dialup link.
I guess systems receiving data from servers with broken retransmission timers (e.g. how Solaris used to be) could be helped by lowering the MTU, since it would result in faster ACKs and bogus retransmissions wouldn't happen all the time; but the real fix for that isn't to lower the MTU.
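A sketch of the standard Jacobson/Karels retransmit-timer arithmetic, with invented link numbers, to show why a smaller MTU can mask a broken timer without fixing it:

    # Jacobson/Karels: smooth srtt and rttvar from RTT samples, then
    # RTO = srtt + 4 * rttvar.
    def rto(samples):
        srtt = rttvar = None
        for r in samples:
            if srtt is None:
                srtt, rttvar = r, r / 2.0
            else:
                rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)
                srtt = 0.875 * srtt + 0.125 * r
        return srtt + 4 * rttvar

    LINK_BPS = 33600
    for mtu in (1500, 576):
        ser_ms = mtu * 8.0 / LINK_BPS * 1000  # time to clock the packet out
        samples = [ser_ms + 70] * 5           # plus 70 ms of invented path delay
        print("MTU %4d: ACK returns after ~%3.0f ms; a sane RTO sits near %3.0f ms"
              % (mtu, samples[0], rto(samples)))
    # A server whose timer is stuck near, say, 250 ms fires early against
    # 1500-byte RTTs (~430 ms) but mostly not against 576-byte ones
    # (~210 ms) -- the small MTU hides the bug rather than fixing it.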
Perhaps an initial transmission pace on the first few packets sent over a connection would help, too. There is virtually no value in sending packets 1-7 immediately behind packet 0. The problem is that there isn't any definition of what the initial pace should be. If I were to redesign TCP, I would probably make the sliding window small initially and gradually widen it as the turnaround time becomes apparent, then pace the packets to match what the turnaround time looks like.
Look at TCP slow start.
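Which already does essentially what is described above; in outline (the ssthresh value here is arbitrary):

    # TCP slow start: cwnd opens at 1 MSS and grows by 1 MSS per ACK,
    # i.e. it doubles each round trip, until it hits ssthresh; after
    # that, congestion avoidance grows it linearly.
    cwnd, ssthresh = 1, 16  # in MSS units; ssthresh arbitrary for the demo
    for rtt in range(1, 9):
        print("RTT %d: cwnd = %2d MSS" % (rtt, cwnd))
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1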
You also get the obvious improvements in interactive performance, and you start getting data more quickly.
I would suggest that you would be well advised to find a handy user or four where this effect is easily observable, and find out what is really going on.
I suspect the buffer overflow situation. At least it should be looked at.
I recall a few months ago a provider I was connected to was having some very horrible performance. My modem connection wasn't blocking at all, so it was letting everything through. I started a ping from a machine out on the net back to me and noticed most packets were getting through OK, yet my TCP connections were horrible. Well, ping defaults to 64 bytes, so I increased it to 1472, and virtually none of those ever got here. I tried a number of different sizes and found erratic performance, but on average the loss was proportional to the packet size, up to around 1k where there was near-total loss. I dropped my MTU down to 64(!) and reconnected the telnet session. Now I was getting through. The response had a very obvious "small-chunk" feel to it, but it did let me through.
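That kind of sweep is easy to automate, by the way; something like this (assumes a Unix ping that takes -c and -s; the host is a placeholder):

    # Measure packet loss as a function of ICMP payload size.
    import re
    import subprocess

    HOST = "203.0.113.1"  # placeholder: a host on the far side of the link

    for size in (56, 256, 512, 1024, 1472):
        out = subprocess.run(["ping", "-c", "20", "-s", str(size), HOST],
                             capture_output=True, text=True).stdout
        m = re.search(r"([\d.]+)% packet loss", out)
        print("%5d byte payload: %s%% loss" % (size, m.group(1) if m else "??"))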
This is obviously a pathological case where something else is broken and I'm really not sure it carries over. [...]
Some ISPs may in fact be setting the MTUs on their routers down to 576 or some other number just to force path MTU discovery to kick in and chop the size of the packets. While I might consider doing this myself, I hesitate because MTU discovery is often broken for a number of reasons. Some sites are (stupidly) blocking all ICMP, and if they also have DF set, the connection is hosed. In other cases, like the early version of the Cisco Local Director, the ICMP fragmentation-needed message did not get back to the machine that held the connection (presumably because LD's tables were port-indexed and it didn't dig the port numbers out of the packet quoted inside the ICMP error), so again MTU discovery was broken and, if DF was set, the connection was hosed. I saw this because many web servers apparently push the HTTP header part of the response ahead, and it was smaller than my link MTU at the time, so it came through OK; but the data that followed filled the packets and never made it. I saw the DF flag on the packet with the HTTP headers, and that's how I figured out what was going on.
Yeah, because if they don't push the headers ahead, then dumb clients like Navigator screw up sometimes. :( It can't even properly handle reads where the boundary of the read data falls in the wrong place. I hate dumb workarounds for broken clients. If you ever see an "X-Pad: avoid browser bug" header sent from Apache, that's why: Apache tries to keep responses in as few packets as possible.
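Incidentally, you can watch what path MTU discovery has concluded from the endpoint; a Linux-specific sketch (the option values are from <linux/in.h> in case the socket module doesn't export them, and the target address is a placeholder):

    # Send UDP datagrams with DF set; once the kernel learns a smaller
    # path MTU (via ICMP fragmentation-needed), send() fails with
    # EMSGSIZE and IP_MTU reports the discovered value.
    import socket

    IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
    IP_PMTUDISC_DO = 2   # always set DF, never fragment locally
    IP_MTU = getattr(socket, "IP_MTU", 14)

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.connect(("203.0.113.1", 9))  # placeholder target, discard port

    for size in (1472, 1200, 1024, 548):
        try:
            s.send(b"x" * size)
            print("%d-byte datagram sent with DF" % size)
        except OSError as e:
            print("%d bytes refused (%s); kernel path MTU = %d"
                  % (size, e, s.getsockopt(socket.IPPROTO_IP, IP_MTU)))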
So turning down the MTU in the router at the ISP can be a problem and should not be done, but turning down the MTU on the end machine will work, whether it is ultimately the correct solution or not.
Users also perceive a better response if all the images load in parallel rather than one at a time, even if the MTU setting that accomplishes this smoothly means a longer total time for a complete load of all the images.
Maybe what we need is a multiplex-HTTP with a server smart enough to send the images over separate subchannels without the browser needing to request them.
HTTP-NG. I would say more, but I don't know more, because the W3C is a PITA when it comes to individuals. Heck, even as things stand now, clients would often come out ahead using a pipelined persistent connection rather than a bunch of separate connections.
Lowering the MTU may appear to fix a lot of things, but it is really a bad thing for performance and network overhead. I would strongly recommend that Microsoft look at why people are complaining, and exactly what they are complaining about, before changing any defaults. There are lots of possible reasons for this behaviour, and while no one reason applies in all cases, I suspect one does apply in many.
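Back to the pipelining point for a second, here is a sketch of what a smarter client could do today (host and paths are placeholders; a real client needs proper response parsing):

    # Two pipelined HTTP/1.1 requests on one connection: one TCP
    # handshake, one slow start, instead of one of each per image.
    import socket

    HOST = "www.example.com"      # placeholder server
    paths = ["/", "/image1.gif"]  # placeholder resources

    reqs = b""
    for p in paths:
        conn = "close" if p == paths[-1] else "keep-alive"
        reqs += ("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: %s\r\n\r\n"
                 % (p, HOST, conn)).encode()

    s = socket.create_connection((HOST, 80))
    s.sendall(reqs)               # both requests in a single write
    data = b""
    while True:                   # read until the server closes
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk
    s.close()
    print("got %d bytes for %d responses on one connection"
          % (len(data), len(paths)))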