Re: PMTU-D: remember, your load balancer is broken
On Wed, 14 Jun 2000 Valdis.Kletnieks@vt.edu wrote:
On Tue, 13 Jun 2000 22:36:08 MDT, Marc Slemko said:
Except that, technically, you are not permitted to just blindly send segments of such size. Well, you can, but systems in the middle don't have to handle them. No?
Hmm.. either I did a bad job of explaining or I haven't had enough caffeine to parse what you said. Given that you also suggest going to a 1460 MSS, I suspect that we're actually violently in agreement here.
Now if I can remember why I chose 1396 for a default MSS.... ;)
It is also a concern that, in my experience, many of the links with MTUs <1500 are also the links with greater packet loss, etc. so you really don't want fragmentation on them.
The worst part here is that I suspect that most of these links (just on sheer numbers of shipped product) are the aforementioned Win98 576-MTU.
However, in this case, the fragmentation happens in a terminal server on the last hop, and hopefully the case of a terminal server running out of queueing buffers and having to drop one of the 2 remaining fragments of a 1500->576 split after sending the first one is pretty rare....
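For readers who want to check the arithmetic behind the 1500->576 split mentioned above, here is a minimal sketch. It assumes a plain 20-byte IPv4 header with no options, and `fragment_sizes` is a hypothetical helper written for illustration, not code from any real stack.

```python
def fragment_sizes(total_len, mtu, ip_header=20):
    """Return the total length (header + data) of each IPv4 fragment.

    Assumption: every fragment carries a fixed 20-byte header (no IP options).
    """
    if total_len <= mtu:
        return [total_len]  # fits as-is, no fragmentation needed

    payload = total_len - ip_header
    # All fragments except the last must carry a multiple of 8 data bytes,
    # because the fragment offset field counts in 8-byte units.
    max_data = (mtu - ip_header) // 8 * 8

    sizes = []
    while payload > 0:
        data = min(payload, max_data)
        sizes.append(data + ip_header)
        payload -= data
    return sizes


print(fragment_sizes(1500, 576))  # [572, 572, 396]
```

Under these assumptions a full-size 1500-byte packet becomes three fragments, which is why losing any one of them on a lossy last hop costs the whole packet.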
I seem to remember that the *original* motivation for slow-start and all that was Van Jacobson's observation that the most common cause of a TCP retransmit was that an *entire* packet had been silently dropped due to queueing congestion, and could thus be treated identically to an ICMP Source Quench.
Has this changed? Has "fragmentation" become a Great Evil, rather than an annoyance that some links have to deal with?
Anything not in the fast-path (fragmentation, IP options, etc.) is a scourge to all that is good and righteous about networking. In other words, if it isn't an everyday occurrence, people seem to forget that it needs to be cared about, checked for "issues", DoS potential, etc. (I just heard about a DoS potential against a popular Unix stack because of a lack of bounds checking in the IP options this evening, in fact.) I'm hoping IPv6 will fix some of that for IP options, since they're a bit more usable and a bit more important, but I doubt anything will change the mentality.

Fortunately it seems that the backbone links are all running larger MTUs than the hosts (PoS, FDDI, jumbo frame support for GigE, even if it isn't standard). As long as it's the hosts shrinking the MTU and not the network in between, things are better; it's just less than "optimal" throughput. Simple WFQ would be of more use to those poor bastard souls on dialup, though.

If you are a provider running the tunnel (not on an end host where MSS can be set), you would do well to keep the available MTU post-tunnel >= 1500 to keep everyone happy, if at all possible.

Just a quick 5:30AM thought, but it seems like a better solution would be to have the hosts signal that fragmentation was encountered on the ACK, so if ICMP discovery is not possible the packets are not lost with no idea why (it can go right next to the ECN bit :P).

--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
[ On Friday, June 16, 2000 at 05:38:25 (-0400), Richard A. Steenbergen wrote: ]
If you are a provider running the tunnel (not on an end host where MSS can be set), you would do well to keep the available MTU post-tunnel >= 1500 to keep everyone happy, if at all possible.
Sometimes it's just not that simple. You really do have to make room for the encapsulation header. If the MTU of the tunnel transport is only 1500 bytes then the only way to have an MTU of 1500 bytes on the tunnel as well is to employ forced fragmentation in the tunneling protocol (and perhaps windowing and maybe even retransmission, since this level of fragmentation cannot tolerate loss or out-of-order reassembly). Unfortunately there are all kinds of ugly things that can happen when you run TCP inside another windowing protocol. (For further info on that latter issue see "Why TCP Over TCP Is A Bad Idea" at <URL:http://sites.inka.de/sites/bigred/devel/tcp-tcp.html>.)
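To make the "room for the encapsulation header" point concrete, here is a small sketch of the arithmetic. The 24-byte overhead is only an assumed example (an outer IPv4 header of 20 bytes plus a basic 4-byte GRE header); other tunneling protocols cost more or less. The helper names are hypothetical.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options


def tunnel_mtu(transport_mtu, encap_overhead):
    """MTU left for the inner packet once the tunnel header is added."""
    return transport_mtu - encap_overhead


def mss_for_mtu(mtu):
    """Largest TCP segment that fits in one unfragmented packet of this MTU."""
    return mtu - IP_HEADER - TCP_HEADER


# Assumed overhead for illustration: outer IPv4 (20) + basic GRE (4).
overhead = 24

inner = tunnel_mtu(1500, overhead)
print(inner)               # 1476: inner MTU over a 1500-byte transport
print(mss_for_mtu(inner))  # 1436: MSS that avoids fragmentation in the tunnel
print(mss_for_mtu(1500))   # 1460: MSS on a plain 1500-byte path
print(1500 + overhead)     # 1524: transport MTU needed to offer 1500 inside
```

With these assumed numbers, a provider can only offer a 1500-byte MTU inside the tunnel if the transport carries at least 1524 bytes; otherwise something has to fragment or the end hosts have to use a smaller MSS.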
Just a quick 5:30AM thought, but it seems like a better solution would be to have the hosts signal that fragmentation was encountered on the ACK, so if ICMP discovery is not possible the packets are not lost with no idea why (it can go right next to the ECN bit :P).
Back when I was running with only PPP and a 1024-byte MTU I racked my brain to try and think of something better than PMTU-D. I did come up with a proposal that would allow a PPP router to do a legal end-run around a broken PMTU-D link. I haven't tried to implement it yet.

The thing that really makes PMTU-D an ugly hack is that it in effect overloads the meaning of the DF flag. If there were only some room in the IP header for a new flag like "don't fragment unless you really have to...."

I am somewhat worried about the vulnerabilities of PMTU-D too now that I've thought about the possibilities.

--
Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
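The failure mode running through this thread, PMTU-D stalling when the ICMP "fragmentation needed" message never reaches the sender, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the path is a list of link MTUs, ICMP filtering is modelled per hop, and `discover_path_mtu` is a hypothetical function, not any real stack's algorithm.

```python
def discover_path_mtu(link_mtus, icmp_filtered=(), start_mtu=1500):
    """Simulate a sender probing a path with the DF flag set.

    link_mtus     -- MTU of each link along the path, in order
    icmp_filtered -- indices of hops whose ICMP replies never reach the sender
    Returns the discovered path MTU, or None if the sender is black-holed.
    """
    size = start_mtu
    while True:
        for hop, mtu in enumerate(link_mtus):
            if size > mtu:
                # DF is set, so the router drops the packet and reports
                # the next-hop MTU in an ICMP "fragmentation needed" message.
                if hop in icmp_filtered:
                    return None  # sender hears nothing and keeps retransmitting
                size = mtu       # sender lowers its estimate and retries
                break
        else:
            return size          # packet crossed every link intact


print(discover_path_mtu([1500, 1476, 1500]))                     # 1476
print(discover_path_mtu([1500, 1476, 576]))                      # 576
print(discover_path_mtu([1500, 1476, 1500], icmp_filtered={1}))  # None
```

The last call is the broken case: the tunnel hop would have told the sender to shrink to 1476, but with its ICMP filtered the sender never learns why its full-size packets vanish.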