Re: PMTU-D: remember, your load balancer is broken
In message <200006140333.e5E3XmL28888@black-ice.cc.vt.edu>, Valdis.Kletnieks@vt.edu writes:
b) If you're a webserver or something else providing service Out There to random users, just nail the MTU at 1500, which will work for any Ethernet/PPP/SLIP out there. And if you're load balancing to geographically disparate servers, then your users are probably Out There, with an MTU almost guaranteed to be 1500.
I assert that the chances of PMTU-D helping are in direct ratio to the number of end users who have connections with MTU>1500 - it's almost a sure thing that you won't have users with an MTU on their last-hop that's bigger than their campus backbone and/or Internet connection's MTU.
Is anybody seeing any documentable wins by using PMTU-D?
There are two places where it's very important. First, some server farms are on FDDI rings, so they have a higher MTU. Second -- and this one is growing in importance -- tunnels, for IPsec, PPTP, etc. -- generally have smaller MTUs. This very reply will travel over a tunnel with an MTU of, I believe, 1480. --Steve Bellovin
On Tue, 13 Jun 2000 23:50:55 EDT, "Steven M. Bellovin" said:
There are two places where it's very important. First, some server farms are on FDDI rings, so they have a higher MTU. Second -- and this
Yes, but I think I covered that in (a) - if you know there's a bigger MTU, nail it down. I dread to think of a load-balancer in the middle of a FDDI ring in a server farm ;)
one is growing in importance -- tunnels, for IPsec, PPTP, etc. -- generally have smaller MTUs. This very reply will travel over a tunnel with an MTU of, I believe, 1480.
Good point. It's been a long day, I wasn't QUITE thinking straight. Another respondent commented that Windows98 apparently nails an MTU of 576 on a dialup - Apparently I've not run into any Windows98 people setting their clocks off the server I got the numbers from. Also, he said that ADSL uses just under 1500. I don't have a Win98 or ADSL handy to check. ;) In any case, it's good fodder for an operational debate. ;) Valdis Kletnieks Operating Systems Analyst Virginia Tech
----- Original Message ----- From: <Valdis.Kletnieks@vt.edu>
Good point. It's been a long day, I wasn't QUITE thinking straight. Another respondent commented that Windows98 apparently nails an MTU of 576 on a dialup - Apparently I've not run into any Windows98 people setting their clocks off the server I got the numbers from. Also, he said that ADSL uses just under 1500. I don't have a Win98 or ADSL handy to check. ;)
Small MTUs at the ends don't matter. If I dial up with a Windows 98 machine and negotiate an MTU of 576 bytes, the MSS will be set accordingly in the TCP SYN and SYN ACK frames that I send, and the far end will start with 576 byte frames. No PMTU Discovery required. Same thing with ADSL or end-user VPN stuff. PMTU Discovery is important when you have larger MTUs on the ends and small MTUs in the middle. For example, a tunnel (VPN or otherwise) between two routers or VPN servers, for a WAN link with a small MTU, or ... It's a real problem, and the Load Balancer manufacturers need to handle the ICMPs properly. But it's not so bad that everyone with a 576 byte Windows 98 PPP dial-up would be unable to reach Load Balanced sites. (Arguably, it would be better if it were a problem for such users, because that would guarantee that the problem would get fixed quickly ...) -- Brett
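The ends-versus-middle distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for the example, headers are taken as 20 bytes of IPv4 plus 20 bytes of TCP with no options, and the "576 byte frames" above are read as a 576-byte MTU, which gives a 536-byte MSS.

```python
from typing import Sequence

IP_TCP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def advertised_mss(link_mtu: int) -> int:
    """MSS a host would announce in its SYN for a given local link MTU."""
    return link_mtu - IP_TCP_HEADERS


def negotiated_segment(client_mtu: int, server_mtu: int) -> int:
    """Neither side sends segments larger than the smaller announced MSS."""
    return min(advertised_mss(client_mtu), advertised_mss(server_mtu))


def needs_pmtud(client_mtu: int, server_mtu: int, middle_mtus: Sequence[int]) -> bool:
    """True when some hop in the middle is smaller than the packets the ends agreed on."""
    packet = negotiated_segment(client_mtu, server_mtu) + IP_TCP_HEADERS
    return any(mtu < packet for mtu in middle_mtus)


# Windows 98 dial-up (MTU 576) talking to an Ethernet server (MTU 1500):
print(negotiated_segment(576, 1500))          # 536
print(needs_pmtud(576, 1500, [1500, 1500]))   # False: the small link is at the end
# Two Ethernet hosts with a 1480-byte tunnel in between:
print(needs_pmtud(1500, 1500, [1500, 1480]))  # True: the MSS exchange cannot see the tunnel
```

The MSS option only carries what each end knows about its own link, which is why a small MTU in the middle needs either PMTU-D or fragmentation.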
[ On Wednesday, June 14, 2000 at 07:21:54 (-0500), Brett Frankenberger wrote: ]
Subject: Re: PMTU-D: remember, your load balancer is broken
PMTU Discovery is important when you have larger MTUs on the ends and small MTUs in the middle. For example, a tunnel (VPN or otherwise) between two routers or VPN servers, for a WAN link with a small MTU, or ...
I think that should read: "PMTU Discovery is important when you have larger MTUs on either end...." Almost all of my systems, until recently, were advertising an MSS default of 512, and I've had either a PPP connection with an MTU of about 1024, (I forget exactly what it was), or more recently a GRE tunnel with an MTU of 1460. Back when my router was PPP connected I had enormous problems with SunOS-4.1.x, and only slightly fewer problems with NetBSD. Since discovering that servers with an MSS default of 512 bytes cannot possibly ever deliver good TCP throughput to local high-speed customers (eg. on a cable or DSL plant), I've also been hard-coding a TCP MSS default of 1460 on most systems I control (though on cable modem squid servers, etc., it could probably safely be raised to 1500, but of course on my GRE tunnel this is the maximum I can use without fragmentation).
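As a rough illustration of why a 512-byte default MSS hurts bulk transfers, the sketch below counts packets and header share for a 1,000,000-byte transfer. It assumes 40 bytes of IPv4 plus TCP headers per packet and ignores windowing, ACK clocking and per-packet processing cost, all of which probably matter more in practice than the header share alone. The function names are invented for the example.

```python
import math

HEADERS = 40  # assumed: IPv4 (20) + TCP (20), no options


def packets_needed(transfer_bytes: int, mss: int) -> int:
    """Number of full or partial segments needed to carry the transfer."""
    return math.ceil(transfer_bytes / mss)


def wire_efficiency(mss: int) -> float:
    """Fraction of each full-sized packet that is payload rather than header."""
    return mss / (mss + HEADERS)


for mss in (512, 1460):
    print(mss, packets_needed(1_000_000, mss), round(wire_efficiency(mss), 3))
# 512 1954 0.928
# 1460 685 0.973
```

Nearly three times as many packets have to be generated, routed and acknowledged at MSS 512 for the same amount of data.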
It's a real problem, and the Load Balancer manufacturers need to handle the ICMPs properly.
You're damn right it is! In fact I think I'm having this very problem with segue.merit.edu [198.108.1.41] trying to deliver some NANOG messages to my server ever since yesterday or the day before! (Another server at theplanet.co.uk is definitely giving me these headaches -- I still have to capture a failed connection from segue.merit.edu to prove the latter though....) The system in question still has an MSS default of 512. I've not yet... I'm not exactly a TCP guru, but I'm guessing that nothing will improve even if I increase it to 1460.... Maybe I'll try this anyway because in the mean time those damn mailers are clogging mine with zillions of stagnant connections and are preventing any other mailers from delivering.... Personally I think it should be required that an admin jump through multiple burning hoops and then prove he or she can stop a charging locomotive and leap tall buildings before they are allowed to turn on Path-MTU-discovery. Any OS vendor that ships with it on by default should be put in stocks in the town centre so they can be publicly humiliated! -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
[ On Thursday, June 15, 2000 at 10:15:22 (-0400), Greg A. Woods wrote: ]
Subject: Re: PMTU-D: remember, your load balancer is broken
Since discovering that servers with an MSS default of 512 bytes cannot possibly ever deliver good TCP throughput to local high-speed customers (eg. on a cable or DSL plant), I've also been hard-coding a TCP MSS default of 1460 on most systems I control (though on cable modem squid servers, etc., it could probably safely be raised to 1500, but of course on my GRE tunnel this is the maximum I can use without fragmentation).
No, silly me -- it has to be lowered to 1410 on my GRE tunnel when the tunnel MTU is 1450... I *still* keep getting the MSS and MTU confused. I do like the way some folks have been saying 1460+40 to express the MTU as that does eliminate some of the confusion by stating the obvious....
In fact I think I'm having this very problem with segue.merit.edu [198.108.1.41] trying to deliver some NANOG messages to my server ever since yesterday or the day before! (Another server at theplanet.co.uk is definitely giving me these headaches -- I still have to capture a failed connection from segue.merit.edu to prove the latter though....)
A whole bunch of tcpdump'ing on my upstream router later I was finally able to duplicate the problem using a remote host where I knew the path was open to all ICMP and where I could run tcpdump and could turn on Path-MTU-discovery and do some FTPs through my tunnel. It turns out the "needs frag" packets were arriving just fine at the remote host and these packets were correctly specifying the maximum size (which at the time was 1448 bytes). In fact I could send a ping packet of exactly that size and no larger from the same test server through my tunnel without it being fragmented or rejected. However it seems that there's a bug somewhere deep in, or below, the GRE tunnel code on NetBSD (1.4ZD) that causes it to silently drop maximum sized packets if they have the DF bit set. It may be that the MTU of the GRE interface is by default one or two bytes too large, and based on that hypothesis we manually forced the tunnel to have a lower MTU of 1400 and, voila!, it works like a charm now! All my NANOG mail came flooding in in short order! ;-) So Path-MTU-discovery is still the problem -- but at least in my current scenario it can sometimes be made to work, if really necessary. I still have to wonder though why people seem to think they need to use PMTU in the first place. Certainly it may be of some advantage if you want the majority of your traffic to be carried in "giant frames" but yet you still need to communicate with some hosts that have interfaces with more traditional sized MTUs *and* you don't want your gateway router to have to fragment all the remote traffic (and then of course remote hosts have to reassemble the fragments).
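The failure described above can be modelled with a toy simulation. It is a sketch of the hypothesis only, not of the NetBSD GRE code: each hop is a hypothetical pair of (MTU it reports in "needs frag", size it can really forward with DF set), and the 1446 figure is an assumed value standing in for "one or two bytes too large".

```python
from typing import List, Optional, Tuple

Hop = Tuple[int, int]  # (MTU the hop reports, size it can really forward); hypothetical model


def probe(size: int, path: List[Hop]) -> Tuple[str, Optional[int]]:
    """Send one DF packet; return ("ok", size), ("needs_frag", mtu) or ("lost", None)."""
    for reported, real in path:
        if size > reported:
            return ("needs_frag", reported)  # router answers with its next-hop MTU
        if size > real:
            return ("lost", None)            # silently dropped, no ICMP comes back
    return ("ok", size)


def discover_path_mtu(start: int, path: List[Hop], max_tries: int = 10) -> Optional[int]:
    """Shrink the packet size on each "needs frag" until a packet gets through."""
    size = start
    for _ in range(max_tries):
        status, mtu = probe(size, path)
        if status == "ok":
            return size
        if status == "lost":
            return None  # black hole: the sender only sees retransmission timeouts
        size = mtu
    return None


healthy = [(1500, 1500), (1448, 1448)]
buggy = [(1500, 1500), (1448, 1446)]    # assumed: reports 1448 but forwards only 1446
lowered = [(1500, 1500), (1400, 1446)]  # the workaround: tunnel MTU forced down to 1400

print(discover_path_mtu(1500, healthy))  # 1448
print(discover_path_mtu(1500, buggy))    # None
print(discover_path_mtu(1500, lowered))  # 1400
```

In the buggy case the "needs frag" reply arrives and is obeyed, yet the resulting maximum-sized packet still vanishes, which matches the stalled connections seen from the sending mailers.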
I'm guessing though that this exact scenario is extremely rare and that the improved throughput for bulk transfers that most people see when using PMTU can be achieved with far fewer headaches (and indeed on far more servers where PMTU is not available in the first place) by simply increasing the default MSS to 1460 (or 1360 to be friendly to users of PPPoE and GRE and similar :-). [[it's almost always possible to increase the default MSS for a server even if it's not easy.]] So, how about it everyone? Can we please all disable PMTU everywhere and try just increasing our default MSS where necessary? I.e. even if you're using a load balancer or not? Pretty please? The extra fragmentation is only going to be a problem for those people who live behind tunnels of one sort or another. I certainly don't mind paying for a bit of extra fragmentation in order to use my low-cost high-bandwidth tunnel! -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
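To put a number on "a bit of extra fragmentation", here is a small sketch of how a router would split an IPv4 packet that arrives without DF set. It assumes a 20-byte IPv4 header with no options and uses the 1450-byte tunnel MTU mentioned earlier in the thread; the function name is invented for the example.

```python
from typing import List

IPV4_HEADER = 20  # assumed: no IP options


def fragment_lengths(packet_len: int, mtu: int) -> List[int]:
    """Total lengths of the IPv4 packets that result from sending packet_len over mtu."""
    if packet_len <= mtu:
        return [packet_len]
    payload = packet_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    chunk = (mtu - IPV4_HEADER) // 8 * 8
    lengths = []
    while payload > chunk:
        lengths.append(chunk + IPV4_HEADER)
        payload -= chunk
    lengths.append(payload + IPV4_HEADER)
    return lengths


print(fragment_lengths(1500, 1450))  # [1444, 76]   MSS 1460 through the tunnel
print(fragment_lengths(1400, 1450))  # [1400]       MSS 1360 fits unfragmented
```

With an MSS of 1460 every full-sized segment crossing the tunnel becomes two packets, one of them tiny; with 1360 nothing is fragmented at all.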
On Thu, 15 Jun 2000, Greg A. Woods wrote:
So, how about it everyone? Can we please all disable PMTU everywhere
I assume you mean PMTU-D, not PMTU.
and try just increasing our default MSS where necessary? I.e. even if you're using a load balancer or not? Pretty please? The extra fragmentation is only going to be a problem for those people who live behind tunnels of one sort or another. I certainly don't mind paying for a bit of extra fragmentation in order to use my low-cost high-bandwidth tunnel!
NO! If PMTU-D is causing problems, then get whoever has a broken network to fix it. Is it always practical? Of course not. But education is the key. PMTU-D is not the problem here, and it is very shortsighted to say "oh, we just know better and can manually tune things to work well". That is not a wise "solution". If even 5% of people are in a situation where broken networks cause PMTU-D to not work, then such broken networks will be fixed, period. If you want to work around it on your systems, then lower your MTUs. But the solution is not for everyone to go disable PMTU-D because there are some broken networks; after all, the people that would listen to disable it are the same people who would just fix their broken networks. And in 99% of the cases, the broken network will be at their end or at the user's end, it will very seldom be in some network in the middle providing transit.
[ On Thursday, June 15, 2000 at 21:54:58 (-0700), Marc Slemko wrote: ]
Subject: Re: PMTU-D: remember, your load balancer is broken
On Thu, 15 Jun 2000, Greg A. Woods wrote:
So, how about it everyone? Can we please all disable PMTU everywhere
I assume you mean PMTU-D, not PMTU.
Yes, of course... :-)
If PMTU-D is causing problems, then get whoever has a broken network to fix it. Is it always practical? Of course not. But education is the key. PMTU-D is not the problem here, and it is very shortsighted to say "oh, we just know better and can manually tune things to work well". That is not a wise "solution". If even 5% of people are in a situation where broken networks cause PMTU-D to not work, then such broken networks will be fixed, period.
I don't yet agree. I've never yet seen Path-MTU-Discovery used on the public Internet for any purpose that cannot better be achieved by simply tuning your default MSS to a more "modern" value. IIRC you yourself advocated this very same solution. People say they need PMTU-D to get good throughput on bulk data transfers and yet they can achieve the same efficiencies by simply tuning their TCP stacks to meet the demands and capabilities of the modern Internet. PMTU-D is really only a hack that's not currently necessary.
If you want to work around it on your systems, then lower your MTUs.
In my particular case that's what causes the problem in the first place! ;-)
But the solution is not for everyone to go disable PMTU-D because there are some broken networks; after all, the people that would listen to disable it are the same people who would just fix their broken networks.
Actually I think the most practical solution is for server OS vendors to choose better defaults (i.e. PMTU-D should be off by default and the default MSS should be set to something very close to 1460), and for them to better document both the effects and the dangers of changing these values. In the mean time those who are using PMTU-D really must re-evaluate the reasons they are using it and check to see if they can't achieve the same results through adjusting their default MSS instead. Defaulting to always using PMTU-D will be guaranteed to always lead to problems that, as has been said already, will always result in 100% failure for those affected. Not tuning your default MSS will only result in degraded service, never complete failure so far as I can tell. Furthermore as I've tried to demonstrate, and as you more or less confirm in your next sentence, any degradation introduced will only affect those few people who are in the first place susceptible to complete failures when PMTU-D is used. The overall effect on the Internet will be minor (and perhaps minutely positive since there'll no longer be any excess "needs frag" packets and retransmissions being sent). Even people running servers on local networks with >1500-byte MTUs would not suffer (and might actually benefit as above too) if their primary purpose is to serve to the Internet since most of the Internet is running with just 1500-byte MTUs and so they can't usually send bigger packets anyway.....
And in 99% of the cases, the broken network will be at their end or at the user's end, it will very seldom be in some network in the middle providing transit.
Indeed it's almost never the network in the middle that's at fault, though strictly speaking in my case I've always encountered problems when the link between my networks and the next hop out has the lower MTU (eg. PPP, PPPoE, GRE, etc.) BTW, what happens to a server using PMTU-D if some attacker starts successfully spoofing "needs frag" replies to it with ridiculously low next-hop-MTU? :-) I.e. how many existing server implementations are robust enough to even verify the sanity of the MTU they're being asked to use, never mind validating that the IP header and data returned in the needs-frag payload match the original bit-for-bit? -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
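The kind of sanity checking asked about here might look roughly like the sketch below. It does not describe any particular stack: the NeedsFrag record, the flow table and the function name are all hypothetical, and sequence-number wraparound is ignored. The 68-byte floor is the minimum below which RFC 1191 says a host must never reduce its path MTU estimate. The quoted flow and sequence number stand for what can be read from the IP header plus first 8 bytes of TCP header that the ICMP message returns.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MIN_IPV4_MTU = 68  # RFC 1191: never lower the path MTU estimate below this

FlowKey = Tuple[str, int, str, int]  # (src ip, src port, dst ip, dst port)


@dataclass
class NeedsFrag:
    """Hypothetical parsed form of an ICMP "fragmentation needed" message."""
    next_hop_mtu: int
    quoted_flow: FlowKey  # addresses and ports from the quoted headers
    quoted_seq: int       # TCP sequence number from the quoted first 8 bytes


def accept_needs_frag(
    msg: NeedsFrag,
    current_pmtu: int,
    flows_in_flight: Dict[FlowKey, Tuple[int, int]],
) -> Optional[int]:
    """Return the new path MTU, or None if the message should be ignored.

    flows_in_flight maps each flow to (oldest unacknowledged seq, next seq to send).
    """
    # An MTU below the floor, or not smaller than the current estimate, is ignored.
    if not (MIN_IPV4_MTU <= msg.next_hop_mtu < current_pmtu):
        return None
    # The quoted packet must belong to a connection we actually have open.
    window = flows_in_flight.get(msg.quoted_flow)
    if window is None:
        return None
    # The quoted sequence number must be data that is really outstanding.
    oldest_unacked, next_to_send = window
    if not (oldest_unacked <= msg.quoted_seq < next_to_send):
        return None
    return msg.next_hop_mtu


flow = ("192.0.2.1", 25, "198.51.100.7", 40000)
table = {flow: (1000, 5000)}
print(accept_needs_frag(NeedsFrag(1448, flow, 2460), 1500, table))  # 1448
print(accept_needs_frag(NeedsFrag(40, flow, 2460), 1500, table))    # None
print(accept_needs_frag(NeedsFrag(1448, flow, 9999), 1500, table))  # None
```

A blind spoofer would have to guess both the flow and a sequence number in the outstanding window, which raises the bar without making the attack impossible.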
participants (5)
- Brett Frankenberger
- Marc Slemko
- Steven M. Bellovin
- Valdis.Kletnieks@vt.edu
- woods@weird.com