Re: PMTU-D: remember, your load balancer is broken

16 Jun 2000

      [ On Thursday, June 15, 2000 at 10:15:22 (-0400), Greg A. Woods wrote: ]
...
Since discovering that servers with an MSS default of 512 bytes cannot
possibly ever deliver good TCP throughput to local high-speed customers
(e.g. on a cable or DSL plant), I've also been hard-coding a TCP MSS
default of 1460 on most systems I control (though on cable modem squid
servers, etc., it could probably safely be raised to 1500, but of course
on my GRE tunnel this is the maximum I can use without fragmentation).
No, silly me -- it has to be lowered to 1410 on my GRE tunnel when the
tunnel MTU is 1450...  I *still* keep getting the MSS and MTU confused.
I do like the way some folks have been saying 1460+40 to express the MTU
as that does eliminate some of the confusion by stating the obvious....
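The MSS/MTU relationship can be pinned down with a little arithmetic. This is a minimal sketch, assuming IPv4 and TCP headers of 20 bytes each with no options; the function names are hypothetical and the link figures are the ones mentioned above.

```python
# Header sizes assume IPv4 and TCP with no options (20 bytes each).
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload (MSS) that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def mtu_for_mss(mss: int) -> int:
    """Packet size produced by a full-sized segment: the "1460+40" notation."""
    return mss + IPV4_HEADER + TCP_HEADER


for name, mtu in [("Ethernet", 1500), ("GRE tunnel", 1450)]:
    print(f"{name}: MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

Note that mtu_for_mss(1500) gives 1540, larger than Ethernet's 1500-byte MTU, which is exactly why 1500 is an MTU and not a usable MSS.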
...
In fact I think I'm having this very problem with segue.merit.edu
[198.108.1.41] trying to deliver some NANOG messages to my server ever
since yesterday or the day before!  (Another server at theplanet.co.uk
is definitely giving me these headaches -- I still have to capture a
failed connection from segue.merit.edu to prove the latter though....)
A whole bunch of tcpdump'ing on my upstream router later, I was finally
able to duplicate the problem using a remote host where I knew the path
was open to all ICMP, where I could run tcpdump, and where I could turn
on Path-MTU-discovery and do some FTPs through my tunnel.

It turns out the "needs frag" packets were arriving just fine at the
remote host and these packets were correctly specifying the maximum size
(which at the time was 1448 bytes).  In fact I could send a ping packet
of exactly that size and no larger from the same test server through my
tunnel without it being fragmented or rejected.  However it seems that
there's a bug somewhere deep in, or below, the GRE tunnel code on NetBSD
(1.4ZD) that causes it to silently drop maximum sized packets if they
have the DF bit set.  It may be that the MTU of the GRE interface is by
default one or two bytes too large, and based on that hypothesis we
manually forced the tunnel to have a lower MTU of 1400 and, voila!, it
works like a charm now!  All my NANOG mail came flooding in, in short
order!  ;-)
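The failure described above is a classic PMTU-D "black hole": the sender learns a size from the ICMP "fragmentation needed" message, but a hop then silently drops DF packets of that size, so no further ICMP ever arrives. Below is a toy simulation, not real stack code. It models the "MTU one or two bytes too large" hypothesis with a hypothetical `usable` field (the largest size the hop can really carry); the class, field, and function names are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hop:
    name: str
    mtu: int                      # MTU the interface advertises (and reports in ICMP)
    usable: Optional[int] = None  # hypothetical: largest size it can really carry

    def capacity(self) -> int:
        return self.mtu if self.usable is None else self.usable


def send_df(path: List[Hop], size: int) -> Tuple[bool, Optional[int]]:
    """Send one DF packet; return (delivered, MTU reported via ICMP or None)."""
    for hop in path:
        if size > hop.mtu:
            return False, hop.mtu   # ICMP "fragmentation needed" with next-hop MTU
        if size > hop.capacity():
            return False, None      # silently dropped: no ICMP at all
    return True, None


def discover_pmtu(path: List[Hop], start: int, max_probes: int = 8) -> Optional[int]:
    """Simple PMTU-D: start at the local MTU and shrink on each ICMP report."""
    pmtu = start
    for _ in range(max_probes):
        delivered, reported = send_df(path, pmtu)
        if delivered:
            return pmtu
        if reported is None:
            return None             # black hole: the sender never learns why
        pmtu = reported
    return None


print(discover_pmtu([Hop("ethernet", 1500), Hop("gre0", 1448)], 1500))
print(discover_pmtu([Hop("ethernet", 1500), Hop("gre0", 1448, usable=1446)], 1500))
print(discover_pmtu([Hop("ethernet", 1500), Hop("gre0", 1400, usable=1446)], 1500))
```

The three calls print 1448 (a healthy tunnel), None (the buggy tunnel, where maximum-sized DF packets vanish), and 1400 (the same buggy tunnel after its MTU is forced down below what it can really carry).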

So Path-MTU-discovery is still the problem -- but at least in my current
scenario it can sometimes be made to work, if really necessary.

I still have to wonder though why people seem to think they need to use
PMTU in the first place.  Certainly it may be of some advantage if you
want the majority of your traffic to be carried in "giant frames" but
you still need to communicate with some hosts that have interfaces
with more traditional sized MTUs *and* you don't want your gateway
router to have to fragment all the remote traffic (and then of course
remote hosts have to reassemble the fragments).  I'm guessing though
that this exact scenario is extremely rare and that the improved
throughput for bulk transfers that most people see when using PMTU can
be achieved with far fewer headaches (and indeed on far more servers
where PMTU is not available in the first place) by simply increasing the
default MSS to 1460 (or 1360 to be friendly to users of PPPoE and GRE
and similar :-).  (It's almost always possible to increase the default
MSS for a server, even if it's not always easy.)
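The trade-off between a default MSS of 1460 and 1360 can be checked by asking which links would force fragmentation for a full-sized segment. This is a rough sketch: it assumes 40 bytes of IPv4+TCP headers with no options, and the link MTUs are typical values (1492 for PPPoE, 1476 for GRE over Ethernet) plus the two tunnel sizes mentioned above. The table and function names are hypothetical.

```python
# IPv4 + TCP headers, assuming no options.
TCP_IP_HEADERS = 40

# Typical link MTUs; the two tunnel entries are the sizes discussed above.
LINKS = {
    "Ethernet": 1500,
    "PPPoE": 1492,
    "GRE over Ethernet": 1476,
    "GRE tunnel (as first configured)": 1450,
    "GRE tunnel (lowered)": 1400,
}


def needs_fragmentation(mss: int, links: dict) -> list:
    """Names of links where a full-sized segment would have to be fragmented."""
    packet = mss + TCP_IP_HEADERS
    return [name for name, mtu in links.items() if packet > mtu]


for mss in (512, 1360, 1460):
    print(mss, needs_fragmentation(mss, LINKS))
```

With these figures, 1360 fits every link listed, while 1460 fits only plain Ethernet.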

So, how about it, everyone?  Can we please all disable PMTU everywhere
and try just increasing our default MSS where necessary, whether or not
you're using a load balancer?  Pretty please?  The extra
fragmentation is only going to be a problem for those people who live
behind tunnels of one sort or another.  I certainly don't mind paying
for a bit of extra fragmentation in order to use my low-cost
high-bandwidth tunnel!
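To put a number on that "extra fragmentation", here is a sketch of how a router splits an IPv4 packet that doesn't have DF set. It assumes a 20-byte IP header with no options and uses the standard rule that every fragment except the last must carry a multiple of 8 data bytes; the function name is hypothetical.

```python
from typing import List

IP_HEADER = 20  # assumes no IP options


def fragment_sizes(total_len: int, mtu: int) -> List[int]:
    """Total lengths (header included) of the IPv4 fragments for one packet."""
    if total_len <= mtu:
        return [total_len]
    payload = total_len - IP_HEADER
    # Data in every non-final fragment must be a multiple of 8 bytes.
    per_frag = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while payload > per_frag:
        sizes.append(per_frag + IP_HEADER)
        payload -= per_frag
    sizes.append(payload + IP_HEADER)  # last fragment carries the remainder
    return sizes


print(fragment_sizes(1500, 1450))  # 1460-byte MSS through a 1450-byte tunnel
print(fragment_sizes(1500, 1400))  # ...and through the lowered 1400-byte tunnel
```

Each full-sized 1500-byte packet becomes two fragments (1444 + 76 bytes, or 1396 + 124 bytes), so the cost is one extra small packet and one extra IP header per segment, plus reassembly at the far end.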

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
