
I've recently had the pleasure of troubleshooting a problem I don't normally have to deal with, and the results don't quite make sense to me. I'm hoping someone can enlighten me as to what is going on. A diagram:

server---internet---fw---tunnelbox1----tunnelbox2----user

The tunnel between the tunnelboxes is a lower (1480) MTU. Originally the user couldn't access some servers; it turns out the firewall was filtering ICMP Can't Fragment messages, preventing PMTU from working in the server->user direction (tunnelbox1 would generate Can't Fragment, firewall would filter).

That's been corrected. Going to a server I control I see good PMTU in both directions between the server and the user. However, there are still a number of web servers for popular sites that behave just like the firewall was still filtering Can't Fragments. The theory is that the servers are behind a firewall/load balancer that is filtering them on the server side -- but I find it slightly (emphasis on the slightly) odd that someone would turn on PMTU discovery, and then filter it out right in front of the boxes where they turned it on. Also, it seems to me most DSL users are behind PPPoE links with lower MTU, and should get hit by the same problem.

The temporary hack is to have tunnelbox1 clear the DF bit on all incoming packets, which just causes the packets to get fragmented going down the tunnel. A minor performance hit, but it works.

This is a new problem to me, but I'm sure people have run into it before. Are the servers really that broken (PMTU enabled, ICMP Can't Fragment filtered)? Does the head end box of DSL services generally do something to work around this (ie, clear the DF bit)? Am I just being an idiot and missing something obvious?

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
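The failure described in this message can be modelled in a few lines. What follows is only a sketch under stated assumptions: the function names, the per-hop MTU list, and the 1500-byte starting size are illustrative and do not come from the post. It shows a sender shrinking its packet size when a Can't Fragment report arrives, and stalling when that report is filtered.

```python
def send_with_df(size, path_mtus, icmp_filtered=False):
    """Send one DF-marked packet of `size` bytes along a list of hop MTUs.

    Returns ("delivered", None), ("frag_needed", mtu) when a hop reports
    its smaller MTU, or ("lost", None) when that report is filtered.
    """
    for mtu in path_mtus:
        if size > mtu:
            # This hop cannot forward the packet and may not fragment it.
            if icmp_filtered:
                return ("lost", None)
            return ("frag_needed", mtu)
    return ("delivered", None)


def discover_pmtu(path_mtus, icmp_filtered=False, start=1500, max_tries=5):
    """Lower the packet size each time a Can't Fragment report comes back."""
    size = start
    for _ in range(max_tries):
        status, mtu = send_with_df(size, path_mtus, icmp_filtered)
        if status == "delivered":
            return size
        if status == "lost":
            # The sender learns nothing and would keep retransmitting
            # the same oversized packet: a PMTU black hole.
            return None
        size = mtu
    return None


# Hypothetical hop MTUs for server---internet---fw---tunnelbox1---tunnelbox2---user
path = [1500, 1500, 1500, 1480, 1500]
print(discover_pmtu(path))                      # 1480
print(discover_pmtu(path, icmp_filtered=True))  # None
```

Small packets (connection setup, short requests) pass either way, which is why the symptom tends to be a connection that opens and then hangs on the first full-sized segment.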

On Thu, 8 May 2003, Leo Bicknell wrote:

> I've recently had the pleasure of troubleshooting a problem I don't normally have to deal with, and the results don't quite make sense to me. I'm hoping someone can enlighten me as to what is going on. A diagram:

I had a rant about this a few months back (as many others have done before me). It's a combination of ICMP filtering and RFC1918 links on the Internet that causes this.

> server---internet---fw---tunnelbox1----tunnelbox2----user
>
> The tunnel between the tunnelboxes is a lower (1480) MTU. Originally the user couldn't access some servers, turns out the firewall was filtering ICMP Can't Fragment messages, preventing PMTU from working in the server->user direction (tunnelbox1 would generate Can't Fragment, firewall would filter).
>
> That's been corrected. Going to a server I control I see good PMTU in both directions between the server and the user. However, there are still a number of web servers for popular sites that behave just like the firewall was still filtering Can't Fragments. The theory is that the servers are behind a firewall/load balancer that is filtering them on the server side -- but I find it slightly (emphasis on the slightly) odd that someone would turn on PMTU discovery, and then filter it out right in front of the boxes where they turned it on. Also, it seems to me most DSL users are behind PPPoE links with lower MTU, and should get hit by the same problem.
>
> The temporary hack is to have tunnelbox1 clear the DF bit on all incoming packets, which just causes the packets to get fragmented going down the tunnel. A minor performance hit, but it works.

Consider this a permanent hack if you want to keep things working on the tunnel.

> This is a new problem to me, but I'm sure people have run into it before. Are the servers really that broken (PMTU enabled, ICMP Can't Fragment filtered)?

Absolutely.

> Does the head end box of DSL services generally do something to work around this (ie, clear the DF bit)?

I've wondered this too. I'm not sure, but they clearly do something; perhaps they encapsulate the packets in fragments and then recombine them without altering the original packet?

> Am I just being an idiot and missing something obvious?

Steve
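To put a number on the "minor performance hit" of clearing DF, here is a small worked example of what IPv4 fragmentation does to a full-sized packet entering the 1480-byte tunnel. It is a sketch that assumes a 20-byte IPv4 header with no options; the function name and tuple layout are made up for illustration.

```python
def fragment(total_len, mtu, header_len=20):
    """Split an IPv4 packet into (offset_in_8_byte_units, total_len, more_fragments)."""
    if total_len <= mtu:
        return [(0, total_len, False)]
    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    chunk = (mtu - header_len) // 8 * 8
    frags = []
    offset = 0
    while offset < payload:
        size = min(chunk, payload - offset)
        more = offset + size < payload
        frags.append((offset // 8, header_len + size, more))
        offset += size
    return frags


for frag in fragment(1500, 1480):
    print(frag)
# (0, 1476, True)
# (182, 44, False)
```

Under these assumptions every 1500-byte packet becomes one 1476-byte fragment plus one 44-byte fragment, so the tunnel carries twice as many packets for full-sized traffic and the far end has to reassemble them.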

"bicknell" == Leo Bicknell <bicknell@ufp.org> writes:

bicknell> This is a new problem to me, but I'm sure people have
bicknell> run into it before. Are the servers really that broken
bicknell> (PMTU enabled, ICMP Can't Fragment filtered)? Does the
bicknell> head end box of DSL services generally do something to
bicknell> work around this (ie, clear the DF bit)? Am I just
bicknell> being an idiot and missing something obvious?

I first saw this about four years ago with a web site running behind a load balancing device. It was -- and probably still is -- another issue of default configuration hell. The web servers were configured by default to do Path MTU discovery, while the load balancer had no concept of passing the ICMP Need Fragment packet back to the appropriate server.

(There may still be no good way to do this; if I remember right, the ICMP Need Fragment packet contains only IPs and not ports; the host sending the ICMP packet will be using its IP and the outside IP of the load balancer, giving the load balancer no good way to determine where to pass the ICMP packet, unless the load balancer is guaranteeing that all data from a particular IP goes to a particular server -- also not a default configuration.)

It's a hard call for which to make the default; PMTU makes sense, obviously, unless you're running behind a load balancer. It's another one of those things that probably isn't documented anywhere, or if it is, it's buried in an appendix that nobody gets to.

The only solution is to mail the folks maintaining the web sites you can't get to with a short explanation of what you think the problem is, and hope they look into it and fix it. Not unlike smurf relays and networks that don't filter outgoing source addresses. }:>

-dalvenjah
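On the question of what the ICMP message carries: per RFC 792, an ICMP error quotes the offending packet's IP header plus at least the first 8 bytes of its payload, and for TCP those bytes include the source and destination ports. So a load balancer can, in principle, recover the flow. The sketch below parses a hand-built Fragmentation Needed message (type 3, code 4, next-hop MTU field per RFC 1191). The function name and the sample addresses are illustrative assumptions, and checksums are left at zero and not validated.

```python
import socket
import struct


def parse_frag_needed(icmp):
    """Return (next_hop_mtu, src_ip, src_port, dst_ip, dst_port), or None."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        return None
    inner = icmp[8:]                      # quoted header of the dropped packet
    if len(inner) < 20:
        return None
    ihl = (inner[0] & 0x0F) * 4           # inner IP header length in bytes
    if inner[9] != 6 or len(inner) < ihl + 4:   # protocol 6 = TCP
        return None
    src_ip = socket.inet_ntoa(inner[12:16])
    dst_ip = socket.inet_ntoa(inner[16:20])
    src_port, dst_port = struct.unpack("!HH", inner[ihl:ihl + 4])
    return next_hop_mtu, src_ip, src_port, dst_ip, dst_port


# Sample message: the dropped packet went from the site's public address
# (192.0.2.10:80, a documentation address) to a client at 198.51.100.7:40000.
inner_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0, 0x4000, 64, 6, 0,
                       socket.inet_aton("192.0.2.10"),
                       socket.inet_aton("198.51.100.7"))
inner_tcp = struct.pack("!HHI", 80, 40000, 0)   # ports plus sequence number
message = struct.pack("!BBHHH", 3, 4, 0, 0, 1480) + inner_ip + inner_tcp

print(parse_frag_needed(message))
# (1480, '192.0.2.10', 80, '198.51.100.7', 40000)
```

With the client address and port in hand, a device that keeps per-connection state has what it needs to look up which real server owns the flow; whether a given product actually does this is a separate question.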

> This is a new problem to me, but I'm sure people have run into it before. Are the servers really that broken (PMTU enabled, ICMP Can't Fragment filtered)? Does the head end box of DSL services generally do something to work around this (ie, clear the DF bit)? Am I just being an idiot and missing something obvious?

This is fairly common, since PMTU-D is generally enabled by default, and for better or worse, many folks filter all ICMP, despite the bad effects that can lead to. I've had arguments with customers about their broken config and their unwillingness to believe it because "they haven't changed anything". The only real workaround is to have a minimum MTU of 1500 across your network, including all encapsulation.
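The "1500 including all encapsulation" rule comes down to simple arithmetic. A minimal sketch, assuming typical header sizes without options (IP-in-IP 20 bytes, GRE over IPv4 24 bytes, PPPoE 8 bytes); the thread does not say which tunnel type is in use, so the IP-in-IP match with the 1480 figure is only an observation, and the names below are illustrative.

```python
# Assumed per-layer overhead in bytes, no options or extensions.
OVERHEAD = {"ipip": 20, "gre": 24, "pppoe": 8}


def inner_mtu(link_mtu, encaps):
    """MTU left for the carried packet after the encapsulation headers."""
    return link_mtu - sum(OVERHEAD[e] for e in encaps)


def link_mtu_needed(encaps, inner=1500):
    """Link MTU required so the carried packet can still be `inner` bytes."""
    return inner + sum(OVERHEAD[e] for e in encaps)


print(inner_mtu(1500, ["ipip"]))    # 1480
print(inner_mtu(1500, ["pppoe"]))   # 1492
print(link_mtu_needed(["ipip"]))    # 1520
print(link_mtu_needed(["gre"]))     # 1524
```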

I've had the problem before. Not all routers handle PMTU correctly.

Curtis

On Thu, 8 May 2003, Leo Bicknell wrote:
> This is a new problem to me, but I'm sure people have run into it before. Are the servers really that broken (PMTU enabled, ICMP Can't Fragment filtered)? Does the head end box of DSL services generally do something to work around this (ie, clear the DF bit)? Am I just being an idiot and missing something obvious?

--
Curtis Maurand
mailto:curtis@maurand.com
http://www.maurand.com

You mean there are routers which get a large packet and silently drop it rather than return an ICMP? Curious to know which vendors (read: fundamentally broken!).

Steve

On Mon, 12 May 2003, Curtis Maurand wrote:

> I've had the problem before. Not all routers handle PMTU correctly.
>
> Curtis

Thus spake "Stephen J. Wilcox" <steve@telecomplete.co.uk>

> You mean there are routers which get a large packet and silently drop it rather than return an ICMP?
>
> Curious to know which vendors (read: fundamentally broken!).

Well, most core routers rate-limit the ICMP messages they generate, so any given packet may not result in a Needs-Fragmentation error. If the result is consistent, however, you're likely dealing with an ACL or a broken load balancer, as Leo describes:

On Thu, 8 May 2003, Leo Bicknell wrote:

> However, there are still a number of web servers for popular sites that behave just like the firewall was still filtering Can't Fragments. The theory is that the servers are behind a firewall/load balancer that is filtering them on the server side -- but I find it slightly (emphasis on the slightly) odd that someone would turn on PMTU discovery, and then filter it out right in front of the boxes where they turned it on. Also, it seems to me most DSL users are behind PPPoE links with lower MTU, and should get hit by the same problem.

The problem here is that the Needs-Frag error comes back as an ICMP, and many load balancers don't bother looking inside at the offending packet to determine which server to forward the error to.

Why do these people use PMTUD? It's on by default, and you have to muck with the registry (or the unix equivalent) to disable it, at which point you're better off enabling PMTU Black Hole Detection. Hopefully BHD will also be default someday.

Most network folk have found it's easier to provide 1500 MTU than to educate all of the server operators and end users as to what's going wrong with PMTU. This is also, IMHO, the only significant reason jumbo frames aren't in widespread use -- we have no reliable means of coping with networks that remain at 1500 MTU.

S
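For readers unfamiliar with the idea behind PMTU Black Hole Detection, here is a rough sketch of the sender-side fallback: if a full-sized DF packet goes unacknowledged several times and no ICMP error arrives, assume the error is being eaten, drop to a conservative size, and stop setting DF. The retry count, the 576-byte fallback, and all names below are illustrative assumptions, not the behaviour of any particular TCP stack.

```python
def path_delivers(size, df, path_mtus):
    """A silent path: oversized DF packets vanish, everything else arrives.

    Packets without DF are assumed to be fragmented en route as needed.
    """
    return not df or all(size <= mtu for mtu in path_mtus)


def send_with_bhd(path_mtus, size=1500, fallback=576, max_retries=2):
    """Return (size, df, attempts) for the first transmission that gets through."""
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1
        if path_delivers(size, True, path_mtus):
            return size, True, attempts
    # No ACK and no ICMP error after the retries: treat it as a black hole,
    # fall back to a conservative size and clear DF.
    attempts += 1
    delivered = path_delivers(fallback, False, path_mtus)
    return (fallback, False, attempts) if delivered else None


print(send_with_bhd([1500, 1500]))        # (1500, True, 1)
print(send_with_bhd([1500, 1480, 1500]))  # (576, False, 4)
```

The cost is visible in the second result: the connection recovers, but only after the retransmission timeouts have run out, and then it runs at a smaller segment size than the path could actually carry.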

* stephen@sprunk.org (Stephen Sprunk) [Mon 12 May 2003, 19:24 CEST]:

> Most network folk have found it's easier to provide 1500 MTU than to educate all of the server operators and end users as to what's going wrong with PMTU. This is also, IMHO, the only significant reason jumbo frames aren't in widespread use -- we have no reliable means of coping with networks that remain at 1500 MTU.

That was already the case when the FDDI MAEs were still in operation with their 4470 byte MTUs, where the Gigaswitches didn't have IP addresses they could send ICMP Fragmentation Needed messages from when having to bridge large frames from FDDI to Ethernet...

Regards,

-- Niels.

I had a problem where an NXNetworks VPN router didn't process the results properly. I couldn't put my finger on exactly whose router was causing the trouble, but using a freeswan-to-freeswan tunnel I was able to test my theory by gradually increasing the MTU on my connection until I got a failure.

One end of the VPN is on a RoadRunner connection and the other was on a Prexar connection. The route in between is anyone's guess, but I think, at the time, Prexar was trying to push traffic over their Cable and Wireless connection. Now that C&W is gone, I'll have to try it again.

Curtis

On Mon, 12 May 2003, Stephen J. Wilcox wrote:

> You mean there are routers which get a large packet and silently drop it rather than return an ICMP?
>
> Curious to know which vendors (read: fundamentally broken!).
>
> Steve
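The "increase the size until it fails" test described in this message can be shortened with a binary search, provided success is monotonic in packet size. A minimal sketch: `probe` is a hypothetical callable standing in for whatever test is used (for example a DF-marked ping through the tunnel), and the 1412-byte limit in the fake probe is an arbitrary made-up value for demonstration.

```python
def largest_passing_size(probe, low=576, high=1500):
    """Binary-search the largest size for which probe(size) succeeds.

    Assumes that if a size passes, every smaller size passes too.
    Returns None when even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always progresses
        if probe(mid):
            low = mid                 # mid fits; look for something larger
        else:
            high = mid - 1            # mid is too big
    return low


def fake_probe(size):
    """Stand-in for a real test; pretends the path passes at most 1412 bytes."""
    return size <= 1412


print(largest_passing_size(fake_probe))  # 1412
```

About ten probes cover the whole 576 to 1500 range, versus hundreds when stepping up one byte at a time. Since routers may rate-limit their ICMP replies, a real probe would want to retry a failing size before trusting the result.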

Okay, so we're not actually saying the TCP stack is broken, as I interpreted your previous email; we mean there are routers with broken (user) config on them, i.e. dropping ICMP fragmentation-needed messages. Sorry!

Steve

On Mon, 12 May 2003, Curtis Maurand wrote:

> I had a problem where a NXNetworks VPN router didn't process the results properly. I couldn't put my finger on exactly whose router was causing the trouble, but using freeswan to a freeswan I was able to test my theory as I gradually increased the MTU on my connection until I got a failure. One end of the VPN is on a RoadRunner connection and the other was on a Prexar connection. The route in between is anyone's guess, but I think, at the time, Prexar was trying to push traffic over their Cable and Wireless connection. Now that C&W is gone, I'll have to try it again.
>
> Curtis

Most of the equipment in between would be Cisco, Juniper and Redback. I doubt that any of that equipment has broken stacks; it's just configured not to send ICMP replies, so PMTU discovery will break. IPSEC is rather picky. :-)

curtis

On Mon, 12 May 2003, Stephen J. Wilcox wrote:

> Okay we're not actually saying the TCP stack is broken then as I interpreted your previous email, we mean there are routers with broken (user) config on them ie dropping icmp frags. Sorry!
>
> Steve

participants (7)
- bdragon@gweep.net
- Curtis Maurand
- Dalvenjah FoxFire
- Leo Bicknell
- Niels Bakker
- Stephen J. Wilcox
- Stephen Sprunk