I-D on operational MTU/fragmentation issues in tunneling
Hi all,
I've written a very short (about 5 pages of meat) Internet-Draft describing the issues with, and operational approaches to, tunneling in the network -- as these issues kept coming up again and again with IP-in-IP, GRE, L2TP, etc. The approaches may be different for passive monitoring ('wiretapping' etc.) and 'active' tunneling.
The document is about to be IETF Last Called for Informational RFC, but prior to that, I'd like to solicit comments/feedback/review from the people here because I'm 100% sure a lot of people have been faced with these issues (we certainly have..).
Please send comments to me by the end of this week, either on- or off-list, as you deem appropriate.
Find it at:
http://www.ietf.org/internet-drafts/draft-savola-mtufrag-network-tunneling-0...
Abstract
Tunneling techniques such as IP-in-IP when deployed in the middle of the network, typically between routers, have certain issues regarding how large packets can be handled: whether such packets would be fragmented and reassembled (and how), whether Path MTU Discovery would be used, or how this scenario could be operationally avoided. This memo justifies why this is a common, non-trivial problem, and goes on to describe the different solutions and their characteristics at some length.
--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
On 11-okt-04, at 10:12, Pekka Savola wrote:
The document is about to be IETF Last Called for Informational RFC, but prior to that, I'd like to solicit comments/feedback/review from the people here because I'm 100% sure a lot of people have been faced with these issues (we certainly have..).
Well, tunnels suck. No news there.
It is interesting to note that at least one implementation provides a special knob to fragment the inner packet prior to encapsulation even if the DF bit has been set -- this is non-compliant behaviour, but possibly has been required in certain tightly controlled passive monitoring scenarios. Such a setup wouldn't work for packets which have already been fragmented if they needed to be fragmented again, though.
Why would it be impossible to refragment fragments???
I have a setup with dial-up over L2TP that doesn't support an MTU bigger than 576 (which is completely unnecessary of course, but try telling the people at the other end of the L2TP thingy that) so I clear the DF bit for all incoming packets that have to go through the PPP/L2TP tunnel. Works like a charm. (Surprisingly, all users seem to have systems that are capable of reassembling 1.5 kB packets now.)
But I don't understand why anyone would want to use tunnels in the backbone. That's what VLANs are for. And if you don't use ether, you aren't bound by yester-millennium's 1500 byte MTU anyway.
In IPv6 there is the interesting problem that there are already many tunnels all over the place that often have a 1280 byte MTU, so tunneling over that can't be done because of the mandatory minimum MTU of 1280 bytes.
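As a rough illustration of the DF-clearing approach described above: in an IPv4 header, DF is the 0x4000 bit of the 16-bit flags/fragment-offset field at byte offset 6, and the header checksum has to be recomputed after the bit changes. The Python sketch below shows only that byte-level operation. The function names are made up for illustration, and a real router or firewall would do this in its forwarding path, not in user code.

```python
import struct

DF = 0x4000  # Don't Fragment flag inside the 16-bit flags/fragment-offset field


def checksum(header: bytes) -> int:
    """Standard IPv4 header checksum: one's-complement sum of 16-bit words."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def clear_df(packet: bytes) -> bytes:
    """Return a copy of an IPv4 packet with DF cleared and the checksum fixed."""
    ihl = (packet[0] & 0x0F) * 4                 # header length in bytes
    header = bytearray(packet[:ihl])
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    struct.pack_into("!H", header, 6, flags_frag & ~DF)
    struct.pack_into("!H", header, 10, 0)        # zero the checksum field first
    struct.pack_into("!H", header, 10, checksum(bytes(header)))
    return bytes(header) + packet[ihl:]
```

Once DF is cleared, any hop with a smaller MTU (the 576-byte L2TP link in this case) is free to fragment the packet instead of dropping it.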
Thanks to you, and all who have replied (both off and on-list). I was pleasantly surprised at the amount of review I've received. Keep them coming! I'll try to respond/react to them shortly.
I'll respond to both posts on this list in one message:
On Wed, 13 Oct 2004, Iljitsch van Beijnum wrote:
On 11-okt-04, at 10:12, Pekka Savola wrote:
The document is about to be IETF Last Called for Informational RFC, but prior to that, I'd like to solicit comments/feedback/review from the people here because I'm 100% sure a lot of people have been faced with these issues (we certainly have..).
Well, tunnels suck. No news there.
It is interesting to note that at least one implementation provides a special knob to fragment the inner packet prior to encapsulation even if the DF bit has been set -- this is non-compliant behaviour, but possibly has been required in certain tightly controlled passive monitoring scenarios. Such a setup wouldn't work for packets which have already been fragmented if they needed to be fragmented again, though.
Why would it be impossible to refragment fragments???
True -- thanks for catching this. I had a brain fart when I thought that there isn't enough information in the IP header to do that. But as long as you don't exhaust the IP identification number space, it's OK..
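To make the refragmentation point concrete, here is a minimal Python sketch. It models only the header fields that matter (Identification, fragment offset in 8-byte units, and the MF flag) and assumes a 20-byte IP header with no options. The `Fragment` type and function names are hypothetical. Every piece reuses the original Identification value, and a piece cut from a non-final fragment keeps MF set.

```python
from dataclasses import dataclass

IP_HEADER = 20  # bytes; assumes no IP options


@dataclass
class Fragment:
    ident: int      # IP Identification, shared by all pieces of one datagram
    offset: int     # fragment offset, in 8-byte units
    more: bool      # MF (More Fragments) flag
    payload: bytes


def refragment(frag: Fragment, mtu: int) -> list[Fragment]:
    """Split a whole packet or an existing fragment so each piece fits in mtu."""
    if len(frag.payload) + IP_HEADER <= mtu:
        return [frag]
    max_payload = (mtu - IP_HEADER) // 8 * 8   # non-final pieces: multiple of 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    pieces = []
    for start in range(0, len(frag.payload), max_payload):
        is_last = start + max_payload >= len(frag.payload)
        pieces.append(Fragment(
            ident=frag.ident,
            offset=frag.offset + start // 8,
            more=frag.more or not is_last,     # keep MF if the parent had it
            payload=frag.payload[start:start + max_payload],
        ))
    return pieces


def reassemble(frags: list[Fragment]) -> bytes:
    """Naive reassembly; assumes all pieces are present and do not overlap."""
    return b"".join(f.payload for f in sorted(frags, key=lambda f: f.offset))
```

For example, a 2000-byte payload fragmented for a 1500-byte MTU and then again for a 576-byte MTU ends up as four pieces that reassemble to the original data.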
But I don't understand why anyone would want to use tunnels in the backbone. That's what VLANs are for. And if you don't use ether, you aren't bound by yester-millennium's 1500 byte MTU anyway.
I don't think it's quite as simple as that. First, even if you used Ethernet, you would seem to have to require that all the tunnel entry and exit points reside in the same Ethernet VLAN "space". That is, all the entry/exit points would have to be hooked to the Ethernet switch core network (somehow), or that the routers would support some kind of VLAN 'passthrough' -- encapsulating the VLAN's traffic to some other interface's VLAN. These may hold in some situations, but not in general. Remember that the problem comes up especially if you need to tunnel beyond the "domain" where you have a high MTU (or can use VLANs). If you can assume that.. well, that's one solution proposed in the draft.
In IPv6 there is the interesting problem that there are already many tunnels all over the place that often have a 1280 byte MTU, so tunneling over that can't be done because of the mandatory minimum MTU of 1280 bytes.
Actually, it can be done, see RFC2473 ('Generic Packet Tunneling in IPv6'). The entry point trying to encapsulate a 1280 byte packet in a 1280 byte MTU just has to do some fragmentation, see section 7.1 (b).
..........
On Thu, 14 Oct 2004, Sabri Berisha wrote:
On Mon, Oct 11, 2004 at 11:12:55AM +0300, Pekka Savola wrote:
Hi Pekka and others,
Please send comments to me by the end of this week, either on- or off-list, as you deem appropriate.
With the risk of stating the obvious I would say that normally, PMTUD should do the trick. [...]
For some (mostly host-based) tunnels, yes. But the point is that if you insert such a tunnel in the middle of the network, where you have e.g. Internet traffic from millions of nodes passing through in both directions, just counting on PMTUD would require that your network originate billions of Packet Too Big messages each day, and would depend on the users not having blocked the ICMPs.
Further, there are also passive monitoring applications (like wiretaps) where you DON'T want anyone to know something "fishy" is going on.
So, in practice, I fail to see how PMTUD or the like would really work in environments more generic than just host-based or "last-hop" tunnels.
--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
On Mon, Oct 11, 2004 at 11:12:55AM +0300, Pekka Savola wrote:
Hi Pekka and others,
Please send comments to me by the end of this week, either on- or off-list, as you deem appropriate.
With the risk of stating the obvious I would say that normally, PMTUD should do the trick. After all, there is no real difference between the lower MTU of a tunnel and the lower MTU of any other link. With this in mind, the real problem can be found on networks and hosts that block ICMP-host-unreachables (or simply all ICMP traffic for "security" reasons).
Taking this one step further, one might realise that we (as networking community) are looking for a technical solution to compensate for the lack of knowledge of the end-user administrators or webmasters.
In my work I have been using tunnels quite a lot, and have dealt with a lot of issues regarding PMTUD problems. For end-users behind a tunnel, the best solution is usually to turn PMTUD off completely, such as
    [root@bofh root]# sysctl -w net.inet.tcp.path_mtu_discovery=0
    net.inet.tcp.path_mtu_discovery: 1 -> 0
on a FreeBSD box. I agree that this is far less efficient than it should be, but that's always the flipside of the tunnel-coin.
Another option would be to simply strip the DF bit on your tunnel entrance point, but that would be rather undesirable..
--
Sabri Berisha, SAB666-RIPE - I route, therefore you are
http://www.cluecentral.net - http://www.virt-ix.net
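The failure mode described above, where PMTUD works until something on the path filters ICMP, can be shown with a toy model. This Python sketch is an illustration only. The function name and the list-of-link-MTUs representation are assumptions, and it ignores timers, caching, and retransmission. The 1476 figure in the usage note assumes plain GRE over IPv4 (20-byte outer IP header plus 4-byte GRE header) on a 1500-byte link.

```python
def pmtud(link_mtus, initial_size, icmp_delivered=True, max_tries=10):
    """Return the packet size a DF-setting sender settles on, or None if it
    black-holes. link_mtus lists the MTU of each hop, in path order."""
    size = initial_size
    for _ in range(max_tries):
        # First hop whose MTU is smaller than the packet drops it (DF is set).
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size           # packet fits on every hop
        if not icmp_delivered:
            return None           # drop is silent: the sender never learns why
        size = bottleneck         # the ICMP error reports the next-hop MTU
    return None
```

With `pmtud([1500, 1476, 1500], 1500)` the sender converges on 1476. With `icmp_delivered=False` the same path returns `None`, which corresponds to a connection that hangs as soon as a full-sized packet is sent.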
Sabri Berisha wrote:
On Mon, Oct 11, 2004 at 11:12:55AM +0300, Pekka Savola wrote:
Hi Pekka and others,
Please send comments to me by the end of this week, either on- or off-list, as you deem appropriate.
With the risk of stating the obvious I would say that normally, PMTUD should do the trick.
On today's Internet everything is more reliable than PMTUD. How about replacing it completely with something more inband, less prone to firewall breakage?
On Thu, 14 Oct 2004, Joe Maimon wrote:
Sabri Berisha wrote:
On Mon, Oct 11, 2004 at 11:12:55AM +0300, Pekka Savola wrote:
Hi Pekka and others,
Please send comments to me by the end of this week, either on- or off-list, as you deem appropriate.
With the risk of stating the obvious I would say that normally, PMTUD should do the trick.
On today's Internet everything is more reliable than PMTUD.
How about replacing it completely with something more inband, less prone to firewall breakage?
You mean something like Packetization Layer Path MTU Discovery (PLPMTUD)?
http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-02.txt
http://www.psc.edu/~mathis/MTU/pmtud/
Sam
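The core idea of packetization-layer discovery is that the transport itself sends probe packets of a candidate size and treats successful delivery (not an ICMP message) as the signal. A minimal Python sketch of the search part follows. The `probe` callback is hypothetical and stands in for "send a probe of this size and report whether it was acknowledged". Plain binary search is used here for brevity and is not taken from the draft.

```python
def search_path_mtu(probe, low=576, high=1500):
    """Find the largest size in [low, high] for which probe(size) is True.

    Assumes `low` is already known to work and that delivery is monotonic:
    if a size gets through, every smaller size does too.
    """
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always progresses
        if probe(mid):
            low = mid                 # probe arrived: path MTU is at least mid
        else:
            high = mid - 1            # probe lost: assume it was too big
    return low
```

A lost probe is ambiguous (it may have been dropped by congestion, not by size), so a real implementation would retry before lowering `high`. This sketch leaves that out.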
Sam Stickland wrote:
On Thu, 14 Oct 2004, Joe Maimon wrote:
Sabri Berisha wrote:
On Mon, Oct 11, 2004 at 11:12:55AM +0300, Pekka Savola wrote:
Hi Pekka and others,
Please send comments to me by the end of this week, either on- or off-list, as you deem appropriate.
With the risk of stating the obvious I would say that normally, PMTUD should do the trick.
On today's Internet everything is more reliable than PMTUD.
How about replacing it completely with something more inband, less prone to firewall breakage?
You mean something like Packetization Layer Path MTU Discovery (PLPMTUD)?
http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-02.txt
http://www.psc.edu/~mathis/MTU/pmtud/
Sam
Thanks for raising this to the forefront. I had been aware of this I-D in a previous form; it is also referenced in the I-D linked to by the parent. It's a very ingenious mechanism to allow discovery while still delivering packets, and it looks like a big improvement over what we live with now.
Downsides of the I-D, most of which apply to the current PMTUD as well:
* It's pretty complex and needs to be re-incarnated into every L4 protocol.
* Data delivery can be interrupted pending retransmission of dropped probe packets (if not sent concurrently).
* Data packets can only be sent concurrently in different sized packets if the L4 layer supports detecting duplicate data.
* It does not operate on the layer it is meant to interrogate. IOW -- it's an L4 protocol feature concerned with L3 features.
Other ideas I mentioned, which may very well be unworkable or naive. I would appreciate any pointers to prior discussion of any of them. None of these need to set the DF bit.
* A probing mechanism that does not turn on the DF bit would not interrupt data flow with dropped probes. The protocol would need to support being informed by the remote site of the max payload size received. It can then use this as the outgoing value, or as an indication to fall back to a previous value and/or reset a timer for when to try a higher packet size again. Except for spoofing concerns this naturally belongs in the L3 protocol. A cookie option might mitigate spoofing concerns. This could be implemented in an L3 or L4 protocol. An L3 protocol implementation could leave it to the upper L4 protocol to decide whether to turn the L3 mechanism off, turn its own mechanism off, or use both.
  One gotcha: hops that optimize by fragmenting into equal-sized (or otherwise sized) packets that do not clearly correspond to the actual link MTU. An implementation would need heuristics to catch this, instead of merely using the returned value.
* A protocol dedicated completely to path MTU discovery would be a nice addition to the stack's toolbox and would be fairly useful for any protocol on the stack that does not have its own method, or for some reason cannot trust its own method's results, or just wants a second opinion. This is out-of-band enough that its operation, successful or not, should not affect the main traffic flow of interest. A UDP protocol would need to use cookie values to prevent easy spoofs. Heuristics might also be necessary.
* An IP option that, when present, triggers a new ICMP message, "Fragmented and Delivered", with frag size and link size as values. A returned cookie or packet header contents would minimize spoofs.
* The above without the new IP option.
It now occurs to me that I should take this over to the WG.......oh well. I have already written it. Sorry for the BW.
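The first idea in the list above (probe without DF and have the receiver report what actually arrived) might look like the following on the receiving side. This Python sketch is purely hypothetical: the function name, the input format (on-wire sizes of the fragments of one probe datagram), and the 8-byte threshold for spotting an even split are all assumptions, not part of any existing protocol. It also shows the "gotcha" about hops that split into equal-sized pieces.

```python
def estimate_from_fragments(sizes):
    """Estimate the path MTU from the on-wire sizes (bytes, including IP
    header) of the fragments of one probe datagram sent without DF.

    Returns (lower_bound, tight):
      lower_bound - a size known to have crossed the whole path
      tight       - True if lower_bound is probably close to the path MTU
    """
    if len(sizes) == 1:
        # Arrived unfragmented: the path MTU is at least this big, maybe more.
        return sizes[0], False
    largest = max(sizes)
    # Heuristic (an assumption): a hop that fills each fragment to its MTU
    # usually leaves a noticeably smaller tail; near-equal pieces suggest an
    # even split that says little about the real link MTU.
    even_split = largest - min(sizes) <= 8
    return largest, not even_split
```

For a 1500-byte probe crossing a 1476-byte link with ordinary fragmentation, the receiver would see pieces of 1476 and 44 bytes and report `(1476, True)`. If the hop instead split the probe into two 760-byte pieces, the report would be `(760, False)`, telling the sender to treat 760 only as a safe floor.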
On Thu, 14 Oct 2004, Sabri Berisha wrote:
for the lack of knowledge of the end-user administrators or webmasters.
... or vendors of equipment that these people use. There are plenty of vendors out there who make loadsharing-equipment for the enterprise that doesn't handle all these cases. It's just a myth that this is a simple user ignorance issue; it's a much bigger problem than that, it's a vendor ignorance issue as well.
--
Mikael Abrahamsson email: swmike@swm.pp.se
Mikael Abrahamsson wrote:
On Thu, 14 Oct 2004, Sabri Berisha wrote:
for the lack of knowledge of the end-user administrators or webmasters.
... or vendors of equipment that these people use. There are plenty of vendors out there who make loadsharing-equipment for the enterprise that doesn't handle all these cases.
It's just a myth that this is a simple user ignorance issue, it's a much bigger problem than that, it's a vendor ignorance issue as well.
Yea, we need an FDA to approve network equipment. ;-) -- Andre
On Thu, Oct 14, 2004 at 06:05:06PM +0200, Mikael Abrahamsson wrote:
On Thu, 14 Oct 2004, Sabri Berisha wrote:
for the lack of knowledge of the end-user administrators or webmasters.
... or vendors of equipment that these people use. There are plenty of vendors out there who make loadsharing-equipment for the enterprise that doesn't handle all these cases.
Unfortunately yes. In fact, I quite recently found a problem in Riverstone's SSR2000s, which just drop host-unreachables on tcp-loadbalanced connections..
However, we still need to ask the question "do we keep finding workarounds for other people's poor administration/implementation of technical solutions?"..
The technical solution for MTU problems is Path MTU Discovery. If a vendor fails to implement it, one should not buy its equipment. If an end-user breaks his own connectivity, he/she needs education.
That would be the ideal world. In the less-than-ideal world we have to find a way to defeat the cluelessness (excuse the language) of vendors and end-users.
--
Sabri Berisha, SAB666-RIPE - I route, therefore you are
http://www.cluecentral.net - http://www.virt-ix.net
participants (7)
- Andre Oppermann
- Iljitsch van Beijnum
- Joe Maimon
- Mikael Abrahamsson
- Pekka Savola
- Sabri Berisha
- Sam Stickland