Hi,

Do transit routers in the wild actually get to do IP fragmentation these days? I was wondering whether routers actually do it, because the source usually discovers the path MTU and sends its data at the smallest MTU supported along the path. Is this true?

Even if it is, this would break for multicast IP. The source cannot determine which receivers will become interested in the traffic or what capacities the links connecting them support. So a source would send IP packets of some size, and there's a chance that one of the routers *may* have to fragment those packets before passing them on to the next router.

I would wager that vendors and operators would want to avoid IP fragmentation, since that's usually done in SW (unless you've got a very powerful ASIC or your box is NP based).

Thanks,
Glen
Glen Kent wrote:
Do transit routers in the wild actually get to do IP fragmentation these days? I was wondering if routers actually do it or not, because the source usually discovers the path MTU and sends its data with the least supported MTU. Is this true?
I believe that is only true for TCP over IPv4. UDP over IPv4 per se doesn't involve any path MTU discovery. Some UDP applications may in fact attempt MTU discovery and self-limit the size of their packets, but that's not part of the UDP protocol.

A hypothetical specific "real world" example of where very large UDP packets might occur is SNMP. An SNMP "get" or "set" operation generally has to fit inside a UDP packet, but UDP allows up to 64k bytes in the datagram. If an SNMP object value is a really long string (say 2000 bytes long), then it will exceed the typical 1500-byte MTU most Ethernet interfaces expect. So I believe fragmentation will occur at the originating system. On the other hand, some systems support Ethernet jumbograms, so I believe it is possible that a default gateway router would be the first network element forced to fragment the datagram.

IPv6 is a different (and more complex) story of course: fragmentation is only supposed to occur at the end points, even for UDP.

Quick experiment you can try if you have a Unix-like system handy: use ping (and/or ping6 or an IPv6-aware ping) and supply it with a "-s" data size parameter of, say, 2000. That makes a larger than normal packet that can't fit into a standard Ethernet frame. Use Wireshark or Ethereal to see what happens. If your Ethernet cards support jumbograms, use the mtu parameter of ifconfig and set it larger than 1500. Repeat the experiment with the large pings against both local and remote systems.
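To make the ping experiment above concrete, here is a minimal sketch of the arithmetic a sender would apply to an oversized IPv4 datagram. The function name `fragment_payload_sizes` is hypothetical, and the sketch assumes a 20-byte IPv4 header with no options and an 8-byte ICMP echo header.

```python
IPV4_HEADER = 20  # bytes; assumes no IP options
ICMP_HEADER = 8   # bytes; ICMP echo request header


def fragment_payload_sizes(ip_payload_len, mtu, header_len=IPV4_HEADER):
    """Return the payload size carried by each IPv4 fragment of one datagram."""
    if ip_payload_len + header_len <= mtu:
        return [ip_payload_len]  # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts 8-byte units.
    max_chunk = (mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    remaining = ip_payload_len
    while remaining > 0:
        chunk = min(max_chunk, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


# "ping -s 2000" over a 1500-byte MTU: 2000 data bytes plus the ICMP header.
ping_fragments = fragment_payload_sizes(2000 + ICMP_HEADER, mtu=1500)
print(ping_fragments)  # [1480, 528]
```

On the wire this corresponds to one 1500-byte packet and one 548-byte packet, which is what a capture of the experiment should show if the sending host does the fragmenting.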
Even if it is, this would break for multicast IP. The source cannot determine which receivers will become interested in the traffic or what capacities the links connecting them support. So a source would send IP packets of some size, and there's a chance that one of the routers *may* have to fragment those packets before passing them on to the next router.
I would wager that vendors and operators would want to avoid IP fragmentation, since that's usually done in SW (unless you've got a very powerful ASIC or your box is NP based).
I'm not sure how to address the above points since there appear to be some incorrect assumptions at play. It all depends on whether the Don't Fragment (DF) bit is set in IPv4 and how the source application responds to any resulting ICMP error responses (if the DF is set and one of the routes requires fragmentation).
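The dependence on the DF bit can be sketched as a small decision function. This is an illustrative model of the forwarding decision a conforming IPv4 router makes, not any vendor's implementation; the function name `forwarding_decision` and the returned tuples are hypothetical.

```python
ICMP_DEST_UNREACHABLE = 3  # ICMP type
ICMP_FRAG_NEEDED = 4       # ICMP code: fragmentation needed and DF set


def forwarding_decision(packet_len, df_set, next_hop_mtu):
    """Decide what a router does with a packet bound for a given link."""
    if packet_len <= next_hop_mtu:
        return ("forward", None)
    if df_set:
        # The packet is dropped and the sender is told the MTU of the
        # next hop, which is what path MTU discovery relies on.
        icmp = {
            "type": ICMP_DEST_UNREACHABLE,
            "code": ICMP_FRAG_NEEDED,
            "next_hop_mtu": next_hop_mtu,
        }
        return ("drop_and_send_icmp", icmp)
    return ("fragment", None)


print(forwarding_decision(1500, df_set=True, next_hop_mtu=1400)[0])
print(forwarding_decision(1500, df_set=False, next_hop_mtu=1400)[0])
```

Note that in this model the ICMP error is only generated for packets with DF set; a packet without DF is fragmented silently and the sender never learns about it.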
OK, so what happens if a transit router does not support IP fragmentation and it receives a packet which is bigger than the outgoing link's MTU? Should it simply drop the packet, or proactively send an ICMP Dest Unreachable error (Frag required) to the peer?

I understand that routers usually must send this error only when fragmentation is required and they receive a packet with the DF bit set. However, in this case the router would drop the packet (since it doesn't support fragmentation), and sending an ICMP error back to the host, warning it that its packets will get dropped, seems to be the better option.

OTOH, what do most implementations do if they send a regular IP packet and receive an ICMP dest unreachable - Fragmentation reqd message back? Do they fragment the packet and then send it out, or is the message silently ignored?

Glen
|OK, so what happens if a transit router does not support IP
|fragmentation

All IPv4 routers are supposed to support fragmentation per RFC 1812 (Router Requirements), section 4.2.2.7.

Tony
I understand, but the question is what if they don't?

Or let me rephrase the question. What do standard implementations do if they send a regular IP packet (no DF bit set) and receive an ICMP dest unreachable - Fragmentation reqd message back? Do they fragment this packet and then send it out again with the MTU reported in the ICMP error message, or is the ICMP error message silently ignored?

Glen

On 8/29/08, Tony Li <tony.li@tony.li> wrote:
|OK, so what happens if a transit router does not support IP |fragmentation
All IPv4 routers are supposed to support fragmentation per RFC 1812 (Router Requirements), section 4.2.2.7.
Tony
On Fri, 29 Aug 2008 05:44:28 +0530, Glen Kent said:
I understand, but the question is what if they don't?
If it's an alleged router, and it doesn't know how to frag a packet, it's probably so brain-damaged that it can't send a recognizable 'Frag Needed' ICMP back either. At that point, all bets are off...
What do standard implementations do if they send a regular IP packet (no DF bit set) and receive an ICMP dest unreachable - Fragmentation reqd message back? Do they fragment this packet and then send it out again with the MTU reported in the ICMP error message, or is the ICMP error message silently ignored?
A quick perusal of the current Linux 2.6 net/ipv4/icmp.c source says this:

    case ICMP_FRAG_NEEDED:
            if (ipv4_config.no_pmtu_disc) {
                    LIMIT_NETDEBUG(KERN_INFO "ICMP: " NIPQUAD_FMT ": "
                                   "fragmentation needed "
                                   "and DF set.\n",
                                   NIPQUAD(iph->daddr));
            } else {
                    info = ip_rt_frag_needed(net, iph,
                                             ntohs(icmph->un.frag.mtu),
                                             skb->dev);

In other words, if we're configured to do PMTU discovery, we cut back the MTU, and if PMTUD is disabled, we make a note in the kernel log that something odd happened and keep going. Note that it's by definition "odd", because if PMTUD is disabled, we didn't *send* a packet with the DF bit set, so any ICMP error complaining about a DF bit we didn't set is considered spurious.
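The behaviour described above can be modelled in a few lines. This is a toy restatement of the two branches, not the kernel's data structures: the class `PathMtuCache`, its fields, and the 1500-byte default are all assumptions made for illustration.

```python
MIN_IPV4_MTU = 68  # smallest MTU an IPv4 link may have


class PathMtuCache:
    """Toy per-destination path MTU cache (hypothetical, for illustration)."""

    def __init__(self, pmtu_discovery=True, default_mtu=1500):
        self.pmtu_discovery = pmtu_discovery
        self.default_mtu = default_mtu
        self.mtu_by_dest = {}
        self.log = []

    def path_mtu(self, dest):
        return self.mtu_by_dest.get(dest, self.default_mtu)

    def on_frag_needed(self, dest, reported_mtu):
        """Handle an ICMP 'fragmentation needed' error for dest."""
        if not self.pmtu_discovery:
            # We never set DF, so the error is unexpected: log and carry on.
            self.log.append(f"ICMP: {dest}: fragmentation needed and DF set.")
            return self.path_mtu(dest)
        # Only ever lower the cached value, and never below the IPv4 minimum.
        new_mtu = max(MIN_IPV4_MTU, min(self.path_mtu(dest), reported_mtu))
        self.mtu_by_dest[dest] = new_mtu
        return new_mtu


cache = PathMtuCache()
print(cache.on_frag_needed("192.0.2.1", 1400))
```

A real stack also has to cope with routers that report an MTU of zero and with ageing out stale entries; both are left out of this sketch.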
In a message written on Wed, Aug 20, 2008 at 09:43:44PM +0530, Glen Kent wrote:
Do transit routers in the wild actually get to do IP fragmentation these days? I was wondering if routers actually do it or not, because the source usually discovers the path MTU and sends its data with the least supported MTU. Is this true?
Yes. A GigE jumbo frames host (9120) to a standard POS interface (4420) to a DS3 customer (1500) happens, and the GigE->POS and POS->DS3 routers must both do fragmentation.
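As a rough illustration of that path, the sketch below pushes one full-sized packet through both MTU reductions and lists the resulting fragment payloads. It assumes the quoted figures are IP packet sizes including a 20-byte header with no options; the function names are hypothetical.

```python
def split_payload(payload_len, mtu, header_len=20):
    """Split one packet's payload so each piece fits the given MTU."""
    # Non-final fragments must carry a multiple of 8 payload bytes.
    max_chunk = (mtu - header_len) // 8 * 8
    sizes = []
    while payload_len + header_len > mtu:
        sizes.append(max_chunk)
        payload_len -= max_chunk
    sizes.append(payload_len)
    return sizes


def through_path(total_len, link_mtus, header_len=20):
    """Fragment a packet hop by hop; each router refragments what it receives."""
    fragments = [total_len - header_len]
    for mtu in link_mtus:
        next_fragments = []
        for payload in fragments:
            next_fragments.extend(split_payload(payload, mtu, header_len))
        fragments = next_fragments
    return fragments


# 9120-byte packet from the GigE host, then the POS link, then the DS3 link.
print(through_path(9120, [4420, 1500]))
# [1480, 1480, 1440, 1480, 1480, 1440, 300]
```

Under these assumptions the first router produces three fragments and the second turns them into seven, all of which must reach the destination for the original packet to be reassembled.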
I would wager that the vendors and operators would want to avoid IP fragmentation since thats usually done in SW (unless you've got a very powerful ASIC or your box is NP based).
As far as I know the "big" routers all do it in hardware with no real performance penalty, but I haven't studied it in detail.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Leo Bicknell wrote:
In a message written on Wed, Aug 20, 2008 at 09:43:44PM +0530, Glen Kent wrote:
Do transit routers in the wild actually get to do IP fragmentation these days? [...]
Yes.
A GigE jumbo frames host (9120) to a standard POS interface (4420) to a DS3 customer (1500) happens, and the GigE->POS and POS->DS3 routers must both do fragmentation.
From the application (as opposed to network operator) point of view, the big problem with fragmentation is that if you lose one fragment in transit, all the fragments eventually get discarded, even if they've made it all the way to the destination. This hurts performance and wastes resources. So you may be better off not sending those jumbo frames in the first place.

If your packet loss rate, end-to-end, is epsilon, and epsilon is so small that even several times epsilon is negligible, then maybe you don't care. But you're clearly now relying on a higher standard of performance from the network fabric than you otherwise would be.

Way back when, before my beard was gray, Sun came out with the Sun-4 servers, based on the new SPARC architecture. These were then widely deployed as NFS servers for Sun-3 desktops. The default NFS blocksize was 8K, the default (maybe only) transport was UDP. Sun-3 would make a read request, Sun-4 would send an 8K+ UDP response, which would get fragmented into a burst of 6 IP fragments, Sun-3 would get the first 3 or 4 before falling behind (this was, after all, the blistering fast 10 megabit Ethernet) and dropping a fragment. Eventually, the reassembly would time out, all the received fragments would get discarded, NFS would resend ... lather, rinse, repeat.

Setting the NFS read and write sizes to 1460 fixed this by avoiding fragmentation.

This concludes today's presentation from the history channel.

Jim Shankland
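The "several times epsilon" remark can be made concrete with a short calculation. The sketch below assumes each fragment is lost independently with the same probability, which is a simplification (the Sun-3 losses described above were bursty, not independent); the function name is hypothetical.

```python
def datagram_loss_probability(fragment_loss, fragment_count):
    """Probability that at least one of the fragments is lost.

    Losing any single fragment makes the whole datagram unusable, so the
    datagram survives only if every fragment survives.
    """
    return 1.0 - (1.0 - fragment_loss) ** fragment_count


# An 8K+ NFS response split into 6 fragments, with 1% loss per fragment.
print(round(datagram_loss_probability(0.01, 6), 4))  # 0.0585
```

For small loss rates the result is close to the fragment count times epsilon, so a six-fragment datagram sees roughly six times the loss rate of an unfragmented packet.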
On Wed, 20 Aug 2008 21:43:44 +0530, Glen Kent said:
Do transit routers in the wild actually get to do IP fragmentation these days? I was wondering if routers actually do it or not, because the source usually discovers the path MTU and sends its data with the least supported MTU. Is this true?
Hypothetically true. Unfortunately, enough places do bozo firewalling and drop the ICMP Frag Needed packets to severely limit the utility of PMTU Discovery.
On 20 aug 2008, at 20:04, Valdis.Kletnieks@vt.edu wrote:
Hypothetically true. Unfortunately, enough places do bozo firewalling and drop the ICMP Frag Needed packets to severely limit the utility of PMTU Discovery.
Yet all OSes have it enabled and there is no fallback to fragmentation in PMTUD: if your system doesn't get the ICMP messages, your session is dead in the water.
Iljitsch van Beijnum wrote:
On 20 aug 2008, at 20:04, Valdis.Kletnieks@vt.edu wrote:
Hypothetically true. Unfortunately, enough places do bozo firewalling and drop the ICMP Frag Needed packets to severely limit the utility of PMTU Discovery.
Yet all OSes have it enabled and there is no fallback to fragmentation in PMTUD: if your system doesn't get the ICMP messages, your session is dead in the water.
Windows Vista/2007 has black hole detection enabled by default. It's not massively elegant, but it will keep sessions up (falls back to 536 byte MTU). http://support.microsoft.com/kb/925280 Sam
At 07:07 p.m. 20/08/2008, Sam Stickland wrote:
Yet all OSes have it enabled and there is no fallback to fragmentation in PMTUD: if your system doesn't get the ICMP messages, your session is dead in the water. Windows Vista/2007 has black hole detection enabled by default. It's not massively elegant, but it will keep sessions up (falls back to 536 byte MTU).
IPv4 minimum MTU is 68 bytes, not 536. The minimum reassembly buffer size is 576 bytes; 536 is the default TCP MSS derived from it (576 minus 40 bytes of IP and TCP headers). Falling back to 536-byte packets does not guarantee that sessions will be kept up.

Kind regards,

--
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@acm.org
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1
On 25 aug 2008, at 12:27, Fernando Gont wrote:
IPv4 minimum MTU is 68 bytes,
That's kind of like "a human being can live without food for four to six weeks". It's not a recommendation.
576 is the minimum fragment re-assembly buffer size (536 is the matching default TCP MSS). Falling back to 536-byte packets does not guarantee that sessions will be kept up.
But: "PMTU black hole router detection is triggered on a TCP connection when TCP starts retransmitting full-sized segments with the DF flag set. TCP resets the PMTU for the connection to 536 bytes. Then, TCP retransmits its segments when the DF flag is clear."
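The quoted behaviour can be read as a small state machine. The sketch below is only an interpretation of that description, not Microsoft's implementation: the class name, the retransmission threshold of 2, and the reset on ACK are assumptions made for illustration.

```python
FALLBACK_PMTU = 536  # value given in the quoted description


class BlackHoleDetector:
    """Toy model of PMTU black hole detection on one TCP connection."""

    def __init__(self, pmtu, retransmit_threshold=2):
        self.pmtu = pmtu
        self.df = True  # PMTUD normally sends with DF set
        self.retransmit_threshold = retransmit_threshold  # assumed value
        self.full_sized_retransmits = 0

    def on_retransmit(self, packet_len):
        """Call whenever a packet of packet_len bytes is retransmitted."""
        if self.df and packet_len >= self.pmtu:
            self.full_sized_retransmits += 1
            if self.full_sized_retransmits >= self.retransmit_threshold:
                # Assume the ICMP errors are being dropped somewhere:
                # shrink the PMTU and allow routers to fragment.
                self.pmtu = FALLBACK_PMTU
                self.df = False

    def on_ack(self):
        """Forward progress: full-sized packets are getting through."""
        self.full_sized_retransmits = 0


conn = BlackHoleDetector(pmtu=1500)
conn.on_retransmit(1500)
conn.on_retransmit(1500)
print(conn.pmtu, conn.df)  # 536 False
```

Clearing DF is what lets the session survive even if 536 bytes is still too large for some link, since routers on the path may then fragment.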
Sam Stickland writes:
Iljitsch van Beijnum wrote:
Yet all OSes have it enabled and there is no fallback to fragmentation in PMTUD: if your system doesn't get the ICMP messages, your session is dead in the water.
Windows Vista/2007 has black hole detection enabled by default. It's not massively elegant, but it will keep sessions up (falls back to 536 byte MTU).
Note that there's a new IETF specification (RFC 4821) for ("Packetization Layer") Path MTU discovery, which doesn't rely on ICMP messages to work. If what I wrote here http://kb.pert.geant2.net/PERTKB/PathMTU is correct, this has been implemented in recent (>= 2.6.17) Linux kernels. I don't know of any other OSes that have this yet - not that they'd tell me (but they could go and edit the page above, that's why it's a Wiki). -- Simon.
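The core idea of discovering the path MTU without ICMP is to send probes of different sizes and treat an acknowledgement as success and silence as failure. The sketch below shows that idea as a plain binary search; it is a simplification and not the search procedure specified in RFC 4821. The callback `probe_ok` and the default bounds are hypothetical.

```python
def probe_path_mtu(probe_ok, low=68, high=9000):
    """Find the largest packet size that gets through, using probes only.

    probe_ok(size) must return True if a probe packet of that size was
    acknowledged by the far end, and False if it was lost.
    """
    if not probe_ok(low):
        raise RuntimeError("even the smallest probe was lost")
    # Invariant: a probe of size `low` is known to get through.
    while low < high:
        mid = (low + high + 1) // 2
        if probe_ok(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Simulated path whose smallest link MTU is 1492 bytes.
print(probe_path_mtu(lambda size: size <= 1492))  # 1492
```

Because no ICMP message is involved, this approach keeps working behind firewalls that drop Frag Needed errors, at the cost of a few deliberately lost probe packets. A real implementation also has to distinguish probe loss from ordinary congestion loss.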
On 2008/08/20 08:04 PM Valdis.Kletnieks@vt.edu wrote:
On Wed, 20 Aug 2008 21:43:44 +0530, Glen Kent said:
Do transit routers in the wild actually get to do IP fragmentation these days? I was wondering if routers actually do it or not, because the source usually discovers the path MTU and sends its data with the least supported MTU. Is this true?
Hypothetically true. Unfortunately, enough places do bozo firewalling and drop the ICMP Frag Needed packets to severely limit the utility of PMTU Discovery.
Well obviously, ICMP is only used by hackers to DDoS you. Everyone knows that, especially all the banks. It's even more important to obliterate PMTU discovery when you're using HTTPS - for security, you know. Sorry, I spent the better part of today bashing my head against the wall trying to fix MSS and PMTU issues somewhere which was being aggravated by the tragic programming of Linux l2tpns package...
Glen, With the v4 networks that I have worked on in the past, they did not do end to end MTU discovery before sending packets. The TTL had to be set appropriately so that if you had low speed links, for example, the packet and response would get through in time. On our DS3 (T3) and OC-3c packet links we did 4k, 9k, and 16k packet sizes for video and file transfers. At the other end of the spectrum are civilian and military systems with tactical links, both wired and radio, with low bit rates and header compression on IP and TCP packets. Speeds range from 300 -9,600 bps, 16k, 32k, 64k and Nx64k bps links that can do packet fragmentation and adding proprietary ECC codes for the radio links. Some systems strip the IP packet and use standard or non-standard link layer protocols across the mediums. Some of these systems are store and forward so that the computer/router that is connected to the low speed link will ack the packet for the high speed network connection and buffer it up until it can be sent on the lower speed system. IMHO current IPv6 protocols ignore the lower end segment by specifying the lowest MTU for the circuit be the MTU for the entire circuit and not allow fragmentation. I do not see this as an efficient use of high speed network resources and local link management can handle fragmentation just fine.
John (ISDN) Lee
A slightly different History Channel.
The "network" may not but the end hosts may try. Many client operating systems perform PMTU by default. Some also do blackhole probing that can also change the MTU.
--
Tim Sanderson, network administrator
tims@donet.com
participants (13)
- Colin Alston
- Fernando Gont
- Glen Kent
- Iljitsch van Beijnum
- Jim Logajan
- Jim Shankland
- John Lee
- Leo Bicknell
- Sam Stickland
- Simon Leinen
- Tim Sanderson
- Tony Li
- Valdis.Kletnieks@vt.edu