Glen Kent wrote:
Do transit routers in the wild actually get to do IP fragmentation these days? I was wondering if routers actually do it or not, because the source usually discovers the path MTU and sends its data sized to the smallest MTU supported along the path. Is this true?
I believe that is only true for TCP over IPv4. UDP over IPv4 per se doesn't involve any path MTU discovery. Some UDP applications may in fact attempt MTU discovery and self-limit the size of their packets, but that's not part of the UDP protocol.
A hypothetical "real world" example of where very large UDP packets might occur is SNMP. An SNMP "get" or "set" operation generally has to fit inside a single UDP datagram, and UDP allows datagrams of up to roughly 64k bytes. If an SNMP object value is a really long string (say 2000 bytes long), the datagram will exceed the typical 1500-byte MTU most Ethernet interfaces expect, so I believe fragmentation will occur at the originating system. On the other hand, some systems support Ethernet jumbo frames, so I believe it is possible that a default gateway router would be the first network element forced to fragment the datagram.
IPv6 is a different (and more complex) story of course - fragmentation is only supposed to occur at the end points - even for UDP.
Quick experiment you can try if you have a Unix-like system handy: use ping (and/or ping6 or an IPv6 aware ping) and supply it with a "-s" data size parameter of, say, 2000. That makes a larger than normal packet that can't fit into a standard Ethernet frame. Use wireshark or ethereal to see what happens. If your Ethernet cards support jumbo frames, use the mtu parameter of ifconfig to set the MTU larger than 1500. Repeat the experiment with the large pings against both local and remote systems.
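Before capturing anything, it may help to work out roughly what the 2000-byte ping should look like on a 1500-byte MTU link. The following is only a minimal Python sketch of the IPv4 fragmentation arithmetic, not what any particular stack does internally. The function name fragment_sizes is made up for illustration, and it assumes a 20-byte IPv4 header with no options and an 8-byte ICMP echo header.

```python
def fragment_sizes(payload_len, mtu, ip_header_len=20):
    """Split an IPv4 payload into (offset, length, more_fragments) tuples.

    Hypothetical helper for illustration only. Assumes every fragment
    carries an IPv4 header of ip_header_len bytes (20 = no options).
    """
    # Fragment offsets are expressed in 8-byte units, so every fragment
    # except the last must carry a multiple of 8 bytes of data.
    max_data = (mtu - ip_header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")

    # A payload that fits in one packet is sent unfragmented.
    if payload_len <= mtu - ip_header_len:
        return [(0, payload_len, False)]

    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset, length, more))
        offset += length
    return fragments


# "ping -s 2000" carries 2000 data bytes plus an 8-byte ICMP header.
ICMP_HEADER_LEN = 8
for offset, length, more in fragment_sizes(2000 + ICMP_HEADER_LEN, 1500):
    print(f"offset={offset} data={length} more_fragments={more}")
```

Under those assumptions the capture should show two fragments, carrying 1480 and 528 bytes of IP payload. With the MTU raised above 2028 on a jumbo-capable link, the same call returns a single unfragmented packet.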
Even if this is true, it would break for multicast IP. The source cannot determine which receivers will become interested in the traffic, or what capacities the links connecting them will support. So a source would send IP packets of some size, and there's a chance that one of the routers *may* have to fragment those IP packets before passing them on to the next router.
I would wager that vendors and operators would want to avoid IP fragmentation, since that's usually done in SW (unless you've got a very powerful ASIC or your box is NP based).
I'm not sure how to address the above points since there appear to be some incorrect assumptions at play. It all depends on whether the Don't Fragment (DF) bit is set in the IPv4 header and how the source application responds to any resulting ICMP error messages (sent if the DF bit is set and one of the links along the route requires fragmentation).
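The DF-dependent behaviour can be sketched as a small decision function. This is a simplified Python model of the choice an IPv4 router faces for one packet, not any vendor's forwarding code. The function name and the returned labels are made up for illustration.

```python
def forward_decision(packet_len, df_set, next_hop_mtu):
    """Model what an IPv4 router does with one packet on one outgoing link.

    Hypothetical helper for illustration only; the labels are not
    taken from any real router implementation.
    """
    if packet_len <= next_hop_mtu:
        # The packet fits on the outgoing link as-is.
        return "forward"
    if df_set:
        # Too big and DF is set: drop it and send ICMP Destination
        # Unreachable, type 3 code 4 ("fragmentation needed and DF set").
        return "drop_and_send_icmp_frag_needed"
    # Too big and DF is clear: the router fragments the packet itself.
    return "fragment"


# A 2028-byte packet (the 2000-byte ping) reaching a 1500-byte MTU link:
print(forward_decision(2028, df_set=False, next_hop_mtu=1500))
print(forward_decision(2028, df_set=True, next_hop_mtu=1500))
```

Whether a source ever learns about the smaller MTU then depends on whether that ICMP message makes it back and whether the application or its stack acts on it.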