Iljitsch van Beijnum wrote:
Dear NANOGers,
It irks me that today, the effective MTU of the internet is 1500 bytes, while more and more equipment can handle bigger packets.
What do you guys think about a mechanism that allows hosts and routers on a subnet to automatically discover the MTU they can use towards other systems on the same subnet, so that:
1. It's no longer necessary to limit the subnet MTU to that of the least capable system
2. It's no longer necessary to manage 1500 byte+ MTUs manually
Any additional issues that such a mechanism would have to address?
I have a half completed, prototype "mtud" that runs under Linux. It sets the interface to 9k, but sets the route for the subnet down to 1500. It then watches the arp table for new arp entries. As a new MAC is added, it sends a 9k UDP datagram to that host and listens for an ICMP port unreachable reply (like traceroute does). If the error arrives, it assumes that host can receive packets that large, and adds a host route with the larger MTU to that host. It steps up the mtu's from 1500 to 16k trying to rapidly increase the MTU without having to wait for annoying timeouts. If anything goes wrong somewhere along the way, (a host is firewalled or whatever) then it won't receive the ICMP reply, and won't raise the MTU. The idea is that you can run this on routers/servers on a network that has 9k mtu's but not all the hosts are assured to be 9k capable, and it will increase correctly detect the available MTU between servers, or routers, but still be able to correctly talk to machines that are still stuck with 1500 byte mtu's etc. In other interesting data points in this field, for some reason a while ago we had reason to do some throughput tests under Linux with varying the MTU using e1000's and ended up with this pretty graph: http://wand.net.nz/~perry/mtu.png we never had the time to investigate exactly what was going on, but interestingly at 8k MTU's (which is presumably what NFS would use), performance is exceptionally poor compared to 9k and 1500 byte MTU's. Our (untested) hypothesis is that the Linux kernel driver isn't smart about how it allocates it's buffers.