On Mon, 17 Jul 2000, Mikael Abrahamsson wrote:
We had a discussion here a while back about exchange point media. The outcome was that Gigabit Ethernet vendors do support jumbo frames, so the MTU disadvantage GE has could be overcome.
Now, imagine the following scenario:
We connect a router (router1) to this fictitious exchange point running (Gig)Ethernet. This router supports jumbo frames and has an 8k MTU.
Somewhere else on the exchange point is another router (router2), also connected to the same broadcast domain. This router does NOT support jumbo frames and has the standard 1500-byte MTU.
What happens if router1 sends a packet to router2, which has a 1500 MTU? router1 thinks it is perfectly valid to send an 8k packet. (PMTUD won't help here; we're talking layer 2.)
Correct: silent L2 discard as a giant frame...
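To make the "giant frame" point concrete, here is a minimal sketch of the size check a 1500-MTU receiver effectively applies. It assumes untagged Ethernet II frames (14-byte header plus 4-byte FCS, no 802.1Q tag) and takes "8k" to mean 8192 bytes; the function names are hypothetical, and real NICs and switch ports differ in their exact giant threshold.

```python
# Untagged Ethernet II overhead (assumption: no 802.1Q tag):
# 14-byte header (dst MAC, src MAC, EtherType) + 4-byte FCS.
ETH_HEADER = 14
ETH_FCS = 4

def frame_size(ip_packet_len: int) -> int:
    """Frame size on the wire, excluding preamble and inter-frame gap."""
    return ip_packet_len + ETH_HEADER + ETH_FCS

def receiver_accepts(ip_packet_len: int, receiver_mtu: int) -> bool:
    """A port configured for receiver_mtu takes frames up to
    frame_size(receiver_mtu); anything larger counts as a giant and is
    discarded without any notification to the sender."""
    return frame_size(ip_packet_len) <= frame_size(receiver_mtu)

ROUTER2_MTU = 1500
for size in (1500, 8192):  # "8k" taken as 8192 bytes (assumption)
    verdict = "delivered" if receiver_accepts(size, ROUTER2_MTU) else "silently dropped (giant)"
    print(f"{size}-byte IP packet -> {frame_size(size)}-byte frame: {verdict}")
```

The 1500-byte packet becomes a 1518-byte frame and is accepted; the 8192-byte packet becomes an 8210-byte frame and is dropped. Nothing at layer 2 tells router1 why, which is why PMTUD never gets a chance to react.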
My other guess is that if the switch in between (we're probably not talking about point-to-point links here, since this is an exchange point, right?) is layer-3-aware (as most are today), it could/would fragment the packet or send a need-to-fragment ICMP message back to the originating IP. Does any switch do this today? Which vendors? (I have been told that the old DEC Gigaswitches do this between FDDI and Fast Ethernet: they fragment the IP packet if necessary.)
A Foundry BigIron doing L3 should, exactly as if it were a router and not a switch, I believe. At that point, however, there is no real technical distinction between it and a router with lots of Ethernet ports. I'm not aware of any exchanges doing L3...
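For reference, this is roughly what an L3-aware box has to do with router1's oversized packet under standard IPv4 rules: either split it into fragments or, if the Don't Fragment bit is set, refuse and report "fragmentation needed" (ICMP type 3, code 4). The sketch below assumes a 20-byte IPv4 header with no options and an 8192-byte datagram; `fragment` is a hypothetical helper, not any vendor's implementation.

```python
IP_HEADER = 20  # IPv4 header without options (assumption)

def fragment(total_len: int, mtu: int, df: bool = False):
    """Split an IPv4 datagram of total_len bytes (header included) for a
    link with the given MTU. Returns (offset_in_8_byte_units, payload_len,
    more_fragments) tuples, one per fragment."""
    if total_len <= mtu:
        return [(0, total_len - IP_HEADER, False)]
    if df:
        # Instead of fragmenting, a router would send ICMP Destination
        # Unreachable, type 3 code 4, carrying the next-hop MTU.
        raise ValueError(f"ICMP 3/4: fragmentation needed, next-hop MTU {mtu}")
    payload = total_len - IP_HEADER
    # Fragment Offset counts 8-byte units, so every fragment except the
    # last must carry a multiple of 8 payload bytes.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        frags.append((offset // 8, chunk, offset + chunk < payload))
        offset += chunk
    return frags

for off, length, more in fragment(8192, 1500):
    print(f"offset={off:4d} (x8 bytes) payload={length:4d} MF={more}")
```

An 8192-byte datagram comes out as six fragments: five carrying 1480 payload bytes and a last one carrying 772. That per-packet work is exactly what makes an L3 exchange fabric look like a router rather than a switch.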
A third solution: I think I saw somewhere that some OSes let you set host routes with a per-destination MTU. This could also solve the problem. You would configure the switches for jumbo frames and set the default MTU on the routers to 1500; parties that support jumbo frames could say so in their peering announcements/agreements, and when both sides of a peering support them, their equipment could use the larger frames when talking to each other over the shared medium.
FreeBSD lets you set the MTU per route... You could use that to enable a larger MTU for specific targets, I suppose. I'm not aware of anyone doing this (or of anyone likely to, especially at L2, without a good reason). It also assumes the exchange point's switch can carry jumbo frames.
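The per-host-MTU idea boils down to a lookup: a conservative default for the shared LAN, with overrides only for peers that have agreed to jumbo frames. This is a hypothetical sketch of that policy, not how any OS stores route MTUs; the addresses are RFC 5737 documentation examples and 8192 stands in for "8k".

```python
import ipaddress

DEFAULT_MTU = 1500  # safe default for everyone on the exchange LAN

# Hypothetical overrides, e.g. filled in from peering agreements.
# Both sides must configure the same value; this table covers one side only.
HOST_MTU = {
    ipaddress.ip_address("192.0.2.10"): 8192,
    ipaddress.ip_address("192.0.2.20"): 8192,
}

def mtu_for(next_hop: str) -> int:
    """MTU to use when sending to a next hop on the exchange LAN."""
    return HOST_MTU.get(ipaddress.ip_address(next_hop), DEFAULT_MTU)

print(mtu_for("192.0.2.10"))  # a jumbo-capable peer
print(mtu_for("192.0.2.30"))  # anyone else falls back to the default
```

Defaulting to 1500 means a missing or wrong entry costs only efficiency, never reachability; the failure mode described earlier (silent giant drops) only appears if an override is set for a peer that cannot take it.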
Another option would be to pick up the other side's MTU from the TCP session used for the (very probable) BGP peering. I seem to remember that TCP involves an MTU negotiation between the endpoints, which would mean you implicitly learn the MTU of all your peers (the ones you might send packets to). Does any vendor do a "hack" like this? This would not work with a default MTU of 1500, though; you would instead need a default MTU of 8k (or so) and find out which peers are not jumbo-capable via the TCP sessions of the BGP peerings.
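Strictly, TCP does not negotiate an MTU: each end announces a Maximum Segment Size (MSS) option in its SYN, usually derived from its own interface MTU, and the sender uses the smaller of its own limit and the peer's announced value. A minimal sketch of that arithmetic, assuming IPv4 and TCP headers without options (20 bytes each); the helper names are hypothetical.

```python
IPV4_HEADER = 20  # no IP options (assumption)
TCP_HEADER = 20   # no TCP options (assumption)

def mss_for_mtu(mtu: int) -> int:
    """MSS a host typically advertises: interface MTU minus IP and TCP headers."""
    return mtu - IPV4_HEADER - TCP_HEADER

def effective_segment(local_mtu: int, peer_advertised_mss: int) -> int:
    """Largest TCP payload actually sent on the session: the smaller of our
    own limit and the MSS the peer advertised in its SYN."""
    return min(mss_for_mtu(local_mtu), peer_advertised_mss)

# router1 (8k MTU) peers with router2 (1500 MTU) over BGP.
print(mss_for_mtu(1500))                         # what router2 would advertise
print(effective_segment(8192, mss_for_mtu(1500)))
```

Note that the MSS only bounds segments on that one TCP session; it says nothing by itself about other traffic forwarded to the peer, and a peer can advertise a lower MSS than its interface MTU allows.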
The TCP MSS is negotiated based on the MTU, so you cannot base the MTU on the MSS; that's circular logic. I highly doubt you will ever get jumbo frame support auto-negotiated without first standardizing jumbo frames. I for one would love to see an intelligent standard that recognizes 1500 as a remarkably stupid and limiting number, and that lets us bring new life to public exchange point peering.

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)