On Mon, 17 Jul 2000, Mikael Abrahamsson wrote:
We had a discussion here a while back about exchange point media. The outcome was that Gigabit ethernet vendors do support jumbo frames and that the MTU disadvantage GE has could be overcome.
Now, imagine the following scenario:
We connect a router (router1) to this fictitious exchange point running (gig)ethernet. This router does support jumbo frames and has an 8k MTU.
Somewhere else on the exchange point is another router (router2), also connected to the same broadcast domain. This router does NOT support jumbo frames but has the standard 1500 MTU.
What happens if router1 tries to send a packet to router2, which has a 1500 MTU? Router1 thinks it's perfectly valid to send an 8k packet. (PMTUd won't work here, we're talking layer2).
Correct, Silent L2 discard, giant frame...
My other guess is that if the switch in between (we're probably not talking point-to-point links here because this is an exchange point, right?) is layer3-aware (as most are today), it could/would fragment the packet or send a needtofrag-ICMP to the originator IP. Will any switch today do this? What vendors do this? (I have been told that the old DEC Gigaswitches will do this between FDDI and FastEth: they will fragment the IP packet if necessary).
A Foundry BigIron doing L3 should, exactly as if it was a router and not a switch, I believe. At that point there is no real technical distinction between it and a router with lots of ethernet ports however. I'm not aware of any exchanges doing L3...
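To make the L3 behaviour discussed above concrete, here is a minimal sketch of the decision an L3-aware hop would face for an oversized IPv4 packet: forward it, fragment it, or return a needtofrag-ICMP. The function name `handle_oversized` and the fixed 20-byte header are assumptions for illustration, not any vendor's implementation.

```python
IPV4_HEADER_LEN = 20  # bytes; assumes an IPv4 header without options


def handle_oversized(packet_len, dont_fragment, egress_mtu):
    """Return (action, fragment_sizes) for an IPv4 packet of packet_len bytes.

    Hypothetical helper: models what a router, or a switch acting at L3,
    would do when the egress MTU is smaller than the packet.
    """
    if packet_len <= egress_mtu:
        return ("forward", [packet_len])
    if dont_fragment:
        # ICMP type 3 code 4: fragmentation needed and DF set.
        return ("icmp_frag_needed", [])
    # Every fragment except the last carries a payload that is a multiple of 8.
    max_payload = (egress_mtu - IPV4_HEADER_LEN) // 8 * 8
    remaining = packet_len - IPV4_HEADER_LEN
    sizes = []
    while remaining > 0:
        chunk = min(max_payload, remaining)
        sizes.append(chunk + IPV4_HEADER_LEN)
        remaining -= chunk
    return ("fragment", sizes)


if __name__ == "__main__":
    # An 8000-byte packet from router1 heading to a 1500-MTU port.
    print(handle_oversized(8000, False, 1500))
    print(handle_oversized(8000, True, 1500))
```

A pure L2 switch has neither option available, which is why the frame is simply discarded in the scenario above.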
A third solution: I think I saw somewhere that some OSes support host routes where you can set the MTU for specific IPs. This could also rectify the problem: configure the switches for jumbo frames and set the default MTU to 1500 on the routers. People who support jumbo frames could then include this in their peering announcements/agreements, and if two parties both support them, their equipment could use the larger frames when talking to each other over this shared medium.
FreeBSD lets you set the MTU based on the route... You could do something like this, enabling a larger MTU for specific targets, I suppose. I'm not aware of anyone who is doing this (or probably anyone who would, especially at L2, without a good reason). This assumes the exchange point has a switch capable of it.
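The per-destination idea can be sketched as a small lookup table: a default MTU of 1500 for the shared medium, with overrides for peers that have agreed to jumbo frames. The class name `PeerMtuTable` and the 192.0.2.x addresses are hypothetical, and this models only the lookup, not how any particular OS stores route MTUs.

```python
import ipaddress

DEFAULT_MTU = 1500  # safe default for every peer on the shared medium


class PeerMtuTable:
    """Hypothetical per-next-hop MTU table with a conservative default."""

    def __init__(self, default_mtu=DEFAULT_MTU):
        self.default_mtu = default_mtu
        self._overrides = {}

    def set_peer_mtu(self, peer, mtu):
        # Record a larger MTU only for a peer that has agreed to jumbo frames.
        self._overrides[ipaddress.ip_address(peer)] = mtu

    def mtu_for(self, next_hop):
        # Fall back to the default for any peer without an explicit entry.
        return self._overrides.get(ipaddress.ip_address(next_hop), self.default_mtu)


if __name__ == "__main__":
    table = PeerMtuTable()
    table.set_peer_mtu("192.0.2.10", 8000)   # jumbo-capable peer (assumed)
    print(table.mtu_for("192.0.2.10"))
    print(table.mtu_for("192.0.2.20"))       # no agreement, so default applies
```

Defaulting to 1500 means a missing or stale entry costs efficiency rather than causing silent drops.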
Another option would be to pick the other unit's MTU off of the TCP session used for the (very probable) BGP peering. I seem to remember that TCP involves an MTU negotiation between endpoints, and that would mean that you implicitly get to know the MTU of all your peers (which are the ones you might send packets to). Do any vendors do a "hack" like this? This would not work if the default MTU is 1500 though; it would rather mean you have to have a default MTU of 8k (or so) and find out who is not jumbo capable via the TCP session involved with the BGP peering.
The TCP MSS is negotiated based off the MTU, so you cannot base the MTU off the MSS: circular logic. I highly doubt you will ever get support for jumbo frames auto-negotiated without first standardizing the jumbo frames. I for one would love to see an intelligent standard realizing that 1500 is a remarkably stupid and limiting number, and enabling us to bring new life to public exchange point peering.
--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
On Mon, 17 Jul 2000, Richard A. Steenbergen wrote:
A Foundry BigIron doing L3 should, exactly as if it was a router and not a switch, I believe. At that point there is no real technical distinction between it and a router with lots of ethernet ports however. I'm not aware of any exchanges doing L3...
Well, the device should do no routing in the classical meaning of the word, not on L3 anyway. To do fragmentation it has to be semi-L3-aware though. It also needs an IP address to send the needtofrag-ICMPs from.
FreeBSD lets you set the MTU based on the route... You could do something like this, enabling a larger MTU for specific targets, I suppose. I'm not aware of anyone who is doing this (or probably anyone who would, especially at L2, without a good reason). This assumes the exchange point has a switch capable of it.
We're talking L3 here (routers). Normally the L3 MTU is derived from the L2 MTU, here we would need to derive it from either static configuration or from the below MSS/MTU mechanism (which I don't think will happen as it has too much of a "hack" in it).
The TCP MSS is negotiated based off the MTU, so you cannot base the MTU off the MSS: circular logic. I highly doubt you will ever get support for jumbo frames auto-negotiated without first standardizing the jumbo frames.
Yes, it can. Router1 should be able to figure out router2's MTU from the MSS of its TCP session with router2. Router2 has no problem here since its MTU is the lowest one anyway. I was under the impression that there is nothing magical about jumbo frames and that there are no interoperability problems with them as long as they're supported at all. Please correct me if I am wrong.
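The arithmetic behind this suggestion can be sketched as follows, assuming IPv4 and TCP headers of 20 bytes each with no options. The function names are hypothetical. The peer's advertised MSS may also have been configured lower than its MTU allows, so the inferred value is only a conservative estimate.

```python
IPV4_HEADER_LEN = 20  # bytes, no IP options (assumption)
TCP_HEADER_LEN = 20   # bytes, no TCP options (assumption)


def mss_from_mtu(mtu):
    """MSS a host would typically advertise for a given interface MTU."""
    return mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN


def usable_mtu_toward_peer(peer_advertised_mss, local_mtu):
    """Infer the peer's MTU from its advertised MSS, capped by our own MTU."""
    inferred_peer_mtu = peer_advertised_mss + IPV4_HEADER_LEN + TCP_HEADER_LEN
    return min(inferred_peer_mtu, local_mtu)


if __name__ == "__main__":
    # router2 (1500 MTU) advertises this MSS on the BGP session.
    router2_mss = mss_from_mtu(1500)
    print(router2_mss)
    # router1 (8000 MTU) infers what it can safely send to router2.
    print(usable_mtu_toward_peer(router2_mss, 8000))
```

Each side advertises its MSS from its own MTU, so reading the peer's value does not depend on the local one; only the final cap does.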
I for one would love to see an intelligent standard realizing that 1500 is a remarkably stupid and limiting number, and enabling us to bring new life to public exchange point peering.
I think any new exchange point technology needs to have an MTU > 1500.
--
Mikael Abrahamsson email: swmike@swm.pp.se