On Sat, Nov 06, 2010 at 03:49:19PM -0700, George Bonser wrote:
When the TCP/IP connection is opened between the routers for a routing session, they should each send the other an MSS value that says how large a packet they can accept. You already have that information available. TCP provides that negotiation for directly connected machines.
You're proposing that routers should dynamically alter the interface MTU based on the TCP MSS value they receive from an EBGP neighbor? I barely know where to begin, but first off MSS is not MTU, it is only loosely related to MTU. MSS is affected by TCP options (window scale, SACK, MD5 authentication, etc.), and MSS between routers can be set to any value a user chooses. There is absolutely no guarantee that MSS is going to lead to a correct guess at the MTU. Also, many routers still default to having PMTUD turned off; would you suggest that they should set the physical interface MTU to 576 based on that? :) And alas, it's one hell of a layer violation too. A negotiation protocol is needed, but you could argue about where it should be for days. Maybe at the physical layer as part of auto-negotiation, maybe at the L3<->L2 layer (i.e. negotiate it per IP as part of ARP or neighbor discovery), hell, maybe even in BGP, but keying it off MSS is way over the top. :)
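To make the MSS/MTU distinction concrete, here is a small Python sketch. It assumes IPv4 with 20-byte IP and TCP base headers, the usual convention that the advertised MSS is the MTU minus those 40 bytes, and that TCP options (timestamps padded to 12 bytes, the 18-byte MD5 signature option padded to 20) come out of each segment's payload. The function names are illustrative only, and the 536-byte MSS is assumed here as the classic default for a session without PMTUD.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, base header without TCP options


def mss_for_mtu(mtu: int) -> int:
    """MSS a host would typically advertise for a given IPv4 MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def naive_mtu_guess(mss: int) -> int:
    """The inference being criticized: assume MTU = MSS + 40."""
    return mss + IPV4_HEADER + TCP_HEADER


def payload_per_segment(mss: int, option_bytes: int) -> int:
    """Data actually carried per segment once TCP options are present."""
    return mss - option_bytes


# A 9000-byte jumbo interface would advertise an MSS of 8960...
print(mss_for_mtu(9000))
# ...but a session using a 536-byte MSS makes the naive guess 576,
# regardless of what the physical interface can really carry.
print(naive_mtu_guess(536))
# Timestamps (12) plus MD5 (20) shrink the payload of a 1460-byte MSS.
print(payload_per_segment(1460, 12 + 20))
```

The point of the last line is that even the payload size is not a clean function of the MTU once options are in play.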
Again, nothing changes from the current method of operating. If I showed up at a peering switch and wanted to use a 1000-byte MTU, I would probably have some problems. The point I am making is that 1500 is a relic value that hamstrings Internet performance, and there is no good reason not to use a 9000-byte MTU at peering points (by all participants), since A: it introduces no new problems and B: I can't find a vendor of modern gear at a peering point that doesn't support it, though there may be some ancient gear in use at some peering points by some of the peers.
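For a rough sense of what a larger MTU buys on the efficiency side, the sketch below computes per-packet header overhead and the number of full-size packets needed for a bulk transfer. It assumes plain IPv4 and TCP headers with no options (40 bytes) and an arbitrary 10 MB transfer; the function names are hypothetical. Header overhead only drops from about 2.7% at 1500 bytes to about 0.4% at 9000 bytes, so most of the gain is in packet count (per-packet processing), not bytes on the wire.

```python
HEADERS = 40  # IPv4 + TCP base headers, no options (assumption)


def header_overhead(mtu: int) -> float:
    """Fraction of each full-size packet spent on IPv4 + TCP headers."""
    return HEADERS / mtu


def packets_needed(transfer_bytes: int, mtu: int) -> int:
    """Full-size packets needed to move transfer_bytes of TCP payload."""
    payload = mtu - HEADERS
    return -(-transfer_bytes // payload)  # ceiling division on integers


for mtu in (1500, 4470, 9000):
    print(mtu, f"{header_overhead(mtu):.2%}", packets_needed(10_000_000, mtu))
```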
Have you ever tried showing up to the Internet with a 1000 byte MTU? The only time that works correctly today is when you're rewriting TCP MSS values as the packet goes through the constrained link, which may be fine for the GRE tunnel to a Linux box at your house, but clearly can't work on the real Internet.
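The MSS rewriting mentioned above ("MSS clamping", as done for example by the iptables TCPMSS target) can be sketched as follows. This minimal version only walks the raw TCP options bytes of a SYN and lowers the MSS option (kind 2, length 4) so that segments fit through a constrained IPv4 link; it assumes 40 bytes of IPv4 + TCP headers and, unlike a real implementation, does not recompute the TCP checksum. The function name is hypothetical.

```python
import struct

MSS_KIND = 2  # TCP option kind for Maximum Segment Size (length 4)


def clamp_mss(options: bytes, link_mtu: int) -> bytes:
    """Lower the MSS option in a SYN's TCP options so it fits link_mtu.

    Sketch only: the TCP checksum is NOT updated here.
    """
    limit = link_mtu - 40  # IPv4 + TCP headers without options
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # NOP is a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, stop parsing
        if kind == MSS_KIND and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > limit:
                struct.pack_into("!H", out, i + 2, limit)
        i += length
    return bytes(out)


# MSS 1460, then NOP, NOP, SACK-permitted (kind 4, length 2).
syn_options = struct.pack("!BBH", 2, 4, 1460) + bytes([1, 1, 4, 2])
clamped = clamp_mss(syn_options, 1000)
print(struct.unpack_from("!H", clamped, 2)[0])
```

This only works because a box in the middle actively rewrites the SYN; an arbitrary path across the Internet offers no such cooperation.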
I cannot think of a problem that changing from 1500 to 9000 as the standard at peering points introduces. It would also speed up the
This suggests a serious lack of imagination on your part. :)
loading of the BGP routes between routers at the peering points.
It's a very, very modest increase at best.
If Joe Blow at home with a dialup connection with an MTU of 576 is talking to a server at Y! with an MTU of 10 billion, changing a peering path from 1500 to 9000 bytes somewhere in the path is not going to change that PMTU discovery one iota. It introduces no problem whatsoever. It changes nothing.
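The path MTU is simply the smallest link MTU along the path, which is why raising one hop does not move it while a narrower link remains. A minimal sketch, with a hypothetical hop list (dialup edge, 1500-byte transit, the peering hop, a jumbo-frame server LAN):

```python
from typing import Sequence


def path_mtu(link_mtus: Sequence[int]) -> int:
    """Largest packet that crosses every hop without fragmentation."""
    return min(link_mtus)


# Hypothetical path before and after raising only the peering hop to 9000.
before = [576, 1500, 1500, 1500, 9000]
after = [576, 1500, 9000, 1500, 9000]
print(path_mtu(before), path_mtu(after))
```

This models only the steady-state answer; it says nothing about whether PMTUD actually succeeds, which depends on ICMP getting back to the sender.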
You know, one very good reason for people on dialup connections to have low MTUs is serialization delay. As link speeds have gotten faster but MTUs have stayed the same, one tangible benefit is that we haven't needed fair queueing to keep big packets from significantly increasing the latency of small packets. Overall I agree with the theory of larger MTUs... Improved efficiency, being able to do page-flipping with your payload, not having to worry about screwing things up if you DO need to use a tunnel or turn on IPsec, it's all well and good... But from a practical standpoint there are still a lot of very serious issues that have not been addressed, and anyone who actually tries to do this at scale is in for a world of hurt. I for one would love to see the situation improved, but trying to gloss over it and pretend the problems don't exist just delays the day when it actually CAN be supported.
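Serialization delay is just packet size in bits divided by link rate. The sketch below (example sizes and rates chosen for illustration) shows why a 576-byte MTU mattered on a 56 kbit/s modem (about 82 ms per packet versus about 214 ms for a 1500-byte packet), while on a 1 Gbit/s link even a 9000-byte frame takes only about 0.072 ms.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000


for size, rate, label in [
    (576, 56_000, "576 B on 56 kbit/s dialup"),
    (1500, 56_000, "1500 B on 56 kbit/s dialup"),
    (1500, 1e9, "1500 B on 1 Gbit/s"),
    (9000, 1e9, "9000 B on 1 Gbit/s"),
]:
    print(f"{label}: {serialization_delay_ms(size, rate):.3f} ms")
```

A small packet queued behind one large packet waits roughly that long, which is the latency effect fair queueing would otherwise have to hide.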
That is a list of 9000-byte-clean gear. The very bottom is the stuff that doesn't support it. Of the stuff that doesn't support it, how much is connected directly to a peering point? THAT is the bottleneck.
This argument is completely destroyed at the line that says 7206VXR w/PA-GE, you don't need to read any further.
I am talking about right now. One step at a time. Removing the bottleneck at the peering points is all I am talking about. That will not change PMTU issues elsewhere and those will stand just exactly as they are today without any change. In fact it will ensure that there are *fewer* PMTU discovery issues by being able to support a larger range of packets without having to fragment them.
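For reference, this is how many fragments an IPv4 router produces when a packet exceeds the next-hop MTU. The sketch assumes a 20-byte header with no options and DF clear, and uses the rule that every fragment except the last carries a multiple of 8 payload bytes. (IPv6 routers do not fragment in transit at all.) The function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def ipv4_fragments(packet_bytes: int, mtu: int) -> int:
    """Fragments needed to forward an IPv4 packet over a link of size mtu."""
    if packet_bytes <= mtu:
        return 1
    payload = packet_bytes - IPV4_HEADER
    # Fragment offsets are in 8-byte units, so round the per-fragment
    # payload down to a multiple of 8.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-payload // per_fragment)  # ceiling division


print(ipv4_fragments(9000, 1500), ipv4_fragments(4470, 1500), ipv4_fragments(1500, 9000))
```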
The issues I listed are precisely why it doesn't work at peering points. I know this because I do a lot of peering, and I spend a lot of time dealing with getting people to peer at larger MTU values (correctly). If it was easier to do without breaking stuff, I'd be a lot more successful at it. :)
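Part of what makes larger MTUs hard on a shared peering LAN is that every participant has to agree: a frame larger than the receiver's configured MTU is dropped without any ICMP error coming back. The per-IP negotiation idea mentioned earlier in the thread is not an existing protocol, so in this sketch the neighbor MTU table is a hypothetical input (for example from out-of-band coordination), and the addresses are documentation examples.

```python
from typing import Dict, List


def per_neighbor_mtu(local_mtu: int, neighbor_mtus: Dict[str, int]) -> Dict[str, int]:
    """Largest IP packet that is safe to send to each neighbor on the LAN."""
    return {ip: min(local_mtu, mtu) for ip, mtu in neighbor_mtus.items()}


def mismatches(local_mtu: int, neighbor_mtus: Dict[str, int]) -> List[str]:
    """Neighbors whose MTU differs from ours in either direction: smaller
    means our big frames get dropped, larger means theirs to us do."""
    return sorted(ip for ip, mtu in neighbor_mtus.items() if mtu != local_mtu)


# Hypothetical peering LAN where one participant is still at 1500.
neighbors = {"192.0.2.1": 9000, "192.0.2.2": 1500, "192.0.2.3": 9000}
print(per_neighbor_mtu(9000, neighbors))
print(mismatches(9000, neighbors))
```

Today's routers generally apply a single MTU per interface rather than per neighbor, which is exactly the gap a negotiation mechanism would need to fill.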
We *already* have SONET MTU of >4000 and this hasn't broken anything since the invention of SONET.
SONET MTU works because it's on by default, it's the same size everywhere, and every piece of gear supports it. It also doesn't accomplish anything, as almost no packets flowing through your SONET links are > 1500 bytes, and if you actually tried to show up to the Internet with a PC and a 4474 byte MTU you'd have a bad time. At any rate, I'm going to stop arguing this one, as I think we've beaten this dead horse enough for one day. Please read what I said carefully, I promise you this isn't as easy as you think it is. :)

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)