On Sat, Nov 06, 2010 at 12:32:55PM -0700, George Bonser wrote:
> I doubt that 1500 is (still) widely used in our Internet... Might be, though, that most of us don't go all the way to 9k.
> Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was "no".
It would be absolutely trivial for them to enable jumbo frames; there is just no demand for them to do so, because supporting Internet-wide jumbo frames (particularly over exchange points) is highly non-scalable in practice.

It's perfectly safe to have the L2 networks in the middle support the largest MTU values possible (other than maybe triggering an obscure Force10 bug or something :P), so they could roll that out today and you probably wouldn't notice. The real issue is with the L3 networks on either end of the exchange: if the L3 routers that are trying to talk to each other don't agree precisely about their MTU values, packets are blackholed. There are no real standards for jumbo frames out there; every vendor (and in many cases each particular type/revision of hardware made by that vendor) supports a slightly different size. There is also no negotiation protocol of any kind, so the only way to make these two numbers match precisely is to have the humans on both sides talk to each other and come up with a commonly supported value.

There are two things that make this practically impossible to support at scale, even ignoring all of the grief that comes from trying to find a clueful human to talk to on the other end of your connection to a third party (which is a huge problem in and of itself):

#1. There is currently no mechanism on any major router to set multiple MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo frames over an exchange you would have to pick a single common value that EVERYONE can support. This also means you can't mix and match jumbo and non-jumbo participants over the same exchange; you essentially have to set up an entirely new exchange point (or VLAN within the same exchange) dedicated to jumbo frame support, and you still have to get a common value that everyone can support.
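
To make the "single common value" constraint in #1 concrete, here is a minimal sketch. The participant names and MTU figures are made up for illustration, and it assumes every value has already been converted to the same convention; it is not how any exchange actually provisions this.

```python
# Hypothetical participants and values, for illustration only.
def common_exchange_mtu(supported):
    """Largest MTU that every participant on a shared exchange VLAN can accept.

    `supported` maps a participant name to the largest MTU it can configure,
    with all values assumed to be expressed in the same convention.
    """
    if not supported:
        raise ValueError("need at least one participant")
    # With no per-nexthop MTU, the whole VLAN is limited by its weakest member.
    return min(supported.values())


jumbo_vlan = {"AS64500": 9178, "AS64501": 9000, "AS64502": 4470}
print(common_exchange_mtu(jumbo_vlan))

# Adding a single non-jumbo participant to the same VLAN.
mixed_vlan = dict(jumbo_vlan, AS64503=1500)
print(common_exchange_mtu(mixed_vlan))
```

The first VLAN settles on the smallest jumbo size any member supports, and adding one 1500-byte participant drags everyone back to 1500, which is why a separate jumbo VLAN or exchange would be needed.
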
Ironically, many routers (many kinds of Cisco and Juniper routers at any rate) actually DO support per-nexthop MTUs in hardware; there is just no mechanism exposed to the end user to configure those values, let alone auto-negotiate them.

#2. The major vendors can't even agree on how they represent MTU sizes, so entering the same number into routers from two different vendors can easily result in incompatible MTUs. For example, on Juniper when you type "mtu 9192", this is INCLUSIVE of the L2 header, but on Cisco the opposite is true: the L2 header is excluded. So to make a Cisco talk to a Juniper that is configured for 9192, you would have to configure mtu 9178. Except it's not even that simple, because once you start adding VLAN tagging the L2 header grows. If you now configure VLAN tagging on the interface, you've got to make the Cisco side 9174 to match the Juniper's 9192. And if you configure flexible-vlan-tagging so you can support q-in-q, you've now got to configure the Cisco side for 9170.

As an operator who DOES fully support 9k+ jumbos on every internal link in my network, and on as many external links as I can find clueful people to talk to on the other end to negotiate the correct values, let me just tell you this is a GIANT PAIN IN THE ASS.

And we're not even talking about making sure things actually work right for the end user. Your IGP may not come up at all if the MTUs are misconfigured, but EBGP certainly will, even if the two sides are actually off by a few bytes. The maximum size of a BGP message is 4096 octets, and there is no mechanism to pad a message and try to detect MTU incompatibility, so what will actually happen in real life is the end user will try to send a big jumbo frame through and find that some of their packets are randomly and silently blackholed. This would be an utter nightmare to support and diagnose.
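
For anyone who wants the Juniper/Cisco arithmetic from #2 spelled out, here is a rough sketch. It assumes a 14-byte Ethernet header and 4 bytes per VLAN tag, which is what the 9178/9174/9170 figures above imply; the function name is made up, and the behavior of any particular platform or software release should be checked rather than taken from this.

```python
ETHERNET_HEADER_BYTES = 14  # destination MAC + source MAC + EtherType
VLAN_TAG_BYTES = 4          # one 802.1Q tag


def l2_exclusive_mtu(l2_inclusive_mtu, vlan_tags=0):
    """Convert an MTU that counts the L2 header into one that does not.

    vlan_tags is 0 for untagged, 1 for single tagging, 2 for q-in-q.
    """
    if vlan_tags not in (0, 1, 2):
        raise ValueError("vlan_tags must be 0, 1 or 2")
    return l2_inclusive_mtu - ETHERNET_HEADER_BYTES - VLAN_TAG_BYTES * vlan_tags


for label, tags in (("untagged", 0), ("vlan tagging", 1), ("q-in-q", 2)):
    print(f"{label:<13} Juniper 9192 -> Cisco {l2_exclusive_mtu(9192, tags)}")
```

Every change to the tagging on one side changes the number the other side has to type, which is exactly the coordination problem described above.
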
Realistically I don't think you'll ever see even a serious attempt at jumbo frame support implemented at any kind of scale until there is a negotiation protocol and some real standards for the MTU size that must be supported, which is something that no standards body (IEEE, IETF, etc) has seemed inclined to deal with so far. Of course all of this is based on the assumption that path MTU discovery will work correctly once the MTU values ARE correctly configured on the L3 routers, which is a pretty huge assumption, given all the people who stupidly filter ICMP. Oh and even if you solved all of those problems, I could trivially DoS your router with some packets that would overload your ability to generate ICMP Unreach Needfrag messages for PMTUD, and then all your jumbo frame end users going through that router would be blackholed as well.

Great idea in theory, epic disaster in practice, at least given the mechanisms currently at our disposal. :)

--
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)