Re: RINA - scott whaps at the nanog hornets nest :-)

6 Nov 2010

      On Sat, Nov 06, 2010 at 12:32:55PM -0700, George Bonser wrote:
...
...
I doubt that 1500 is (still) widely used in our Internet... Might be,
though, that most of us don't go all the way to 9k.
Last week I asked the operator of fairly major public peering points 
if they supported anything larger than 1500 MTU.  The answer was "no".
It would be absolutely trivial for them to enable jumbo frames, there is 
just no demand for them to do so, as supporting Internet wide jumbo 
frames (particularly over exchange points) is highly non-scalable in 
practice.

It's perfectly safe to have the L2 networks in the middle support the 
largest MTU values possible (other than maybe triggering an obscure 
Force10 bug or something :P), so they could roll that out today and you 
probably wouldn't notice. The real issue is with the L3 networks on 
either end of the exchange, since if the L3 routers that are trying to 
talk to each other don't agree about their MTU valus precisely, packets 
are blackholed. There are no real standards for jumbo frames out there, 
every vendor (and in many cases particular type/revision of hardware 
made by that vendor) supports a slightly different size. There is also 
no negotiation protocol of any kind, so the only way to make these two 
numbers match precisely is to have the humans on both sides talk to each 
other and come up with a commonly supported value.

There are two things that make this practically impossible to support at 
scale, even ignoring all of the grief that comes from trying to find a 
clueful human to talk to on the other end of your connection to a third 
party (which is a huge problem in and of itself):

#1. There is currently no mechanism on any major router to set multiple 
MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo frames 
over an exchange you would have to pick a single common value that 
EVERYONE can support. This also means you can't mix and match jumbo and 
non-jumbo participants over the same exchange, you essentially have to 
set up an entirely new exchange point (or vlan within the same exchange) 
dedicated to the jumbo frame support, and you still have to get a common 
value that everyone can support. Ironically many routers (many kinds of 
Cisco and Juniper routers at any rate) actually DO support per-nexthop 
MTUs in hardware, there is just no mechanism exposed to the end user to 
configure those values, let alone auto-negotiate them.

#2. The major vendors can't even agree on how they represent MTU sizes,
so entering the same # into routers from two different vendors can 
easily result in incompatible MTUs. For example, on Juniper when you 
type "mtu 9192", this is INCLUSIVE of the L2 header, but on Cisco the 
opposite is true. So to make a Cisco talk to a Juniper that is 
configured 9192, you would have to configure mtu 9178. Except it's not 
even that simple, because now if you start adding vlan tagging the L2 
header size is growing. If you now configure vlan tagging on the 
interface, you've got to make the Cisco side 9174 to match the Juniper's 
9192. And if you configure flexible-vlan-tagging so you can support 
q-in-q, you've now got to configure to Cisco side for 9170.

As an operator who DOES fully support 9k+ jumbos on every internal link 
in my network, and as many external links as I can find clueful people 
to talk to on the other end to negotiate the correct values, let me just 
tell you this is a GIANT PAIN IN THE ASS. And we're not even talking 
about making sure things actually work right for the end user. Your IGP 
may not come up at all if the MTUs are misconfigured, but EBGP certainly 
will, even if the two sides are actually off by a few bytes. The maximum 
size of a BGP message is 4096 octets, and there is no mechanism to pad a 
message and try to detect MTU incompatibility, so what will actually 
happen in real life is the end user will try to send a big jumbo frame 
through and find that some of their packets are randomly and silently 
blackholed. This would be an utter nightmare to support and diagnose.

Realistically I don't think you'll ever see even a serious attempt at 
jumbo frame support implemented in any kind of scale until there is a 
negotiation protocol and some real standards for the mtu size that must 
be supported, which is something that no standards body (IEEE, IETF, 
etc) has seemed inclined to deal with so far. Of course all of this is 
based on the assumption that path mtu discovery will work correctly once 
the MTU valus ARE correctly configured on the L3 routers, which is a 
pretty huge assumption, given all the people who stupidly filter ICMP. 
Oh and even if you solved all of those problems, I could trivially DoS 
your router with some packets that would overload your ability to 
generate ICMP Unreach Needfrag messages for PMTUD, and then all your 
jumbo frame end users going through that router would be blackholed as 
well.

Great idea in theory, epic disaster in practice, at least given the 
mechanisms currently at our disposal. :)

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)