On Jun 5, 2012, at 6:02 PM, Jimmy Hess wrote:
On 6/5/12, Owen DeLong <owen@delong.com> wrote:
This is a horrible misconfiguration of the devices on that link. If your MTU setting on your interface is larger than the smallest MTU of any L2 forwarder on the link, then, you have badly misconfigured
Not really; the network layer and L2 protocols should both be designed to handle this, and it is a design error in the protocol that it doesn't. You say it's "misconfiguration", but if IP handled the situation reasonably, it shouldn't be necessary to configure anything in the first place. Whether the neighbors are LAN or cross-tunnel, the issues are similar.
Really, no. The L3 MTU on an interface should be configured to the lowest MTU reachable via that link without crossing a router. It's just that simple. Anything else _IS_ a misconfiguration. First, your idea of handling the situation reasonably is a layering violation. Second, you are correct. All L2 bridges for a given media type should support the largest configurable MTU for that media type, so, it is arguably a design flaw in the bridges. However, in an environment where you have broken L2 devices (design flaw), you have to configure appropriately for that.
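To make the rule above concrete, here is a minimal sketch in Python. The device names and MTU values are made up for illustration, and the helper names link_mtu and misconfigured are hypothetical, not part of any real tool. It only shows the arithmetic: the L3 MTU for a link is the smallest MTU of anything a frame can cross before it reaches a router.

```python
# Hypothetical inventory of one L2 segment: every host interface and every
# bridge/switch a frame may cross before it reaches a router.
segment_mtus = {
    "host-a": 9000,
    "host-b": 9000,
    "switch-1": 9000,
    "switch-2": 1500,  # the one bridge that cannot pass jumbo frames
}


def link_mtu(mtus):
    """Return the single L3 MTU that is safe for every device on the link."""
    return min(mtus.values())


def misconfigured(mtus, configured):
    """List devices whose L2 limit is below the configured L3 MTU."""
    return sorted(name for name, mtu in mtus.items() if mtu < configured)


print(link_mtu(segment_mtus))             # smallest MTU on the segment
print(misconfigured(segment_mtus, 9000))  # devices that would drop 9000-byte frames
```

With these made-up numbers, configuring 9000 on the hosts is the misconfiguration described above, because switch-2 sits on the path and cannot forward those frames.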
It's only a misconfiguration because of flaws in the protocol.
No, it's a misconfiguration because of the limitations of the hardware due to its design defects. L3 should not need to test the end-to-end L2 capabilities. It should be able to depend on what the OS tells it.
Just like you expect to plug devices in a typical LAN and it's not a configuration error to fail to manually find every switch in the LAN and enter MAC addresses into a forwarding table by hand; likewise, you shouldn't expect to key a MTU into every device by hand.
You don't expect to ever care about the MAC addresses of any of the switches in the LAN, let alone enter them into any form of forwarding table at all. You do expect to need to know about the MAC addresses of adjacent systems you are trying to reach, and you use either ND or ARP to map L3 addresses onto their corresponding L2 addresses as needed. I will note that this depends on sending a packet out to an address that reaches all of the candidate hosts (in the case of ND, this is a multicast to all hosts which have the same last 24 bits in their IPv6 address; in the case of ARP, this is a broadcast packet) and expecting them (at L3) to answer "That's ME!". Of course you can enter them by hand in situations where ARP or ND don't work for whatever reason. You expect ARP or ND to work, and a bridge that didn't forward ARP would be just as broken as a bridge which doesn't support the full interface MTU. I would expect to have to enter MAC adjacencies manually if I had a bridge that didn't pass ARP/ND traffic, just as I expect to have to enter the MTU manually if I have a bridge that doesn't support the correct full MTU of the network.
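For anyone unfamiliar with the "same last 24 bits" remark about ND: the multicast group in question is the solicited-node address, formed by appending the low 24 bits of the target's IPv6 address to the prefix ff02::1:ff00:0/104. A small sketch using Python's standard ipaddress module follows. The function name solicited_node and the example address are just for illustration.

```python
import ipaddress


def solicited_node(addr):
    """Map a unicast IPv6 address to its solicited-node multicast group."""
    # Keep only the low 24 bits of the unicast address.
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    # Fixed 104-bit prefix ff02::1:ff00:0/104 for solicited-node groups.
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


# Example with a documentation-prefix address:
print(solicited_node("2001:db8::1234:5678"))  # ff02::1:ff34:5678
```

Any two hosts whose addresses end in the same 24 bits join the same group, which is why the neighbor solicitation reaches all of the candidate hosts and only the actual owner answers.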
IP should be designed so that devices on the link that _can_ handle the large transmission unit, which provides efficiency gains, should be allowed to fully utilize those capabilities, without breakage of connectivity to devices on the same link that have more limited capabilities and can only receive the Minimum required frame size (smaller MTU), and without separating the subnet or installing dividing Proxy ARP servers to send ICMP TooBig packets.
No, it really shouldn't. For one, doing this is a serious layering violation; for another, it can't be achieved efficiently. It adds lots of overhead and is very error prone. There's no signaling mechanism for L3 to be informed when the L2 topology changes, for example, which might necessitate a recalculation of the MTU. A given link should have a single MTU, period. I don't know of ANY L3 protocol which supports anything else. Not IP, not IPX, not DECNET, not AppleTalk, not Banyan Vines, not XNS; none of them support the idea of MTU per adjacency. If you can only have one MTU per link, then it must be the lowest common denominator of all participants and forwarders on that link.
Adding probing to compensate for this misconfiguration merely serves to perpetuate such errant configurations.
Just like adding MAC address learning to Ethernet switches to compensate for the misconfiguration of failing to manually enter hardware addresses into your switches serves to perpetuate such errant configurations, where the state of the forwarding tables is unreliably left non-deterministic.
Apples and oranges. See above. In fact, MAC address learning on the switches is utterly unrelated to the MAC adjacency table maintained by ARP/ND. One is an L2 forwarding tree never learned by anything at L3 (the MAC forwarding table learned on the switches) and the other is a MAC adjacency table for a given link used by the L2 software on the host to populate the L2 packet header based on the L3 information.
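The distinction between the two tables can be sketched as two unrelated data structures. This is a toy model, not how any particular switch or host stack is implemented. The class names, the string "flood", and the example addresses are all made up for illustration.

```python
class LearningSwitch:
    """L2 only: maps source MAC -> port, learned from frames passing through."""

    def __init__(self):
        self.fdb = {}  # forwarding table; never visible to L3 on the hosts

    def forward(self, src_mac, dst_mac, in_port):
        # Learn where the sender lives, then look up the destination.
        self.fdb[src_mac] = in_port
        return self.fdb.get(dst_mac, "flood")


class Host:
    """Maps L3 address -> MAC for neighbors on the link (the ARP/ND result)."""

    def __init__(self):
        self.neighbors = {}  # adjacency table filled in by ARP or ND replies

    def learn(self, ip, mac):
        self.neighbors[ip] = mac

    def l2_destination(self, ip):
        # Used to fill in the destination MAC of an outgoing frame.
        return self.neighbors[ip]


switch = LearningSwitch()
switch.forward("aa:aa", "bb:bb", in_port=1)  # unknown destination: flooded
switch.forward("bb:bb", "aa:aa", in_port=2)  # aa:aa was learned on port 1

host = Host()
host.learn("192.0.2.7", "bb:bb")
```

The switch table is keyed by MAC and holds ports; the host table is keyed by L3 address and holds MACs. Neither one reads from or writes to the other.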
You've got an issue if there are 100ms between two peers on your LAN. You're right, you don't need to probe for possible MTUs below 1280. LAN, sure. However, consider that there are intercontinental L2 links.
Intercontinental multi-access L2 links, perhaps, are a horrible misconfiguration.
No, they are not. They may be a horribly bad idea in many cases, but, there are actually legitimate applications for them and they conform to the existing documented standards. Owen