Forwarding this so that everybody can comment on this nasty proposal ;)

Forcing replies to v6ops@ietf.org, where they likely should be taking place, as that is where the mentioned draft was recently accepted as a WG item.

Greets,
 Jeroen

-------- Forwarded Message --------
Subject: [v6ops] IPv6 MTU Flow-label.... (related to draft-v6ops-pmtud-ecmp-problem-01)
Date: Mon, 10 Nov 2014 11:31:52 +0100
From: Jeroen Massar <jeroen@massar.ch>
Organization: Massar
To: ipv6@ietf.org, v6ops@ietf.org

Hola folks (and folks in BCC ;),

With the recent Google and Akamai outages (latter still ongoing afaik), it came to light that the cause is likely the model and problem described here:
 https://tools.ietf.org/html/draft-v6ops-pmtud-ecmp-problem-01
which previously was:
 https://tools.ietf.org/html/draft-v6ops-jaeggli-pmtud-ecmp-problem-01

Or shortly described: terminating an IP address at different hosts and having the balancer box not know where to deliver the ICMP PTBs (Packet Too Big) that get sent for large packets.

One of the suggestions there is to lower the MSS for every connection by forcing it (either on the loadbalancer or on the final host) to a value that "works everywhere": the one for an MTU of 1280. But MSS only applies to TCP, and people like Google are coming out with QUIC and other schemes.

As we really do not want an Internet at an MTU of 1280, why don't we indicate in the packet what the MTU is when it is diverting from the norm?

What if we instead let a router that receives a packet from a link with an MTU < 1500, or is going to transmit it over such a link, indicate in that packet that it came from / is going to a link with an MTU < 1500?

We can't use an additional extension header, as adding anything would mean we might hit the MTU with the packet, and then we have other issues.

As our least-used field is the FlowLabel field, we could abuse that: at 20 bits it has enough room to stuff our data.
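For reference, the FlowLabel is the low 20 bits of the first 32-bit word of the IPv6 header (4 bits Version, 8 bits Traffic Class, 20 bits Flow Label). A rough Python sketch of reading and rewriting that field in a raw header follows; the helper names get_flow_label and set_flow_label are made up for illustration and are not from any existing library.

```python
import struct

FLOW_LABEL_MASK = 0x000FFFFF  # low 20 bits of the first header word


def get_flow_label(header: bytes) -> int:
    """Return the 20-bit Flow Label from a raw IPv6 header."""
    (word,) = struct.unpack("!I", header[:4])
    return word & FLOW_LABEL_MASK


def set_flow_label(header: bytes, label: int) -> bytes:
    """Return a copy of the header with the Flow Label replaced.

    Version and Traffic Class (the top 12 bits) are left untouched.
    """
    if not 0 <= label <= FLOW_LABEL_MASK:
        raise ValueError("flow label must fit in 20 bits")
    (word,) = struct.unpack("!I", header[:4])
    word = (word & 0xFFF00000) | label
    return struct.pack("!I", word) + header[4:]


if __name__ == "__main__":
    # A 40-byte header with Version = 6 and everything else zero.
    hdr = bytes.fromhex("60000000") + bytes(36)
    print(hex(get_flow_label(hdr)))
    print(set_flow_label(hdr, 0xF05C8)[:4].hex())
```

As IPv6 has no header checksum, rewriting these bits needs no further fix-up in the header itself.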
What if we define that when the first 4 bits are set to 0xF (all ones), the rest (16 bits) defines the MTU of the link (MTU 0 - 65k)?

(We could even use a 'base of 1280' and thus 0xf0000 = 1280 MTU, but possibly it is better to state "a value < 0xf0500 is invalid".)

When the first 4 bits are not all ones, the flowlabel field remains a "normal flowlabel" field ala RFC6437.

We could even state "Only set this MTU option when the FlowLabel field == 0" to avoid incompatibility (though I do not expect any, as I rarely see packets with the field non-0...)

Thus given a network like:

[H1]
 2001:db8:1500::1/64
  |  mtu = 1500
 2001:db8:1500::a/64
[RA]
 2001:db8:1501::a/64
  |  mtu = 1500
 2001:db8:1501::b/64
[RB]
 2001:db8:1480::b/64
  |  mtu = 1480
 2001:db8:1480::c/64
[RC]
 2001:db8:1280::c/64
  |  mtu = 1280
 2001:db8:1280::d/64
[RD]
 2001:db8:9000::d/64
  |  mtu = 9000
 2001:db8:9000::2/64
[H2]

RA receives packet, src+dst interface are MTU=1500, thus does nothing
RB receives packet, src = 1500, dst = 1480, thus sets FL = 0xf05c8
RC receives packet, src = 1480, dst = 1280, thus sets FL = 0xf0500
RD receives packet, src = 1280, dst = 9000, thus sets FL = 0xf0500
   (again, just setting it is quicker than checking)

Now even if H2 is a loadbalancer, if the flow is just forwarded (without TTL change btw...) the destination receives the MTU information correctly.

The disadvantage is of course that you lose the ability to balance based on the FlowLabel, but if we go with "only change when 0" then there was none to balance on anyway. Also you have src+dst, which is 256 bits and should be pretty good already, and optionally next-header + the contents of that header if you want that.

Note that as we have no header checksum in IPv6, there is little overhead to do this kind of forwarding: HopLimit already needs updating, this is just another field to update.

In a variant of the above model, we could even just let every hop set the lowest known MTU. In that case, H1 would set 0xf05dc in the packet, and then it gets lowered automatically along the path.
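To make the encoding and the hop-by-hop walk above concrete, here is a minimal Python sketch. All names (encode_mtu_label, decode_mtu_label, forward, walk) are made up for illustration. It assumes the "only touch the field when it is 0" rule, and a hop takes the minimum with any MTU already in the label rather than blindly overwriting it; for the example network both approaches give the same values.

```python
from typing import List, Optional, Tuple

MARKER = 0xF      # first 4 bits of the 20-bit FlowLabel, all ones
MIN_MTU = 1280    # values below 0xf0500 are treated as invalid
NORM_MTU = 1500   # "the norm"; only smaller links get reported


def encode_mtu_label(mtu: int) -> int:
    """Build the FlowLabel value that carries `mtu` in its low 16 bits."""
    if not MIN_MTU <= mtu <= 0xFFFF:
        raise ValueError("MTU must be between 1280 and 65535")
    return (MARKER << 16) | mtu


def decode_mtu_label(label: int) -> Optional[int]:
    """Return the MTU carried in `label`, or None for a normal flowlabel."""
    if (label >> 16) & 0xF != MARKER:
        return None
    mtu = label & 0xFFFF
    return mtu if mtu >= MIN_MTU else None


def forward(label: int, in_mtu: int, out_mtu: int) -> int:
    """Return the FlowLabel a router would send on, given its two link MTUs."""
    link_min = min(in_mtu, out_mtu, 0xFFFF)
    current = decode_mtu_label(label)
    if current is None:
        # Leave real flowlabels alone; stay silent if both links are "normal".
        if label != 0 or link_min >= NORM_MTU:
            return label
        return encode_mtu_label(link_min)
    # An MTU is already present: only ever lower it.
    return encode_mtu_label(min(current, link_min))


def walk(label: int, hops: List[Tuple[int, int]]) -> List[int]:
    """Apply forward() at every hop and collect the label after each one."""
    seen = []
    for in_mtu, out_mtu in hops:
        label = forward(label, in_mtu, out_mtu)
        seen.append(label)
    return seen


if __name__ == "__main__":
    path = [(1500, 1500), (1500, 1480), (1480, 1280), (1280, 9000)]  # RA..RD
    print([hex(l) for l in walk(0, path)])
    # Variant: H1 already stamps its own link MTU (0xf05dc).
    print([hex(l) for l in walk(encode_mtu_label(1500), path)])
```

Starting from a zero label, the walk gives 0x0, 0xf05c8, 0xf0500, 0xf0500 for RA to RD, matching the list above. With H1 stamping 0xf05dc first, the label simply gets lowered to the path minimum as it travels.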
That would also mean that a pure 9000 path would suddenly work nicely, as everybody knows that 9000 will fit :)

Greets,
 Jeroen