On 2019-03-05 07:26 CET, Mark Andrews wrote:
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
Out of curiosity, which operating systems put anything useful (for use in ECMP) into the flow label of IPv6 packets? At the moment, I only have access to CentOS 6 and CentOS 7 machines, and both of them set the flow label to zero for all traffic. There is also the problem that the device generating the Packet Too Big ICMP, is not the same as the end host that the big packet was destined for, and does not know what flow label the end host would have set in its TCP responses. RFC 6437 is also explicit that: o Forwarding nodes such as routers and load distributors MUST NOT depend only on Flow Label values being uniformly distributed. In any usage such as a hash key for load distribution, the Flow Label bits MUST be combined at least with bits from other sources within the packet, so as to produce a constant hash value for each flow In practice, using at least the source and destination IP(v6) addresses in addition to the flow label. But the ICMP packet has a different source address than TCP responses from the end host. Further problem is that the TCP responses from the destination end host might not even be *passing* the router that generates a Packet Too Big ICMP error. In an anycast scenario, that router might have a route to the sending IPv6 address that goes to a different datacenter than the host that sent the large packet. E.g, consider the following network: A1 A2 | | DC1 DC2 / \ / / \ / / \ / R1 R2 \ / \ / \ / R3 | B A1 and A2 are hosts in different datacenters, using the same anycast address A. Host B initiates a TCP session with address A, R3 selects the route via R1, and thus reaches A1 in datacenter DC1. A1 sends a large packet towards B, but the router in DC1 elects to send that via R2. R2 generates a PTB ICMP, but has its best route to address A towards DC2... /Bellman