ICMPv6 "too-big" packets ignored (filtered ?) by Cloudflare farms
hello,

I confess to using IPv6 behind a 6in4 tunnel because the "Business-Class" service of the operator concerned doesn't handle IPv6 yet.

As such, I have realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages (RFC 4443) seem to be ignored or filtered at roughly 60% of Cloudflare's HTTP farms.

As a result, random sites such as http://nanog.org/ or https://www.ansible.com/ are barely reachable whenever small MTUs are involved...

support@cloudflare answered that because I'm not the owner of the sites concerned, and for security reasons, they wouldn't investigate further.

Are there security concerns with ICMP Packet Too Big?

regards,

--
Jean-Daniel Pauget
Rezopole/LyonIX    http://rezopole.net/    +33 (0)4 27 46 00 50
Hey Jean,
I confess to using IPv6 behind a 6in4 tunnel because the "Business-Class" service of the operator concerned doesn't handle IPv6 yet.
As such, I have realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages (RFC 4443) seem to be ignored or filtered at roughly 60% of Cloudflare's HTTP farms.
Might be related to this: https://blog.cloudflare.com/path-mtu-discovery-in-practice/

If you run ECMP, the hash algorithms make no guarantee that ICMP messages generated by transit devices reach the correct host.

-- ++ytti
On 5 Mar 2019, at 6:06 am, Saku Ytti <saku@ytti.fi> wrote:
Hey Jean,
I confess to using IPv6 behind a 6in4 tunnel because the "Business-Class" service of the operator concerned doesn't handle IPv6 yet.
As such, I have realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages (RFC 4443) seem to be ignored or filtered at roughly 60% of Cloudflare's HTTP farms.
Might be related to this: https://blog.cloudflare.com/path-mtu-discovery-in-practice/
If you run ECMP, the hash algorithms make no guarantee that ICMP messages generated by transit devices reach the correct host.
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that is to set the interface MTUs to 1280 on all the servers. Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams?

Mark
-- ++ytti
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka@isc.org
On 5/Mar/19 00:25, Mark Andrews wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that is to set the interface MTUs to 1280 on all the servers. Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams.
I've had this issue with cdnjs.cloudflare.com for the longest time at my house. But as some of you may recall, my little unwanted TCP MSS hack for IPv6 last weekend fixed that issue for me. Not ideal, and I so wish IPv6 would work as designed, but... Mark.
On 5 Mar 2019, at 5:18 pm, Mark Tinka <mark.tinka@seacom.mu> wrote:
On 5/Mar/19 00:25, Mark Andrews wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that is to set the interface MTUs to 1280 on all the servers. Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams.
I've had this issue with cdnjs.cloudflare.com for the longest time at my house. But as some of you may recall, my little unwanted TCP MSS hack for IPv6 last weekend fixed that issue for me.
Not ideal, and I so wish IPv6 would work as designed, but…
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
Mark.
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka@isc.org
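To make the fragment point above concrete, here is a minimal sketch in Python rather than anything hardware-specific; the constant names are mine, and extension headers other than Fragment are ignored. Only the first fragment of an IPv6 packet carries the TCP/UDP header, so an L4-port hash cannot be computed consistently across fragments, while the 20-bit flow label sits in the fixed header of every fragment and of every transport protocol:

    # Illustrative sketch only, not a complete IPv6 header parser.
    import struct

    IPV6_HDR_LEN = 40
    FRAGMENT_HDR = 44      # Next Header value of the Fragment extension header
    TCP, UDP = 6, 17

    def flow_label(packet: bytes) -> int:
        # Present in every IPv6 packet, fragment or not, whatever the transport.
        first_word, = struct.unpack_from("!I", packet, 0)
        return first_word & 0xFFFFF        # low 20 bits of the first 32-bit word

    def l4_ports(packet: bytes):
        """Return (sport, dport) when visible; None for non-initial fragments."""
        nh, offset = packet[6], IPV6_HDR_LEN
        if nh == FRAGMENT_HDR:
            nh = packet[offset]                      # header after the Fragment header
            frag_field, = struct.unpack_from("!H", packet, offset + 2)
            if frag_field >> 3:                      # non-zero fragment offset
                return None                          # L4 header is in another fragment
            offset += 8                              # Fragment header is 8 bytes long
        if nh in (TCP, UDP):
            return struct.unpack_from("!HH", packet, offset)
        return None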
On 5/Mar/19 08:26, Mark Andrews wrote:
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
That's what I mean... we find ways to break protocols ourselves. Mark.
Sent from my iPhone
On Mar 4, 2019, at 22:26, Mark Andrews <marka@isc.org> wrote:
On 5 Mar 2019, at 5:18 pm, Mark Tinka <mark.tinka@seacom.mu> wrote:
On 5/Mar/19 00:25, Mark Andrews wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that is to set the interface MTUs to 1280 on all the servers. Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams.
I've had this issue with cdnjs.cloudflare.com for the longest time at my house. But as some of you may recall, my little unwanted TCP MSS hack for IPv6 last weekend fixed that issue for me.
Not ideal, and I so wish IPv6 would work as designed, but…
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
Your mobile carrier’s stateless TCP accelerator should stop sending ACKs with a zero flow label so we can actually identify them as part of the same flow... There is a lot of headwind in the real world for using the flow label as a hash component.
Mark.
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka@isc.org
On 2019-03-05 07:26 CET, Mark Andrews wrote:
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
Out of curiosity, which operating systems put anything useful (for use in ECMP) into the flow label of IPv6 packets? At the moment, I only have access to CentOS 6 and CentOS 7 machines, and both of them set the flow label to zero for all traffic.

There is also the problem that the device generating the Packet Too Big ICMP is not the same as the end host that the big packet was destined for, and does not know what flow label the end host would have set in its TCP responses. RFC 6437 is also explicit that:

   o  Forwarding nodes such as routers and load distributors MUST NOT
      depend only on Flow Label values being uniformly distributed.  In
      any usage such as a hash key for load distribution, the Flow
      Label bits MUST be combined at least with bits from other sources
      within the packet, so as to produce a constant hash value for
      each flow

In practice, that means using at least the source and destination IP(v6) addresses in addition to the flow label. But the ICMP packet has a different source address than TCP responses from the end host.

A further problem is that the TCP responses from the destination end host might not even be *passing* the router that generates a Packet Too Big ICMP error. In an anycast scenario, that router might have a route to the sending IPv6 address that goes to a different datacenter than the host that sent the large packet. E.g., consider the following network:

        A1      A2
        |       |
       DC1     DC2
       / \     /
      /   \   /
     /     \ /
    R1      R2
     \      /
      \    /
       \  /
        R3
        |
        B

A1 and A2 are hosts in different datacenters, using the same anycast address A. Host B initiates a TCP session with address A, R3 selects the route via R1, and thus reaches A1 in datacenter DC1. A1 sends a large packet towards B, but the router in DC1 elects to send that via R2. R2 generates a PTB ICMP, but has its best route to address A towards DC2...

        /Bellman
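A small illustration of the point above, with made-up addresses and a toy CRC standing in for a real hash function: any RFC 6437-style key that combines the flow label with the source and destination addresses is necessarily different for the PTB than for the TCP flow it refers to, because the PTB is sourced from the router's address and typically carries a zero flow label, so a hash-based anycast or ECMP choice towards address A can pick a different target:

    # Illustrative sketch; the addresses and the CRC-based hash are made up.
    import ipaddress
    import zlib

    def ecmp_bucket(flow_label: int, src: str, dst: str, n_paths: int = 16) -> int:
        # Toy RFC 6437-style key: flow label combined with both addresses.
        key = flow_label.to_bytes(3, "big")
        key += ipaddress.IPv6Address(src).packed
        key += ipaddress.IPv6Address(dst).packed
        return zlib.crc32(key) % n_paths

    A  = "2001:db8:a::1"       # anycast service address (example value)
    B  = "2001:db8:b::1"       # client (example value)
    R2 = "2001:db8:ffff::2"    # transit router generating the PTB (example value)

    # The client's TCP segments towards A carry some non-zero flow label:
    print("TCP  B -> A:", ecmp_bucket(0x12345, B, A))
    # The PTB about A's reply is sourced from R2 and usually has flow label 0,
    # so it is hashed over a different key and may be steered elsewhere:
    print("PTB R2 -> A:", ecmp_bucket(0x00000, R2, A))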
Out of curiosity, which operating systems put anything useful (for use in ECMP) into the flow label of IPv6 packets? At the moment, I only have access to CentOS 6 and CentOS 7 machines, and both of them set the flow label to zero for all traffic.
FreeBSD 11.2-STABLE.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
On 3/5/19 2:54 AM, Thomas Bellman wrote:
Out of curiosity, which operating systems put anything useful (for use in ECMP) into the flow label of IPv6 packets? At the moment, I only have access to CentOS 6 and CentOS 7 machines, and both of them set the flow label to zero for all traffic.
Did you submit a bug report?
Stephen Satchell <list@satchell.net> writes:
On 3/5/19 2:54 AM, Thomas Bellman wrote:
Out of curiosity, which operating systems put anything useful (for use in ECMP) into the flow label of IPv6 packets? At the moment, I only have access to CentOS 6 and CentOS 7 machines, and both of them set the flow label to zero for all traffic.
Did you submit a bug report?
I believe this was fixed 5 years ago (in Linux v3.17):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

But RHEL and CentOS are using kernels from the stone age, so they haven't noticed yet.

Bjørn
On Tue, Mar 5, 2019 at 10:09 AM Bjørn Mork <bjorn@mork.no> wrote:
Stephen Satchell <list@satchell.net> writes:
Did you submit a bug report?
I believe this was fixed 5 years ago (in Linux v3.17): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
But RHEL and CentOS are using kernels from the stone age, so they haven't noticed yet.
For those who might need this feature, and have a Red Hat contract, a suggestion: If you submit a ticket, someone at Red Hat might backport the patch for you.
On 5/3/19 03:26, Mark Andrews wrote:
On 5 Mar 2019, at 5:18 pm, Mark Tinka <mark.tinka@seacom.mu> wrote:
On 5/Mar/19 00:25, Mark Andrews wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that is to set the interface MTUs to 1280 on all the servers. Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams.
I've had this issue with cdnjs.cloudflare.com for the longest time at my house. But as some of you may recall, my little unwanted TCP MSS hack for IPv6 last weekend fixed that issue for me.
Not ideal, and I so wish IPv6 would work as designed, but…
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
Not to play devil's advocate, but the IETF only got around to publishing a spec for ECMP use of Flow Labels a few years ago. For quite a while, they were unusable... and might still be, for some implementations.

--
Fernando Gont
SI6 Networks
e-mail: fgont@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
On 6 Mar 2019, at 1:36 pm, Fernando Gont <fgont@si6networks.com> wrote:
On 5/3/19 03:26, Mark Andrews wrote:
On 5 Mar 2019, at 5:18 pm, Mark Tinka <mark.tinka@seacom.mu> wrote:
On 5/Mar/19 00:25, Mark Andrews wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that is to set the interface MTUs to 1280 on all the servers. Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams.
I've had this issue with cdnjs.cloudflare.com for the longest time at my house. But as some of you may recall, my little unwanted TCP MSS hack for IPv6 last weekend fixed that issue for me.
Not ideal, and I so wish IPv6 would work as designed, but…
It does work as designed except when crap middleware is added. ECMP should be using the flow label with IPv6. It has the advantage that it works for non-0-offset fragments as well as 0-offset fragments and also works for transports other than TCP and UDP. This isn’t a protocol failure. It is shitty implementations.
Not to play devil's advocate, but the IETF only got around to publishing a spec for ECMP use of Flow Labels a few years ago.
For quite a while, they were unusable... and might still be, for some implementations.
And if it is still using the quintuple, the PTB carries all the information necessary for unfragmented and 0-offset fragment packets (of which there shouldn’t be any with a working TCP stack) to be passed through.
-- Fernando Gont SI6 Networks e-mail: fgont@si6networks.com PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka@isc.org
On Tue, Mar 5, 2019 at 12:26 AM Mark Andrews <marka@isc.org> wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that
Out of curiosity does that imply you are aware of non-broken ECMP devices, which are able to hash on the embedded original packet? -- ++ytti
Sent from my iPhone
On Mar 5, 2019, at 01:31, Saku Ytti <saku@ytti.fi> wrote:
On Tue, Mar 5, 2019 at 12:26 AM Mark Andrews <marka@isc.org> wrote:
Then Cloudflare should negotiate MSS’s that don’t generate PTB’s if they have installed broken ECMP devices. The simplest way to do that
Out of curiosity does that imply you are aware of non-broken ECMP devices, which are able to hash on the embedded original packet?
Parsing the ICMP payload was something we considered in RFC 7690, but it wasn’t one of the approaches we pursued (we broadcast the PTB to all hosts on the segment(s) behind the load balancers in our original implementation).

It actually seems like it is becoming feasible to do in an Ethernet switch ASIC like Tofino, if that is what you want to burn real estate on. Being worthwhile is another matter.
-- ++ytti
On Tue, Mar 5, 2019 at 12:09 PM Joel Jaeggli <joelja@bogus.com> wrote:
Parsing the ICMP payload was something we considered in RFC 7690, but it wasn’t one of the approaches we pursued (we broadcast the PTB to all hosts on the segment(s) behind the load balancers in our original implementation).
It actually seems like it is becoming feasible to do in an Ethernet switch ASIC like Tofino, if that is what you want to burn real estate on. Being worthwhile is another matter.
It is definitely possible in all relevant existing NPUs like Trio, Solar, FP, EZChip, Lightspeed et al., as it is within visibility of the lookup engine and sits at a fixed offset. So it is not only possible but also cheap.

-- ++ytti
On Tue, Mar 5, 2019, 7:27 AM Mark Andrews <marka@isc.org> wrote:
[..]
their inability to purchase
devices that work with RFC compliant data streams.
To prove your point, you may want to provide a sample list of devices that work that way, along with benchmarks showing that those devices could still handle arbitrary junk ICMP packets at line rate.

NB: Cloudflare is basically busy filtering excessive amounts of spoofed ICMP packets containing whatever parameters and payload criminals can fit into them, at virtually no cost to the customer. Your list might become somewhat short then.

-- Töma
Hey Töma,
NB: Cloudflare is basically busy filtering excessive amounts of spoofed ICMP packets containing whatever parameters and payload criminals can fit into them, at virtually no cost to the customer. Your list might become somewhat short then.
I don't know what the problem is here, but the Cloudflare blog documents one specific problem related to ECMP, where ICMPv6 messages arrive at the wrong host, and some solutions they are using to overcome that problem.

You are proposing that in this case there is no such issue of delivering ICMPv6 messages to the correct host, but rather that the issue is a voluntary protection mechanism against too high a volume of bad ICMPv6 packets. Is this something you personally are aware of, or is this something you suspect might explain the problem? Personally, I'd be surprised if ICMP volume were relevant, based on our netflow data.

I've personally been affected by the ECMP problem in my own deployments and have solved it by just sending smaller packets. I understand it to be a common problem, and it would be good if we started asking vendors to fix it. The Cloudflare blog entry is 4 years old; if they had started actively pursuing a proper fix to the ECMP problem, the fix would be in production right about now.

-- ++ytti
hey,
The Cloudflare blog entry is 4 years old, if they had started actively pursuing proper fix to the ECMP problem, the fix would be in production right about now.
You can find a more recent overview at https://blog.cloudflare.com/increasing-ipv6-mtu/

-- tarko
On Fri, Mar 8, 2019 at 5:11 PM Saku Ytti <saku@ytti.fi> wrote:
Personally I'm surprised if ICMP volume is relevant based on our netflow data.
Legitimate ICMP traffic volume — oh, that's for sure. But when it comes to attack volumes, it's a different story, and current netflow measurements might be a bad indicator here, as in "peacetime generals are always fighting the last war instead of the next one".
You are proposing that in this case, there is no such issue of delivering ICMPv6 messages to correct host
Guaranteed delivery of untrusted remote messages to exactly the particular host behind an equal-cost fanout, if allowed in a DDoS mitigation network, is itself a problem, but that has been discussed in detail in Section 6 of RFC 6437.

My point is that it might be hard to find an affordable device that implements ECMP with v6 flow labels without a considerable performance impact. I would personally be happy to see what others have tested in that regard.

-- Töma
On Fri, Mar 8, 2019 at 5:44 PM Töma Gavrichenkov <ximaera@gmail.com> wrote:
My point is that it might be hard to find an affordable device that implements ECMP with v6 flow labels without a considerable performance impact. I would personally be happy to see what others have tested in that regard.
Why do you think it would be expensive? It's cheaper than how ECMP is done for L3 keys, because you just read the flow label and don't calculate any hash. It's much, much cheaper than how ECMP is done for L3+L4 keys, if that is done right, which it is not, because no device implements IPv6 correctly, as it's not possible in reasonably performing hardware; but this has nothing to do with ECMP.

In any case, flow labels are not the right solution here, because this is not an IPv6 problem, it is an IP problem. The right solution is to look at the L3+L4 keys inside the packet embedded in the ICMP error, as that solves the problem for both AFIs. This at most costs one branch (negligible in a typical NPU), as you set a different static offset based on whether you're parsing ICMP or not. In all likelihood it costs nothing, as the code likely already contains a branch for ICMP where you can just reset the ECMP offset.

I still fail to understand why you think this particular problem has anything to do with attacks or ICMP volume. I find no such indications, and the two Cloudflare blog articles do not state attacks as the motivation; it's just a technical problem of delivering the ICMP packets to the correct host. A real problem affecting other networks too, but a problem we can fix, if we start asking our vendors for a fix.

-- ++ytti
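For illustration, a rough Python sketch of what "look at the L3+L4 keys inside the embedded ICMP packet" could mean; this is not any vendor's implementation, the field offsets follow RFC 8200 and RFC 4443, extension headers are ignored for brevity, and the direction swap assumes the hashing device normally computes its keys on the client-to-server direction of the flow:

    # Illustrative sketch only; real NPU microcode looks nothing like this.
    import struct
    import zlib

    IPV6_HDR_LEN = 40
    ICMPV6 = 58
    ICMPV6_PKT_TOO_BIG = 2
    TCP, UDP = 6, 17

    def ecmp_keys(packet: bytes):
        """Pick (src, dst, sport, dport) to hash on, looking inside PTB errors."""
        nh = packet[6]                          # Next Header of the outer IPv6 header
        src, dst = packet[8:24], packet[24:40]
        sport = dport = 0

        if nh == ICMPV6 and packet[IPV6_HDR_LEN] == ICMPV6_PKT_TOO_BIG:
            # PTB: 8 bytes of ICMPv6 header (type, code, checksum, MTU),
            # then as much of the invoking packet as fit.
            inner = packet[IPV6_HDR_LEN + 8:]
            # Swap the embedded addresses and ports so the keys match the flow
            # as this device normally sees it in the other direction.
            src, dst = inner[24:40], inner[8:24]
            if inner[6] in (TCP, UDP) and len(inner) >= IPV6_HDR_LEN + 4:
                p1, p2 = struct.unpack_from("!HH", inner, IPV6_HDR_LEN)
                sport, dport = p2, p1
        elif nh in (TCP, UDP) and len(packet) >= IPV6_HDR_LEN + 4:
            sport, dport = struct.unpack_from("!HH", packet, IPV6_HDR_LEN)

        return src, dst, sport, dport

    def ecmp_next_hop(packet: bytes, n_paths: int) -> int:
        src, dst, sport, dport = ecmp_keys(packet)
        return zlib.crc32(src + dst + struct.pack("!HH", sport, dport)) % n_paths

With keys chosen this way, a PTB generated about a server-to-client packet hashes to the same bucket as the client-to-server packets of the flow it refers to; the same idea carries over to IPv4 and ICMP, which is what makes it AFI-agnostic.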
On Fri, Mar 8, 2019 at 7:48 PM Saku Ytti <saku@ytti.fi> wrote:
Why do you think it would be expensive? It's cheaper than how ECMP is done for L3 keys, because you just read the flow label and not calculate any hash.
The most honest answer would be: I have no idea. That's just what I've seen, rather briefly though, as we weren't going to investigate that part at the time.

It's been a while since then, and maybe there was a mistake on our side (at least within a perfectly academic context I must assume that there was, as there was no peer review; we were not in academia, after all!), but I'm still inclined to, first, see the benchmarks of any proposed piece of hardware that's promising you ECMP with flow labels, and, second, make any statements about the latter.

-- Töma
On Fri, Mar 8, 2019 at 7:07 PM Töma Gavrichenkov <ximaera@gmail.com> wrote:
It's been a while since then, and maybe there was a mistake on our side (at least within a perfectly academic context I must assume that there was, as there was no peer review — we were not in academy after all!), but I'm still inclined to, first, see the benchmarks of any proposed piece of hardware that's promising you ECMP with flow labels, second, make any statements about the latter.
1) current implementation
   - set offset byte to 8
   - read 128 bits to memory1
   - read 128 bits to memory2
   - return hash_function(memory1, memory2)

This is _JUST_ for L3 keys; in reality customers want L4 keys too, so it's more expensive. Particularly in IPv6 the L4 keys could be _anywhere_, potentially gigabytes deep in the future; for the same reason, in IPv6 you can bypass ACL filters in many cases, because the HW device won't know what the L4 keys are.

2) flow label implementation
   - set offset to 12 bits
   - read 20 bits to memory1
   - return memory1

Seems cheaper to me. But it is still not a good solution, as it is AFI-specific and requires us to actually use the flow label consistently, which is not universally true. ECMP on the embedded ICMP packet would actually work without any changes anywhere other than the device calculating the hash.

-- ++ytti
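Purely for illustration (again, nothing like real NPU microcode), the two lookups above rendered as Python over a raw IPv6 header: the L3-key variant has to pull in 256 bits of addresses and run a hash over them, while the flow-label variant is a single masked read of the first 32-bit word.

    # Illustrative sketch; the CRC is a stand-in for whatever hash the HW uses.
    import struct
    import zlib

    def bucket_from_l3_keys(packet: bytes, n_paths: int) -> int:
        src = packet[8:24]       # read 128 bits of source address
        dst = packet[24:40]      # read 128 bits of destination address
        return zlib.crc32(src + dst) % n_paths    # then hash both

    def bucket_from_flow_label(packet: bytes, n_paths: int) -> int:
        first_word, = struct.unpack_from("!I", packet, 0)
        return (first_word & 0xFFFFF) % n_paths   # just the 20 label bits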
Töma Gavrichenkov Sent: Friday, March 8, 2019 5:07 PM
On Fri, Mar 8, 2019 at 7:48 PM Saku Ytti <saku@ytti.fi> wrote:
Why do you think it would be expensive? It's cheaper than how ECMP is done for L3 keys, because you just read the flow label and not calculate any hash.
The most honest answer would be: I have no idea. That's just what I've seen, rather briefly though, as we weren't going to investigate that part at the time.
It's been a while since then, and maybe there was a mistake on our side (at least within a perfectly academic context I must assume that there was, as there was no peer review — we were not in academy after all!), but I'm still inclined to, first, see the benchmarks of any proposed piece of hardware that's promising you ECMP with flow labels, second, make any statements about the latter.
We did this exact testing a while back on Juniper 2nd and 3rd gen PFEs. The results showed it doesn't matter a tiny bit whether you do 5-tuple hash or use flow label. So the bottom line is on modern NPUs it doesn't really matter. adam
Hey Adam,
We did this exact testing a while back on Juniper 2nd and 3rd gen PFEs. The results showed it doesn't matter a tiny bit whether you do 5-tuple hash or use flow label. So the bottom line is on modern NPUs it doesn't really matter.
Does PFE mean PE or Trio? What exactly did you test? I don't see a way to disable the L3+L4 keys and enable flow_label.

Doing flow_label + sip + dip + sport + dport would indeed be pretty much the same cost as sip + dip + sport + dport; the cost difference will be very marginal. Between flow_label alone and sip + dip + sport + dport the cost difference is non-marginal; whether that actually holds for any specific implementation is a separate matter.

-- ++ytti
Hey Saku,
From: Saku Ytti <saku@ytti.fi> Sent: Tuesday, March 12, 2019 11:54 AM
Hey Adam,
We did this exact testing a while back on Juniper 2nd and 3rd gen PFEs. The results showed it doesn't matter a tiny bit whether you do 5-tuple hash or use flow label. So the bottom line is on modern NPUs it doesn't really matter.
Does PFE mean PE or Trio? What exactly did you test? I don't see way to disable L3+L4 keys and enable flow_label.
This was on Trio, and sorry, I should have clarified: we tested with the default L3+L4 keys on MPLS-labelled packets (the default in Junos) as a baseline, and then repeated the test using flow labels, which forced Trio to ignore the L3+L4 keys and act solely on the flow label. PPS-performance-wise we couldn’t really tell the difference (it was in the noise).

adam
On Tue, Mar 12, 2019 at 7:55 PM <adamv0025@netconsultings.com> wrote:
This was on Trio and sorry I should have clarified we did test with default L3+L4 keys on MPLS labelled packets -default in Junos (as baseline). And then repeated the test using flow labels -which forced Trio to ignore the L3+L4 keys and act solely on flow label. PPS performance wise we couldn’t really tell the difference (was in the noise).
Are you sure we are talking about the same thing? This thread is about the 20-bit IPv6 header Flow Label; I feel like you're talking about FAT pseudowires?

By default JNPR will, in pseudowire transit, look for IP keys, with or without FAT. Optionally it can look even in the presence of a CW. You need to specifically ask it not to look for IP keys in pseudowires, or add a CW (and not explicitly tell it to look).

-- ++ytti
From: Saku Ytti <saku@ytti.fi> Sent: Tuesday, March 12, 2019 6:01 PM
On Tue, Mar 12, 2019 at 7:55 PM <adamv0025@netconsultings.com> wrote:
This was on Trio and sorry I should have clarified we did test with default L3+L4 keys on MPLS labelled packets -default in Junos (as baseline). And then repeated the test using flow labels -which forced Trio to ignore the L3+L4 keys and act solely on flow label. PPS performance wise we couldn’t really tell the difference (was in the noise).
Are you sure we are talking about same thing. This thread is about 20bit IPv6 header Flow Label. I feel like you're talking about FAT pseudowires?
Yes, right, but the lookup principle is the same whether you look at the IPv6 flow label or at the entropy label.
By default JNPR will in pseudowire transit look for IP keys, with or without FAT. Optionally it can look even with existence of CW. You need to specifically ask it not to look for IP keys in pseudowires or add CW (and not explicitly tell it to look).
We didn't use FAT PWs, but rather entropy labels for VPNv4 traffic. adam
On Tue, Mar 12, 2019 at 8:09 PM <adamv0025@netconsultings.com> wrote:
Yes right, but the lookup principle is the same either you look at IPv6 flow label or you look at the Entropy label.
Correct. FAT, entropy labels and the IPv6 Flow Label are all in principle the same: a way for the source node to communicate what constitutes a flow. And in every case there is no guarantee that an implementation has any performance gains, as the implementation may choose to do normal flow speculation in addition to doing the fast thing.

-- ++ytti
From: Saku Ytti <saku@ytti.fi> Sent: Tuesday, March 12, 2019 6:14 PM
On Tue, Mar 12, 2019 at 8:09 PM <adamv0025@netconsultings.com> wrote:
Yes right, but the lookup principle is the same either you look at IPv6 flow label or you look at the Entropy label.
Correct, FAT, Entropy and IPv6 Flow Label are all in principle same, a way for source node to communicates what constitutes a flow. And in every case, there is no guarantee implementation has any performance gains, as implementation may choose to do normal flow speculation in addition of doing the fast thing.
That's right, and I didn't test that by sending forged packets (with conflicting L3+L4 keys and flow label) at the DUT to see whether the DUT uses the L3+L4 keys or indeed relies on the flow information.

adam
Mark Andrews wrote:
Why should the rest of the world have to put up with their inability to purchase devices that work with RFC compliant data streams.
Because the RFCs specifying IPv6 are broken. That is, as PTB is generated against multicast, we should block them; and then, not blocking PTB against unicast needs very deep inspection, which is not possible with some network processors.

See https://meetings.apnic.net/32/pdf/pathMTU.pdf for details.

William Herrin wrote:
IPv4's inventors did a brilliant job with what they knew at the time. IPv6's inventors not so much. Sadly, they were too busy figuring out how to make IPv6 integrate well with ATM. Seriously, if you dig up a copy of the original IPng book I think it's chapter 3.
Indeed. IPv6 replaced link broadcast with various kinds of multicast addresses only to increase MLD overhead, because the IPng WG believed that simple broadcast does not, but more complicated multicast does, work with IP over ATM.

Masataka Ohta
On 27/2/19 07:01, Jean-Daniel Pauget wrote:
hello,
I confess to using IPv6 behind a 6in4 tunnel because the "Business-Class" service of the operator concerned doesn't handle IPv6 yet.
As such, I have realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages (RFC 4443) seem to be ignored or filtered at roughly 60% of Cloudflare's HTTP farms.
As a result, random sites such as http://nanog.org/ or https://www.ansible.com/ are barely reachable whenever small MTUs are involved...
support@cloudflare answered that because I'm not the owner of the sites concerned, and for security reasons, they wouldn't investigate further.
Are there security concerns with ICMP Packet Too Big?
Please see:
https://tools.ietf.org/html/rfc5927

and also:
https://tools.ietf.org/html/rfc8021

Thanks,
--
Fernando Gont
SI6 Networks
e-mail: fgont@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
* Jean-Daniel Pauget
I confess to using IPv6 behind a 6in4 tunnel because the "Business-Class" service of the operator concerned doesn't handle IPv6 yet.
As such, I have realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages (RFC 4443) seem to be ignored or filtered at roughly 60% of Cloudflare's HTTP farms.
As a result, random sites such as http://nanog.org/ or https://www.ansible.com/ are barely reachable whenever small MTUs are involved...
Hi Jean-Daniel.

If you're using tunnels you'll want to have your tunnel endpoint adjust down the TCP MSS value to match the MTU of the tunnel interface. That way, you'll avoid problems with Path MTU Discovery. Even in those situations where PMTUD does work fine, doing TCP MSS adjustment will improve performance, as the server does not need to spend an RTT discovering your reduced MTU.

(This isn't really an IPv6 issue, by the way - ISPs using PPPoE will typically perform MSS adjustment for IPv4 packets too.)

If you're using Linux as your tunnel endpoint, try:

ip6tables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Tore
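For reference, a small worked example of the arithmetic behind the clamping; the 1500-byte upstream IPv4 MTU is an assumption, substitute your own link MTU. A 6in4 tunnel loses 20 bytes to the outer IPv4 header, and the MSS must further leave room for the 40-byte IPv6 header and the 20-byte basic TCP header:

    upstream_ipv4_mtu = 1500                  # assumed upstream/physical MTU
    tunnel_mtu = upstream_ipv4_mtu - 20       # 6in4: outer IPv4 header overhead
    clamped_mss = tunnel_mtu - 40 - 20        # IPv6 header + basic TCP header
    print(tunnel_mtu, clamped_mss)            # 1480 1420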
hello,

Tore Anderson, you're right, clamping the MSS is very effective and almost certainly solves most of the problems.

Now for UDP, I don't know yet how things like QUIC can be handled...

regards,

--
Jean-Daniel Pauget
Rezopole/LyonIX    http://rezopole.net/    +33 (0)4 27 46 00 50

On Wed, Mar 06, 2019 at 08:17:42AM +0100, Tore Anderson wrote:
* Jean-Daniel Pauget
I confess to using IPv6 behind a 6in4 tunnel because the "Business-Class" service of the operator concerned doesn't handle IPv6 yet.
As such, I have realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages (RFC 4443) seem to be ignored or filtered at roughly 60% of Cloudflare's HTTP farms.
As a result, random sites such as http://nanog.org/ or https://www.ansible.com/ are barely reachable whenever small MTUs are involved...
Hi Jean-Daniel.
If you're using tunnels you'll want to have your tunnel endpoint adjust down the TCP MSS value to match the MTU of the tunnel interface. That way, you'll avoid problems with Path MTU Discovery. Even in those situations where PMTUD does work fine, doing TCP MSS adjustment will improve performance, as the server does not need to spend an RTT discovering your reduced MTU.
(This isn't really an IPv6 issue, by the way - ISPs using PPPoE will typically perform MSS adjustment for IPv4 packets too.)
If you're using Linux as your tunnel endpoint, try:
ip6tables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
Tore
On 3/8/19 8:38 AM, Saku Ytti wrote:
Hey,
now for UDP, I don't know yet how things like QUIC can be handled ...
Unfortunately, the magic answer you were hoping for does not exist; what they do is just send smaller packets.
What we almost seem to be moving toward in this discussion is an IP header field where the path can reduce the reported MTU, which can then be read at the receiving end. This would be somewhat like ECN, just with more than a couple of bits. Of course, we know how well extension headers, much less hop-by-hop headers, are handled in IPv6...

Re-writing a field in the L4 header works, but it seems ugly, since it means every hop that reduces the MTU of the link has to know about every L4 protocol that participates in such a scheme. ICMP is nice in that it's totally protocol-agnostic and doesn't require altering packets in transit. It's a shame we can't reasonably rely on it being delivered.

--
Brandon Martin
On 2019-03-08 14:45, Brandon Martin wrote:
On 3/8/19 8:38 AM, Saku Ytti wrote:
Hey,
now for UDP, I don't know yet how things like QUIC can be handled ...
Unfortunately, the magic answer you were hoping for does not exist; what they do is just send smaller packets.
What we almost seem to be moving toward in this discussion is an IP header where the path can reduce the reported MTU which can then be read at the receiving end. This would be somewhat like ECN just with more than a couple bits.
Something like what I once described in:
https://jeroen.massar.ch/archive/drafts/draft-massar-v6man-mtu-label-02.txt ? :)

Greets,
 Jeroen
On Fri, Mar 8, 2019 at 5:45 AM Brandon Martin <lists.nanog@monmotha.net> wrote:
ICMP is nice in that it's totally protocol agnostic and doesn't require altering of packets in transit. It's a shame we can't reasonably rely on it being delivered.
Path MTU discovery is broken. It's the one place in TCP/IP where the end-to-end principle was thrown out the window, and we keep on paying for it.

A correct solution would have been for the intermediate router to truncate the packet. Not fragment, truncate. On receiving the truncated packet, the RECIPIENT (not the intermediate router) would report the truncation to the sender. This could easily have been done at layer 3, just like existing PMTUD.

IPv4's inventors did a brilliant job with what they knew at the time. IPv6's inventors not so much. Sadly, they were too busy figuring out how to make IPv6 integrate well with ATM. Seriously, if you dig up a copy of the original IPng book I think it's chapter 3.

Regards,
Bill Herrin

--
William Herrin ................ herrin@dirtside.com  bill@herrin.us
Dirtside Systems ......... Web: <http://www.dirtside.com/>
participants (19)
- adamv0025@netconsultings.com
- Bjørn Mork
- Brandon Martin
- Fernando Gont
- Hunter Fuller
- Jean-Daniel Pauget
- Jeroen Massar
- Joel Jaeggli
- Mark Andrews
- Mark Tinka
- Masataka Ohta
- Saku Ytti
- Stephen Satchell
- sthaug@nethelp.no
- Tarko Tikan
- Thomas Bellman
- Tore Anderson
- Töma Gavrichenkov
- William Herrin