Re: ECN

13 Nov 2019

I am testing disabling our use of ECMP as it is not strictly necessary and
we are moving to a new platform anyway. Waiting for feedback from the
customer to hear if this fixes the issue.

In any case, isn't it recommended that users of anycast proxy packets that
arrive at the wrong place, to avoid this kind of issue?

Regards,

Baldur

On Wed, Nov 13, 2019 at 6:35 PM Todd Underwood <toddunder@gmail.com> wrote:
as one of the authors of that talk, it definitely is "a thing", has been
for years and years and years, and indeed, mostly works.
t
On Wed, Nov 13, 2019 at 12:18 PM Hunter Fuller <hf0002+nanog@uah.edu>
wrote:
It is certainly odd, but it's definitely a "thing."
https://archive.nanog.org/meetings/nanog37/presentations/matt.levine.pdf
On Wed, Nov 13, 2019 at 10:24 AM Matt Corallo <nanog@as397444.net> wrote:
This sounds like a bug on Cloudflare’s end (cause trying to do anycast
TCP is... out of spec to say the least), not a bug in ECN/ECMP.
On Nov 13, 2019, at 11:07, Toke Høiland-Jørgensen via NANOG <nanog@nanog.org> wrote:

...
Hello

I have a customer that believes my network has an ECN problem. We do
not, we just move packets. But how do I prove it?

Is there a tool that checks for ECN trouble? Ideally something I could
run on the NLNOG Ring network.

I believe it likely that it is the destination that has the problem.

Hi Baldur

I believe I may be that customer :)

First of all, thank you for looking into the issue! We've been having
great fun over on the ecn-sane mailing list trying to figure out what's
going on. I'll summarise below, but see this thread for the discussion
and debugging details:
https://lists.bufferbloat.net/pipermail/ecn-sane/2019-November/000527.html

The short version is that the problem appears to come from a combination
of the ECMP routing in your network, and Cloudflare's heavy use of
anycast. Specifically, a router in your network appears to be doing ECMP
by hashing on the packet header, *including the ECN bits*. This breaks
TCP connections with ECN because the TCP SYN (with no ECN bits set) ends
up taking a different path than the rest of the flow (which is marked as
ECT(0)). When the destination is anycasted, this means that the data
packets go to a different server than the SYN did. This second server
doesn't recognise the connection, and so replies with a TCP RST. To fix
this, simply exclude the ECN bits (or the whole TOS byte) from your
router's ECMP hash.
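
To make the failure mode concrete, here is a minimal Python sketch of ECMP path selection. It is not any vendor's actual hash: the function name ecmp_path, the SHA-256 based hash and the tos_mask parameter are all assumptions made up for illustration. It only shows that a hash covering the full TOS byte can split the SYN from the data packets of one flow, while masking out the two ECN bits keeps them together.

```python
import hashlib


def ecmp_path(src, dst, proto, sport, dport, tos, n_paths=2, tos_mask=0xFF):
    """Pick a path index from a hash of header fields (illustrative only).

    tos_mask selects which bits of the TOS byte feed the hash:
      0xFF -> whole TOS byte, ECN bits included (the problematic case)
      0xFC -> DSCP bits only, ECN bits excluded
      0x00 -> TOS byte ignored entirely
    """
    fields = (src, dst, proto, sport, dport, tos & tos_mask)
    digest = hashlib.sha256(repr(fields).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Same 5-tuple as in the capture; only the TOS byte differs.
flow = dict(src="10.42.3.130", dst="104.24.125.13", proto=6,
            sport=34420, dport=80)

# With the ECN bits masked out, SYN (tos 0x00) and data (tos 0x02, ECT(0))
# hash identically, so they always share a path.
syn_path = ecmp_path(tos=0x00, tos_mask=0xFC, **flow)
data_path = ecmp_path(tos=0x02, tos_mask=0xFC, **flow)
print(syn_path == data_path)
```

With the default tos_mask=0xFF the two calls may return different path indices for the same 5-tuple, which is the split described above.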

For a longer exposition, see below. You should be able to verify this
from somewhere else in the network, but if there's anything else you
want me to test, do let me know. Also, would you mind sharing the router
make and model that does this? We're trying to collect real-world
examples of network problems caused by ECN and this is definitely an
interesting example.

-Toke

The long version:

From my end I can see that I have two paths to Cloudflare; which is
taken appears to be based on a hash of the packet header, as can be seen
by varying the source port:

$ traceroute -q 1 --sport=10000 104.24.125.13
traceroute to 104.24.125.13 (104.24.125.13), 30 hops max, 60 byte packets
1  _gateway (10.42.3.1)  0.357 ms
2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  4.707 ms
3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.283 ms
4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.667 ms
5  netnod-ix-cph-blue-9000.cloudflare.com (212.237.192.246)  1.406 ms
6  104.24.125.13 (104.24.125.13)  1.322 ms
$ traceroute -q 1 --sport=10001 104.24.125.13
traceroute to 104.24.125.13 (104.24.125.13), 30 hops max, 60 byte packets
1  _gateway (10.42.3.1)  0.293 ms
2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  3.430 ms
3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.194 ms
4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.297 ms
5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.805 ms
6  149.6.142.130 (149.6.142.130)  6.925 ms
7  104.24.125.13 (104.24.125.13)  1.501 ms
This is fine in itself. However, the problem stems from the fact that
the ECN bits in the IP header are also included in the ECMP hash (-t
sets the TOS byte; -t 1 ends up as ECT(0) on the wire and -t 2 is
ECT(1)):
$ traceroute -q 1 --sport=10000 104.24.125.13 -t 1
traceroute to 104.24.125.13 (104.24.125.13), 30 hops max, 60 byte packets
1  _gateway (10.42.3.1)  0.336 ms
2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  6.964 ms
3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.056 ms
4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.512 ms
5  netnod-ix-cph-blue-9000.cloudflare.com (212.237.192.246)  1.313 ms
6  104.24.125.13 (104.24.125.13)  1.210 ms
$ traceroute -q 1 --sport=10000 104.24.125.13 -t 2
traceroute to 104.24.125.13 (104.24.125.13), 30 hops max, 60 byte packets
1  _gateway (10.42.3.1)  0.339 ms
2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  2.565 ms
3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.301 ms
4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.339 ms
5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.570 ms
6  149.6.142.130 (149.6.142.130)  6.888 ms
7  104.24.125.13 (104.24.125.13)  1.785 ms
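
Comparing such runs by eye gets tedious when testing from many vantage points. The following Python sketch compares two saved traceroute outputs and reports the first hop where they differ. The helper names hop_addresses and first_divergence are made up for this example, and the parser assumes the one-probe-per-hop format shown above (traceroute -q 1); hops that print only "*" are skipped.

```python
import re

# Matches hop lines such as:
#   3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.283 ms
HOP_RE = re.compile(r"^\s*\d+\s+\S+\s+\((\d+\.\d+\.\d+\.\d+)\)")


def hop_addresses(traceroute_output):
    """Return the list of hop IP addresses found in traceroute output text."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append(match.group(1))
    return hops


def first_divergence(path_a, path_b):
    """Return the 1-based position where two hop lists first differ.

    Returns None when the two lists are identical.
    """
    for position, (a, b) in enumerate(zip(path_a, path_b), start=1):
        if a != b:
            return position
    if len(path_a) != len(path_b):
        return min(len(path_a), len(path_b)) + 1
    return None
```

Feeding it the "-t 1" and "-t 2" outputs above gives hop 3 as the first difference (185.24.168.46 versus 185.24.168.38), so the router at hop 2 is where the paths split.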
So why is this a problem? The TCP SYN packet first needs to negotiate
ECN, so it is sent without any ECN bits set in the header; after
negotiation succeeds, the data packets will be marked as ECT(0). But
because that becomes part of the ECMP hash, those packets will take
another path. And since the destination is anycasted, that means they
will also end up at a different endpoint. This second endpoint won't
recognise the connection, and will reply with a TCP RST. This is clearly
visible in tcpdump; notice the different TOS values, and that the RST
packet has a different TTL than the SYN-ACK:
12:21:47.816359 IP (tos 0x0, ttl 64, id 25687, offset 0, flags [DF],
proto TCP (6), length 60)
   10.42.3.130.34420 > 104.24.125.13.80: Flags [SEW], cksum 0xf2ff
(incorrect -> 0x0853), seq 3345293502, win 64240, options [mss
1460,sackOK,TS val 4248691972 ecr 0,nop,wscale 7], length 0
12:21:47.823395 IP (tos 0x0, ttl 58, id 0, offset 0, flags [DF],
proto TCP (6), length 52)
   104.24.125.13.80 > 10.42.3.130.34420: Flags [S.E], cksum 0x9f4a
(correct), seq 1936951409, ack 3345293503, win 29200, options [mss
1400,nop,nop,sackOK,nop,wscale 10], length 0
12:21:47.823479 IP (tos 0x0, ttl 64, id 25688, offset 0, flags [DF],
proto TCP (6), length 40)
   10.42.3.130.34420 > 104.24.125.13.80: Flags [.], cksum 0xf2eb
(incorrect -> 0x503e), seq 1, ack 1, win 502, length 0
12:21:47.823665 IP (tos 0x2,ECT(0), ttl 64, id 25689, offset 0, flags
[DF], proto TCP (6), length 117)
   10.42.3.130.34420 > 104.24.125.13.80: Flags [P.], cksum 0xf338
(incorrect -> 0xc1d4), seq 1:78, ack 1, win 502, length 77: HTTP, length: 77
   GET / HTTP/1.1
   Host: 104.24.125.13
   User-Agent: curl/7.66.0
   Accept: */*
12:21:47.825485 IP (tos 0x2,ECT(0), ttl 60, id 0, offset 0, flags
[DF], proto TCP (6), length 40)
   104.24.125.13.80 > 10.42.3.130.34420: Flags [R], cksum 0x3a65
(correct), seq 1936951410, win 0, length 0
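
The two things to look for in a capture like this are the ECN codepoint in the TOS byte and the TTL of the replies. As a rough aid, this Python sketch pulls both out of a tcpdump -v IP header line. The function name parse_ip_header and the regular expression are assumptions written against the output format shown above, not part of tcpdump; the split of the TOS byte into 6 DSCP bits and 2 ECN bits follows RFC 3168.

```python
import re

# ECN field values (low two bits of the TOS byte), per RFC 3168.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

# Matches "IP (tos 0x0, ttl 58" as well as "IP (tos 0x2,ECT(0), ttl 60".
HEADER_RE = re.compile(r"IP \(tos (0x[0-9a-f]+)(?:,[^ ,]+)?, ttl (\d+)")


def parse_ip_header(line):
    """Extract DSCP, ECN codepoint and TTL from a tcpdump -v header line.

    Returns None when the line does not look like an IPv4 header line.
    """
    match = HEADER_RE.search(line)
    if match is None:
        return None
    tos = int(match.group(1), 16)
    return {
        "dscp": tos >> 2,              # upper six bits
        "ecn": ECN_NAMES[tos & 0b11],  # lower two bits
        "ttl": int(match.group(2)),
    }


syn_ack = parse_ip_header(
    "12:21:47.823395 IP (tos 0x0, ttl 58, id 0, offset 0, flags [DF],")
rst = parse_ip_header(
    "12:21:47.825485 IP (tos 0x2,ECT(0), ttl 60, id 0, offset 0, flags")

# Replies from the same server over the same path normally arrive with
# the same TTL; a mismatch hints at a different sender.
print(syn_ack["ttl"], rst["ttl"], syn_ack["ttl"] != rst["ttl"])
```

For the capture above this prints 58 60 True: the SYN-ACK and the RST arrive with different TTLs, which fits the RST coming from a different anycast server.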
The fix is to stop hashing on the ECN bits when doing ECMP. You could
keep hashing on the diffserv part of the TOS field if you want, but I
think it would also be fine to just exclude the TOS field entirely from
the hash.

Re: ECN

Baldur Norddahl