Europe-to-US congestion and packet loss on he.net network, and their NOC@ won't even respond
Dear NANOG@,

I'm not exactly sure how else I can get he.net's attention: I've been experiencing congestion issues between my dedicated server and Indiana for a couple of months now, all due to he.net's poor transit, as it turns out. The issue was complicated by the fact that the routes are asymmetric, so the traffic loss appears to be happening somewhere it is not occurring at all.

I will just provide the data and let people draw their own conclusions; any insights are welcome.

Since late September 2013, all four networks involved have been contacted: hetzner, init7, he.net and indiana. All except he.net responded and did troubleshooting. After I pressed he.net about the lack of any kind of response, all they did was ask for a customer number, and that was back in September. I have not heard from their NOC@ since; my requests remain unanswered, save for the "we have received your request" autoreply.

Interestingly, only some of their Europe-to-US routes are blatantly congested, with very obvious packet loss (often making ssh unusable), whereas others appear to be doing just fine (at least, not losing packets and not showing jitter or increased latency). IPv6 routes don't appear affected, for example. IPv4 addresses in North America that are announced directly from AS6939 (e.g. Linode in Fremont) don't appear affected, either. But the multi-homed indiana.edu and wiscnet.net are affected, as is the single-homed ntp1.yycix.ca. Other customers are probably affected as well.

Where's the end to this? Or is the ongoing 0.5+% traffic loss, and the 140+ms average latency on a 114ms route, with random spikes and jitter at certain hours of the day (generally around midnight ET), every day for weeks or even months, an acceptable practice?
From hetzner.de through he.net:

Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
Fri Nov 29 21:06:17 PST 2013
HOST: Cns???????                                    Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- static.??.???.4.46.clients.your-server.de  600  600  0.0%   0.5   1.0   1.3   4.9   1.1
  2.|-- hos-tr1.juniper1.rz13.hetzner.de            600  600  0.0%   0.1   0.2   1.9  66.0   7.6
  3.|-- core21.hetzner.de                           600  600  0.0%   0.2   0.2   0.2   5.8   0.4
  4.|-- core22.hetzner.de                           600  600  0.0%   0.2   0.2   0.2  19.4   1.2
  5.|-- core1.hetzner.de                            600  600  0.0%   4.8   4.8   4.8  13.2   0.7
  6.|-- juniper1.ffm.hetzner.de                     600  600  0.0%   4.8   4.8   4.8  27.4   1.4
  7.|-- 30gigabitethernet1-3.core1.ams1.he.net      600  600  0.0%  11.2  14.0  14.6  48.7   4.5
  8.|-- 10gigabitethernet1-4.core1.lon1.he.net      600  600  0.0%  18.2  19.6  19.9  53.9   4.1
  9.|-- 10gigabitethernet10-4.core1.nyc4.he.net     600  599  0.2%  87.0 116.1 116.7 145.7  12.4
 10.|-- 100gigabitethernet7-2.core1.chi1.he.net     600  597  0.5% 106.6 135.4 136.1 192.0  13.3
 11.|-- ???                                         600    0 100.0   0.0   0.0   0.0   0.0   0.0
 12.|-- et-11-0-0.945.rtr.ictc.indiana.gigapop.net  600  594  1.0% 113.3 139.3 139.7 166.1  11.4
 13.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu        600  596  0.7% 113.2 139.8 140.3 177.3  12.0
 14.|-- ae-0.0.br2.bldc.net.uits.iu.edu             600  595  0.8% 114.2 140.1 140.6 183.2  11.8
 15.|-- ae-10.0.cr3.bldc.net.uits.iu.edu            600  597  0.5% 114.3 140.3 140.8 165.0  11.5
 16.|-- ????c????????.indiana.edu                   600  597  0.5% 114.7 140.7 141.1 161.6  11.4
Fri Nov 29 21:08:52 PST 2013

Cns# unbuffer hping --icmp-ts ????c????????.indiana.edu | \
    perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
    if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
    if (/tsrtt=(\d+)/) { \
    print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
0       143.5   144 = 87 + 57
1       125.5   126 = 69 + 57
2       143.6   144 = 87 + 57
3       157.9   158 = 102 + 56
4       122.0   122 = 66 + 56
5       141.6   142 = 85 + 57
6       132.2   133 = 76 + 57
7       146.2   146 = 89 + 57
8       145.1   145 = 88 + 57
9       119.9   119 = 63 + 56
10      132.7   132 = 75 + 57
11      140.1   140 = 83 + 57
12      151.0   151 = 94 + 57
13      152.6   152 = 96 + 56
14      129.1   129 = 72 + 57
15      128.5   128 = 71 + 57
^C

Single-homed at he.net:

Cns# date ; mtr --report{,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ntp1.yycix.ca ; date
Fri Nov 29 21:16:14 PST 2013
HOST: Cns???????                   Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- static.??.???.4.46.client  600  600  0.0%   0.5   1.0   1.3  10.2   1.2
  2.|-- hos-tr4.juniper2.rz13.het  600  600  0.0%   0.1   0.2   2.0 153.9   9.8
  3.|-- core22.hetzner.de          600  600  0.0%   0.2   0.2   0.2  10.6   0.6
  4.|-- core1.hetzner.de           600  600  0.0%   4.8   4.8   4.8  16.4   0.9
  5.|-- juniper1.ffm.hetzner.de    600  600  0.0%   4.8   4.8   4.8  36.4   1.5
  6.|-- 30gigabitethernet1-3.core  600  600  0.0%  11.2  13.5  14.0  36.6   4.3
  7.|-- 10gigabitethernet1-4.core  600  600  0.0%  18.0  21.5  21.8  43.1   4.0
  8.|-- 10gigabitethernet10-4.cor  600  597  0.5%  93.2 128.0 128.3 157.5   8.9
  9.|-- 10gigabitethernet1-2.core  600  596  0.7% 103.1 139.4 139.6 157.5   8.2
 10.|-- 10gigabitethernet3-1.core  600  597  0.5% 128.2 164.9 165.1 181.9   8.2
 11.|-- 10gigabitethernet1-1.core  600  593  1.2% 138.7 175.9 176.1 192.6   7.8
 12.|-- sebo-systems-inc.gigabite  600  597  0.5% 139.0 176.4 176.5 187.5   6.9
 13.|-- ???                        600    0 100.0   0.0   0.0   0.0   0.0   0.0
 14.|-- ntp1.yycix.ca              600  597  0.5% 141.0 176.9 177.0 186.9   6.9
Fri Nov 29 21:18:32 PST 2013

Cns# traceroute -A ntp1.yycix.ca
traceroute to ntp1.yycix.ca (192.75.191.6), 64 hops max, 40 byte packets
 1  static.??.???.4.46.clients.your-server.de (46.4.???.??) [AS24940]  0.664 ms  0.648 ms  0.453 ms
 2  hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  23.985 ms
    hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33) [AS24940]  0.234 ms
    hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65) [AS24940]  0.238 ms
 3  core22.hetzner.de (213.239.245.121) [AS24940]  0.238 ms
    core21.hetzner.de (213.239.245.81) [AS24940]  0.234 ms  0.236 ms
 4  core1.hetzner.de (213.239.245.177) [AS24940]  4.811 ms  4.809 ms
    core22.hetzner.de (213.239.245.162) [AS24940]  0.248 ms
 5  core1.hetzner.de (213.239.245.177) [AS24940]  4.831 ms
    juniper1.ffm.hetzner.de (213.239.245.5) [AS24940]  4.842 ms  4.826 ms
 6  juniper1.ffm.hetzner.de (213.239.245.5) [AS24940]  4.857 ms  4.864 ms
    30gigabitethernet1-3.core1.ams1.he.net (195.69.145.150) [AS1200]  11.233 ms
 7  10gigabitethernet1-4.core1.lon1.he.net (72.52.92.81) [AS6939, AS6939]  19.869 ms
    30gigabitethernet1-3.core1.ams1.he.net (195.69.145.150) [AS1200]  18.420 ms  11.255 ms
 8  10gigabitethernet10-4.core1.nyc4.he.net (72.52.92.241) [AS6939, AS6939]  115.845 ms  101.875 ms
    10gigabitethernet1-4.core1.lon1.he.net (72.52.92.81) [AS6939, AS6939]  17.249 ms
 9  10gigabitethernet10-4.core1.nyc4.he.net (72.52.92.241) [AS6939, AS6939]  138.302 ms
    10gigabitethernet1-2.core1.tor1.he.net (184.105.222.18) [AS6939]  120.449 ms  139.730 ms
10  10gigabitethernet1-2.core1.tor1.he.net (184.105.222.18) [AS6939]  134.755 ms  104.661 ms
    10gigabitethernet3-1.core1.ywg1.he.net (184.105.223.221) [AS6939]  167.282 ms
11  10gigabitethernet1-1.core1.yyc1.he.net (184.105.223.214) [AS6939]  139.310 ms
    10gigabitethernet3-1.core1.ywg1.he.net (184.105.223.221) [AS6939]  155.983 ms  155.910 ms
12  sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net (216.218.214.250) [AS6939]  138.703 ms  178.530 ms
    10gigabitethernet1-1.core1.yyc1.he.net (184.105.223.214) [AS6939]  172.423 ms
13  sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net (216.218.214.250) [AS6939]  158 ms * *
14  * * ntp1.yycix.ca (192.75.191.6) [AS53339]  181.433 ms
Cns#
Cns#
Cns#
Cns# unbuffer hping --icmp-ts ntp1.yycix.ca | perl -ne \
    'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
    if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
    if (/tsrtt=(\d+)/) { \
    print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
0       165.0   165 = 95 + 70
1       156.2   156 = 86 + 70
2       178.9   179 = 109 + 70
3       181.0   181 = 111 + 70
4       178.3   179 = 108 + 71
5       163.8   164 = 94 + 70
6       175.7   176 = 106 + 70
7       173.9   174 = 104 + 70
8       172.6   173 = 103 + 70
9       163.5   164 = 94 + 70
10      181.8   182 = 112 + 70
11      161.9   162 = 92 + 70
12      183.1   184 = 113 + 71
13      174.5   174 = 104 + 70
14      181.8   181 = 111 + 70
15      181.7   181 = 111 + 70
^C
Cns#

From indiana.edu to hetzner.de; notice that the mtr by itself gives a false impression of traffic loss at init7, whereas in reality it's the reverse path through he.net that's causing the loss, as hping confirms:

m: {5134} date ; sudo mtr --report{,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ?????? ; date
Sat Nov 30 00:36:27 EST 2013
HOST: ????c????????.indiana.edu    Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- 129.79.???.?               600  600  0.0%   0.4   0.7   0.9  24.7   1.5
  2.|-- ae-13.0.br2.bldc.net.uits  600  600  0.0%   0.5   0.7   0.9  22.6   1.8
  3.|-- ae-0.0.br2.ictc.net.uits.  600  600  0.0%   1.4   1.7   1.8  20.2   1.6
  4.|-- xe-0-1-0.11.rtr.ictc.indi  600  600  0.0%   1.4   2.1   3.8  66.5   8.1
  5.|-- 64.57.21.13                600  600  0.0%   6.0   7.2   8.4  72.9   8.0
  6.|-- xe-2-2-0.0.ny0.tr-cps.int  600  600  0.0%  32.3  33.9  34.4  81.0   6.9
  7.|-- paix-nyc.init7.net         600  600  0.0%  32.5  35.3  35.5  44.7   3.8
  8.|-- r1lon1.core.init7.net      600  599  0.2% 100.1 104.7 104.9 146.5   7.5
  9.|-- r1nue1.core.init7.net      600  599  0.2% 114.6 115.7 115.7 125.4   2.2
 10.|-- gw-hetzner.init7.net       600  594  1.0% 112.4 141.3 142.4 241.9  18.2
 11.|-- core12.hetzner.de          600  468 22.0% 112.2 142.7 144.0 203.4  20.3
 12.|-- core21.hetzner.de          600  202 66.3% 114.4 143.7 145.0 204.1  20.1
 13.|-- juniper1.rz13.hetzner.de   600  594  1.0% 114.7 141.4 142.1 212.2  14.3
 14.|-- hos-tr2.ex3k11.rz13.hetzn  600  599  0.2% 113.8 123.9 125.5 218.2  21.8
 15.|-- static.88-198-??-??.clien  599  592  1.2% 114.6 137.2 137.9 167.6  13.2
0.244u 1.766s 1:05.52 3.0% 0+0k 0+1io 0pf+0w
Sat Nov 30 00:37:32 EST 2013

m: {5137} sudo script -q /dev/null hping3 --icmp-ts 88.198.??.?? | perl -ne \
    'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
    if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
    if (/tsrtt=(\d+)/) { \
    print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\r\n"; }'
0       131.3   131 = 57 + 74
1       122.4   122 = 56 + 66
2       122.6   123 = 56 + 67
3       127.6   128 = 57 + 71
4       146.5   147 = 57 + 90
5       139.8   140 = 56 + 84
6       131.0   131 = 57 + 74
7       134.6   135 = 57 + 78
8       137.7   138 = 57 + 81
9       148.1   148 = 57 + 91
10      141.2   142 = 57 + 85
11      146.4   146 = 56 + 90
12      153.6   154 = 57 + 97
13      149.4   150 = 57 + 93
14      120.2   121 = 57 + 64
15      120.6   120 = 56 + 64
16      130.7   131 = 57 + 74
17      126.4   126 = 56 + 70
18      117.9   118 = 57 + 61
19      116.9   117 = 57 + 60
20      119.8   119 = 56 + 63
21      132.0   132 = 56 + 76
22      134.2   134 = 56 + 78
23      138.8   139 = 57 + 82

Note the ICMP timestamp data from hping above: it makes it obvious that the congestion is happening on only one path, the one over he.net, and that init7 is in the clear.
Any further insights are welcome. Finding out about the ICMP timestamp feature has so far been the most useful thing in troubleshooting this issue; I'm surprised it's such a little-known method for getting to the bottom of these problems.

However, even after identifying the cause and the responsible party, the problem remains unresolved. Any help appreciated.

Best regards,
Constantine.
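The forward/return split that the perl filter above computes from ICMP timestamp replies (RFC 792) can be expressed compactly. Here is a minimal sketch (mine, not from the original post; the function name and the standalone form are assumptions), with timestamps in milliseconds since midnight UTC:

```python
def split_rtt(originate, receive, transmit, tsrtt):
    """Split an ICMP-timestamp round trip into its two one-way legs.

    originate -- our Originate timestamp from the request (ms since midnight UTC)
    receive, transmit -- the Receive/Transmit timestamps echoed by the remote host
    tsrtt -- the locally measured round-trip time, in ms

    Remote clock skew is baked into each leg (it cancels only in the sum),
    so the split is useful for comparing the two directions of one path,
    not as an absolute one-way delay.
    """
    forward = receive - originate              # outbound leg (+ remote skew)
    ret = originate + tsrtt - transmit         # return leg (- remote skew)
    return forward, ret

# First sample of the hetzner -> indiana run above: 144 = 87 + 57.
fwd, ret = split_rtt(originate=0, receive=87, transmit=87, tsrtt=144)
```

With the return leg steady around 56-57 ms while the forward leg swings between 63 and 102 ms, the congestion is clearly on the Europe-to-US direction only.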
Constantine,

Please mail me offlist if Init7 can be of any help to resolve the case.

--
Fredy Kuenzler
Init7 (Switzerland) Ltd.
St.-Georgen-Strasse 70
CH-8400 Winterthur
Switzerland
http://www.init7.net/
On 30.11.2013 at 08:30, "Constantine A. Murenin" <mureninc@gmail.com> wrote:
Dear NANOG@,

...

Best regards, Constantine.
On Fri, Nov 29, 2013 at 11:30 PM, Constantine A. Murenin <mureninc@gmail.com> wrote:
Dear NANOG@,
...
From hetzner.de through he.net:
Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
Using a 1/10th of a second interval is rather anti-social. I know we rate-limit ICMP traffic down, and such a short interval would be detected as attack traffic, and treated as such.

I would take any results you get from such probes with a grain of salt. What results do you get with a saner interval, of at least 1 second?

Matt
On 2013-W48-6 23:19 -0800, Matthew Petach wrote:
On Fri, Nov 29, 2013 at 11:30 PM, Constantine A. Murenin <mureninc@gmail.com> wrote:
Dear NANOG@,
...
From hetzner.de through he.net:
Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
Using a 1/10th of a second interval is rather anti-social. I know we rate-limit ICMP traffic down, and such a short interval would be detected as attack traffic, and treated as such.
I would take any results you get from such probes with a grain of salt. What results do you get with a saner interval, of at least 1 second?
Matt
For what it is worth, I used to think the same, until I saw several providers themselves suggest that 1000 packets be sent at the 0.1 s interval. So this is considered normal and appropriate nowadays.

Anyhow, is this better? Tonight, at a random test time, I saw a 2% traffic loss, and a 151ms avg rtt on this 114ms rtt route.

Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.5 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
Sat Nov 30 23:17:13 PST 2013
HOST: Cns???????                                    Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- static.??.???.4.46.clients.your-server.de  600  600  0.0%   0.5   1.0   1.3   4.6   1.1
  2.|-- hos-tr1.juniper1.rz13.hetzner.de            600  600  0.0%   0.1   0.2   2.0  58.5   7.9
  3.|-- core21.hetzner.de                           600  600  0.0%   0.2   0.2   0.2  10.2   0.7
  4.|-- core22.hetzner.de                           600  600  0.0%   0.2   0.2   0.2  11.2   0.8
  5.|-- core1.hetzner.de                            600  600  0.0%   4.8   4.8   4.8  25.1   1.3
  6.|-- juniper1.ffm.hetzner.de                     600  600  0.0%   4.8   4.8   4.8  13.9   0.6
  7.|-- 30gigabitethernet1-3.core1.ams1.he.net      600  595  0.8%  11.2  14.3  15.2 121.4   7.4
  8.|-- 10gigabitethernet1-4.core1.lon1.he.net      600  600  0.0%  18.2  21.0  21.3  51.2   4.0
  9.|-- 10gigabitethernet10-4.core1.nyc4.he.net     600  592  1.3%  86.9 125.9 126.4 160.7  10.6
 10.|-- 100gigabitethernet7-2.core1.chi1.he.net     600  591  1.5% 106.6 145.1 145.4 190.9  10.5
 11.|-- ???                                         600    0 100.0   0.0   0.0   0.0   0.0   0.0
 12.|-- et-11-0-0.945.rtr.ictc.indiana.gigapop.net  600  589  1.8% 114.3 148.9 149.2 167.9   9.1
 13.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu        600  589  1.8% 113.4 149.2 149.5 173.4   9.3
 14.|-- ae-0.0.br2.bldc.net.uits.iu.edu             600  590  1.7% 114.5 150.2 150.5 175.6   9.3
 15.|-- ae-10.0.cr3.bldc.net.uits.iu.edu            600  589  1.8% 114.3 150.5 150.8 181.0   9.1
 16.|-- ????c????????.indiana.edu                   600  589  1.8% 114.8 150.7 151.0 170.7   9.0
Sat Nov 30 23:24:06 PST 2013

The ICMP timestamp request/reply test still indicates that only one path is affected: the one from Europe to the US over he.net.
Cns# date ; unbuffer hping --icmp-ts --count 30 ????c????????.indiana.edu | \
    perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
    if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
    if (/tsrtt=(\d+)/) { \
    print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
Sun Dec 1 00:55:46 PST 2013
0       151.3   151 = 91 + 60
1       154.2   154 = 93 + 61
2       127.8   127 = 67 + 60
3       123.6   123 = 63 + 60
4       136.9   137 = 76 + 61
5       149.6   149 = 89 + 60
6       147.4   147 = 87 + 60
7       133.5   133 = 73 + 60
8       152.2   152 = 92 + 60
9       137.3   137 = 77 + 60
10      143.7   144 = 84 + 60
11      124.5   124 = 64 + 60
12      141.4   141 = 81 + 60
13      118.0   118 = 58 + 60
14      153.6   154 = 94 + 60
15      137.7   138 = 78 + 60
16      119.9   120 = 60 + 60
17      130.6   131 = 71 + 60
18      144.6   145 = 85 + 60
19      138.8   139 = 79 + 60
20      155.7   156 = 96 + 60
21      128.8   129 = 69 + 60
22      153.0   153 = 93 + 60
23      146.5   147 = 87 + 60
24      137.2   138 = 77 + 61
25      153.3   154 = 94 + 60
26      146.3   147 = 87 + 60
27      150.1   151 = 91 + 60
28      150.5   150 = 90 + 60
29      143.5   143 = 83 + 60

Cheers,
Constantine.
Using a 1/10th of a second interval is rather anti-social. I know we rate-limit ICMP traffic down, and such a short interval would be detected as attack traffic, and treated as such.

...

For what it is worth, I used to think the same, until I saw several providers themselves suggest that 1000 packets should be sent, with the 0.1 s interval. So, this is considered normal and appropriate nowadays.
Disagree. You are of course free to use whatever rate you want between your own end points. If you want a response from routers on the public Internet, you should *expect* it to be rate-limited. I certainly don't think that a 0.1s interval is appropriate, and I configure control-plane policing on "my" routers accordingly.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
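Such policers are commonly modelled as token buckets; a toy model (entirely illustrative, with an invented rate and bucket depth) shows why probes at 10/s can report loss that probes at 1/s never see:

```python
def police(arrival_times, rate=5.0, burst=5):
    """Toy token-bucket policer: `rate` tokens/s refill, bucket depth `burst`.

    Each packet needs one token to be answered; returns the drop count.
    """
    tokens, last, dropped = float(burst), 0.0, 0
    for t in arrival_times:
        # refill proportionally to the time elapsed, capped at the bucket depth
        tokens = min(float(burst), tokens + (t - last) * rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0       # packet answered
        else:
            dropped += 1        # policer drops the excess
    return dropped

# 100 probes at 10/s exceed the 5 pps policer and lose roughly half;
# the same 100 probes at 1/s are all answered.
fast_drops = police([i * 0.1 for i in range(100)])
slow_drops = police([float(i) for i in range(100)])
```

Note that such drops happen in the router's control plane, so they appear as loss at that hop in mtr without implying any loss for traffic forwarded through it.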
Using a 1/10th of a second interval is rather anti-social. I know we rate-limit ICMP traffic down, and such a short interval would be detected as attack traffic, and treated as such.

For what it is worth, I used to think the same, until I saw several providers themselves suggest that 1000 packets should be sent, with the 0.1 s interval. So, this is considered normal and appropriate nowadays.
matthew is correct

go back to your old way of thinking. while some providers may tolerate fast pings, few if any grown-ups do. and even those who think they do have routing engines which consider all pings as low-priority rubbish, to be dropped when there is any real work to do.

randy
On 1.12.2013 11:49, Randy Bush wrote:
Using a 1/10th of a second interval is rather anti-social. I know we rate-limit ICMP traffic down, and such a short interval would be detected as attack traffic, and treated as such.

For what it is worth, I used to think the same, until I saw several providers themselves suggest that 1000 packets should be sent, with the 0.1 s interval. So, this is considered normal and appropriate nowadays.
matthew is correct
go back to your old way of thinking. while some providers may tolerate fast pings, few if any grown-ups do. and even those who think they do have routing engines which consider all pings as low-priority rubbish, to be dropped when there is any real work to do.
From a router control-plane perspective, rate-limiting should always be expected, and result evaluation should take that into account. From the router's perspective, a packet with TTL=1 is typically handled in software, by a CPU with limited power (compared to modern forwarding hardware), and it is not the router's primary job to answer every TTL=1 packet - that is the correct view.

But the reports provided ALSO show end-to-end packet loss, which can never be caused by control-plane policers on transit routers, because those packets never hit a router CPU. And there we get to basic network neutrality - everyone should treat all data equally, independently of the protocol used for data transport.

Daniel
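Daniel's distinction can be applied mechanically when reading an mtr report; a rough sketch (mine, not from the thread; the 0.5-point margin is an arbitrary choice): loss at an intermediate hop that does not persist to the destination is most likely a control-plane policer artifact, while the final hop's figure is the real end-to-end loss.

```python
def classify_hops(loss_pct):
    """Classify per-hop loss percentages from an mtr report (in path order).

    The last entry is the destination, i.e. the true end-to-end loss;
    intermediate loss well above it points at ICMP rate-limiting,
    not at packets actually being dropped in transit.
    """
    end_to_end = loss_pct[-1]
    verdicts = []
    for hop, loss in enumerate(loss_pct, start=1):
        if loss > end_to_end + 0.5:
            verdicts.append((hop, "likely rate-limited (control plane)"))
        elif loss > 0:
            verdicts.append((hop, "consistent with end-to-end loss"))
    return end_to_end, verdicts

# The indiana -> hetzner mtr earlier in the thread: 22% and 66% loss at the
# hetzner core hops vanish by the final hop (1.2%), so they are policer
# artifacts, while the he.net reverse path carries the real loss.
e2e, verdicts = classify_hops([0.0, 0.0, 0.2, 1.0, 22.0, 66.3, 1.0, 0.2, 1.2])
```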
For anyone following the story: the weeks-long congestion on he.net remains; however, hetzner has switched my original route to an alternative uplink. I'm no longer experiencing the he.net evening jitter that would bring my avg rtt from the non-congested 114ms to an average of 140ms and more during the busy times. Other sites that have an uplink from he.net are still affected, with noticeable jitter, delay and some packet loss (which seems to increase as the avg rtt increases).

Was:

Tue Dec 3 22:25:38 PST 2013
Wed Dec 4 07:25:38 CET 2013
HOST: Cns???????                                    Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- static.??.???.4.46.clients.your-server.de  900  900  0.0%   0.5   0.9   1.3   7.3   1.2
  2.|-- hos-tr1.juniper1.rz13.hetzner.de            900  900  0.0%   0.2   0.2   2.1 101.2   9.0
  3.|-- core21.hetzner.de                           900  900  0.0%   0.2   0.2   0.2  19.7   1.1
  4.|-- core22.hetzner.de                           900  897  0.3%   0.2   0.2   0.2  19.6   1.1
  5.|-- core1.hetzner.de                            900  900  0.0%   4.8   4.8   4.8  16.4   1.1
  6.|-- juniper1.ffm.hetzner.de                     900  893  0.8%   4.8   4.8   4.8  19.1   0.7
  7.|-- 30gigabitethernet1-3.core1.ams1.he.net      900  900  0.0%  11.2  14.0  14.8 123.4   6.5
  8.|-- 10ge1-4.core1.lon1.he.net                   900  900  0.0%  18.2  19.9  20.2  59.7   3.9
  9.|-- 10ge10-4.core1.nyc4.he.net                  900  896  0.4%  86.9 113.6 114.2 148.4  11.4
 10.|-- 100ge7-2.core1.chi1.he.net                  900  898  0.2% 106.6 133.0 133.6 184.6  12.4
 11.|-- ???                                         900    0 100.0   0.0   0.0   0.0   0.0   0.0
 12.|-- et-11-0-0.945.rtr.ictc.indiana.gigapop.net  900  899  0.1% 113.3 136.7 137.1 162.3  10.8
 13.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu        900  895  0.6% 113.3 137.2 137.7 183.3  11.1
 14.|-- ae-0.0.br2.bldc.net.uits.iu.edu             900  900  0.0% 114.3 137.8 138.2 162.8  10.7
 15.|-- ae-10.0.cr3.bldc.net.uits.iu.edu            900  899  0.1% 114.4 137.9 138.4 167.5  10.7
 16.|-- m???c????????.indiana.edu                   900  898  0.2% 114.5 138.1 138.5 159.0  10.4
Tue Dec 3 22:43:16 PST 2013
Wed Dec 4 07:43:16 CET 2013
0       154.8   155 = 94 + 61
1       127.9   128 = 67 + 61
2       125.0   125 = 64 + 61
3       131.5   132 = 71 + 61
4       131.2   132 = 71 + 61
5       117.4   118 = 57 + 61
6       132.9   133 = 72 + 61
7       143.4   144 = 83 + 61
8       138.7   139 = 78 + 61
9       131.5   131 = 70 + 61
10      142.2   142 = 81 + 61
11      152.2   152 = 91 + 61
12      125.2   125 = 64 + 61
13      125.6   125 = 64 + 61
14      140.8   141 = 80 + 61
15      150.9   151 = 89 + 62
Tue Dec 3 22:43:31 PST 2013
Wed Dec 4 07:43:31 CET 2013

Now:

Tue Dec 3 23:01:20 PST 2013
Wed Dec 4 08:01:20 CET 2013
HOST: Cns???????                                            Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- static.??.???.4.46.clients.your-server.de          900  900  0.0%   0.5   0.9   1.3   7.0   1.1
  2.|-- hos-tr1.juniper1.rz13.hetzner.de                    900  900  0.0%   0.1   0.2   4.3 245.4  18.9
  3.|-- core21.hetzner.de                                   900  900  0.0%   0.2   0.2   0.3  99.4   3.5
  4.|-- core12.hetzner.de                                   900  900  0.0%   2.7   2.8   2.8  23.1   1.9
  5.|-- juniper4.rz2.hetzner.de                             900  899  0.1%   2.8   2.8   2.8  29.0   1.1
  6.|-- te0-0-2-1.nr21.b040138-0.nue01.atlas.cogentco.com   900  900  0.0%   3.1   3.3   3.4   7.1   0.6
  7.|-- te2-4.ccr01.nue01.atlas.cogentco.com                900  900  0.0%   3.1   6.6  25.9 244.4  51.0
       `|-- 154.25.0.13
  8.|-- te0-2-0-3.ccr22.muc01.atlas.cogentco.com            900  900  0.0%   5.7   5.8   5.9  11.1   0.5
  9.|-- be2229.ccr22.fra03.atlas.cogentco.com               900  900  0.0%  11.1  11.4  11.4  14.4   0.5
       `|-- 154.54.38.189
 10.|-- te0-3-0-2.mpd22.ams03.atlas.cogentco.com            900  900  0.0%  17.7  17.9  17.9  49.1   1.1
       `|-- 154.54.39.193
 11.|-- be2276.ccr22.lon13.atlas.cogentco.com               900  900  0.0%  25.3  27.4  27.4  33.2   1.3
       `|-- 154.54.37.106
        |-- 154.54.58.69
 12.|-- te0-7-0-33.ccr22.bos01.atlas.cogentco.com           900  900  0.0%  93.1  93.9  93.9  98.8   1.2
       `|-- 154.54.44.189
        |-- 130.117.0.185
 13.|-- te7-8.ccr02.alb02.atlas.cogentco.com                900  900  0.0%  96.8 119.9 128.8 364.2  56.9
       `|-- 154.54.27.153
 14.|-- te7-8.ccr01.buf02.atlas.cogentco.com                900  900  0.0% 103.3 126.8 135.8 444.9  59.0
 15.|-- te0-5-0-2.ccr21.cle04.atlas.cogentco.com            900  900  0.0% 108.6 109.5 109.5 113.5   1.3
 16.|-- te3-2.ccr01.cmh02.atlas.cogentco.com                900  900  0.0% 111.1 134.8 143.5 467.0  59.4
 17.|-- te4-7.ccr01.cvg02.atlas.cogentco.com                900  900  0.0% 113.8 122.5 126.2 467.9  39.6
 18.|-- te3-3.ccr01.ind01.atlas.cogentco.com                900  899  0.1% 116.1 137.8 145.4 457.7  56.0
 19.|-- 38.104.214.6                                        900  900  0.0% 115.8 118.3 118.5 206.0   8.1
 20.|-- xe-0-1-0.11.rtr.ictc.indiana.gigapop.net            900  900  0.0% 115.9 116.7 116.7 128.4   1.5
 21.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu                900  900  0.0% 115.9 117.0 117.1 168.6   3.6
 22.|-- ae-0.0.br2.bldc.net.uits.iu.edu                     900  900  0.0% 116.8 117.7 117.7 130.8   1.8
 23.|-- ae-10.0.cr3.bldc.net.uits.iu.edu                    900  899  0.1% 116.8 117.8 117.8 135.1   1.9
 24.|-- m???c????????.indiana.edu                           900  899  0.1% 117.1 117.9 117.9 122.0   1.2
Tue Dec 3 23:19:56 PST 2013
Wed Dec 4 08:19:56 CET 2013
0       120.1   120 = 57 + 63
1       117.3   117 = 54 + 63
2       120.3   120 = 57 + 63
3       117.3   117 = 54 + 63
4       117.4   118 = 54 + 64
5       117.3   118 = 54 + 64
6       120.0   120 = 57 + 63
7       117.3   118 = 54 + 64
8       117.3   118 = 54 + 64
9       117.6   118 = 54 + 64
10      117.3   118 = 54 + 64
11      117.4   118 = 54 + 64
12      117.4   117 = 53 + 64
13      118.6   118 = 55 + 63
14      117.5   117 = 53 + 64
15      117.3   117 = 53 + 64
Tue Dec 3 23:20:11 PST 2013
Wed Dec 4 08:20:11 CET 2013

Yet:

Cns# sh -c 'while (true); do date; env TZ=Europe/Berlin date; mtr --report{,-wide,-cycles=900} --interval 1 --order "SRL BGAWV" -4 ntp1.yycix.ca; date; env TZ=Europe/Berlin date; unbuffer hping --icmp-ts --count 16 ntp1.yycix.ca | perl -ne "if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {(\$s,\$p)=(\$1,\$2);} if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {(\$o,\$r,\$t)=(\$1,\$2,\$3);} if (/tsrtt=(\d+)/)
{print \$s,qq/\t/,\$p,qq/\t/,\$1,qq/ = /,\$r-\$o,qq/ + /,\$o+\$1-\$t,qq/\n/;}"; done' Tue Dec 3 23:32:37 PST 2013 Wed Dec 4 08:32:37 CET 2013 HOST: Cns??????? Snt Rcv Loss% Best Gmean Avg Wrst StDev 1.|-- static.??.???.4.46.clients.your-server.de 900 900 0.0% 0.5 0.9 1.2 9.7 1.2 2.|-- hos-tr4.juniper2.rz13.hetzner.de 900 900 0.0% 0.2 0.2 2.0 138.3 8.9 3.|-- core22.hetzner.de 900 900 0.0% 0.2 0.2 0.2 8.5 0.4 4.|-- core1.hetzner.de 900 900 0.0% 4.8 4.8 4.8 16.0 0.7 5.|-- juniper1.ffm.hetzner.de 900 900 0.0% 4.8 4.8 4.8 13.8 0.7 6.|-- 30gigabitethernet1-3.core1.ams1.he.net 900 900 0.0% 11.2 14.2 14.9 106.2 6.2 7.|-- 10ge1-4.core1.lon1.he.net 900 900 0.0% 18.0 20.5 20.8 55.6 4.1 8.|-- 10ge10-4.core1.nyc4.he.net 900 895 0.6% 86.4 119.3 120.0 158.6 13.1 9.|-- 10ge1-2.core1.tor1.he.net 900 896 0.4% 96.4 128.4 129.1 159.6 12.9 10.|-- 10ge3-1.core1.ywg1.he.net 900 895 0.6% 121.8 152.2 152.7 181.5 12.6 11.|-- 10ge1-1.core1.yyc1.he.net 900 888 1.3% 135.7 168.0 168.5 197.4 13.0 12.|-- sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net 900 890 1.1% 138.4 168.2 168.6 190.9 12.5 13.|-- ??? 900 0 100.0 0.0 0.0 0.0 0.0 0.0 14.|-- ntp1.yycix.ca 900 893 0.8% 138.7 168.8 169.2 191.9 12.2 Tue Dec 3 23:49:07 PST 2013 Wed Dec 4 08:49:07 CET 2013 0 180.0 180 = 109 + 71 1 155.9 156 = 84 + 72 2 143.6 144 = 72 + 72 3 179.8 180 = 108 + 72 4 161.0 161 = 89 + 72 5 155.9 156 = 84 + 72 6 168.0 168 = 97 + 71 7 164.5 165 = 93 + 72 8 177.9 178 = 107 + 71 9 159.6 160 = 88 + 72 10 179.1 180 = 108 + 72 11 178.6 178 = 106 + 72 12 149.8 149 = 78 + 71 13 166.2 166 = 94 + 72 14 146.8 147 = 75 + 72 15 153.7 154 = 82 + 72 Tue Dec 3 23:49:22 PST 2013 Wed Dec 4 08:49:22 CET 2013 Cheers, Constantine. On 2013-W48-7 01:11 -0800, Constantine A. Murenin wrote:
Anyhow, is this better?
Last night, at a random test time, I saw 2% traffic loss and a 151 ms average RTT on this 114 ms route.
Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.5 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
Sat Nov 30 23:17:13 PST 2013
HOST: Cns???????                                    Snt  Rcv Loss%  Best Gmean   Avg  Wrst StDev
  1.|-- static.??.???.4.46.clients.your-server.de   600  600  0.0%   0.5   1.0   1.3   4.6   1.1
  2.|-- hos-tr1.juniper1.rz13.hetzner.de             600  600  0.0%   0.1   0.2   2.0  58.5   7.9
  3.|-- core21.hetzner.de                            600  600  0.0%   0.2   0.2   0.2  10.2   0.7
  4.|-- core22.hetzner.de                            600  600  0.0%   0.2   0.2   0.2  11.2   0.8
  5.|-- core1.hetzner.de                             600  600  0.0%   4.8   4.8   4.8  25.1   1.3
  6.|-- juniper1.ffm.hetzner.de                      600  600  0.0%   4.8   4.8   4.8  13.9   0.6
  7.|-- 30gigabitethernet1-3.core1.ams1.he.net       600  595  0.8%  11.2  14.3  15.2 121.4   7.4
  8.|-- 10gigabitethernet1-4.core1.lon1.he.net       600  600  0.0%  18.2  21.0  21.3  51.2   4.0
  9.|-- 10gigabitethernet10-4.core1.nyc4.he.net      600  592  1.3%  86.9 125.9 126.4 160.7  10.6
 10.|-- 100gigabitethernet7-2.core1.chi1.he.net      600  591  1.5% 106.6 145.1 145.4 190.9  10.5
 11.|-- ???                                          600    0 100.0   0.0   0.0   0.0   0.0   0.0
 12.|-- et-11-0-0.945.rtr.ictc.indiana.gigapop.net   600  589  1.8% 114.3 148.9 149.2 167.9   9.1
 13.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu         600  589  1.8% 113.4 149.2 149.5 173.4   9.3
 14.|-- ae-0.0.br2.bldc.net.uits.iu.edu              600  590  1.7% 114.5 150.2 150.5 175.6   9.3
 15.|-- ae-10.0.cr3.bldc.net.uits.iu.edu             600  589  1.8% 114.3 150.5 150.8 181.0   9.1
 16.|-- ????c????????.indiana.edu                    600  589  1.8% 114.8 150.7 151.0 170.7   9.0
Sat Nov 30 23:24:06 PST 2013
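In case the column abbreviations in these reports are unclear: each row reduces one hop's per-probe RTT samples to sent/received counts, a loss percentage, best/worst, arithmetic and geometric means, and a standard deviation. A rough Python sketch of that aggregation (mine, not mtr's actual code; the function name and toy numbers are made up):

```python
import math

def mtr_summary(rtts, sent):
    """Summarize per-probe RTTs (ms) the way an mtr report row does.

    rtts: RTTs of the probes that were answered; sent: probes sent.
    Gmean is the geometric mean, which damps outlier spikes compared
    with the arithmetic Avg, so a large Avg-vs-Gmean gap plus a high
    StDev is the jitter signature visible in the tables above.
    """
    rcv = len(rtts)
    loss = 100.0 * (sent - rcv) / sent
    avg = sum(rtts) / rcv
    gmean = math.exp(sum(math.log(r) for r in rtts) / rcv)
    stdev = math.sqrt(sum((r - avg) ** 2 for r in rtts) / rcv)
    return {"Snt": sent, "Rcv": rcv, "Loss%": round(loss, 1),
            "Best": min(rtts), "Gmean": round(gmean, 1),
            "Avg": round(avg, 1), "Wrst": max(rtts),
            "StDev": round(stdev, 1)}

# Toy example: 3 of 4 probes answered -> 25% loss.
print(mtr_summary([110.0, 120.0, 160.0], sent=4))
```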
The ICMP timestamp request/reply test still indicates that only one path is affected: the one from Europe to the US over he.net.
Cns# date ; unbuffer hping --icmp-ts --count 30 ????c????????.indiana.edu | \
    perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
    if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
    if (/tsrtt=(\d+)/) { \
    print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
Sun Dec 1 00:55:46 PST 2013
 0  151.3  151 = 91 + 60
 1  154.2  154 = 93 + 61
 2  127.8  127 = 67 + 60
 3  123.6  123 = 63 + 60
 4  136.9  137 = 76 + 61
 5  149.6  149 = 89 + 60
 6  147.4  147 = 87 + 60
 7  133.5  133 = 73 + 60
 8  152.2  152 = 92 + 60
 9  137.3  137 = 77 + 60
10  143.7  144 = 84 + 60
11  124.5  124 = 64 + 60
12  141.4  141 = 81 + 60
13  118.0  118 = 58 + 60
14  153.6  154 = 94 + 60
15  137.7  138 = 78 + 60
16  119.9  120 = 60 + 60
17  130.6  131 = 71 + 60
18  144.6  145 = 85 + 60
19  138.8  139 = 79 + 60
20  155.7  156 = 96 + 60
21  128.8  129 = 69 + 60
22  153.0  153 = 93 + 60
23  146.5  147 = 87 + 60
24  137.2  138 = 77 + 61
25  153.3  154 = 94 + 60
26  146.3  147 = 87 + 60
27  150.1  151 = 91 + 60
28  150.5  150 = 90 + 60
29  143.5  143 = 83 + 60
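For reference, the arithmetic behind the hping/perl one-liner: an ICMP timestamp reply (RFC 792) carries Originate, Receive, and Transmit timestamps in milliseconds since midnight UTC, so each measured RTT can be split into a forward leg (Receive - Originate) and a return leg (Originate + tsrtt - Transmit). That is how the "144 = 87 + 57"-style lines are produced: the forward (Europe-to-US) leg swings wildly while the return leg stays flat. A minimal Python sketch of the same computation (the function name and sample values are illustrative; it assumes both clocks are NTP-synchronized and there is no midnight rollover):

```python
def split_rtt(originate, receive, transmit, tsrtt):
    """Split an ICMP-timestamp RTT into its one-way components.

    originate/receive/transmit are the 'ms since midnight UTC' fields
    from the ICMP timestamp reply; tsrtt is the measured round-trip
    time in ms. Only meaningful when both ends have sane clocks.
    """
    forward = receive - originate            # our send -> their receive
    back = originate + tsrtt - transmit      # their send -> our receive
    return forward, back

# Hypothetical sample resembling the output above: a 144 ms RTT that
# decomposes into 87 ms forward and 57 ms return, i.e. the congestion
# sits on the Europe-to-US direction.
fwd, back = split_rtt(originate=100000, receive=100087, transmit=100087, tsrtt=144)
print(fwd, back)  # 87 57
```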
Cheers, Constantine.
Matthew Petach <mpetach@netflight.com> writes:
Using a 1/10th of a second interval is rather anti-social. I know we rate-limit ICMP traffic down, and such a short interval would be detected as attack traffic, and treated as such.
This should be obvious to everyone here, but just in case: there's also a huge difference between hammering the control plane of every router along the path due to TTL expiration (mtr) and trying to smoke out intermittent performance problems between endpoints with a few hundred packets/second of various sizes of ICMP or UDP *between those endpoints*. Folks should expect the former to be rate-limited; a reasonable control-plane policing policy is not optional these days. -r
participants (7)
- Constantine A. Murenin
- Daniel Suchy
- Fredy Kuenzler
- Matthew Petach
- Randy Bush
- Rob Seastrom
- sthaug@nethelp.no