barak-online.net icmp performance vs. traceroute/tcptraceroute, ssh, ipsec
I was wondering if someone could shed some light on this little curiosity.

US pings (sourced from different networks, including a cable customer in the NE) to the consumer-grade residential Israeli DSL CPE (currently a Cisco 871) look really nice and sweet, GoToMyPC works all right, and the consumer is enjoying the overall internet experience. VNC from the customer to the US is a non-starter. SSH from the US almost never works. IPsec performance is horrid.

traceroute/tcptraceroute show packet loss and MUCH higher rtt than the corresponding direct pings on the reported hop entries.

Is this some sort of massaging, or just plain "faking it"? Or are such things merely net-urban myth?

Here is a traceroute snippet:

 8  dcr3-ge-0-2-1.newyork.savvis.net (204.70.193.98)  31.008 ms  31.539 ms  31.248 ms
 9  208.173.129.14 (208.173.129.14)  62.847 ms  31.095 ms  30.690 ms
10  barak-01814-nyk-b2.c.telia.net (213.248.83.2)  30.529 ms  30.820 ms  30.495 ms
11  * * po1-3.bk3-bb.013bk.net (212.150.232.214)  277.722 ms
12  gi2-1.bk6-gw.013bk.net (212.150.234.94)  223.398 ms  235.616 ms  214.551 ms
13  * * gi11-24.bk6-acc3.013bk.net (212.29.206.37)  227.259 ms
14  212.29.206.60 (212.29.206.60)  244.369 ms 013bk.net *  246.271 ms
15  89.1.148.230.dynamic.barak-online.net (89.1.148.230)  251.923 ms  256.817 ms *

Compared to ICMP echo:

root@jml03:~# ping 89.1.148.230
PING 89.1.148.230 (89.1.148.230) 56(84) bytes of data.
64 bytes from 89.1.148.230: icmp_seq=1 ttl=240 time=190 ms

--- 89.1.148.230 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 190.479/190.479/190.479/0.000 ms

root@jml03:~# ping 89.1.148.230
PING 89.1.148.230 (89.1.148.230) 56(84) bytes of data.
64 bytes from 89.1.148.230: icmp_seq=1 ttl=240 time=186 ms
64 bytes from 89.1.148.230: icmp_seq=2 ttl=240 time=196 ms
64 bytes from 89.1.148.230: icmp_seq=3 ttl=240 time=187 ms
64 bytes from 89.1.148.230: icmp_seq=4 ttl=240 time=181 ms
64 bytes from 89.1.148.230: icmp_seq=5 ttl=240 time=184 ms
64 bytes from 89.1.148.230: icmp_seq=6 ttl=240 time=190 ms

--- 89.1.148.230 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5001ms
rtt min/avg/max/mdev = 181.572/187.756/196.277/4.685 ms

root@jml03:~# ping 212.29.206.60
PING 212.29.206.60 (212.29.206.60) 56(84) bytes of data.
64 bytes from 212.29.206.60: icmp_seq=1 ttl=241 time=179 ms
64 bytes from 212.29.206.60: icmp_seq=2 ttl=241 time=171 ms
64 bytes from 212.29.206.60: icmp_seq=3 ttl=241 time=171 ms

--- 212.29.206.60 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 171.388/174.375/179.968/3.972 ms

root@jml03:~# ping 212.29.206.37
PING 212.29.206.37 (212.29.206.37) 56(84) bytes of data.
64 bytes from 212.29.206.37: icmp_seq=1 ttl=242 time=177 ms
64 bytes from 212.29.206.37: icmp_seq=2 ttl=242 time=176 ms
64 bytes from 212.29.206.37: icmp_seq=3 ttl=242 time=175 ms

--- 212.29.206.37 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 175.412/176.516/177.187/0.858 ms

Joe
traceroute/tcptraceroute show packet loss and MUCH higher rtt than the corresponding direct pings on the reported hop entries.
Is this some sort of massaging, or just plain "faking it"? Or are such things merely net-urban myth?
the vast majority of routers on the internet respond very differently to traffic 'directed at them' as opposed to traffic 'routed through them'. many routers will punt traffic "at them" (such as icmp echo) to a low-priority control-plane (software) stack to respond to. this is vastly different to what may well be a hardware (ASIC) based forwarding path. many routers will also typically rate-limit the number of such queries they respond to per second. this may even be a tunable setting (e.g. CoPP on some Cisco products).

i'd suggest that you don't try to read ANYTHING into comparing 'traceroute' with end-to-end icmp echo, not to mention that traceroute only shows one direction of traffic.

if you have IPSec/SSH and/or TCP in general which simply "doesn't work right", i suggest you first verify that the end-to-end MTU is appropriate. my bet is that it isn't, and that PMTUD isn't working as expected because of some filtering and/or broken devices/configuration in the path.

try sending 1500-byte pings with DF set & see if they get through. my money is on they don't.

cheers,

lincoln.
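For reference, a concrete way to run the test suggested above, assuming a Linux source host with iputils ping and using the CPE address from the original trace; 1472 bytes of ICMP payload plus 28 bytes of ICMP/IP headers makes a 1500-byte packet:

  # Full-size ping with the DF bit set; -M do forbids fragmentation.
  ping -M do -s 1472 -c 5 89.1.148.230

  # If those all time out, retry at PPPoE size (1464 + 28 = 1492 bytes):
  ping -M do -s 1464 -c 5 89.1.148.230

If the 1492-byte probes get through but the 1500-byte ones silently vanish, that points at exactly the PMTUD breakage described above.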
Lincoln Dale wrote:
traceroute/tcptraceroute show packet loss and MUCH higher rtt than the corresponding direct pings on the reported hop entries.
Is this some sort of massaging, or just plain "faking it"? Or are such things merely net-urban myth?
the vast majority of routers on the internet respond very differently to traffic 'directed at them' as opposed to traffic 'routed through them'.
Thanks for your reply. I did include icmp echo directly to each hop as a comparison.
the vast majority of routers on the internet respond very differently to traffic 'directed at them' as opposed to traffic 'routed through them'.
Thanks for your reply.
I did include icmp echo directly to each hop as a comparison.
i guess what i'm saying is that you can't read much from the backscatter of either:

 - pinging each hop, or
 - eliciting a response from each hop (as traceroute does)

as the basis for determining much.

you can perhaps derive SOME meaning from it, but that meaning rapidly diminishes when there are multiple intermediate networks involved, some of which you have no direct connectivity with which to verify problems easily, likely a different return path for traffic (asymmetric routing), etc.

as i said before, if you have such terrible ssh/IPSec type performance, far less than you think is reasonable, then my money is on an MTU issue, and probably related to your DSL-based final hops.

cheers,

lincoln.
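If you do want per-hop numbers anyway, a sketch of a slightly less misleading way to collect them, assuming mtr is installed on the US-side host: aggregate many probes per hop and only trust loss that persists all the way to the final hop.

  # 100 probes per hop, numeric, report mode.  Loss that shows up at a
  # middle hop but NOT at the destination is usually just control-plane
  # rate limiting on that router, not real forwarding loss.
  mtr -r -n -c 100 89.1.148.230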
Lincoln Dale wrote:
I did include icmp echo directly to each hop as a comparison.
i guess what i'm saying is that you can't read much from the backscatter of either:
 - pinging each hop, or
 - eliciting a response from each hop (as traceroute does)
as the basis for determining much.
you can perhaps derive SOME meaning from it, but that meaning rapidly diminishes when there are multiple intermediate networks involved, some of which you have no direct connectivity with which to verify problems easily, likely a different return path for traffic (asymmetric routing), etc.
When the cards consistently fall in certain patterns, you can actually read them quite easily. The standard control plane arguments don't apply when the pattern holds all the way through to equipment under your {remote-}control.

In this specific instance, I find interesting the disparity of results between each hop ICMP echo and traceroute time exceeded processing, all the way up to the final hop. I wouldn't care if the application protocols rode well, but they don't seem to.
as i said before, if you have such terrible ssh/IPSec type performance, far less than you think is reasonable, then my money is on an MTU issue, and probably related to your DSL-based final hops.
cheers,
lincoln.
On Sun, May 06, 2007, Joe Maimon wrote:
When the cards consistently fall in certain patterns, you can actually read them quite easily.
Not if the cardplayer is lying..
The standard control plane arguments don't apply when the pattern holds all the way through to equipment under your {remote-}control.
In this specific instance, I find interesting the disparity of results between each hop ICMP echo and traceroute time exceeded processing, all the way up to the final hop.
I wouldn't care if the application protocols rode well, but they don't seem to.
Have you fired up ethereal/wireshark at either end and sniffed the packet flow to see exactly what's going on under these circumstances? Is there a difference between IPsec and normal TCP traffic? What's handling your IPsec at either end? etc, etc.

I've got plenty of graphs available which show modern Cisco equipment holding -horrible- ping variance compared to forwarding variance. E.g. a Cat 4500 acting as LAN router and switch having ping RTT between <1ms and 15ms, but forwarding ping RTT (i.e., to a PC at the other end doing 100% bugger all) is flat sub-1ms. (Makes for some -very- interesting VoIP statistics if you're not careful.)

I say "You need more information before jumping to conclusions" and "the information you have, whilst probably quite valid when correlated with other information, isn't going to be very helpful by itself."

Adrian
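For reference, a minimal capture along the lines suggested above, assuming tcpdump on a Linux host at either end rather than the wireshark GUI; the interface name is a placeholder and the address is the CPE from the original trace:

  # Capture the flows under test: ESP (IP protocol 50), IKE/NAT-T and SSH.
  tcpdump -ni eth0 -s 0 -w barak-test.pcap \
      'host 89.1.148.230 and (ip proto 50 or udp port 500 or udp port 4500 or tcp port 22)'

Open barak-test.pcap in wireshark afterwards and look for TCP retransmissions, ICMP "fragmentation needed" messages, and packets that never get an answer.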
i guess what i'm saying is that you can't read much from the backscatter of either:
 - pinging each hop, or
 - eliciting a response from each hop (as traceroute does)
as the basis for determining much.
you can perhaps derive SOME meaning from it, but that meaning rapidly diminishes when there are multiple intermediate networks involved, some of which you have no direct connectivity with which to verify problems easily, likely a different return path for traffic (asymmetric routing), etc.
When the cards consistently fall in certain patterns, you can actually read them quite easily.
The standard control plane arguments don't apply when the pattern holds all the way through to equipment under your {remote-}control.
it most certainly does. let's use an example network of:

        F
        |
A---B---C---D---E
        |
        G

you are looking at ICMP/traceroute responses through sending traffic to/from A & E. for all you know, there may be an ICMP DDoS attack going on from F-C or from G-C. the router 'C' is perfectly entitled to rate-limit the # of icmp responses it sends per second, and due to said traffic from F & G may be doing so. this would render your reading of the tea leaves of what A and E are seeing of C meaningless.

this diagram is incredibly simplistic. for the "greater internet", we could add perhaps 50x connections at each of B, C & D, not to mention the path you posted showed upwards of a dozen hops - so more realistically there could be 4 or 5 orders of magnitude more devices causing traffic in the path.
In this specific instance, I find interesting the disparity of results between each hop ICMP echo and traceroute time exceeded processing, all the way up to the final hop.
I wouldn't care if the application protocols rode well, but they don't seem to.
while you can paint a partial picture from elicited icmp responses, it certainly doesn't give you the full canvas. you've only tested traffic from A to E. what about A to F, where those are ENDPOINTS and not ROUTERS? e.g. try a long-lived HTTP/FTP stream & see what you get.

cheers,

lincoln.
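A simple way to run the long-lived end-to-end test suggested above, assuming curl on the US host and some reasonably large file reachable over HTTP at or near the far end (the URL is purely a placeholder):

  # Pull a large file end-to-end and watch the sustained rate rather than
  # per-hop ICMP numbers; -o /dev/null discards the data.
  curl -o /dev/null http://files.example.test/largefile.bin

A healthy path should settle near the DSL line rate; a PMTUD or queueing problem tends to show up as a transfer that starts and then stalls or crawls.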
On Sun, 06 May 2007 20:27:20 -0400 Joe Maimon <jmaimon@ttec.com> wrote:
Lincoln Dale wrote:
traceroute/tcptraceroute show packet loss and MUCH higher rtt than the corresponding direct pings on the reported hop entries.
Is this some sort of massaging, or just plain "faking it"? Or are such things merely net-urban myth?
the vast majority of routers on the internet respond very differently to traffic 'directed at them' as opposed to traffic 'routed through them'.
Thanks for your reply.
I did include icmp echo directly to each hop as a comparison.
Right, but from what you posted you didn't send 1500-byte packets. My reaction was the same as Lincoln's -- it smells like a Path MTU problem. To repeat -- ping and traceroute RTT from intermediate nodes is at best advisory, especially on timing.

I should add -- DSL lines often use PPPoE, which in turn cuts the effective MTU available for user packets. If the PMTUD ICMP packets don't get through -- and they often don't, because of misconfigured firewalls -- you're likely to see problems like this.

		--Steve Bellovin, http://www.cs.columbia.edu/~smb
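To put numbers on the PPPoE point: 1500 bytes of Ethernet payload minus 8 bytes of PPPoE/PPP overhead leaves a 1492-byte IP MTU, and 1492 minus 40 bytes of IP and TCP headers leaves a 1452-byte TCP MSS. A quick way to see what the path actually supports from a Linux host (a sketch; the target is the CPE address from the original post):

  # tracepath sends DF-marked probes and reports the path MTU it discovers,
  # along with the hop where the MTU drops.
  tracepath -n 89.1.148.230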
I did include icmp echo directly to each hop as a comparison.
Right, but from what you posted you didn't send 1500-byte packets. My reaction was the same as Lincoln's -- it smells like a Path MTU problem. To repeat -- ping and traceroute RTT from intermediate nodes is at best advisory, especially on timing.
I should add -- DSL lines often use PPPoE, which in turn cuts the effective MTU available for user packets. If the PMTUD ICMP packets don't get through -- and they often don't, because of misconfigured firewalls -- you're likely to see problems like this.
Of course, and that's why I have cut down ip mtu and tcp adjust mss and all the rest. Not making much of a difference.

Furthermore, ipsec performance with normal-sized icmp pings is what I was referring to, and those are nowhere near full-sized.
On May 6, 2007, at 6:07 PM, Joe Maimon wrote:
Of course, and that's why I have cut down ip mtu and tcp adjust mss and all the rest. Not making much of a difference.
Um.. sorry if you mean more than you said, but where did you cut down the TCP MTU? If you did it on your routers, then you are creating or at least compounding the problem.

The only way to make smaller MTUs work is to alter the MTU on both the origin and destination systems. Altering the MTU anywhere along the path only breaks things.

-- 
Jo Rhett
senior geek
Silicon Valley Colocation
Support Phone: 408-400-0550
Jo Rhett wrote:
On May 6, 2007, at 6:07 PM, Joe Maimon wrote:
Of course, and that's why I have cut down ip mtu and tcp adjust mss and all the rest. Not making much of a difference.
Um.. sorry if you mean more than you said, but where did you cut down the TCP MTU? If you did it on your routers, then you are creating or at least compounding the problem.
On the CPE dialer interface.
On the ezvpn dvti virtual-template.
The only way to make smaller MTUs work is to alter the MTU on both the origin and destination systems. Altering the MTU anywhere along the path only breaks things.
Lower than 1500 mtu always requires some kind of hack in real life. That would be the adjust-mss, which is the hack-of-choice.
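One way to confirm the clamp is actually taking effect, as a sketch assuming a Linux box on the US end (interface name is a placeholder): capture SYN packets arriving from behind the CPE and check the MSS option they carry after passing through the router.

  # Show only TCP SYNs and their options; the advertised mss should be the
  # clamped value (e.g. 1452 or lower on a PPPoE + IPsec path), not 1460.
  tcpdump -ni eth0 -v 'tcp[tcpflags] & tcp-syn != 0'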
Joe Maimon wrote:
Jo Rhett wrote:
On May 6, 2007, at 6:07 PM, Joe Maimon wrote:
Of course, and that's why I have cut down ip mtu and tcp adjust mss and all the rest. Not making much of a difference.
Um.. sorry if you mean more than you said, but where did you cut down the TCP MTU? If you did it on your routers, then you are creating or at least compounding the problem.
On the CPE dialer interface.
On the ezvpn dvti virtual-template
The only way to make smaller MTUs work is to alter the MTU on both the origin and destination systems. Altering the MTU anywhere along the path only breaks things.
Lower than 1500 mtu always requires some kind of hack in real life.
That would be the adjust-mss which is the hack-of-choice
I remember from my early DSL days, it was recommended to configure mtu=1480 on all interfaces connected to the internet or to the NAT-router.

I remember at least the Grandstream ATA and the DSL NAT-router were braindead (lobotomized ICMP) enough to simply break connections when packets exceeded 1480 bytes.

Practically all German internet users are on DSL lines. Some smaller hosts with FTP or HTTP servers are on DSL or tunnels, maybe with even smaller MTU. So MTU < 1500 is practically the norm.

Kind regards
Peter and Karin Dambier

-- 
Peter and Karin Dambier
Cesidian Root - Radice Cesidiana
Rimbacher Strasse 16
D-69509 Moerlenbach-Bonsweiher
+49(6209)795-816 (Telekom)
+49(6252)750-308 (VoIP: sipgate.de)
mail: peter@peter-dambier.de
mail: peter@echnaton.arl.pirates
http://iason.site.voila.fr/
https://sourceforge.net/projects/iason/
http://www.cesidianroot.com/
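For what it's worth, a sketch of how such a change is applied on a Linux end host (interface name and value are illustrative, and the change does not survive a reboot):

  # Temporarily lower the interface MTU.
  ip link set dev eth0 mtu 1480

  # Verify.
  ip link show dev eth0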
On May 7, 2007, at 12:45 PM, Peter Dambier wrote:
I remember from my early DSL days, it was recommended to configure mtu=1480 on all interfaces connected to the internet or to the NAT-router.
Yes, I remember that too. Back when I was a consultant I came out to a lot of sites and undid that change because it just breaks things if you did that on your router and not on your hosts. And note, hosts on *both* sides of every connection.

Which in short means: doesn't work in Real Life.

-- 
Jo Rhett
senior geek
Silicon Valley Colocation
Support Phone: 408-400-0550
Lower than 1500 mtu always requires some kind of hack in real life.
That would be the adjust-mss which is the hack-of-choice
note that using 'adjust-mss' only adjusts the MSS for TCP. it won't do much good for already-encapsulated IPSec traffic (ESP, IP protocol 50) or for IPSec tunneled over UDP...

cheers,

lincoln.
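Since the MSS clamp cannot help the ESP packets themselves, the tunnel has to fit within the path MTU on its own. On a Linux endpoint (a sketch; the Cisco boxes in this thread would be checked differently), one quick sanity check is whether the kernel has learned a reduced path MTU toward the peer:

  # Show the route and any cached path-MTU for the peer; after large
  # packets have been exchanged you may see something like "mtu 1492" here
  # if PMTUD is actually working.
  ip route get 89.1.148.230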
After all the discussion, the difference between the last hop of the trace (from the original email)

15  89.1.148.230.dynamic.barak-online.net (89.1.148.230)  251.923 ms  256.817 ms *

and the ping result

64 bytes from 89.1.148.230: icmp_seq=6 ttl=240 time=190 ms

is still quite interesting. I assumed the last hop is the Cisco 871 (IP=89.1.148.230). It would be good to know what causes the difference, if you have full control of the 871.
Min

On 5/7/07, Lincoln Dale <ltd@interlink.com.au> wrote:
Lower than 1500 mtu always requires some kind of hack in real life.
That would be the adjust-mss which is the hack-of-choice
note that using 'adjust-mss' only adjusts the MSS for TCP. it won't do much good for already-encapsulated IPSec traffic (ESP, IP protocol 50) or for IPSec tunneled over UDP...
cheers,
lincoln.
Lincoln Dale wrote:
Lower than 1500 mtu always requires some kind of hack in real life.
That would be the adjust-mss which is the hack-of-choice
note that using 'adjust-mss' only adjusts the MSS for TCP. it won't do much good for already-encapsulated IPSec traffic (ESP, IP protocol 50) or for IPSec tunneled over UDP...
Which is why it's configured on the IPsec tunnel. And if there isn't one, on the ingress interface. Which brings forth the observation that adjust-mss should rather be usable route-map/PBR style. I know we had that whole discussion right here, back when I was younger and dumber, such as here:

http://www.merit.edu/mail.archives/nanog/2003-12/msg00088.html

Anyways, initial reports are that, as per my advice, the customer calls the vendor and says "voip not working", the vendor says "i changed something, won't tell you what, reboot everything in 30", and now things seem to work perfectly, strangely enough EVEN the traceroutes.

This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.

I suspect the stellar icmp echo performance is also intentional.

Compare:

tcptraceroute lsvomonline.dnsalias.com -q 5 -w 1 80 -f 7
Selected device eth0, address 192.168.0.3, port 33204 for outgoing packets
Tracing the path to lsvomonline.dnsalias.com (82.166.56.247) on TCP port 80 (www), 30 hops max
 7  kar2-so-7-0-0.newyork.savvis.net (204.70.150.253)  45.008 ms  52.978 ms  32.404 ms  50.676 ms  33.657 ms
 8  dcr3-ge-0-2-1.newyork.savvis.net (204.70.193.98)  49.037 ms  33.145 ms  48.029 ms  34.355 ms  48.453 ms
 9  208.173.129.14  32.841 ms  32.669 ms  33.274 ms  31.861 ms  32.570 ms
10  barak-01814-nyk-b2.c.telia.net (213.248.83.2)  37.181 ms  32.600 ms  33.442 ms  32.696 ms  32.882 ms
11  po1-3.bk3-bb.013bk.net (212.150.232.214)  177.165 ms  175.852 ms  178.104 ms  179.217 ms  175.214 ms
12  gi2-1.bk6-gw.013bk.net (212.150.234.94)  180.923 ms  182.761 ms  179.170 ms  203.878 ms  178.905 ms
13  gi8-1.bk6-acc3.013bk.net (212.29.206.41)  174.266 ms  177.854 ms  177.198 ms  177.439 ms  176.400 ms
14  bk6-lns-3.013bk.net (212.29.206.55)  181.717 ms  176.460 ms  228.843 ms  174.942 ms  176.706 ms
15  82-166-56-247.barak-online.net (82.166.56.247) [open]  190.395 ms  188.043 ms  189.961 ms  200.064 ms  192.943 ms
Joe Maimon wrote:
This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.
I suspect the stellar icmp echo performance is also intentional.
Or it could just be some QOS policing/shaping.
On Mon, May 07, 2007, Joe Maimon wrote:
Joe Maimon wrote:
This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.
I suspect the stellar icmp echo performance is also intentional.
Or it could just be some QOS policing/shaping.
How asymmetric is the link?

I've noticed quite dramatic differences when configuring even basic policy maps with WRED on the DSL TX side on CPE (i.e., the small-sized pipe upstream from client to ISP). I can't (normally) control what the ISP is sending to me*, but I can try to make the best of the situation. And it can allow pipes to be almost fully utilised without massive performance drop-offs at the top end.

Adrian

* except in instances where I also run the ISP network..
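Adrian is describing Cisco policy maps, but the same "own the upstream queue" idea on a Linux-based CPE or test box might look roughly like this (a sketch only; the interface name and the ~256 kbit/s upstream rate are assumptions, not figures from this thread):

  # Shape slightly below the DSL sync rate so this box, not the modem's
  # deep FIFO, is where queueing happens.
  tc qdisc add dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 230kbit ceil 230kbit
  # A fair queue instead of straight tail-drop inside the class:
  tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10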
Adrian Chadd wrote:
On Mon, May 07, 2007, Joe Maimon wrote:
Joe Maimon wrote:
This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.
I suspect the stellar icmp echo performance is also intentional.
Or it could just be some QOS policing/shaping.
I was referring to policing/shaping on the provider's network, probably well in advance of the broadband agg network.

The 871 is doing a bit of cookie-cutter qos, but that hasn't changed.
On Mon, May 07, 2007, Joe Maimon wrote:
I was referring to policing/shaping on the provider's network, probably well in advance of the broadband agg network.
The 871 is doing a bit of cookie-cutter qos, but that hasn't changed.
Hey, point taken. I'm just pointing out what I've seen in real life and how it leads to crappy perceived performance on those little 8xx series routers and their connected bandwidth. I see a lot of them with straight FIFO/taildrop on their DSL interfaces, and then people wonder why a few uploads cause their browsing, VPNs, etc, to fail.

I can't see what's going on with the 871 because, well, I don't think I've seen its config. :)

Adrian
Anyways, initial reports are that, as per my advice, the customer calls the vendor and says "voip not working", the vendor says "i changed something, won't tell you what, reboot everything in 30", and now things seem to work perfectly, strangely enough EVEN the traceroutes.
This is obviously not best effort. Best guess would be "managed bandwidth" differentiated by ip ranges and that the "change" was a different pool assignment.
it's hard to say. could be that a peering connection was down or congested, or that cold-potato routing within said provider was suboptimal; there are any number of rational reasons other than "managed bandwidth".
I suspect the stellar icmp echo performance is also intentional.
as stated previously, eliciting a response out of a router through "icmp processing" is vastly different to the standard process of forwarding a packet. there are any number of reasons why icmp-ttl-exceeded response times can be vastly over or vastly under the actual round-trip-time of a packet.

if you still don't believe it, do a search for "Cisco Control Plane Policing" or CoPP. other vendors have similar mechanisms also.
Compare:

tcptraceroute lsvomonline.dnsalias.com -q 5 -w 1 80 -f 7
Selected device eth0, address 192.168.0.3, port 33204 for outgoing packets
Tracing the path to lsvomonline.dnsalias.com (82.166.56.247) on TCP port 80 (www), 30 hops max
 7  kar2-so-7-0-0.newyork.savvis.net (204.70.150.253)  45.008 ms  52.978 ms  32.404 ms  50.676 ms  33.657 ms
 8  dcr3-ge-0-2-1.newyork.savvis.net (204.70.193.98)  49.037 ms  33.145 ms  48.029 ms  34.355 ms  48.453 ms
[..]
using tcptraceroute in this manner is NO DIFFERENT to normal traceroute. the routers in the intermediate hops are still essentially doing icmp-ttl-exceeded behaviour, so the same "can't read anything into the latency" statements i've made a few times now still apply.

in either case, it's good to hear you have your issue resolved.

cheers,

lincoln.
I agree with Dale. The problem should be with e2e TCP performance. Maybe there is a misconfigured firewall which blocks SYN or ACK packets, or packets larger than 128B are dropped. As you can find in your data, ping and traceroute show different response speeds.

Maybe you could try a layer-4 traceroute, and try a packet size bigger than 1000 bytes; ICMP and traceroute usually use small packets. It will show you where the problem may exist. (A sketch follows the quoted text below.)

Joe

--- Joe Maimon <jmaimon@ttec.com> wrote:
Lincoln Dale wrote:
traceroute/tcptraceroute show packet loss and MUCH higher rtt than the corresponding direct pings on the reported hop entries.
Is this some sort of massaging, or just plain "faking it"? Or are such things merely net-urban myth?
the vast majority of routers on the internet respond very differently to traffic 'directed at them' as opposed to traffic 'routed through them'.
Thanks for your reply.
I did include icmp echo directly to each hop as a comparison.
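A sketch of the larger-packet, layer-4 probing suggested above, assuming a modern Linux traceroute with TCP support (-T); the hostname is the one from the earlier tcptraceroute and the sizes are illustrative:

  # TCP traceroute to port 80 with a ~1400-byte probe, then with the
  # default small probe, to compare per-hop behaviour.
  traceroute -T -p 80 lsvomonline.dnsalias.com 1400
  traceroute -T -p 80 lsvomonline.dnsalias.com

  # The same idea with an ordinary large-packet traceroute to the CPE:
  traceroute 89.1.148.230 1400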
participants (8)

- Adrian Chadd
- Jo Rhett
- Joe Maimon
- Joe Shen
- Lincoln Dale
- Min
- Peter Dambier
- Steven M. Bellovin