Can you explain why paths to same host diverge?
Folks,

We've been seeing some large latencies to sites inside of UUNET lately. In trying to figure out where the latency is coming from (UUNET or our provider, Sprint), I observed something which has me baffled.

I have two hosts on the same subnet. One of them is 128.114.2.53 and the other is 128.114.2.91. Here are ping results from these machines to www.uu.net:

from 128.114.2.91:

    PING www.uu.net (199.170.0.30): 56 data bytes, 100 packets
    100 packets transmitted, 100 packets received, 0% packet loss
    round-trip (ms) min/avg/max = 362/548/778 (std = 103)

from 128.114.2.53:

    PING www.uu.net (199.170.0.30): 56 data bytes, 100 packets
    100 packets transmitted, 100 packets received, 0% packet loss
    round-trip (ms) min/avg/max = 92.9/107/139 (std = 8.43)

The RTT from 128.114.2.91 are always much higher (many times a full order of magnitude) than those of 128.114.2.53. The explanation for the differing RTTs turns out to be that the paths between each of my machines and www.uu.net are different. In the below traceroutes, everything agrees up through hop five. But at hop six, the paths diverge:

from 128.114.2.53:

    traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
     1  comm-g.UCSC.EDU (128.114.2.252)  4.52 ms
     2  frontdoor.UCSC.EDU (128.114.103.1)  1.15 ms
     3  UC-net-dmz.ucsc.edu (208.1.176.6)  1.45 ms
     4  bgty-lata01.ucnet.net (192.35.219.2)  7.79 ms
     5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  8.70 ms
     6  sl-bb10-stk-2-1-155M.sprintlink.net (144.232.4.78)  30.2 ms
     7  sl-bb3-stk-0-0-0-155M.sprintlink.net (144.232.4.42)  33.0 ms
     8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  42.3 ms
     9  114.ATM3-0-0.XR1.SCL1.ALTER.NET (146.188.145.222)  76.9 ms
    10  100.ATM2-0-0.TR1.SCL1.ALTER.NET (146.188.145.226)  41.3 ms
    11  107.ATM8-0-0.TR1.DCA1.ALTER.NET (146.188.136.221)  110 ms (ttl=242!)
    12  199.ATM4-0-0.XR1.TCO1.ALTER.NET (146.188.161.161)  223 ms (ttl=243!)
    13  193.ATM5-0-0.GW2.FFX1.ALTER.NET (146.188.160.209)  158 ms (ttl=242!)
    14  UUNET7-GW.UU.NET (137.39.12.162)  214 ms (ttl=241!)  163 ms (ttl=241!)
    15  www.uu.net (199.170.0.30)  155 ms (ttl=240!)

from 128.114.2.91:

    traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
     1  comm-g.UCSC.EDU (128.114.2.252)  1.29 ms
     2  frontdoor.UCSC.EDU (128.114.103.1)  1.7 ms
     3  UC-net-dmz.ucsc.edu (208.1.176.6)  2.82 ms
     4  bgty-lata01.ucnet.net (192.35.219.2)  7.75 ms
     5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  10.6 ms
     6  sl-bb11-stk-1-1-155M.sprintlink.net (144.232.4.98)  11.2 ms
     7  sl-bb3-stk-4-0-0.sprintlink.net (144.232.4.14)  9.27 ms
     8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  609 ms
     9  114.ATM3-0-0.XR2.SCL1.ALTER.NET (146.188.145.210)  541 ms
    10  100.ATM3-0-0.TR2.SCL1.ALTER.NET (146.188.145.246)  578 ms
    11  107.ATM8-0-0.TR2.DCA1.ALTER.NET (146.188.136.225)  633 ms
    12  198.ATM8-0-0.XR2.TCO1.ALTER.NET (146.188.161.185)  683 ms (ttl=243!)
    13  192.ATM12-0-0.GW2.FFX1.ALTER.NET (146.188.160.221)  622 ms (ttl=242!)
    14  UUNET7-GW.UU.NET (137.39.12.162)  569 ms (ttl=241!)  534 ms (ttl=241!)
    15  www.uu.net (199.170.0.30)  579 ms (ttl=240!)

Can anyone explain to me why paths would diverge in this fashion? These traceroutes aren't aberrations - the packets follow these forward paths consistently.

Clues greatly appreciated.

thanks,
mb

---
Mark Boolootian
UC Santa Cruz
> The RTT from 128.114.2.91 are always much higher (many times a full order of magnitude) than those of 128.114.2.53. The explanation for the differing RTTs turns out to be that the paths between each of my machines and www.uu.net are different. In the below traceroutes, everything agrees up through hop five. But at hop six, the paths diverge:
Current code from cisco in the branch supporting Cisco Express Forwarding (CEF -- not publicly available) that many NSPs are running has support for multiple path selection based on a hash of the source and destination address of a packet.

I thought there was a discussion of this on NANOG around September of last year, but I cannot seem to find it, so I'm forced to conclude it either didn't happen or happened somewhere else :-)

[...pause...]

Ah yes, it took place on end2end-interest (subscriptions to majordomo@isi.edu, I believe, but possibly end2end-interest-request). See ftp.isi.edu /pub/end2end/end2end-interest-1997.mail, specifically the thread entitled "Multi-Path Routing", beginning on 1 Oct, about byte offset 3070749 into the archive.

The short summary is that in the event of multiple paths and CEF, when src/dst hashing is configured (as opposed to per-packet load balancing), the router hashes on the source and destination address and picks one of the multiple paths. This is deterministic and as far as we know is the best sort of layer-three load-balancing available; it has far nicer effects on traffic than per-packet load balancing, and ensures far better balance than destination-based hashing alone.

I feel obligated to ask -- is there some reason you didn't direct your query to Sprint, before asking NANOG? It really seems like this is the kind of question they should be able to answer for you, and diagnose the problem to some extent. I can't see a good reason to ask here without asking the providers in question, first.
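The src/dst hashing summarized above can be illustrated with a minimal sketch. This is not Cisco's CEF hash, whose details are not given in this thread; SHA-256 is only a stand-in, and `pick_path` and the path labels are hypothetical names chosen for illustration. It shows just the property that matters here: a given source/destination pair always maps to the same path, while two sources on the same subnet may map to different ones.

```python
import hashlib
import ipaddress
from typing import Sequence


def pick_path(src: str, dst: str, paths: Sequence[str]) -> str:
    """Choose one of several parallel paths from a hash of (src, dst).

    Assumption: SHA-256 stands in for whatever hash the router really uses.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha256(key).digest()
    # Reduce the hash to an index into the list of available paths.
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


if __name__ == "__main__":
    # Hypothetical labels for the two hop-6 routers seen in the traceroutes.
    paths = ["via sl-bb10-stk", "via sl-bb11-stk"]
    dst = "199.170.0.30"  # www.uu.net
    for src in ("128.114.2.53", "128.114.2.91"):
        # The same (src, dst) pair always yields the same path;
        # a different source may or may not land on the other path.
        print(src, "->", pick_path(src, dst, paths))
```

Because the choice depends only on the address pair, repeated probes from one host keep following one path, which is consistent with the stable forward paths reported in the original message.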
> from 128.114.2.53:
>
>     traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
>      1  comm-g.UCSC.EDU (128.114.2.252)  4.52 ms
>      2  frontdoor.UCSC.EDU (128.114.103.1)  1.15 ms
>      3  UC-net-dmz.ucsc.edu (208.1.176.6)  1.45 ms
>      4  bgty-lata01.ucnet.net (192.35.219.2)  7.79 ms
>      5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  8.70 ms
>      6  sl-bb10-stk-2-1-155M.sprintlink.net (144.232.4.78)  30.2 ms
>      7  sl-bb3-stk-0-0-0-155M.sprintlink.net (144.232.4.42)  33.0 ms
>      8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  42.3 ms
>      9  114.ATM3-0-0.XR1.SCL1.ALTER.NET (146.188.145.222)  76.9 ms
>     10  100.ATM2-0-0.TR1.SCL1.ALTER.NET (146.188.145.226)  41.3 ms
>     11  107.ATM8-0-0.TR1.DCA1.ALTER.NET (146.188.136.221)  110 ms (ttl=242!)
>     12  199.ATM4-0-0.XR1.TCO1.ALTER.NET (146.188.161.161)  223 ms (ttl=243!)
>     13  193.ATM5-0-0.GW2.FFX1.ALTER.NET (146.188.160.209)  158 ms (ttl=242!)
>     14  UUNET7-GW.UU.NET (137.39.12.162)  214 ms (ttl=241!)  163 ms (ttl=241!)
>     15  www.uu.net (199.170.0.30)  155 ms (ttl=240!)
>
> from 128.114.2.91:
>
>     traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
>      1  comm-g.UCSC.EDU (128.114.2.252)  1.29 ms
>      2  frontdoor.UCSC.EDU (128.114.103.1)  1.7 ms
>      3  UC-net-dmz.ucsc.edu (208.1.176.6)  2.82 ms
>      4  bgty-lata01.ucnet.net (192.35.219.2)  7.75 ms
>      5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  10.6 ms
>      6  sl-bb11-stk-1-1-155M.sprintlink.net (144.232.4.98)  11.2 ms
>      7  sl-bb3-stk-4-0-0.sprintlink.net (144.232.4.14)  9.27 ms
>      8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  609 ms
>      9  114.ATM3-0-0.XR2.SCL1.ALTER.NET (146.188.145.210)  541 ms
>     10  100.ATM3-0-0.TR2.SCL1.ALTER.NET (146.188.145.246)  578 ms
>     11  107.ATM8-0-0.TR2.DCA1.ALTER.NET (146.188.136.225)  633 ms
>     12  198.ATM8-0-0.XR2.TCO1.ALTER.NET (146.188.161.185)  683 ms (ttl=243!)
>     13  192.ATM12-0-0.GW2.FFX1.ALTER.NET (146.188.160.221)  622 ms (ttl=242!)
>     14  UUNET7-GW.UU.NET (137.39.12.162)  569 ms (ttl=241!)  534 ms (ttl=241!)
>     15  www.uu.net (199.170.0.30)  579 ms (ttl=240!)
Looking at your traceroutes, there are at least two cases of this sort of thing. First, at hop 6, but then your traceroutes are again consistent at hop 8. Second, at hop 9, they diverge once more, and become consistent at hop 14. Either one of these could be the cause of what you're seeing (or both). Additionally, it's perfectly possible that there's something along the return path which your traceroute isn't showing.

It is worth noting, I suppose, that optioned packets (i.e. traceroute -g or ping -R) are not CEF-switched, and therefore cannot be used to instrument the behavior of this hash. As a result, your best bet is limited ttl probes to various hops. For instance, you might try

    traceroute -f 7 -m 7 -q 100 www.uu.net

from each of your hosts, to determine if the problem started after hop 6.

--jhawk
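One way to compare the two traceroutes hop by hop is sketched below. The hop addresses and RTTs are copied from the traceroutes in this thread (only the first probe is kept for hop 14); the function names and the 100 ms threshold are assumptions for illustration. Single-probe RTTs are noisy, and nothing here can see the return path, so treat the result as a hint about where to aim limited-TTL probes rather than as a diagnosis.

```python
from typing import List, Optional, Tuple

Hop = Tuple[int, str, float]  # (hop number, router address, RTT in ms)

# From 128.114.2.53 (the host with the lower RTTs).
FAST: List[Hop] = [
    (1, "128.114.2.252", 4.52), (2, "128.114.103.1", 1.15),
    (3, "208.1.176.6", 1.45), (4, "192.35.219.2", 7.79),
    (5, "144.228.146.49", 8.70), (6, "144.232.4.78", 30.2),
    (7, "144.232.4.42", 33.0), (8, "137.39.166.121", 42.3),
    (9, "146.188.145.222", 76.9), (10, "146.188.145.226", 41.3),
    (11, "146.188.136.221", 110.0), (12, "146.188.161.161", 223.0),
    (13, "146.188.160.209", 158.0), (14, "137.39.12.162", 214.0),
    (15, "199.170.0.30", 155.0),
]

# From 128.114.2.91 (the host with the higher RTTs).
SLOW: List[Hop] = [
    (1, "128.114.2.252", 1.29), (2, "128.114.103.1", 1.7),
    (3, "208.1.176.6", 2.82), (4, "192.35.219.2", 7.75),
    (5, "144.228.146.49", 10.6), (6, "144.232.4.98", 11.2),
    (7, "144.232.4.14", 9.27), (8, "137.39.166.121", 609.0),
    (9, "146.188.145.210", 541.0), (10, "146.188.145.246", 578.0),
    (11, "146.188.136.225", 633.0), (12, "146.188.161.185", 683.0),
    (13, "146.188.160.221", 622.0), (14, "137.39.12.162", 569.0),
    (15, "199.170.0.30", 579.0),
]


def differing_hops(a: List[Hop], b: List[Hop]) -> List[int]:
    """Hop numbers at which the two traces report different addresses."""
    return [hop for (hop, ip_a, _), (_, ip_b, _) in zip(a, b) if ip_a != ip_b]


def first_jump(a: List[Hop], b: List[Hop], threshold_ms: float) -> Optional[int]:
    """First hop whose RTTs differ by more than threshold_ms, or None."""
    for (hop, _, rtt_a), (_, _, rtt_b) in zip(a, b):
        if abs(rtt_a - rtt_b) > threshold_ms:
            return hop
    return None


if __name__ == "__main__":
    print("addresses differ at hops:", differing_hops(FAST, SLOW))
    print("first RTT gap over 100 ms at hop:", first_jump(FAST, SLOW, 100.0))
```

With this data the reported addresses differ at hops 6-7 and 9-13, and the first hop whose RTTs differ by more than 100 ms is hop 8.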
John Hawkinson wrote:
> Current code from cisco in the branch supporting Cisco Express Forwarding (CEF -- not publicly available) that many NSPs are running has support for multiple path selection based on a hash of the source and destination address of a packet.

When did that code become available? Does anyone care to share experiences with that kind of load sharing in the real networks?

--vadim
> > Current code from cisco in the branch supporting Cisco Express Forwarding (CEF -- not publicly available) that many NSPs are running has support for multiple path selection based on a hash of the source and destination address of a packet.
>
> When did that code become available?

I believe sometime in March is being targeted. See your cisco representative for details :-)

> Does anyone care to share experiences with that kind of load sharing in the real networks?

Well, we're using it in quite a few places and we're very happy with it. If you read my e2e-i citation, you'd have noted that Curtis mentioned that the T3 NSFnet NSSes did this and it was successful then...

--jhawk
Vadim Antonov wrote:

> John Hawkinson wrote:
>
> > Current code from cisco in the branch supporting Cisco Express Forwarding (CEF -- not publicly available) that many NSPs are running has support for multiple path selection based on a hash of the source and destination address of a packet.
>
> When did that code become available?

March is the set date. It was in the CC package in December.

> Does anyone care to share experiences with that kind of load sharing in the real networks?

I run it on several points (POS and HSSI) and it is working well and has been stable.

> --vadim

--
R/Doug
_________________________________________________
Netscape - Network Engineering & New Technologies
http://people.netscape.com/ddalton
John,

Thanks very much for providing the cogent explanation. You ask:

> I feel obligated to ask -- is there some reason you didn't direct your query to Sprint, before asking NANOG? It really seems like this is the kind of question they should be able to answer for you, and diagnose the problem to some extent. I can't see a good reason to ask here without asking the providers in question, first.

There were really two reasons I asked this question here. First, it seemed like an interesting operational issue that I hadn't ever seen beaten to death on NANOG. Everyone is used to asymmetry between forward and reverse paths, but I don't think I'd ever seen a case of asymmetry in the forward path (at least, not while the network was stable). Second, I don't necessarily expect my provider to tell me why things route the way they do; I only expect them to fix things when they're broken. NANOG seems the appropriate place to ask the "why" questions.

For what it's worth, I am pursuing this with our provider.

> It is worth noting, I suppose, that optioned packets (i.e. traceroute -g or ping -R) are not CEF-switched, and therefore cannot be used to instrument the behavior of this hash. As a result, your best bet is limited ttl probes to various hops.

Presumably this is why the reverse traceroutes I ran didn't seem to shed any light.

Thanks again for the end2end-interest pointer and the fine explanation.

regards,
mb

--
Mark Boolootian booloo@cats.ucsc.edu
On Fri, Feb 27, 1998 at 08:52:08PM -0800, Mark Boolootian wrote:

> You ask:
>
> > I feel obligated to ask -- is there some reason you didn't direct your query to Sprint, before asking NANOG? It really seems like this is the kind of question they should be able to answer for you, and diagnose the problem to some extent. I can't see a good reason to ask here without asking the providers in question, first.
>
> There were really two reasons I asked this question here. First, it seemed like an interesting operational issue that I hadn't ever seen beaten to death on NANOG. Everyone is used to asymmetry between forward and reverse paths, but I don't think I'd ever seen a case of asymmetry in the forward path (at least, not while the network was stable). Second,

Actually, this is a lot more common than you'd think. Because of the lead time in getting OC-Nc circuits installed (where N is whatever is greater than what your carrier/transmission folks are used to delivering) and the difficulty in getting them, lots of folks put in multiple parallel DS3s and such to tide them over. To use such links effectively, many folks are turning to source/dest hashing load balancing in DCEF.

-dorian
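A small, hedged sketch of why hashing on both addresses can spread load over parallel links better than hashing on the destination alone. The traffic is synthetic (many hypothetical sources talking to a single destination), SHA-256 stands in for the router's unspecified hash, and `link_for` and `spread` are illustrative names, not part of any Cisco software.

```python
import hashlib
import ipaddress
from collections import Counter
from typing import Iterable, Tuple


def link_for(key: bytes, n_links: int) -> int:
    """Map a hash key onto one of n_links parallel links (SHA-256 stand-in)."""
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links


def spread(flows: Iterable[Tuple[str, str]], n_links: int, use_source: bool) -> Counter:
    """Count flows per link, hashing on dst only or on src and dst together."""
    counts: Counter = Counter()
    for src, dst in flows:
        key = ipaddress.ip_address(dst).packed
        if use_source:
            key = ipaddress.ip_address(src).packed + key
        counts[link_for(key, n_links)] += 1
    return counts


if __name__ == "__main__":
    # Synthetic traffic: 200 hosts in 128.114.2.0/24 all talking to one server.
    flows = [(f"128.114.2.{i}", "199.170.0.30") for i in range(1, 201)]
    print("dst only :", dict(spread(flows, 2, use_source=False)))
    print("src + dst:", dict(spread(flows, 2, use_source=True)))
```

With a single destination, destination-only hashing necessarily places every flow on one link, while src/dst hashing can use both; how even the split is depends on the hash and the traffic mix.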
participants (5)
- ddalton@netscape.com
- dorian@blackrose.org
- John Hawkinson
- Mark Boolootian
- Vadim Antonov