The RTTs from 128.114.2.91 are always much higher (often by a full order of magnitude) than those from 128.114.2.53. The explanation for the differing RTTs turns out to be that the paths between each of my machines and www.uu.net are different. In the traceroutes below, everything agrees up through hop five. But at hop six, the paths diverge:
Current code from Cisco in the branch supporting Cisco Express Forwarding (CEF -- not publicly available), which many NSPs are running, supports multiple-path selection based on a hash of a packet's source and destination addresses. I thought there was a discussion of this on NANOG around September of last year, but I cannot seem to find it, so I'm forced to conclude it either didn't happen or happened somewhere else :-) [...pause...] Ah yes, it took place on end2end-interest (subscriptions to majordomo@isi.edu, I believe, but possibly end2end-interest-request). See ftp.isi.edu /pub/end2end/end2end-interest-1997.mail, specifically the thread entitled "Multi-Path Routing", beginning on 1 Oct, about byte offset 3070749 into the archive.

The short summary is that when there are multiple paths and CEF is configured for src/dst hashing (as opposed to per-packet load balancing), the router hashes the source and destination addresses and uses the result to pick one of the paths. This is deterministic and, as far as we know, the best form of layer-three load balancing available. It has far nicer effects on traffic than per-packet load balancing (each src/dst pair stays on one path, so its packets are not reordered), and it produces a much more even balance than destination-based hashing alone.

I feel obligated to ask -- is there some reason you didn't direct your query to Sprint before asking NANOG? This really seems like the kind of question they should be able to answer for you, and they should be able to diagnose the problem to some extent. I can't see a good reason to ask here without asking the providers in question first.
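To make the difference between the three balancing modes concrete, here is a minimal sketch in Python. It is not Cisco's CEF hash: the SHA-256-over-packed-addresses hash and the function and class names (`pick_path_src_dst`, `pick_path_dst_only`, `PerPacketBalancer`) are hypothetical stand-ins. What it is meant to show is only the property described above, namely that src/dst hashing always sends a given source/destination pair down the same path, while per-packet balancing alternates paths packet by packet.

```python
import hashlib
import ipaddress
from itertools import count


def _hash_to_index(key: bytes, n_paths: int) -> int:
    # Hypothetical hash: SHA-256, first 4 bytes taken as an integer, reduced mod n_paths.
    # Real CEF uses its own hash function; only determinism matters for this sketch.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def pick_path_src_dst(src: str, dst: str, n_paths: int) -> int:
    """Src/dst hashing: the same (source, destination) pair always maps to the same path."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return _hash_to_index(key, n_paths)


def pick_path_dst_only(dst: str, n_paths: int) -> int:
    """Destination-only hashing: every source sending to dst shares one path."""
    return _hash_to_index(ipaddress.ip_address(dst).packed, n_paths)


class PerPacketBalancer:
    """Per-packet balancing: round-robin across paths, ignoring which flow a packet is in."""

    def __init__(self, n_paths: int) -> None:
        self.n_paths = n_paths
        self._counter = count()

    def pick_path(self) -> int:
        return next(self._counter) % self.n_paths


if __name__ == "__main__":
    dst = "199.170.0.30"  # www.uu.net in the traceroutes
    for src in ("128.114.2.53", "128.114.2.91"):
        # Repeating the lookup yields a single path per src/dst pair.
        paths = {pick_path_src_dst(src, dst, 2) for _ in range(5)}
        print(src, "src/dst path(s):", paths, "dst-only path:", pick_path_dst_only(dst, 2))
    rr = PerPacketBalancer(2)
    print("per-packet:", [rr.pick_path() for _ in range(4)])  # alternates 0, 1, 0, 1
```

Under this model two hosts on the same subnet can be hashed onto different parallel paths toward the same destination and stay there consistently, which is the kind of stable per-host difference seen in the two traceroutes. Which path a given pair lands on depends entirely on the hash, so the sketch deliberately prints rather than predicts it.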
from 128.114.2.53:
traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
 1  comm-g.UCSC.EDU (128.114.2.252)  4.52 ms
 2  frontdoor.UCSC.EDU (128.114.103.1)  1.15 ms
 3  UC-net-dmz.ucsc.edu (208.1.176.6)  1.45 ms
 4  bgty-lata01.ucnet.net (192.35.219.2)  7.79 ms
 5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  8.70 ms
 6  sl-bb10-stk-2-1-155M.sprintlink.net (144.232.4.78)  30.2 ms
 7  sl-bb3-stk-0-0-0-155M.sprintlink.net (144.232.4.42)  33.0 ms
 8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  42.3 ms
 9  114.ATM3-0-0.XR1.SCL1.ALTER.NET (146.188.145.222)  76.9 ms
10  100.ATM2-0-0.TR1.SCL1.ALTER.NET (146.188.145.226)  41.3 ms
11  107.ATM8-0-0.TR1.DCA1.ALTER.NET (146.188.136.221)  110 ms (ttl=242!)
12  199.ATM4-0-0.XR1.TCO1.ALTER.NET (146.188.161.161)  223 ms (ttl=243!)
13  193.ATM5-0-0.GW2.FFX1.ALTER.NET (146.188.160.209)  158 ms (ttl=242!)
14  UUNET7-GW.UU.NET (137.39.12.162)  214 ms (ttl=241!)  163 ms (ttl=241!)
15  www.uu.net (199.170.0.30)  155 ms (ttl=240!)
from 128.114.2.91:
traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
 1  comm-g.UCSC.EDU (128.114.2.252)  1.29 ms
 2  frontdoor.UCSC.EDU (128.114.103.1)  1.7 ms
 3  UC-net-dmz.ucsc.edu (208.1.176.6)  2.82 ms
 4  bgty-lata01.ucnet.net (192.35.219.2)  7.75 ms
 5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  10.6 ms
 6  sl-bb11-stk-1-1-155M.sprintlink.net (144.232.4.98)  11.2 ms
 7  sl-bb3-stk-4-0-0.sprintlink.net (144.232.4.14)  9.27 ms
 8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  609 ms
 9  114.ATM3-0-0.XR2.SCL1.ALTER.NET (146.188.145.210)  541 ms
10  100.ATM3-0-0.TR2.SCL1.ALTER.NET (146.188.145.246)  578 ms
11  107.ATM8-0-0.TR2.DCA1.ALTER.NET (146.188.136.225)  633 ms
12  198.ATM8-0-0.XR2.TCO1.ALTER.NET (146.188.161.185)  683 ms (ttl=243!)
13  192.ATM12-0-0.GW2.FFX1.ALTER.NET (146.188.160.221)  622 ms (ttl=242!)
14  UUNET7-GW.UU.NET (137.39.12.162)  569 ms (ttl=241!)  534 ms (ttl=241!)
15  www.uu.net (199.170.0.30)  579 ms (ttl=240!)
Looking at your traceroutes, there are at least two cases of this sort of thing. First, the paths diverge at hop 6 and become consistent again at hop 8. Second, they diverge once more at hop 9 and reconverge at hop 14. Either one of these could be the cause of what you're seeing (or both). Additionally, it's perfectly possible that something along the return path, which your traceroute doesn't show, is responsible. It is worth noting, I suppose, that optioned packets (e.g. traceroute -g or ping -R) are not CEF-switched, and therefore cannot be used to instrument the behavior of this hash. As a result, your best bet is limited-TTL probes to various hops. For instance, you might try traceroute -f 7 -m 7 -q 100 www.uu.net from each of your hosts, to determine whether the problem starts after hop 6.

--jhawk
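The hop-by-hop comparison above can be done mechanically. The sketch below is a hypothetical helper (`divergences` is not part of any tool) that walks two traceroute hop lists in parallel and reports each span where they differ, as (first differing hop, hop where they agree again). It compares only the addresses traceroute reported on the forward path, so, as noted above, it says nothing about the return path; it also treats two different interfaces on the same router (hop 7, sl-bb3) as a divergence, which matches the reading above.

```python
from typing import List, Optional, Tuple


def divergences(a: List[str], b: List[str]) -> List[Tuple[int, Optional[int]]]:
    """Return (first_differing_hop, reconverging_hop) spans, with 1-indexed hops.

    The second element is None if the paths never agree again.
    """
    spans: List[Tuple[int, Optional[int]]] = []
    start: Optional[int] = None
    for hop, (x, y) in enumerate(zip(a, b), start=1):
        if x != y and start is None:
            start = hop  # paths split here
        elif x == y and start is not None:
            spans.append((start, hop))  # paths agree again here
            start = None
    if start is not None:
        spans.append((start, None))
    return spans


# Hop addresses copied from the two traceroutes above.
from_53 = [
    "128.114.2.252", "128.114.103.1", "208.1.176.6", "192.35.219.2", "144.228.146.49",
    "144.232.4.78", "144.232.4.42", "137.39.166.121", "146.188.145.222", "146.188.145.226",
    "146.188.136.221", "146.188.161.161", "146.188.160.209", "137.39.12.162", "199.170.0.30",
]
from_91 = [
    "128.114.2.252", "128.114.103.1", "208.1.176.6", "192.35.219.2", "144.228.146.49",
    "144.232.4.98", "144.232.4.14", "137.39.166.121", "146.188.145.210", "146.188.145.246",
    "146.188.136.225", "146.188.161.185", "146.188.160.221", "137.39.12.162", "199.170.0.30",
]

if __name__ == "__main__":
    print(divergences(from_53, from_91))  # [(6, 8), (9, 14)]
```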