mysterious packet delay to/from www.caida.org was: Cisco Netflow Analysis Software
Can anyone else confirm the following observation? If yes: what might be causing this? (CEF is my usual suspect, but toggling it on/off doesn't do anything.)

Naturally, I went to www.caida.org in response to one of the NANOG postings, but hit a roadblock: the site just wouldn't load properly. As engineers from time to time get bitten by the curiosity bug, I tracerouted and pinged around to find the cause of this non-responsiveness.

Surprise: depending on the IP space used to access the site, there would be a considerable pause in the packet flow: 20 sec. for a BSDI 4.1 machine or a Win98 machine, 80 sec. for an OpenBSD 2.4 machine, but NO pause when using a different set of IP space (to be exact: interface IP space belonging to the upstream provider). One could literally switch the 'ip telnet source-interface' on the gateway router between the different IP space blocks and get the long delay or not, depending on which space was used.

Note that there are no duplicates or visible packet loss whatsoever, and the delay occurs at the exact same packet in the flow each and every time, regardless of client operating system (TCP receive buffers/windows are set to 2^16 though, which didn't make a difference other than different packetization).

I have tried this from 2 other locations, and there the long pause would mostly melt down into the 100ms range, and sometimes go away.

Sorry for the tcpdump in advance... a simple GET / HTTP/1.0 on their server. This is out of 208.192.0.0/10 (AS 701), btw., with no more specific route visible net-wide.

bye,Kai

------ (BSDI 4.x at work here:)
15:35:57.505400 host.35570 > ipn.caida.org.www: S 3494476576:3494476576(0) win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,[|tcp]> [tos 0x10] (ttl 64, id 5418)
15:35:57.700158 ipn.caida.org.www > host.35570: S 1151006778:1151006778(0) ack 3494476577 win 8760 <mss 1460> (DF) (ttl 239, id 11105)
15:35:57.700794 host.35570 > ipn.caida.org.www: . ack 1 win 65535 [tos 0x10] (ttl 64, id 12399)
15:36:01.446201 host.35570 > ipn.caida.org.www: P 1:17(16) ack 1 win 65535 [tos 0x10] (ttl 64, id 24844)
15:36:01.714702 ipn.caida.org.www > host.35570: . ack 17 win 8760 (DF) (ttl 239, id 11106)
15:36:01.715261 host.35570 > ipn.caida.org.www: P 17:19(2) ack 1 win 65535 [tos 0x10] (ttl 64, id 463)
15:36:01.972221 ipn.caida.org.www > host.35570: P 1:1461(1460) ack 19 win 8760 (DF) (ttl 239, id 11107)
15:36:01.972910 host.35570 > ipn.caida.org.www: . ack 1461 win 64240 [tos 0x10] (ttl 64, id 10930)
15:36:02.240938 ipn.caida.org.www > host.35570: . 1461:2921(1460) ack 19 win 8760 (DF) (ttl 239, id 11108)
15:36:02.246988 ipn.caida.org.www > host.35570: P 2921:4097(1176) ack 19 win 8760 (DF) (ttl 239, id 11109)
15:36:02.252069 host.35570 > ipn.caida.org.www: . ack 4097 win 65535 [tos 0x10] (ttl 64, id 24582)
15:37:22.984047 ipn.caida.org.www > host.35570: P 4097:4933(836) ack 19 win 8760 (DF) (ttl 239, id 30829)
15:37:22.984192 ipn.caida.org.www > host.35570: F 4933:4933(0) ack 19 win 8760 (DF) (ttl 239, id 30830)
15:37:22.987340 host.35570 > ipn.caida.org.www: . ack 4934 win 64864 [tos 0x10] (ttl 64, id 53067)
15:37:22.999506 host.35570 > ipn.caida.org.www: F 19:19(0) ack 4934 win 65535 [tos 0x10] (ttl 64, id 36125)
15:37:23.222742 ipn.caida.org.www > host.35570: . ack 20 win 8760 (DF) (ttl 239, id 30831)
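For anyone who would rather not eyeball the timestamps, the stall in a trace like the one above can be located mechanically. The following is only a minimal Python sketch, assuming tcpdump text output in which every packet line starts with an HH:MM:SS.microseconds timestamp; the names `largest_gap` and `TRACE` are made up for this example, and only three lines of the trace are repeated here.

```python
import re
from datetime import datetime

# A packet line starts with a timestamp such as 15:36:02.252069
TS = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s")

def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause.

    Returns None if fewer than two timestamped lines are found.
    Assumes the capture does not cross midnight.
    """
    stamped = []
    for line in lines:
        m = TS.match(line)
        if m:
            stamped.append((datetime.strptime(m.group(1), "%H:%M:%S.%f"), line))
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best

# Three consecutive lines copied from the trace above.
TRACE = """\
15:36:02.246988 ipn.caida.org.www > host.35570: P 2921:4097(1176) ack 19 win 8760 (DF) (ttl 239, id 11109)
15:36:02.252069 host.35570 > ipn.caida.org.www: . ack 4097 win 65535 [tos 0x10] (ttl 64, id 24582)
15:37:22.984047 ipn.caida.org.www > host.35570: P 4097:4933(836) ack 19 win 8760 (DF) (ttl 239, id 30829)
""".splitlines()

gap, before, after = largest_gap(TRACE)
print(f"{gap:.3f} s pause before: {after}")
```

On these lines it reports a pause of roughly 80.7 seconds between the client's ACK of byte 4097 and the server's next data segment, which is where the flow stalls.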
Hi,

It's very interesting that I encountered this problem yesterday. If I'm not mistaken: you have different source address blocks that see different delays to the same external site? The huge jump in delay depends on the source address, not the host OS. Right?

So I think the reason may be asymmetric routing. In more detail, say you have source addresses A and B, and an external destination C. Your trace A->C has a larger delay than B->C. The reason is that the return path C->A differs from C->B (this is very possible, because you or your upstream may advertise address blocks A and B differently) and has a larger delay.

You can verify this as follows. Let A and B trace to the same external traceroute server address (say net.yahoo.com, or others). You may find they have the same OUTBOUND path. But then trace back from the traceroute server to A and B; I believe you will find the difference.

Hope this helps, regards.

Yu Ning
--------------------------------------
(Mr.) Yu Ning
ChinaNet Operation Center
Networking Dep., Datacom Bureau
China Telecom., Beijing (100088), P.R.C
+86-10-82078519/62359464/62367444(fax)
--------------------------------------

----- Original Message -----
From: Kai Schlichting <kai@pac-rim.net>
To: <nanog@merit.edu>
Cc: <info@caida.org>; <kai@pac-rim.net>
Sent: Thursday, March 16, 2000 5:24 AM
Subject: mysterious packet delay to/from www.caida.org was: Cisco Netflow Analysis Software

> Can anyone else confirm the following observation ? If yes: what might be causing this ?
The asymmetric path is similar enough for both IP spaces to discount this possibility: no duplicate ACKs or packets are ever seen, removing the possibility of ANY end-to-end loss.

The fact that it's 100% reproducible and always gets stuck on the same packet furthers another suspicion: someone told me that broken reverse DNS may do such things on certain servers: they start to throw out content, then suddenly block on the reverse DNS lookup and stop the flow right in the middle. I haven't been able to reproduce the problem with any other site so far, though.

The site in question does have a DNS problem, which leads me to the next set of questions:

Is a delegating nameserver (namely UUnet's) supposed to dish out glue A RRs for the servers it delegates to (in the "additional records" section of the answer)? If yes, does it do so only if the root nameservers have an A RR for such a server (e.g. a registered nameserver)?

Are delegations to servers that are NOT registered breaking RFCs and thus 'illegal'?

Will Network Solutions ever update name server registrations again? Renaming doesn't work for me: the form processor complains about the new server name not being registered. That's not the idea of a "rename" operation, really.

Thanks, bye,Kai

At Wednesday 08:29 PM 3/15/00, Yu Ning wrote:
> So i think it may be the reason of asymmetric routing.
Just a follow-up on how this was solved, with some actual operational relevance :)

Apparently, a fairly large number of resolvers (old BIND4's?) have a significant problem with being told about delegations to nameservers that are NOT registered and hence have no A records in the root nameservers.

A stern warning to network operators' DNS groups:

*do not delegate in-addr.arpa zones to unregistered nameservers or reverse resolution will break for many resolvers trying to resolve your client's IPs*

This changes the practice from "SHOULD NOT" to "MUST NOT" for the time being :( (The resolution was to have the network operator change the delegation back to the registered names until NSI finally changes the host registrations; said operator was helpful with that also.)

The actual delay (at www.caida.org and other servers) is apparently a blocking call to resolver code in Apache in the middle of serving the page: the page comes up to a certain point, gets stuck on the reverse DNS lookup, which eventually times out, and then the rest of the page gets served. (Hey CAIDA, turn off the DNS logging on your webserver!) It looks like a real network transport or webserver performance problem, but isn't.

Actual chain of events: the local machine connects to a remote http/smtp/etc. server, and the remote server tries to resolve the PTR record via a PTR query to the ROOT-NS's. The ROOT-NS's show a delegation to the ISP/NSP's nameservers. The ISP/NSP then delegates to unregistered NS's further down the chain (end-user, ISP customer). If the machine trying to resolve the PTR record does NOT know the A record for these delegatees (by having it in its cache, for example), it makes no effort to recursively resolve such an A record (the only servers asked for the A record are apparently the ROOT-NS's): A RRs may exist for the servers it has to ask for the ultimate PTR record, but it makes no attempt at an A RR query, and subsequently no final query for the PTR record either.
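To make the failure mode concrete, here is a rough Python sketch of the difference between a server that blocks on the reverse lookup and one that gives up after a deadline. It is not how Apache does it (in Apache the usual remedy is presumably just to switch reverse lookups off, e.g. via the HostnameLookups directive); the function names `ptr_name`, `reverse_lookup` and `stuck_resolver` are invented for this example, 192.0.2.1 is a documentation address, and the "stuck" resolver only simulates a lookup that hangs before failing.

```python
import ipaddress
import socket
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def ptr_name(ip):
    """Name a resolver asks for in a reverse lookup of ip."""
    return ipaddress.ip_address(ip).reverse_pointer

def reverse_lookup(ip, timeout, resolver=socket.gethostbyaddr):
    """Return the hostname for ip, or None if the lookup fails or
    does not finish within timeout seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, ip)
    try:
        return future.result(timeout=timeout)[0]
    except (FutureTimeout, OSError):
        return None
    finally:
        # Do not wait for a lookup that is still stuck; the worker
        # thread keeps running in the background until it returns.
        pool.shutdown(wait=False)

def stuck_resolver(ip):
    """Hypothetical stand-in for a lookup that hangs on a broken
    delegation and then fails (0.5 s here instead of 20-80 s)."""
    time.sleep(0.5)
    raise socket.herror(1, "Unknown host")

start = time.monotonic()
name = reverse_lookup("192.0.2.1", timeout=0.1, resolver=stuck_resolver)
elapsed = time.monotonic() - start
print(ptr_name("192.0.2.1"), name, f"{elapsed:.2f}s")
```

With the deadline in place the caller gets None back after about 0.1 s and can carry on serving the page; calling `stuck_resolver` directly would hold up the response for the full duration, which is the behaviour seen in the trace.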
What it does in the 20+ seconds until it times out is anybody's guess: depending on how many nameservers the remote http/smtp server has in its resolv.conf file, this process will repeat a few times (20-80s delay!).

Given that delegation within end-user organizations rarely happens anymore (the Internet is no longer made up of /16 .edu's, but of /24 .com's who more often than not let their provider do the DNS for them), this bug must have been discovered a long time after it was introduced (and my speculation about older BIND4 code is just that - speculation).

Thanks for the bandwidth, bye,Kai

--
kai@conti.nu "Just say No" to Spam Kai Schlichting
Palo Alto, New York, You name it Sophisticated Technical Peon
Kai's SpamShield <tm> is FREE! http://SpamShield.Conti.nu
| |
LeasedLines-FrameRelay-IPLs-ISDN-PPP-Cisco-Consulting-VoiceFax-Data-Muxes
WorldWideWebAnything-Intranets-NetAdmin-UnixAdmin-Security-ReallyHardMath
participants (2)
- Kai Schlichting
- Yu Ning