Re: was i asleep when the gtld servers had the worse problem today?
is it just me, or are people confused by the references to f.root-servers.net (Paul Vixie's machine) and f.gtld-servers.net which was also reported to have problems? I only read "F server" in this article, but there are two of those, and only f.root-servers.net is operated by volunteers, f.gtld-servers.net is (AFAIK) operated by NSI...
three things.
1. there is definitely some confusion here. my f (root-servers.net) was lame (which is not fatal to resolvers who query it). NSI's f (gtld-servers.net) was insane (which was fatal to resolvers who queried it.)
2. my F is not a volunteer. f.root-servers.net is a funded activity of ISC. if my F is "volunteer" then every nameserver everywhere is a "volunteer."
3. holtzman has some other axe he's grinding, this is all smokescreening. if i could just stop worrying about that, i could start being insulted by his weird comments here and elsewhere.
-- Paul Vixie <paul@vix.com>
if i understand, f.root-servers.net was having problems doing an axfr from a.root-servers.net. has anyone determined a technical reason why? and could we all please skip the ad homina? thanks. randy
Date: Fri, 13 Nov 1998 03:51:09 -0800 (PST) From: Randy Bush <randy@psg.com>
if i understand, f.root-servers.net was having problems doing an axfr from a.root-servers.net. has anyone determined a technical reason why?
for four days, tcpdump on my side shows behaviour consistent with lost ACKs; pathchar from my side shows that A's first mile is a lossy 3Mb/s bottleneck. i've switched F from axfr to ftp for now, and rz.internic.net is showing the same lossage. i'm sending a lot of duplicate ACKs. transfer is slow.
08:33:01.038694 198.41.0.19.20 > 204.152.184.251.3022: . 58236:59696(1460) ack 1 win 8760 (DF) [tos 0x8]
08:33:01.038984 204.152.184.251.3022 > 198.41.0.19.20: . ack 58236 win 33580 (DF)
08:33:01.039100 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616 win 29200 (DF)
08:33:01.039176 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616 win 33580 (DF)
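A rough way to tally repeated ACKs in a text trace like the one above is sketched below. It is only an illustration, not the tooling used in the thread: the function name `repeated_acks` and the regular expression are assumptions written against the tcpdump text format shown here. It counts an ACK as repeated when the same sender acknowledges the same sequence number twice in a row. A strict duplicate-ACK test would also require an unchanged window, which this sketch does not check.

```python
import re
from collections import Counter

# Matches lines of the form seen in the trace above:
#   time src.port > dst.port: ... ack N win W ...
LINE_RE = re.compile(
    r"^(?P<time>\S+)\s+(?P<src>\S+)\s+>\s+(?P<dst>\S+):"
    r".*?\back\s+(?P<ack>\d+)\s+win\s+(?P<win>\d+)"
)

def repeated_acks(lines, sender):
    """Count how often `sender` repeats the ACK number it sent just before."""
    repeats = Counter()
    last_ack = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("src") != sender:
            continue  # unparsable line, or a packet from the other side
        ack = int(m.group("ack"))
        if ack == last_ack:
            repeats[ack] += 1
        last_ack = ack
    return repeats

# The four packets quoted in the message above.
SAMPLE = [
    "08:33:01.038694 198.41.0.19.20 > 204.152.184.251.3022: . 58236:59696(1460) ack 1 win 8760 (DF) [tos 0x8]",
    "08:33:01.038984 204.152.184.251.3022 > 198.41.0.19.20: . ack 58236 win 33580 (DF)",
    "08:33:01.039100 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616 win 29200 (DF)",
    "08:33:01.039176 204.152.184.251.3022 > 198.41.0.19.20: . ack 62616 win 33580 (DF)",
]

if __name__ == "__main__":
    print(repeated_acks(SAMPLE, "204.152.184.251.3022"))
```

Note that in this short excerpt the one repeated ACK number arrives with a different window, so it may be a window update and not a loss signal. A longer capture would be needed to draw conclusions.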
Date: Fri, 13 Nov 1998 13:21:36 +0100 From: Ray Davis <ray@carpe.net>
Aren't NSI's nameservers running bind? If not, then that's scary. If so, then why wouldn't they also become lame rather than insane?
they were having different problems. their operators had to restart named, but the zone file itself had transferred cleanly. they weren't lame. in other words, NSI's servers' problems were because of me (or: my code) and my server's problems were because of NSI (or: their transit pipe.) right now i'm looking at the code, and they're working on their link.
[ On Fri, November 13, 1998 at 08:43:34 (-0800), Paul A Vixie wrote: ]
Subject: Re: was i asleep when the gtld servers had the worse problem today?
for four days, tcpdump on my side shows behaviour consistent with lost ACKs; pathchar from my side shows that A's first mile is a lossy 3Mb/s bottleneck.
Just to add a little fuel to the fire: I've been experiencing TCP related connection failures from most/all hosts at internic.net, with at least whois and http (though not SMTP). For example when I attempt to do a whois I'll normally only get back the first two lines of output (i.e. "\nRegistrant:\n"). Then the connection hangs and times out. From my point of view it seems that the connection has indeed been cut off.
The weird part is that this only happens for NetBSD-1.3.x hosts on my network. BSDI BSD/OS 1.1, Ultrix, and at one point SunOS-4.1 hosts were all receiving complete output from whois queries.
It gets even weirder: the tcpdump traces on the NetBSD-1.3.x gateway that connects my network to the next one up the stream show that the packets are actually coming from the remote internic.net host, but they're not getting through the NetBSD routing code (i.e. I see the ACK come in on the ethernet interface, but not out the PPP interface).
I suspect it's got something to do with the firewall and traffic director stuff they're using for some services at internic.net. The only apparent difference between the two TCP/IP connections (i.e. the ones from NetBSD-1.3 that don't work, and the ones from other systems that do work), are the initial window size negotiations.
So far internic.net hosts are the only ones I've ever encountered that trigger this failure in the NetBSD networking code.... I've meant to do some more extensive analysis and bring this up with the NetBSD networking gurus, but so far haven't had time.
-- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
-----Original Message----- From: Greg A. Woods
Just to add a little fuel to the fire:
I've been experiencing TCP related connection failures from most/all hosts at internic.net, with at least whois and http (though not SMTP).
For example when I attempt to do a whois I'll normally only get back the first two lines of output (i.e. "\nRegistrant:\n"). Then the connection hangs and times out. From my point of view it seems that the connection has indeed been cut off. The weird part is that this only happens for NetBSD-1.3.x hosts on my network. BSDI BSD/OS 1.1, Ultrix, and at one point SunOS-4.1 hosts were all receiving complete output from whois queries. [...]
I've noticed the same thing, but on a FreeBSD server. My personal Linux box, and the Solaris box here both work fine. Hmmm.
On Fri, 13 Nov 1998, Greg A. Woods wrote:
[ On Fri, November 13, 1998 at 08:43:34 (-0800), Paul A Vixie wrote: ]
Subject: Re: was i asleep when the gtld servers had the worse problem today?
for four days, tcpdump on my side shows behaviour consistent with lost ACKs; pathchar from my side shows that A's first mile is a lossy 3Mb/s bottleneck.
Just to add a little fuel to the fire:
I've been experiencing TCP related connection failures from most/all hosts at internic.net, with at least whois and http (though not SMTP).
For example when I attempt to do a whois I'll normally only get back the first two lines of output (i.e. "\nRegistrant:\n"). Then the connection hangs and times out. From my point of view it seems that the connection has indeed been cut off. The weird part is that this only happens for NetBSD-1.3.x hosts on my network. BSDI BSD/OS 1.1, Ultrix, and at one point SunOS-4.1 hosts were all receiving complete output from whois queries. It gets even weirder: the tcpdump traces on the NetBSD-1.3.x gateway that connects my network to the next one up the stream show that the packets are actually coming from the remote internic.net host, but they're not getting through the NetBSD routing code (i.e. I see the ACK come in on the ethernet interface, but not out the PPP interface).
I've been seeing the same thing on our linux boxen here. It's been going on for quite some time. The sad thing is the connection hangs there for an extended period of time, but nothing ever comes through. I find that 1 out of 20 whois attempts actually goes completely through.
Someone on the inet-access list mentioned that you can telnet into rs.internic.net and get them that way, i haven't tried that yet. I have found that anything in the internic.net domain is just horribly slow to begin with anyways.
John Gonzalez/Net.Tech, MDC Computers/netMDC! (505)437-7600/fax-437-3052 http://www.netmdc.com
On Fri, 13 Nov 1998, Greg A. Woods wrote:
Just to add a little fuel to the fire:
I've been experiencing TCP related connection failures from most/all hosts at internic.net, with at least whois and http (though not SMTP).
For example when I attempt to do a whois I'll normally only get back the first two lines of output (i.e. "\nRegistrant:\n"). Then the connection hangs and times out. From my point of view it seems that the connection
This same thing is happening to me, but in my case it is a bug in the InterNIC's whois server that must have just been introduced recently.
AFAIK, the whois protocol is supposed to consist of a request terminated by a '\n'. However, the InterNIC's server is sending a response as soon as it gets the first packet, even if there is no \n termination.
There is at least one common whois replacement client that sends the \n in a separate packet. What happens is the first packet with the request but without the \n arrives at the InterNIC, then they send a response and close the socket. In the meantime, the second packet from the client with the \n in arrives after the socket is closed for reading, prompting a RST from the server. Even though the full response is either in socket buffers at the InterNIC or on the wire or even in socket buffers on the client, once the RST arrives that response will be (correctly) thrown away if it hasn't been actually read() by the client.
The reason you see part of it is that the whois server is sending separate packets for the "\nRegistrant:\n" part and the rest of the response.
The fix is for the InterNIC to fix their whois server to conform to the "protocol" and/or do a lingering close so it doesn't send a RST. Simply waiting for the end of the line should be enough and should be correct, although if they really want to do a lingering close then see the lingering_close() function in Apache for an example.
Note that I'm not sending this to the InterNIC, because I don't have time to wade through and try to find a contact address that isn't ignored and which is appropriate. I would hope someone from the InterNIC is reading this list and will fix it.
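The two server-side fixes described above (wait for the end of the line, and optionally do a lingering close) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the InterNIC's code and not Apache's lingering_close(): the names `read_request_line`, `lingering_close`, `serve_one` and the `lookup` callback are hypothetical, and the buffer sizes and timeout are arbitrary.

```python
import socket

def read_request_line(conn, limit=1024):
    """Keep reading until '\\n' arrives, however many packets the client used."""
    buf = b""
    while b"\n" not in buf and len(buf) < limit:
        chunk = conn.recv(256)
        if not chunk:              # client closed before finishing the line
            break
        buf += chunk
    return buf.split(b"\n", 1)[0].rstrip(b"\r")

def lingering_close(conn, timeout=2.0):
    """Half-close, then drain leftover input so close() does not provoke a RST."""
    conn.shutdown(socket.SHUT_WR)  # sends FIN; queued response data still goes out
    conn.settimeout(timeout)
    try:
        while conn.recv(512):      # discard anything the client still sends
            pass
    except OSError:                # timeout or reset while draining
        pass
    conn.close()

def serve_one(conn, lookup):
    """Handle one connection; `lookup` is a hypothetical bytes -> bytes callback."""
    query = read_request_line(conn)
    conn.sendall(lookup(query))
    lingering_close(conn)
```

With `read_request_line` alone, a client that sends the query and the terminator in separate packets is already handled, since no response goes out until the newline has been read. The lingering close only matters for clients that send extra data after the request line.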
[ On Fri, November 13, 1998 at 12:13:38 (-0800), Marc Slemko wrote: ]
Subject: Re: other network problems with hosts at internic.net
This same thing is happening to me, but in my case it is a bug in the InterNIC's whois server that must have just been introduced recently.
I have the same problem contacting their web server, and sometimes even with telnet and ftp to their servers.
AFAIK, the whois protocol is supposed to consist of a request terminated by a '\n'. However, the InterNIC's server is sending a response as soon as it gets the first packet, even if there is no \n termination.
The NetBSD whois client uses stdio, and sends the last argument with a single fprintf() call followed by an fflush(). BTW, the NetBSD client sends "\r\n" on the end of the data sent to the server.
There is at least one common whois replacement client that sends the \n in a separate packet. What happens is the first packet with the request but without the \n arrives at the InterNIC, then they send a response and close the socket. In the meantime, the second packet from the client with the \n in arrives after the socket is closed for reading, prompting a RST from the server. Even though the full response is either in socket buffers at the InterNIC or on the wire or even in socket buffers on the client, once the RST arrives that response will be (correctly) thrown away if it hasn't been actually read() by the client.
The exact same whois.c client code running on my NetBSD-1.3.x boxes fails to retrieve the complete response, yet when run on at least the BSDI 1.1 box it works fine.
The reason you see part of it is that the whois server is sending separate packets for the "\nRegistrant:\n" part and the rest of the response.
The fix is for the InterNIC to fix their whois server to conform to the "protocol" and/or do a lingering close so it doesn't send a RST. Simply waiting for the end of the line should be enough and should be correct, although if they really want to do a lingering close then see the lingering_close() function in Apache for an example.
That might indeed help, but I'm not going to put the blame on them immediately since I know that even when I make the connection from my NetBSD boxes the packets are making it back as far as the machine on the far end of my PPP link. The only apparent difference between connections that work, and connections that don't, for me at least, is the initial window size. -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
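The client-side behaviour discussed in this exchange (query and line terminator handed to the network in one write, not two) can be sketched as below. This is an illustration only and not the NetBSD whois.c code: `build_query` and `whois` are hypothetical names, and the timeout is arbitrary. A single `sendall()` of a short buffer will normally leave as one TCP segment, though the stack does not strictly guarantee that.

```python
import socket

def build_query(name):
    """Return the whole request, terminator included, as one buffer."""
    return name.encode("ascii") + b"\r\n"

def whois(server, name, port=43, timeout=10.0):
    """Send the query in a single sendall() and read until the server closes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # server closed: response is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("latin-1")
```

A client built this way should not trigger the early-response race described earlier in the thread. It would not, by itself, explain or fix a failure that depends on the initial window size.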
On Fri, Nov 13, 1998 at 03:51:09AM -0800, Randy Bush wrote:
and could we all please skip the ad homina? thanks.
<roar> "Will if Holtz will"... Cheers, -- jra -- Jay R. Ashworth jra@baylink.com Member of the Technical Staff Buy copies of The New Hackers Dictionary. The Suncoast Freenet Give them to all your friends. Tampa Bay, Florida http://www.ccil.org/jargon/ +1 813 790 7592
On Fri, Nov 13, 1998 at 03:51:09AM -0800, Randy Bush wrote:
if i understand, f.root-servers.net was having problems doing an axfr from a.root-servers.net. has anyone determined a technical reason why?
and could we all please skip the ad homina? thanks.
Against who? The only person who has been named here is Holtzman. And as far as I'm concerned, since NetSol is providing a service which is vital to the operation of the Internet (unless y'all want to start memorizing IPs), discussion of their behavior when dealing with problems is *not* off-topic here. As for what I've said being construed as an ad hominem, so be it. What I've said about David Holtzman is *true*. -- Steve Sobol [sjsobol@nacs.net] Part-time Support Droid [support@nacs.net] NACS Spaminator [abuse@nacs.net] Spotted on a bumper sticker: "Possum. The other white meat."
1. there is definitely some confusion here. my f (root-servers.net) was lame (which is not fatal to resolvers who query it). NSI's f (gtld-servers.net) was insane (which was fatal to resolvers who queried it.)
Aren't NSI's nameservers running bind? If not, then that's scary. If so, then why wouldn't they also become lame rather than insane? Cheers, Ray
At 03:09 11/13/98 -0800, you wrote:
3. holtzman has some other axe he's grinding, this is all smokescreening. if i could just stop worrying about that, i could start being insulted by his weird comments here and elsewhere.
Of course he (and NSI as a whole) has an axe to grind...they want to dominate the Net the way MS does the desktop. Anything that might be interpreted as incompetence on their part might scare the suckers who buy their stock AND raise doubts in the minds of the folks who are allowing them such a leading role in ICANN. Public exposure of their frequent screwups would lead reasonable people to think that they shouldn't be allowed near the Internet, which would hurt their plans. Ergo, any mistakes are someone else's, not theirs. Of course, an official explanation from NSI might disprove this theory, but I've got money that says one isn't forthcoming (at least not a believable one). Right now, NSI's credibility ranks right up there with Saddam Hussein's. Spammers should be investigated by Ken Starr! Dean Robb PC-EASY computer services (757) 495-EASY [3279]
participants (11)
- Dean Robb
- Jay R. Ashworth
- John Gonzalez/netMDC admin
- Marc Slemko
- Paul A Vixie
- Paul Vixie
- Randy Bush
- Ray Davis
- Steven J. Sobol
- Thom Youngblood
- woods@most.weird.com