Date: Thu, 17 Jul 1997 22:52:18 +0500 (GMT)
From: David Holtzman <dholtz@internic.net>
To: nanog@merit.edu
Subject: NSI bulletin 097-004 | Root Server Problems
Resent-Date: Thu, 17 Jul 1997 14:42:42 -0400 (EDT)

On Wednesday night, July 16, during the computer-generation of the Internet top-level domain zone files, an Ingres database failure resulted in corrupt .COM and .NET zone files. Despite alarms raised by Network Solutions' quality assurance schemes, at approximately 2:30 a.m. (Eastern Time), a system administrator released the zone file without regenerating the file and verifying its integrity. Network Solutions corrected the problem and reissued the zone file by 6:30 a.m. (Eastern Time).

Thank you.

David H. Holtzman
Sr VP Engineering, Network Solutions
dholtz@internic.net

So, if the new zone files were re-issued at 06:30 EST, and they take about an hour to download, why was it that some root servers were still handing out bad data many hours later (at least one until about 14:00 EST)?

The particular server I'm thinking of, though not residing in the Eastern timezone, does seem to have what I think is a 24x7 NOC nearby, and in theory could have been prepared to reload as quickly as anyone. This may be just a coincidence, but it was about an hour after I e-mailed and telephoned them that they finally had the right data in place.

Unfortunately, finding the right contact was not entirely trivial, because the listed contact person had a full voice-mailbox and his operator had no idea who else I could speak to, and the NOC lists only a 1-800 number (and a FAX), which doesn't work outside the USA. The NOC person I finally reached on the telephone didn't even seem to be fully aware that they indeed ran a root nameserver for the Internet. He did know that there was e-mail bouncing, and indeed I didn't expect they could answer my e-mail if they were using their own root server....

Worst of all, though, they left the errant server on-line, handing out NXDOMAIN replies to any and all who asked, while they were downloading the corrected zone files. Hopefully this is not standard operating procedure for a root server, or at least not from now on.

What annoys me most is that I didn't receive any notification of any sort of problem from any of the mailing lists out of internic.net. I probably should subscribe to nanog, but I'd have thought namedroppers, or maybe even rs-info, should have had the above announcement posted just as soon as the mailers had enough trustworthy DNS data to deliver it with. There was nothing in http://rs.internic.net/announcements/ either, except for drivel about "maintaining high customer service levels," and there still isn't (though I suppose this event wasn't exactly "good PR").

What are the current procedures for announcing such problems to more than just the root operators themselves?

--
Greg A. Woods

+1 416 443-1734 VE3TCP <gwoods@acm.org> <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>