Just a few clarifications... nothing new, just some explanations of various things.

On Wed, 11 Nov 1998, Dean Robb wrote:
At 15:36 11/11/98 -0600, you wrote:
Fixed on the next daily update? So when the AOL problem happened, a special update was done, but when several hundred (anyone know how many really) entries are trashed, we all must wait until the next daily update?
Another Public Relations/Customer Service triumph for NSI/InterNIC. </sarcasm>
I suspect more than just the fjk servers are hosed...last night around midnight I was surfing and had over ten sites disappear between one load and the next. The domain names ran the gamut from "f" to "u". Given the time frames, they likely disappeared as the update propagated. Now, either a whole lot of sites simultaneously had server crashes or....
[fjk] are _not_ the servers for domain names starting with [fjk]. All servers serve all names. Without knowing more, what you experienced could have had any number of causes.

I don't know when people were first aware of this. I would hope some were aware before I complained at ~1000 PST, and NSI should have been aware right away when it happened: if they don't have automated checking of each server, with a very high notification priority, they are even worse than stupid. So I'm somewhat doubtful it started at midnight. But it is possible. NSI does make it hard for anyone who may notice it to contact them.

I can't understand, however, why it took over two hours to bring down all the badly broken servers. Some were corrected within 15 minutes or half an hour after I complained (and who knows how long after the appropriate people were first notified). One wasn't.

On Wed, 11 Nov 1998, Michael P. Lucking wrote:
Fixed on the next daily update? So when the AOL problem happened, a special update was done, but when several hundred (anyone know how many really) entries are trashed, we all must wait until the next daily update?
Don't take that too literally. It isn't entries that were trashed AFAIK, but servers. A number of (or all) servers appear to have had trouble updating their zone file. So far so good. Simply not being updated won't kill anything. Some lost the zone (on purpose or due to a bug, I don't know) and were acting mostly like a lame delegation. No huge problem.

Some lost all (or a very large %) of .com yet still thought they were authoritative and returned various false negatives. I know of three that were like that, and have had reports of more. Anyone asking one of those servers would be incorrectly told the domain doesn't exist. This is a VERY bad failure mode.

What is the impact? Well, if 3/12 were doing this then ~1/4 of the queries (probably not that evenly distributed, but in that ballpark) would have got false negatives. Now, that is only 1/4 of all queries to the root servers. Domains with a large TTL that were in caches wouldn't be as impacted. Domains with a small TTL (e.g. 5 minutes) would be very impacted because they would expire from caches so quickly.

A lot of email is particularly badly impacted, because not only does the domain it is being sent to have to resolve, but on many systems the sender's domain has to resolve as well.

Any resolver implementations that do not put a short upper bound on negative caching TTLs would be _VERY_ hard hit by this and could still be having problems unless they were restarted. I have heard that one of MS's products is like this, but that is just a vague rumor.

Getting back to your question, "the update being completed" refers to servers being able to transfer the proper zone files and put them in place.
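
To put rough numbers on the 3/12 estimate and the TTL point above, here is a back-of-the-envelope sketch in Python. It is only a toy model and rests on several assumptions that are not from the thread: queries are spread evenly across the servers, a cache re-asks exactly once per TTL expiry, and the outage window is taken as two hours. The function names and the negative-TTL cap value are made up for illustration.

```python
from fractions import Fraction


def false_negative_fraction(broken_servers, total_servers):
    """Share of uncached queries that hit a broken server.

    Assumes queries are spread evenly over all servers, which the
    real distribution probably is not.
    """
    if total_servers <= 0 or not 0 <= broken_servers <= total_servers:
        raise ValueError("need 0 <= broken_servers <= total_servers, total > 0")
    return Fraction(broken_servers, total_servers)


def expected_false_negatives(ttl_seconds, outage_seconds,
                             broken_servers, total_servers):
    """Expected number of bad answers one busy cache sees for one domain.

    Assumes the cache re-queries once each time the record expires,
    so it makes outage_seconds / ttl_seconds refreshes in the window.
    """
    refreshes = Fraction(outage_seconds, ttl_seconds)
    return refreshes * false_negative_fraction(broken_servers, total_servers)


def capped_negative_ttl(negative_ttl, cap_seconds=600):
    """Clamp how long a 'does not exist' answer is cached.

    The 600 second default is an arbitrary illustrative cap.
    """
    return min(negative_ttl, cap_seconds)


if __name__ == "__main__":
    outage = 2 * 3600  # assumed two-hour window
    print("share of bad answers:", false_negative_fraction(3, 12))
    for ttl in (300, 86400):  # 5 minutes versus 1 day
        hits = expected_false_negatives(ttl, outage, 3, 12)
        print("ttl", ttl, "expected false negatives:", float(hits))
    print("negative answer cached for:", capped_negative_ttl(86400), "seconds")
```

Under these assumptions a domain with a 5 minute TTL gets refreshed 24 times during the window and so sees about 6 false negatives, while a domain with a 1 day TTL will most likely not be re-queried at all. The cap function shows why a resolver without a short upper bound could keep serving a bad "doesn't exist" answer long after the servers were fixed.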