On Fri, 13 Nov 1998, Sean Donelan wrote:
Yes and no. It would be fairly 'easy' to become an editor, start Donelan's Journal Of Internet Disasters, and get a number of noted experts to contribute articles analyzing failures without the cooperation of the organizations involved. But I can predict what the organizations in question would say about such an endeavor:
Your predictions are wrong; however, they would be true if this journal were edited by someone other than yourself. You have a significant amount of credibility in the industry, and if you did edit such a journal, it would be taken seriously.
I'm going to get pedantic. The results may be obvious, but the cause isn't. I would assert there are a number of large failures where the initial obvious cause has turned out to be wrong (or only a contributing factor).
This is a prime example of why your credibility in regard to disaster and disruption analysis is so high. You not only have the background knowledge to understand it and the willingness to research the things you don't know, but you also have the right sceptical attitude that does not stop questioning the situation just because a nice answer has arrived.
difficult for an outside group to analyze the failure. In particular I think it would have been close to impossible for an outside group to find the other contributing factors.
As an editor of a network outages journal, you wouldn't be expected to do all the investigative legwork yourself. But I think that your evenhanded treatment of the events would tend to draw out the internal investigation reports of the companies involved. I think that you could run such a journal in a way that would largely evade the negative effects that people fear from disclosure, because of your ability to draw parallels with disaster situations in other industries.
- Last month the .GOV domain was missing on a.root-servers.net due to a 'known bug' affecting zone transfers from GOV-NIC
- Someone has been probing DNS ports for an unknown reason
- It is known that various individuals flood the Internic with packets related to attempts to suck down the whois database, one item at a time, and/or to detect when a specific domain name goes off hold and becomes available for re-registration
- pathshow indicated that the Internic circuit over which AXFR was being attempted was congested
- f.root-servers.net and NSI's servers reacted differently. What are the differences between them (BIND versions, in-house source code changes, operating systems/run-time libraries/compilers)?
Whatever was causing the Internic link to be congested could have disrupted NSI's server. Wasn't Vixie's server acting properly by answering lame for the zones it could not retrieve? It seems like all the problems revolve around NSI's server and network; Vixie's problems were merely a symptom.

On the other hand, I would classify the inability of AXFR to transfer the zone as a weakness in BIND that could be addressed. Additionally, since it is known that zone transfers require a certain amount of bandwidth, Vixie could improve his operations by implementing a system that monitors the bandwidth with pathshow prior to initiating AXFR. He could also monitor the progress of the AXFR and alarm if it was taking too long. This would have allowed a fallback to FTP sooner, and operationally, such a fallback might even be something that could be automated.

Of course, none of this means Vixie was at fault, and I'd argue that NSI is at fault for not being able to detect the problem sooner and not being able to swap in a backup server sooner. Vixie knows that he runs one of 13 root nameservers, but NSI knows that they run the one and only master root nameserver, which puts more responsibility on them.

--
Michael Dillon - E-mail: michael@memra.com
Check the website for my Internet World articles - http://www.memra.com
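P.S. The alarm-and-fallback idea above could be sketched roughly as follows. This is a hypothetical watchdog, not anything Vixie or NSI actually ran; the command lines passed in would be placeholders for whatever AXFR and FTP retrieval commands an operator actually uses.

```python
import subprocess

def fetch_zone(axfr_cmd, ftp_cmd, timeout_secs):
    """Try a zone transfer with a deadline; fall back to FTP if it stalls.

    axfr_cmd and ftp_cmd are argv lists for the operator's own transfer
    commands (hypothetical -- e.g. a dig axfr wrapper and an ftp fetch
    script). Returns the method that succeeded: "axfr" or "ftp".
    """
    try:
        # Alarm if the AXFR takes too long, rather than letting it hang
        # indefinitely on a congested circuit.
        subprocess.run(axfr_cmd, check=True, timeout=timeout_secs)
        return "axfr"
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        # Automated fallback: retrieve the zone file another way.
        subprocess.run(ftp_cmd, check=True)
        return "ftp"
```

A cron job could call this periodically and page an operator whenever the return value is "ftp", which is the "alarm sooner" behavior argued for above.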