RE: Microsoft spokesperson blames ICANN
Blaming it on ICANN, even indirectly, is about as clueless a move as can be imagined. ICANN has no direct authority over the root; the USG/DOC/NTIA has reserved that privilege for itself alone. They have let out the operations contract to NSI, who manages a.root-servers.net and has been doing so for years.

MHSC has been working with MS engineering and support staff, for the past few months, to integrate BIND8.2.2p7 servers with Win2K/DNS/AD. It isn't easy because of semantic inconsistency, radically diverse architecture concepts, and [above all] severe lack of documentation on MS's part (as well as a few roaches in the SRV update stuff).

From our efforts, it is not at all surprising that someone, at MSFT, munged the DNS configuration, totally. Even their best guru could have done it, due to the murky nature of the config. I suspect that there are less than 100 ppl that could even have a clue, in this area, and they don't all have the same pieces of clue.

Win2K DNS is self-consistent, BIND is self-consistent, they may be mutually consistent, but that has yet to be determined. MSFT works in a glasshouse with many of the panes painted-over, others are distorted. Similarly, there is a tendency, among *nix folk, to discount anything MSFT (a mistake, IMHO).

What's needed is for some(one/group) that has a thorough understanding of both systems, at the design level, to sort out the bits. This is properly, a development project, one that most system admins are unsuited for. Unfortunately, it is left at the system admin level, IMHO.

--
ROELAND M.J. MEYER
Managing Director
Morgan Hill Software Company, Inc.
TEL: +001 925 373 3954
FAX: +001 925 373 9781
http://www.mhsc.com
mailto: rmeyer@mhsc.com
-----Original Message-----
From: Sean Donelan [mailto:sean@donelan.com]
Sent: Wednesday, January 24, 2001 11:40 AM
To: nanog@merit.edu
Subject: Microsoft spokesperson blames ICANN
Microsoft appears to be blaming ICANN for the failure with Microsoft's domain name servers (all located at the same place at Microsoft).
Microsoft has yet to pin down the cause of the DNS error. "It can be a system or human error, but somebody could also have done this intentionally," De Jonge said. "We don't manage the DNS ourselves, it is a system controlled by the Internet Corporation for Assigned Names and Numbers (ICANN) with worldwide replicas."
http://www.idg.net/ic_386962_1793_1-1681.html
Microsoft gamers (users of www.zone.com) quickly came up with various workarounds so they could continue playing. They were posting HOSTS.TXT files on various gamer bulletin boards overnight. In particular "Asheron's Call" had several problems during the last week, including banning a number of users and rolling back experience points due to a game bug.
[ On Wednesday, January 24, 2001 at 13:09:45 (-0800), Roeland Meyer wrote: ]
Subject: RE: Microsoft spokesperson blames ICANN
From our efforts, it is not at all surprising that someone, at MSFT, munged the DNS configuration, totally. Even their best guru could have done it, due to the murky nature of the config. I suspect that there are less than 100 ppl that could even have a clue, in this area, and they don't all have the same pieces of clue.
That's absolutely idiotic (of M$, that is !;-). Even more idiotic than putting all their nameservers in one basket, so to speak.

I'd bet any high-school kid who had any experience whatsoever at installing Linux or FreeBSD could no doubt blow a real OS and a native BIND install onto any sufficiently capable set of four machines in about an hour or so and provided that someone could cough up at least a half-baked zone file from somewhere to load on them they'd all be online and answering to the registered nameserver IP numbers in no time flat. Certainly in less than what's apparently going to be at least 23 hours now!

Heck I know a half dozen or more people around the world who would have put their dislike of M$ away for a short period and loaded a zone file or two on their own nameservers for M$ if only M$ could have managed to get the .COM zone updated with new delegations.... What ever happened in this community to asking the community for help when you're caught between a rock and a hard place? (Not that a company the size of M$ should have to ask for a handout -- they no doubt have significant IP connectivity in as many places around the world as almost anyone else!)

MS has nothing and no-one to blame but their own stupidity and arrogance in this. Meanwhile they're so damn big and "important" to so many users that this outage is having both a direct and an indirect negative impact on a lot of ISPs around the world! "Hey! The Internet must be broken if I can't get to M$.COM!"
What's needed is for some(one/group) that has a thorough understanding of both systems, at the design level, to sort out the bits. This is properly, a development project, one that most system admins are unsuited for. Unfortunately, it is left at the system admin level, IMHO.
No, what's needed is for M$ to learn that they need to deploy software that's capable of the task even if it didn't come from a box and doesn't have their logo branded on it. Squishing things together that were never meant to be squished together is only going to cause a big mess. Err, has already caused a big mess, at least for M$ and those who deal with them! ;-)

They'd also do well to learn a bit about network geography and just exactly how authoritative nameserver visibility from various locations on this wonderful Internet of ours can directly affect their bottom line!

-- Greg A. Woods
+1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>

PS: they're still down for the count from this neck of the woods! Interestingly one of them's completely unreachable from here now...

$ host -C microsoft.com
microsoft.com NS DNS4.CP.MSFT.NET
Nameserver DNS4.CP.MSFT.NET not responding
microsoft.com SOA record not found at DNS4.CP.MSFT.NET, try again
microsoft.com NS DNS5.CP.MSFT.NET
Nameserver DNS5.CP.MSFT.NET not responding
microsoft.com SOA record not found at DNS5.CP.MSFT.NET, try again
microsoft.com NS DNS7.CP.MSFT.NET
Nameserver DNS7.CP.MSFT.NET not responding
microsoft.com SOA record not found at DNS7.CP.MSFT.NET, try again
microsoft.com NS DNS6.CP.MSFT.NET
Nameserver DNS6.CP.MSFT.NET not reachable
microsoft.com SOA record not found at DNS6.CP.MSFT.NET, try again
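The "all their nameservers in one basket" complaint can be checked mechanically once the nameserver addresses are known. The following is only a minimal sketch: the function names are hypothetical, the addresses are documentation-range placeholders rather than the real addresses of the servers listed above, and sharing a /24 is a crude stand-in for "sitting in the same place", not a real measure of network geography.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(addresses, prefix_len=24):
    """Group IPv4 address strings by the network (of prefix_len bits) containing them."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(addr)
    return dict(groups)


def is_diverse(addresses, prefix_len=24):
    """True if the addresses fall into more than one network of the given size."""
    return len(group_by_prefix(addresses, prefix_len)) > 1


if __name__ == "__main__":
    # Placeholder addresses (assumption): four servers that all share one /24
    one_basket = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
    # Placeholder addresses (assumption): servers spread over three networks
    spread_out = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]
    print(is_diverse(one_basket))
    print(is_diverse(spread_out))
```

A same-prefix result does not prove a single point of failure, and different prefixes do not prove independence; the check only flags the obvious case.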
On Wed, Jan 24, 2001 at 06:01:29AM -0500, Greg A. Woods wrote:
[ On Wednesday, January 24, 2001 at 13:09:45 (-0800), Roeland Meyer wrote: ]
Subject: RE: Microsoft spokesperson blames ICANN
From our efforts, it is not at all surprising that someone, at MSFT, munged the DNS configuration, totally. Even their best guru could have done it, due to the murky nature of the config. I suspect that there are less than 100 ppl that could even have a clue, in this area, and they don't all have the same pieces of clue.
{OBofftopic: hmm, look at the two timestamps, above. did greg reply to roeland's e-mail before it was written?} by now i think we are realizing that it's probably more of some kind of server-level/network-level attack, and not a DNS phuque-up. i got plenty o'pings earlier without nary a drop, although the nameservers didn't reply. {Important Point:} nevertheless:
That's absolutely idiotic (of M$, that is !;-). Even more idiotic than putting all their nameservers in one basket, so to speak.
I'd bet any high-school kid who had any experience whatsoever at installing Linux or FreeBSD could no doubt blow a real OS and a native BIND install onto any sufficiently capable set of four machines in about an hour or so and provided that someone could cough up at least a half-baked zone file from somewhere to load on them they'd all be online and answering to the registered nameserver IP numbers in no time flat. Certainly in less than what's apparently going to be at least 23 hours now!
{Oblinux: there are a few itty-bitty "server" distro's out there that you could probably load up in under 15 minutes. also, the e-smith-style "appliance" distros are also quick to load.}
Heck I know a half dozen or more people around the world who would have put their dislike of M$ away for a short period and loaded a zone file or two on their own nameservers for M$ if only M$ could have managed to get the .COM zone updated with new delegations.... What ever happened in this community to asking the community for help when you're caught between a rock and a hard place? (Not that a company the size of M$ should have to ask for a handout -- they no doubt have significant IP connectivity in as many places around the world as almost anyone else!)
whoa, slow down... microsoft apparently hasn't quite figured out what hit them (and in these later hours there's implications that there is more than one issue happening here). any large company is gonna take some non-trivial amount of time to figure things out so that the report to the upper management (ultimately) will be complete, including not only what happened, who's responsible, etc., but also what steps were taken to keep it from happening again. keeping running notes on all of this just makes it slow. take that resulting time and double it when a company has claimed (and, y'know, perhaps it's true) in the past that they possess clue. and finally, take that second time and triple if it's a public company (where somebody can get sued). i'm not making excuses for microsoft, but more clueful companies have had worse times of it, even in the recent past. give 'em a chance.
MS has nothing and no-one to blame but their own stupidity and arrogance in this. Meanwhile they're so damn big and "important" to so many users that this outage is having both a direct and an indirect negative impact on a lot of ISPs around the world! "Hey! The Internet must be broken if I can't get to M$.COM!"
whoa! whoah!! take it easy... chill... let's kick 'em when and where they deserve it, after all the smoke clears. until then, i think this forum should be supportive of internet-connected networks that are facing big troubles. whatever is happening to microsoft today could happen to someone far dearer tomorrow (or today, of course). we all might learn something useful from this. (and maybe not.)
No, what's needed is for M$ to learn that they need to deploy software that's capable of the task even if it didn't come from a box and doesn't have their logo branded on it. Squishing things together that were never meant to be squished together is only going to cause a big mess. Err, has already caused a big mess, at least for M$ and those who deal with them! ;-)
They'd also do well to learn a bit about network geography and just exactly how authoritative nameserver visibility from various locations on this wonderful Internet of ours can directly affect their bottom line!
try: http://secondary.easydns.com

--
Henry Yen                     Aegis Information Systems, Inc.
Senior Systems Programmer     Hicksville, New York
[ On Wednesday, January 24, 2001 at 20:30:12 (-0500), Henry Yen wrote: ]
Subject: Re: Microsoft spokesperson blames ICANN
On Wed, Jan 24, 2001 at 06:01:29AM -0500, Greg A. Woods wrote:
[ On Wednesday, January 24, 2001 at 13:09:45 (-0800), Roeland Meyer wrote: ]
{OBofftopic: hmm, look at the two timestamps, above. did greg reply to roeland's e-mail before it was written?}
As far as I can tell all my system clocks are close enough to true network time that NTP hasn't been complaining! :-)

Note though that my logs show my reply being sent at 18:01 -0500 and the message came back to me with the date header intact and reading:

Date: Wed, 24 Jan 2001 18:01:29 -0500 (EST)

so perhaps the error is actually in your MUA (i.e. in its formation of the "On ... wrote:" line when preparing the quoted message). Apparently it gets the "AM" and "PM" wrong when converting from a 24-hour clock to a 12-hour clock. It should have written "06:01:29PM -0500".
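For reference, the 24-hour to 12-hour conversion that the attribution line apparently got wrong can be sketched as follows. This is an illustration only, with a hypothetical function name; it is not the code of any particular MUA.

```python
def to_12_hour(hour24, minute, second):
    """Format a 24-hour time as HH:MM:SSAM or HH:MM:SSPM."""
    if not (0 <= hour24 <= 23 and 0 <= minute <= 59 and 0 <= second <= 59):
        raise ValueError("time component out of range")
    # Hours 0-11 are AM, hours 12-23 are PM
    suffix = "AM" if hour24 < 12 else "PM"
    # 0 and 12 both display as 12; everything else is taken modulo 12
    hour12 = hour24 % 12 or 12
    return f"{hour12:02d}:{minute:02d}:{second:02d}{suffix}"


if __name__ == "__main__":
    # The Date: header quoted above reads 18:01:29
    print(to_12_hour(18, 1, 29))  # 06:01:29PM
```

The usual ways to get this wrong are to take the hour modulo 12 without deriving the suffix from the original hour, or to mishandle midnight and noon.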
whoa, slow down... microsoft apparently hasn't quite figured out what hit them (and in these later hours there's implications that there is more than one issue happening here).
In this particular case it's totally irrelevant what hit them. They needed to get at least one solid reliable replacement nameserver up and running and answering on one of those IP addresses as soon as humanly possible if they were to try and mitigate the damage. If it were me running the show and if I had even a hint that there were malicious agents responsible I'd have grabbed as many raw packets off the network as I could conveniently and quickly store, then I'd have literally pulled the plug on at least two of the machines and sent the works off for forensic analysis while a new, slightly different, and far more secure, machine was brought in to provide this most critical service.

Given the hindsight gained from reading their announcement (and guessing what really happened), perhaps they even did that, but it shouldn't have taken them so many more hours to figure out that the world still wasn't seeing their DNS no matter what they did to those servers.

Of course it wouldn't have been nearly so critical an issue requiring such quick and dirty action if they would have had more diverse DNS servers.

Part of the problem of course is that they may not have perceived the full extent of their problem as quickly as some of us from the outside were imagining it to be, though that's somewhat difficult to understand, especially given the nature of the discussion in open forums such as this one....

I wouldn't have been "kicking" them while they were still down if it wasn't that they'd clearly and obviously tied their own noose and stepped into it and then pulled the lever on their own trap-door themselves!

The comedy of errors in their recovery attempts and the enormous delay in returning their DNS to operational status points out several grave operational errors, but none of those errors should ever have caused any visible problems in the first place -- the root cause of their problems remains in the fact that they did not follow the best common practices already well documented by others who have learned these lessons from the school of hard knocks.

-- Greg A. Woods
+1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
I bet Microsoft had to go get a Unix box and install DNS on it so they could get back up and running.. lol

Morris Allen
VidcomNet, Inc.

----- Original Message -----
From: "Greg A. Woods" <woods@weird.com>
To: <nanog@merit.edu>
Sent: Thursday, January 25, 2001 12:26 AM
Subject: Re: Microsoft spokesperson blames ICANN
[ On Wednesday, January 24, 2001 at 20:30:12 (-0500), Henry Yen wrote: ]
Subject: Re: Microsoft spokesperson blames ICANN
On Wed, Jan 24, 2001 at 06:01:29AM -0500, Greg A. Woods wrote:
[ On Wednesday, January 24, 2001 at 13:09:45 (-0800), Roeland Meyer wrote: ]
{OBofftopic: hmm, look at the two timestamps, above. did greg reply to roeland's e-mail before it was written?}
As far as I can tell all my system clocks are close enough to true network time that NTP hasn't been complaining! :-)
Note though that my logs show my reply being sent at 18:01 -0500 and the message came back to me with the date header intact and reading:
Date: Wed, 24 Jan 2001 18:01:29 -0500 (EST)
so perhaps the error is actually in your MUA (i.e. in its formation of the "On ... wrote:" line when preparing the quoted message). Apparently it gets the "AM" and "PM" wrong when converting from a 24-hour clock to a 12-hour clock. It should have written "06:01:29PM -0500".
whoa, slow down... microsoft apparently hasn't quite figured out what hit them (and in these later hours there's implications that there is more than one issue happening here).
In this particular case it's totally irrelevant what hit them. They needed to get at least one solid reliable replacement nameserver up and running and answering on one of those IP addresses as soon as humanly possible if they were to try and mitigate the damage. If it were me running the show and if I had even a hint that there were malicious agents responsible I'd have grabbed as many raw packets off the network as I could conveniently and quickly store, then I'd have literally pulled the plug on at least two of the machines and sent the works off for forensic analysis while a new, slightly different, and far more secure, machine was brought in to provide this most critical service.
Given the hindsight gained from reading their announcement (and guessing what really happened), perhaps they even did that, but it shouldn't have taken them so many more hours to figure out that the world still wasn't seeing their DNS no matter what they did to those servers.
Of course it wouldn't have been nearly so critical an issue requiring such quick and dirty action if they would have had more diverse DNS servers.
Part of the problem of course is that they may not have perceived the full extent of their problem as quickly as some of us from the outside were imagining it to be, though that's somewhat difficult to understand, especially given the nature of the discussion in open forums such as this one....
I wouldn't have been "kicking" them while they were still down if it wasn't that they'd clearly and obviously tied their own noose and stepped into it and then pulled the lever on their own trap-door themselves!
The comedy of errors in their recovery attempts and the enormous delay in returning their DNS to operational status points out several grave operational errors, but none of those errors should ever have caused any visible problems in the first place -- the root cause of their problems remains in the fact that they did not follow the best common practices already well documented by others who have learned these lessons from the school of hard knocks.
-- Greg A. Woods
+1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
On Wed, Jan 24, 2001, Greg A. Woods wrote:
[ On Wednesday, January 24, 2001 at 13:09:45 (-0800), Roeland Meyer wrote: ]
Subject: RE: Microsoft spokesperson blames ICANN
From our efforts, it is not at all surprising that someone, at MSFT, munged the DNS configuration, totally. Even their best guru could have done it, due to the murky nature of the config. I suspect that there are less than 100 ppl that could even have a clue, in this area, and they don't all have the same pieces of clue.
That's absolutely idiotic (of M$, that is !;-). Even more idiotic than putting all their nameservers in one basket, so to speak.
I'd bet any high-school kid who had any experience whatsoever at installing Linux or FreeBSD could no doubt blow a real OS and a native BIND install onto any sufficiently capable set of four machines in about an hour or so and provided that someone could cough up at least a half-baked zone file from somewhere to load on them they'd all be online and answering to the registered nameserver IP numbers in no time flat. Certainly in less than what's apparently going to be at least 23 hours now!
I'm going to play devil's advocate here.

* I bet any high-school-kid-setup Linux or FreeBSD box will probably die under the load of M$'s zones - the default out-of-the-box config is nice, but not *nice*.

* You have no idea whether M$'s DNS servers are serving static zone files, back ended to a database, talking to a mapper of some sort, whatever.

As someone mentioned, there are things such as maintenance windows, and explaining to management that you need to break one can sometimes be painful.

That said, I think it being dead for 23 hours is a little strange, but then we don't know the exact story so we could be pointing the blame at exactly the wrong place(s).

Adrian
[ On Thursday, January 25, 2001 at 19:17:15 (+0800), Adrian Chadd wrote: ]
Subject: Re: Microsoft spokesperson blames ICANN
On Wed, Jan 24, 2001, Greg A. Woods wrote:
I'd bet any high-school kid who had any experience whatsoever at installing Linux or FreeBSD could no doubt blow a real OS and a native BIND install onto any sufficiently capable set of four machines in about an hour or so and provided that someone could cough up at least a half-baked zone file from somewhere to load on them they'd all be online and answering to the registered nameserver IP numbers in no time flat. Certainly in less than what's apparently going to be at least 23 hours now!
I'm going to play devil's advocate here.
* I bet any high-school-kid-setup Linux or FreeBSD box will probably die under the load of M$'s zones - the default out-of-the-box config is nice, but not *nice*.
Well, that's why I said "sufficiently capable machine"..... Give *me* a pair of 1GHz Xeon processors with >=2MB cache on a dual-bus motherboard, 1GB of RAM, a pair of 1000baseT interfaces (one for a private administrative interface), a fiber-channel attached RAID array that's properly tuned for speed, and the latest version of FreeBSD, and we'll see just how many queries per second such a box can answer! ;-) Obviously you'd want to install only the bare minimum of software necessary and then turn off inetd and any other stand-alone network daemon but named....
* You have no idea whether M$'s DNS servers are serving static zone files, back ended to a database, talking to a mapper of some sort, whatever.
It doesn't really matter -- that's a back-office implementation issue. The part that's answering the queries has a terribly simple job to do.

However in theory if they've got a reliable internal nameserver that's, for example, either insecure or incapable of handling the public query load, then they can update that one any way they please and let BIND on the authoritative server do the zone transfer from it.

Dynamic DNS is useless if you don't have your TTLs set right, and if you do have your TTLs right then getting the SOA right is trivial too, and once you've done that it doesn't matter if you stick an extra zone transfer in the path. So long as they're not being total idiots and trying to void BIND's warranty with <300 sec. TTLs, they'd do just fine.

-- Greg A. Woods
+1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
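The 300-second figure above lends itself to a quick sanity check. The following is a minimal sketch, assuming the zone's records have already been parsed into (name, ttl, type, rdata) tuples. The record values are made-up placeholders, the function name is hypothetical, and the threshold is simply the number quoted in the message above, not a limit enforced by BIND.

```python
# Threshold in seconds, taken from the message above (not a BIND-enforced limit)
MIN_TTL_SECONDS = 300


def low_ttl_records(records, minimum=MIN_TTL_SECONDS):
    """Return the (name, ttl, type, rdata) tuples whose TTL is below `minimum`."""
    return [rec for rec in records if rec[1] < minimum]


# Hypothetical example data; names and addresses are placeholders
EXAMPLE_RECORDS = [
    ("www.example.com.", 3600, "A", "192.0.2.10"),
    ("dyn.example.com.", 60, "A", "192.0.2.20"),
    ("example.com.", 86400, "NS", "ns1.example.com."),
]

if __name__ == "__main__":
    for name, ttl, rtype, rdata in low_ttl_records(EXAMPLE_RECORDS):
        print(f"{name} {ttl} {rtype} {rdata}: TTL below {MIN_TTL_SECONDS}s")
```

A record with a TTL of exactly 300 seconds is not flagged, matching the "<300 sec." wording.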
participants (5)
- Adrian Chadd
- Henry Yen
- Morris Allen
- Roeland Meyer
- woods@weird.com