From the typical monitoring stations Dave sees, everything appears "normal." Yet out in the real world there is a problem. Like most things, it's rarely a single thing that breaks, but a chain of problems resulting in the final failure.
Not to pick on Dave, since I suspect he is going to have to face the Microsoft PR department for re-indoctrination for speaking out of turn; I'm glad to see someone from Microsoft made an appearance. But he does raise an interesting problem. How do you know if your highly redundant, diverse, etc. system has a problem? With an ordinary system it's easy: it stops working. In a highly redundant system you can start losing critical components but not be able to tell whether your operation is in fact seriously compromised, because it continues to "work."
As many of us have found out as we moved from simple networks to more complex networks, the network management is often much harder than the architecture of the network itself. Instead of relying on being notified when stuff "breaks," you have to actively monitor the state of your systems. Fairly frequently I see cases where the backup system failed, but no one knows about it until after the primary system also fails.
So what should you be monitoring, in addition to the typical graphs and logins, to detect the problem seen by Microsoft yesterday and today?
On Wed, 24 January 2001, Dave McKay wrote:
Microsoft's ITG is investigating this issue. I haven't been clued in yet as to what the main issue is. Hotmail's graphs and logins are currently following the same trends as normal; they seem unaffected. However, this is not the case in all locations. DNS seems to be the obvious choice for the blame. This is not the case in all areas, however. At this point Microsoft is not willing to put the blame on anyone, or any protocol for that matter. (Unless they already released a public statement saying so, then who knows?) Anyway, the issues are being worked on and service will be restored as soon as possible. I apologize for not being able to disclose more information.
-- Dave McKay dave@sneakerz.org Microsoft Global Network Architect
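To make the point about actively monitoring state a bit more concrete, here is a rough Python sketch of tracking how much redundancy is left, rather than only whether the service still answers. Everything in it is an assumption for illustration: the `RedundantGroup` and `assess` names, the `required` threshold, and the server names in the example are hypothetical and do not describe Microsoft's actual setup. The per-member up/down values are assumed to come from whatever active probes you already run.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Health(Enum):
    OK = "ok"              # every member is up
    DEGRADED = "degraded"  # still serving, but some redundancy is gone
    AT_RISK = "at_risk"    # still serving, one more failure is an outage
    DOWN = "down"          # too few members left to serve


@dataclass
class RedundantGroup:
    name: str
    members_up: Dict[str, bool]  # member name -> result of an active probe
    required: int = 1            # members needed for the service to "work"


def assess(group: RedundantGroup) -> Health:
    """Classify a group by its remaining redundancy, not just by service status."""
    up = sum(group.members_up.values())
    total = len(group.members_up)
    if up < group.required:
        return Health.DOWN
    if up == total:
        return Health.OK
    if up == group.required:
        return Health.AT_RISK
    return Health.DEGRADED


# Hypothetical example: a four-member group that only needs one member.
# An end-to-end check would still pass here, yet three of four are gone.
group = RedundantGroup(
    name="example-dns",
    members_up={"ns1": True, "ns2": False, "ns3": False, "ns4": False},
)
print(group.name, assess(group).value)
```

The idea is to alarm on AT_RISK and DEGRADED as well as DOWN, so the failed backup gets noticed while the primary is still carrying the load.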