On Jul 24, 2007, at 4:57 PM, Patrick Giagnocavo wrote:
> On Jul 24, 2007, at 6:54 PM, Seth Mattinen wrote:
>> I have a question: does anyone seriously accept "oh, power trouble" as a reason your servers went offline? Where's the generators? UPS? Testing said combination of UPS and generators? What if it was important? I honestly find it hard to believe anyone runs a facility like that and people actually *pay* for it.
>
> Sad that the little Telcove DC here in Lancaster, PA, that Level3 bought a few months ago, has weekly full-on generator tests where 100% of the load is transferred to the generator, while apparently large DCs that are charging premium rates, do not.
I am not familiar with the operational details of 365 Main, but I suspect that they, like most datacenters, do have weekly generator and transfer test procedures. However, there are lots of things that can go wrong that are not covered by generator and transfer tests:

It is possible to cascade-fail a power distribution system in a number of ways.

It is possible for someone to connect things out of phase during a maintenance procedure in such a way that everything is fine until a transfer occurs; then all hell breaks loose. (Ever seen what happens when a large CRAC unit starts trying to run backwards because the three-phase rotation is out of order?)

There are also things that can go wrong in the transfer process itself (like putting the UPS and generators on the bus together some degrees out of phase).

Most of these things become far more likely and far harder to avoid as the amount of power and the number of units in the system increase.

I'm not defending the situation at 365 Main. I don't have any firsthand knowledge. I'm just saying that the mere fact that they have been dark for several hours today does not necessarily mean that they don't do weekly full-on generator tests.

I have no idea what the root cause of today's outage is. I will be interested in hearing from any credible source as to any actual details, but I'm betting that right now any such credible source is a bit busy.

Owen
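
To put a rough number on the "some degrees out of phase" case above: for two sources of equal voltage and frequency, the voltage across the open breaker between them is 2 * V * |sin(offset / 2)|. The sketch below only illustrates that phasor arithmetic. The 277 V figure (line-to-neutral on a 480 V system) and the function name are assumptions chosen for illustration, not details of 365 Main or any other facility.

```python
import math


def breaker_voltage_difference(v_rms: float, offset_deg: float) -> float:
    """RMS voltage across an open breaker between two sources of equal
    magnitude and frequency that are offset_deg apart in phase.

    This is the magnitude of the phasor difference V<0 - V<offset,
    which simplifies to 2 * V * |sin(offset / 2)|.
    """
    return 2.0 * v_rms * abs(math.sin(math.radians(offset_deg) / 2.0))


if __name__ == "__main__":
    # Assumed example: 277 V line-to-neutral (a 480 V three-phase system).
    for offset in (0, 10, 30, 60, 180):
        dv = breaker_voltage_difference(277.0, offset)
        print(f"{offset:>3} deg out of phase -> {dv:6.1f} V across the breaker")
```

At 60 degrees the difference already equals the full source voltage, and at 180 degrees it is double. Whatever current that voltage drives is limited only by the source and bus impedances, which is why closing out of phase gets worse as systems get larger.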