On 7/24/07, Seth Mattinen <sethm@rollernet.us> wrote:
I have a question: does anyone seriously accept "oh, power trouble" as a
reason your servers went offline? Where's the generators? UPS? Testing
said combination of UPS and generators? What if it was important? I
honestly find it hard to believe anyone runs a facility like that and
people actually *pay* for it.
If you do accept this is a good reason for failure, why?
~Seth
I'm unable to find a link at the moment, but many moons ago power was lost at the 350 E Cermak Equinix facility in Chicago. At the time, we didn't have production equipment there (only a firewall in a shared colo cage/cabinet). This occurred on a Friday evening and lasted well into Saturday morning because their generators would start up but would not keep running. I believe the root cause was a problem with the insulation on the power cables somewhere. I understand testing is done frequently, but I'm also aware that if I want full redundancy, I'm going to need two physically separate locations. There are some events you can't plan for, as well as failure modes that aren't easily or quickly resolved.
-brandon