On 7/24/07, Seth Mattinen <sethm@rollernet.us> wrote:
I have a question: does anyone seriously accept "oh, power trouble" as a reason your servers went offline? Where's the generators? UPS? Testing said combination of UPS and generators? What if it was important? I honestly find it hard to believe anyone runs a facility like that and people actually *pay* for it.
If you do accept this is a good reason for failure, why?
~Seth
I'm unable to find a link at the moment, but many moons ago power was lost at the 350 E Cermak Equinix facility in Chicago. At the time, we didn't have production equipment there (only a firewall in a shared colo cage/cabinet). This occurred on a Friday evening and lasted well into Saturday morning because their generators would start up but would not keep running. I believe the root cause was a problem with the insulation on the power cables somewhere.

I understand testing is done frequently, but I'm also aware that if I want full redundancy, I'm going to need two physically separate locations. There are some events you can't plan for, as well as failure modes that aren't easily or quickly resolved.

-brandon