I used to work for a small, fairly crappy ISP -- the "datacenter" was a converted brick garage / loading dock. In order to provide cooling, they had chipped out a bunch of bricks and mounted 8 or so AC units in the hole, all in a line.

We monitored everything with WhatsUp Gold[0]. One (hot) night I'm on call, and at 3:30 AM I get an alert that the environmental sensors on one of the routers think it's too hot. I'm tired and grumpy, and it's only slightly too hot, so I ack it and go back to bed. A short while later I get paged again - another router now thinks it is uncomfortably warm. Still grumpy, so I ack that too, and back to bed. Sure enough, 20 minutes later, another page....

Fine, I get dressed, drive over to the location -- and realize that bricks / mortar are strong in compression, but weak in tension. The AC window units have been quietly vibrating for many years, and the entire row of bricks above the AC units has popped out. All the AC units are lying outside the building on the grass, still running.... :-)

I stared at them for a bit, unsure what to do -- so I turned them off, bumped up the monitoring thresholds, and went back to bed... Next day we blocked up the hole, installed some temporary chillers, and then finally installed real cooling....

There isn't much point to this story, but I've got a cold, and wanted to share... :-P

W

[0]: Wow, I just realized that WUG still exists... huh.

On Tue, May 28, 2019 at 9:13 AM Thomas Bellman <bellman@nsc.liu.se> wrote:
On 2019-05-27 18:18 +0000, Mel Beckman wrote:
Before the trigger temperature is reached, the NMS would have sent various escalating alarms to on call staffers, who hopefully would intervene before this point.
Would they actually have time to react and do something? In our datacenters, we reach our cut-off temperature in about 20 minutes if cooling stops.
This system has triggered one time, successfully shutting down the data center on a holiday weekend when people missed their notifications, and undoubtedly saved a lot of hard drives. When we got to the room the temperature was over 115°, but the power was cut at 95°.
Presumably that was °F, not °C.
I have heard from people who did *not* have automatic cutting of the power at high temperatures. Their computer room reached 100°C in places; some keyboards apparently looked like a certain Salvador Dali painting afterwards... (But I think they had very few actual servers or disk drives breaking.) The reason it didn't get even hotter was that as the temperature rose, servers started overheating and shut themselves down, thus lowering power dissipation more and more.
Our system for cutting power at high temperatures is part of the PLC monitoring power and temperature in the computer rooms. It sends a signal to the large breakers connecting the power subcentrals (where all the 16A fuses are) to the power rail feeding the room. I believe our PLCs are from Schneider Electric, but anyone who delivers PLCs for controlling power and cooling in a datacenter should be capable of programming their PLCs to do the same. You just need to remember to put it in the specifications when you contract the building. :-)
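For illustration only, the escalate-then-cut logic discussed in this thread might look roughly like the sketch below. It is plain Python, not PLC code, and not what any of the sites mentioned actually run. The 95 °F cut-off is the figure quoted earlier in the thread; the warning and paging thresholds, the names `action_for` and `ThermalCutoff`, and the latching behaviour (power stays off until someone resets it by hand) are all assumptions made up for the example.

```python
# Hypothetical thresholds in degrees Fahrenheit. Only the 95 F cut-off
# is taken from the thread; the other two are invented for illustration.
WARN_F = 80.0
PAGE_F = 88.0
CUTOFF_F = 95.0


def action_for(temp_f: float) -> str:
    """Map one room temperature reading to an escalation step."""
    if temp_f >= CUTOFF_F:
        return "cut_power"
    if temp_f >= PAGE_F:
        return "page_oncall"
    if temp_f >= WARN_F:
        return "warn"
    return "ok"


class ThermalCutoff:
    """Latching cut-off: once tripped, it stays tripped until reset().

    The latch is an assumption: it keeps the room from powering itself
    back on as soon as it cools down with the cooling still broken.
    """

    def __init__(self) -> None:
        self.tripped = False

    def update(self, temp_f: float) -> str:
        """Feed in a new reading and return the action to take."""
        if self.tripped:
            return "cut_power"
        action = action_for(temp_f)
        if action == "cut_power":
            self.tripped = True
        return action

    def reset(self) -> None:
        """Manual reset, e.g. after the cooling has been repaired."""
        self.tripped = False
```

Feeding it a rising series of readings such as 78, 85, 90, 96 would step through "ok", "warn", "page_oncall" and "cut_power", and a later reading of 70 would still return "cut_power" until `reset()` is called.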
/Bellman
-- I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants. ---maf