Where granular temperature readings are available to control scripts, it would also be possible to implement something like the tiers described below. Adjust the thresholds as appropriate for the facility and equipment, and for the expected rate of temperature rise. System performance throttling and/or quiescing can also reduce load (and thus cooling requirements and the rate of heat buildup) during periods of reduced or completely lost cooling capacity.

1.) Elevated temperature watch at 77 F / 25 C. Send alerts to on-call staff but take no other action.
2.) Elevated temperature warning at 81.5 F / 27.5 C. Begin performance throttling and engage other measures to reduce heat buildup, to compensate for insufficient cooling capacity.
3.) Elevated temperature severe warning at 86 F / 30 C. Begin automated clean system shutdowns.
4.) Critical temperature limit exceeded at 95 F / 35 C. Trigger EPO to protect hardware.

On sensor redundancy: 3x or higher redundancy allows voting methods to be used to rule out potential false readings.

On series vs parallel wiring: either can be used; what makes most sense depends on the design of the system being integrated with (basically NC vs NO).

On Mon, May 27, 2019, 13:18 Mel Beckman <mel@beckman.org> wrote:
We use Intermapper, an SNMP network monitoring system, which supports UNIX scripting. Intermapper probes two Weathergoose temperature sensors and calls a script with the values it retrieves. When both sensors exceed a certain threshold, the script sends an SNMP relay trip signal to the Weathergoosen, which close a pair of dry contacts wired in series to the emergency power off contacts for the whole-room UPS.
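The "both sensors must agree" check can be sketched in a few lines of Python. This is only an illustration, not the actual script: the function names, the 95 F default (taken from the cutoff mentioned later in this message), and the returned strings are all assumptions, and the real SNMP relay trip is left as a placeholder because it depends on the device and NMS in use.

```python
# Hypothetical sketch of a two-sensor agreement check.
# Assumption: readings arrive as Fahrenheit floats; 95.0 is an assumed default.
TRIP_THRESHOLD_F = 95.0


def should_trip(sensor_a_f, sensor_b_f, threshold_f=TRIP_THRESHOLD_F):
    """Return True only when BOTH sensors exceed the threshold.

    Requiring agreement means one faulty sensor cannot cause a trip.
    """
    return sensor_a_f > threshold_f and sensor_b_f > threshold_f


def handle_readings(sensor_a_f, sensor_b_f):
    """Decide what to do with one pair of readings."""
    if should_trip(sensor_a_f, sensor_b_f):
        # Placeholder: the real script would send the SNMP relay trip here.
        return "TRIP"
    return "OK"
```

With these assumptions, `handle_readings(96.0, 97.5)` returns "TRIP", while `handle_readings(96.0, 90.0)` returns "OK" because only one sensor is over the limit.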
We chose to use two sensors and two dry contact relays to protect against false trips, and thus false shutdowns. Before the trigger temperature is reached, the NMS would have sent various escalating alarms to on-call staffers, who hopefully would intervene before this point. This protection is for the worst-case scenario where nobody responds and the equipment is at risk of damage.
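The escalation idea, combined with the four tiers suggested at the top of this thread, can be sketched as a small lookup in Python. This is a rough illustration under stated assumptions: the names are hypothetical, the median is used as a simple stand-in for a voting method across three or more sensors, and a reading exactly at a threshold is treated as having reached that tier.

```python
from statistics import median

# Tiers from the top of the thread, highest first: (threshold C, name, action).
TIERS_C = [
    (35.0, "critical", "trigger EPO"),
    (30.0, "severe warning", "begin automated clean shutdowns"),
    (27.5, "warning", "begin performance throttling"),
    (25.0, "watch", "alert on-call staff"),
]


def voted_temperature(readings_c):
    """Combine 3 or more readings; the median ignores a single outlier."""
    if len(readings_c) < 3:
        raise ValueError("need at least 3 readings to vote")
    return median(readings_c)


def classify(temp_c):
    """Return (tier name, action) for a temperature in Celsius."""
    for threshold_c, name, action in TIERS_C:
        if temp_c >= threshold_c:  # assumption: at-threshold counts as reached
            return name, action
    return "normal", "no action"
```

For example, readings of 24.0, 26.0 and 80.0 C vote to 26.0 C, so `classify(voted_temperature([24.0, 26.0, 80.0]))` lands in the "watch" tier rather than tripping on the one wild reading.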
We could have commanded an orderly shutdown of all servers, but decided that it would be better to kill the power in the event of a runaway heat event than to try to make it through all the disk activity necessary for a clean shutdown.
This system has triggered one time, successfully shutting down the data center on a holiday weekend when people missed their notifications, and undoubtedly saved a lot of hard drives. When we got to the room the temperature was over 115°, but the power was cut at 95°.
-mel
On May 27, 2019, at 11:01 AM, Dovid Bender <dovid@telecurve.com> wrote:
Hi,
Is anyone aware of a device that will cut the power if the room goes above X degrees? I am looking for something as a just in case.
Regards,
Dovid