On Mon, Jan 15, 2024 at 6:08 AM Mike Hammett <nanog@ics-il.net> wrote:
> Let's say that hypothetically, a datacenter you're in had a cooling failure and escalated to an average of 120 degrees before mitigations started having an effect. What should be expected in the aftermath?

Hi Mike,

A decade or so ago I maintained a computer room with a single air conditioner because the boss wouldn't go for N+1. It failed in exactly this manner several times. After the overheat was detected by the monitoring system, it would be brought under control with a combination of a spot cooler and powering down to a minimal configuration. But of course it takes time to get people there and set up the mitigations, during which the heat continues to rise.

The main thing I noticed was a modest uptick in spinning drive failures for the couple of months that followed. If there was any other consequence, it was at a rate where I'd have had to be carefully measuring before and after to detect it.

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/