* Erik Levinson <erik.levinson@uberflip.com>: [cooling failure]
> For those who have gone through such events in the past, what can one expect in terms of long-term impact... Should we expect some premature component failures? Does anyone have any stats to share?
We had a similar event this January (temperatures were a bit higher, at 49°C, but the duration was a bit shorter, from 10am to 3pm). Within two days of the event, drives in two of our HP servers went from "OK" to "Predictive Failure", which is how the Smart Array controller reports elevated error rates. Two weeks later, a single DIMM had an uncorrectable ECC error, which caused a server reboot. Three weeks later, a single PSU failed.

In our opinion, the disk problems were caused by the cooling failure, while the ECC error and the PSU failure were probably unrelated.

I believe your hardware will be fine, but it probably wouldn't hurt to check whether your maintenance contracts or warranties are current, or whether you have some other way of obtaining replacement drives on short notice.

Cheers
Stefan