On 7/9/2013 10:28 PM, Erik Levinson wrote:
As some may know, yesterday 151 Front St suffered a cooling failure after Enwave's facilities were flooded.
One of the suites that we're in recovered quickly, but the other took much longer, and some of our gear shut down automatically due to overheating. We remotely shut down many redundant and non-essential systems in the hotter suite, and remotely transferred some others to the cooler suite, to ensure that we had a minimum of all core systems running in the hotter suite. We waited until the temperatures returned to normal, then brought everything back online. The entire event lasted from approximately 18:45 until 01:15. Apparently the ambient temperature was above 43 degrees Celsius at one point on the cool side of cabinets in the hotter suite.
For those who have gone through such events in the past, what can one expect in terms of long-term impact? Should we expect some premature component failures? Does anyone have any stats to share?
No stats, but way back in the day of very large computers (1 each) in very large facilities, the thing we worried about most at restart was too-rapid cooling and the resulting condensation, if the conditions were right. After power-up, the next concern was disk crashes that had occurred on the way down (this was a long time ago; discs and drums are different now). Last came overheating failures, which were relatively few and always in components with a reputation for weakness.
--
Requiescas in pace o email
Ex turpi causa non oritur actio

Two identifying characteristics of System Administrators:
Infallibility, and the ability to learn from their mistakes.
(Adapted from Stephen Pinker)