On Wed, 04 Nov 2009 12:26:15 CST, Joe Greco said:
> With power:
>
> N+1 is usually better than N
> Best to assume full load when doing math
> Things will go wrong, predict common failures

And uncommon ones. :)

So as part of a major compute-cluster install, we upgraded our UPS and diesel generator one weekend, and breathed a collective sigh of relief that we were now safe from power outages and mostly dodged a bullet. We *did* have some scary moments when we discovered that (a) of the 400 or so disks on our Sun E10K, about 10 didn't spin up again and (b) several of the boot disks on said box weren't mirrored. Fortunately, none of the 10 fails were on a non-mirrored disk. By Tuesday, all the non-mirrored boot disks were in fact mirrored.

That Friday, a bozo contractor relocating a doorway managed to set off the Halon. Only lost two disks on the E10K. Guess which two? ;)

And a month later, we discovered that the nice shiny new automatic cutover switch was wired in backwards, necessitating another power outage to re-wire it correctly. So much for safe from power outages... :)