In a past role, I spent time grepping through such a properly configured cluster, with tens of thousands of nodes, looking for failing hardware. I should have written a proper paper with statistics, but I did not. The vast majority of servers had zero correctable ECC errors, while a few had a lot, which is consistent with the theory that ECC errors are more often caused by bad RAM.
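For anyone who wants to repeat that kind of survey, here is a rough sketch of one way to do it, not the method used above. It assumes Linux hosts with the EDAC driver loaded, which exposes per-memory-controller correctable error counts under /sys/devices/system/edac/mc/mc*/ce_count. The names read_ce_counts, summarize, and noisy_threshold are made up for illustration, and the threshold of 100 is arbitrary.

```python
from pathlib import Path


def read_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller_name: correctable_error_count} for the local host.

    Assumes the Linux EDAC sysfs layout (mc0/ce_count, mc1/ce_count, ...).
    Returns an empty dict if EDAC is not available.
    """
    counts = {}
    for ce_file in sorted(Path(edac_root).glob("mc*/ce_count")):
        try:
            counts[ce_file.parent.name] = int(ce_file.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed counter: skip it rather than guess.
            continue
    return counts


def summarize(host_totals, noisy_threshold=100):
    """Summarize per-host correctable error totals.

    host_totals maps hostname -> total correctable errors on that host.
    noisy_threshold is an arbitrary cutoff (an assumption, tune to taste).
    """
    clean = [h for h, n in host_totals.items() if n == 0]
    noisy = sorted(
        (h for h, n in host_totals.items() if n >= noisy_threshold),
        key=lambda h: -host_totals[h],
    )
    return {"hosts": len(host_totals), "clean": len(clean), "noisy": noisy}
```

Collect sum(read_ce_counts().values()) from each host by whatever means you already have, feed the totals to summarize(), and the "mostly zero, a few with a lot" pattern should be visible at a glance if it holds for your hardware.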
I'd have to say that's been the experience here as well. ECC is great, yes, but it just doesn't seem to be "absolutely vital" on an ongoing basis, as some of the other posters here have implied, for correcting the constant bit errors that are(n't) showing up. Maybe I'll get bored one of these days and find some devtools to stick on one of the Macs.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.