In a message written on Sun, Apr 15, 2012 at 09:54:14PM -0400, Luke S. Crawford wrote:
> On my current fleet (well under 100 servers) single bit errors are so rare that if I get one, I schedule that machine for removal from production.
In a previous life, in a previous time, I worked at a place that had a bunch of Ciscos with parity RAM. For the time, these boxes had a lot of RAM, as they had distributed line cards, each with its own processor memory. Cisco was rather famous for these parity errors, mostly because of their stock answer: sunspots. The answer was in fact largely correct, but it's just not a great response from a vendor.

They had a bunch of statistics though, collected from many of these deployed boxes. We ran the statistics, and given hundreds of routers, each with many line cards, the math told us that approximately one router every 9-10 months should get one parity error from sunspots and other random activity (i.e. not a failing RAM module with hundreds of repeatable errors). This was, in fact, close to what we observed.

This experience gave me two takeaways. First, single bit flips are rare, but when you have enough boxes, rare shows up often. It's much like running petabytes of storage: disks fail every couple of days because you have so many of them. At the same time, a home user might not see a failure (of disk or memory) in their lifetime.

Second though, if you're running a business, ECC is a must because the message is so bad. "This was caused by sunspots" is not a response that inspires customers, no matter how correct. "We could have prevented this by spending an extra $50 on proper RAM for your $1M box" is even worse.

Some quick looking at Newegg:

    4GB DDR3 1333 ECC DIMM       $33.99
    4GB DDR3 1333 Non-ECC DIMM   $21.99
    Savings                      $12.00

(Yes, I realize the motherboard also needs some extra circuitry; I expect it's less than $1 in quantity though.)

Pretty much everyone I know values their data at more than $12 if it is lost.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
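
The "rare shows up often" arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the statistics Cisco supplied: the fleet size of 300 is a hypothetical stand-in for "hundreds of routers", 9.5 months is the midpoint of the 9-10 month figure, and errors are assumed to arrive independently (a Poisson model). All function and variable names are made up for the example.

```python
import math


def fleet_interval_months(per_box_events_per_year: float, boxes: int) -> float:
    """Mean months between events somewhere in the fleet (independent boxes)."""
    return 12.0 / (per_box_events_per_year * boxes)


def prob_at_least_one(per_box_events_per_year: float, boxes: int, years: float) -> float:
    """Poisson probability of at least one event in the fleet over `years`."""
    return 1.0 - math.exp(-per_box_events_per_year * boxes * years)


# ASSUMPTION: 300 routers is a hypothetical value for "hundreds of routers".
FLEET_SIZE = 300
# ASSUMPTION: 9.5 months is the midpoint of the quoted 9-10 month interval.
FLEET_INTERVAL_MONTHS = 9.5

# Back out the per-router rate implied by the fleet-wide observation.
per_box_rate = (12.0 / FLEET_INTERVAL_MONTHS) / FLEET_SIZE

# Price difference between the two DIMMs quoted from Newegg.
ecc_premium = round(33.99 - 21.99, 2)

if __name__ == "__main__":
    print("per-router events/year:", per_box_rate)
    print("years between events on one router:", 1.0 / per_box_rate)
    print("P(>=1 event, one router, 1 year):", prob_at_least_one(per_box_rate, 1, 1.0))
    print("P(>=1 event, whole fleet, 1 year):", prob_at_least_one(per_box_rate, FLEET_SIZE, 1.0))
    print("ECC premium per DIMM ($):", ecc_premium)
```

Under these assumptions a single router would go a very long time between random parity errors, while the fleet as a whole sees one in most years, which is the same effect as disks failing every couple of days in a petabyte-scale store.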