I think the simple test for this problem is to take a non-ECC machine, boot from a CD/USB Key/etc with memtest or memtest86+ on it, and see if you get errors over the course of a few days. Getting errors will certainly prove that this problem exists (or that you have bad RAM).
On Sun, Apr 15, 2012 at 5:35 PM, Mike <ispbuilder@gmail.com> wrote:

It's not as if ECC memory requires a lot of power, a full-blown ATX board, or anything like that; there is the Intel S1200KP Mini-ITX board.

See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.5936&rep=rep1&type=pdf

But the exact rate of single-bit errors in non-ECC memory today is not necessarily predictable from past studies from the 90s. It also depends on environment: local lightning; solar activity, which has been increasing lately; how much extra shielding you have in place (server placed inside a Faraday cage or lead box?); and so on. You would need measurements for your specific hardware, since there are likely dependencies on the size of the memory cells, the vertical cross section, and other components in the system.
> I think the simple test for this problem is to take a non-ECC machine, boot from a CD/USB Key/etc with memtest or memtest86+ on it, and see if you get errors over the course of a few days.
Memtest86+ contains a series of tests that help uncover specific kinds of common memory faults. At any particular point during a memtest run, only a confined range of physical memory addresses is under test; a bit flip anywhere else won't be detected. That means Memtest is not likely to detect this kind of error.

Test #11 (Bit-Fade), with modifications, could have some promise. You need a 24-hour delay instead of a 5-minute delay. You need close to the entire physical address space under test. And you need truly random bit values, with a copy stored on some "reliable" medium, instead of the shortcut of storing known bit patterns.

Note that Memtest86+ itself and the system BIOS have to be stored in memory or CPU cache somewhere, so not every address can be under test. A random bit flip in non-ECC CPU L2 cache is also a possibility. Even so, software like memtest, suitably modified, could be made to detect a 1-bit error that showed up anywhere in the majority of the memory addresses.

--
-JH
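As a rough illustration of the modified bit-fade idea above, here is a minimal user-space sketch in Python. It is not Memtest86+ code and makes several assumptions: the function names (fill_random, find_bit_flips, bit_fade_check) and the 1 MiB chunk size are hypothetical, os.urandom stands in for "truly random" values, and a file on disk stands in for the "reliable" medium. A user-space process can only cover the memory the OS gives it, the OS may move or swap those pages, and the reference file may be read back through the page cache in the same non-ECC RAM, so this only shows the write / hold / compare logic.

```python
import os
import time

CHUNK = 1 << 20  # 1 MiB per buffer; an arbitrary illustrative size


def fill_random(n_chunks, ref_path, chunk=CHUNK):
    """Fill n_chunks buffers with OS-provided random bytes and write a
    reference copy to ref_path (ideally on a separate, trusted medium)."""
    buffers = []
    with open(ref_path, "wb") as ref:
        for _ in range(n_chunks):
            data = os.urandom(chunk)
            ref.write(data)
            buffers.append(bytearray(data))  # in-memory copy under test
        ref.flush()
        os.fsync(ref.fileno())  # push the reference copy out to the device
    return buffers


def find_bit_flips(buffers, ref_path, chunk=CHUNK):
    """Return (byte_offset, xor_mask) for every byte that differs from the
    reference copy; set bits in xor_mask mark the flipped bit positions."""
    flips = []
    with open(ref_path, "rb") as ref:
        for i, buf in enumerate(buffers):
            expected = ref.read(chunk)
            if buf == expected:
                continue  # fast path: this chunk is unchanged
            for j, (got, want) in enumerate(zip(buf, expected)):
                if got != want:
                    flips.append((i * chunk + j, got ^ want))
    return flips


def bit_fade_check(n_chunks, hold_seconds, ref_path, chunk=CHUNK):
    """Write random data, leave it untouched for hold_seconds, then verify."""
    buffers = fill_random(n_chunks, ref_path, chunk)
    time.sleep(hold_seconds)
    return find_bit_flips(buffers, ref_path, chunk)
```

For example, bit_fade_check(1024, 24 * 60 * 60, "ref.bin") would hold roughly 1 GiB of random data for a day before comparing; the file name is a placeholder. An empty result only means no flip was seen in the memory this process happened to hold, not that the machine is free of errors.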
participants (2)
- Jimmy Hess
- Mike