Date:      Fri, 25 Jul 2008 09:28:11 -0400
From:      Michael Powell <nightrecon@verizon.net>
To:        freebsd-questions@freebsd.org
Subject:   Re: FreeBSD and ECC memory?
Message-ID:  <g6ck9v$b1b$1@ger.gmane.org>
References:  <4889BAE0.6030308@skoberne.net> <g6chl3$22s$1@ger.gmane.org> <20080725130052.GA70571@owl.midgard.homeip.net>

Erik Trulsson wrote:
[snip]
> 
> No, non-ECC RAM cannot detect or correct any errors at all. (Old
> parity-RAM could detect, but not correct, single-bit errors.)

Quite true. The old parity-bit functionality that was removed from RAM
(which was then called "non-ECC") migrated to the memory controller.
So yes, it isn't the RAM that does it. Poor choice of wording on my part.
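
To make the parity point concrete, here is a minimal sketch in Python of a
single even-parity bit over one byte. The function names are made up for
illustration and model no real memory controller. It shows why parity can
detect a single flipped bit but cannot say which bit it was, and why it
misses a double flip entirely.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, stored_parity):
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010             # four bits set, so the parity bit is 0
stored = parity_bit(data)

one_flip = data ^ 0b00000100  # a single bit flipped
two_flips = data ^ 0b00000110 # two bits flipped

print(parity_ok(data, stored))       # True: nothing changed
print(parity_ok(one_flip, stored))   # False: error detected, location unknown
print(parity_ok(two_flips, stored))  # True: the double flip goes unnoticed
```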

> ECC is generally capable of detecting multi-bit errors and fixing
> single-bit errors. (There are different ways of implementing ECC. Some of
> them might well be able to fix multi-bit errors too.)

These cost lots of money and are common on "Big Iron". In fact, non-ECC
isn't even offered as an option on Big Iron.
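
For anyone curious how "fix single-bit, detect double-bit" works, below is a
toy sketch in Python: a Hamming(7,4) code plus one overall parity bit,
protecting just 4 data bits. This is only an illustration under my own
assumptions. Real ECC memory uses much wider codes implemented in hardware,
and the names encode() and decode() are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: c[1..7] is Hamming(7,4), c[0] is overall parity."""
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d      # data bits sit at the non-power-of-two positions
    c[1] = c[3] ^ c[5] ^ c[7]       # check bit covering positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]       # check bit covering positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]       # check bit covering positions 4, 5, 6, 7
    for pos in range(1, 8):
        c[0] ^= c[pos]              # overall parity across the whole word
    return c


def decode(codeword):
    """Return (nibble, status); nibble is None when the error cannot be corrected."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos         # XOR of set positions is 0 for a valid word
    overall = 0
    for bit in c:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1            # single flip: the syndrome names the bad position
        status = "corrected"
    else:
        return None, "uncorrectable"  # overall parity fine, syndrome not: two flips
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, status


word = encode(0b1011)

single = list(word)
single[5] ^= 1                      # one flipped bit
print(decode(single))               # (11, 'corrected')

double = list(word)
double[2] ^= 1
double[6] ^= 1                      # two flipped bits
print(decode(double))               # (None, 'uncorrectable')
```

The extra overall parity bit is what separates the two cases: one flip
disturbs it, two flips do not, so a nonzero syndrome with clean overall
parity means "detected, but don't try to fix it".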
 
[snip] 
>> The purpose of these schemes is to compensate for the fact that in every
>> so many (some large number) of memory transactions there may be a bit
>> that gets flipped. If this is happening more often than (some large
>> number) then there is a defect present. ECC just buys you "uptime" in the
>> event there are more errors than there should be.
> 
> Note that random, spontaneous bit flips can happen (infrequently) even in
> perfectly good RAM. (Due to cosmic rays, radioactive decay in surrounding
> material, and similar stuff. (No, I am not joking.))  ECC will handle
> such errors just fine, and that is the main reason why I would want ECC.

Especially true in satellites. The RAM in a satellite or other spacecraft
must be radiation hardened to be usable at all. And yes, what you say is no
joke; it is the truth.

For me the dividing line is this: when lots of people depend on a box 24/7,
it must have ECC. A storage server in someone's basement doesn't necessarily
fit into this category.
 
> You can also get defective memory modules, but such can usually be
> detected by running memtest86 or similar.  ECC can usually handle memory
> modules that have some bits more or less permanently wrong, but such
> modules should be replaced as soon as possible.
>

I agree - I was kind of harping on the "defective" idea. If it's defective
the manufacturer owes me a replacement, as in yesterday. 
 
[snip] 




