Date:      Fri, 27 Sep 2002 16:00:48 -0700 (PDT)
From:      Don Lewis <dl-freebsd@catspoiler.org>
To:        mb@imp.ch
Cc:        current@FreeBSD.ORG, hardware@FreeBSD.ORG
Subject:   memory/filesystem corruption, a cautionary tale (was: Re: Crashdumps available for download (solved I think))
Message-ID:  <200209272300.g8RN0mvU002155@gw.catspoiler.org>
In-Reply-To: <20020919111219.U52781-100000@levais.imp.ch>

On 19 Sep, Martin Blapp wrote:
> 
> Hi all,
> 
> With help of http://www.memtest86.com/memtest86-3.0.iso I've tracked
> it down to three 3 ! bad DRAMS.

Thanks for the pointer.

I have continued to see transient filesystem damage that would disappear
with a reboot, which made me suspect that the filesystem data cached in
RAM was being corrupted.  Over the last few weeks it seemed to migrate
from the /usr/src tree to the .depend files in /usr/obj.  A small
section of the file would be corrupted with binary garbage, but most
characters within the damaged section would not be touched.  The machine
in question has an Athlon XP 1900+ processor and PC2100 ECC RAM.

Last night I downloaded and ran memtest86 and after several passes I saw
a burst of errors in Test #5.  An entire byte of data was being flipped
from 0xff to 0x00 or vice versa at intervals of 8 or 16 bytes over a
small range of addresses.  This would seem to indicate an error caused
by one 8-bit wide chip on the 64-bit wide (72 with ECC) DIMM.  The
memtest86 documentation says that errors in Tests #5 and #8 are not
uncommon on Athlon systems.  The documentation suggests that some cases
can be fixed by relaxing the memory timing, while others require
replacing the RAM with RAM of higher quality.
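
As a rough illustration of that reasoning, the Python sketch below maps
each failing 64-bit word to the byte lane(s) in which it differs from
the expected pattern.  The error records are invented for illustration
(they are not actual memtest86 output), and treating one byte lane as
one 8-bit wide chip is an assumption that depends on the DIMM layout.

```python
from collections import Counter

BUS_BYTES = 8  # 64-bit data bus; the 8 ECC bits are not modeled here


def failing_lanes(expected, actual):
    """Return the byte lanes (0..7) in which two 64-bit words differ."""
    diff = expected ^ actual
    return [lane for lane in range(BUS_BYTES) if (diff >> (8 * lane)) & 0xFF]


def lane_histogram(errors):
    """Count failing byte lanes over (address, expected, actual) records."""
    counts = Counter()
    for _addr, expected, actual in errors:
        counts.update(failing_lanes(expected, actual))
    return counts


# Hypothetical error records, invented for illustration only.
sample_errors = [
    (0x01A40000, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFF00FFFFFF),
    (0x01A40008, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFF00FFFFFF),
    (0x01A40018, 0x0000000000000000, 0x00000000FF000000),
]

histogram = lane_histogram(sample_errors)
print(dict(histogram))
```

If every error lands in the same lane, as in this made-up sample, a
single chip (or its data lines) is the likely suspect.  Errors spread
across many lanes would point elsewhere.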

Since my RAM was from a reputable maker, I decided to try to adjust the
memory timing.  The BIOS allows a large number of tweaks to the memory
timing and I did not relish the idea of having to blindly twiddle all
the knobs.  One thing that caught my eye was that the CAS Latency timing
was set to 2 clocks.  I thought that sounded aggressive since not much
RAM is rated for that timing.  I bumped the CAS Latency to 2.5 clocks
and the errors appeared to go away.  Later I went back to check the
specifications for the RAM that I bought, and it turns out that it is
rated for a CAS Latency of 2.5 clocks!  I tweaked the BIOS some more and
tried both the failsafe settings and the "optimized" settings, and in
both cases the automagic RAM configuration settings in the BIOS set the
CAS Latency to 2.
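
To put the two settings in absolute terms, here is a small Python
sketch that converts a CAS Latency in clocks to nanoseconds.  It
assumes the nominal PC2100 (DDR266) memory clock of about 133.33 MHz.
The clock actually programmed by a given BIOS may differ, so treat the
numbers as approximate.

```python
# Assumed nominal memory clock for PC2100 (DDR266), in MHz.
PC2100_CLOCK_MHZ = 400 / 3  # about 133.33 MHz


def cas_latency_ns(cl_clocks, clock_mhz=PC2100_CLOCK_MHZ):
    """Convert a CAS Latency given in clock cycles to nanoseconds."""
    return cl_clocks * 1000.0 / clock_mhz


for cl in (2.0, 2.5):
    print(f"CL{cl}: {cas_latency_ns(cl):.2f} ns")

# How much less time the RAM gets at CL2 than at its rated CL2.5.
shortfall_ns = cas_latency_ns(2.5) - cas_latency_ns(2.0)
print(f"shortfall at CL2: {shortfall_ns:.2f} ns")
```

Under that assumption, running CL2.5 parts at CL2 asks them to return
data about 3.75 ns sooner than they are rated for, which is the kind of
margin that can produce intermittent errors.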

It looks like either my motherboard BIOS is incorrectly sensing the RAM
speed, or it senses the RAM speed correctly and is incorrectly
configuring the RAM timing, or the actual RAM that I purchased is
advertising the incorrect RAM speed.  If you've got an Athlon system,
you might want to double check this.

I've been running memtest86 since last night with the CAS Latency set to
2.5 clocks and haven't seen any errors.





