Date:      Fri, 15 Aug 2003 15:16:16 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        wes@softweyr.com
Cc:        stable@FreeBSD.org
Subject:   Re: Strange things going on with 4.8
Message-ID:  <200308152216.h7FMGGM7019302@gw.catspoiler.org>
In-Reply-To: <200308102331.57313.wes@softweyr.com>

On 10 Aug, Wes Peters wrote:
> On Sunday 10 August 2003 03:55 pm, Doug White wrote:
>> On Mon, 11 Aug 2003, Daniela wrote:
>> > > How did you test the memory? Generally, short of using a hardware
>> > > SIMM tester, it's very difficult to identify bad modules. Running
>> > > memtest86 over a period of several hours can sometimes work.
>> > >
>> > > BIOS "tests" don't count.
>> >
>> > I used sysutils/memtest from the ports, and let it run over night.
>> > BTW, is there some kind of "operating system" that boots off a floppy
>> > and just tests the memory? That would be useful because memtest can't
>> > test all the memory.
>>
>> Yes, memtest86. It's a boot floppy image.
>>
>> > What other diagnostic software could I use?
>>
>> Well, the problem with testing memory with software is that it's not
>> necessarily possible to hammer it hard enough to trigger the problem.
>> If you can reproduce it easily, you might try removing one DIMM at a
>> time and then trying to crash it. If removing a DIMM fixes it, then
>> you probably took out the bad one.
> 
> In fact, many people in the FreeBSD community feel the best memory test of 
> all is to 'make world' several times.  I have experienced this myself 
> only once, but after returning the SIMM module to the vendor he verified 
> it was bad using a hardware tester.  The replacement SIMM has been in for 
> 5 months now and the machine has been marvelously stable, as I expect 
> from FreeBSD.

The problem with the 'make world' test is that you can't easily
distinguish between random memory corruption caused by hardware problems
and random memory corruption caused by OS bugs.

One of my machines was afflicted by both.  I found the hardware problem
by running memtest86. The problem turned out to be a BIOS bug that was
setting the memory timing incorrectly.  The other source of memory
corruption was related to the infamous DISABLE_PSE and DISABLE_PG_G
options.

It was really nice to have a way to test the hardware so that I could
find and fix the hardware problem, and not have to worry about whether
the source of any given memory error was an OS problem.


