Date:      Thu, 22 Jan 1998 02:01:48 -0600 (CST)
From:      Joel Ray Holveck <joelh@gnu.org>
To:        tlambert@primenet.com
Cc:        mrcpu@cdsnet.net, hackers@FreeBSD.ORG
Subject:   Re: Had the shotgun out and pointed at my -current/SMP box...
Message-ID:  <199801220801.CAA00609@detlev.UUCP>
In-Reply-To: <199801220637.XAA07251@usr09.primenet.com> (message from Terry Lambert on Thu, 22 Jan 1998 06:37:03 +0000 (GMT))
References:   <199801220637.XAA07251@usr09.primenet.com>

>>    [ I am messing around with a 3 processor P6/233 system to potentially
>> 	do some heavy-duty database work, and it hasn't been able to 
>> 	complete a make buildworld yet.  Crashes with a wide variety of
>> 	errors.  Pop in the NT drive, works fine.  FreeBSD crash.
>> 	Just about to shoot the damn thing, and...]
> [ ... memory problems ... ]
> One wonders what NT wasn't telling you... if it's bad, it's bad.
> I think maybe the difference was that under NT it was undetectably
> bad.  Which is bad.

Could be happenstance.  For instance, suppose that one bit (bit A),
when undergoing a transition from 0 to 1, causes a column-adjacent
bit (bit B) to become stuck at 1.  Now, suppose that his NT kernel
loads at an address such that its code happens to keep bit A at 0.
Since bit A never undergoes its fatal transition, bit B continues to
work perfectly.

Note that the same thing could just as easily have happened under
FreeBSD, depending on where the nails land.
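
A toy simulation can make the scenario above concrete.  This is only a
sketch under stated assumptions: the FaultyRam class, the bit positions
and the two memory "images" are all hypothetical, and real DRAM faults
are a good deal messier than a single latched bit.

```python
class FaultyRam:
    """Toy bit-addressable RAM with one coupling fault (hypothetical model)."""

    def __init__(self, size, aggressor, victim):
        self.bits = [0] * size
        self.aggressor = aggressor    # "bit A" in the text
        self.victim = victim          # "bit B" in the text
        self.victim_stuck = False

    def write(self, addr, value):
        # A 0 -> 1 transition on bit A latches bit B at 1 from then on.
        if addr == self.aggressor and self.bits[addr] == 0 and value == 1:
            self.victim_stuck = True
            self.bits[self.victim] = 1
        # Once stuck, bit B ignores attempts to clear it.
        if addr == self.victim and self.victim_stuck:
            value = 1
        self.bits[addr] = value

    def read(self, addr):
        return self.bits[addr]


def count_errors(ram, image):
    """Write image (a dict of addr -> bit), then count read-back mismatches."""
    for addr, bit in image.items():
        ram.write(addr, bit)
    return sum(1 for addr, bit in image.items() if ram.read(addr) != bit)


# Two made-up images: one happens to keep bit A (address 3) at 0, the
# other happens to set it.  Bit B is address 4 in both.
quiet_image = {3: 0, 4: 0, 5: 1}
noisy_image = {3: 1, 4: 0, 5: 1}

print(count_errors(FaultyRam(8, 3, 4), quiet_image))
print(count_errors(FaultyRam(8, 3, 4), noisy_image))
```

The same faulty part gives no errors with the first image and one
error with the second; nothing about the hardware changed, only where
the bits happened to land.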

Just a consideration.  I've also had machines that would apparently
work perfectly well until I loaded more than n devices, at which point
they would fail fatally.  (In each of my test cases, loading the nth
device happened to put I/O buffers on top of the bad RAM, which until
then had been sitting under unused memory.)  I've had others that
worked fine for years under Win3.1, but died horribly the minute we
installed Win95, frequently taking the system registry with them,
meaning I had to do a complete OS reinstall... what's that saying
about eggs and baskets?

My point is that just because something works doesn't mean it's a
good component, just as a system working after you swap a component
doesn't mean that component was at fault.  I still use RAM tester
programs, simply because, in my experience, they tend to find faulty
RAM more reliably than *any* other method I've used.  (It also helps
to use a good RAM tester, such as Qualitas RAMExam, rather than some
dink that just sets all bits to 0, then all bits to 1.)
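
To illustrate why the all-zeros-then-all-ones approach falls short,
here is a minimal sketch comparing it with a march-style test on a
simulated RAM that has a single coupling fault (a 0 -> 1 write to one
bit latches a neighbouring bit at 1).  The FaultyRam model and both
test functions are hypothetical; the march sequence follows the
element order commonly called March C-, and I am not claiming this is
what RAMExam or any other particular tester does internally.

```python
class FaultyRam:
    """Toy bit-addressable RAM; aggressor/victim of None means no fault."""

    def __init__(self, size, aggressor=None, victim=None):
        self.bits = [0] * size
        self.aggressor = aggressor
        self.victim = victim
        self.victim_stuck = False

    def write(self, addr, value):
        # A 0 -> 1 transition on the aggressor latches the victim at 1.
        if addr == self.aggressor and self.bits[addr] == 0 and value == 1:
            self.victim_stuck = True
            self.bits[self.victim] = 1
        if addr == self.victim and self.victim_stuck:
            value = 1                 # a stuck bit cannot be cleared
        self.bits[addr] = value

    def read(self, addr):
        return self.bits[addr]


def naive_test(ram, size):
    """Fill with all 0s, verify, fill with all 1s, verify. True = pass."""
    for value in (0, 1):
        for addr in range(size):
            ram.write(addr, value)
        if any(ram.read(addr) != value for addr in range(size)):
            return False
    return True


def march_test(ram, size):
    """March-style test: read each cell, then flip it, in both directions."""
    up = range(size)
    down = range(size - 1, -1, -1)
    for addr in up:
        ram.write(addr, 0)
    # Each element: (address order, value expected on read, value written).
    elements = [(up, 0, 1), (up, 1, 0), (down, 0, 1), (down, 1, 0)]
    for order, expected, new_value in elements:
        for addr in order:
            if ram.read(addr) != expected:
                return False
            ram.write(addr, new_value)
    return all(ram.read(addr) == 0 for addr in up)


SIZE = 8
print("naive passes faulty RAM:", naive_test(FaultyRam(SIZE, 3, 4), SIZE))
print("march passes faulty RAM:", march_test(FaultyRam(SIZE, 3, 4), SIZE))
print("march passes good RAM:  ", march_test(FaultyRam(SIZE), SIZE))
```

The naive test passes the faulty part because the victim bit is only
ever checked while it is supposed to be 1 anyway.  The march test
reads every cell immediately before flipping it, so a bit that changed
behind its back gets caught whichever side of the aggressor it sits on.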

Would other people here like to see a boot-sector RAM tester?

-- 
Joel Ray Holveck - joelh@gnu.org - http://www.wp.com/piquan
   Fourth law of programming:
   Anything that can go wrong wi
sendmail: segmentation violation - core dumped