Date:      Mon, 11 Aug 2003 20:02:27 +0200
From:      Erik Trulsson <ertr1013@student.uu.se>
To:        Robert Gray <bob@boulderlabs.com>
Cc:        stable@freebsd.org
Subject:   Re: Strange things going on with 4.8
Message-ID:  <20030811180227.GA53638@falcon.midgard.homeip.net>
In-Reply-To: <200308111639.h7BGdvIL024267@vec.boulderlabs.com>
References:  <200308111639.h7BGdvIL024267@vec.boulderlabs.com>

On Mon, Aug 11, 2003 at 10:39:56AM -0600, Robert Gray wrote:
> I'd like to emphasize that memtest86 doesn't catch lots of
> memory problems.  Just last week I was having trouble compiling
> mozilla so I ran memtest86 over night.  Nothing showed up.
> But, "make buildworld" repeatedly failed on 
> compiler signal 11 errors at about 20% complete.
> Using  "make buildworld", I was able to isolate a 
> bad DIMM and now "make buildworld" and
> building mozilla run to completion (multiple times).

"make buildworld" can sometimes trigger memory problems that memtest86
doesn't find, yes. Memtest86 has the big advantage that it can test
*all* the bits of memory, which 'buildworld' can't do.

For all tests it is true that just because you can't find any errors,
that doesn't mean that there are no errors.

> 
> Whenever possible, I run with parity/ECC on the motherboard
> and the memory modules.

Usually a good idea.

> 
> I'm hoping a hardware/memory/motherboard expert will chime in.
> How can manufacturers continue to make PCs without memory
> checking?  

Money. Lots of people just buy the cheapest components they can find,
without considering quality or reliability.
Since ECC memory is inherently more expensive than non-ECC memory these
people buy non-ECC memory. Manufacturers sell what people buy.

> With today's standards of 128-256MB in a PC, isn't
> it just a matter of time until a bit gets flipped the wrong way?

Yes. 

> Are manufacturers hoping that the bad bit will go unnoticed
> in multi-media?  

Partly that, but mostly it is probably a case of people being used to
having their computers (mostly running Windows) crashing for
unexplained reasons every now and then. Most people wouldn't notice the
increased reliability from ECC memory since their computers/programs
crash for other reasons fairly often.

> Is there something in today's
> non-parity memory modules that helps insure reliable data?

Not really. Improved manufacturing processes have decreased the risk
that any particular bit is flipped quite a lot over the years, but the
increased size of memory has probably caused the risk of some bit going
wrong to stay fairly constant.
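
To make that arithmetic concrete, here is a rough sketch. It assumes bit
flips are independent and uses made-up per-bit probabilities and memory
sizes chosen only for illustration (they are not measurements); the
function name p_any_bit_error is hypothetical as well.

```python
import math

def p_any_bit_error(p_bit, n_bits):
    """Probability that at least one of n_bits independent bits flips,
    given a per-bit flip probability p_bit over some fixed time window."""
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_bits * math.log1p(-p_bit))

BITS_PER_MB = 8 * 1024 * 1024

# Hypothetical numbers: memory grows 16x while the per-bit risk drops 16x.
old = p_any_bit_error(p_bit=1.6e-11, n_bits=16 * BITS_PER_MB)
new = p_any_bit_error(p_bit=1.0e-12, n_bits=256 * BITS_PER_MB)

print(f"16 MB machine:  {old:.6f}")
print(f"256 MB machine: {new:.6f}")
```

With these assumed numbers the two results come out essentially equal:
the lower per-bit risk is cancelled by the larger number of bits.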

> Until I hear otherwise, I'll continue to spend extra
> for the redundant, error-checking memories.

As long as one can afford the higher prices for ECC memory and
motherboards that can handle it, that is a good idea.

> 
> Thanks
> -robert gray
> 
> 
> 
> 
> Wes Peters <wes@softweyr.com> Sun, 10 Aug 2003 23:31:57 PDT says:
> >>
> >> Well the problem with testing memory with software is that its not
> >> necessarily possible to hammer it hard enough to trigger the problem. 
> >> If you can reproduce it easily you might try cycling out one dimm and
> >> then trying to crash it. If removing a dimm fixes it then you probably
> >> took out the bad one.
> >
> >In fact, many people in the FreeBSD community feel the best memory test of 
> >all is to 'make world' several times.  I have experienced this myself 
> >only once, but after returning the SIMM module to the vendor he verified 
> >it was bad using a hardware tester.  The replacement SIMM has been in for 
> >5 months now and the machine has been marvelously stable, as I expect 
> >from FreeBSD.



-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se


