Date:      Tue, 25 Jun 1996 02:00:04 -0700 (MST)
From:      Don Yuniskis <dgy@rtd.com>
To:        msmith@atrad.adelaide.edu.au (Michael Smith)
Cc:        hua@xenon.chromatic.com, dgy@rtd.com, jsigmon@www.hsc.wvu.edu, hackers@freebsd.org
Subject:   Re: Memory tests ...
Message-ID:  <199606250900.CAA01330@seagull.rtd.com>
In-Reply-To: <199606250112.KAA24941@genesis.atrad.adelaide.edu.au> from "Michael Smith" at Jun 25, 96 10:42:39 am

> Ernest Hua stands accused of saying:
> > 
> > I would prefer a thorough set of tests such as some reasonably optimized
> > 1's and 0's test.  I'm not familiar with algorithms for testing "flaky"
> > versus "stuck".
> 
> The problem is that no program can generate sequential accesses _fast_ 
> enough, and has no way of watching the critical timing parameters that
> will help you decide _how_ marginal a given memory is.

Agreed.
 
> For this you need a _real_ memory tester, and because measuring nanoseconds
> accurately is difficult, these cost _lots_ of money.
> 
> So if you just want a 'does it work, yes/no' answer, put the memory into
> your favorite high-performance OS (I prefer FreeBSD, OS/2 and Novell are 
> also popular), and thrash it mercilessly for a few days.

I don't see the value of this -- except for the fact that it's "easy"
to invoke from a shell  :>   If the system seizes up, it just tells
you something died (most probably memory).  You are counting on the
failure to happen in such a way as to corrupt the state of the
processor irrevocably.

Exhaustive tests in *software* are usually ridiculous -- they take
forever to execute and rarely detect anything but the grossest
errors (i.e. stuck at * and decoding errors).  These can be found
through other (less painful) techniques.

I find that using an LFSR with a long, "relatively prime" period to 
alternately fill and check memory contents works well as a quick
POST-style check.  It can also be used for more thorough testing
(e.g. to catch thermal problems) if set in an endless loop.  And,
unlike just running a system hard for a while, it (usually)
survives a memory failure and can report on the failure.
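
For the curious, here is a rough sketch of the fill-and-check idea in C.
It is only an illustration, not the code I actually use.  The names
lfsr_next() and lfsr_memtest() are made up, and the 16-bit Galois LFSR
with tap mask 0xB400 is just one well-known maximal-length choice.  Its
period is 65535, which is odd and so shares no factor with a
power-of-two memory size; that is how I read "relatively prime" here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One step of a 16-bit Galois LFSR (tap mask 0xB400, an illustrative
 * choice).  Any nonzero state cycles through all 65535 nonzero values.
 */
static uint16_t lfsr_next(uint16_t s)
{
    uint16_t lsb = s & 1u;

    s >>= 1;
    if (lsb)
        s ^= 0xB400u;
    return s;
}

/*
 * Fill n words with the LFSR sequence, then regenerate the same
 * sequence and compare.  Returns the index of the first mismatch,
 * or n if every word read back correctly.  (Hypothetical helper.)
 */
static size_t lfsr_memtest(volatile uint16_t *mem, size_t n, uint16_t seed)
{
    uint16_t s;
    size_t i;

    assert(seed != 0);          /* an all-zero state never changes */

    /* Fill pass */
    s = seed;
    for (i = 0; i < n; i++) {
        mem[i] = s;
        s = lfsr_next(s);
    }

    /* Check pass: same seed, same sequence */
    s = seed;
    for (i = 0; i < n; i++) {
        if (mem[i] != s)
            return i;
        s = lfsr_next(s);
    }
    return n;
}
```

For the endless-loop variant, call lfsr_memtest() repeatedly and carry
the LFSR state forward as the next seed, so that each pass writes a
different pattern to each location.  A return value less than n is the
offset of the first bad word, which is what lets the test report the
failure instead of just hanging.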

Of course, this *doesn't* test other hardware that may be marginal
(e.g. DMACs).

My two cents...
--don


