Date:      Wed, 14 Dec 2005 06:07:49 -0800
From:      J.C. Roberts <unknown@abac.com>
To:        freebsd-alpha@freebsd.org
Subject:   Re: SRM memtest
Message-ID:  <mf50q11367nbettb6bllpmvgatqoti16hq@4ax.com>
In-Reply-To: <tcnup15n4j9oa5g22cmodbhu2k8aihdlpv@4ax.com>
References:  <tcnup15n4j9oa5g22cmodbhu2k8aihdlpv@4ax.com>

On Tue, 13 Dec 2005 16:00:02 -0800, J.C. Roberts <unknown@abac.com>
wrote:

>I noticed something odd on an Alpha Personal Workstation 433 that I got
>off of eBay. The ARC/AlphaBIOS would occasionally report 256MB rather
>than the usual 384MB. This weirdness was intermittent. I have reseated
>everything in the system to make sure there are no connection/connector
>issues but I think it would be prudent to actually test the memory
>itself.
>
>I kicked the system into SRM Console mode and I've been trying to run
>memtest to no avail. I believe *I* am the real problem since I don't
>know what the heck I'm doing in SRM in spite of the fact that I've read
>the SRM Console user guide.
>http://ftp.digital.com/pub/Digital/info/semiconductor/literature/srmcons.pdf
>
>The SRM version is v7.2-1  Mar 6, 2000
>
>Running even the simplest tests seems to basically lock up the system,
>since the command never exits, even if you let it run for a couple of
>hours to try to complete two passes.
>
>  >>> memtest -rb -p 2
>
>If you background the memtest process and run show_status, it seems to
>pass at least once?
>
>  >>> memtest -rb -p 2 &
>  >>> show_status
>  ID        Program   Device   Pass   Hard/Soft   Written   Read
>  -------- --------- -------- ------ ----------- --------- ------
>  00000001      idle system       0      0    0         0      0
>  0000004F   memtest memory       1      0    0         0      0
>
>
>Using >>>kill_diags afterwards only locks up the system.
>
>I've searched around for more detailed instructions on the web. I found
>a cryptic post to the DebianAlpha list
>http://lists.debian.org/debian-alpha/2004/11/msg00064.html
>
>It mentions using
>
>>>>dynamic -r
>
>to figure out values to use with memtest switches but I still don't
>understand what was meant. The whole "zone" thing is a mystery. Worse
>yet, the SRM Console user guide doesn't even mention "dynamic" as a
>command and the man/help pages in the SRM itself are useless.
>
>I've reduced the system memory to 128MB (two DIMMs) so I can test the
>pairs, and by accident I figured out which pair is bad (i.e. running
>"dynamic -h" by mistake resulted in errors with one pair).
>
>When you guys use memtest properly, how do you do it?
>
>Thanks,
>JCR

My apologies for replying to myself, but I've had a few people ask me
off list to make the answer public if I ever manage to figure it out.
I've been working on this for a week, reading docs, searching the web
and asking around on OpenVMS, FreeBSD, OpenBSD, NetBSD and Linux lists
and groups.

With the help of Graham Burley on comp.os.vms, I've found the answer to
why the SRM MEMTEST and MEMORY commands were failing to run.
The WRITTEN and READ portions of the SHOW_STATUS output (above) were
telling us that the tests were not actually running.

This system probably came out of a "secure" site (i.e. government), so
it was sold to me without a hard drive. Although I had installed a new
disk, it had no OS or bootable partition on it (it was an old 4.5GB data
drive with an NTFS partition, which becomes relevant later), so there
was nothing in the system for the SRM to boot.

When booting to SRM I got the expected error messages

  CPU 0 booting

  (boot dka0.0.0.1009.0 -flags A)
  block 0 of dka0.0.0.1009.0 is not a valid boot block
  bootstrap failure

  Retrying, type ^C to abort...

Basically, it's an endless loop of trying to boot from the disk, so
I had always just been following instructions and using ^C to get into
the SRM console to run the memory tests. That ^C is the main cause of
the memory testing problems I mentioned above: aborting the boot this
way leaves the system/SRM _not_ fully initialized.

If you're having problems with either MEMORY or MEMTEST do a ps (or
CTRL-T) and look at the status of the MEMTEST lines. If you see them
stuck with "WAITING ON" you know your system/SRM was not completely
initialized.
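If you capture the console output (for example, from a serial console
log), the check above is easy to automate. Here is a minimal Python
sketch; the function name waiting_memtests is hypothetical, and since
the exact SRM ps layout isn't shown in this message, it assumes only
that each process appears on a single line and matches the strings
"memtest" and "waiting on" case-insensitively.

```python
from typing import List


def waiting_memtests(ps_output: str) -> List[str]:
    """Return the ps lines for MEMTEST processes that look blocked.

    Assumption: each process is listed on one line, and a blocked
    process shows the text "WAITING ON" somewhere on that line.
    Matching is case-insensitive so "MEMTEST"/"memtest" both count.
    """
    hits = []
    for line in ps_output.splitlines():
        lowered = line.lower()
        if "memtest" in lowered and "waiting on" in lowered:
            hits.append(line.strip())
    return hits
```

A non-empty result would suggest the console wasn't fully initialized.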

If you run INIT at this point, you just end up with the same bootstrap
failures and ^C issue as before, so you need to change how the system
boots before running INIT.

  >>>set auto_action halt
  >>>init

This gets you to a nice, clean SRM console that's been fully
initialized. At this point MEMORY and MEMTEST commands should work
properly. You can tell they are working by the WRITTEN and READ portions
of the SHOW_STATUS output. If the -p switch has a value of zero, memory
tests will run until you tell them to stop with the KILL_DIAGS command.
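To make the WRITTEN/READ check concrete, here is a small Python sketch
that parses SHOW_STATUS output captured from the console and flags
MEMTEST entries whose Written and Read counters are still zero. The
names parse_show_status and stalled_memtests are hypothetical, and the
parser assumes the column layout from the output quoted at the top of
this message (Hard/Soft printed as two numbers, Written/Read as decimal
integers); other SRM versions may format the table differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatusRow:
    pid: str
    program: str
    device: str
    passes: int
    hard_errors: int
    soft_errors: int
    written: int
    read: int


def parse_show_status(text: str) -> List[StatusRow]:
    """Parse SHOW_STATUS text (assumed layout: ID Program Device Pass
    Hard Soft Written Read) into rows, skipping header/separator lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 fields; the header and dashed line have 7.
        if len(fields) != 8:
            continue
        try:
            int(fields[0], 16)                 # process ID is hex
            numbers = [int(f) for f in fields[3:]]
        except ValueError:
            continue
        rows.append(StatusRow(fields[0], fields[1], fields[2], *numbers))
    return rows


def stalled_memtests(rows: List[StatusRow]) -> List[StatusRow]:
    """MEMTEST rows that have not written or read anything yet."""
    return [r for r in rows
            if r.program.lower() == "memtest"
            and r.written == 0 and r.read == 0]


# The SHOW_STATUS table quoted earlier in this message.
sample = """\
  ID        Program   Device   Pass   Hard/Soft   Written   Read
  -------- --------- -------- ------ ----------- --------- ------
  00000001      idle system       0      0    0         0      0
  0000004F   memtest memory       1      0    0         0      0
"""

for row in stalled_memtests(parse_show_status(sample)):
    print(f"{row.pid}: no reads/writes; SRM may not be fully initialized")
```

Run against that quoted table, it flags process 0000004F, the MEMTEST
that was reporting a pass without ever writing or reading memory.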

By the way, if you want to see what the "normal" switches are for
running MEMTEST you can look at the MEMORY script.

  >>>cat memory


So the system passed its memory tests and all was well until I rebooted
the system. This put me into AlphaBIOS/ARC for some strange reason. I
didn't think it was a big deal so I did the usual to switch back to SRM:

  F2         (Setup)
  CMOS Setup
  F6         (Advanced)
  Console Selection: "UNIX Console SRM" (or "OPENVMS Console SRM")
  F10        (save)
  F10        (save)
  ESC        (exit)

  power cycle

For some strange reason I ended up in AlphaBIOS/ARC again? This was
weird so I did the steps again, cold booted again, and sure enough, it
_still_ came up in AlphaBIOS/ARC mode?

The reason why the darn thing refused to go into SRM mode is because of
that old NTFS partition on the disk. Once I deleted that partition
through the AlphaBIOS, I could finally reset the "Console Selection" to
SRM and have it work.

Hopefully this information will help the next person trying to figure
out why their memtest isn't working as expected.

Kind Regards,
JCR





--
|   Patches to developers are like lights to moths;
|   "Ooohhh PATCHES! Look at the pretty patches..."
|   You can expect them to just circle for a while and even if
|   they never commit, you'll definitely have their attention.


