Date:      Wed, 23 Oct 2002 13:22:41 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Fred Clift <fclift@verio.net>
Cc:        freebsd-alpha@freebsd.org
Subject:   Re: debugging around machine-checks...
Message-ID:  <15798.56033.844389.549256@grasshopper.cs.duke.edu>
In-Reply-To: <20021023110134.Q98807-100000@vespa.dmz.orem.verio.net>
References:  <20021023110134.Q98807-100000@vespa.dmz.orem.verio.net>

Fred Clift writes:
 > 
 > Ok -- I'm not terribly alpha proficient - in fact, the one alpha that I
 > run is just a home-server - little more than a toy (mp3 server, print
 > server and relatively secure ssh endpoint from the outside world).
 > 
 > Could someone explain exactly what is going on when a machine-check
 > happens?  Is this done by the machine firmware or something?  It seems

Yes.

A machine check is the highest priority interrupt.  It occurs when
something seriously bad happens, like an uncorrectable memory parity
error, or a rogue application or kernel fondling device memory that
does not belong to it.

In fact, on older alphas, that's how we probe the PCI bus.  We tell
the machine check logic to expect a machine check, and to just clear
it, rather than crashing.  We then read from a device which may not
exist.  If we get a machine check, then the device wasn't there.  (See
sys/alpha/alpha/interrupt.c:badaddr_read().)
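
As a rough illustration of the "expect the fault, then poke" pattern
described above, here is a userland analogy in C.  It is not the kernel
code: it uses SIGSEGV/SIGBUS and sigsetjmp() in place of the machine
check logic, and the names probe_read, probe_fault, probe_env and
probe_expected are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;                  /* where to resume after a fault */
static volatile sig_atomic_t probe_expected;  /* nonzero while a probe is armed */

/* Fault handler: recover only if a probe told us to expect the fault. */
static void probe_fault(int sig)
{
    if (probe_expected)
        siglongjmp(probe_env, 1);   /* expected: clear it and carry on */
    signal(sig, SIG_DFL);           /* unexpected: fall back to the default */
    raise(sig);
}

/*
 * Try to read *addr.  Returns 0 and stores the value in *out if the
 * read worked, or 1 if it faulted (analogous to "the device wasn't there").
 */
int probe_read(const volatile unsigned int *addr, unsigned int *out)
{
    struct sigaction sa, old_segv, old_bus;
    int faulted;

    assert(out != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {
        probe_expected = 1;         /* arm: a fault is now "expected" */
        *out = *addr;               /* the read that may fault */
        faulted = 0;
    } else {
        faulted = 1;                /* we got here via siglongjmp() */
    }
    probe_expected = 0;             /* disarm */

    /* Put the previous handlers back so real faults still crash. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

Calling probe_read() on an ordinary variable should return 0, and on
an unmapped address it should return 1 instead of killing the process.
A synchronous userland fault is much easier to catch than a real
machine check, so this only shows the shape of the idea.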

21264s and newer are more forgiving about device memory -- they are
like a PC, and will throw away writes to devices which don't exist,
and return -1 for reads.
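
On hardware that behaves this way, a probe can simply look for the
all-ones pattern.  A minimal sketch in C, assuming the caller has
already read the 32-bit ID register at PCI config offset 0 by some
platform-specific means (pci_slot_present and PCI_VENDOR_INVALID are
names invented here, not kernel identifiers):

```c
#include <stdint.h>

/* An all-ones vendor ID means nothing answered the config read. */
#define PCI_VENDOR_INVALID 0xFFFFu

/*
 * id_reg is the 32-bit value read from config offset 0:
 * vendor ID in the low 16 bits, device ID in the high 16 bits.
 * Returns 1 if a device appears to be present, 0 otherwise.
 */
int pci_slot_present(uint32_t id_reg)
{
    return (id_reg & 0xFFFFu) != PCI_VENDOR_INVALID;
}
```

A read that comes back as 0xFFFFFFFF (that is, -1) is treated as an
empty slot; anything with a real vendor ID in the low half counts as a
device.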

 > that FreeBSD is instantaneously interrupted when a machine check happens
 > and that I don't get crash-dumps.

Hmm.. I haven't used a machine check generating alpha in a while, but
from the code in interrupt.c, it looks like it *should* give you a
crashdump. 

 > Some of you may recall that I've been playing around with XFree86 V4 on
 > this box - it would be exceptionally helpful if I got usable crash-dumps
 > instead of machine checks when things got weird.  As it is, debugging the
 > X server is pretty much impossible (for me) because of this.
 > 
 > What I've done is build all of the X distribution with debugging symbols
 > in and then I start the X server from gdb and put in 10 break points near
 > where I think things will be happening.  Eventually, I get a machine check
 > and if I'm lucky, I remember where the last breakpoint that I hit was so
 > that after a reboot, I can kind of start back in that neighborhood.

Can't you use the program counter from the panic output as a start?
If it's in the X server, there should be a PC from userspace.
(see disclaimer below)

 > X is hard enough to debug by itself without this inconvenience.  It seems
 > that whatever is making it machine-check should be things that could be
 > fixed in the kernel, at which point, my debugging of the X server could
 > then continue.  Then when X dumps core I can just restart X rather than wait
 > for a reboot/fsck.
 > 
 > Am I way off here?  I seem to have read somewhere that there is something
 > you can do to fend off machine-checks so that you can get a proper
 > crash-dump?  What is the mechanism that causes the checks and how bad
 > would it be for the system to do something equivalent to masking these
 > events out (or whatever you'd do to get them to not happen?).
 > 

Look at sys/alpha/alpha/interrupt.c:badaddr_read().

If you're feeling really lucky, you could add code to send the
appropriate signal (SIGBUS?) if the PC is in a userland app.

The problem with this is that machine checks are somewhat
asynchronous, and I'm not sure the PC at the time of the fault
corresponds to the PC that actually caused the fault.
(That's why there are so many memory barriers all over the PCI probing
and badaddr code.)
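
The decision being suggested can be modeled in a few lines of C.  This
is only a sketch of the policy, not the code in interrupt.c: struct
hyp_frame, HYP_PS_USERMODE, mc_decide and the mc_action values are all
hypothetical names, and the caveat above still applies, since the
saved PC may not be the instruction that caused the check.

```c
/* What a machine check handler could choose to do. */
enum mc_action {
    MC_RECOVER,      /* a probe told us to expect this one: clear it */
    MC_SIGNAL_USER,  /* blame the current process (e.g. SIGBUS) */
    MC_PANIC         /* kernel was running: nothing safe to do */
};

/* Hypothetical "interrupted code was in user mode" status bit. */
#define HYP_PS_USERMODE 0x8UL

/* Hypothetical saved state at the time of the machine check. */
struct hyp_frame {
    unsigned long pc;   /* saved program counter (may be imprecise) */
    unsigned long ps;   /* saved processor status */
};

enum mc_action mc_decide(const struct hyp_frame *f, int mc_expected)
{
    if (mc_expected)
        return MC_RECOVER;
    if (f->ps & HYP_PS_USERMODE)
        return MC_SIGNAL_USER;
    return MC_PANIC;
}
```

The expected case is checked first so that bus probing keeps working
no matter which mode the processor was in.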

Good luck.

Drew
