Date:      Tue, 06 Feb 2001 20:59:54 -0800 (PST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Jim Bloom <bloom@acm.org>
Cc:        current@FreeBSD.org
Subject:   Re: Kernel Panic from Yesterday's CVSup
Message-ID:  <XFMail.010206205954.jhb@FreeBSD.org>
In-Reply-To: <3A80B43B.659DCED5@acm.org>

On 07-Feb-01 Jim Bloom wrote:
> John Baldwin wrote:
>> 
>> On 06-Feb-01 Jim Bloom wrote:
>> > Which kernel do you want me to try this with?  I have tried two
>> > different kernels with two different errors.  (Both have been sent at
>> > different times in the past couple days.)  The registers listed here
>> > are from the second kernel (with WITNESS, INVARIANTS, INVARIANT_SUPPORT,
>> > MUTEX_DEBUG).  As such the addresses disagree (sw1b has 8 more bytes for
>> > invariants), but the text segment was correct.
>> 
>> You'll have to turn off WITNESS to get it to die in cpu_switch(), but you'll
>> want to leave the others on for now.
> 
> I turned off WITNESS and still received the mutex error.  A little
> reading of the code showed that mutex assertions are included with ifdef
> INVARIANTS.
> 
>> 
>> > Without debug, I get the trap 9.  With debug, I get a trap 12
>> > immediately followed by a panic with mutex sched lock recursion.
>> >
>> > I rebuilt the kernel without the debugging and checked the state of
>> > things.  The code is correct and the esi register had the expected
>> > value.
>> 
>> Hmmmmmmmm.  Ok, try with debugging minus WITNESS (and you don't want
>> MUTEX_DEBUG, that slows things down a _lot_).  Then see if %esi is
>> still 0x100 instead of 0x20.  If so, then check the instructions to make
>> sure they aren't hosed.
> 
> With INVARIANTS turned off and WITNESS on, I received a trap 27 (stack
> fault) at sw1b+0x77.  The instructions are fine and %esi was 0x20 as it
> should be.  I won't worry about MUTEX_DEBUG being slow just yet.  I am
> only around the start of /etc/rc when the machine dies.

A stack fault on 'ltr'?  Hmm...  leeching from the 386 manual on ltr:

Protected Mode Exceptions

#GP(0) for an illegal memory operand effective address in the CS, DS, ES, FS,
or GS segments; #SS(0) for an illegal address in the SS segment; #GP(0) if the
current privilege level is not 0; #GP(selector) if the object named by the
source selector is not a TSS or is already busy; #NP(selector) if the TSS is
marked "not present"; #PF(fault-code) for a page fault 
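
As a reading aid (not part of the original message), the trap numbers
quoted in this thread can be lined up with the exception mnemonics in the
manual excerpt above.  The sketch below assumes the numbering used by
FreeBSD/i386 in <machine/trap.h> (9, 12, 26, 27); the table, struct, and
function names are hypothetical, and the numbers should be checked against
the header in your own tree.

```c
#include <stddef.h>

/*
 * Hypothetical lookup table: FreeBSD/i386 trap number -> x86 exception.
 * The numbers are assumed from <machine/trap.h>; verify before relying
 * on them.
 */
struct trap_desc {
    int         num;    /* number printed in the "trap N" message */
    const char *name;   /* FreeBSD constant name */
    const char *x86;    /* mnemonic used in the Intel manual */
};

static const struct trap_desc trap_table[] = {
    {  9, "T_PROTFLT",  "#GP" },   /* general protection fault */
    { 12, "T_PAGEFLT",  "#PF" },   /* page fault */
    { 26, "T_SEGNPFLT", "#NP" },   /* segment not present */
    { 27, "T_STKFLT",   "#SS" },   /* stack fault */
};

/* Return the matching entry, or NULL if the number is not in the table. */
const struct trap_desc *
trap_lookup(int num)
{
    size_t i;

    for (i = 0; i < sizeof(trap_table) / sizeof(trap_table[0]); i++) {
        if (trap_table[i].num == num)
            return (&trap_table[i]);
    }
    return (NULL);
}
```

Under that assumed numbering, the trap 27 reported at sw1b+0x77 lines up
with the #SS(0) case in the list, and the earlier trap 9 and trap 12 with
the #GP and #PF cases.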

Hmm, so our %ss selector must be invalid in the new TSS.  h0h0.  Ok, umm.  I
have no idea how that happened.  AFAIK, that shouldn't be touched once the
system is up and running. :(  So, probably something is trashing it by walking
off a dead pointer or some such. :(
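
For anyone comparing the two %esi values mentioned earlier in the thread
(0x20 expected, 0x100 seen at one point), here is a minimal sketch of how
an x86 segment selector splits into its fields.  It assumes %esi holds the
selector handed to 'ltr', which the thread implies but does not state; the
struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector. */
struct selector {
    unsigned index;   /* bits 15..3: descriptor table index */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1..0: requested privilege level */
};

/* Split a raw selector value into index, table indicator, and RPL. */
struct selector
decode_selector(uint16_t sel)
{
    struct selector s;

    s.rpl = sel & 0x3;
    s.ti = (sel >> 2) & 0x1;
    s.index = sel >> 3;
    return (s);
}
```

Decoded this way, 0x20 is GDT index 4 with RPL 0, while 0x100 is GDT
index 32 with RPL 0, so the two values name different descriptors.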
 
> Do you have any other ideas on things that I can try to diagnose this
> problem?

Right now, no.  I know of one potential problem child with preemption: the
interrupt thread list handling, and I'm overhauling the ithread code right now
so I can try and fix that.

> Jim Bloom
> bloom@acm.org

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
