Date:      Thu, 29 Sep 2005 19:09:24 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Rob Watt <rob@hudson-trading.com>
Cc:        freebsd-hackers@FreeBSD.org, mikep@hudson-trading.com, freebsd-amd64@FreeBSD.org, Jason Carroll <jason@hudson-trading.com>
Subject:   Re: freebsd-5.4-stable panics
Message-ID:  <20050929185538.R61419@fledge.watson.org>
In-Reply-To: <20050928134724.P56436@daemon.mistermishap.net>
References:  <da4a53d805092310237d732554@mail.gmail.com>  <20050925115912.H11229@fledge.watson.org> <20050927140535.G50334@daemon.mistermishap.net> <20050927203128.S61419@fledge.watson.org> <cf6c78405092714227722d534@mail.gmail.com> <20050927222624.R34322@fledge.watson.org> <20050928134724.P56436@daemon.mistermishap.net>

On Wed, 28 Sep 2005, Rob Watt wrote:

> We re-compiled the kernel with 'options KDB_STOP_NMI', and were able to 
> get a much more full analysis of what was happening on the 6-BETA5 
> crash.

Great.

> We crashed in top again, and it does look like we may have hit a 
> kern_proc bug.

This sounds good, or at least, promising.

> in the attached file type3-core.txt you can see that it hits an 
> exception in:
>
> 0xffffffff802b897a is in fill_kinfo_thread
> (/usr/src/sys/kern/kern_proc.c:736).
> 731                     }
> 732
> 733                     kg = td->td_ksegrp;
> 734
> 735                     /* things in the KSE GROUP */
> 736                     kp->ki_estcpu = kg->kg_estcpu;
> 737                     kp->ki_slptime = kg->kg_slptime;
> 738                     kp->ki_pri.pri_user = kg->kg_user_pri;
> 739                     kp->ki_pri.pri_class = kg->kg_pri_class;
> 740
> (kgdb) frame 8
> #8  0xffffffff802b897a in fill_kinfo_thread (td=0xffffff0063311260,
> kp=0xffffffffb62d8510)
>    at /usr/src/sys/kern/kern_proc.c:733
> 733                     kg = td->td_ksegrp;
> (kgdb) p kg->kg_estcpu
> Cannot access memory at address 0x173
> (kgdb) p td->td_ksegrp
> $1 = (struct ksegrp *) 0x0
> (kgdb) p kp->ki_estcpu
> $2 = 0
> (kgdb) p kg
> $4 = (struct ksegrp *) 0x12b
>
> it seems that kg is an invalid pointer.

Could you dump the contents of *td and *td->td_proc for me?  I'm quite 
interested to know what the value of td->td_proc->p_state is, among other 
things.  Could you also generate a dump of the KSE group structures in 
td->td_proc->p_ksegrps and the threads in td->td_proc->p_threads?
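
Roughly, something along these lines from the same kgdb session should do 
it.  This is only a sketch: it assumes both lists are TAILQs and that the 
linkage fields are named td_plist and kg_ksegrp, as I believe they are in 
the 6.x sys/proc.h, so please check the names against your tree.

    (kgdb) frame 8
    (kgdb) p *td
    (kgdb) p *td->td_proc
    (kgdb) p td->td_proc->p_state
    (kgdb) p *td->td_proc->p_ksegrps.tqh_first
    (kgdb) p *td->td_proc->p_threads.tqh_first
    (kgdb) p *td->td_proc->p_threads.tqh_first->td_plist.tqe_next

Following the tqe_next pointers (td_plist for threads, kg_ksegrp for KSE 
groups) until they come back NULL walks the rest of each list.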

Could you tell me if the program named by p->p_comm is linked against a 
threading library?  If it's a custom app, you may already know, and if 
not, you can run ldd on the application to see what it is linked against.
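
For example (the path here is just a placeholder for wherever the binary 
named by p->p_comm lives):

    $ ldd /path/to/the/application

A threaded program would typically show one of libpthread, libthr, or 
libc_r in the list of shared objects.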

Depending on how much time you have available, it might make sense for me 
to grab from you a copy of your source tree, compiled kernel with debug 
symbols, and core dump.

> We have started our tests again without running top.
>
> Hope you have a great vacation.

It was brief but very enjoyable, and quite disconnected :-).

Thanks,

Robert


