Date:      Tue, 21 Apr 2009 23:11:44 +0200
From:      Florian Smeets <flo@kasimir.com>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: US-III crashes on current
Message-ID:  <49EE3690.2010404@kasimir.com>
In-Reply-To: <20090421210332.GD33994@alchemy.franken.de>
References:  <bc4edd860903221730p584dc13s5aff941ae3515b60@mail.gmail.com> <20090325114426.GA74306@alchemy.franken.de> <49CA1BF1.6090507@kasimir.com> <20090420183620.GA25251@alchemy.franken.de> <49ED0917.10402@kasimir.com> <20090421185814.GA33994@alchemy.franken.de> <49EE1B54.50003@kasimir.com> <20090421210332.GD33994@alchemy.franken.de>

On 21.04.09 23:03, Marius Strobl wrote:
> On Tue, Apr 21, 2009 at 09:15:32PM +0200, Florian Smeets wrote:
>> On 21.04.09 20:58, Marius Strobl wrote:
>>> On Tue, Apr 21, 2009 at 01:45:27AM +0200, Florian Smeets wrote:
>>>>
>>>> Yes, i can still reproduce this on every shutdown. Tried with r191337.
>>>> Trace is still the same.
>>>>
>>>
>>> Could you please run gdb(1) on the corresponding kernel.debug
>>> and report the output of the following commands?
>>> l *(0xc034c96c)
>>> l *(callout_lock+0x40)
>>> Change as needed if the addresses differ from the above
>>> backtrace. Hrm, the one you reported to scsi@ actually
>>> is a bit different:
>>>> -- fast data access mmu miss tar=0x1454156000 %o7=0xc040e7a4 --
>>>> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x5c
>>>> callout_lock() at callout_lock+0x50
>>>
>>> In that case please additionally get the output of
>>> l *(_mtx_lock_spin_flags+0x5c)
>>>
>>
>> OK, to get this straight this is the trace I'm talking about.
>>
>> Uptime: 19h19m49s
>> panic: trap: fast data access mmu miss
>> cpuid = 0
>> KDB: enter: panic
>> [thread pid 97473 tid 100179 ]
>> Stopped at      kdb_enter+0x80: ta              %xcc, 1
>> db>  where
>> Tracing pid 97473 tid 100179 td 0xfffff80006dfc370
>> panic() at panic+0x20c
>> trap() at trap+0x4d0
>> -- fast data access mmu miss tar=0x20007e000 %o7=0xc03f70a4 --
>> callout_lock() at callout_lock+0x20
>> untimeout() at untimeout+0xc
>> isp_done() at isp_done+0x140
>> isp_intr() at isp_intr+0x3eb8
>> isp_poll() at isp_poll+0x38
>> xpt_polled_action() at xpt_polled_action+0xc8
>> dashutdown() at dashutdown+0x16c
>> boot() at boot+0x850
>> reboot() at reboot+0x64
>> syscall() at syscall+0x2b4
>> -- syscall (55, FreeBSD ELF64, reboot) %o7=0x1013e4 --
>> userland() at 0x40564948
>> user trace: trap %o7=0x1013e4
>> pc 0x40564948, sp 0x7fdffffe201
>> pc 0x100df0, sp 0x7fdffffe2c1
>> pc 0x40206954, sp 0x7fdffffe381
>> done
>>
>> (gdb) l *(0xc03f70a4)
>> 0xc03f70a4 is in spinlock_exit (/usr/src/sys/sparc64/sparc64/machdep.c:232).
>> 227	spinlock_exit(void)
>> 228	{
>> 229		struct thread *td;
>> 230	
>> 231		td = curthread;
>> 232		critical_exit();
>> 233		td->td_md.md_spinlock_count--;
>> 234		if (td->td_md.md_spinlock_count == 0)
>> 235			wrpr(pil, td->td_md.md_saved_pil, 0);
>> 236	}
>
> Hrm, this suggests that curthread or the per-CPU data went
> missing at that point, which leaves me clueless at the
> moment. Do you see this problem since installing FreeBSD
> on that machine or has it developed later? If the latter,
> can you pinpoint when it started? What kind of access for
> debugging could you provide?
>

Honestly, I don't know for sure. I don't know whether it already existed
with the first US-III patch you sent me. But I am 100% certain that I was
already seeing this when we were debugging the STICK thing, which was
only a few days after I installed the machine (with your initial patch).

I could provide access to a FreeBSD box from which you could telnet to
the RSC card of the machine.

Cheers,
Florian
