Date:      Wed, 25 Mar 2009 12:44:27 +0100
From:      Marius Strobl <marius@alchemy.franken.de>
To:        zenxyzzy <zenxyzzy@gmail.com>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: US-III crashes on current
Message-ID:  <20090325114426.GA74306@alchemy.franken.de>
In-Reply-To: <bc4edd860903221730p584dc13s5aff941ae3515b60@mail.gmail.com>
References:  <bc4edd860903221730p584dc13s5aff941ae3515b60@mail.gmail.com>

On Sun, Mar 22, 2009 at 07:30:28PM -0500, zenxyzzy wrote:
> I've been tinkering with my sunblade 1000 for some time, and have run
> pretty much all the os's available on it
> it's really cool to have those easily swappable fiber channel root disks...
> 
> configuration is pretty phat, with 2x900, 4G, 2x73G FC, 500G sata on a
> shoehorned in internal bay, and 1 scsi dvd-rom
> and 1 ide dvd burner, 2x creator 3d UPA, 1 belkin usb2 card, a promise
> 4 drive ide card, and a cheap sil3512 sata card.
> 
> anyhow, I was tickled pink when 8.0-20090111-SNAP showed up a while
> back, and it runs well, with a zfs root, even.
> some caveats:
> 
> 1) the fans run all the time.

As long as there's no driver to control the fans based on
the temperature, this is what's expected. If you'd like to
try writing such a driver, OpenSolaris contains the source
for a daemon that does this. Both Linux and OpenBSD also
have a driver for this. The latter may or may not be a
viable starting point for a FreeBSD one, depending on
whether it can be untangled from their sensors framework
and other infrastructure which does not and should not
exist in FreeBSD. In any case I'd highly suggest verifying
that it does the same as the OpenSolaris daemon in order
not to risk overheating.
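For illustration only, the core decision such a driver or
daemon has to make could look like the minimal hysteresis
sketch below. The thresholds and function names are
hypothetical and NOT taken from OpenSolaris, Linux or
OpenBSD; a real driver would read the sensors and program
the fan controller through hardware-specific interfaces not
shown here, and its policy should be checked against the
OpenSolaris daemon.

```python
# Hypothetical thresholds, NOT taken from OpenSolaris or any real
# B1000/B2000 specification; a real driver must use vetted values.
FAN_ON_C = 60    # switch to high speed at or above this temperature
FAN_OFF_C = 50   # drop back to low speed at or below this temperature


def next_fan_state(temp_c: float, fan_high: bool) -> bool:
    """Return the new fan state (True = high speed) using hysteresis.

    Between FAN_OFF_C and FAN_ON_C the current state is kept, which
    avoids toggling the fans on every small temperature change.
    """
    if temp_c >= FAN_ON_C:
        return True
    if temp_c <= FAN_OFF_C:
        return False
    return fan_high


def run(readings, fan_high=True):
    """Feed temperature readings through the controller.

    Starting at high speed mirrors today's always-on behavior, so an
    uninitialized controller errs on the safe side.
    """
    states = []
    for t in readings:
        fan_high = next_fan_state(t, fan_high)
        states.append(fan_high)
    return states
```

With readings of 45, 55, 62, 55 and 48 degrees the fans go
low, stay low, go high, stay high and go low again; the
band between the two thresholds is what keeps them from
flapping.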

> 2) halt consistently panic's the machine. quite benign, if you think about it:
> 
> panic: trap: fast data access mmu miss
> cpuid = 0
> KDB: enter: panic
> [thread pid 1402 tid 100148 ]
> Stopped at      kdb_enter+0x80: ta              %xcc, 1
> db> where
> Tracing pid 1402 tid 100148 td 0xfffff8000448a700
> panic() at panic+0x20c
> trap() at trap+0x4d0
> -- fast data access mmu miss tar=0x14543da000 %o7=0xc034c96c --
> callout_lock() at callout_lock+0x40
> untimeout() at untimeout+0xc
> isp_done() at isp_done+0x140
> isp_intr() at isp_intr+0x3eb8
> isp_poll() at isp_poll+0x38
> xpt_polled_action() at xpt_polled_action+0xc8
> dashutdown() at dashutdown+0x16c
> boot() at boot+0x858
> reboot() at reboot+0x64
> syscall() at syscall+0x2e8
> -- syscall (55, FreeBSD ELF64, reboot) %o7=0x1013e4 --
> userland() at 0x4056af08
> user trace: trap %o7=0x1013e4
> pc 0x4056af08, sp 0x7fdffffe261
> pc 0x100df0, sp 0x7fdffffe321
> pc 0x402066f4, sp 0x7fdffffe3e1

IIRC, this was already (correctly) reported to scsi@
recently. I for one haven't had time to investigate it so
far, though.

> 
> 3) no X

X generally works fine with Creator3D cards in pre-USIII
machines, so it shouldn't be that hard to get it working
with B{1,2}000 as well. Due to 1), using these as
workstations currently isn't realistic, so I haven't looked
into this so far.
Currently the bigger problem here probably is that, like
every X.Org update so far, 7.4 has caused severe breakage
for sparc64 which has yet to be fixed.

> 4) no sound

The sound chip integrated in B{1,2}000 should work fine
with snd_audiocs(4).

> 5) annoying lock order reversals.

I haven't seen any sparc64-specific LOR with 8.0-CURRENT
so far; every one I've seen also happens on amd64 and i386
(and there hardly will be sparc64-specific ones), i.e.
they're a general FreeBSD problem.

> 6) under extreme loads (load av == 10) possibly a hang or two.
> 

I've stressed FreeBSD quite a bit on USIII, USIII+ and
USIIIi machines without seeing such hangs, at least not
with the in-tree sources; I'm not using things like SIL
controllers or ZFS though. Prior to r190374, opensolaris.ko,
which zfs.ko in turn depends on, was incorrectly built to
use emulated atomic operations. As zfs.ko already used real
ones, this means that operations weren't necessarily atomic
across opensolaris.ko and zfs.ko, which could lead to all
kinds of funny things. Without detailed information these
hangs could be caused by anything, including hardware bugs,
which USIII+ CPUs are really good at.
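To illustrate why mixing the two is a problem, here is a
minimal Python model of a single bad interleaving. It
assumes the emulated operations are a lock-protected
load/modify/store, while the real ones update memory
indivisibly without ever taking that lock. The function
names are hypothetical; this is not the actual
opensolaris.ko or zfs.ko code.

```python
import threading

# Only the emulated path ever takes this lock (assumption of the model).
emul_lock = threading.Lock()


def native_add(mem, key, delta):
    """Stand-in for a real atomic add: one indivisible update."""
    mem[key] += delta


def emulated_add_interleaved(mem, key, delta, interloper):
    """Emulated add; `interloper` runs between its load and its store."""
    with emul_lock:
        old = mem[key]           # load
        interloper()             # a native add from "the other module"
        mem[key] = old + delta   # store based on the stale value


def demo():
    mem = {"refcnt": 0}
    # Two increments are issued, one through each implementation.
    emulated_add_interleaved(mem, "refcnt", 1,
                             lambda: native_add(mem, "refcnt", 1))
    return mem["refcnt"]         # one increment was lost
```

demo() returns 1 instead of 2: the lock serializes the
emulated operations among themselves, but it does nothing
against an update that bypasses it.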

> so, since I want to contribute some data, I build a kernel from the
> SNAP's source, and it works just as well, even with
> the lock instrumentation removed.
> 
> so, I pull a current source tree and build it. oops. no go.  1100+
> files changed in those 2 months; how to find the culprit?
> 
> it panics long before probing devices, using the generic config file:
> 
> BOOM:
> 
> Hit [Enter] to boot immediately, or any other key for command prompt.
> Booting [/boot/kernel/kernel]...
> jumping to kernel entry at 0xc0080000.
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> Copyright (c) 1992-2009 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.0-CURRENT #0: Sun Mar 22 09:47:54 CDT 2009
>     root@ra.zen-room.org:/usr/src/sys/sparc64/compile/SAFE
> WARNING: WITNESS option enabled, expect reduced performance.
> real memory  = 4294967296 (4096 MB)
> panic: vm_phys_paddr_to_vm_page: paddr 0xfd81a000 is not in any segment
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x80: ta              %xcc, 1
> db> where
> Tracing pid 0 tid 0 td 0xc08ad670
> panic() at panic+0x20c
> vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
> pmap_remove_tte() at pmap_remove_tte+0x80
> pmap_enter_locked() at pmap_enter_locked+0x204
> pmap_enter() at pmap_enter+0x64
> vm_fault() at vm_fault+0x17ac
> vm_fault_wire() at vm_fault_wire+0x3c
> vm_map_wire() at vm_map_wire+0x26c
> kmem_alloc() at kmem_alloc+0x1b4
> vm_ksubmap_init() at vm_ksubmap_init+0x74
> cpu_startup() at cpu_startup+0xc4
> mi_startup() at mi_startup+0x18c
> btext() at btext+0x30
> 
> anybody got any better source than 8.0-20090111-SNAP?  Those 1100 file
> changes look pretty daunting.
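The panic message comes from vm_phys_paddr_to_vm_page(),
which maps a physical address to its struct vm_page by
finding the physical memory segment that contains it. A
simplified Python model of that lookup follows; the segment
bounds and the function name are hypothetical, and the real
kernel code differs in detail.

```python
PAGE_SIZE = 8192  # sparc64 base page size (8 KB)

# Hypothetical physical memory segments as (start, end) byte addresses,
# end exclusive. The values are made up for illustration.
SEGMENTS = [(0x0000_0000, 0x0080_0000), (0x0100_0000, 0xfd80_0000)]


def paddr_to_page_index(pa, segments=SEGMENTS):
    """Find the segment containing `pa` and return
    (segment number, page index within that segment).

    Raises, like the kernel panics, if no segment contains `pa`.
    """
    for i, (start, end) in enumerate(segments):
        if start <= pa < end:
            return i, (pa - start) // PAGE_SIZE
    raise RuntimeError(f"paddr {pa:#x} is not in any segment")
```

In the trace above the lookup is reached via
pmap_remove_tte(), i.e. while tearing down an existing
mapping whose physical address is not covered by any
segment.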

The brute-force way would be to do a binary search. This
doesn't really smell like a new problem, though, but rather
like something you just happen to trigger now; e.g. by
initially loading a larger kernel, then unloading it and
booting one that takes up fewer TLB slots, one can provoke
a similar panic. Unfortunately the information you provided
is rather limited and I can't reproduce this problem with
current sources. Did you (un)load any kernels or modules
prior to this snippet? What is the size of the kernel and
the pre-loaded modules (if any), and do you use any special
kernel or loader options (for ZFS, maybe)? Please also
provide the output when booting this kernel and modules
with a loader built with LOADER_DEBUG defined.
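A binary search over the revisions between the working
snapshot and the failing tree needs only about log2(N)
build-and-boot cycles for N candidate revisions, so even a
few thousand revisions take only a dozen or so boots. A
minimal sketch, assuming a hypothetical is_bad() predicate
that stands in for building, installing and booting a
kernel from a given revision:

```python
def bisect_revisions(revs, is_bad):
    """Return the first revision in `revs` for which is_bad() is True.

    Assumes revs[0] is known good, revs[-1] is known bad, and that the
    problem, once introduced, stays present in all later revisions.
    """
    lo, hi = 0, len(revs) - 1   # invariant: revs[lo] good, revs[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revs[mid]):   # e.g. build, install and boot revs[mid]
            hi = mid
        else:
            lo = mid
    return revs[hi]
```

If the failure is not monotonic (e.g. it depends on what
was loaded before, as suspected here), the search can
converge on an unrelated revision, which is why ruling out
such triggers first matters.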

Marius



