Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 1 Jun 2013 03:25:08 +0000
From:      John <jwd@freebsd.org>
To:        FreeBSD SCSI <freebsd-scsi@freebsd.org>
Cc:        Alexander Motin <mav@freebsd.org>, "Kenneth D. Merry" <ken@freebsd.org>
Subject:   Re: Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
Message-ID:  <20130601032508.GA72781@FreeBSD.org>
In-Reply-To: <20130508193313.GA65921@FreeBSD.org>
References:  <20130422030053.GA23186@FreeBSD.org> <517641C6.7010905@FreeBSD.org> <20130423140237.GA50775@nargothrond.kdm.org> <20130508193313.GA65921@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
----- John's Original Message -----
> ----- Kenneth D. Merry's Original Message -----
> > > On 22.04.2013 06:00, John wrote:
> > > >Hi Folks,
> > > >
> > > >    After updating one of our servers to the latest stable image,
> > > >it appears that commit r246437 appears to be causing it to panic.
> 
> > I agree.  I added the xpt_create_path_unlocked() call to fix a
> > panic with a stack trace just like the one above.  It looks like a problem
> > due to running r246437 exactly.
> 
>    As noted above, the system panics with the latest stable image. We
> simply started backing off commits until we found where the system
> no longer panic'd.
> 
>    Side note: is there a way to disable the new daemon that I haven't
> seen? sysctl? loader config?
> 
>    Apologies for the delay - here is the stable information:
> 
> 'working' tag - has 1 of the external LSI cards disabled (so only 1 path through the shelves).
> 'broken' tag - has both cards enabled and panics in short order.
> 
> Both of these are with a stable build (r249895)
> 
> http://www.freebsd.org/~jwd/r246437/dmesg.working.txt
> http://www.freebsd.org/~jwd/r246437/dmesg.broken.txt

Hi Folks,

   I had a little time to look at this again.. I enabled WITNESS
on a 9-stable system from 5/30/2013. It ran for a short while before
panic'ing. 

ses40: ses0,pass20: Element descriptor: '            '
ses40: ses0,pass20: SAS Expander: 24 Physses40:  phy 0: connector 255 other 255
ses40:  phy 1: connector 255 other 255
ses40:  phy 2: connector 255 other 255
ses40:  phy 3: connector 255 other 255
ses40:  phy 4: connector 255 other 255
ses40:  phy 5: connector 255 other 255
ses40:  phy 6: connector 255 other 255
ses40:  phy 7: connector 255 other 255
ses40:  phy 8: connector 255 other 255
ses40:  phy 9: connector 255 other 255
ses40:  phy 10: connector 255 other 255
ses40:  phy 11: connector 255 other 255
ses40:  phy 12: connector 255 other 255
ses40:  phy 13: connector 255 other 255
ses40:  phy 14: connector 255 other 255
ses40:  phy 15: connector 255 other 255
ses40:  phy 16: connector 255 other 255
ses40:  phy 17: connector 255 other 255
ses40:  phy 18: connector 255 other 255
ses40:  phy 19: connector 255 other 255
ses40:  phy 20: connector 255 other 255
ses40:  phy 21: connector 255 other 255
ses40:  phy 22: connector 255 other 255
ses40:  phy 23: connector 255 other 255
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8cce0161b8) locked @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffa42c96c480
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffa42c96c540
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffa42c96c560
witness_warn() at witness_warn+0x2d2/frame 0xffffffa42c96c6a0
trap_pfault() at trap_pfault+0x6a/frame 0xffffffa42c96c730
trap() at trap+0x42a/frame 0xffffffa42c96c930
calltrap() at calltrap+0x8/frame 0xffffffa42c96c930
--- trap 0xc, rip = 0xffffffff80cc69e8, rsp = 0xffffffa42c96c9f0, rbp = 0xffffffa42c96ca20 ---
memcpy() at memcpy+0x8/frame 0xffffffa42c96ca20
xpt_getattr() at xpt_getattr+0x10f/frame 0xffffffa42c96cb30
pass_add_physpath() at pass_add_physpath+0x6d/frame 0xffffffa42c96cb60
taskqueue_run_locked() at taskqueue_run_locked+0x93/frame 0xffffffa42c96cbc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x3e/frame 0xffffffa42c96cbe0
fork_exit() at fork_exit+0x135/frame 0xffffffa42c96cc30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa42c96cc30
--- trap 0, rip = 0, rsp = 0xffffffa42c96ccf0, rbp = 0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 12
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cc69e8
stack pointer           = 0x28:0xffffffa42c96c9f0
frame pointer           = 0x28:0xffffffa42c96ca20
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (thread taskq)
lock order reversal: (Giant after non-sleepable)
lock order reversal: (Giant after non-sleepable)
 1st 0xffffff8cce0161b8 MPT2SAS lock (MPT2SAS lock) @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
 1st 0xffffff8cce0161b8 MPT2SAS lock (MPT2SAS lock) @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
 2nd 0xffffffff813d1760 Giant (Giant) @ /usr/src.2013-05-30_18.01.06/sys/dev/usb/input/ukbd.c:1942
 2nd 0xffffffff813d1760 Giant (Giant) @ /usr/src.2013-05-30_18.01.06/sys/dev/usb/input/ukbd.c:1942
KDB: stack backtrace:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffa42c96c3b0
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffa42c96c470
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffa42c96c490
witness_checkorder() at witness_checkorder+0x875/frame 0xffffffa42c96c550
_mtx_lock_flags() at _mtx_lock_flags+0x80/frame 0xffffffa42c96c580
ukbd_poll() at ukbd_poll+0x27/frame 0xffffffa42c96c5a0
kbdmux_poll() at kbdmux_poll+0x3f/frame 0xffffffa42c96c5c0
cngrab() at cngrab+0x2f/frame 0xffffffa42c96c5e0
kdb_trap() at kdb_trap+0xc6/frame 0xffffffa42c96c640
trap_fatal() at trap_fatal+0x29d/frame 0xffffffa42c96c6a0
trap_pfault() at trap_pfault+0x7a/frame 0xffffffa42c96c730
trap() at trap+0x42a/frame 0xffffffa42c96c930
calltrap() at calltrap+0x8/frame 0xffffffa42c96c930
--- trap 0xc, rip = 0xffffffff80cc69e8, rsp = 0xffffffa42c96c9f0, rbp = 0xffffffa42c96ca20 ---
memcpy() at memcpy+0x8/frame 0xffffffa42c96ca20
xpt_getattr() at xpt_getattr+0x10f/frame 0xffffffa42c96cb30
pass_add_physpath() at pass_add_physpath+0x6d/frame 0xffffffa42c96cb60
taskqueue_run_locked() at taskqueue_run_locked+0x93/frame 0xffffffa42c96cbc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x3e/frame 0xffffffa42c96cbe0
fork_exit() at fork_exit+0x135/frame 0xffffffa42c96cc30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa42c96cc30
--- trap 0, rip = 0, rsp = 0xffffffa42c96ccf0, rbp = 0 ---
[ thread pid 0 tid 100034 ]
Stopped at      memcpy+0x8:     repe movsq      (%rsi),%es:(%rdi)
db> bt
Tracing pid 0 tid 100034 td 0xfffffe002c98c920
memcpy() at memcpy+0x8/frame 0xffffffa42c96ca20
xpt_getattr() at xpt_getattr+0x10f/frame 0xffffffa42c96cb30
pass_add_physpath() at pass_add_physpath+0x6d/frame 0xffffffa42c96cb60
taskqueue_run_locked() at taskqueue_run_locked+0x93/frame 0xffffffa42c96cbc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x3e/frame 0xffffffa42c96cbe0
fork_exit() at fork_exit+0x135/frame 0xffffffa42c96cc30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa42c96cc30
--- trap 0, rip = 0, rsp = 0xffffffa42c96ccf0, rbp = 0 ---
db>

Bits and pieces... 48 enclosure daemons running:

Tracing command enc_daemon47 pid 64 tid 100160 td 0xfffffe009dd28490
sched_switch() at sched_switch+0x194/frame 0xffffffa70faa4a40
mi_switch() at mi_switch+0x208/frame 0xffffffa70faa4a90
sleepq_switch() at sleepq_switch+0xfc/frame 0xffffffa70faa4ac0
sleepq_wait() at sleepq_wait+0x4d/frame 0xffffffa70faa4af0
_sleep() at _sleep+0x3d4/frame 0xffffffa70faa4b80
enc_daemon() at enc_daemon+0xde/frame 0xffffffa70faa4be0
fork_exit() at fork_exit+0x135/frame 0xffffffa70faa4c30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa70faa4c30
--- trap 0, rip = 0, rsp = 0xffffffa70faa4cf0, rbp = 0 ---

And it looks like there are some reversals within the daemon:

acquiring duplicate lock of same type: "MPT2SAS lock"
 1st MPT2SAS lock @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
 2nd MPT2SAS lock @ /usr/src.2013-05-30_18.01.06/sys/cam/scsi/scsi_xpt.c:2529
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffa70fa4a010
ses27:  phy 0: SAS device type 1 id 1
ses27:  phy 0: protocols: Initiator( None ) Target( SSP )
ses27:  phy 0: parent 50014380019fa87d addr 5000c500345510be
ses26:  phy 0: SAS device type 1 id 1
ses26:  phy 0: protocols: Initiator( None ) Target( SSP )
ses26:  phy 0: parent 50014380019f20bd addr 5000c50034642cb2
ses28:  phy 0: parent 50014380019f787d addr 5000c50034635216
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffa70fa4a0d0
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffa70fa4a0f0
witness_checkorder() at witness_checkorder+0x95e/frame 0xffffffa70fa4a1b0
_mtx_lock_flags() at _mtx_lock_flags+0x80/frame 0xffffffa70fa4a1e0
scsi_action() at scsi_action+0x1d1/frame 0xffffffa70fa4a200
ses_setphyspath_callback() at ses_setphyspath_callback+0x177/frame 0xffffffa70fa4a310
ses_path_iter_devid_callback() at ses_path_iter_devid_callback+0x1c6/frame 0xffffffa70fa4a8b0
ses_devids_iter() at ses_devids_iter+0xb1/frame 0xffffffa70fa4a930
ses_paths_iter() at ses_paths_iter+0x20/frame 0xffffffa70fa4a950
ses_publish_physpaths() at ses_publish_physpaths+0x263/frame 0xffffffa70fa4ab80
enc_daemon() at enc_daemon+0x2a4/frame 0xffffffa70fa4abe0
fork_exit() at fork_exit+0x135/frame 0xffffffa70fa4ac30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa70fa4ac30
--- trap 0, rip = 0, rsp = 0xffffffa70fa4acf0, rbp = 0 ---

Cheers,
John





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130601032508.GA72781>