From owner-freebsd-stable@FreeBSD.ORG  Sun Jul 21 13:45:48 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 821)
 id 78053A38; Sun, 21 Jul 2013 13:45:48 +0000 (UTC)
Date: Sun, 21 Jul 2013 13:45:48 +0000
From: John <jwd@FreeBSD.org>
To: FreeBSD-Stable <freebsd-stable@freebsd.org>
Subject: Panic: 9.2-PRERELEASE - enc_daemon & usb LOR?
Message-ID: <20130721134548.GA78666@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Folks,

   I'm seeing a panic with the 9.2-PRERELEASE code. The system
will stay up for anywhere from a couple of seconds to a few hours
and then panic.

Fatal trap 12: page fault while in kernel mode
cpuid = 31; apic id = 2f
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d2b018
stack pointer           = 0x28:0xffffffbfd0fea080
frame pointer           = 0x28:0xffffffbfd0fea0b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 25 (enc_daemon7)

and:

db:0:kdb.enter.default>  show pcpu
cpuid        = 31
dynamic pcpu = 0xffffff807f203880
curthread    = 0xfffffe0032f53920: pid 25 "enc_daemon7"
curpcb       = 0xffffffbfd0feabc0
fpcurthread  = none
idlethread   = 0xfffffe002600b920: tid 100034 "idle: cpu31"
curpmap      = 0xffffffff8141b510
tssp         = 0xffffffff81489e98
commontssp   = 0xffffffff81489e98
rsp0         = 0xffffffbfd0feabc0
gs32p        = 0xffffffff81487fd0
ldt          = 0xffffffff81488010
tss          = 0xffffffff81488000


   This looks like a bug I started tracking down a while back with
the new enclosure services (r246437 and later). I built the kernel
with WITNESS and received the following warning and LOR:


Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8003c851b8) locked @ cam/cam_periph.h:192
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffbfd0f3cb20
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffbfd0f3cbe0
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffbfd0f3cc00
witness_warn() at witness_warn+0x2d2/frame 0xffffffbfd0f3cd40
trap_pfault() at trap_pfault+0x6a/frame 0xffffffbfd0f3cdd0
trap() at trap+0x344/frame 0xffffffbfd0f3cfd0
calltrap() at calltrap+0x8/frame 0xffffffbfd0f3cfd0
--- trap 0xc, rip = 0xffffffff80ca8478, rsp = 0xffffffbfd0f3d090, rbp = 0xffffffbfd0f3d0c0 ---
memcpy() at memcpy+0x8/frame 0xffffffbfd0f3d0c0
ses_setphyspath_callback() at ses_setphyspath_callback+0xb3/frame 0xffffffbfd0f3d1d0
ses_path_iter_devid_callback() at ses_path_iter_devid_callback+0x1c6/frame 0xffffffbfd0f3d770
ses_devids_iter() at ses_devids_iter+0xb1/frame 0xffffffbfd0f3d7f0
ses_paths_iter() at ses_paths_iter+0x20/frame 0xffffffbfd0f3d810
ses_publish_physpaths() at ses_publish_physpaths+0x264/frame 0xffffffbfd0f3da40
enc_daemon() at enc_daemon+0x2a4/frame 0xffffffbfd0f3daa0
fork_exit() at fork_exit+0x11d/frame 0xffffffbfd0f3daf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffbfd0f3daf0
--- trap 0, rip = 0, rsp = 0xffffffbfd0f3dbb0, rbp = 0 ---


Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 08
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ca8478
stack pointer           = 0x28:0xffffffbfd0f3d090
frame pointer           = 0x28:0xffffffbfd0f3d0c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 30 (enc_daemon12)
lock order reversal: (Giant after non-sleepable)
 1st 0xffffff8003c851b8 MPT2SAS lock (MPT2SAS lock) @ cam/cam_periph.h:192
 2nd 0xffffffff8139bc80 Giant (Giant) @ dev/usb/input/ukbd.c:1942

 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

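I'm wondering if there is a bad interaction here between enc_daemon
(holding the MPT2SAS lock) and the USB keyboard driver (ukbd), which
takes Giant.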


The system has 8 DS2700 shelves dual-attached to a pair
of LSI 8e cards, hence the increased msgbuf size in the
kernel configuration.

Kernel conf:

include   GENERIC
ident     ZFS
options   DDB
options   KDB
options   WITNESS
options   MSGBUF_SIZE=(32768*16)

And some ddb output:

db:0:kdb.enter.default>  run lockinfo
db:1:lockinfo> show locks
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8003c851b8) locked @ cam/cam_periph.h:192
db:1:locks>  show alllocks
Process 30 (enc_daemon12) thread 0xfffffe003421a000 (100155)
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8003c851b8) locked @ cam/cam_periph.h:192
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.default>  show pcpu
cpuid        = 8
dynamic pcpu = 0xffffff807f1e4800
curthread    = 0xfffffe003421a000: pid 30 "enc_daemon12"
curpcb       = 0xffffffbfd0f3dbc0
fpcurthread  = none
idlethread   = 0xfffffe0021ffe490: tid 100011 "idle: cpu8"
curpmap      = 0xffffffff81399590
tssp         = 0xffffffff815a5640
commontssp   = 0xffffffff815a5640
rsp0         = 0xffffffbfd0f3dbc0
gs32p        = 0xffffffff815a3778
ldt          = 0xffffffff815a37b8
tss          = 0xffffffff815a37a8
spin locks held:
db:0:kdb.enter.default>  bt
Tracing pid 30 tid 100155 td 0xfffffe003421a000
memcpy() at memcpy+0x8/frame 0xffffffbfd0f3d0c0
ses_setphyspath_callback() at ses_setphyspath_callback+0xb3/frame 0xffffffbfd0f3d1d0
ses_path_iter_devid_callback() at ses_path_iter_devid_callback+0x1c6/frame 0xffffffbfd0f3d770
ses_devids_iter() at ses_devids_iter+0xb1/frame 0xffffffbfd0f3d7f0
ses_paths_iter() at ses_paths_iter+0x20/frame 0xffffffbfd0f3d810
ses_publish_physpaths() at ses_publish_physpaths+0x264/frame 0xffffffbfd0f3da40
enc_daemon() at enc_daemon+0x2a4/frame 0xffffffbfd0f3daa0
fork_exit() at fork_exit+0x11d/frame 0xffffffbfd0f3daf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffbfd0f3daf0
--- trap 0, rip = 0, rsp = 0xffffffbfd0f3dbb0, rbp = 0 ---

   Any thoughts/ideas are appreciated. I've reviewed the code and
don't see anything obvious.
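
To illustrate what the trace suggests: a read fault at address 0x0
inside memcpy() is consistent with a NULL source pointer being passed
down from ses_setphyspath_callback(). Below is a minimal user-space
sketch of a defensive check for that failure mode. The helper
copy_physpath_checked() and its parameters are hypothetical and are not
taken from the ses code; this is only an assumption about the kind of
guard that might be missing, not a proposed patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): copy a physical-path
 * buffer only when both pointers are valid and the destination is large
 * enough to hold it.
 */
static int
copy_physpath_checked(char *dst, size_t dst_len, const char *src,
    size_t src_len)
{
	/* An unchecked memcpy() from a NULL src would fault reading 0x0. */
	if (dst == NULL || src == NULL)
		return (EINVAL);
	/* Refuse to overrun the destination buffer. */
	if (src_len > dst_len)
		return (ENOSPC);
	memcpy(dst, src, src_len);
	return (0);
}
```

A caller would check the return value and skip publishing the physical
path for that element instead of dereferencing the missing data.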

Thanks,
John