Date: Sun, 21 Jul 2013 13:45:48 +0000
From: John
To: FreeBSD-Stable
Subject: Panic: 9.2-PRERELEASE - enc_daemon & usb LOR?
Message-ID: <20130721134548.GA78666@FreeBSD.org>

Hi Folks,

I'm seeing a panic with the 9.2-PRERELEASE code. The system will stay
up for anywhere from a couple of seconds to a few hours and then panic.

Fatal trap 12: page fault while in kernel mode
cpuid = 31; apic id = 2f
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d2b018
stack pointer           = 0x28:0xffffffbfd0fea080
frame pointer           = 0x28:0xffffffbfd0fea0b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 25 (enc_daemon7)

and:

db:0:kdb.enter.default> show pcpu
cpuid        = 31
dynamic pcpu = 0xffffff807f203880
curthread    = 0xfffffe0032f53920: pid 25 "enc_daemon7"
curpcb       = 0xffffffbfd0feabc0
fpcurthread  = none
idlethread   = 0xfffffe002600b920: tid 100034 "idle: cpu31"
curpmap      = 0xffffffff8141b510
tssp         = 0xffffffff81489e98
commontssp   = 0xffffffff81489e98
rsp0         = 0xffffffbfd0feabc0
gs32p        = 0xffffffff81487fd0
ldt          = 0xffffffff81488010
tss          = 0xffffffff81488000

This looks like a bug I started tracing down a while back with the
new enclosure services (r246437 and later).
I added WITNESS to the kernel and received the following LOR:

Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8003c851b8) locked @ cam/cam_periph.h:192
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffbfd0f3cb20
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffbfd0f3cbe0
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffbfd0f3cc00
witness_warn() at witness_warn+0x2d2/frame 0xffffffbfd0f3cd40
trap_pfault() at trap_pfault+0x6a/frame 0xffffffbfd0f3cdd0
trap() at trap+0x344/frame 0xffffffbfd0f3cfd0
calltrap() at calltrap+0x8/frame 0xffffffbfd0f3cfd0
--- trap 0xc, rip = 0xffffffff80ca8478, rsp = 0xffffffbfd0f3d090, rbp = 0xffffffbfd0f3d0c0 ---
memcpy() at memcpy+0x8/frame 0xffffffbfd0f3d0c0
ses_setphyspath_callback() at ses_setphyspath_callback+0xb3/frame 0xffffffbfd0f3d1d0
ses_path_iter_devid_callback() at ses_path_iter_devid_callback+0x1c6/frame 0xffffffbfd0f3d770
ses_devids_iter() at ses_devids_iter+0xb1/frame 0xffffffbfd0f3d7f0
ses_paths_iter() at ses_paths_iter+0x20/frame 0xffffffbfd0f3d810
ses_publish_physpaths() at ses_publish_physpaths+0x264/frame 0xffffffbfd0f3da40
enc_daemon() at enc_daemon+0x2a4/frame 0xffffffbfd0f3daa0
fork_exit() at fork_exit+0x11d/frame 0xffffffbfd0f3daf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffbfd0f3daf0
--- trap 0, rip = 0, rsp = 0xffffffbfd0f3dbb0, rbp = 0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 08
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ca8478
stack pointer           = 0x28:0xffffffbfd0f3d090
frame pointer           = 0x28:0xffffffbfd0f3d0c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 30 (enc_daemon12)

lock order reversal: (Giant after non-sleepable)
 1st 0xffffff8003c851b8 MPT2SAS lock (MPT2SAS lock) @ cam/cam_periph.h:192
 2nd 0xffffffff8139bc80 Giant (Giant) @ dev/usb/input/ukbd.c:1942
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm wondering if there is a bad interaction here.

The system has 8 DS2700 shelves dual-attached to a pair of LSI 8e
cards, hence the kernel configuration with an increased msgbuf size.

Kernel conf:

include GENERIC
ident   ZFS

options DDB
options KDB
options WITNESS
options MSGBUF_SIZE=(32768*16)

And some ddb output:

db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8003c851b8) locked @ cam/cam_periph.h:192
db:1:locks> show alllocks
Process 30 (enc_daemon12) thread 0xfffffe003421a000 (100155)
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8003c851b8) locked @ cam/cam_periph.h:192
db:1:alllocks> show lockedvnods
Locked vnodes

db:0:kdb.enter.default> show pcpu
cpuid        = 8
dynamic pcpu = 0xffffff807f1e4800
curthread    = 0xfffffe003421a000: pid 30 "enc_daemon12"
curpcb       = 0xffffffbfd0f3dbc0
fpcurthread  = none
idlethread   = 0xfffffe0021ffe490: tid 100011 "idle: cpu8"
curpmap      = 0xffffffff81399590
tssp         = 0xffffffff815a5640
commontssp   = 0xffffffff815a5640
rsp0         = 0xffffffbfd0f3dbc0
gs32p        = 0xffffffff815a3778
ldt          = 0xffffffff815a37b8
tss          = 0xffffffff815a37a8
spin locks held:

db:0:kdb.enter.default> bt
Tracing pid 30 tid 100155 td 0xfffffe003421a000
memcpy() at memcpy+0x8/frame 0xffffffbfd0f3d0c0
ses_setphyspath_callback() at ses_setphyspath_callback+0xb3/frame 0xffffffbfd0f3d1d0
ses_path_iter_devid_callback() at ses_path_iter_devid_callback+0x1c6/frame 0xffffffbfd0f3d770
ses_devids_iter() at ses_devids_iter+0xb1/frame 0xffffffbfd0f3d7f0
ses_paths_iter() at ses_paths_iter+0x20/frame 0xffffffbfd0f3d810
ses_publish_physpaths() at ses_publish_physpaths+0x264/frame 0xffffffbfd0f3da40
enc_daemon() at enc_daemon+0x2a4/frame 0xffffffbfd0f3daa0
fork_exit() at fork_exit+0x11d/frame 0xffffffbfd0f3daf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffbfd0f3daf0
--- trap 0, rip = 0, rsp = 0xffffffbfd0f3dbb0, rbp = 0 ---

Any thoughts/ideas are appreciated. I've reviewed the code and don't
see anything obvious.

Thanks,
John