From: Thomas Backman
To: FreeBSD current
Date: Sat, 20 Jun 2009 09:11:34 +0200
Subject: "New" ZFS crash on FS (pool?) unmount/export
Message-Id: <72163521-40BF-4764-8B74-5446A88DFBF8@exscape.org>

I just ran into this tonight.
Not sure exactly what triggered it - the box stopped responding to pings at 02:07 AM and it has a cron backup job using zfs send/recv at 02:00, so I'm guessing it's related, even though the backup probably should have finished before then... Hmm. Anyway. r194478.

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805a4989
stack pointer           = 0x28:0xffffff803e8b57e0
frame pointer           = 0x28:0xffffff803e8b5840
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 57514 (zpool)
panic: from debugger
cpuid = 0
Uptime: 10h22m13s
Physical memory: 2027 MB

(kgdb) bt
#0  doadump () at pcpu.h:223
#1  0xffffffff8059c409 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:575
#3  0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801f19d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801f3969 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at /usr/src/sys/amd64/amd64/trap.c:345
#10 0xffffffff8086e007 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50, tid=18446742975830720512, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_sx.c:575
#12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.
) at sx.h:155
#13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
#14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38, a=0xffffff0043557d50) at vnode_if.c:1926
#15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830
#16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0, flags=0, td=0xffffff0061528000) at /usr/src/sys/kern/vfs_subr.c:2450
#17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko
#18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000, flags=1626513408, td=Variable "td" is not available.
) at /usr/src/sys/kern/vfs_mount.c:1287
#19 0xffffffff80624975 in unmount (td=0xffffff0061528000, uap=0xffffff803e8b5c00) at /usr/src/sys/kern/vfs_mount.c:1172
#20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at /usr/src/sys/amd64/amd64/trap.c:984
#21 0xffffffff8086e290 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
#22 0x000000080104e49c in ?? ()
Previous frame inner to this frame (corrupt stack?)

BTW, I got exactly one "force unmount is experimental" message on the console. On a regular shutdown I usually get one per filesystem, it seems (at least 10), and this pool should contain exactly as many filesystems as the root pool, since it's a copy of it. When I ran the backup script manually after the crash, though, I didn't get any.

Also worth noting is that I was running DTrace all night to test its stability.
I'm pretty sure the script was

dtrace -n 'syscall::open:entry { @a[copyinstr(arg0)] = count(); }'

No swap was in use, and 277700 pages (~1084 MB, or roughly 50%) of RAM were free, according to the core.txt.

Regards,
Thomas
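
P.S. A quick check of the free-memory figure quoted from core.txt, as a minimal sketch in Python. It assumes 4 KiB pages (the usual amd64 base page size; the page size is not stated anywhere in this message) and takes the "Physical memory: 2027 MB" line from the panic output as the total. The names PAGE_SIZE_BYTES and pages_to_mib are made up for illustration.

```python
# Assumption: 4 KiB base pages, the usual amd64 default.
# The page size is not given in the message itself.
PAGE_SIZE_BYTES = 4096


def pages_to_mib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to mebibytes (hypothetical helper)."""
    return pages * page_size / (1024 * 1024)


free_pages = 277700   # free page count quoted from core.txt
physical_mib = 2027   # "Physical memory: 2027 MB" from the panic output

free_mib = pages_to_mib(free_pages)
free_fraction = free_mib / physical_mib

print(f"free: {free_mib:.1f} MB ({free_fraction:.1%} of physical memory)")
```

Under those assumptions the result is about 1084.8 MB, a little over half of physical memory, which agrees with the rounded figures above.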