From owner-freebsd-current@FreeBSD.ORG Wed Jun 24 19:23:30 2009
Message-Id: <46A940DF-5E26-4B46-A2DC-1B70E94D1B39@exscape.org>
From: Thomas Backman <serenity@exscape.org>
To: Kip Macy
Cc: FreeBSD current <freebsd-current@freebsd.org>
In-Reply-To: <3c1674c90906200123l75b4eb29y57d5b48f89c19a5b@mail.gmail.com>
References: <72163521-40BF-4764-8B74-5446A88DFBF8@exscape.org>
 <3c1674c90906200123l75b4eb29y57d5b48f89c19a5b@mail.gmail.com>
Date: Wed, 24 Jun 2009 21:23:10 +0200
Subject: Re: "New" ZFS crash on FS (pool?) unmount/export
List-Id: Discussions about the use of FreeBSD-current

> On Sat, Jun 20, 2009 at 12:11 AM, Thomas Backman wrote:
>> I just ran into this tonight. Not sure exactly what triggered it - the box
>> stopped responding to pings at 02:07 AM, and it has a cron backup job using
>> zfs send/recv at 02:00, so I'm guessing it's related, even though the backup
>> probably should have finished before then... Hmm. Anyway.
>>
>> r194478.
>>
>> kernel trap 12 with interrupts disabled
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0x288
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff805a4989
>> stack pointer           = 0x28:0xffffff803e8b57e0
>> frame pointer           = 0x28:0xffffff803e8b5840
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = resume, IOPL = 0
>> current process         = 57514 (zpool)
>> panic: from debugger
>> cpuid = 0
>> Uptime: 10h22m13s
>> Physical memory: 2027 MB
>>
>> (kgdb) bt
>> #0  doadump () at pcpu.h:223
>> #1  0xffffffff8059c409 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
>> #2  0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:575
>> #3  0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.) at /usr/src/sys/ddb/db_command.c:478
>> #4  0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620, cmd_table=Variable "cmd_table" is not available.) at /usr/src/sys/ddb/db_command.c:445
>> #5  0xffffffff801f19d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
>> #6  0xffffffff801f3969 in db_trap (type=Variable "type" is not available.
>> ) at /usr/src/sys/ddb/db_main.c:229
>> #7  0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730) at /usr/src/sys/kern/subr_kdb.c:534
>> #8  0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable "eva" is not available.) at /usr/src/sys/amd64/amd64/trap.c:847
>> #9  0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at /usr/src/sys/amd64/amd64/trap.c:345
>> #10 0xffffffff8086e007 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
>> #11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50, tid=18446742975830720512, opts=Variable "opts" is not available.) at /usr/src/sys/kern/kern_sx.c:575
>> #12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.) at sx.h:155
>> #13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
>> #14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38, a=0xffffff0043557d50) at vnode_if.c:1926
>> #15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830
>> #16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0, flags=0, td=0xffffff0061528000) at /usr/src/sys/kern/vfs_subr.c:2450
>> #17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko
>> #18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000, flags=1626513408, td=Variable "td" is not available.) at /usr/src/sys/kern/vfs_mount.c:1287
>> #19 0xffffffff80624975 in unmount (td=0xffffff0061528000, uap=0xffffff803e8b5c00) at /usr/src/sys/kern/vfs_mount.c:1172
>> #20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at /usr/src/sys/amd64/amd64/trap.c:984
>> #21 0xffffffff8086e290 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
>> #22 0x000000080104e49c in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>>
>> BTW, I got a (one) "force unmount is experimental" message on the console.
>> On regular shutdown I usually get one per filesystem, it seems (at least
>> 10), and this pool should contain exactly as many filesystems as the root
>> pool, since it's a copy of it. On running the backup script manually
>> post-crash, though, I didn't get any.
>>
>> Also worth noting is that I was running DTrace all night to test its
>> stability. I'm pretty sure the script was
>> dtrace -n 'syscall::open:entry { @a[copyinstr(arg0)] = count(); }'
>>
>> 0 swap was used and 277700 pages (~1084 MB, or 50% of RAM) were free,
>> according to the core.txt.
>>
>> Regards,
>> Thomas

On Jun 20, 2009, at 10:23 AM, Kip Macy wrote:
> A force unmount causes all vnodes' dirty data to be flushed and then
> the vnodes to be released. A regular unmount will not complete if
> there are open files referencing it (hence the use of force unmount on
> shutdown). Pawel had previously disabled forced unmount because of
> known "issues". I chose to enable it because it guarantees that dirty
> data is flushed to disk before shutdown. It looks like we may be
> reclaiming a vnode that has already been freed. I haven't seen this
> issue myself, so if you can identify more specifics on how to
> reproduce it, that would greatly increase the likelihood of my being
> able to fix it in the near future.
>
> Thanks,
> Kip

(First off, sorry if the quoting is screwed up.)

I got the same panic again: same backtrace, and at the exact same point in
the backup script (and with a force unmount message as well, which, as I
said before, I've never gotten during backups before, only during shutdown).

[...]
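The backup script itself isn't included in the thread; as a rough sketch only, an incremental zfs send/recv replication followed by a pool export - the kind of job described above, and the sequence whose "Exporting pool" step precedes the panic - might look like the following. Pool names, snapshot labels, and flags here are assumptions, not the poster's actual script:

```shell
#!/bin/sh
# Hypothetical sketch -- NOT the actual cron job from this thread.
# "tank" (source) and "slave" (backup) match the pool names in the log;
# snapshot labels and option choices are invented for illustration.
SRC=tank
DST=slave
PREV=backup-20090624-0200          # previous snapshot, as seen in the log
NOW=backup-$(date "+%Y%m%d-%H%M")  # e.g. backup-20090624-2054

# Snapshot the entire source pool recursively.
zfs snapshot -r "${SRC}@${NOW}"

# Replicate all datasets and everything since the previous snapshot
# into the backup pool, rolling it back/forcing as needed.
zfs send -R -I "@${PREV}" "${SRC}@${NOW}" | zfs receive -dF "${DST}"

# Export the backup pool -- the unmount/export path where the
# zfs_freebsd_reclaim panic in the backtraces occurs.
zpool export "${DST}"
```

This requires root and an imported ZFS pool, so it is shown only to make the failing sequence (recv into the backup pool, then export/unmount) concrete.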
receiving incremental stream of tank/test/fs15935@backup-20090624-2054 into slave/test/fs15935@backup-20090624-2054
received 312B stream in 1 seconds (312B/sec)
found clone origin slave/usr/src_r194478-UNTOUCHED@r194478-UNTOUCHED
receiving incremental stream of tank/usr/src@backup-20090624-2054 into slave/usr/src@backup-20090624-2054
received 162MB stream in 68 seconds (2.38MB/sec)
receiving incremental stream of tank/usr/src_old@backup-20090624-2054 into slave/usr/src_old@backup-20090624-2054
received 108KB stream in 1 seconds (108KB/sec)
local fs slave/usr/src does not have fromsnap (backup-20090624-0200 in stream); must have been deleted locally; ignoring
Exporting pool
Read from remote host 192.168.1.10: Operation timed out
Connection to 192.168.1.10 closed.

I think it has something to do with the clones. Between this backup and the
previous one, I did:

zfs rename tank/usr/src tank/usr/src_old
zfs clone tank/usr/src_r194478-UNTOUCHED@... tank/usr/src

Then an svn update and build{world,kernel} and install, and, post-install,
I ran a backup and got the panic.

Unread portion of the kernel message buffer:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805ad7d9
stack pointer           = 0x28:0xffffff803e9557e0
frame pointer           = 0x28:0xffffff803e955840
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1278 (zpool)

Locked vnodes (neat, I took a picture with a camera, didn't realize they'd end up here!
Also, no clue if it helps at all):

0xffffff000280f3b0: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    v_object 0xffffff0012a68510 ref 0 pages 0
    lock type zfs: SHARED (count 1)
0xffffff0002ded000: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    lock type zfs: SHARED (count 1)
0xffffff002ec5c000: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 1 mountedhere 0xffffff002ec3b5e0
    flags ()
    lock type zfs: EXCL by thread 0xffffff002ec14ab0 (pid 1278)
0xffffff002ecae588: tag zfs, type VDIR
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOOMED)
    VI_LOCKed
    lock type zfs: EXCL by thread 0xffffff002ec14ab0 (pid 1278)

panic: from debugger
cpuid = 0
Uptime: 8m39s
Physical memory: 2027 MB

...
#10 0xffffffff80879647 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#11 0xffffffff805ad7d9 in _sx_xlock_hard (sx=0xffffff002ea63480, tid=18446742974982343344, opts=Variable "opts" is not available.) at /usr/src/sys/kern/kern_sx.c:575
#12 0xffffffff805ae14e in _sx_xlock (sx=Variable "sx" is not available.
) at sx.h:155
#13 0xffffffff80fe19e5 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
#14 0xffffffff808da87a in VOP_RECLAIM_APV (vop=0xffffff002ea63468, a=0xffffff002ea63480) at vnode_if.c:1926
#15 0xffffffff80630e7e in vgonel (vp=0xffffff002ecae588) at vnode_if.h:830
#16 0xffffffff80635438 in vflush (mp=0xffffff002ec3b5e0, rootrefs=0, flags=0, td=0xffffff002ec14ab0) at /usr/src/sys/kern/vfs_subr.c:2450
#17 0xffffffff80fdc3f8 in zfs_umount () from /boot/kernel/zfs.ko
#18 0xffffffff8062e11a in dounmount (mp=0xffffff002ec3b5e0, flags=784578016, td=Variable "td" is not available.) at /usr/src/sys/kern/vfs_mount.c:1287
#19 0xffffffff8062e885 in unmount (td=0xffffff002ec14ab0, uap=0xffffff803e955c00) at /usr/src/sys/kern/vfs_mount.c:1172
#20 0xffffffff80892edf in syscall (frame=0xffffff803e955c90) at /usr/src/sys/amd64/amd64/trap.c:984
#21 0xffffffff808798d0 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
#22 0x000000080104e06c in ?? ()
Previous frame inner to this frame (corrupt stack?)

Regards,
Thomas