From owner-freebsd-current@FreeBSD.ORG Mon Aug 29 18:31:04 2011
Date: Mon, 29 Aug 2011 20:30:58 +0200
From: Hans Ottevanger
To: freebsd-current@freebsd.org
Cc: freebsd-fs@freebsd.org, Hugo Silva
Subject: Snapshots fail with UFS+J (was: Re: Fwd: Re: Can *you* UFS snapshot a filesystem with 9.0-BETA1?)
Message-ID: <20110829183058.GA57564@testsoekris.hotsoft.nl>
References: <4E4F71B5.3010606@barafranca.com> <20110821100426.GA28260@testsoekris.hotsoft.nl>
In-Reply-To: <20110821100426.GA28260@testsoekris.hotsoft.nl>
List-Id: Discussions about the use of FreeBSD-current

On Sun, Aug 21, 2011 at 12:04:26PM +0200, Hans Ottevanger wrote:
> On Sat, Aug 20, 2011 at 09:35:01AM +0100, Hugo Silva wrote:
> >
> >
> > On Thu, 18 Aug 2011 10:22:31 +0100,
> > Hugo Silva wrote:
> >
> > Hello,
> >
> > > I'm wondering.
> > > On a virtual machine (amd64 HVM+PV), it's crashing
> > > every time. Not sure if this is SNAFU, as I had never used ufs
> > > snapshots on freebsd before.
> > >
> > > After running mksnap_ffs, ssh stops working (a telnet session doesn't
> > > show the sshd banner). The ssh session where the command was run from
> > > stops responding, the webserver dies and xm console'ing from the dom0
> > > works, but the VM is unresponsive (ie no login prompt on ENTER).
> > >
> > > Anyone else seeing the same?
> >
> > I've tried in a FreeBSD guest (9.0-beta1/i386) into VirtualBox and
> > I see a LOR (or looks like a LOR), then the system is freezed.
> > This is 100% reproductible.
> >
> > Unfortunatly, I'm not able to dump a panic or to break into the
> > debugger, so a screenshot :
> > http://user.lamaiziere.net/patrick/public/lormksnap.png
> >
> > You should ask on freebsd-current@
> >
>
> Hi,
>
> I can confirm that this happens on "real iron" too.
>
> I use an i386 test installation (P4 2.4 GHz, 2GB RAM, 500GB PATA disk),
> running 9.0-BETA1 as distributed (with a kernel effectively being GENERIC
> with devices removed that I don't have). When I try to make a snapshot
> using
>
>   cd /usr; mksnap_ffs /usr/.snap/testsnap
>
> the system is still responsive for a few seconds, with lots of disk
> activity, but then it prints the following output on the console (using
> firewire and dcons to ease capturing):
>
> lock order reversal:
>  1st 0xc5a289e8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
>  2nd 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
>  3rd 0xc5663af8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
> KDB: stack backtrace:
> db_trace_self_wrapper(c09ec6ba,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c07099eb,c09efe14,c5035308,c5039408,c4fda440,...) at kdb_backtrace+0x2a
> _witness_debugger(c09efe14,c5663af8,c09df984,c5039408,c0a10ba2,...) at _witness_debugger+0x25
> witness_checkorder(c5663af8,9,c0a10ba2,222,0,...) at witness_checkorder+0x839
> __lockmgr_args(c5663af8,80100,c5663b18,0,0,...) at __lockmgr_args+0x804
> ffs_lock(c4fda568,c0bf1250,c59b9c30,80100,c5663aa0,...) at ffs_lock+0x8a
> VOP_LOCK1_APV(c0a7fb80,c4fda568,c4fda588,c0a8df20,c5663aa0,...) at VOP_LOCK1_APV+0xb5
> _vn_lock(c5663aa0,80100,c0a10ba2,222,c5011e80,...) at _vn_lock+0x5e
> ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x14cb
> ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
> vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
> nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
> syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
> syscall(c4fdad28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
> lock order reversal:
>  1st 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
>  2nd 0xc51a72dc snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
> KDB: stack backtrace:
> db_trace_self_wrapper(c09ec6ba,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c07099eb,c09efdfb,c5035308,c5039b58,c4fda440,...) at kdb_backtrace+0x2a
> _witness_debugger(c09efdfb,c51a72dc,c0a10c04,c5039b58,c0a10ba2,...) at _witness_debugger+0x25
> witness_checkorder(c51a72dc,9,c0a10ba2,332,c5a28a08,...) at witness_checkorder+0x839
> __lockmgr_args(c51a72dc,80400,c5a28a08,0,0,...) at __lockmgr_args+0x804
> ffs_lock(c4fda568,deb2434c,100000,80400,c5a28990,...) at ffs_lock+0x8a
> VOP_LOCK1_APV(c0a7fb80,c4fda568,deb243a8,c0a8df20,c5a28990,...) at VOP_LOCK1_APV+0xb5
> _vn_lock(c5a28990,80400,c0a10ba2,332,0,...) at _vn_lock+0x5e
> ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x295e
> ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
> vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
> nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
> syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
> syscall(c4fdad28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
>
> After this the system is fully unresponsive and requires a hard reset.
>
> Once rebooted, the snapshot file appears to exist, but is unusable.
>
> When reverting to just softupdates, i.e. disabling journaling on /usr,
> everything goes well, except that the same LOR's still do occur, though
> the addresses differ.
>
> My amd64 9.0-CURRENT system, just updated to r225055, has the same issue,
> but since I do not have WITNESS in the kernel config there, the console
> output is missing.
>
> BTW, this issue also makes dump(9) hang the system when the -L option
> is used.
>
> Kind regards,
>
> Hans Ottevanger
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

Since I did not see any response to these messages and I cannot imagine
that Hugo and I are the only ones with this issue, I will follow up on my
own post.

Just yesterday I tried to make a snapshot of the /usr filesystem (about
16 GB) of my amd64 test system (Q6600, 8GB RAM, 500GB SATA disk) running
9.0-BETA1 (r225228), and the problem still occurs.
After these LOR's:

lock order reversal:
 1st 0xfffffe00073ab278 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
 2nd 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xfffffe00073629f8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x807
__lockmgr_args() at __lockmgr_args+0xdc6
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1c27
ffs_mount() at ffs_mount+0xa23
vfs_donmount() at vfs_donmount+0xddc
nmount() at nmount+0x63
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b ---
lock order reversal:
 1st 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 2nd 0xfffffe0007404a30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x807
__lockmgr_args() at __lockmgr_args+0xdc6
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1b02
ffs_mount() at ffs_mount+0xa23
vfs_donmount() at vfs_donmount+0xddc
nmount() at nmount+0x63
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b ---

the system is completely unresponsive after a few seconds and can only be
revived by pushing the reset button.
When making a snapshot of a larger filesystem it takes a bit longer, but
the system eventually locks up as well. Note that this is not the usual
extreme slowdown due to the snapshot taking all the disk bandwidth: the
system locks up tightly and does not recover.

Is anybody else seeing this? Is it a known problem? How should I proceed?

Copied to freebsd-fs@ to elicit more response.

Kind regards,

Hans Ottevanger
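
A note on comparing reports: the lock addresses in the WITNESS output differ between machines and boots, so it can help to reduce each LOR to its lock names and source locations before comparing. The following is only a rough sketch, assuming the console output has been saved as plain text in the format shown in this thread; the function name `lor_signatures` is made up for illustration and is not part of any FreeBSD tool.

```python
import re

# Matches one lock entry of a WITNESS report, for example:
#   1st 0xc5a289e8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
# The address is matched but not captured, so it is ignored when comparing.
LOCK_RE = re.compile(
    r"(1st|2nd|3rd)\s+0x[0-9a-fA-F]+\s+(\S+)\s+\(\S+\)\s+@\s+(\S+):(\d+)"
)


def lor_signatures(console_text):
    """Return one tuple per LOR report: ((lock name, file, line), ...)."""
    reports = []
    # Every report starts with the "lock order reversal:" banner.
    for chunk in console_text.split("lock order reversal:")[1:]:
        locks = tuple(
            (name, path, int(line))
            for _, name, path, line in LOCK_RE.findall(chunk)
        )
        reports.append(locks)
    return reports


if __name__ == "__main__":
    # Second LOR from the i386 and amd64 reports in this thread.
    i386 = """lock order reversal:
     1st 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
     2nd 0xc51a72dc snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
    """
    amd64 = """lock order reversal:
     1st 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
     2nd 0xfffffe0007404a30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
    """
    print(lor_signatures(i386) == lor_signatures(amd64))
```

Because the pattern does not depend on line breaks or on leading "> " quote markers, it should also work on text pasted from a quoted mail.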