Date:      Mon, 22 Aug 2011 12:02:11 +0100
From:      Luke Marsden <luke-lists@hybrid-logic.co.uk>
To:        freebsd-fs@freebsd.org
Subject:   Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Message-ID:  <1314010931.3477.138.camel@pow>
In-Reply-To: <201108221015.p7MAFHpi048670@freefall.freebsd.org>
References:  <201108221015.p7MAFHpi048670@freefall.freebsd.org>

On Mon, 2011-08-22 at 10:15 +0000, mm@FreeBSD.org wrote:
> Synopsis: [zfs] zfs (v28) incremental receive may leave behind temporary clones
> 
> State-Changed-From-To: open->closed
> State-Changed-By: mm
> State-Changed-When: Mon Aug 22 10:15:16 UTC 2011
> State-Changed-Why: 
> Resolved. Thanks!

Brilliant, thanks for fixing this!

Do you have any thoughts on what might have caused the other issue I
reported, the deadlock?  From my email of 15 July
(mfsbsd-se-8.2-zfsv28-amd64 19.06.2011):

The biggest issue was a DEADLOCK which occurs quite reliably when a
given sequence of events happens in short succession on a chroot
filesystem with many snapshots, a MySQL socket, and nullfs mounts
inside it:

     1. Force unmount the nullfs mounts which are mounted on top of it
     2. Close the MySQL socket in /tmp
     3. Force unmount the actual filesystem (even if there are open FDs)
     4. 'zfs rename' the filesystem into our 'trash' filesystem (which I
        understand consists of a clone, promote and destroy)

The entire ZFS subsystem then hangs on any new I/O.
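
For concreteness, here's a rough sh sketch of that sequence as it would
be run by hand; the pool/dataset names (tank/sites/fs1, tank/trash),
mountpoints, and the pidfile path used to stop mysqld are all
hypothetical stand-ins for our actual setup:

    # Hypothetical names throughout -- adjust for your layout.
    umount -f /sites/fs1/data                  # 1. force-unmount nullfs mounts on top
    kill "$(cat /sites/fs1/tmp/mysqld.pid)"    # 2. stop mysqld so its /tmp socket closes
    umount -f /sites/fs1                       # 3. force-unmount the filesystem itself
    zfs rename tank/sites/fs1 tank/trash/fs1   # 4. move it into the 'trash' filesystem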

Here is a procstat of the 'zfs rename' process, which hangs after the
force unmount:

25674 100871 zfs              initial thread   mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85
dsl_sync_task_group_wait+0x128 dsl_sync_task_do+0x54 dsl_dir_rename+0x8f
dsl_dataset_rename+0x272 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b
kern_ioctl+0x102 ioctl+0xfd syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
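(In case it helps anyone trying to reproduce this: that backtrace is
procstat's kernel-stack view.  Assuming 25674 is the PID of the hung
'zfs rename', something like the following captures it:

    procstat -kk 25674

The offsets in the trace above match the -kk output format.)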

Unfortunately it's not easy to reproduce: it only seems to happen in an
environment under load, with a lot of datasets and a lot of ZFS
operations happening concurrently on other datasets.  I spent two days
trying to reproduce it in self-contained test environments but had no
luck, so I'm reporting it anyway.
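
One way to approximate that kind of concurrent load, purely as a sketch
(tank/other is a hypothetical unrelated dataset), is to run snapshot
churn in the background while the sequence above executes:

    # Hypothetical background load: snapshot churn on an unrelated dataset.
    while :; do
        zfs snapshot tank/other@churn
        zfs destroy tank/other@churn
    done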

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +1-415-449-1165 (US) / +447791750420 (UK)



