Date:      Mon, 28 Mar 2016 20:19:14 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>, Maxim Sobolev <sobomax@sippysoft.com>
Cc:        freebsd-fs@FreeBSD.org, Kirk McKusick <mckusick@mckusick.com>, stable@FreeBSD.org, kib@FreeBSD.org
Subject:   Re: Process stuck in "vnread"
Message-ID:  <56F96792.2010800@FreeBSD.org>
In-Reply-To: <20160328162310.GJ1741__41334.1269981631$1459182219$gmane$org@kib.kiev.ua>
References:  <CAH7qZfs3EwT8jnKyodHxF_5nK18MeLSaB_F-qqOfwV0MJMD7Vg@mail.gmail.com> <CAH7qZfssCPxc_uuMoxwAqa6qdi1y=VCqRT6hk-=mTU15RwOCAg@mail.gmail.com> <CAH7qZftHP0b30AnF4Fds9%2BotY0Y24HMFuO=RmkqcBJD3wFNkHg@mail.gmail.com> <20160328162310.GJ1741__41334.1269981631$1459182219$gmane$org@kib.kiev.ua>

On 28/03/2016 19:23, Konstantin Belousov wrote:
> On Mon, Mar 28, 2016 at 08:52:03AM -0700, Maxim Sobolev wrote:
>> Done some head scratching; it looks like it took a page fault in
>> copyin() (cp(1) AFAIK mmaps the source file). There might be some interlock
>> issue between competing writes to the same ZFS: the md0 device is locked
>> forever, waiting for the write operation to complete at the very same time.
>> I am curious as to whether we are allowed to sleep in dmu_write_uio_dbuf().
>> AFAIK the DMU is ZFS's transaction layer, so maybe copyin() should be done
>> earlier to avoid a possible page fault in there?

Maxim,

Is this a copy from UFS to ZFS?
It looks that way, because the copyin() fault goes to
vnode_pager_generic_getpages() -> bwait()...

> No idea about ZFS, but if the issue is due to copyin(9) recursing into
> VM and then VFS while owning file system locks, it is a well-known and
> long-standing issue. I sometimes call it the 'ups deadlock', for some
> reason; see tools/test/upsdl/ for the distilled test case.
> 
> It is handled for UFS and NFS; read the long comment starting with 'The
> vn_io_fault() is a wrapper' in sys/kern/vfs_vnops.c, which describes the
> deadlock in detail and explains the mechanism that is used to prevent
> it. Filesystems must opt in to it by specifying the MNTK_NO_IOPF flag,
> and must then be ready to get an array of pages for I/O instead of the
> buffer KVA.


I don't have any idea why the thread would be stuck in bwait(), or which locks
and threads are involved here.  But, as Kostik said, there is a general problem,
and I have a patch for ZFS:
https://reviews.freebsd.org/D2790
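
To make the general problem concrete, here is a minimal userland sketch of
the "do the copy earlier" idea that Maxim raised.  It is only an analogy: it
is not what D2790 or vn_io_fault() does, and toy_file, toy_write() and
STAGE_MAX are hypothetical names invented for this illustration.  The point
is only that the source buffer (which may fault, e.g. if it is mmap'ed) is
touched before the lock is taken, never while holding it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define STAGE_MAX 4096	/* hypothetical limit on one staged write */

/* Hypothetical stand-in for a file system object guarded by one lock. */
struct toy_file {
	pthread_mutex_t	lock;
	char		data[STAGE_MAX];
	size_t		len;
};

/*
 * Copy from the caller's buffer into a private staging buffer *before*
 * taking the lock.  If src is backed by a file, any page fault (and the
 * I/O needed to service it) happens here, with no lock held.
 * Returns 0 on success, -1 if the request does not fit the staging buffer.
 */
int
toy_write(struct toy_file *f, const void *src, size_t len)
{
	char stage[STAGE_MAX];

	assert(f != NULL && src != NULL);
	if (len > sizeof(stage))
		return (-1);

	memcpy(stage, src, len);	/* may fault; no lock held */

	pthread_mutex_lock(&f->lock);
	memcpy(f->data, stage, len);	/* source was just written by us */
	f->len = len;
	pthread_mutex_unlock(&f->lock);
	return (0);
}
```

A caller would initialize the lock with pthread_mutex_init() and then call
toy_write(&f, buf, n).  The cost of this approach is an extra copy per write,
which is presumably one reason the in-kernel mechanism passes held pages
instead of staging the data.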

-- 
Andriy Gapon


