Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 10 Jan 2012 22:30:06 +0200
From:      Mikolaj Golub <trociny@freebsd.org>
To:        "Robert N. M. Watson" <rwatson@FreeBSD.org>
Cc:        arch@freebsd.org, Kostik Belousov <kib@FreeBSD.org>
Subject:   Re: unix domain sockets on nullfs(5)
Message-ID:  <86fwfnti5t.fsf@kopusha.home.net>
References:  <86sjjobzmn.fsf@kopusha.home.net> <D1B8F00C-1E0D-4916-BD4B-FBCAE28E6F22@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On Tue, 10 Jan 2012 14:02:34 +0000 Robert N. M. Watson wrote:

 RNMW> On 9 Jan 2012, at 16:37, Mikolaj Golub wrote:

 >> kib@ suggested that the issue could be fixed if one added new VOP_*
 >> operations for setting and accessing vnode's v_socket field.

 RNMW> I like the philosophy of the proposed approach signifiantly better than
 RNMW> the previous discussed approaches. Some thoughts:

 RNMW> (1) I don't think the new behaviour should be optional -- it was always
 RNMW> the intent that nullfs pass through all behaviours to the underlying
 RNMW> layer, it's just that certain edge cases didn't appear in the original
 RNMW> implementation. Memory mapping was fixed a few years ago using similar
 RNMW> techniques. This will significantly reduce the complexity of your
 RNMW> patch, and also avoid user confusion since it will now behave "as
 RNMW> expected". Certainly, mention in future release notes would be
 RNMW> appropriate, however.

I don't mind having only the new behavior, as I can't imagine where I would
need a nullfs with nosobypass option mounted and I also like when things are
simple :-).

On the other hand there might be people who relied on the old behavior and who
would be surprised if it had changed.

So, if other people agree I will remove the old behaviour to make the patch
simpler. Another option would be to have sobypass by default with possibility
to (re)mount fs with nosobypass.

 RNMW> (2) I'd like to think (as John also mentioned?) that we could use a
 RNMW> shared vnode lock when doing read-only access (i.e., connect).

Thanks, trying this.

 RNMW> (3) With this patch, an rwlock is held over vnode operations --
 RNMW> required due to the interlocked synchronisation of vnode and unix
 RNMW> domain sockets. We often try to avoid doing this for reasons of lock
 RNMW> order (and principle). It appears that it is likely fine in this case
 RNMW> but it makes me slightly nervous.

Well, I have not noticed any issues but it might be because I don't have much
experience here.

 RNMW> (4) I'm slightly puzzled by the bind(2) case and interactions with
 RNMW> layering -- possibly there is a bug here. If I issue bind(2) against
 RNMW> the top layer, it looks like vp->v_socket will be set in the bottom
 RNMW> layer, but unp->unp_vnode will be assigned to the top-layer vnode? My
 RNMW> assumption was that you would want unp_vnode to always point to the
 RNMW> bottom (real) vnode, which suggest to me that the VOPs shouldn't just
 RNMW> assign v_socket, but should also assign unp_vnode. This has
 RNMW> implications elsewhere in uipc_usrreq.c as well. Could you clarify
 RNMW> whether you think this could be an issue? 

I have made unp_vnode to always point to a vnode returned in bind() on create
intentionally. On bind v_usecount is increased for this vnode and in
uipc_detach() we have to call vrele() for this same vnode.

I believe that in uipc_usrreq.c via unp->unp_vnode we should reference only
the upper vnode, accessing the lower vnode only via VOP_* operations. I don't
see issues with such approach so far, while doing in the suggested way
(unp_vnode to always point to vnode that has socket reference) I don't see how
to decrease usecount for upper vnode on detach (we want to have usecount for
the upper vnode increased on bind, so e.g. umount would fail with EBUSY when
trying to unmount nullfs if somebody bond to an upper path).

 RNMW>It may also be worth KASSERTing that the top-level vnode never points at
 RNMW>anything but NULL to catch bugs like this. This may mean the VOPs have
 RNMW>to have a bit of "test-and-set" to them to get atomicity properties
 RNMW>right when it comes to bind(2).

 RNMW> In general, I think this is the right thing to do, and I'm very pleased
 RNMW> you're doing it -- but the patch requires some further work.

Thanks.

-- 
Mikolaj Golub



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86fwfnti5t.fsf>