Date:      Tue, 10 Jan 2012 08:19:18 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-arch@freebsd.org
Cc:        Mikolaj Golub <trociny@freebsd.org>, arch@freebsd.org, Robert Watson <rwatson@freebsd.org>, Kostik Belousov <kib@freebsd.org>
Subject:   Re: unix domain sockets on nullfs(5)
Message-ID:  <201201100819.18892.jhb@freebsd.org>
In-Reply-To: <86sjjobzmn.fsf@kopusha.home.net>
References:  <86sjjobzmn.fsf@kopusha.home.net>

On Monday, January 09, 2012 11:37:52 am Mikolaj Golub wrote:
> Hi,
> 
> There is a longstanding problem with nullfs(5): unix sockets do not
> work between the lower and upper layers.
> 
> See, e.g., kern/51583 and kern/159663.
> 
> When a unix socket is bound, the created socket is referenced in the
> vnode field v_socket. This field is read on connect (from the vnode
> returned by lookup). Unix socket functions like unp_bind() and
> unp_connect() set and access this field directly.
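To make the mechanism concrete, here is a minimal userspace model of the direct field access described above. It is a sketch, not kernel code: struct vnode and struct socket are hypothetical stand-ins reduced to the one field that matters, and sketch_unp_bind()/sketch_unp_connect() are illustrative names that only mimic what unp_bind() and unp_connect() do with v_socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures;
 * the real struct vnode and struct socket have many more fields. */
struct socket {
	int so_id;
};

struct vnode {
	struct socket *v_socket;	/* set on bind, read on connect */
};

/* Models what unp_bind() does with the vnode created for the path. */
void
sketch_unp_bind(struct vnode *vp, struct socket *so)
{
	assert(vp != NULL);
	vp->v_socket = so;
}

/* Models what unp_connect() does with the vnode returned by lookup:
 * the peer is whatever pointer sits in v_socket (NULL means nothing
 * is bound to this path). */
struct socket *
sketch_unp_connect(const struct vnode *vp)
{
	assert(vp != NULL);
	return (vp->v_socket);
}
```

Because both sides touch the same field of the same vnode, the model only works when bind and connect resolve the path to one and the same vnode.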
> 
> This is a problem for nullfs, which uses a two-layer vnode approach:
> when binding to the upper layer, the socket reference is stored in the
> upper vnode; when binding to the lower fs, it is stored in the lower
> vnode and is not visible from the upper layer.
> 
> For example, with /mnt/upper being a nullfs(5) mount of /mnt/lower:
> 
> 1) if we bind to /mnt/lower/test.sock we can connect only to
> /mnt/lower/test.sock.
> 
> 2) if we bind to /mnt/upper/test.sock we can connect only to
> /mnt/upper/test.sock.
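The two failing cases above can be reproduced in a small userspace model. This is a hypothetical simplification: the v_lower field stands in for the link a nullfs upper vnode keeps to its lower vnode, and direct_bind()/direct_connect() are illustrative names for the pre-patch direct accesses to v_socket, which never follow that link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct socket {
	int so_id;
};

struct vnode {
	struct socket *v_socket;
	struct vnode *v_lower;	/* non-NULL only for a nullfs upper vnode */
};

/* Pre-patch behavior: bind writes this vnode's own field ... */
void
direct_bind(struct vnode *vp, struct socket *so)
{
	vp->v_socket = so;
}

/* ... and connect reads this vnode's own field; v_lower is ignored, so
 * the upper and lower vnodes each see only their own socket pointer. */
struct socket *
direct_connect(const struct vnode *vp)
{
	return (vp->v_socket);
}
```

Binding through one layer and connecting through the other yields NULL, which corresponds to the connection failures in cases 1 and 2.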
> 
> The desired behavior is that one can connect via both the lower and
> the upper paths, regardless of whether we bind to /mnt/lower/test.sock
> or /mnt/upper/test.sock.
> 
> In kern/159663 two approaches were discussed:
> 
> 1) copy the socket pointer from the lower vnode to the upper vnode when
> the upper vnode is obtained (this fixes the case where one binds to the
> lower fs and wants to connect via the upper, but not the case where one
> binds to the upper and wants to connect via the lower fs);
> 
> 2) make null_lookup/create return the lower vnode for VSOCK vnodes.
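The two rejected approaches can be sketched side by side to show why each falls short. Everything here is hypothetical: VT_SOCK stands in for VSOCK, approach1_nodeget() models copying the pointer when the upper vnode is obtained, and approach2_lookup() models a nullfs lookup that returns the lower vnode for sockets. Neither function is taken from kern/159663.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct socket {
	int so_id;
};

enum vtype_sketch { VT_REG, VT_SOCK };	/* VT_SOCK stands in for VSOCK */

struct vnode {
	enum vtype_sketch v_type;
	struct socket *v_socket;
	struct vnode *v_lower;	/* set on nullfs upper vnodes only */
};

/*
 * Approach 1: when the upper vnode for a lower vnode is set up, copy the
 * lower vnode's socket pointer into it.  This is a one-time snapshot in
 * one direction only: a later bind on the upper vnode is never pushed
 * down to the lower vnode.
 */
void
approach1_nodeget(struct vnode *upper, struct vnode *lower)
{
	assert(upper != NULL && lower != NULL);
	upper->v_type = lower->v_type;
	upper->v_lower = lower;
	upper->v_socket = lower->v_socket;
}

/*
 * Approach 2: have the nullfs lookup hand back the lower vnode itself for
 * sockets, so bind and connect always touch the lower v_socket.  The
 * caller then holds a vnode of the lower filesystem even though it looked
 * the path up through nullfs, which sidesteps the layering.
 */
struct vnode *
approach2_lookup(struct vnode *upper)
{
	assert(upper != NULL);
	if (upper->v_type == VT_SOCK && upper->v_lower != NULL)
		return (upper->v_lower);
	return (upper);
}
```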
> 
> Both approaches have issues and look rather hackish.
> 
> kib@ suggested that the issue could be fixed by adding new VOP_*
> operations for setting and accessing the vnode's v_socket field.
> 
> The attached patch implements this. It can also be found here:
> 
> http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch
> 
> It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
> VOP_UNPDETACH. Their purpose can be understood from the modifications
> in uipc_usrreq.c:
> 
> -	vp->v_socket = unp->unp_socket;
> +	VOP_UNPBIND(vp, unp->unp_socket);
> 
> -	so2 = vp->v_socket;
> +	VOP_UNPCONNECT(vp, &so2);
> 
> -	unp->unp_vnode->v_socket = NULL;
> +	VOP_UNPDETACH(unp->unp_vnode);
> 
> The default functions just do these simple operations, while
> filesystems like nullfs can do more complicated things.
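The indirection can be modeled in userspace as a per-filesystem table of function pointers whose defaults perform exactly the three direct accesses shown in the diff. This is a sketch under stated assumptions: struct vop_vector_sketch and the sketch_std* functions are hypothetical names, and the VOP_* macros below only mimic the call shape; in the kernel the real VOP_* wrappers are generated from vnode_if.src.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct socket {
	int so_id;
};

struct vnode;

/* Hypothetical per-filesystem operations table (three entries only). */
struct vop_vector_sketch {
	void (*vop_unpbind)(struct vnode *, struct socket *);
	void (*vop_unpconnect)(struct vnode *, struct socket **);
	void (*vop_unpdetach)(struct vnode *);
};

struct vnode {
	const struct vop_vector_sketch *v_op;
	struct socket *v_socket;
};

/* Defaults: the same field accesses the patch removes from uipc_usrreq.c. */
void
sketch_stdunpbind(struct vnode *vp, struct socket *so)
{
	vp->v_socket = so;
}

void
sketch_stdunpconnect(struct vnode *vp, struct socket **sop)
{
	*sop = vp->v_socket;
}

void
sketch_stdunpdetach(struct vnode *vp)
{
	vp->v_socket = NULL;
}

const struct vop_vector_sketch default_vnodeops_sketch = {
	sketch_stdunpbind, sketch_stdunpconnect, sketch_stdunpdetach
};

/* Callers dispatch through the vnode's table, so a filesystem such as
 * nullfs can install its own implementations. */
#define VOP_UNPBIND(vp, so)	((vp)->v_op->vop_unpbind((vp), (so)))
#define VOP_UNPCONNECT(vp, sop)	((vp)->v_op->vop_unpconnect((vp), (sop)))
#define VOP_UNPDETACH(vp)	((vp)->v_op->vop_unpdetach((vp)))
```

With the default table installed, behavior is unchanged from the direct field accesses; only a filesystem that supplies its own table sees any difference.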
> 
> The patch also implements these operations for nullfs. By default, the
> old behavior is preserved. To get the new behavior, the filesystem
> should be (re)mounted with the sobypass option. The socket operations
> are then bypassed to the lower vnode, which makes the socket accessible
> from both layers.
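The effect of the bypass can be illustrated with a small model in which every unix-socket operation on a nullfs upper vnode is redirected to its lower vnode when a per-mount flag is set. This is a hypothetical sketch, not the patch's nullfs code: struct null_mount_sketch, its sobypass field, unp_target() and the null_unp*_sketch() functions are illustrative names, and for brevity the same functions are also used on lower vnodes, where they reduce to the default behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct socket {
	int so_id;
};

struct null_mount_sketch {
	bool sobypass;	/* mirrors the "sobypass" mount option */
};

struct vnode {
	struct socket *v_socket;
	struct vnode *v_lower;				/* NULL on the lower fs */
	const struct null_mount_sketch *v_nullmp;	/* NULL on the lower fs */
};

/* Choose the vnode whose v_socket the operation should use: the lower
 * vnode when bypassing is enabled, otherwise the vnode itself. */
static struct vnode *
unp_target(struct vnode *vp)
{
	if (vp->v_nullmp != NULL && vp->v_nullmp->sobypass &&
	    vp->v_lower != NULL)
		return (vp->v_lower);
	return (vp);
}

void
null_unpbind_sketch(struct vnode *vp, struct socket *so)
{
	unp_target(vp)->v_socket = so;
}

void
null_unpconnect_sketch(struct vnode *vp, struct socket **sop)
{
	*sop = unp_target(vp)->v_socket;
}

void
null_unpdetach_sketch(struct vnode *vp)
{
	unp_target(vp)->v_socket = NULL;
}
```

Because the socket pointer lives only in the lower vnode when the flag is set, a bind through either path is visible to a connect through the other; without the flag, the upper vnode keeps its own separate field, as before.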
> 
> I am very interested to hear other people's opinions on this.

I think this is a decent solution.  Why not make the locking notes for 
VOP_UNPCONNECT() be "L" instead of "E"?  Shouldn't a read lock be
sufficient to fetch the socket?  In fact, I suspect that unp_connect()
could actually use a shared lock on the vnode by adding 'LOCKSHARED' to
the flags passed to namei() via NDINIT().

-- 
John Baldwin


