Date:      Wed, 26 Mar 2008 14:53:25 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        "Alexander V. Chernikov" <melifaro@ipfw.ru>
Cc:        freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org
Subject:   Re: unionfs status
Message-ID:  <20080326142115.K34007@fledge.watson.org>
In-Reply-To: <47E9448F.1010304@ipfw.ru>
References:  <47E9448F.1010304@ipfw.ru>

On Tue, 25 Mar 2008, Alexander V. Chernikov wrote:

> I have made patches solving the first 4 problems.  These patches are
> available at http://ipfw.ru/patches/: unionfs2.diff fixes fs mounting onto
> the upper layer, unionfs_lmount.diff fixes the lower layer mount,
> unionfs_threads.diff and unionfs_unix.diff fix cases 2) and 3), and
> unionfs_rename.diff fixes the case with renaming.
>
> Can anybody comment/review ?

Dear Alexander,

Unfortunately, I don't know too much about unionfs.  However, I can comment on 
the UNIX domain socket patch:

> --- sys/fs/unionfs/union_subr.c.orig	2008-03-13 23:10:32.000000000 +0300
> +++ sys/fs/unionfs/union_subr.c	2008-03-13 23:17:34.000000000 +0300
> @@ -160,6 +160,8 @@
>  		unp->un_path[cnp->cn_namelen] = '\0';
>  	}
>  	vp->v_type = (uppervp != NULLVP ? uppervp->v_type : lowervp->v_type);
> +	if (vp->v_type == VSOCK)
> +		vp->v_socket = (uppervp != NULLVP) ? uppervp->v_socket : lowervp->v_socket;
>  	if ((lowervp != NULLVP) && (lowervp->v_type == VDIR))
>  		vp->v_mountedhere = lowervp->v_mountedhere;
>  	vp->v_data = unp;

I'm a bit worried about this assignment, as it represents an untracked alias 
for the socket.  Let me explain why:

UNIX domain sockets may have file system bindings, allowing them to use the 
file system namespace as a rendezvous for communication.  The typical use is 
that a socket is created and bind() is called on it with a path in some 
location such as /var/run/log.  Other processes then connect() to the path, 
causing a 
file system lookup to reach the vnode of the socket, and then the socket code 
follows vp->v_socket to find the socket to connect to.  When a bound socket is 
closed, we follow a back-pointer from the UNIX domain socket to the vnode, and 
then clear the pointer.  Doing this in a race-free manner is somewhat tricky, 
and I'm not 100% convinced it's correct currently, although it appears to be 
somewhat close to right.
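
To put the client side of that rendezvous in userland terms: connect()ing to 
the path is what triggers the name lookup that ends at the VSOCK vnode, after 
which the kernel follows vp->v_socket to the bound socket.  A minimal sketch, 
where connect_log_socket() is just an illustrative name and /var/run/log is 
the example path from above:

#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <unistd.h>

/*
 * Client side of the rendezvous: connect() to the path; the kernel looks
 * the path up, finds the VSOCK vnode, and follows vp->v_socket to reach
 * the socket that was bound there.
 */
static int
connect_log_socket(void)
{
	struct sockaddr_un sun;
	int s;

	if ((s = socket(PF_LOCAL, SOCK_DGRAM, 0)) < 0)
		return (-1);
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_LOCAL;
	strlcpy(sun.sun_path, "/var/run/log", sizeof(sun.sun_path));
	if (connect(s, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}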

The upshot of all this is that if you copy the pointer value to other vnodes, 
such as vnodes on the upper layer, the UNIX domain socket code won't clear those 
pointers before freeing the socket they point at.  This means that the above 
code snippet may lead to a v_socket pointer on a higher layer vnode pointing 
at the right socket, the wrong socket, or possibly some other bit of freed and 
maybe reused memory.
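
To make the lifetime problem concrete, here is a deliberately simplified, 
hypothetical sketch of the teardown path; it is not the actual code in 
uipc_usrreq.c, but it captures the relevant point, which is that only the 
single vnode recorded at bind() time gets its v_socket cleared:

/*
 * Hypothetical, simplified sketch of teardown for a bound UNIX domain
 * socket (not the real uipc_usrreq.c code).  The protocol layer remembers
 * exactly one vnode, the one bind() created, so that is the only v_socket
 * pointer it can clear.  A unionfs vnode that copied the pointer is never
 * visited and is left dangling once the socket is freed.
 */
static void
unp_detach_sketch(struct unpcb *unp)
{
	struct vnode *vp;

	vp = unp->unp_vnode;
	if (vp != NULL) {
		unp->unp_vnode = NULL;
		vp->v_socket = NULL;	/* clears only the bound vnode */
		vrele(vp);
	}
	/* ...the unpcb and the socket are then torn down and freed... */
}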

You can imagine a number of schemes to replicate pointer changes around or 
track the various outstanding references, but I think a more fundamental 
question is whether this is in fact the right behavior at all.  The premise of 
unionfs is that writes flow up, but not down, and "connections" to sockets are 
most typically read-write events, not read events.  If you're using unionfs 
to take a template system and "broadcast" it to many jails, you probably don't 
want all the jails talking to the same syslogd; you want them each talking to 
their own.  When syslogd in a jail finds a disconnected socket in /var/run/log, 
which is effectively what a NULL v_socket pointer means, it should unlink it 
and create a new socket, not reuse the existing file on disk.
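
For what it's worth, the usual daemon idiom already matches this: unlink any 
stale rendezvous file and bind a fresh socket at the same path.  A hedged 
sketch, where rebind_log_socket() is a hypothetical name and error handling 
is abbreviated:

#include <sys/socket.h>
#include <sys/un.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

/*
 * Daemon startup idiom: remove any stale socket file left behind by a
 * previous instance, then bind a new socket at the same path, rather than
 * trying to reuse the existing file on disk.
 */
static int
rebind_log_socket(const char *path)
{
	struct sockaddr_un sun;
	int s;

	if ((s = socket(PF_LOCAL, SOCK_DGRAM, 0)) < 0)
		err(1, "socket");
	(void)unlink(path);	/* discard the stale rendezvous file, if any */
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_LOCAL;
	strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
	if (bind(s, (struct sockaddr *)&sun, sizeof(sun)) < 0)
		err(1, "bind");
	return (s);
}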

Robert N M Watson
Computer Laboratory
University of Cambridge


