Date:      Sat, 9 May 2009 01:05:31 +0200
From:      Jilles Tjoelker <jilles@stack.nl>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        "'freebsd-hackers@freebsd.org'" <freebsd-hackers@freebsd.org>, Tim Kientzle <kientzle@freebsd.org>
Subject:   Re: fdescfs brokenness
Message-ID:  <20090508230531.GA8413@stack.nl>
In-Reply-To: <20090508201203.GJ1948@deviant.kiev.zoral.com.ua>
References:  <4A03A202.2050101@freebsd.org> <20090508201203.GJ1948@deviant.kiev.zoral.com.ua>

On Fri, May 08, 2009 at 11:12:03PM +0300, Kostik Belousov wrote:
> On Thu, May 07, 2009 at 08:07:46PM -0700, Tim Kientzle wrote:
> > Colin Percival recently pointed out some issues
> > with tar and fdescfs.  Part of the problem
> > here is tar; I need to rethink some of the
> > traversal logic.

> > But fdescfs is really wonky:

> >  * This is a nit, but:  ls /dev/fd/18 should not
> >    return EBADF; it should return ENOENT, just
> >    like any other reference to a non-existent filename.
> >    (Just because a filename reflects a file descriptor
> >    does not mean it is a file descriptor.)
> This is the traditional behaviour for fdescfs. According to the man page,
> open("/dev/fd/N") shall be equivalent to fcntl(N, F_DUPFD, 0).
> Solaris behaviour is the same.

On open, yes, but stat behaves differently on a Solaris 10 machine here.
A valid but unallocated fd number will still stat as a character
device, like an allocated fd.

% ls -l /dev/fd/0 /dev/fd/999
crw-rw-rw-   1 root     root     320,  0 May  9 00:06 /dev/fd/0
crw-rw-rw-   1 root     root     320, 999 May  9 00:06 /dev/fd/999
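
The same comparison can be made programmatically. The following is a minimal sketch, not part of the original test: `describe` is a hypothetical helper that reports what lstat() returns for a path, or the errno name on failure. No expected output is given for the /dev/fd entries, because the result depends on the system (character device on Solaris, symlink on Linux, EBADF for an unallocated fd on FreeBSD's fdescfs as discussed above).

```python
import errno
import os
import stat


def describe(path):
    """Return a short label for what lstat() reports at path,
    or the errno name (e.g. 'ENOENT', 'EBADF') if lstat() fails."""
    try:
        mode = os.lstat(path).st_mode
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    for label, check in (
        ("character device", stat.S_ISCHR),
        ("symlink", stat.S_ISLNK),
        ("directory", stat.S_ISDIR),
        ("regular file", stat.S_ISREG),
        ("fifo", stat.S_ISFIFO),
        ("socket", stat.S_ISSOCK),
        ("block device", stat.S_ISBLK),
    ):
        if check(mode):
            return label
    return "other"


if __name__ == "__main__":
    # fd 0 is normally allocated; fd 999 normally is not.
    for name in ("/dev/fd/0", "/dev/fd/999"):
        print(name, "->", describe(name))
```

Running it on each system of interest shows whether an unallocated fd number yields an error or a placeholder node.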

By the way, both FreeBSD and Solaris also behave strangely if you try to
access fd numbers 1<<32 or higher.

Linux seems to behave strangely as well: the fds show up as symlinks,
some of which do not contain valid file names but can still be opened.
However, a command like
  { read x <&5; read y </dev/fd/5; read z </dev/fd/5; echo $x $y $z; :; } 5<~/.zshrc
which shows the first three lines under FreeBSD and Solaris,
shows the first line three times under Linux, so apparently it does not
duplicate file descriptors (at least in some cases).
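
The difference can be reproduced without /dev/fd at all. This is a sketch under stated assumptions: `duplicate` models the FreeBSD/Solaris result with dup(), `reopen` models what Linux appears to do with a fresh open() of the same file, and `read_line`, `three_reads` and the temporary file contents are made up for illustration.

```python
import os
import tempfile


def read_line(fd):
    """Read one line a byte at a time, as a shell's `read` does on a
    regular file, so the offset ends exactly after the newline."""
    chunks = []
    while True:
        byte = os.read(fd, 1)
        if byte in (b"", b"\n"):
            break
        chunks.append(byte)
    return b"".join(chunks).decode()


def duplicate(fd, path):
    # Same open file description as fd: the file offset is shared.
    return os.dup(fd)


def reopen(fd, path):
    # New open file description: the offset starts again at 0.
    return os.open(path, os.O_RDONLY)


def three_reads(make_second_fd):
    """Read one line from fd, then one line from each of two
    descriptors obtained via make_second_fd(fd, path)."""
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("one\ntwo\nthree\n")
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    opened = [fd]
    try:
        lines = [read_line(fd)]
        for _ in range(2):
            other = make_second_fd(fd, path)
            opened.append(other)
            lines.append(read_line(other))
        return lines
    finally:
        for f in opened:
            os.close(f)
        os.unlink(path)


if __name__ == "__main__":
    print(three_reads(duplicate))  # ['one', 'two', 'three']
    print(three_reads(reopen))     # ['one', 'one', 'one']
```

The first result matches the FreeBSD and Solaris output of the shell command above, the second matches the Linux output.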

> >  * The fairly routine recursive directory walker
> >    below gets hung in fdescfs.  It appears that
> >    the two opendir() invocations active at the
> >    same time interfere with each other.
> What do you mean by "gets hung"? In my limited testing, it works.
> Opendir creates a new directory in /dev/fd by the mere fact of opening
> the directory. So it walks into that dir, returning to step 1.

> >  * A similar chdir()-based version of the directory
> >    walker below breaks badly; you can chdir() into
> >    a directory under /dev/fd, but you can't chdir("..")
> >    to get back out of it.  (This is the particular
> >    problem that tar is running afoul of.)
> Not sure about this one. I think that fdescfs vnodes do not support
> lookup on anything not being root of the fdescfs.

> >  * Running "find /dev/fd" generates bogus ENOENT errors
> >    because you can opendir() a directory inside of /dev/fd,
> >    and read the entries, but you can't access those entries
> >    because path searches don't work through fdescfs.
> Again, this may be a consequence of the previous issue.

> > I think the right solution here is to add a VOP_ACCESS
> > handler to fdescfs that bars all access to directory
> > nodes under /dev/fd.  Basically, if your program has a
> > directory open, that should be reflected as a directory
> > node that you can't do anything with.  The current implementation
> > allows you to chdir(), opendir(), etc, those directory
> > nodes, but the machinery to fully support those operations
> > is missing so they just screw things up.
> This would chomp the fdescfs functionality, IMHO. Why should directory
> file descriptors behave differently than any other file
> descriptor?

Linux and Solaris do not have these problems because their /dev/fd does
not copy stat information from the underlying file, instead showing a
character device (Solaris) or a symlink (Linux). I think a character
device would fit best, because you can do little else with it than open
it. The open operation is also different from opening the underlying
file because it does not create a new open file description.
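
File status flags are another property of the open file description, so they make the distinction visible too. A small sketch, with `shares_status_flags` as a hypothetical helper name: it sets O_APPEND through one descriptor and checks whether a second descriptor sees the change. The F_DUPFD case stands in for opening /dev/fd/N as the man page describes it; the plain open() case stands in for opening the underlying file.

```python
import fcntl
import os
import tempfile


def shares_status_flags(make_second_fd):
    """Set O_APPEND via one descriptor and report whether a second
    descriptor, made by make_second_fd(fd, path), observes it."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    fd = os.open(path, os.O_WRONLY)
    other = make_second_fd(fd, path)
    try:
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_APPEND)
        return bool(fcntl.fcntl(other, fcntl.F_GETFL) & os.O_APPEND)
    finally:
        os.close(fd)
        os.close(other)
        os.unlink(path)


def via_dupfd(fd, path):
    # fcntl(N, F_DUPFD, 0): a new descriptor, same open file description.
    return fcntl.fcntl(fd, fcntl.F_DUPFD, 0)


def via_open(fd, path):
    # A separate open() of the file: a new open file description.
    return os.open(path, os.O_WRONLY)


if __name__ == "__main__":
    print(shares_status_flags(via_dupfd))  # True
    print(shares_status_flags(via_open))   # False
```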

devfs's /dev/fd/0, /dev/fd/1 and /dev/fd/2 work like this as well: they
always show up as character devices no matter what the underlying file
is. When opened, they duplicate the respective fd just like the full
/dev/fd does. (These are located at the end of /sys/kern/kern_descrip.c.)

Apparently someone noticed earlier this could be a problem, because the
R and X mode bits are cleared from directories that show up in /dev/fd.
It does not come as a surprise to me that that hack does not work.

> I think that the actual solution for the walker problems is to
> ignore the synthetic filesystems altogether. The information is
> provided by sysctl vfs.conflist (note that the output is binary),
> see VFCF_* flags, esp. VFCF_SYNTHETIC. The flag is correctly
> set at least by procfs, devfs and fdescfs.

I think it should be possible to write a directory walker program using
only standard interfaces.

-- 
Jilles Tjoelker


