Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 16 Apr 2008 10:14:40 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:        Roman Divacky <rdivacky@freebsd.org>, kib@freebsd.org, rwatson@freebsd.garage.freebsd.pl, freebsd-arch@freebsd.org
Subject:   Re: final decision about *at syscalls
Message-ID:  <200804161014.41025.jhb@freebsd.org>
In-Reply-To: <20080412112019.GI45299@garage.freebsd.pl>
References:  <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Saturday 12 April 2008 07:20:19 am Pawel Jakub Dawidek wrote:
> On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote:
> > On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote:
> > > Dear arch@
> > >
> > > Over this summer I was working (among other things) on *at family of
> > > syscalls kindly sponsored by Google (in their Summer of Code). The
> > > resulting patch is almost finished but I need to decide one design
> > > question. If you are not interested in *at/namei feel free to skip this
> > > mail.
> > >
> > > The *at syscalls are a threads-oriented extension to basic file
> > > syscalls (think of open(), fstat(), etc.) adding the possibility to
> > > specify from where the search for relative path should start.
> > >
> > > image that we have /tmp/foo/bar
> > >
> > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd.
> > > with ordinary open() syscall you have to either
> > >
> > > 	chdir("/tmp/foo");open("./bar");
> > >
> > > or
> > >
> > > 	open("/tmp/foo/bar");
> > >
> > > The first approach is problematic because it changes CWD for all
> > > threads in the process, the second is prone to race-conditions as some
> > > of the components of the path can change in parallel with the "open".
> > >
> > > So POSIX introduced a new API, called "Extended API set part 2, ISBN:
> > > 1-931624-67-4" (at least this was the latest when I looked last time),
> > > which solves that by introducing "*at" syscalls that supply an fd of
> > > previously opened directory which is used instead of CWD for searching
> > > relative path, ie. the previous example becomes
> > >
> > >    dirfd = open("/tmp/foo"); openat("foo", dirfd);
> > >
> > > I implemented the whole API as native FreeBSD syscalls + in linuxulator
> > > emulation layer. Here's the problem:
> > >
> > > There are two approaches to the name translation from "filedescriptor"
> > > to the "vnode".
> > >
> > > 1) we can do it in the kern_fooat() syscall and pass namei() the
> > > resulting vnode 2) we can pass namei() the filedescriptor and do the
> > > translation there
> > >
> > > PROs of #1:
> > >
> > > 	o	namei() does not need to know about the curthread, you can use this
> > > *at ability for different purposes, it's cleaner (imho)
> > >
> > > PROs of #2
> > >
> > > 	o	raceless implementation
> > > 	o	no code duplication
> > >
> > > CONs of #1
> > >
> > > 	o	some very small code duplication (the translation is done in every
> > > 		kern_fooat() function)
> > > 	o	there is a race between the name translation and the actual use of
> > > the result of the translation that needs to be handled, the
> > > "path_to_file" string is copied to the kernel space twice hence a race
> > >
> > > CONs of #2
> > >
> > > 	o	namei is made thread dependant
> > >
> > > Please tell me what approach you like more. I personally favour #1
> > > because I don't like namei() being thread dependant, Kostik Belousov
> > > prefers #2.
> >
> > Considering Robert's paper on security race problems in things like
> > systrace stemming from when you copy parameters out of userland and into
> > the kernel multiple times, I think #2 is definitely the better choice. 
> > Also, namei() is already thread aware AFAICT since 'struct componentname'
> > already contains a 'cnp_thread' member (was 'cnp_proc' in 4.x).
>
> It looks like I'm a bit too late, but anyway...
>
> From what you write John, #1 is a better choice than #2. If you want to
> avoid races, you can pass already locked vnode. In case of file
> descriptors, if p_fd is not locked another thread can close and open
> different directory under the same descriptor number.

Did you read Robert's paper?  Do you not realize that the kernel copying data 
in from userland multiple times and having it change in between is very bug 
prone?

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200804161014.41025.jhb>