From owner-freebsd-hackers@FreeBSD.ORG Fri Aug 17 16:08:49 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 951E91065672; Fri, 17 Aug 2012 16:08:49 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id C3C018FC1B; Fri, 17 Aug 2012 16:08:48 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id A175B358C68; Fri, 17 Aug 2012 18:08:47 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id 7BA1D2847B; Fri, 17 Aug 2012 18:08:47 +0200 (CEST) Date: Fri, 17 Aug 2012 18:08:47 +0200 From: Jilles Tjoelker To: Konstantin Belousov Message-ID: <20120817160847.GA34417@stack.nl> References: <502A1788.9090702@freebsd.org> <20120814094111.GB5883@deviant.kiev.zoral.com.ua> <502A6B7A.6070504@gmail.com> <20120814210911.GA90640@stack.nl> <502AE1D4.4060308@gmail.com> <20120815174942.GN5883@deviant.kiev.zoral.com.ua> <502C3D8B.4060008@gmail.com> <20120816114426.GR5883@deviant.kiev.zoral.com.ua> <20120816223933.GA19925@stack.nl> <20120817124312.GD33100@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120817124312.GD33100@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org Subject: Re: system() using vfork() or posix_spawn() and libthr X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Aug 2012 16:08:49 -0000 On Fri, Aug 17, 2012 at 03:43:12PM +0300, Konstantin Belousov wrote: > On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote: > > On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote: > > > BTW, returning to Jilles proposal, can we call vfork() only for > > > single-threaded parent ? I think it gives good boost for single-threaded > > > case, and also eliminates the concerns of non-safety. > > This would probably fix the safety issues but at a price. There is a > > correlation between processes so large that they benefit greatly from > > vfork and threaded processes. > Ok, so I will continue with my patch. > > On the other hand, I think direct calls to vfork() in applications are > > risky and it may not be possible to support them safely in all > > circumstances. However, if libc is calling vfork() such as via popen(), > > system() or posix_spawn(), it should be possible even in a > > multi-threaded process. In that case, the rtld and libthr problems can > > be avoided by using aliases with hidden visibility for all functions the > > vforked child needs to call (or any other method that prevents > > interposition and hard-codes a displacement into libc.so). > I do not see how using any aliases could help there. Basically, if mt > process is not single-threaded for vfork, you can have both some parent > thread and child enter rtld. This is complete mess. If libc calls a function with hidden visibility, this proceeds directly (not via the PLT) and rtld is not involved. Several ways to do this are in section 2.2.7 Avoid Using Exported Symbols of Ulrich Drepper's dsohowto. One of them is something like extern __typeof(next) next_int __attribute((alias("next"), visibility("hidden"))); in the same source as the definition of the function int next(void) { ...; } As Drepper notes, the visibility attribute is not strictly required if a version script keeps the symbol local but it might lead to better code. At least on i386, though, the optimal direct near call instruction is generated even without it. For example, _nsdispatch() calls libc_dlopen() (kept local by the version script) without going through the PLT (from the output of objdump -dS on the libc.so.7 in /usr/obj). In the assembler syscall stubs using code from lib/libc/arch/SYS.h this can be done by adding another .set (we currently have foo, _foo and __sys_foo for a syscall foo; some syscalls have only _foo and __sys_foo) such as __syshidden_foo. The ones that are actually used then need prototypes (similar to the __sys_* ones in lib/libthr/?). For some reason, the symbol .cerror (HIDENAME(cerror)) is also exported. Either this should be kept local or a local uninterposable alias should be added and used (as with the syscalls). The function __error() (to get errno's address for the current thread) is and must be called via the PLT (because libthr is separate). Therefore, we should ensure it is called at least once before vfork so calls in the child do not involve rtld. The implementations for the various architectures use the TLS register (or memory location for ARM), so they seem safe. This should suffice to fix posix_spawn() but the _execvpe() used by posix_spawnp also uses various string functions. If not all of these have already been called earlier, this will not work. Making calls to them not go through the PLT seems fairly hard, even though it would make sense in general, so perhaps I should instead reimplement it such that the parent does almost all of the work. An alternative is to write the core of posix_spawn() in assembler using system calls directly but I would rather avoid that :( > > There may still be a problem in programs that install signal handlers > > because the set of async-signal-safe functions is larger than what can > > be done in a vforked child. Userland can avoid this by masking affected > > signals before calling vfork() and resetting them to SIG_DFL before > > unmasking them. This will add many syscalls if the code does not know > > which signals are affected (such as libc). Alternatively, the kernel > > could map caught signals to the default action for processes with > > P_PPWAIT (just like it does not stop such processes because of signals > > or TTY job control). > If rtld does not work, then any library function call from a signal handler > is problematic. My goal is to get a working rtld and possibly malloc. > BTW, not quite related, it seems that the placement of openat, > setcontext and swapcontext in the pthread.map is wrong. openat is > at FBSD_1.1 in libc, and *context are at FBSD_1.0 version, while > libthr exports them at 1.2. App or library gets linked to arbitrary > version depending on whether libphread was specified at link time, and > interposition from libthr does not work. Oops :( Can this still be fixed (like by exporting identical functions in multiple versions)? > Below is the latest version of my patch for vfork, which survives (modified) > tools/test/pthread_vfork_test. Patch is only for x86 right now. Why does this patch call thread_single(SINGLE_BOUNDARY)? I think this just causes breakage by causing short writes, [ERESTART] and the like where not permitted by POSIX while not fixing userland's problems. From userland's point of view, thread_single(SINGLE_BOUNDARY) just freezes threads at whatever point they are executing, including while holding an internal lock. Also, the thread_single_end() is before waiting for the child to exec or exit (because that is now just before returning to userland). If single-threading is needed, it should be done in userland such as via pthread_suspend_all_np(). This function already ensures no other thread is in a libthr critical section. If rtld blocks SIGTHR during its critical sections, it likely also waits for the threads to leave rtld. As with kernel-level single-threading, this is slow and causes incorrect short writes. If single-threading is only necessary because umtx locks are formally per-process instead of per-vmspace, then I suggest avoiding the trouble and depending on the fact that they are per-vmspace. > [snip] > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c > index 46cdca1..134ba80 100644 > --- a/sys/kern/kern_fork.c > +++ b/sys/kern/kern_fork.c > @@ -150,11 +150,7 @@ sys_vfork(struct thread *td, struct vfork_args *uap) > int error, flags; > struct proc *p2; > > -#ifdef XEN > - flags = RFFDG | RFPROC; /* validate that this is still an issue */ > -#else > flags = RFFDG | RFPROC | RFPPWAIT | RFMEM; > -#endif > error = fork1(td, flags, 0, &p2, NULL, 0); > if (error == 0) { > td->td_retval[0] = p2->p_pid; > @@ -756,7 +752,7 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp, > struct thread *td2; > struct vmspace *vm2; > vm_ooffset_t mem_charged; > - int error; > + int error, single_threaded; > static int curfail; > static struct timeval lastfail; > #ifdef PROCDESC > @@ -815,6 +811,19 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp, > } > #endif > > + if (((p1->p_flag & (P_HADTHREADS | P_SYSTEM)) == P_HADTHREADS) && > + (flags & RFPPWAIT) != 0) { > + PROC_LOCK(p1); > + if (thread_single(SINGLE_BOUNDARY)) { > + PROC_UNLOCK(p1); > + error = ERESTART; > + goto fail2; > + } > + PROC_UNLOCK(p1); > + single_threaded = 1; > + } else > + single_threaded = 0; > + > mem_charged = 0; > vm2 = NULL; > if (pages == 0) > @@ -945,6 +954,12 @@ fail1: > if (vm2 != NULL) > vmspace_free(vm2); > uma_zfree(proc_zone, newproc); > + if (single_threaded) { > + PROC_LOCK(p1); > + thread_single_end(); > + PROC_UNLOCK(p1); > + } > +fail2: > #ifdef PROCDESC > if (((flags & RFPROCDESC) != 0) && (fp_procdesc != NULL)) { > fdclose(td->td_proc->p_fd, fp_procdesc, *procdescp, td); -- Jilles Tjoelker