From owner-freebsd-arch@freebsd.org Fri May 6 23:30:14 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E2910B31AF4 for ; Fri, 6 May 2016 23:30:14 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id D1D0E108C for ; Fri, 6 May 2016 23:30:14 +0000 (UTC) (envelope-from jilles@stack.nl) Received: by mailman.ysv.freebsd.org (Postfix) id CDB74B31AF2; Fri, 6 May 2016 23:30:14 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CAE71B31AF1; Fri, 6 May 2016 23:30:14 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6D574108B; Fri, 6 May 2016 23:30:14 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from toad2.stack.nl (toad2.stack.nl [IPv6:2001:610:1108:5010::161]) by mx1.stack.nl (Postfix) with ESMTP id 1734BB80ED; Sat, 7 May 2016 01:30:11 +0200 (CEST) Received: by toad2.stack.nl (Postfix, from userid 1677) id D8576892AE; Sat, 7 May 2016 01:30:11 +0200 (CEST) Date: Sat, 7 May 2016 01:30:11 +0200 From: Jilles Tjoelker To: Konstantin Belousov Cc: threads@freebsd.org, arch@freebsd.org Subject: Re: Robust mutexes implementation Message-ID: <20160506233011.GA99994@stack.nl> References: <20160505131029.GE2422@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160505131029.GE2422@kib.kiev.ua> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2016 23:30:15 -0000 On Thu, May 05, 2016 at 04:10:29PM +0300, Konstantin Belousov wrote: > I implemented robust mutexes for our libthr. A robust mutex is > guaranteed to be cleared by the system upon either thread or process > owner termination while the mutex is held. The next mutex locker is > then notified about inconsistent mutex state and can execute (or > abandon) corrective actions. > The patch mostly consists of small changes here and there, adding > neccessary checks for the inconsistent and abandoned conditions into > existing paths. Additionally, the thread exit handler was extended > to iterate over the userspace-maintained list of owned robust mutexes, > unlocking and marking as terminated each of them. > The list of owned robust mutexes cannot be maintained atomically > synchronous with the mutex lock state (it is possible in kernel, but > is too expensive). Instead, for the duration of lock or unlock > operation, the current mutex is remembered in a special slot that is > also checked by the kernel at thread termination. > Kernel must be aware about the per-thread location of the heads of > robust mutex lists and the current active mutex slot. Initially I tried > to extend TCBs with this data, so only a single syscall at the > threading library initialization would be needed: for any thread the > location of TCB is known by kernel, and the syscall would pass > offsets. Unfortunately, on some architectures the size of TCB is part > of the fixed ABI and cannot be changed. Instead, when a thread touches > a robust mutex for the first time, a new umtx op syscall is issued which > informs about location of lists heads. > The umtx sleep queues for PP and PI mutexes are split between > non-robust and robust. I do not understand the reasoning behind this > POSIX requirement. > Patch passes all glibc tests for robust mutexes I found in the nptl/ > directory. See https://github.com/kostikbel/glibc-robust-tests . > Patch is available at https://kib.kiev.ua/kib/pshared/robust.1.patch > (beware of self-signed root certificate in the chain). Work was > sponsored by The FreeBSD Foundation. > Unrelated things in the patch: > 1. Style. Since I had to re-read whole sys/kern/kern_umtx.c, > lib/libthr/thread/thr_umtx.h and lib/libthr/thread/thr_umtx.c, I > started fixing the numerous style violations in these files, which > actually made my eyes bleed. > 2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared > pi mutexes. > 3. Removal of the struct pthread_mutex m_owner field. I cannot see > why it is useful. The only optimization it provides is the > possibility to avoid clearing UMUTEX_CONTESTED bit when reading > m_lock.m_owner. The disadvantages of having this duplicated field > is that kernel does not know about pthread_mutex, so cannot fix the > dup value. Overall it is less work to clear UMUTEX_CONTESTED when > checking owner, then to try and handle inconsistencies. > I added the PMUTEX_OWNER_ID() macro to simplify code. > 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls > the lifetime of the shared mutex associated with a vnode' page. > Apparently, there is real code around which expects the following > to work: > - mmap a file, create a shared mutex in the mapping; > - the process exits; > - another process starts, mmaps the same file and expects that the > previously initialized mutex is still usable. > The knob changes the lifetime of such shared off-page from the > 'destroy on last unmap' to either 'until vnode is reclaimed' or > until 'pthread_mutex_destroy' called, whatever comes first. The 'until vnode is reclaimed' bit sounds like a recipe for hard to reproduce bugs. I do think it is related to the robust mutex patch, though. Without robust mutexes and assuming threads do not unmap the memory while having a mutex locked or while waiting on a condition variable, it is sufficient to create the off-page mutex/condvar automatically in its initial state when pthread_mutex_lock() or pthread_cond_*wait() are called and no off-page object exists. With robust mutexes, we need to store somewhere whether the next thread should receive [EOWNERDEAD] or not, and this should persist even if no processes have the memory mapped. This might be done by replacing THR_PSHARED_PTR with a different value in the pthread_mutex_t. I'm not sure I like that memory write being done from the kernel though. The below is a review of https://kib.kiev.ua/kib/pshared/robust.2.patch > diff --git a/lib/libc/gen/Symbol.map b/lib/libc/gen/Symbol.map It is not necessary to export _pthread_mutex_consistent, _pthread_mutexattr_getrobust and _pthread_mutexattr_setrobust (under FBSDprivate_1.0 symbol version). They are not used by name outside the DSO, only via the jump table. The same thing is true of many other FBSDprivate_1.0 symbols but there is a difference between adding new unnecessary exports and removing existing unnecessary exports. > + if ((error2 == 0 || error2 == EOWNERDEAD) && cancel) > _thr_testcancel(curthread); I don't think [EOWNERDEAD] should be swept under the carpet like this. The cancellation cleanup handler will use the protected state and unlock the mutex without making the state consistent and the state will be unrecoverable. > +void > +_mutex_leave_robust(struct pthread *curthread, struct pthread_mutex *m) > +{ > + > + if (!is_robust_mutex(m)) > + return; This accesses the mutex after writing a value to the lock word which allows other threads to lock it. A use after free may result, since it is valid for another thread to lock, unlock and destroy the mutex (assuming the mutex is not used otherwise later). Memory ordering may permit the load of m->m_lock.m_flags to be moved before the actual unlock, so this issue may not actually appear. Given that the overhead of a system call on every robust mutex unlock is not desired, the kernel's unlock of a terminated thread's mutexes will unavoidably have this use after free. However, for non-robust mutexes the previous guarantees should be kept. > + int defered, error; Typo, should be "deferred". > +.It Bq Er EOWNERDEAD > +The argument > +.Fa mutex > +points to the robust mutex and the previous owning thread terminated > +while holding the mutex lock. > +The lock was granted to the caller and it is up to the new owner > +to make the state consistent. "points to a robust mutex". Perhaps add a See .Xr pthread_mutex_consistent 3 here. > diff --git a/share/man/man3/pthread_mutex_consistent.3 b/share/man/man3/pthread_mutex_consistent.3 This man page should mention that pthread_mutex_consistent() can be called when a mutex lock or condition variable wait failed with [EOWNERDEAD]. > @@ -37,7 +37,11 @@ struct umutex { > + __uintptr_t m_rb_lnk; /* Robust linkage */ Although Linux also stores the robust list nodes in the mutexes like this, I think it increases the chance of strange memory corruption. Making the robust list an array in the thread's memory would be more reliable. If the maximum number of owned robust mutexes can be small, this can have a fixed size; otherwise, it needs to grow as needed (which does add an allocation that may fail to the pthread_mutex_lock path, bad). > + * The umutex.m_lock values and bits. The m_owner is the word which > + * serves as the lock. It's high bit is the contention indicator, > + * rest of bits records the owner TID. TIDs values start with PID_MAX > + * + 2 and ends by INT32_MAX, the low range [1..PID_MAX] is guaranteed > + * to be useable as the special markers. Typo, "It's" should be "Its" and "ends" should be "end". Bruce Evans would probably complain about comma splice (two times). > +#define UMTX_OP_ROBUST 26 The name is rather vague. Perhaps this can be something like UMTX_OP_INIT_ROBUST. -- Jilles Tjoelker