Date: Sun, 14 Nov 2010 19:16:31 +0100
From: Jilles Tjoelker <jilles@stack.nl>
To: David Xu
Cc: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: Re: svn commit: r214915 - user/davidxu/libthr/lib/libthr/thread
Message-ID: <20101114181631.GA1831@stack.nl>
In-Reply-To: <4CDF7F38.5010000@freebsd.org>
References: <201011071349.oA7Dn8Po048543@svn.freebsd.org>
 <20101113151035.GB79975@stack.nl> <4CDF7F38.5010000@freebsd.org>

On Sun, Nov 14, 2010 at 02:18:32PM +0800, David Xu wrote:
> Jilles Tjoelker wrote:
> > On Sun, Nov 07, 2010 at 01:49:08PM +0000, David Xu wrote:

> >> Author: davidxu
> >> Date: Sun Nov 7 13:49:08 2010
> >> New Revision: 214915
> >> URL: http://svn.freebsd.org/changeset/base/214915

> >> Log:
> >>   Implement robust mutexes: the pthread_mutex locking and unlocking
> >>   code is reworked to support robust mutexes and other mutexes that
> >>   must be locked and unlocked by the kernel.

> > The glibc+linux implementation avoids the system call for each robust
> > mutex lock/unlock by maintaining the list in userland and providing a
> > pointer to the kernel. Although this is somewhat less reliable in
> > case a process scribbles over this list, it has better performance.

> > There are various ways this list could be maintained, but the glibc
> > way uses an "in progress" field per thread and a linked list using a
> > field in the pthread_mutex_t, so if we want that we should make sure
> > we have the space in the pthread_mutex_t. Alternatively, a simple
> > array could be used if the number of owned robust mutexes can be
> > limited to a fairly low value.

> > Solaris robust mutexes used to work by entering the kernel for every
> > lock/unlock, but no longer; see
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6296770
> > Someone complained about that implementation being too slow.

> I don't like glibc's idea of reading and writing the mutex after it
> has been unlocked. Why assume that the memory is still valid and still
> in use as a mutex after you have unlocked it?
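For reference, the Linux-side bookkeeping behind that scheme is roughly
the following. This is a simplified paraphrase of the robust-list ABI as
I understand it, not a copy of the real headers, so treat the exact
layout as illustrative only:

    /*
     * Per-thread list head, registered with the kernel once via
     * set_robust_list(2).  The kernel only walks the list when the
     * thread exits.
     */
    struct robust_list {
            struct robust_list *next;         /* next robust mutex held */
    };

    struct robust_list_head {
            struct robust_list list;          /* held robust mutexes */
            long futex_offset;                /* list entry -> lock word */
            struct robust_list *list_op_pending; /* the "in progress"
                                                     lock/unlock, so a crash
                                                     between the list update
                                                     and the futex update is
                                                     still recoverable */
    };

    /*
     * Each robust pthread_mutex_t reserves a list node, so the
     * uncontended lock/unlock path is a few ordinary stores plus the
     * usual futex fast path -- no system call.
     */

The price, as you point out below, is exactly those stores into the
mutex and into the per-thread list around lock and unlock.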
> There is a use-case where glibc will mysteriously fail: thread A
> unlocks the mutex in userland, then another thread B locks it, and
> thread B reuses the memory area for another purpose. Before thread A
> enters the kernel, thread B turns the area into a memory-mapped file
> buffer and writes data into it that the kernel will save to the file
> on disk. If, before A runs, that memory happens to contain thread A's
> thread id, thread A then enters the kernel, decides that userland has
> still not unlocked the mutex, and writes to the "mutex" to unlock it.
> It thinks it has unlocked it, but the memory is no longer a mutex, so
> it has simply corrupted the data thread B saved. This implementation
> is racy and dangerous.

That seems rare but very dangerous if it triggers.

> I also know that they have a link entry embedded in the mutex. If the
> mutex is shared by multiple processes, I can write a specific value
> into the link entry and corrupt the owner's linked list; even worse, I
> can write a specific address into the link entry, and when the owner
> unlocks the mutex and unlinks it from its list, it will happily write
> to any address I specified. This may be a security problem, or cause a
> very hard-to-debug problem if the mutex memory is corrupted.

Indeed. Admittedly, shared memory is inherently not very safe, but one
would expect the unsafety to be restricted to the shared memory segment.

> I think if you want a robust mutex, there is a price for whatever you
> do: the robust mutex is expensive but reliable. Until you can prove
> that glibc's writing to the mutex memory after unlock is not a
> problem, I would not follow their idea. Based on this research, I
> think Solaris must have a reason not to do it this way; I would not
> laugh at them for their robust mutex being slow.

As I said, they changed it so it does not require a syscall for
uncontended lock/unlock. Their implementation seems safer than what
glibc+linux does:

* The kernel knows all robust mutexes that are mapped and potentially
  locked in a process (even if there is no thread blocked on them).
  Mutexes are added to this list upon pthread_mutex_init() and
  pthread_mutex_lock() and removed when the memory region containing
  them is unmapped. When the process execs or exits, the list is walked
  and all mutexes owned by a thread in this process do the EOWNERDEAD
  thing. A copy of the list is maintained in userland to avoid
  excessive system calls. When a mutex is unmapped, the kernel removes
  the entry from the userland list as well.

* The list of owned robust mutexes is a variable-length array instead
  of a linked list, so there are no pointers to thread-private memory
  in the mutex. Furthermore, this list is only used when a thread
  exits, in userland.

This approach appears to solve your objections, except if a program
destroys a robust mutex and starts using the memory for something else
without unmapping it. Perhaps doing some sort of syscall in
pthread_mutex_destroy() to remove the mutex entry for all processes
that have it mapped can solve this.
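Whichever implementation we pick, the application-visible recovery
protocol is the same. A minimal sketch, using the POSIX.1-2008 names
(glibc spells some of them with an _np suffix); repair_shared_state()
is a made-up placeholder for whatever fixup the application needs:

    #include <errno.h>
    #include <pthread.h>

    void repair_shared_state(void); /* hypothetical, application-specific */

    /*
     * Lock a robust (possibly process-shared) mutex and recover if the
     * previous owner died while holding it.  The mutex must have been
     * initialized with pthread_mutexattr_setrobust(&attr,
     * PTHREAD_MUTEX_ROBUST).
     */
    int
    lock_robust(pthread_mutex_t *m)
    {
            int error;

            error = pthread_mutex_lock(m);
            if (error == EOWNERDEAD) {
                    /*
                     * We own the lock, but the previous owner died with
                     * it held, so the protected data may be inconsistent;
                     * repair it and mark the mutex usable again.
                     */
                    repair_shared_state();
                    error = pthread_mutex_consistent(m);
            }
            /*
             * ENOTRECOVERABLE means an earlier owner died and nobody
             * marked the mutex consistent again; it is permanently
             * unusable.
             */
            return (error);
    }

The implementation question above only affects how cheaply the dead
owner is detected; this recovery interface stays the same either way.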
David Xu wrote:
> I also remembered that Solaris's condition variables are automatically
> robust if the mutex is of the robust type. glibc cannot provide that
> feature: when a process crashes, the condition variable's internal
> lock may be held by the dead thread, and the condition variable ends
> up in an unusable state. Solaris's condition variable will still be in
> a healthy state; if a thread finds that pthread_mutex_lock returned
> EOWNERDEAD, it can still use the condition variable without any
> problem. Without robust condition variables, a robust mutex is less
> useful.

Indeed, but it seems more-or-less orthogonal to a mostly-userland
robust mutex. I think the current kernel condition variable is suitable
here, but I have not thought this through completely.

-- 
Jilles Tjoelker