From owner-svn-src-user@FreeBSD.ORG Mon Nov 15 00:04:48 2010
Message-ID: <4CE0790B.3040706@freebsd.org>
Date: Mon, 15 Nov 2010 08:04:27 +0800
From: David Xu <davidxu@freebsd.org>
To: Jilles Tjoelker
Cc: src-committers@freebsd.org, svn-src-user@freebsd.org
In-Reply-To: <20101114181631.GA1831@stack.nl>
Subject: Re: svn commit: r214915 - user/davidxu/libthr/lib/libthr/thread

Jilles Tjoelker wrote:
> On Sun, Nov 14, 2010 at 02:18:32PM +0800, David Xu wrote:
>> Jilles Tjoelker wrote:
>>> On Sun, Nov 07, 2010 at 01:49:08PM +0000, David Xu wrote:
>>>> Author: davidxu
>>>> Date: Sun Nov 7 13:49:08 2010
>>>> New Revision: 214915
>>>> URL: http://svn.freebsd.org/changeset/base/214915
>>>>
>>>> Log:
>>>>   Implement robust mutexes. The pthread_mutex locking and
>>>>   unlocking code is reworked to support robust mutexes and
>>>>   other mutexes that must be locked and unlocked by the kernel.
>>>
>>> The glibc+linux implementation avoids the system call for each robust
>>> mutex lock/unlock by maintaining the list in userland and providing a
>>> pointer to the kernel. Although this is somewhat less reliable in case
>>> a process scribbles over this list, it has better performance.
>>>
>>> There are various ways this list could be maintained, but the glibc
>>> way uses an "in progress" field per thread and a linked list using a
>>> field in the pthread_mutex_t, so if we want that we should make sure
>>> we have the space in the pthread_mutex_t. Alternatively, a simple
>>> array could be used if the number of owned robust mutexes can be
>>> limited to a fairly low value.
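To make the scheme concrete, the userland list described above works
roughly like the sketch below. This is only an illustration of the
idea; the names and layout are invented here and do not match glibc's
real structures.

#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Rough sketch of a glibc-style userland robust list.  Invented
 * names; not glibc's actual layout.
 */
struct rmutex {
        atomic_uint owner;          /* owner thread id, 0 when unlocked */
        struct rmutex *next;        /* link in the owner's robust list */
};

struct thread {
        unsigned int tid;
        struct rmutex *robust_head; /* registered with the kernel once */
        struct rmutex *in_progress; /* the per-thread "in progress" field */
};

static int
rmutex_lock(struct thread *t, struct rmutex *m)
{
        unsigned int unowned = 0;

        /* Tell the kernel which lock is in flight, in case we die now. */
        t->in_progress = m;
        if (!atomic_compare_exchange_strong(&m->owner, &unowned, t->tid)) {
                t->in_progress = NULL;
                return (EBUSY); /* simplified; really sleep in the kernel */
        }
        /* Publish ownership on the list the kernel walks at thread exit. */
        m->next = t->robust_head;
        t->robust_head = m;
        t->in_progress = NULL;
        return (0);
}

Unlock does the reverse: unlink the mutex from the list, then clear the
owner word in userland. When a thread dies, the kernel walks
robust_head (and checks in_progress) and marks each mutex it finds
EOWNERDEAD; that walk is exactly the after-the-fact access to mutex
memory being objected to below.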
>>>
>>> Solaris robust mutexes used to work by entering the kernel for every
>>> lock/unlock, but no longer, see
>>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6296770
>>> Someone complained about that implementation being too slow.
>>
>> I don't like glibc's idea of reading and writing the mutex memory
>> after having unlocked it. Why assume the memory is still valid and
>> still in use as a mutex after you have unlocked it?
>>
>> There is a use-case where glibc will mysteriously fail: thread A
>> unlocks the mutex in userland, then thread B locks it in userland
>> and reuses the memory area for another purpose. Before thread A
>> enters the kernel, thread B uses the area as a memory-mapped file
>> buffer and writes some data into it, data which the kernel will save
>> to the disk file. If, before A runs again, the memory happens to
>> contain thread A's thread id, then thread A enters the kernel,
>> thinks userland still has not unlocked the mutex, and writes some
>> data into the "mutex". It believes it has unlocked it, but the
>> memory is no longer a mutex, and it has simply corrupted the data
>> thread B saved. This implementation is racy and dangerous.
>
> That seems rare but very dangerous if it triggers.
>
>> I also know that they have a link entry embedded in the mutex. If
>> the mutex is shared by multiple processes, I can write a specific
>> value into the link entry and corrupt the owner's linked list; even
>> worse, I can write a specific address into the link entry, and when
>> the owner unlocks the mutex and unlinks it from its list, it will be
>> happy to write to any address I specified. This may be a security
>> problem, or cause a very hard-to-debug problem if the mutex memory
>> is corrupted.
>
> Indeed. Admittedly, shared memory is inherently not very safe, but
> one would expect the unsafety to be restricted to the shared memory
> segment.
>
>> I think if you want a robust mutex, then for whatever you do there
>> is a price: the robust mutex is expensive but reliable. Until you
>> can prove that glibc's write-mutex-memory-after-unlock is not a
>> problem, I would not follow their idea. Based on this research, I
>> think Solaris must have a reason not to do it that way; I would not
>> laugh at them for their robust mutex being slow.
>
> As I said, they changed it so it does not require a syscall for
> uncontended lock/unlock. Their implementation seems safer than what
> glibc+linux does:
>
> * The kernel knows all robust mutexes that are mapped and potentially
>   locked in a process (even if there is no thread blocked on them).
>   Mutexes are added to this list upon pthread_mutex_init() and
>   pthread_mutex_lock() and removed when the memory region containing
>   them is unmapped. When the process execs or exits, the list is
>   walked and all mutexes owned by a thread in this process do the
>   EOWNERDEAD thing. A copy of the list is maintained in userland to
>   avoid excessive system calls. When a mutex is unmapped, the kernel
>   removes the entry from the userland list as well.
>
> * The list of owned robust mutexes is a variable length array instead
>   of a linked list, so there are no pointers to thread-private memory
>   in the mutex. Furthermore, this list is only used when a thread
>   exits, in userland.
>
> This approach appears to solve your objections, except if a program
> destroys a robust mutex and starts using the memory for something
> else without unmapping it. Perhaps doing some sort of syscall in
> pthread_mutex_destroy() to remove the mutex entry for all processes
> that have it mapped can solve this.

I know Solaris is implemented this way, but I also know that, because
they need to look up a hash table in userland every time
pthread_mutex_lock() is called, it is still not O(1) and has extra
lock contention. Even worse, does the userland hash table's lock
protect against priority inversion? Is it safe for real-time threads?
The cost is sketched below.
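As a rough sketch of that fast path with a per-process registry (all
names are invented here; this is not Solaris code), note the table
lock that is taken on every single lock operation:

#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct rmutex {
        atomic_uint owner;      /* owner thread id, 0 when unlocked */
};

/* Userland copy of the registered-mutex table (here a flat array). */
static struct rmutex *registry[128];
static int registry_len;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

static bool
registry_add_once(struct rmutex *m)
{
        int i;

        /* Taken on every rmutex_lock() call below: the extra lookup
           and contention in question.  Table-full handling omitted. */
        pthread_mutex_lock(&registry_lock);
        for (i = 0; i < registry_len; i++)
                if (registry[i] == m) {
                        pthread_mutex_unlock(&registry_lock);
                        return (false); /* already known to the kernel */
                }
        if (registry_len < (int)(sizeof(registry) / sizeof(registry[0])))
                registry[registry_len++] = m;
        pthread_mutex_unlock(&registry_lock);
        return (true);
}

static int
rmutex_lock(struct rmutex *m, unsigned int tid)
{
        unsigned int unowned = 0;

        if (registry_add_once(m)) {
                /* Hypothetical syscall: ask the kernel to track this
                   mutex so it can do the EOWNERDEAD walk at exit. */
        }
        if (atomic_compare_exchange_strong(&m->owner, &unowned, tid))
                return (0);     /* uncontended: no system call at all */
        return (EBUSY);         /* simplified; really sleep in the kernel */
}

The syscall is off the fast path, but the registry lookup and its lock
stay on it, which is the contention and priority-inversion question
raised above.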
I think they have implemented priority-inherit and priority-protect
mutexes, and even their condition variables support real-time thread
scheduling. Their implementation also makes it impossible to use a
robust mutex without the thread library; I saw a complaint about
exactly that problem.

> David Xu wrote:
>> I also remember that Solaris's condition variables are automatically
>> robust if your mutex is of the robust type. glibc cannot provide
>> that feature: when a process crashes, the condition variable's
>> internal lock may be held by the dead thread, and the condition
>> variable is then in an unusable state. Solaris's condition variable
>> will still be in a healthy state; if a thread finds that
>> pthread_mutex_lock returned EOWNERDEAD, it can still use the
>> condition variable without any problem. Without robust condition
>> variables, a robust mutex is less useful.
>
> Indeed, but it seems more-or-less orthogonal to a mostly-userland
> robust mutex. I think the current kernel condition variable is
> suitable here, but I have not thought this through completely.

Yes, pthreads is really a very complex beast. All the complexity is
hidden under a simple group of APIs, but when you try to implement it,
you will find that it is so difficult that sometimes you want to give
up.
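For reference, this is what the API under discussion looks like from
the application side, using the POSIX.1-2008 names; a minimal sketch,
independent of any particular libthr implementation:

#include <errno.h>
#include <pthread.h>

int
make_robust_mutex(pthread_mutex_t *m)
{
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        /* Shared between processes, so robustness actually matters. */
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        return (pthread_mutex_init(m, &attr));
}

int
locked_update(pthread_mutex_t *m)
{
        int error;

        error = pthread_mutex_lock(m);
        if (error == EOWNERDEAD) {
                /* The previous owner died holding the lock; the data
                   it protects may be inconsistent.  Repair it, then
                   mark the mutex usable again. */
                /* ... repair shared data here ... */
                pthread_mutex_consistent(m);
        } else if (error != 0)
                return (error); /* e.g. ENOTRECOVERABLE */

        /* ... normal critical section ... */

        return (pthread_mutex_unlock(m));
}

The point of the EOWNERDEAD protocol is that the application, not the
library, knows how to repair the shared data; everything debated above
is only about how the kernel and the thread library learn that the
owner died, without making every pthread_mutex_lock() slow.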