Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 26 Jan 2009 13:41:26 -0500
From:      Brian Fundakowski Feldman <green@freebsd.org>
To:        Daniel Eischen <deischen@freebsd.org>
Cc:        David Schultz <das@freebsd.org>, hackers@freebsd.org, Jason Evans <jasone@freebsd.org>, Julian Elischer <julian@elischer.org>
Subject:   Re: threaded, forked, rethreaded processes will deadlock
Message-ID:  <20090126184126.GA3712@green.homeunix.org>
In-Reply-To: <Pine.GSO.4.64.0901220021320.4150@sea.ntplx.net>
References:  <4966F81C.3070406@elischer.org> <20090109163426.GC2825@green.homeunix.org> <49678BBC.8050306@elischer.org> <20090116211959.GA12007@green.homeunix.org> <49710BD6.7040705@FreeBSD.org> <20090120004135.GB12007@green.homeunix.org> <20090121230033.GC12007@green.homeunix.org> <Pine.GSO.4.64.0901211831210.4150@sea.ntplx.net> <20090122045637.GA61058@zim.MIT.EDU> <Pine.GSO.4.64.0901220021320.4150@sea.ntplx.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jan 22, 2009 at 12:42:56AM -0500, Daniel Eischen wrote:
> On Wed, 21 Jan 2009, David Schultz wrote:
> 
>> I think there *is* a real bug here, but there's two distinct ways
>> to fix it. When a threaded process forks, malloc acquires all its
>> locks so that its state is consistent after a fork. However, the
>> post-fork hook that's supposed to release these locks fails to do
>> so in the child because the child process isn't threaded, and
>> malloc_mutex_unlock() is optimized to be a no-op in
>> single-threaded processes. If the child *stays* single-threaded,
>> malloc() works by accident even with all the locks held because
>> malloc_mutex_lock() is also a no-op in single-threaded processes.
>> But if the child goes multi-threaded, then things break.
>> 
>> Solution 1 is to actually unlock the locks in the child process,
>> which is what Brian is proposing.
>> 
>> Solution 2 is to take the position that all of this pre- and
>> post-fork bloat in the fork() path is gratuitous and should be
>> removed. The rationale here is that if you fork with multiple
>> running threads, there's scads of ways in which the child's heap
>> could be inconsistent; fork hooks would be needed not just in
>> malloc(), but in stdio, third party libraries, etc. Why should
>> malloc() be special? It's the programmer's job to quiesce all the
>> threads before calling fork(), and if the programmer doesn't do
>> this, then POSIX only guarantees that async-signal-safe functions
>> will work.
>> 
>> Note that Solution 2 also fixes Brian's problem if he quiesces all
>> of his worker threads before forking (as he should!) With the
>> pre-fork hook removed, all the locks will start out free in the
>> child.  So that's what I vote for...
> 
> The problem is that our own libraries (libthr included)
> need to malloc() for themselves, even after a fork() in
> the child.  After a fork(), the malloc locks should be
> reinitialized in the child if it was threaded, so that
> our implementation actually works for all the async
> signal calls, fork(), exec(), etc.  I forget the exact
> failure modes for very common cases, but if you remove
> the re-initialization of the malloc locks, I'm sure
> you will have problems.
> 
> Perhaps much of this malloc() stuff goes away when we
> move to pthread locks that are not pointers to allocated
> objects, but instead are actual objects/structures.
> This needs to be done in order for mutexes/CVs/etc
> to be PTHREAD_PROCESS_SHARED (placed in shared memory
> and used by multiple processes).  In other words,
> pthread_mutex_t goes from this:
> 
> 	typedef struct pthread_mutex *pthread_mutex_t;
> 
> to something like this:
> 
> 	struct __pthread_mutex {
> 		uint32_t	lock;
> 		...
> 	}
> 	typedef struct __pthread_mutex pthread_mutex_t;
> 
> Same thing for CVs, and we probably should convert any other
> locks used internally by libc/libpthread (spinlocks).
> 
> So after a fork(), there is no need to reallocate anything,
> it can just be reinitialized if necessary.

In this case it's not a matter of locks needing to be again initialized
-- the mutexes are fine -- they just need to be relinquished, which
never occurs for the child now, but used to work fine in 7.0.  What is
truly broken about the current behavior module the bug I'm trying to fix?

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090126184126.GA3712>