Date:      Thu, 17 Jan 2008 21:36:43 -0500 (EST)
From:      Daniel Eischen <deischen@freebsd.org>
To:        Julian Elischer <julian@elischer.org>
Cc:        ivo@scito.com, Alfred Perlstein <alfred@freebsd.org>, nate@yogotech.com, Landon Fuller <landonf@threerings.net>, davidxu@freebsd.org, java@freebsd.org, julian@freebsd.org
Subject:   Re: cvs commit: src/lib/libkse/thread thr_kern.c
Message-ID:  <Pine.GSO.4.64.0801172132320.12041@sea.ntplx.net>
In-Reply-To: <478FFC91.4050508@elischer.org>
References:  <200711301716.lAUHGEV1064334@repoman.freebsd.org> <wpprxrto0s.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0711301659060.5465@sea.ntplx.net> <wpwsrz9uyr.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0711301849310.6581@sea.ntplx.net> <wphcj2plsx.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0712011824130.11446@sea.ntplx.net> <wphcj1dmvz.fsf@heho.snv.jussieu.fr> <90584F61-91FE-446E-978E-FD234553E8FC@threerings.net> <478FFC91.4050508@elischer.org>

On Thu, 17 Jan 2008, Julian Elischer wrote:

> Landon Fuller wrote:
>> 
>> On Dec 2, 2007, at 09:31, Arno J. Klaassen wrote:
>> 
>>> For info, the attached patch, which partially reverts mfc of rev 1.286
>>> of kern_fork.c, seems to work as well (without the above patch to be
>>> clear),
>>> 
>> 
>> I just upgraded our 8-core build server from pre-November 6-STABLE to 
>> 6.3-RELEASE, and ran into this issue, causing our fork-heavy builder 
>> processes to lock up regularly.
>> 
>> Your suggested patch (reverting the 1.286 MFC to sys/kern/kern_fork.c) 
>> allows our builds to run to completion; I'll try digging into this further. 
>> Given how easy this is to reproduce, I'm hoping this is possible to fix 
>> before 6.3 is officially released?
>
>
> This is a problem. The reason it was changed was that the
> previous code results in heavily loaded threaded processes that
> fork, hanging in indefinite lockups IN THE KERNEL. Eventually
> the whole machine would become unusable. This happens in
> particular when NFS is being used, but in other situations too.
> So I'm damned if I do and damned if I don't on this.
>
> We were able to prove to ourselves that if a program got into this
> state it was a definite programming error. As was stated in the
> discussion to this change:
> "The change is trying to protect the user from doing something that they 
> shouldn't be doing anyhow."
> The previous kernel tried to stop all other threads from running,
> and thus from changing anything, while the kernel copied the
> memory into the child process. The fact is that
> the kernel can't really protect the process from doing this and
> the other threads in the parent can still leave things in a state
> that will screw up the child.
>
> I gather it is the PARENT that hangs here?

It must be the child that hangs.

> It's possible that the answer is that the library needs to
> be changed as well.  Dan, what is the library doing here?

I suppose it is malloc() that is getting into an inconsistent
state in the child.  Creating a thread uses malloc(), so a
thread in the parent can be holding the malloc lock at the
moment the process is forked from a different thread.  The
child then inherits a lock that looks held, with no thread
left in the child to release it.
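
As a rough illustration of that failure mode (a sketch only, not
the libkse or libpthread code), the program below uses an ordinary
mutex, demo_lock, as a stand-in for the malloc lock.  All names in
it (demo_lock, hold_lock_briefly, lock_before_fork,
unlock_after_fork, child_sees_lock_held) are made up for the
example.  It also calls pthread_mutex_trylock() in the child, which
POSIX does not list as async-signal-safe, so treat it as a
demonstration rather than something to ship.  The optional
pthread_atfork() handlers show one conventional way a library can
keep its lock consistent across fork(); I am assuming that pattern
for illustration, not claiming it is what either library does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the allocator's internal lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready;
static int handlers_installed;

/* Helper thread: take the lock, announce it, hold it 200 ms, release. */
static void *hold_lock_briefly(void *arg)
{
    struct timespec pause = { 0, 200 * 1000 * 1000 };

    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&holder_ready);
    nanosleep(&pause, NULL);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* atfork "prepare": wait until no other thread owns the lock. */
static void lock_before_fork(void)
{
    pthread_mutex_lock(&demo_lock);
}

/* atfork "parent" and "child": drop the lock taken in prepare. */
static void unlock_after_fork(void)
{
    int rc = pthread_mutex_unlock(&demo_lock);

    assert(rc == 0);
    (void)rc;
}

/*
 * Fork while another thread holds demo_lock.  Returns 1 if the child
 * found the lock held, 0 if it could take it, -1 on error.  Once
 * use_atfork has been nonzero the handlers stay installed for the
 * rest of the process, since pthread_atfork() has no "unregister".
 */
int child_sees_lock_held(int use_atfork)
{
    pthread_t helper;
    pid_t pid;
    int status;

    if (use_atfork && !handlers_installed) {
        if (pthread_atfork(lock_before_fork, unlock_after_fork,
                           unlock_after_fork) != 0)
            return -1;
        handlers_installed = 1;
    }
    if (sem_init(&holder_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&helper, NULL, hold_lock_briefly, NULL) != 0)
        return -1;
    sem_wait(&holder_ready);        /* the helper owns demo_lock now */

    pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    pthread_join(helper, NULL);
    sem_destroy(&holder_ready);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_sees_lock_held(0) first and child_sees_lock_held(1)
afterwards should show the difference: in the first case the child
inherits a lock that nobody in the child can ever release, which is
where a blocking lock call (such as one inside malloc()) would hang;
in the second the fork waits until the lock is free.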

You might want to check out any differences between libkse
in -current and libpthread in 6.x.  I don't think there is
an issue with -current.

-- 
DE