Date:      Thu, 17 Jan 2008 17:10:41 -0800
From:      Julian Elischer <julian@elischer.org>
To:        Landon Fuller <landonf@threerings.net>
Cc:        nate@yogotech.com, ivo@scito.com, Alfred Perlstein <alfred@freebsd.org>, Daniel Eischen <deischen@freebsd.org>, davidxu@freebsd.org, java@freebsd.org, julian@freebsd.org
Subject:   Re: cvs commit: src/lib/libkse/thread thr_kern.c
Message-ID:  <478FFC91.4050508@elischer.org>
In-Reply-To: <90584F61-91FE-446E-978E-FD234553E8FC@threerings.net>
References:  <200711301716.lAUHGEV1064334@repoman.freebsd.org> <wpprxrto0s.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0711301659060.5465@sea.ntplx.net> <wpwsrz9uyr.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0711301849310.6581@sea.ntplx.net> <wphcj2plsx.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0712011824130.11446@sea.ntplx.net> <wphcj1dmvz.fsf@heho.snv.jussieu.fr> <90584F61-91FE-446E-978E-FD234553E8FC@threerings.net>

Landon Fuller wrote:
> 
> On Dec 2, 2007, at 09:31, Arno J. Klaassen wrote:
> 
>> For info, the attached patch, which partially reverts mfc of rev 1.286
>>
>> of kern_fork.c, seems to work as well (without the above patch to be 
>> clear),
>>
> 
> I just upgraded our 8-core build server from pre-november 6-STABLE to 
> 6.3-RELEASE, and ran into this issue, causing our fork-heavy builder 
> processes to lock up regularly.
> 
> Your suggested patch (reverting the 1.286 MFC to sys/kern/kern_fork.c) 
> allows our builds to run to completion; I'll try digging into this 
> further. Given how easy this is to reproduce, I'm hoping this is 
> possible to fix before 6.3 is officially released?


This is a problem. The reason it was changed is that the
previous code left heavily loaded threaded processes that
fork hanging indefinitely IN THE KERNEL. Eventually
the whole machine would become unusable, particularly when
NFS was in use, but in other situations too. So I'm
damned if I do and damned if I don't on this.

We were able to prove to ourselves that if a program got into this
state it was a definite programming error. As was stated in the
discussion of this change:
"The change is trying to protect the user from doing something that 
they shouldn't be doing anyhow."
The previous kernel tried to stop all other threads from running,
and thus from changing anything, while the kernel copied the
memory into the child process. The fact is that the kernel can't
really protect the process this way: the other threads in the
parent can still leave things in a state that will screw up the
child.
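
To make that last point concrete, here is a minimal sketch (not taken
from libkse, libpthread or the kernel change) of one way a parent thread
can leave state that breaks the child: a mutex held by another thread at
fork() time is copied into the child still locked, and the thread that
would have unlocked it does not exist there. The names demo_lock,
holder and child_sees_lock_held are hypothetical, made up for this
illustration. Strictly speaking the child of a threaded process should
only call async-signal-safe functions, so the pthread_mutex_trylock()
call in the child is there only to observe the state and is assumed to
behave as it does on common implementations.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER; /* hypothetical */
static int ready_pipe[2];   /* holder -> forking thread: "lock is held" */
static int release_pipe[2]; /* forking thread -> holder: "you may unlock" */

/* Second thread: takes the lock and keeps it until told to let go. */
static void *holder(void *arg) {
        char c = 'x';
        (void)arg;
        pthread_mutex_lock(&demo_lock);
        if (write(ready_pipe[1], &c, 1) == 1) {
                /* Block here, lock held, until the forking thread is done. */
                if (read(release_pipe[0], &c, 1) != 1) {
                        /* Nothing useful to do on error; just fall through. */
                }
        }
        pthread_mutex_unlock(&demo_lock);
        return NULL;
}

/* Returns 1 if the child found the lock held, 0 if not, -1 on failure. */
int child_sees_lock_held(void) {
        pthread_t thr;
        char c = 'x';
        int status = 0;
        int result = -1;
        pid_t pid;

        if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0)
                return -1;
        if (pthread_create(&thr, NULL, holder, NULL) != 0)
                return -1;
        /* Wait until the holder thread really owns the lock. */
        if (read(ready_pipe[0], &c, 1) != 1)
                return -1;

        pid = fork();
        if (pid == 0) {
                /* Child: only this thread was duplicated, but the lock
                 * state came along in the copied memory. */
                _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
        }

        /* Parent: let the holder finish whether or not fork() worked. */
        if (write(release_pipe[1], &c, 1) != 1)
                return -1;
        pthread_join(thr, NULL);
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
                result = WEXITSTATUS(status);

        close(ready_pipe[0]);
        close(ready_pipe[1]);
        close(release_pipe[0]);
        close(release_pipe[1]);
        return result;
}
```

If the child had called pthread_mutex_lock() instead of the trylock
variant it would simply block forever, which is the same general shape
as a child spinning on a lock it inherited.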

I gather it is the PARENT that hangs here?

It's possible that the answer is that the library needs to
be changed as well.  Dan, what is the library doing here?

> 
> Here's a simple reproduction case that results in instant spinning 
> sub-processes:
> 
> #0  0x0000000800648b13 in mutex_lock_common (curthread=0x0, 
> m=0x8007616e8, abstime=0x0) at 
> /usr/src/lib/libpthread/thread/thr_mutex.c:503
> #1  0x000000080064ac25 in _pthread_mutex_lock (m=0x8007616e8) at 
> /usr/src/lib/libpthread/thread/thr_mutex.c:868
> #2  0x000000080063e9ce in _spinlock (lck=0x8009ac200) at 
> /usr/src/lib/libpthread/thread/thr_spinlock.c:97
> #3  0x00000008007eafc3 in pubrealloc (ptr=0x0, size=24, func=0x8008802b7 
> " in malloc():") at /usr/src/lib/libc/stdlib/malloc.c:1090
> #4  0x00000008007eb1e1 in malloc (size=24) at 
> /usr/src/lib/libc/stdlib/malloc.c:1150
> #5  0x000000080065ab8c in _lockuser_init (lu=0x52e068, priv=0x52e000) at 
> /usr/src/lib/libpthread/sys/lock.c:99
> #6  0x000000080065ac69 in _lockuser_reinit (lu=0x52e068, priv=0x52e000) 
> at /usr/src/lib/libpthread/sys/lock.c:128
> #7  0x000000080064d6d0 in _kse_single_thread (curthread=0x50cc00) at 
> /usr/src/lib/libpthread/thread/thr_kern.c:343
> #8  0x000000080063b627 in _fork () at 
> /usr/src/lib/libpthread/thread/thr_fork.c:101
> #9  0x00000000004008f1 in forker ()
> #10 0x000000080064516e in thread_start (curthread=0x50cc00, 
> start_routine=0x4008e0 <forker>, arg=0x0) at 
> /usr/src/lib/libpthread/thread/thr_create.c:341
> #11 0x00000008007b3cd9 in makectx_wrapper (ucp=0x800530860, 
> func=0x800645150 <thread_start>, args=0x7fffff7fcfd0) at 
> /usr/src/lib/libc/amd64/gen/makecontext.c:100
> #12 0x0000000000000000 in ?? ()
> #13 0x000000000050cc00 in ?? ()
> #14 0x00000000004008e0 in frame_dummy ()
> 
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <pthread.h>
> 
> void *forker (void *arg) {
>         while (1) {
>                 pid_t pid = fork();
>                 if (pid == 0) {
>                         exit(0);
>                 } else if (pid > 0) {
>                         int status;
>                         waitpid(pid, &status, 0);
>                 } else {
>                         printf("Fork failed\n");
>                         abort();
>                 }
>         }
> }
> 
> int main(void) {
>         int i = 0;
>         for (i = 0; i < 4; i++) {
>                 pthread_t thr;
>                 pthread_create(&thr, NULL, forker, NULL);
>                 pthread_detach(thr);
>         }
> 
>         while(1)
>                 sleep(1000);
> }
