Date:      Wed, 24 Oct 2007 12:14:59 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Julian Elischer <julian@FreeBSD.org>, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/kern kern_fork.c
Message-ID:  <471F99B3.6060802@elischer.org>
In-Reply-To: <86y7dsuby9.fsf@ds4.des.no>
References:  <200710231754.l9NHsGLH090312@repoman.freebsd.org> <86y7dsuby9.fsf@ds4.des.no>

Dag-Erling Smørgrav wrote:
> Julian Elischer <julian@FreeBSD.org> writes:
>>   This removes a reproducible lockup in NFS.
> 
> Could you elaborate on that?
> 
> DES

facts:
The single-threading mode selected in fork() has a bug: some
"optimization" code, added at some point (maybe by me), suspends
threads that are already sleeping with PCATCH by simply setting
the suspended bit. It turns out this is a bad idea.

NFS sometimes sleeps with a vnode lock held and with PCATCH set
(and is thus a candidate for the above).


now, the mechanism:

thread A does an NFS operation, locks an NFS vnode, and sleeps 
with PCATCH for some reply from the server.

thread B enters NFS but hits the locked vnode and waits (NO PCATCH)

thread C does fork()

thread A is suspended and cannot proceed
(a bug, but let's get past that).
It is counted as quiesced for thread_single().

so,

thread B cannot proceed (A still holds the vnode lock) and so cannot be
suspended and counted as quiesced (also a bug, I think)

so

thread C never reaches the 'single threading state' (B is not yet quiesced)
and cannot proceed.

so

thread A cannot be reawakened (C would have to get through first).

etc.
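
The chain above can be replayed as a toy model. This is only a sketch under stated assumptions, not kernel code: the names Thread, make_scenario and try_single_thread are made up for illustration, the "vnode lock" is a single boolean, and the server reply that thread A waits for is assumed to arrive as soon as A is allowed to run. The flag suspend_sleepers_in_place stands for the "optimization" described under "facts".

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    holds_lock: bool = False     # owns the (single, hypothetical) vnode lock
    wants_lock: bool = False     # blocked waiting for that lock, no PCATCH
    interruptible: bool = False  # currently sleeping with PCATCH
    suspended: bool = False      # counted as quiesced by the forking thread


def make_scenario():
    """Threads A and B as in the description; thread C is the caller."""
    a = Thread("A", holds_lock=True, interruptible=True)
    b = Thread("B", wants_lock=True)
    return [a, b]


def try_single_thread(others, suspend_sleepers_in_place):
    """Return True if every other thread ends up quiesced, False if stuck."""
    if suspend_sleepers_in_place:
        # The "optimization": flag PCATCH sleepers as suspended right where
        # they sleep, including any lock they happen to hold.
        for t in others:
            if t.interruptible:
                t.suspended = True

    progress = True
    while progress:
        progress = False
        for t in others:
            if t.suspended:
                continue
            if t.holds_lock:
                # Assumption: the thread can run, finishes its work,
                # drops the lock and then suspends at a safe point.
                t.holds_lock = False
                t.suspended = True
                progress = True
            elif t.wants_lock:
                # Can only move forward once nobody holds the lock.
                if not any(o.holds_lock for o in others):
                    t.wants_lock = False
                    t.holds_lock = True
                    progress = True
            else:
                t.suspended = True
                progress = True
    return all(t.suspended for t in others)


if __name__ == "__main__":
    for flag in (True, False):
        ok = try_single_thread(make_scenario(), suspend_sleepers_in_place=flag)
        print("suspend in place:", flag, "-> single-threaded:", ok)
```

With the flag set, A is parked while still holding the lock, B never gets it, and the caller (thread C) never sees everyone quiesced. With the flag cleared, A runs to a point where it has dropped the lock, B follows, and the loop terminates with both threads quiesced.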

There are so many bugs here that one loses count.
However, it turns out that the whole idea of single-threading
in fork() is unneeded, due to all the locking introduced for
all the components altered in fork().

so:

to fix the problem: use another mode of thread_single()
that counts threads as quiesced differently and doesn't do the suspend stupidity.

but

having fixed that,
the whole thing can be removed anyhow.


(analysis by davidxu, alc, me, alfred in concert)


