Date:      Sun, 04 Apr 2010 20:56:28 -0700
From:      Doug Barton <dougb@FreeBSD.org>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        svn-src-stable@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, svn-src-stable-8@freebsd.org
Subject:   Re: svn commit: r205806 - stable/8/etc
Message-ID:  <4BB95F6C.7070202@FreeBSD.org>
In-Reply-To: <20100329173214.GA17249@stack.nl>
References:  <201003282019.o2SKJfPg033857@svn.freebsd.org> <4BAFBBFA.7020701@FreeBSD.org> <20100328210630.GA2086@stack.nl> <4BAFE1EE.9040908@FreeBSD.org> <20100329173214.GA17249@stack.nl>

On 03/29/10 10:32, Jilles Tjoelker wrote:
> On Sun, Mar 28, 2010 at 04:10:38PM -0700, Doug Barton wrote:
>> On 03/28/10 14:06, Jilles Tjoelker wrote:
>>> On Sun, Mar 28, 2010 at 01:28:42PM -0700, Doug Barton wrote:
>>>> Probably my fault for not saying something sooner, but there is a
>>>> problem with the code in head that sometimes causes it to loop
>>>> repeatedly even though pwait exits successfully. I am trying to track it
>>>> down, but since it only happens about once every 10 shutdowns it's been
>>>> difficult.
> 
>>> There is a difference between the two methods in what is waited for
>>> exactly. pwait(1) will wait for the process to terminate; if it is
>>> applied to a zombie it will return immediately (printing the exit status
>>> if -v was given). On the other hand, kill(1) will continue to return
>>> success until the process has been waited for by its parent.
> 
>> The process that I see this with most often is devd, does that fit the
>> model you're describing?
> 
> Possibly. This would mainly happen because init has been busy, I think
> (or if the parent isn't init).
> 
>> What are the implications of moving on after a
>> successful pwait even though there is still a zombie process?
> 
> For shutdown/stop, nothing.
> 
> For restart, there may be problems if a restarted daemon checks the
> validity of the pid in the pidfile using kill().
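
To illustrate the kind of pidfile check mentioned above, here is a minimal sketch. The helper name pidfile_is_live is hypothetical and is not taken from rc.subr or from any particular daemon. It shows why a zombie matters: kill -0 keeps succeeding until the parent has reaped the process, so a check like this would still treat the old pid as live.

```sh
# Hypothetical helper: succeed if the pid stored in pidfile $1 still
# answers to "kill -0". A zombie that its parent has not yet waited
# for also counts as "live" here, which is the restart hazard.
pidfile_is_live()
{
	local _pid
	_pid=

	[ -r "$1" ] || return 1
	# read fails on a file without a trailing newline but still
	# fills in the variable, so accept that case too.
	read -r _pid < "$1" || [ -n "$_pid" ] || return 1

	# Reject anything that is not a positive decimal number;
	# "kill -0 0" would otherwise probe our own process group.
	case "$_pid" in
	''|*[!0-9]*) return 1 ;;
	esac
	[ "$_pid" -gt 0 ] || return 1

	# Note: this also fails for a live process we lack permission
	# to signal, so it is only a sketch of the idea.
	kill -0 "$_pid" 2>/dev/null
}
```

A daemon using a check like this during restart could refuse to start while the old pid is still a zombie.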

Ok, in that case I'm not really comfortable with the idea of ignoring
the results of kill -0. However, I've come up with what I think is a good
solution in the attached patch. Based on your description and my ongoing
analysis, the problem I was seeing (a long string of the same pid
repeated over and over) seems to be a side effect of pwait returning
successfully, so the || sleep 2 never kicks in, while 'kill -0' is still
able to see the pid. So I've moved the sleep up: if we're not in the
first pass and kill -0 still sees the pid, we sleep for 1 second and
then proceed. I think that'll handle both the problem I saw and the odd
case where pwait doesn't return successfully.
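
The loop described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the committed code: the name wait_for_pids_sketch is made up, printf stands in for echo -n, and the pwait call is skipped where pwait(1) is not installed so the sketch also runs outside FreeBSD.

```sh
# Standalone sketch of the patched wait loop. Variable names mirror
# the rc.subr hunk; the function name is hypothetical.
wait_for_pids_sketch()
{
	local _list _prefix _nlist _j

	_list="$*"
	if [ -z "$_list" ]; then
		return
	fi

	_prefix=
	while true; do
		_nlist=""
		for _j in $_list; do
			if kill -0 "$_j" 2>/dev/null; then
				_nlist="${_nlist}${_nlist:+ }$_j"
				# Not the first pass, yet the pid is still
				# visible (for example a zombie awaiting its
				# parent): pause before looking again.
				[ -n "$_prefix" ] && sleep 1
			fi
		done
		if [ -z "$_nlist" ]; then
			break
		fi
		_list=$_nlist
		printf '%s' "${_prefix:-Waiting for PIDS: }$_list"
		_prefix=", "
		# Assumption: pwait(1) may be absent on non-FreeBSD
		# systems; without it the loop falls back to polling.
		if command -v pwait >/dev/null 2>&1; then
			pwait $_list 2>/dev/null
		fi
	done
	if [ -n "$_prefix" ]; then
		echo "."
	fi
}
```

Calling it with the pid of a short-lived background job, e.g. "sleep 2 &" followed by "wait_for_pids_sketch $!", prints the "Waiting for PIDS:" line and returns once kill -0 no longer sees the pid. Note that the sleep runs once per surviving pid on each pass, as in the patch.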

Sound good?


Doug

-- 

	... and that's just a little bit of history repeating.
			-- Propellerheads

	Improve the effectiveness of your Internet presence with
	a domain name makeover!    http://SupersetSolutions.com/


Attachment: wait_for_pids.diff

Index: rc.subr
===================================================================
--- rc.subr	(revision 206117)
+++ rc.subr	(working copy)
@@ -359,12 +359,14 @@
 	if [ -z "$_list" ]; then
 		return
 	fi
-	_prefix=
+
+	local _prefix=
 	while true; do
 		_nlist="";
 		for _j in $_list; do
 			if kill -0 $_j 2>/dev/null; then
 				_nlist="${_nlist}${_nlist:+ }$_j"
+				[ -n "$_prefix" ] && sleep 1
 			fi
 		done
 		if [ -z "$_nlist" ]; then
@@ -373,7 +375,7 @@
 		_list=$_nlist
 		echo -n ${_prefix:-"Waiting for PIDS: "}$_list
 		_prefix=", "
-		pwait $_list 2>/dev/null || sleep 2
+		pwait $_list 2>/dev/null
 	done
 	if [ -n "$_prefix" ]; then
 		echo "."
