Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 24 Aug 2009 11:15:33 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Konstantin Belousov <kib@FreeBSD.org>
Subject:   Re: svn commit: r196460 - head/sys/kern
Message-ID:  <20090824093730.V39303@delplex.bde.org>
In-Reply-To: <20090823213759.GA55039@stack.nl>
References:  <200908231244.n7NCiFgc061095@svn.freebsd.org> <20090823213759.GA55039@stack.nl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 23 Aug 2009, Jilles Tjoelker wrote:

> On Sun, Aug 23, 2009 at 12:44:15PM +0000, Konstantin Belousov wrote:
> ...
>> Modified: head/sys/kern/uipc_socket.c
>> ==============================================================================
>> --- head/sys/kern/uipc_socket.c	Sun Aug 23 12:23:24 2009	(r196459)
>> +++ head/sys/kern/uipc_socket.c	Sun Aug 23 12:44:15 2009	(r196460)
>> @@ -2898,13 +2898,11 @@ sopoll_generic(struct socket *so, int ev
>>  		if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
>>  			revents |= events & (POLLPRI | POLLRDBAND);
>>
>> -	if ((events & POLLINIGNEOF) == 0) {
>> -		if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
>> -			revents |= events & (POLLIN | POLLRDNORM);
>> -			if (so->so_snd.sb_state & SBS_CANTSENDMORE)
>> -				revents |= POLLHUP;
>> -		}
>> -	}
>> +	if ((events & POLLINIGNEOF) == 0)
>> +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE)
>> +			revents |= POLLHUP;
>> +	if (so->so_snd.sb_state & SBS_CANTSENDMORE)
>> +		revents |= POLLHUP;
>>
>>  	if (revents == 0) {
>>  		if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
>
> This sets POLLHUP when either of the directions has shut down, instead
> of the entire connection. This breaks use of select and poll with
> shutdown(2):

We already knew about this in general, but not its details.

> * sending some data, shutdown(SHUT_WR) then polling for input
> * (the other side for the above) receiving some data, getting EOF on
>  that direction then polling for output

This could possibly be fixed by not breaking things perfectly to POSIX
spec, and keep returning POLLOUT with POLLHUP.  We already return POLLIN
with POLLHUP.  This state is meaningful and supported by POSIX.  It means
that hangup has occurred but there is queued input.  The current problem
shows that the state { POLLOUT with POLLHUP } is also meaningful, although
it is forbidden by POSIX.  It should mean that hangup has occurred (for
the read direction only) but output is still possible.

> * a paranoid HTTP-like server that wants to wait for the client to close
>  the read end after shutting down the write end
>
> Either some data is lost or the program busy-waits for the data.

And it would wait forever if the client fails to tell it after failing
in polling for output?

I can't see any way to fix this short of uncombining the EOFs again.
Setting POLLHUP would at best reduce to busy-waiting since poll() would
always return immediately due to the POLLHUP.

> I think the POLLHUP setting before this commit was correct for sockets:
> POLLHUP is set if both directions are closed/error. An EOF that only
> affects one direction sets the corresponding POLLIN/POLLOUT so that the
> EOF becomes known but the other direction can still be used normally.
>
> (The POSIX spec explicitly describes something like this for POLLIN
> (zero length message) although it erroneously restricts it to STREAMS
> files only; the POLLOUT case has to be derived from the fact that the
> read end should still work normally but the EOF should be notified.)

I thought that the zero-length message part of the spec just requires
poll() to not block if read() would return EOF.  If done in all cases
of EOF, this makes POLLHUP almost useless for at least fifos.  If done
only in special cases, then no one, especially POSIX, knows when it
is done.  Note that POSIX has much better wording for requiring select()
on a read descriptor to not block if read() would return EOF.  This makes
select() useless for detecting certain state changes involving EOF on at
least fifos, and previous rounds of "fixes" in this area (about 6 years
ago) broke it away from POSIX.  The POLLINIGNEOF stuff is a failed attempt
to avoid breaking select() in this way while making poll() more useful.

> I think poll on fifos should instead be fixed by closing the
> half-connection corresponding to writing from fi_readsock to
> fi_writesock. I have tried this out, see attached patch.

Interesting.  I wonder what the corresponding magic for socketpair()
and the 4.4BSD implementation of pipes is (I couldn't find any shutdown()
there, and with fully bidirectional pipes or sockets it is unclear how
EOF can ever be detected, since a r/w descriptor can always talk to
itself).

> With the patch,
> pipepoll only gives "expected POLLHUP; got POLLIN | POLLHUP" errors and
> an error in fifo case 6b caused by our distinction between new and old
> readers.

6b tests more breakage to POSIX spec.  Another thing to decide is
whether to do this.  Could someone test this on newer versions of
Linux?  I use a hackish kernel change (essentially, throw away the
distinction by corrupting the generation count) to pass this test.

> tools/regression/poll does not test shutdown(2) interaction, so it does
> not find this problem.

More tests would be welcome.  I have only a few hackish ones for POLLOUT.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090824093730.V39303>