From owner-freebsd-standards@FreeBSD.ORG  Fri Sep  2 20:08:12 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BEA95106564A;
	Fri,  2 Sep 2011 20:08:12 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx08.syd.optusnet.com.au
	(fallbackmx08.syd.optusnet.com.au [211.29.132.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 4243C8FC15;
	Fri,  2 Sep 2011 20:08:11 +0000 (UTC)
Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au
	[211.29.132.183])
	by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p82HaMib008579; Sat, 3 Sep 2011 03:36:22 +1000
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p82HaHtX015971
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 3 Sep 2011 03:36:18 +1000
Date: Sat, 3 Sep 2011 03:36:17 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Jilles Tjoelker <jilles@stack.nl>
In-Reply-To: <20110902104018.GA12845@stack.nl>
Message-ID: <20110903015445.J957@besplex.bde.org>
References: <4E60A1B8.7080607@FreeBSD.org> <20110902104018.GA12845@stack.nl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@freebsd.org, freebsd-standards@freebsd.org,
	Andriy Gapon <avg@freebsd.org>
Subject: Re: POLLHUP on never connected socket
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 20:08:12 -0000

On Fri, 2 Sep 2011, Jilles Tjoelker wrote:

> On Fri, Sep 02, 2011 at 12:28:24PM +0300, Andriy Gapon wrote:
>> I see a problem where FreeBSD kernel (recent head) returns POLLHUP
>> _alone_ (0x10) for a socket that has never been connected - a client
>> socket for which connect(2) failed.  There is also a piece of software
>> that doesn't expect that flag and exhibits illogical behavior because
>> of it.
>
> That would be a bug in that software. Returning POLLHUP alone is clearly
> valid.

I might be the author of this (kib committed my fixes).  POLLHUP alone
is certainly valid, but it is only supposed to happen after a
disconnection.  My tests (in /usr/src/tools/ regression/poll/pipepoll.c)
check for this, but only for fifos.  My fixes were mostly to make
POLLHUP work at all (actually set it on hangup, so that old readers
see it), and to make all the flags properly sticky for incomplete new
connections (this includes POLLHUP).  From pipepoll.c, about the old
reader and a bug in the opposite direction (excessive dependency on
POLLHUP in gdb).

% 	/*
% 	 * Now we have no writer, but should still have data from the old
% 	 * writer.  Check that we have both a data-readable condition and a
% 	 * hangup condition, and that the data can be read in the usual way.
% 	 * Since Linux does this, programs must not quit reading when they
% 	 * see POLLHUP; they must see POLLHUP without POLLIN (or another
% 	 * input condition) before they decide that there is EOF.  gdb-6.1.1
% 	 * is an example of a broken program that quits on POLLHUP only --
% 	 * see its event-loop.c.
% 	 */
% ...

Then for a new reader, with and without an old reader (this is only tested
for fifos since the other supported cases don't have open() and always
create a pair of fd's):

% 	if (filetype == FT_FIFO) {
% 		/*
% 		 * Check that POLLHUP is sticky for a new reader and for
% 		 * the old reader.
% 		 */
% 		fd2 = open(FIFONAME, O_RDONLY | O_NONBLOCK);
% 		if (fd2 < 0)
% 			err(1, "open for read");
% 		pfd.fd = fd2;
% 		if (poll(&pfd, 1, 0) < 0)
% 			err(1, "poll");
% 		report(num++, "6b", POLLHUP, pfd.revents);

This checks that a new reader on a new fd (while the old reader is
still active on the old fd) only sees only POLLHUP.  It shouldn't see
POLLIN since the test has arranged for there to be no data (otherwise
it should see data from the old connection).  It shouldn't see POLLOUT
sice it is read-only (but see below).  And it should see POLLHUP since
there _was_ a connection and there isn't a new one yet, especially
since the old reader still has it partially open.

% 		pfd.fd = fd;
% 		if (poll(&pfd, 1, 0) < 0)
% 			err(1, "poll");
% 		report(num++, "6c", POLLHUP, pfd.revents);

Check the same for the old reader.

% 		close(fd2);
% 		if (poll(&pfd, 1, 0) < 0)
% 			err(1, "poll");
% 		report(num++, "6d", POLLHUP, pfd.revents);
% 	}


Check that POLLHUP persists even after the old reader goes away.  IIRC,
this might be different if the old reader goes away before the new
reader arrives.  It would be reasonable to clear POLLHUP when all
readers, all writers and all data go away.  But when there is data left,
POLLIN must be kept to tell new readers that there is data, and POLLHUP
must be kept to tell them that this data is old.

The tests are very incomplete for sockets, and cover mainly fifos and
normal pipes, with just socketpair() for sockets.  The only change
that I made to pipepoll.c after it was committed was to check POLLOUT.
POLLOUT is normally set (it is set even for a r/o open()!?).  But
when POLLHUP is set, POLLOUT must be clear (IIRC this is about the one
thing that is specified non-fuzzily by POSIX).  Thus POLLOUT is expected
to be clear when POLLHUP is expected to be set in the above checks, and
applications don't need to check for POLLHUP if they only want to do
output, and POLLHUP is almost unnecessary (since it almost tracks
POLLHUP); POLLHUP is most useful in connection with POLLIN, especially
if the fd is r/o and POLLOUT isn't bogusly set for r/o descriptors.
There are the following states for POLLHUP | POLLIN:
- clear: have a connection but no data
- POLLIN: have a connection and data
- POLLHUP: had a connection; have no data
- POLLHUP | POLLIN: had a connection; have data left by it (or a previous
   one)
Most differences with Linux and bugs are in this area.  IIRC, Linux
gets this right for some file types but not others.  See the test
output.

>> This is how POSIX describes POLLHUP:
>> POLLHUP
>>     The device has been disconnected. This event and POLLOUT are
>> mutually-exclusive; a stream can never be writable if a hangup has
>> occurred.  However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or
>> POLLPRI are not mutually-exclusive. This flag is only valid in the
>> revents bitmask; it shall be ignored in the events member.

FreeBSD attempts to implement exactly this, with "has been" interpreted
as strictly in the past, and the largest possible orthogonality in the
setting oF POLLHUP vs the input flags.

> Together with the description of POLLIN which says that data may be read
> without blocking, this seems to mean that if POLLHUP is set alone, there
> is no point in calling read() since it will return 0 or -1 immediately.
> If POLLHUP and POLLIN are both set, there is still data in the buffer

Indeed.  I just remembered that the importance of these flags is more for
how they affect the !O_NONBLOCK case that for polling in the O_NONBLOCK
case (there is lots of redundancy for the latter).  Only POLLIN is
critical.  If there is data, then a new !O_NONBLOCK open won't block,
and read() won't block, so POLLIN must be set and poll() must not block
to be consistent.  When there is no data but there has been a disconnection,
new readers normally block, but if they insist on using O_NONBLOCK for
open(), then they will get a hint about the state from POLLHUP.  IIRC,
POLLHUP causes read() to not block, and the only hint that a reader gets
that it should not spin trying the read is that POLLHUP but not POLLIN
is set.  This spin must be avoided by old readers, and keeping the same
state for new readers is very delicate since some choices stop non-blocking
opens from being usable to wait for a new writer.  The choices made are
supposed to be the least worse and the most compatible possible with
Linux (see old PRs).

Also, select() is more problematic than poll() since it can't return
POLLHUP.  It esentially has to OR POLLHUP into its input-ready condition,
since it must not block() on input if there is a hangup but no data.
ISTR that these semantics had to be modified a little so that it doesn't
block for an old hangup, though the old hangup can still be seen using
poll().  See the select() tests.

> (in FreeBSD, POLLHUP | POLLIN is often returned even if there is no
> data, to keep buggy applications from breaking).

Is it?  I think I fixed this in my version.  kib didn't want to commit
something; maybe it was this.  This breaks non-buggy applications that
depend on POLLIN being cleared to detect EOF.  POLLHUP cannot be used
to detect EOF (see the above comment about gdb (gdb loses the final
input even on pipes for operations like "echo $command | gdb" when the
writer happens to close the pipe first)).  Neither poll() nor read()
will block if either POLLHUP or POLLIN is set, so POLLIN must be clear
to avoid an endless spin trying to read EOF if poll() is trusted.
select() can't tell the difference between hangup and no data, so
appliations using it must use another method to avoid the spin.  Old
versions of gdb used select() and then read() returning 0 as giving
EOF.  This works in most cases, but has races if there are multple
readers.

> A complicating factor is that there are situations where read() returns
> 0, but yet it is not a hangup. For example, if a user types the EOF
> character (such as ^D) in a terminal, this causes read() to return 0 but
> the terminal remains writable (and writing to it may even block). This
> is the "zero length message" which is under the obsolescent STREAMS
> marking, even though it applies to terminals in all implementations.

Old gdb works on terminals.  This is because select() knows the difference
between no data on a terminal and the EOF caused by a disconnection, so
it blocks if there is no data and no disconnection.  When it succeeds, it
means that there is data or a disconnection and then read() returning 0
means that it is a disconnection, provided there is only 1 reader and
no kernel conditions that clear the data.

The "zero length message" stuff is part of the fuzziness in POSIX.  The
specification is less incomplete for STREAMS, but for other file types
it is unclear whether zero length data is data.  I think it has to
count as data for select() but not for poll(), since select() doesn't
have POLLHUP so it has no cause other than null data available to
make it return on disconnection, while for poll() counting null data
as data makes it impossible for poll() to block waiting for data after
a disconnection and thus "fix" the problem with select() not being able
to block.  EIther choice causes problems and I forget exactly what is
done now.

> Similarly, if the other side has done the equivalent of
> shutdown(SHUT_WR), read() will return 0 but POLLHUP must not be returned
> as it would prevent the application from waiting for space in the
> outgoing socket buffer; therefore POLLIN must be returned. Similar
> considerations apply to other shutdown() calls (but note that
> shutdown(SHUT_RD) is not sent to the other side).

I'm not very familiar with user-mode socket programming.  In the kernel,
POLLHUP mostly follows SBS_CANTRCVMORE and SBS_CANTSENDMORE, and these
flags work very orthogonally, more so than is possible with POLLHUP.
Hmm, don't user-mode sockets (except bound ones?) go away when all
the readers and writers go away, unlike for fifos?  So the case of
new readers with no old readers on an old connection can't happen.

> The other way around, read() may fail after poll() did not return
> POLLHUP if the connection had closed in the meantime.
> 
>> For me "disconnected" _implies_ that the device should have been
>> connected first.  But this is not explicitly said anywhere.  Also, I
>> think it's possible that a socket gets connected and immediately
>> disconnected (before poll(2) is called), then the POLLHUP would be
>> appropriate in any interpretation.  So, I am inclined to think that
>> the software should check for POLLHUP.  But I would like to ask your
>> opinion since the problem appears to be FreeBSD-specific.
>
> There was a connection attempt in progress that failed at some point,
> which seems close enough to "disconnected". Alternatively, POLLERR could
> be set but could activate more bugs in applications.

It seems likely that there was a transient connection to cause this
problem.  Do we know for sure?

I don't like using POLLERR since it provides almost no information.  A
transient connection isn't an error.

> Comparisons with things like "connected and immediately disconnected"
> and "connection closed after poll() returned something" are useful in
> reasoning about this.

The former isn't really an error.  Only failure establishing a connection
might be an error.  But can that even give a connection?

Bruce