From owner-freebsd-standards@FreeBSD.ORG Fri Sep 2 20:08:12 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEA95106564A; Fri, 2 Sep 2011 20:08:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx08.syd.optusnet.com.au (fallbackmx08.syd.optusnet.com.au [211.29.132.10]) by mx1.freebsd.org (Postfix) with ESMTP id 4243C8FC15; Fri, 2 Sep 2011 20:08:11 +0000 (UTC) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p82HaMib008579; Sat, 3 Sep 2011 03:36:22 +1000 Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p82HaHtX015971 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 3 Sep 2011 03:36:18 +1000 Date: Sat, 3 Sep 2011 03:36:17 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jilles Tjoelker In-Reply-To: <20110902104018.GA12845@stack.nl> Message-ID: <20110903015445.J957@besplex.bde.org> References: <4E60A1B8.7080607@FreeBSD.org> <20110902104018.GA12845@stack.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-standards@freebsd.org, Andriy Gapon Subject: Re: POLLHUP on never connected socket X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 20:08:12 -0000 On Fri, 2 Sep 2011, Jilles Tjoelker wrote: > On Fri, Sep 02, 2011 at 12:28:24PM +0300, Andriy Gapon wrote: >> I see a problem where FreeBSD kernel (recent head) returns POLLHUP >> _alone_ (0x10) for a socket that has never been connected - a client >> socket for which connect(2) failed. There is also a piece of software >> that doesn't expect that flag and exhibits illogical behavior because >> of it. > > That would be a bug in that software. Returning POLLHUP alone is clearly > valid. I might be the author of this (kib committed my fixes). POLLHUP alone is certainly valid, but it is only supposed to happen after a disconnection. My tests (in /usr/src/tools/ regression/poll/pipepoll.c) check for this, but only for fifos. My fixes were mostly to make POLLHUP work at all (actually set it on hangup, so that old readers see it), and to make all the flags properly sticky for incomplete new connections (this includes POLLHUP). From pipepoll.c, about the old reader and a bug in the opposite direction (excessive dependency on POLLHUP in gdb). % /* % * Now we have no writer, but should still have data from the old % * writer. Check that we have both a data-readable condition and a % * hangup condition, and that the data can be read in the usual way. % * Since Linux does this, programs must not quit reading when they % * see POLLHUP; they must see POLLHUP without POLLIN (or another % * input condition) before they decide that there is EOF. gdb-6.1.1 % * is an example of a broken program that quits on POLLHUP only -- % * see its event-loop.c. % */ % ... Then for a new reader, with and without an old reader (this is only tested for fifos since the other supported cases don't have open() and always create a pair of fd's): % if (filetype == FT_FIFO) { % /* % * Check that POLLHUP is sticky for a new reader and for % * the old reader. % */ % fd2 = open(FIFONAME, O_RDONLY | O_NONBLOCK); % if (fd2 < 0) % err(1, "open for read"); % pfd.fd = fd2; % if (poll(&pfd, 1, 0) < 0) % err(1, "poll"); % report(num++, "6b", POLLHUP, pfd.revents); This checks that a new reader on a new fd (while the old reader is still active on the old fd) only sees only POLLHUP. It shouldn't see POLLIN since the test has arranged for there to be no data (otherwise it should see data from the old connection). It shouldn't see POLLOUT sice it is read-only (but see below). And it should see POLLHUP since there _was_ a connection and there isn't a new one yet, especially since the old reader still has it partially open. % pfd.fd = fd; % if (poll(&pfd, 1, 0) < 0) % err(1, "poll"); % report(num++, "6c", POLLHUP, pfd.revents); Check the same for the old reader. % close(fd2); % if (poll(&pfd, 1, 0) < 0) % err(1, "poll"); % report(num++, "6d", POLLHUP, pfd.revents); % } Check that POLLHUP persists even after the old reader goes away. IIRC, this might be different if the old reader goes away before the new reader arrives. It would be reasonable to clear POLLHUP when all readers, all writers and all data go away. But when there is data left, POLLIN must be kept to tell new readers that there is data, and POLLHUP must be kept to tell them that this data is old. The tests are very incomplete for sockets, and cover mainly fifos and normal pipes, with just socketpair() for sockets. The only change that I made to pipepoll.c after it was committed was to check POLLOUT. POLLOUT is normally set (it is set even for a r/o open()!?). But when POLLHUP is set, POLLOUT must be clear (IIRC this is about the one thing that is specified non-fuzzily by POSIX). Thus POLLOUT is expected to be clear when POLLHUP is expected to be set in the above checks, and applications don't need to check for POLLHUP if they only want to do output, and POLLHUP is almost unnecessary (since it almost tracks POLLHUP); POLLHUP is most useful in connection with POLLIN, especially if the fd is r/o and POLLOUT isn't bogusly set for r/o descriptors. There are the following states for POLLHUP | POLLIN: - clear: have a connection but no data - POLLIN: have a connection and data - POLLHUP: had a connection; have no data - POLLHUP | POLLIN: had a connection; have data left by it (or a previous one) Most differences with Linux and bugs are in this area. IIRC, Linux gets this right for some file types but not others. See the test output. >> This is how POSIX describes POLLHUP: >> POLLHUP >> The device has been disconnected. This event and POLLOUT are >> mutually-exclusive; a stream can never be writable if a hangup has >> occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or >> POLLPRI are not mutually-exclusive. This flag is only valid in the >> revents bitmask; it shall be ignored in the events member. FreeBSD attempts to implement exactly this, with "has been" interpreted as strictly in the past, and the largest possible orthogonality in the setting oF POLLHUP vs the input flags. > Together with the description of POLLIN which says that data may be read > without blocking, this seems to mean that if POLLHUP is set alone, there > is no point in calling read() since it will return 0 or -1 immediately. > If POLLHUP and POLLIN are both set, there is still data in the buffer Indeed. I just remembered that the importance of these flags is more for how they affect the !O_NONBLOCK case that for polling in the O_NONBLOCK case (there is lots of redundancy for the latter). Only POLLIN is critical. If there is data, then a new !O_NONBLOCK open won't block, and read() won't block, so POLLIN must be set and poll() must not block to be consistent. When there is no data but there has been a disconnection, new readers normally block, but if they insist on using O_NONBLOCK for open(), then they will get a hint about the state from POLLHUP. IIRC, POLLHUP causes read() to not block, and the only hint that a reader gets that it should not spin trying the read is that POLLHUP but not POLLIN is set. This spin must be avoided by old readers, and keeping the same state for new readers is very delicate since some choices stop non-blocking opens from being usable to wait for a new writer. The choices made are supposed to be the least worse and the most compatible possible with Linux (see old PRs). Also, select() is more problematic than poll() since it can't return POLLHUP. It esentially has to OR POLLHUP into its input-ready condition, since it must not block() on input if there is a hangup but no data. ISTR that these semantics had to be modified a little so that it doesn't block for an old hangup, though the old hangup can still be seen using poll(). See the select() tests. > (in FreeBSD, POLLHUP | POLLIN is often returned even if there is no > data, to keep buggy applications from breaking). Is it? I think I fixed this in my version. kib didn't want to commit something; maybe it was this. This breaks non-buggy applications that depend on POLLIN being cleared to detect EOF. POLLHUP cannot be used to detect EOF (see the above comment about gdb (gdb loses the final input even on pipes for operations like "echo $command | gdb" when the writer happens to close the pipe first)). Neither poll() nor read() will block if either POLLHUP or POLLIN is set, so POLLIN must be clear to avoid an endless spin trying to read EOF if poll() is trusted. select() can't tell the difference between hangup and no data, so appliations using it must use another method to avoid the spin. Old versions of gdb used select() and then read() returning 0 as giving EOF. This works in most cases, but has races if there are multple readers. > A complicating factor is that there are situations where read() returns > 0, but yet it is not a hangup. For example, if a user types the EOF > character (such as ^D) in a terminal, this causes read() to return 0 but > the terminal remains writable (and writing to it may even block). This > is the "zero length message" which is under the obsolescent STREAMS > marking, even though it applies to terminals in all implementations. Old gdb works on terminals. This is because select() knows the difference between no data on a terminal and the EOF caused by a disconnection, so it blocks if there is no data and no disconnection. When it succeeds, it means that there is data or a disconnection and then read() returning 0 means that it is a disconnection, provided there is only 1 reader and no kernel conditions that clear the data. The "zero length message" stuff is part of the fuzziness in POSIX. The specification is less incomplete for STREAMS, but for other file types it is unclear whether zero length data is data. I think it has to count as data for select() but not for poll(), since select() doesn't have POLLHUP so it has no cause other than null data available to make it return on disconnection, while for poll() counting null data as data makes it impossible for poll() to block waiting for data after a disconnection and thus "fix" the problem with select() not being able to block. EIther choice causes problems and I forget exactly what is done now. > Similarly, if the other side has done the equivalent of > shutdown(SHUT_WR), read() will return 0 but POLLHUP must not be returned > as it would prevent the application from waiting for space in the > outgoing socket buffer; therefore POLLIN must be returned. Similar > considerations apply to other shutdown() calls (but note that > shutdown(SHUT_RD) is not sent to the other side). I'm not very familiar with user-mode socket programming. In the kernel, POLLHUP mostly follows SBS_CANTRCVMORE and SBS_CANTSENDMORE, and these flags work very orthogonally, more so than is possible with POLLHUP. Hmm, don't user-mode sockets (except bound ones?) go away when all the readers and writers go away, unlike for fifos? So the case of new readers with no old readers on an old connection can't happen. > The other way around, read() may fail after poll() did not return > POLLHUP if the connection had closed in the meantime. > >> For me "disconnected" _implies_ that the device should have been >> connected first. But this is not explicitly said anywhere. Also, I >> think it's possible that a socket gets connected and immediately >> disconnected (before poll(2) is called), then the POLLHUP would be >> appropriate in any interpretation. So, I am inclined to think that >> the software should check for POLLHUP. But I would like to ask your >> opinion since the problem appears to be FreeBSD-specific. > > There was a connection attempt in progress that failed at some point, > which seems close enough to "disconnected". Alternatively, POLLERR could > be set but could activate more bugs in applications. It seems likely that there was a transient connection to cause this problem. Do we know for sure? I don't like using POLLERR since it provides almost no information. A transient connection isn't an error. > Comparisons with things like "connected and immediately disconnected" > and "connection closed after poll() returned something" are useful in > reasoning about this. The former isn't really an error. Only failure establishing a connection might be an error. But can that even give a connection? Bruce