From owner-freebsd-net@FreeBSD.ORG  Fri Sep  2 20:26:22 2011
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9A3141065672;
	Fri,  2 Sep 2011 20:26:22 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 32DA28FC08;
	Fri,  2 Sep 2011 20:26:21 +0000 (UTC)
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p82KQJi0032442
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 3 Sep 2011 06:26:20 +1000
Date: Sat, 3 Sep 2011 06:26:19 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Andriy Gapon <avg@FreeBSD.org>
In-Reply-To: <4E6123F4.4010209@FreeBSD.org>
Message-ID: <20110903053813.V2093@besplex.bde.org>
References: <4E60A1B8.7080607@FreeBSD.org> <20110902104018.GA12845@stack.nl>
	<20110903015445.J957@besplex.bde.org> <4E6123F4.4010209@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@FreeBSD.org, freebsd-standards@FreeBSD.org,
	Jilles Tjoelker <jilles@stack.nl>
Subject: Re: POLLHUP on never connected socket
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 20:26:22 -0000

On Fri, 2 Sep 2011, Andriy Gapon wrote:

> on 02/09/2011 20:36 Bruce Evans said the following:
>> It seems likely that there was a transient connection to cause this
>> problem.  Do we know for sure?
>
> Not sure if this what you are asking but when I reproduced the problem it was by
> connecting to a local port where no one listened for sure.

Yes, that's what I'm asking.  It should be simpler to fix if it is just
a bug and doesn't involve reader/writer races.  The code for setting
POLLHUP for sockets is simple: from uipc_socket.c:

% 	if ((events & POLLINIGNEOF) == 0) {
% 		if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
% 			revents |= events & (POLLIN | POLLRDNORM);
% 			if (so->so_snd.sb_state & SBS_CANTSENDMORE)
% 				revents |= POLLHUP;
% 		}
% 	}

So POLLHUP tracks SBS_CANTRCVMORE && SBS_CANTSENDMORE exactly, and POLLHUP
is never set without POLLIN.

But in my version:

% 	if ((events & POLLINIGNEOF) == 0) {
% 		if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
% 			if (so->so_snd.sb_state & SBS_CANTSENDMORE)
% 				revents |= POLLHUP;
% 			else
% 				revents |= events & (POLLIN | POLLRDNORM);
% 		}
% 	}

POLLIN is not set by hangup, bit actually means that there is (non-null)
input available.  jilles referred to this bug in -current.

POLLINIGNEOF is an old attempt to fix this.  It gives the caller more
control.  It works poorly, especially since most callers don't know that
it exists.

Perhaps SBS_CANTRCVMORE && SBS_CANTSENDMORE condition is too raw/low-level
even for sockets.  Fifos need more to get the behaviour that they want
(as similar as possible to Linux).  I quote my hackish version since
the version in -current doesn't have any comments and my version does
even more:

>From fifo_open(), for readers:

% 	mtx_lock(&fifo_mtx);
% 	if (ap->a_mode & FREAD) {
% 		fip->fi_readers++;
% 		if (fip->fi_readers == 1) {
% 			SOCKBUF_LOCK(&fip->fi_writesock->so_snd);
% 			fip->fi_writesock->so_snd.sb_state &= ~SBS_CANTSENDMORE;
% 			SOCKBUF_UNLOCK(&fip->fi_writesock->so_snd);
% 			if (fip->fi_writers > 0) {
% 				wakeup(&fip->fi_writers);
% 				sowwakeup(fip->fi_writesock);
% 			}
% 		}
% 		fdp = td->td_proc->p_fd;
% 		FILEDESC_LOCK(fdp);
% 		fp = fget_locked(fdp, ap->a_fdidx);
% 		/* Abuse f_msgcount as a generation count. */
% 		fp->f_msgcount = fip->fi_wgen - fip->fi_writers;

-current has the above line, and fixes the hack using a new field
f_seqcount.

% 		/*
% 		 * XXX quick fix: keep any POLLHUP condition sticky if we
% 		 * now have more than 1 reader, by clobbering f_msgcount
% 		 * here so that POLLINIGNEOF will not be set in
% 		 * fifo_poll_f().  (The above subtraction of fi_writers
% 		 * works similarly but is probably unnecessary since if
% 		 * there is a writer then there won't be a POLLHUP
% 		 * condition.)
% 		 *
% 		 * XXX: the generation counts are probably unnecessary now
% 		 * that this is broken to POSIX spec.  They are to help
% 		 * make any POLLHUP condition sticky only for old readers.
% 		 *
% 		 * XXX: does this hack work right for a new reader that will
% 		 * block?
% 		 */
% 		fp->f_msgcount -= fip->fi_readers - 1;

-current doesn't have the above line or the big comment.

I forget if this is for more compatibility with Linux, or more
correctness.  I think the committed regression tests assert the behaviour
give by this version, so they will certainly fail for -current.  I'm
surer of this for the spurious POLLIN with POLLHUP.

% 		FILEDESC_UNLOCK(fdp);
% 	}

% int
% fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct thread *td)
% {
% 	struct file filetmp;
% 	struct fifoinfo *fip;
% 	int levents, revents = 0;
% 
% 	fip = fp->f_vnode->v_fifoinfo;
% 	levents = events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND);
% 	if (levents) {
% 		filetmp.f_data = fip->fi_readsock;
% 		filetmp.f_cred = fp->f_cred;
% 		if (fp->f_msgcount == fip->fi_wgen)
% 			levents |= POLLINIGNEOF;

-current has this (with s/msgcount/seqcount/).  This is the only place that
seqcount is used.  I thought I removed all uses of POLLINIGNEOF in the
kernel.

% 		revents |= soo_poll(&filetmp, levents, cred, td);
% 	}
% 	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
% 	if (levents) {
% 		filetmp.f_data = fip->fi_writesock;
% 		filetmp.f_cred = fp->f_cred;
% 		revents |= soo_poll(&filetmp, levents, cred, td);

Note that fifo polling calls soo_poll() twice, and it only uses
POLLINIGNEOF for the second call.  Thus the combination always sets
POLLHUP if there is a disconnection.  But the second call doesn't
ask for POLLIN, so in -current there is no spurious POLLIN to echo
POLLHUP in the case where POLLINIGNEOF is used, but there is if it
is not.

-current is subtly different.  Where the above masks out POLLINIGNEOF
in the user `events' in the first assignment to levents, -current keeps
it.  But both versions mask it for the second assignment.  Thus my
version always ignores the user POLLINIGNEOF for polling of fifos,
but -current keeps it partially.  -current thus sees POLLHUP and fails
to completely ignore EOF even if the user passes POLLINIGNEOF, iff the
user events also has one of the output flags set.  The user would have
to call poll() twice, first with only input flags set and then with
output flags set get unmixed behaviour, but this won't work since either
of the calls may block and the output check will never ignore EOF.

% 	}
% 	return (revents);
% }

Bruce