From owner-svn-src-all@freebsd.org  Sat Feb 27 21:16:38 2016
Return-Path: <owner-svn-src-all@freebsd.org>
Delivered-To: svn-src-all@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67235AB683D;
 Sat, 27 Feb 2016 21:16:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 1405F19A9;
 Sat, 27 Feb 2016 21:16:37 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from c110-21-41-193.carlnfd1.nsw.optusnet.com.au
 (c110-21-41-193.carlnfd1.nsw.optusnet.com.au [110.21.41.193])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id E0FDD3C436F;
 Sun, 28 Feb 2016 08:16:26 +1100 (AEDT)
Date: Sun, 28 Feb 2016 08:16:25 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Jilles Tjoelker <jilles@stack.nl>
cc: Pedro Giffuni <pfg@freebsd.org>, Ronald Klop <ronald-lists@klop.ws>, 
 src-committers@freebsd.org, svn-src-all@freebsd.org, 
 svn-src-head@freebsd.org
Subject: Re: svn commit: r296109 - head/libexec/rlogind
In-Reply-To: <20160227183841.GA62612@stack.nl>
Message-ID: <20160228071106.Y838@besplex.bde.org>
References: <201602262002.u1QK2298094838@repo.freebsd.org>
 <op.ydhtgxz8kndu52@53555a16.cm-6-6b.dynamic.ziggo.nl>
 <56D1B725.4000506@FreeBSD.org> <20160227183841.GA62612@stack.nl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0
 a=73JWPhLeruqQCjN69UNZtQ==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10
 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=D56GzbUmr8MTfhCQDr8A:9
 a=CjuIK1q_8ugA:10
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 27 Feb 2016 21:16:38 -0000

On Sat, 27 Feb 2016, Jilles Tjoelker wrote:

> On Sat, Feb 27, 2016 at 09:48:05AM -0500, Pedro Giffuni wrote:
>> In the case of rlogind, note that the above limitation [FD_SETSIZE]
>> has disappeared by using poll(2).
>
>> I will add that FreeBSD has a native poll(2) implementation, it is
>> not a wrapper around select as some old postings would suggest.
>
>> I don't have any plans to do a massive shakeup against select(2), this
>> was some lower hanging fruit that was easy to change. For new code
>> kqueue(2) should be preferred.
>
> The FD_SETSIZE can be a more important issue in library code which may
> be called from applications that have many descriptors open already.
>
> I don't agree with always using kqueue(2) for new code. Provided poll(2)
> has the necessary functionality and the number of file descriptors is
> low, using poll(2) tends to result in simpler code and better
> performance. Also, poll(2) is more portable and avoids a failure mode
> from the kqueues rlimit.

All the conversions to poll() seem to be buggy.  poll() gives better
detection of EOF, but you have to actually check for it or you get
different behaviour from select().  gdb in FreeBSD has been broken
for 10-15 years by misconversion to poll().  E.g.:

     ttyv1:bde@besplex:~> echo "p 2+2" | gdb -q
     (gdb) Hangup detected on fd 0
     error detected on stdin

This is because it fails to check for POLLIN on stdin when it sees
POLLHUP on stdin.  It loses all the buffered input.  I tried more
input.  It loses all input for a buffer size of < 8K, then starts
seeing some.  The details depend on the kernel buffering.  PIPE_BUF
is only 512 but the kernel normally uses larger buffers than that.

kdump output for this:

    987 cat      CALL  write(0x1,0x8060000,0x1fff)
    987 cat      GIO   fd 1 wrote 4096 bytes
        "
 	p 2+2
 	p 2+2
 	...
 	p 2"
    987 cat      RET   write 8191/0x1fff
    987 cat      CALL  read(0x3,0x8060000,0x4000)
    987 cat      GIO   fd 3 read 0 bytes
        ""
    987 cat      RET   read 0
    987 cat      CALL  close(0x3)
    987 cat      RET   close 0
    987 cat      CALL  close(0x1)
    987 cat      RET   close 0
    987 cat      CALL  exit(0)
    986 sh       RET   wait4 987/0x3db
    986 sh       CALL  wait4(0xffffffff,0xbfbfe7c8,0x2,0)
    988 gdb      RET   poll 1
    988 gdb      CALL  write(0x1,0x8300000,0x18)
    988 gdb      GIO   fd 1 wrote 24 bytes
        "Hangup detected on fd 0
        "
    988 gdb      RET   write 24/0x18
    988 gdb      CALL  write(0x1,0x8300000,0x18)
    988 gdb      GIO   fd 1 wrote 24 bytes
        "error detected on stdin
        "
    988 gdb      RET   write 24/0x18
    988 gdb      CALL  sigprocmask(0x1,0,0xbfbfe33c)
    988 gdb      RET   sigprocmask 0
    988 gdb      CALL  exit(0)
    986 sh       RET   wait4 988/0x3dc
    986 sh       CALL  exit(0)

The changes have the opposite problem.  They check for POLLIN but not
POLLHUP.  This depends on the unportable and broken but possibly
permittied behaviour of poll() always setting POLLIN when it sets
POLLHUP.

select() has no way to report POLLHUP, so it converts POLLHUP to
input-ready.  poll() should do the following:
- for hangup with buffered input, as in the above pipe case, set
   POLLHUP to indicate the hangup and POLLIN to indicate the data
- for hangup with no buffered input, set POLLHUP to indicate the
   hangup and clear POLLIN to indicate no data
but the usual misimplementation does:
- always set POLLIN when setting POLLHUP.  This makes poll() behave
   more like select(), so that buggy conversions to poll() work, but
   it means that POLLIN cannot be trusted for determining if more
   than non-null data can be read.  This is always done for regular
   files IIRC.  For regular files, there is never any buffered input
   POLLHUP with POLLIN really means POLLHUP without POLLIN.
Another possible misimplementation that I have not observed is:
- delay setting POLLHUP until all buffered input has been read or
   discardered.  This would work OK for terminals since terminals
   are specified to discard input on hangup (but there are many races
   at low levels with the timing of the hangup and the input -- many
   drivers lose input much like gdb does -- they deliver the hangup
   to upper layers synchronously, but deliver buffered input later).
   However, for pipes the input is not discarded, so POLLHUP should
   not be delayed, so that applications can decide if they want to
   discard the input on hanhyp.

The poll regression tests check which version of the bugs are implemented
for pipes and sockets.  They only pass for some file types because the
bugs are implemented for others.  The buggy cases are system-dependent.

Due to the bugs, POLLIN together with POLLHUP can never be trusted.
Applications should use similar code to after select() to detect EOF:

 	switch (events & (POLLHUP | POLLIN)) {
 	case POLLHUP:
 		/* Non-buggy case. */
 		exit(0);
 	case POLLIN:
 		/* Normal input. */
 		nr = read(fd, buf, size);
 		if (nr == 0)
 			goto maybe_eof;	/* likely EOF, but check again */
 		handle_input(nr);
 		break;
 	case POLLHUP | POLLIN:
 		nr = read(fd, buf, size);
 		if (nr == 0)
 			/*
 			 * There may have been input that was eaten by a
 			 * race, but any new input would be associated with
 			 * a new connection and we don't want to see it
 			 * even if POLLHUP is not sticky for this file type.
 			 */
 			exit(0);
 	}

or in a sloppier version, just convert POLLHUP to POLLIN like select() does,
and assume that that other threads don't eat the input so that reading 0
means EOF:

 	if (events & POLLHUP)
 		events |= POLLIN;
 	if (events & POLLIN) {
 		nr = read(fd, buf, size);
 		if (nr == 0)
 			exit(0);	/* only likely EOF, but don't check */
 	}
 	handle_input(nr);

Bruce