From owner-svn-src-all@freebsd.org Sat Feb 27 21:16:38 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67235AB683D; Sat, 27 Feb 2016 21:16:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1405F19A9; Sat, 27 Feb 2016 21:16:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c110-21-41-193.carlnfd1.nsw.optusnet.com.au (c110-21-41-193.carlnfd1.nsw.optusnet.com.au [110.21.41.193]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id E0FDD3C436F; Sun, 28 Feb 2016 08:16:26 +1100 (AEDT) Date: Sun, 28 Feb 2016 08:16:25 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jilles Tjoelker cc: Pedro Giffuni , Ronald Klop , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r296109 - head/libexec/rlogind In-Reply-To: <20160227183841.GA62612@stack.nl> Message-ID: <20160228071106.Y838@besplex.bde.org> References: <201602262002.u1QK2298094838@repo.freebsd.org> <56D1B725.4000506@FreeBSD.org> <20160227183841.GA62612@stack.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=73JWPhLeruqQCjN69UNZtQ==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=D56GzbUmr8MTfhCQDr8A:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Feb 2016 21:16:38 -0000 On Sat, 27 Feb 2016, Jilles Tjoelker wrote: > On Sat, Feb 27, 2016 at 09:48:05AM -0500, Pedro Giffuni wrote: >> In the case of rlogind, note that the above limitation [FD_SETSIZE] >> has disappeared by using poll(2). > >> I will add that FreeBSD has a native poll(2) implementation, it is >> not a wrapper around select as some old postings would suggest. > >> I don't have any plans to do a massive shakeup against select(2), this >> was some lower hanging fruit that was easy to change. For new code >> kqueue(2) should be preferred. > > The FD_SETSIZE can be a more important issue in library code which may > be called from applications that have many descriptors open already. > > I don't agree with always using kqueue(2) for new code. Provided poll(2) > has the necessary functionality and the number of file descriptors is > low, using poll(2) tends to result in simpler code and better > performance. Also, poll(2) is more portable and avoids a failure mode > from the kqueues rlimit. All the conversions to poll() seem to be buggy. poll() gives better detection of EOF, but you have to actually check for it or you get different behaviour from select(). gdb in FreeBSD has been broken for 10-15 years by misconversion to poll(). E.g.: ttyv1:bde@besplex:~> echo "p 2+2" | gdb -q (gdb) Hangup detected on fd 0 error detected on stdin This is because it fails to check for POLLIN on stdin when it sees POLLHUP on stdin. It loses all the buffered input. I tried more input. It loses all input for a buffer size of < 8K, then starts seeing some. The details depend on the kernel buffering. PIPE_BUF is only 512 but the kernel normally uses larger buffers than that. kdump output for this: 987 cat CALL write(0x1,0x8060000,0x1fff) 987 cat GIO fd 1 wrote 4096 bytes " p 2+2 p 2+2 ... p 2" 987 cat RET write 8191/0x1fff 987 cat CALL read(0x3,0x8060000,0x4000) 987 cat GIO fd 3 read 0 bytes "" 987 cat RET read 0 987 cat CALL close(0x3) 987 cat RET close 0 987 cat CALL close(0x1) 987 cat RET close 0 987 cat CALL exit(0) 986 sh RET wait4 987/0x3db 986 sh CALL wait4(0xffffffff,0xbfbfe7c8,0x2,0) 988 gdb RET poll 1 988 gdb CALL write(0x1,0x8300000,0x18) 988 gdb GIO fd 1 wrote 24 bytes "Hangup detected on fd 0 " 988 gdb RET write 24/0x18 988 gdb CALL write(0x1,0x8300000,0x18) 988 gdb GIO fd 1 wrote 24 bytes "error detected on stdin " 988 gdb RET write 24/0x18 988 gdb CALL sigprocmask(0x1,0,0xbfbfe33c) 988 gdb RET sigprocmask 0 988 gdb CALL exit(0) 986 sh RET wait4 988/0x3dc 986 sh CALL exit(0) The changes have the opposite problem. They check for POLLIN but not POLLHUP. This depends on the unportable and broken but possibly permittied behaviour of poll() always setting POLLIN when it sets POLLHUP. select() has no way to report POLLHUP, so it converts POLLHUP to input-ready. poll() should do the following: - for hangup with buffered input, as in the above pipe case, set POLLHUP to indicate the hangup and POLLIN to indicate the data - for hangup with no buffered input, set POLLHUP to indicate the hangup and clear POLLIN to indicate no data but the usual misimplementation does: - always set POLLIN when setting POLLHUP. This makes poll() behave more like select(), so that buggy conversions to poll() work, but it means that POLLIN cannot be trusted for determining if more than non-null data can be read. This is always done for regular files IIRC. For regular files, there is never any buffered input POLLHUP with POLLIN really means POLLHUP without POLLIN. Another possible misimplementation that I have not observed is: - delay setting POLLHUP until all buffered input has been read or discardered. This would work OK for terminals since terminals are specified to discard input on hangup (but there are many races at low levels with the timing of the hangup and the input -- many drivers lose input much like gdb does -- they deliver the hangup to upper layers synchronously, but deliver buffered input later). However, for pipes the input is not discarded, so POLLHUP should not be delayed, so that applications can decide if they want to discard the input on hanhyp. The poll regression tests check which version of the bugs are implemented for pipes and sockets. They only pass for some file types because the bugs are implemented for others. The buggy cases are system-dependent. Due to the bugs, POLLIN together with POLLHUP can never be trusted. Applications should use similar code to after select() to detect EOF: switch (events & (POLLHUP | POLLIN)) { case POLLHUP: /* Non-buggy case. */ exit(0); case POLLIN: /* Normal input. */ nr = read(fd, buf, size); if (nr == 0) goto maybe_eof; /* likely EOF, but check again */ handle_input(nr); break; case POLLHUP | POLLIN: nr = read(fd, buf, size); if (nr == 0) /* * There may have been input that was eaten by a * race, but any new input would be associated with * a new connection and we don't want to see it * even if POLLHUP is not sticky for this file type. */ exit(0); } or in a sloppier version, just convert POLLHUP to POLLIN like select() does, and assume that that other threads don't eat the input so that reading 0 means EOF: if (events & POLLHUP) events |= POLLIN; if (events & POLLIN) { nr = read(fd, buf, size); if (nr == 0) exit(0); /* only likely EOF, but don't check */ } handle_input(nr); Bruce