From owner-freebsd-bugs@FreeBSD.ORG Sun Mar 26 13:00:37 2006 Return-Path: X-Original-To: freebsd-bugs@hub.freebsd.org Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8994416A444 for ; Sun, 26 Mar 2006 13:00:37 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id CFF8643D48 for ; Sun, 26 Mar 2006 13:00:36 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k2QD0aeD067503 for ; Sun, 26 Mar 2006 13:00:36 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k2QD0a2c067502; Sun, 26 Mar 2006 13:00:36 GMT (envelope-from gnats) Date: Sun, 26 Mar 2006 13:00:36 GMT Message-Id: <200603261300.k2QD0a2c067502@freefall.freebsd.org> To: freebsd-bugs@FreeBSD.org From: Bruce Evans Cc: Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Bruce Evans List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Mar 2006 13:00:37 -0000 The following reply was made to PR kern/94772; it has been noted by GNATS. From: Bruce Evans To: Oliver Fromme Cc: bug-followup@freebsd.org Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken Date: Sun, 26 Mar 2006 23:57:15 +1100 (EST) On Thu, 23 Mar 2006, Oliver Fromme wrote: > OK, here are new patches. I wrote and tested them on > RELENG_6, but they should apply to HEAD as well, AFAICT. > > With these patches, all of the test programs pass with > success (no output), i.e. the select test and the poll > test. My own test program from the beginning of this > PR passes without problems, too. > > --- ./fs/fifofs/fifo_vnops.c.orig Tue Mar 21 09:42:32 2006 > +++ ./fs/fifofs/fifo_vnops.c Thu Mar 23 19:57:21 2006 > @@ -231,6 +231,12 @@ > wakeup(&fip->fi_writers); > sowwakeup(fip->fi_writesock); > } > + else if (ap->a_mode & O_NONBLOCK) { > + SOCKBUF_LOCK(&fip->fi_readsock->so_rcv); > + fip->fi_readsock->so_rcv.sb_state |= > + SBS_EOFNOHUP; > + SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv); > + } > } > } > if (ap->a_mode & FWRITE) { This flags seems to be necessary (I couldn't recover it from the SBS_CANT* clags like I hoped to). The flag needs to be set even in the !O_NONBLOCK case since otherwise it doesn't get set if the first reader is !O_NONBLOCK and a later reader (before any writers) is O_NONBLOCK. Clearing of this flag seems to be missing in cases where all readers go away before any writers appear. I cnaged the sense of the flag and renamed it to SBS_COULDRCV. > @@ -241,7 +247,8 @@ > fip->fi_writers++; > if (fip->fi_writers == 1) { > SOCKBUF_LOCK(&fip->fi_readsock->so_rcv); > - fip->fi_readsock->so_rcv.sb_state &= ~SBS_CANTRCVMORE; > + fip->fi_readsock->so_rcv.sb_state &= > + ~(SBS_CANTRCVMORE | SBS_EOFNOHUP); > SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv); > if (fip->fi_readers > 0) { > wakeup(&fip->fi_readers); OK. > @@ -661,37 +668,23 @@ > int levents, revents = 0; > > fip = fp->f_data; > - levents = events & > - (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND); > + levents = events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND); > if ((fp->f_flag & FREAD) && levents) { > - /* > - * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is > - * not, then convert the first two to the last one. This > - * tells the socket poll function to ignore EOF so that we > - * block if there is no writer (and no data). Callers can > - * set POLLINIGNEOF to get non-blocking behavior. > - */ > - if (levents & (POLLIN | POLLRDNORM) && > - !(levents & POLLINIGNEOF)) { > - levents &= ~(POLLIN | POLLRDNORM); > - levents |= POLLINIGNEOF; > - } > - > filetmp.f_data = fip->fi_readsock; > filetmp.f_cred = cred; > revents |= soo_poll(&filetmp, levents, cred, td); > - > - /* Reverse the above conversion. */ > - if ((revents & POLLINIGNEOF) && !(events & POLLINIGNEOF)) { > - revents |= (events & (POLLIN | POLLRDNORM)); > - revents &= ~POLLINIGNEOF; > - } > } > levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND); > if ((fp->f_flag & FWRITE) && levents) { > filetmp.f_data = fip->fi_writesock; > filetmp.f_cred = cred; > revents |= soo_poll(&filetmp, levents, cred, td); > + } OK. > + if (revents & POLLHUP) { > + SOCKBUF_LOCK(&fip->fi_readsock->so_rcv); > + if (fip->fi_readsock->so_rcv.sb_state & SBS_EOFNOHUP) > + revents = (revents & ~POLLHUP) | POLLHUPIGNEOF; > + SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv); > } > return (revents); > } I think the locking here isn't useful (locking for reading a flag generally isn't since there is little difference if you lose a race before, during or after reading the flag. Locking the whole function might be needed. There is now locking for the whole of sopoll(). In the old version of -current that I use there isn't any locking at all there. The above became more complicated when I set POLLIN together with POLLHUP in sopoll(). Then the above needed to clear POLLIN together with POLLHUP but only if there is no data. Clearing flags in callers is ugly and causes the following bug: sopoll() has decided not to call selrecord() if revents != 0; the above mainains revents != 0 but pollscan() may clear the final flag POLLHUPIGNEOF out of revents and then we sleep without having called selrecord(). I didn't notice anything breaking from this. The zero timeout in the regression tests prevents problems there and probably nothing else in my system has used a fifo lately. I think I fixed these problems by moving all the decisions to sopoll(). pollscan() sets a flag POLLPOLL and fifofs maintains SBS_COULDRCV so that sopoll() can decide. Now I wonder about sleeping with or without selrecord() for silly combinations of poll flags. I think events == 0 causes a sleep that is not terminated by hangup since sopoll() is not called so it doesn't even get a chance to set POLLHUP. I think this case should cause a sleep that is terminated by hangup (but nothing else except a timeout or signal). > --- ./kern/uipc_socket.c.orig Wed Dec 28 19:05:13 2005 > +++ ./kern/uipc_socket.c Thu Mar 23 22:50:33 2006 > @@ -2033,16 +2033,15 @@ > SOCKBUF_LOCK(&so->so_snd); > SOCKBUF_LOCK(&so->so_rcv); The version that you use has the locking here too. > if (events & (POLLIN | POLLRDNORM)) > - if (soreadable(so)) > - revents |= events & (POLLIN | POLLRDNORM); > - > - if (events & POLLINIGNEOF) > if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat || > !TAILQ_EMPTY(&so->so_comp) || so->so_error) > - revents |= POLLINIGNEOF; > + revents |= events & (POLLIN | POLLRDNORM); > > - if (events & (POLLOUT | POLLWRNORM)) > - if (sowriteable(so)) > + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) || > + (so->so_snd.sb_state & SBS_CANTSENDMORE)) > + revents |= POLLHUP; > + else > + if (events & (POLLOUT | POLLWRNORM) && sowriteable(so)) > revents |= events & (POLLOUT | POLLWRNORM); > > if (events & (POLLPRI | POLLRDBAND)) > @@ -2050,9 +2049,7 @@ > revents |= events & (POLLPRI | POLLRDBAND); > > if (revents == 0) { > - if (events & > - (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | > - POLLRDBAND)) { > + if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) { > selrecord(td, &so->so_rcv.sb_sel); > so->so_rcv.sb_flags |= SB_SEL; > } This all worked as intended. It broke lat_rpc because (as intended) the condition for setting POLLIN became a subset of soreadable(), so POLLIN no longer gets set at pure EOF, but lat_rpc depends on it being set. The rest of the changes are simple and worked. I forgot to mention another source of inconsistencies: kqueue. kqueue mainly uses the SB_CANT* flags for sockets and fifos to decide EOF, so it should work reasonably provided sopoll() does, but it will behave like select() in the special case unless it is changed to behave like poll(). Here is my work-in-progress version. The patch is relative to a very old version of FreeBSD. %%% Index: fifo_vnops.c =================================================================== RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v retrieving revision 1.100 diff -u -2 -r1.100 fifo_vnops.c --- fifo_vnops.c 23 Jun 2004 00:35:50 -0000 1.100 +++ fifo_vnops.c 26 Mar 2006 09:42:47 -0000 @@ -232,4 +232,22 @@ fip->fi_readers++; if (fip->fi_readers == 1) { + SOCKBUF_LOCK(&fip->fi_readsock->so_rcv); + if (fip->fi_writers > 0) + fip->fi_readsock->so_rcv.sb_state |= + SBS_COULDRCV; + else + /* + * Sloppy? Might be necessary to clear it + * in all the places where fi_readers is + * decremented to 0. I think only writers + * polling for input could be confused by + * having it not set, and there is a problem + * with these anyway now that we have + * reversed the sense of the flag -- they + * now block (?), but shouldn't. + */ + fip->fi_readsock->so_rcv.sb_state &= + ~SBS_COULDRCV; + SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv); SOCKBUF_LOCK(&fip->fi_writesock->so_snd); fip->fi_writesock->so_snd.sb_state &= ~SBS_CANTSENDMORE; @@ -248,7 +266,8 @@ fip->fi_writers++; if (fip->fi_writers == 1) { - SOCKBUF_LOCK(&fip->fi_writesock->so_rcv); + SOCKBUF_LOCK(&fip->fi_readsock->so_rcv); fip->fi_readsock->so_rcv.sb_state &= ~SBS_CANTRCVMORE; - SOCKBUF_UNLOCK(&fip->fi_writesock->so_rcv); + fip->fi_readsock->so_rcv.sb_state |= SBS_COULDRCV; + SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv); if (fip->fi_readers > 0) { wakeup(&fip->fi_readers); @@ -521,32 +540,11 @@ int events, revents = 0; - events = ap->a_events & - (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND); + events = ap->a_events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND); if (events) { - /* - * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is - * not, then convert the first two to the last one. This - * tells the socket poll function to ignore EOF so that we - * block if there is no writer (and no data). Callers can - * set POLLINIGNEOF to get non-blocking behavior. - */ - if (events & (POLLIN | POLLRDNORM) && - !(events & POLLINIGNEOF)) { - events &= ~(POLLIN | POLLRDNORM); - events |= POLLINIGNEOF; - } - filetmp.f_data = ap->a_vp->v_fifoinfo->fi_readsock; filetmp.f_cred = ap->a_cred; - if (filetmp.f_data) - revents |= soo_poll(&filetmp, events, - ap->a_td->td_ucred, ap->a_td); - - /* Reverse the above conversion. */ - if ((revents & POLLINIGNEOF) && - !(ap->a_events & POLLINIGNEOF)) { - revents |= (ap->a_events & (POLLIN | POLLRDNORM)); - revents &= ~POLLINIGNEOF; - } + revents |= soo_poll(&filetmp, + events | (ap->a_events & POLLPOLL), ap->a_td->td_ucred, + ap->a_td); } events = ap->a_events & (POLLOUT | POLLWRNORM | POLLWRBAND); @@ -554,8 +552,6 @@ filetmp.f_data = ap->a_vp->v_fifoinfo->fi_writesock; filetmp.f_cred = ap->a_cred; - if (filetmp.f_data) { - revents |= soo_poll(&filetmp, events, - ap->a_td->td_ucred, ap->a_td); - } + revents |= soo_poll(&filetmp, events, ap->a_td->td_ucred, + ap->a_td); } return (revents); Index: sys_generic.c =================================================================== RCS file: /home/ncvs/src/sys/kern/sys_generic.c,v retrieving revision 1.131 diff -u -2 -r1.131 sys_generic.c --- sys_generic.c 5 Apr 2004 21:03:35 -0000 1.131 +++ sys_generic.c 26 Mar 2006 05:41:48 -0000 @@ -1093,6 +1080,6 @@ * POLLERR if appropriate. */ - fds->revents = fo_poll(fp, fds->events, - td->td_ucred, td); + fds->revents = fo_poll(fp, + fds->events | POLLPOLL, td->td_ucred, td); if (fds->revents != 0) n++; Index: uipc_socket.c =================================================================== RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.189 diff -u -2 -r1.189 uipc_socket.c --- uipc_socket.c 24 Jun 2004 04:28:30 -0000 1.189 +++ uipc_socket.c 26 Mar 2006 09:27:47 -0000 @@ -1872,13 +1870,43 @@ revents |= events & (POLLIN | POLLRDNORM); - if (events & POLLINIGNEOF) - if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat || - !TAILQ_EMPTY(&so->so_comp) || so->so_error) - revents |= POLLINIGNEOF; - if (events & (POLLOUT | POLLWRNORM)) if (sowriteable(so)) revents |= events & (POLLOUT | POLLWRNORM); + /* + * SBS_CANTRCVMORE (which is checked by soreadable()) normally + * implies EOF (and thus readable) and hung up, but for + * compatibility with other systems and to obtain behavior that + * is otherwise unavailable we make the case of poll() on a fifo + * that has never had any writers during the lifetime of any + * current reader special: then we pretend that the fifo is + * unreadable unless it contains non-null data, and that it is + * not hung up. The POLLPOLL flag is set by poll() to identify + * poll() here, and the SBS_COULDRCV flag is set by the fifo + * layer to indicate a fifo that is not in the special state. + */ + if (so->so_rcv.sb_state & SBS_CANTRCVMORE) { + if (!(events & POLLPOLL) || so->so_rcv.sb_state & SBS_COULDRCV) + revents |= POLLHUP; /* finish settings */ + else if (!(so->so_rcv.sb_cc >= so->so_rcv.sb_lowat || + !TAILQ_EMPTY(&so->so_comp) || so->so_error)) + revents &= ~(POLLIN | POLLRDNORM); /* undo settings */ + } + + /* + * Testing of hangup for writers could be optimized by combining + * it with testing for writeability, but we keep the test separate + * and with the same organization as the test for readers for + * clarity. Note that writeable implies not hung up, so if POLLHUP + * is set here then (POLLOUT | POLLWRNORM) is not set above, as + * standards require. Less obviously, if POLLHUP was set above for + * a reader, then the output flags cannot have been set above for + * a writer. Even less obviously, we cannot end up with both + * POLLHUP output flags set in revents after ORing the revents for + * the read and write socket in fifo_poll(). + */ + if (so->so_snd.sb_state & SBS_CANTSENDMORE) + revents |= POLLHUP; + if (events & (POLLPRI | POLLRDBAND)) if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK)) @@ -1886,7 +1914,5 @@ if (revents == 0) { - if (events & - (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | - POLLRDBAND)) { + if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) { SOCKBUF_LOCK(&so->so_rcv); selrecord(td, &so->so_rcv.sb_sel); Index: poll.h =================================================================== RCS file: /home/ncvs/src/sys/sys/poll.h,v retrieving revision 1.13 diff -u -2 -r1.13 poll.h --- poll.h 10 Jul 2002 04:47:25 -0000 1.13 +++ poll.h 26 Mar 2006 07:56:52 -0000 @@ -67,7 +67,6 @@ #define POLLWRBAND 0x0100 /* OOB/Urgent data can be written */ -#if __BSD_VISIBLE -/* General FreeBSD extension (currently only supported for sockets): */ -#define POLLINIGNEOF 0x2000 /* like POLLIN, except ignore EOF */ +#ifdef _KERNEL +#define POLLPOLL 0x8000 /* system call is actually poll() */ #endif Index: socketvar.h =================================================================== RCS file: /home/ncvs/src/sys/sys/socketvar.h,v retrieving revision 1.130 diff -u -2 -r1.130 socketvar.h --- socketvar.h 24 Jun 2004 04:27:10 -0000 1.130 +++ socketvar.h 26 Mar 2006 08:35:56 -0000 @@ -212,4 +212,5 @@ #define SBS_CANTRCVMORE 0x0020 /* can't receive more data from peer */ #define SBS_RCVATMARK 0x0040 /* at mark on input */ +#define SBS_COULDRCV 0x0080 /* could receive previously (or now) */ /* Index: syscalls.c =================================================================== RCS file: /home/ncvs/src/usr.bin/truss/syscalls.c,v retrieving revision 1.39 diff -u -2 -r1.39 syscalls.c --- syscalls.c 11 Jun 2004 11:58:07 -0000 1.39 +++ syscalls.c 25 Mar 2006 13:25:13 -0000 @@ -402,5 +402,5 @@ #define POLLKNOWN_EVENTS \ (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP | POLLNVAL | \ - POLLRDNORM |POLLRDBAND | POLLWRBAND | POLLINIGNEOF) + POLLRDNORM |POLLRDBAND | POLLWRBAND) u += snprintf(tmp + used, per_fd, %%% Bruce