Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 3 Jan 2003 22:21:31 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Alfred Perlstein <bright@mu.org>
Cc:        Poul-Henning Kamp <phk@FreeBSD.org>, <cvs-committers@FreeBSD.org>, <cvs-all@FreeBSD.org>
Subject:   Re: cvs commit: src/sys/fs/fifofs fifo_vnops.c
Message-ID:  <20030103213903.I2945-100000@gamplex.bde.org>
In-Reply-To: <20030102204913.GM26140@elvis.mu.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 2 Jan 2003, Alfred Perlstein wrote:

>     216         if (ap->a_mode & FWRITE) {
>     217                 fip->fi_writers++;
>     218                 if (fip->fi_writers == 1) {
>     219                         fip->fi_readsock->so_state &= ~SS_CANTRCVMORE;
>     220                         if (fip->fi_readers > 0) {
>     221                                 wakeup((caddr_t)&fip->fi_readers);
>     222                                 sorwakeup(fip->fi_writesock);
>     223                         }
>     224                 }
>     225         }
>
> This this code is confusing because we have a producer consumer
> relationship between readers and writers.  I think that the problem
> is that in this section of code the wakeup should be on fi_writers.

Certainly not.  There is only 1 writer (ourself), and we must wake up
the readers.

The bug may in sendmail, or a race with readers coming and going.  It
can probably be localized by fixing the error handling in the timeout:

% 			while (fip->fi_readers == 0) {
% 				VOP_UNLOCK(vp, 0, td);
% 				/*
% 				 * XXX: Some race I havn't located is solved
% 				 * by timing out after a sec.  Race seen when
% 				 * sendmail hangs here during boot /phk
% 				 */
% 				error = tsleep((caddr_t)&fip->fi_writers,
% 				    PCATCH | PSOCK, "fifoow", hz);
% 				vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
% 				if (error)
% 					goto bad;
% 			}

This should handle the case of a timeout specially.  tlseep() returns
EWOULDBLOCK for timeouts.  If we have a timeout then we should continue
so as to check for the loop condition (fip->fi_readers == 0).  However,
this makes a race obvious: if a reader comes and goes while the writer
is in the above tsleep(), then the writer will never see the reader --
it will wake up normally and then go back to sleep, except with the
timeout it will wake up abnormally 1 second later and return the bogus
result EWOULDBLOCK (== EAGAIN, which can't happen for blocking writes).
If there is no reader within the first second, then the timeout just
breaks blocking writes to fifos.

I think the loop condition needs to involve a has-been-a-reader-since-
this-write-started flag, or the loop simply shouldn't exist (provided
the wakeup only occurs if there has been a reader or a signal; then
we know that we had a reader if tsleep() returns 0).  SS_CANTSENDMORE
is a (negative logic) has-been-a-reader flag, but IIRC it is perfectly
unsuitable here because it must be cleared (negatively) on close so
that new writers aren't affected by previous readers.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103213903.I2945-100000>