Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 8 Jun 2010 05:41:39 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Martin Simmons <martin@lispworks.com>
Cc:        freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org
Subject:   Re: kern/147647: select wakes after 24 hours even if timeout is longer
Message-ID:  <20100608043255.K2970@besplex.bde.org>
In-Reply-To: <201006071200.o57C0tcF097821@www.freebsd.org>
References:  <201006071200.o57C0tcF097821@www.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 7 Jun 2010, Martin Simmons wrote:

>> Description:
> The select system call wakes indicating timeout after 24 hours even if the timeout specifies a longer delay.
>
>> Fix:
> I think the problem is that seltdwait is implemented using cv_timedwait_sig, which returns EWOULDBLOCK on timeout.  The calls to seltdwait in kern_select etc limit the timeout to a maximum of 24 * 60 * 60 * hz but interpret EWOULDBLOCK as a reason to exit rather than checking the actual timeout.

The bug may be mainly that the limiting code is garbage.  It defeats the
careful scaling of tvtohz(), and invokes undefined behaviour in the check
if hz is preposterously large.

I use the following fix in FreeBSD-~5.

% Index: sys_generic.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/kern/sys_generic.c,v
% retrieving revision 1.131
% diff -u -2 -r1.131 sys_generic.c
% --- sys_generic.c	5 Apr 2004 21:03:35 -0000	1.131
% +++ sys_generic.c	13 Aug 2009 11:21:29 -0000
% @@ -830,13 +819,10 @@
%  		ttv = atv;
%  		timevalsub(&ttv, &rtv);
% -		timo = ttv.tv_sec > 24 * 60 * 60 ?
% -		    24 * 60 * 60 * hz : tvtohz(&ttv);
% +		timo = tvtohz(&ttv);
%  	}
% 
%  	/*
% -	 * An event of interest may occur while we do not hold
% -	 * sellock, so check TDF_SELECT and the number of
% -	 * collisions and rescan the file descriptors if
% -	 * necessary.
% +	 * An event of interest may have occurred while we did not hold
% +	 * sellock.  Check for this and rescan if necessary.
%  	 */
%  	mtx_lock_spin(&sched_lock);
% @@ -1016,12 +1008,8 @@
%  		ttv = atv;
%  		timevalsub(&ttv, &rtv);
% -		timo = ttv.tv_sec > 24 * 60 * 60 ?
% -		    24 * 60 * 60 * hz : tvtohz(&ttv);
% +		timo = tvtohz(&ttv);
%  	}
% -	/*
% -	 * An event of interest may occur while we do not hold
% -	 * sellock, so check TDF_SELECT and the number of collisions
% -	 * and rescan the file descriptors if necessary.
% -	 */
% +
% +	/* Rescan if necessary, as above. */
%  	mtx_lock_spin(&sched_lock);
%  	if ((td->td_flags & TDF_SELECT) == 0 || nselcoll != ncoll) {

This also fixes style bugs in an unrelated comment, and has the same fixes
for poll().

When hz is preposterously large, the 24 * 60 * 60 * hz invokes undefined
behaviour by overflowing.  This bug is missing in tvtohz().

When hz is merely large, the timeout in seconds is somewhat limited and
the bug probably still occurs.  tvtohz() returns an int, so the timeout
is limited to INT_MAX / hz seconds (actually 1 or 2 ticks less).  tvtohz()
clamps to (INT_MAX - 1) in this case (1 less than the max so that 1 can
be added without checking).  The too-large default hz of 1000 allows
the timeout to be up to nearly 25 days.

seltdwait() (like any function that can time out) is supposed to fail
with EWOULDBLOCK if the timeout expires, since this is the API for
timeout expiry and seltdwait() has no knowledge of any longer timeout
at a higher level.  select() and poll() should check if the timeout
expired and restart with a reduced timeout, but apparently don't.

These bugs are both missing in kern_nanosleep().  Smaller bugs involving
sleeping unnecessarily long are present there too.  Clock drift is
normally only a few seconds per day, but innacuracies in the timeout
calibration can be much larger than that (I think up to about 40 seconds
per day is possible and unavoidable at the hardware level for an i8254
clk0 with the too-large default hz of 1000).  If these innacuracies
result in a timeout being late then there is no way to recover.

I just noticed another bug in nanosleep().  POSIX.1-2001 requires it
to use CLOCK_REALTIME, but in FreeBSD it uses CLOCK_MONOTONIC.
POSIX.1-2001 provides clock_nanosleep() to support specifying the
clock, but FreeBSD doesn't support this yet.  The POSIX rationale
says that nanosleep() was intentionally not modified to use
CLOCK_MONOTONIC.  I couldn't find anything in POSIX about the clock
used for select() and poll() timeouts.  Only CLOCK_MONOTONIC makes
sense, but that is true for nanosleep() too.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100608043255.K2970>