Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 19 Sep 2007 14:55:35 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Julian Elischer <julian@elischer.org>
Cc:        Andre Oppermann <andre@freebsd.org>, arch@freebsd.org
Subject:   Re: 64bit ticks, was Re: Changing p_swtime and td_slptime to ticks
Message-ID:  <20070919133525.E78370@besplex.bde.org>
In-Reply-To: <46F0656A.7030802@elischer.org>
References:  <20070917165657.B558@10.0.0.1> <46EF644E.9050207@elischer.org> <20070918012555.G558@10.0.0.1> <46EFE4BD.4030505@freebsd.org> <20070918142115.C558@10.0.0.1> <20070918153536.D558@10.0.0.1> <46F05F88.5060809@errno.com> <46F0656A.7030802@elischer.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 18 Sep 2007, Julian Elischer wrote:

> Sam Leffler wrote:
>> Jeff Roberson wrote:
>>> On Tue, 18 Sep 2007, Jeff Roberson wrote:
>>> 
>>>> On Tue, 18 Sep 2007, Andre Oppermann wrote:
>>>>> ...
>>>>> ticks is 2^31 on x86 and at HZ=1000 is wraps within a reasonable
>>>>> short uptime.  You have to make sure that your code handles that
>>>>> correctly or you run into lots of strange effects which are almost
>>>>> impossible to reproduce.  In TCP we've got bitten by that.
>>>> 
>>>> Thanks Andre, this is a good point.  For the td_slptime I don't think 
>>>> it's of practical concern.  However, for swtime I think I will convert it 
>>>> then to seconds from boot.

IIRC, the patch only uses deltas of `ticks', so its wraparound doesn't
matter, provided you actually use it often enough.  Often enough is
at least once every 497 days with HZ = 100, or just every 49.7 days
with HZ = 1000.

>>> Is there a good reason for not making ticks 64bit?

I think this would be only a small pessimization, unless accesses to
`ticks' are fully locked or made atomic.  The clock interrupt handler
can interrupt almost anything, but almost nothing locks out clock
interrupts.  The lock for the write access to `ticks' in hardclock() is
(rather bogusly) callout_lock, and sofclock() uses this lock for read
accesses without really trying, but generic code can't be expected to
know about this and in fact doesn't know about it -- `ticks' seems to
be used mainly in netinet where there is no callout_lock'ing in sight.
Without locking: with a 32-bit `ticks', read accesses not using atomic_read()
are probably atomic, so the race would just give an out of date value,
and this shouldn't matter -- the reader cannot tell the difference
between a value that is out of date when it is read or one that becomes
out of date 1 instruction after it is read; with a 64-bit `ticks' on
32-bit machines, read accesses would not be atomic and the value would
sometimes be garbage.

>>> math involving this 
>>> value is relatively infrequent.  Bruce?  Any comments?  It'd sure let us 
>>> forget all of these counter wrapping problems.
>> 
>> ticks is used a lot.  I'd rather set hz back to 100 by default.  This 
>> approach is a perfect example of ignoring low-end platforms.

I agree.  However, one variable isn't going to make much difference.  fs
code is full of unecessary 64-bit calculations but no one notices the
real-time pessimization from this, at least on non-low-end platforms (I
only notice the code bloat).

> but it still needs to work on systems with high hz values.

Just changing its type won't fix all such problems, but will cause new
ones.  E.g., in tcp_timer():

% 	int tick = ticks;

This will truncate `ticks'.  Truncation is the right thing to do in most
places that only need a small delta, but...

% ...
% 	CTR6(KTR_NET, "%p %s inp %p active %x tick %i nextc %i",
% 	    tp, __func__, inp, tt->tt_active, tick, tt->tt_nextc);

Easy to detect breakage of the printf format.  Old printf format errors:
%p for types other than void *.

% ...
% 	if (tick < tt->tt_nextc)
% 		goto rescan;

Comparison of truncated value, OK (since both tick and tt_nextc have
type int and int overflow is benign on all supported machines).  Types
should be u_int to avoid undefined behaviour on int overflow.

% ...
% 		if (tp->t_state != TCPS_TIME_WAIT &&
% 		   (ticks - tp->t_rcvtime) <= tcp_maxidle)
% 			tcp_timer_activate(tp, TT_2MSL, tcp_keepintvl);

This is already broken on all supported machines, since t_rcvtime has
type u_long which is incompatible with the type of both `ticks' and
tcp_maxidle (both int).  All the types should be truncated to a common
unsigned type to get defined behaviour that is not too hard to understand.
I think the current behaviour is benign overflow -- t_rcvtime is set
to an earlier value of `ticks', and the difference never grows very
large.  However, changing `ticks' to 64 bits breaks this on non-64-bit
machines, -- `ticks' may grow large, but the copy of it in t_rcvtime
is limited to u_long = 32 bits on non-64-bit machines, so after about
2^32 ticks of uptime (ticks - tp->t_rcvtime) will always exceed
tcp_maxidle.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070919133525.E78370>