Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 1 Mar 2019 21:42:17 +0200
From:      Konstantin Belousov <kib@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Mark Millard <marklmi@yahoo.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID:  <20190301194217.GB68879@kib.kiev.ua>
In-Reply-To: <20190302043936.A4444@besplex.bde.org>
References:  <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com> <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Mar 02, 2019 at 05:40:58AM +1100, Bruce Evans wrote:
> On Fri, 1 Mar 2019, Konstantin Belousov wrote:
> 
> > On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote:
> >> [The new, trial code also has truncation occurring.]
> > In fact no, I do not think it is.
> >
> >> An example observation of diff_scaled having an overflowed
> >> value was:
> >>
> >> scale_factor            == 0x80da2067ac
> >> scale_factor*freq overflows unsigned, 64 bit representation.
> >> tim_offset              ==   0x3da0eaeb
> >> tim_cnt                 ==   0x42dea3c4
> >> tim_diff                ==    0x53db8d9
> >> For reference:                0x1fc9d43 == 0xffffffffffffffffull/scale_factor
> >> scaled_diff       == 0xA353A5BF3FF780CC (truncated to 64 bits)
> >>
> >> But for the new, trail code:
> >>
> >> 0x80da2067ac is 40 bits
> >>    0x53db8d9 is 27 bits
> >> So 67 bits, more than 63. Then:
> >>
> >>    x
> >> == (0x80da2067ac>>32) * 0x53db8d9
> >> == 0x80 * 0x53db8d9
> >> == 0x29EDC6C80
> >>
> >>    x>>32
> >> == 0x2
> >>
> >>    x<<32
> >> == 0x9EDC6C8000000000 (limited to 64 bits)
> >> Note the truncation of: 0x29EDC6C8000000000.
> > Right, this is how the patch is supposed to work.  Note that the overflow
> > bits 'lost' due to overflow of the left shift are the same bits that as
> > used to increment bt->sec:
> > 	bt->sec += x >> 32;
> > So the 2 seconds are accounted for.
> >
> >>
> >> Thus the "bintime_addx(bt, x << 32)" is still
> >> based on a truncated value.
> >
> > I must admit that 2 seconds of interval where the timehands where
> > not updated is too much.  This might be the real cause of all ppc
> > troubles.  I tried to see if the overflow case is possible on amd64,
> > and did not get a single case of the '> 63' branch executed during the
> > /usr/tests/lib/libc run.
> 
> The algorithm requires the update interval to be less than 1 second.
> th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after
> 1 second the value of the multiplication is approximately 2**64 so
> it overflows about then (depending on rounding).
> 
> The most useful timecounters are TSC's, and these give another overflow
> in tc_delta() after 1 second when their frequency is 4 GHz (except the
> bogus TSC-low timecounter reduces the frequency to below 2 binary GHz,
> so the usual case is overflow after 2 seconds).
As I said, I was unable to trigger the overflow on amd64.

> 
> > Actually, the same overflow-prone code exists in libc, so below is the
> > updated patch:
> > - I added __predict_false()
> > - libc multiplication is also done separately for high-order bits.
> > (fftclock counterpart is still pending).
> >
> > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> > index 3749e0473af..a14576988ff 100644
> > --- a/lib/libc/sys/__vdso_gettimeofday.c
> > +++ b/lib/libc/sys/__vdso_gettimeofday.c
> > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
> > #include <sys/time.h>
> > #include <sys/vdso.h>
> > #include <errno.h>
> > +#include <limits.h>
> > +#include <strings.h>
> > #include <time.h>
> > #include <machine/atomic.h>
> > #include "libc_private.h"
> > @@ -62,6 +64,7 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> > {
> > 	struct vdso_timehands *th;
> > 	uint32_t curr, gen;
> > +	uint64_t scale, x;
> > 	u_int delta;
> > 	int error;
> >
> > @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> > 			continue;
> > 		if (error != 0)
> > 			return (error);
> > -		bintime_addx(bt, th->th_scale * delta);
> > +		scale = th->th_scale;
> > +		if (__predict_false(fls(scale) + fls(delta) > 63)) {
> 
> This is unnecessarily pessimal.  Updates must be frequent enough to prevent
> tc_delta() overflowing, and it is even easier to arrange that this
> multiplication doesn't overflow (since the necessary update interval for
> the latter is constant).
> 
> `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and
> flls(scale) is an especially large pessimization.  I saw this on my
> calcru1() fixes -- the flls()s take almost as long as long long
> divisions when the use the pessimal C versions.
Ok, fixed.

> 
> The algorithm requires tc_delta() to be only 32 bits, since otherwise the
> multiplication would be 64 x 64 bits so would be much slower and harder
> to write.
> 
> If tc_freqency is far above 4GHz, then th_scale is far below 4G, so the
> scaling is not so accurate.  But 0.25 parts per billion is much more than
> enough.  Even 1 part per million is enough for a TSC, since TSC instability
> is more than 1ppm.  The overflows could be pushed off to 1024 seconds by
> dividing by 1024 at suitable places.  A 32-bit scale times a 64-bit delta
> would be simple compared with both 64 bits (much like the code here).
> 
> The code here can be optimized using values calculated at initialization
> time instead of fls*().  The overflow threshold for delta is approximately
> 2**64 / tc_frequency.
> 
> > +			x = (scale >> 32) * delta;
> > +			scale &= UINT_MAX;
> > +			bt->sec += x >> 32;
> > +			bintime_addx(bt, x << 32);
> > +		}
> > +		bintime_addx(bt, scale * delta);
> > 		if (abs)
> > 			bintime_add(bt, &th->th_boottime);
> 
> When the timecounter is the i8254, as it often was when timecounters
> were new, tc_windup() had to be called more often than every i8254 rollover
> (in practice once every hardclock tick), partly to keep tc_delta() small
> (since rollover gives a form of overflow).  This was not so easy to arrange.
> It requires not losing any hardclock ticks and also not having any with
> high latency and also complications to detect the rollover when there is
> small latency.  Most hardware is easier to handle now.  With tickless
> kernels, hardclock() is often not called for about 1 second, but it must
> be called at least that often to prevent the overflow here.

Updated patch.

diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
index 3749e0473af..fdefda08e39 100644
--- a/lib/libc/sys/__vdso_gettimeofday.c
+++ b/lib/libc/sys/__vdso_gettimeofday.c
@@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
 #include <sys/time.h>
 #include <sys/vdso.h>
 #include <errno.h>
+#include <limits.h>
+#include <strings.h>
 #include <time.h>
 #include <machine/atomic.h>
 #include "libc_private.h"
@@ -62,7 +64,8 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
 {
 	struct vdso_timehands *th;
 	uint32_t curr, gen;
-	u_int delta;
+	uint64_t scale, x;
+	u_int delta, scale_bits;
 	int error;
 
 	do {
@@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
 			continue;
 		if (error != 0)
 			return (error);
-		bintime_addx(bt, th->th_scale * delta);
+		scale = th->th_scale;
+#ifdef _LP64
+	scale_bits = ffsl(scale);
+#else
+	scale_bits = ffsll(scale);
+#endif
+		if (__predict_false(scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= UINT_MAX;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		if (abs)
 			bintime_add(bt, &th->th_boottime);
 
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..eedea5183c0 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	u_int			th_scale_bits;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -355,13 +356,22 @@ void
 binuptime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale, x;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		if (__predict_false(th->th_scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= UINT_MAX;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -388,13 +398,22 @@ void
 bintime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale, x;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		if (__predict_false(th->th_scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= UINT_MAX;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+#ifdef _LP64
+	th->th_scale_bits = ffsl(th->th_scale);
+#else
+	th->th_scale_bits = ffsll(th->th_scale);
+#endif
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190301194217.GB68879>