Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 28 Feb 2019 16:55:42 +0200
From:      Konstantin Belousov <kib@freebsd.org>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes
Message-ID:  <20190228145542.GT2420@kib.kiev.ua>
In-Reply-To: <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com>
References:  <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Feb 28, 2019 at 05:06:23AM -0800, Mark Millard via freebsd-ppc wrote:
> Basic context:
> 
> The code for sleeps of various forms depends
> on calls to:
> 
> static __inline sbintime_t
> sbinuptime(void)
> {
>         struct bintime _bt;
> 
>         binuptime(&_bt);
>         return (bttosbt(_bt));
> }
> 
> and comparisons with the return values, such
> as checking for timeouts. The upper 32 bits
> of the unsigned 64 bit result has the seconds
> and the lower 32 bits has the fraction as
> a multiplier of 1sec/(2**64).
> 
> An observed problem is that later sbinuptime calls
> sometimes end up with smaller values than earlier
> ones. (Past lisy freebsd-ppc messages give details.)
> This makes for problems with checking for timeouts
> when using later sbinuptime() calls after a timeout
> was initially detected against an earlier value:
> 
> A.0) timercb getting the earlier sbinuptime() value
> A.1) callout_process using that to detect a timeout,
> B)   sleepq_timeout checking the timeout again,
>      using a separate sbinuptime() call.
> 
> Some details about example values, overflows, and such follow.
> 
> I used the following sort of hacked code to report values when
> overflows happen:
> 
> #if defined(__powerpc64__) && defined(AIM)
> void
> binuptime(struct bintime *bt)
> {
>         struct timehands *th;
>         u_int gen;
> 
>         struct timecounter *tc; // HACK!!!
>         u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>         uint64_t freq, scale_factor, diff_scaled; // HACK!!!
> 
>         do {
>                 th = timehands;
>                 tc = th->th_counter; // HACK!!!
>                 gen = atomic_load_acq_int(&th->th_generation);
>                 *bt = th->th_offset;
>                 tim_cnt= tc->tc_get_timecount(tc); // HACK!!! (steps of tc_diff with values recorded)
>                 tim_offset= th->th_offset_count; // HACK!!!
>                 tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask; // HACK!!!
>                 scale_factor= th->th_scale; // HACK!!!
>                 diff_scaled= scale_factor * tim_diff; // HACK!!!
>                 bintime_addx(bt, diff_scaled); // HACK!!!
>                 freq= tc->tc_frequency; // HACK!!!
>                 atomic_thread_fence_acq();
>         } while (gen == 0 || gen != th->th_generation);
> 
>         if (*(volatile uint64_t*)0xc0000000000000f0==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>                 *(volatile uint64_t*)0xc0000000000000d0= freq;
>                 *(volatile uint64_t*)0xc0000000000000d8= scale_factor;
>                 *(volatile    u_int*)0xc0000000000000e0= tim_offset;
>                 *(volatile    u_int*)0xc0000000000000e4= tim_cnt;
>                 *(volatile    u_int*)0xc0000000000000e8= tim_diff;
>                 *(volatile uint64_t*)0xc0000000000000f0= diff_scaled;
>                 *(volatile uint64_t*)0xc0000000000000f8= scale_factor*freq;
>                 __asm__ ("sync");
>         }
> }
> #else
> . . .
> #endif
> 
> (mtfb() is used to provide the tc->tc_get_timecount(tc)
> value --but only the lower 32 bits are extracted and
> returned.)
> 
> Basically whenever tim_diff is such that:
> 
> (0xffffffffffffffff/scale_factor)<tim_dif
> 
> then diff_scaled overflows an unsigned, 64 bit representation,
> ending up with just the least 64 bits. This truncated value
> ends up being used in:
> 
>                 bintime_addx(bt, diff_scaled);
> 
> Observed consistently for tc->tc_frequency:
> 
> tc->tc_frequency == 0x1fca055 (i.e., 33333333)
> 
> ( tc->tc_counter_mask is 0xfffffffful as well. )
> 
> An example observation of diff_scaled having an overflowed
> value is:
> 
> scale_factor            == 0x80da2067ac
> scale_factor*freq overflows unsigned, 64 bit representation.
> tim_offset              ==   0x3da0eaeb
> tim_cnt                 ==   0x42dea3c4
> tim_diff                ==    0x53db8d9
> For reference:                0x1fc9d43 == 0xffffffffffffffffull/scale_factor
> scaled_diff       == 0xA353A5BF3FF780CC (truncated to 64 bits)
> 
> So scale_factor * tim_diff leaves diff_scaled truncated to
> the least significant 64 bits, which does not preserve
> ordering properties.
> 
> Another example:
> 
> scale_factor      ==       0x80d95962c0
> scale_factor*freq == 0xfffffffffd65c9c0
> tim_offset        ==         0x4d1fb8e2
> tim_cnt           ==         0x4d1fb8e1
> tim_diff          ==         0xffffffff
> For reference:                0x1fca055 == 0xffffffffffffffffull/scale_factor
> scaled_diff       == 0xD959623F26A69D40 (truncated to 64 bits)
> 
> Again the diff_scaled holds a truncated value from
> scale_factor * tim_diff .
> 
> Another example:
> 
> scale_factor      ==       0x80da20c940
> scale_factor*freq overflows unsigned, 64 bit representation.
> tim_offset        ==         0x9a7f5cdb
> tim_cnt           ==          0xb26bbd5
> tim_diff          ==         0x70a75efa
> For reference:                0x1fc9d41 == 0xffffffffffffffffull/scale_factor
> scaled_diff       == 0xB3AC715C56AA0880 (truncated to 64 bits)
> 
> Again the diff_scaled holds a truncated value from
> scale_factor * tim_diff .
> 
> Note that the scale_factor does vary.
> 

Try the following (I did not even booted it).  If worked out, ffclock
counterpart also needs the patching.

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..19e81bbf023 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -355,13 +355,20 @@ void
 binuptime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		if (fls(scale) + fls(delta) > 63) {
+			bt->sec += (scale >> 32) * delta;
+			scale &= UINT_MAX;
+		}
+		bintime_addx(bt, scale * delta);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -388,13 +395,20 @@ void
 bintime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		if (fls(scale) + fls(delta) > 63) {
+			bt->sec += (scale >> 32) * delta;
+			scale &= UINT_MAX;
+		}
+		bintime_addx(bt, scale * delta);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190228145542.GT2420>