Date: Thu, 18 Apr 2019 22:17:46 -0700
From: Mark Millard <marklmi@yahoo.com>
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>
Cc: Bruce Evans <brde@optusnet.com.au>, Konstantin Belousov <kib@freebsd.org>
Subject: Re: powerpc64 or 32-bit power context: FreeBSD lwsync use vs. th->th_generation handling (and related th-> fields) [Correction]
Message-ID: <0FD9ED28-EF4B-4A1C-9FCE-81C4D5BAEBF1@yahoo.com>
In-Reply-To: <50CFD7F1-6892-4375-967B-4713517C2520@yahoo.com>
References: <50CFD7F1-6892-4375-967B-4713517C2520@yahoo.com>
[I caught my mental mistake.]

On 2019-Apr-18, at 21:36, Mark Millard <marklmi@yahoo.com> wrote:

> First I review below lwsync behavior. It is based on a comparison/contrast
> paper for the powerpc vs. arm memory models. It sets context for later
> material specific to powerpc64 or 32-bit powerpc FreeBSD.
>
> "For a write before a read, separated by a lwsync, the barrier will ensure that the write is
> committed before the read is satisfied but lets the read be satisfied before the write has
> been propagated to any other thread."
>
> (By contrast, sync guarantees that the write has propagated to all threads before the
> read in question is satisfied, the read having been separated from the write by the
> sync.)
>
> Another wording in case it helps (from the same paper):
>
> "The POWER lwsync does *not* ensure that writes before the barrier have propagated to
> any other thread before sequent actions, though it does keep writes before and after
> an lwsync in order as far as [each thread is] concerned". (Original used plural form:
> "all threads are". I tried to avoid any potential implication of cross (hardware)
> "thread" ordering constraints for seeing the updates when lwsync is used.)
>
>
> Next I note FreeBSD powerpc64 and 32-bit powerpc details
> that happen to involve lwsync, though lwsync is not the
> only issue:
>
> atomic_store_rel_int(&th->th_generation, ogen);
>
> and:
>
> gen = atomic_load_acq_int(&th->th_generation);
>
> with:
>
> static __inline void \
> atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v) \
> { \
> \
>     powerpc_lwsync(); \
>     *p = v; \
> }
>
> and:
>
> static __inline u_##TYPE \
> atomic_load_acq_##TYPE(volatile u_##TYPE *p) \
> { \
>     u_##TYPE v; \
> \
>     v = *p; \
>     powerpc_lwsync(); \
>     return (v); \
> } \
>
> also:
>
> static __inline void
> atomic_thread_fence_acq(void)
> {
>
>     powerpc_lwsync();
> }
>
>
>
> First I list a simpler-than-full-context example to
> try to make things clearer . . .
>
> Here is a sequence, listing in an overall time
> order, omitting other activity, despite the distinct
> cpus, (N!=M):
>
>
> (Presume th->th_generation==ogen-1 initially, then:)
>
> cpu N: atomic_store_rel_int(&th->th_generation, ogen);
>        (same th value as for cpu M below)
>
> cpu M: gen = atomic_load_acq_int(&th->th_generation);
>
>
> For the above sequence:
>
> There is no barrier between the store and the later
> load at all. This is important below.
>
>
> So, if I have that much right . . .
>
> Now for more actual "load side" context:
> (Presume, for simplicity, that there is only one
> timehands instance instead of 2 or more timehands. So
> th does not vary below and is the same on both cpu's
> in the later example sequence of activity.)
>
> do {
>     th = timehands;
>     gen = atomic_load_acq_int(&th->th_generation);
>     *bt = th->th_offset;
>     bintime_addx(bt, th->th_scale * tc_delta(th));
>     atomic_thread_fence_acq();
> } while (gen == 0 || gen != th->th_generation);
>
> For simplicity of referring to things: I again show
> a specific sequence in time.
> I only show the
> &th->th_generation activity from cpu N, again for
> simplicity.
>
> (Presume timehands->th_generation==ogen-1 initially
> and that M!=N:)
>
> cpu M: th = timehands;
>        (Could be after the "cpu N" lines.)
>
> cpu N: atomic_store_rel_int(&th->th_generation, ogen);
>        (same th value as for cpu M)
>
> cpu M: gen = atomic_load_acq_int(&th->th_generation);
> cpu M: *bt = th->th_offset;
> cpu M: bintime_addx(bt, th->th_scale * tc_delta(th));
> cpu M: atomic_thread_fence_acq();
> cpu M: gen != th->th_generation
>        (evaluated to false or to true)
>
> So here:
>
> A) gen ends up with: gen==ogen-1 || gen==ogen
>    (either is allowed because of the lack of
>    any barrier between the store and the
>    involved load).
>
> B) When gen==ogen: there was no barrier
>    before the assignment to gen to guarantee
>    other th-> field-value staging relationships.

(B) is just wrong: seeing the new value (ogen) does
guarantee something about the other th-> field-value
staging relationships seen, given the lwsync before the
store and after the load.

> C) When gen==ogen: gen!=th->th_generation false
>    does not guarantee the *bt=. . . and
>    bintime_addx(. . .) activities were based
>    on a coherent set of th-> field-values.

Without (B), (C) does not follow.

> If I'm correct about (C) then the likes of the
> binuptime and sbinuptime implementations appear
> to be broken on powerpc64 and 32-bit powerpc
> unless there are extra guarantees always present.
>
> So have I found at least a powerpc64/32-bit-powerpc
> FreeBSD implementation problem?

No: I did not find a problem.

> Note: While I'm still testing, I've seen problems
> on the two 970MP based 2-socket/2-cores-each G5
> PowerMac11,2's that I've so far not seen on three
> 2-socket/1-core-each PowerMacs, two such 7455 G4
> PowerMac3,6's and one such 970 G5 PowerMac7,2.
> The two PowerMac11,2's are far more tested at
> this point.
> But proving that any test-failure is
> specifically because of (C) is problematical.
>
>
> Note: arm apparently has no equivalent of lwsync,
> just of sync (aka. hwsync and sync 0). If I
> understand correctly, PowerPC/Power has the weakest
> memory model of the modern tier-1/tier-2
> architectures and, so, they might be broken for
> memory model handling when everything else is
> working.
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
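
For reference, the corrected reasoning about (B) and (C) can be sketched in portable C11. This is only an illustration, not the FreeBSD code: struct th_sketch, th_init, th_update and th_read are hypothetical names, the fields are cut down to two, and the writer side is an assumption, since the quoted text shows only the final atomic_store_rel_int() and the reader loop. On POWER, compilers commonly lower the release store and the acquire fence used here to lwsync, which is the pairing the correction relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timehands (illustrative fields only). */
struct th_sketch {
	atomic_uint		gen;	/* 0 means "update in progress" */
	_Atomic uint64_t	offset;
	_Atomic uint64_t	scale;
};

static void
th_init(struct th_sketch *th, uint64_t offset, uint64_t scale)
{
	atomic_init(&th->offset, offset);
	atomic_init(&th->scale, scale);
	atomic_init(&th->gen, 1u);
}

/* Assumed writer: invalidate, update the fields, then publish a new gen. */
static void
th_update(struct th_sketch *th, uint64_t offset, uint64_t scale)
{
	unsigned ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);

	atomic_store_explicit(&th->gen, 0u, memory_order_relaxed);
	/* Order the gen = 0 store before the field stores below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;	/* never publish 0 as a valid generation */
	/* Release store: the field stores are visible to whoever sees ogen. */
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
}

/* Reader: same shape as the do/while loop quoted above. */
static void
th_read(struct th_sketch *th, uint64_t *offset, uint64_t *scale)
{
	unsigned gen;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*offset = atomic_load_explicit(&th->offset, memory_order_relaxed);
		*scale = atomic_load_explicit(&th->scale, memory_order_relaxed);
		/* Order the field loads before the gen re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	assert(gen != 0);	/* loop only exits on a published generation */
}
```

Under these assumptions, a reader that loads the new generation also sees every field store made before the release store, and a reader that picked up a field from a later update sees a changed (or zero) generation at the re-check and retries. That is why seeing gen==ogen does say something about the other fields.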