Date:      Sat, 25 Feb 2017 01:05:59 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>, mjg@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, svn-src-head@freebsd.org, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
Message-ID:  <477BA631-AB85-4E77-8BA3-CD2AFAD5E405@dsl-only.net>
In-Reply-To: <EB9DBDFA-BAE9-4BFF-8E8B-BF7698362A11@dsl-only.net>
References:  <2FD12B8F-2255-470A-98D4-2DCE9C7495F5@dsl-only.net> <20170220191044.GA8526@dft-labs.eu> <5D5235E1-6F84-4329-8ED5-35FCDB0A6A71@dsl-only.net> <20170225002300.GC19697@dft-labs.eu> <12339EDD-5663-40E0-8553-821EF9B6CDEB@dsl-only.net> <EB9DBDFA-BAE9-4BFF-8E8B-BF7698362A11@dsl-only.net>

On 2017-Feb-24, at 11:46 PM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Feb-24, at 8:25 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>
>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>> [Back to the powerpc64 context.]
>>>>
>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>>
>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>> [Note: I experiment with clang-based powerpc64 builds,
>>>>>> reporting problems that I find. Justin is familiar
>>>>>> with this, as is Nathan.]
>>>>>>
>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>> ended up with random panics and hangs in fairly short
>>>>>> order after booting.
>>>>>>
>>>>>> Some approximate bisecting for the kernel led to:
>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>> for a different version before a failure happens)
>>>>>>
>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>> vs.
>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>>
>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>> gradually added.)
>>>>>>
>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>> kernel.
>>>>>>
>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>> problems.
>>>>>>
>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>> problems.
>>>>>>
>>>>>> Of course I did not try them all in either direction. :)
>>>>>>
>>>>>
>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>> r313996.
>>>>>
>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>> fail without the patch.
>>>>>
>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>> of failed fcmpset' saga.
>>>>>
>>>>> --
>>>>> Mateusz Guzik <mjguzik gmail.com>
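A minimal sketch of the failure mode described above, written with C11 <stdatomic.h> rather than FreeBSD's atomic(9) API. It assumes fcmpset semantics like those of atomic_fcmpset_int (on a match, store and return nonzero; otherwise return 0 and normally write the observed value back through the expected-value pointer). Like the cpu_tick() trick, it injects a failure on every other call, and that failure leaves the caller's expected value stale. The names sim_fcmpset, lock_acquire, fail_counter and UNOWNED are hypothetical, and this is not the code from r313996.

```c
#include <assert.h>
#include <stdatomic.h>

#define UNOWNED 0u /* hypothetical "lock is free" value */

/* Drives the injected failures: an odd value makes the next call fail. */
static unsigned fail_counter;

/*
 * Hypothetical fcmpset model: on success stores 'src' and returns 1.
 * On a real mismatch it returns 0 and refreshes '*expect' with the
 * observed value (as atomic_compare_exchange_strong does). Every other
 * call instead fails WITHOUT touching '*expect', mimicking a failure
 * that leaves the caller holding a stale ("not re-read") value.
 */
static int sim_fcmpset(atomic_uint *dst, unsigned *expect, unsigned src)
{
    if (fail_counter++ % 2)
        return 0;
    return atomic_compare_exchange_strong(dst, expect, src);
}

/*
 * Spin-acquire loop that tolerates such failures: after any failed
 * fcmpset it decides what to do from the value it holds, so a stale
 * UNOWNED just leads to another attempt instead of being treated as
 * "the lock is owned". Returns the number of loop iterations taken.
 */
static int lock_acquire(atomic_uint *lock, unsigned tid)
{
    unsigned v = UNOWNED;
    int tries = 0;

    assert(tid != UNOWNED); /* an owner id must differ from "free" */
    for (;;) {
        tries++;
        if (v == UNOWNED) {
            if (sim_fcmpset(lock, &v, tid))
                return tries;
            continue; /* v may be stale or a new owner: re-check it */
        }
        /* Owned by someone else: spin on a fresh read until released. */
        v = atomic_load(lock);
    }
}
```

For example, with the lock word at UNOWNED and fail_counter set to 1, the first attempt fails without refreshing v and the second succeeds, so lock_acquire returns 2 and the lock word holds the caller's id. A loop that jumped to its "lock is owned" handling after any failed fcmpset would instead act on the stale UNOWNED value.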
>>>>
>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>>
>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>> the powerpc64 panics for "spin lock held too long".
>>>>
>>>
>>> All right, play time is over.
>>>
>>> Can you please:
>>> 1. verify r313254 is stable for you
>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>> the test?
>>>
>>> This is a workaround which effectively disables the powerpc-specific
>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>> hardware to test right now and my attempts to boot in qemu also failed.
>>>
>>> That said, it does not look like there are general fcmpset bugs left, and
>>> the remaining issue seems powerpc-specific.
>>>
>>> If this works, I'll commit the workaround for the time being, as in a few
>>> weeks I'd like to start merging the work back to stable/11.
>>>
>>> --
>>> Mateusz Guzik <mjguzik gmail.com>
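The workaround quoted above makes powerpc's fcmpset a wrapper around cmpset. A minimal sketch of that general idea, again in C11 <stdatomic.h> rather than the kernel's atomic(9) functions; model_cmpset and fcmpset_via_cmpset are hypothetical names, and this is not the contents of ppc.diff. A cmpset only reports success or failure, so the wrapper re-reads the target on failure to fill in the expected value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical cmpset model: stores 'src' and returns 1 only if '*dst'
 * equals 'expect'. Like a cmpset primitive it does not report the value
 * it observed; the local copy keeps the caller's value untouched.
 */
static int model_cmpset(atomic_uint *dst, unsigned expect, unsigned src)
{
    unsigned seen = expect;
    return atomic_compare_exchange_strong(dst, &seen, src);
}

/*
 * fcmpset layered on cmpset: on failure, re-read '*dst' so the caller
 * gets a current value in '*expect'. The re-read is a separate load, so
 * it may already be outdated, or even equal the old expected value;
 * callers still have to re-check it rather than assume it changed.
 */
static int fcmpset_via_cmpset(atomic_uint *dst, unsigned *expect, unsigned src)
{
    assert(dst != NULL && expect != NULL);
    if (model_cmpset(dst, *expect, src))
        return 1;
    *expect = atomic_load(dst);
    return 0;
}
```

For example, with the word holding 5 and *expect holding 3, the first call returns 0 and sets *expect to 5; a second call with the refreshed value succeeds and stores the new value.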
>
>> I've started a self-hosted powerpc64 -r313254 build
>> based on running the -r313266 kernel. (The context I
>> sometimes do cross builds in is tied up with other
>> things. -r313266 is what my prior bisection came up
>> with as the last apparently-working kernel at the
>> time.)
>>
>> So it will be a while before I have a -r313254 in
>> place to try: the self-hosted build takes longer
>> and so will not be installed for a while.
>>
>> To judge stability I'll probably have -r313254 build
>> the patched update that you want me to test, initially
>> doing a cleanworld. So that too will take a while.
>>
>> (The above wording presumes all goes well.)
>>
>> I'll let you know as I go along if I run into anything
>> interesting.
>>
>>
>> My builds are rebuilding both world and kernel since
>> what turns into /usr/include/sys/* has changes in your
>> patch.
>>
>> The builds are without MALLOC_PRODUCTION but are
>> otherwise not debug builds.
>>
>>
>> I've not seen anything indicating that anyone has
>> been trying TARGET_ARCH=powerpc. I've been trying
>> TARGET_ARCH=powerpc64.
>>
>> While I do not have access to a true
>> TARGET_ARCH=powerpc machine currently, such a build
>> can be used on a PowerMac G5 so-called "Quad Core".
>> So I could eventually build and try such on the one
>> powerpc family machine that I currently have access
>> to.
>>
>> clang 3.9.1 has a significant code generation problem
>> for TARGET_ARCH=powerpc, so I'd have to use
>> a gcc 4.2.1 based build for that sort of experiment.
>> (There is no xtoolchain for 32-bit powerpc.)
>>
>> I use clang 3.9.1 or xtoolchain for
>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
>> in recent times. My primary powerpc family use has
>> been to experiment with building based on the
>> modern libc++ and reporting issues discovered in the
>> attempts. This explains the clang/xtoolchain context.
>>
>> clang 3.9.1 has major problems with C++ exception
>> handling for both powerpc64 and powerpc, but a
>> lot of FreeBSD is independent of throwing C++
>> exceptions. By contrast, xtoolchain-based builds work
>> for C++ exception handling, but lib32 fails
>> to operate when built by an xtoolchain build.
>
> -r313254 had no trouble booting or building
> the patched version or anything else involved
> in getting there or installing.
>
> But the patched version failed quickly just
> attempting cleanworld's recursive remove. (So
> it did boot and let me log in.) The panic
> description was:
>
> panic: vn_finished_secondary_write: neg cnt
>
>
> The sources that are different from svn's -r313254
> are (some tied to arm64 experiments, most everything
> else tied to powerpc64 and/or powerpc, those not
> from your patches are long standing from my
> investigations or from Justin H.):
>
> # svnlite status /usr/src | sort
> . . . (ignoring the ? lines) . . .
> M       /usr/src/bin/sh/jobs.c
> M       /usr/src/bin/sh/miscbltin.c
> M       /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
> M       /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
> M       /usr/src/lib/csu/powerpc64/Makefile
> M       /usr/src/libexec/rtld-elf/Makefile
> M       /usr/src/sys/arm/arm/gic.c
> M       /usr/src/sys/boot/ofw/Makefile.inc
> M       /usr/src/sys/boot/powerpc/Makefile.inc
> M       /usr/src/sys/boot/powerpc/kboot/Makefile
> M       /usr/src/sys/boot/uboot/Makefile.inc
> M       /usr/src/sys/conf/kmod.mk
> M       /usr/src/sys/ddb/db_main.c
> M       /usr/src/sys/ddb/db_script.c
> M       /usr/src/sys/kern/init_main.c
> M       /usr/src/sys/kern/kern_condvar.c
> M       /usr/src/sys/kern/kern_lock.c
> M       /usr/src/sys/kern/kern_lockstat.c
> M       /usr/src/sys/kern/kern_mutex.c
> M       /usr/src/sys/kern/kern_rwlock.c
> M       /usr/src/sys/kern/kern_sx.c
> M       /usr/src/sys/kern/kern_synch.c
> M       /usr/src/sys/kern/kern_thread.c
> M       /usr/src/sys/kern/subr_lock.c
> M       /usr/src/sys/kern/vfs_default.c
> M       /usr/src/sys/kern/vfs_subr.c
> M       /usr/src/sys/powerpc/include/atomic.h
> M       /usr/src/sys/powerpc/ofw/ofw_machdep.c
> M       /usr/src/sys/sys/lock.h
> M       /usr/src/sys/sys/lockmgr.h
> M       /usr/src/sys/sys/lockstat.h
> M       /usr/src/sys/sys/mutex.h
> M       /usr/src/sys/sys/rwlock.h
> M       /usr/src/sys/sys/sdt.h
> M       /usr/src/sys/sys/sx.h
> M       /usr/src/sys/sys/systm.h

To recover from the problem and again have a buildworld
buildkernel in place, I've booted using:

A) The -r313254 kernel without your patches (kernel.old).
B) The -r313254 world (which had your patches in its
   build).

I've reverted /usr/src/ so that it no longer has your
patches (it still has my own changes from prior activity).

I repeated the cleanworld to let it finish after its
prior failure (which happened during SSD trim activity).

I've started buildworld buildkernel (with -j 4 as is
normal for my context).

So far this combination seems to be working fine. This
suggests that the sys/sys/*.h files that ended up in
/usr/include/sys/ and the sys/powerpc/include/atomic.h
that ended up in /usr/include/machine/ were not a problem
as used in the world code, since those uses are still in
place in the binaries now running. Only the kernel
binaries seem to be a problem (not necessarily all of
them).

===
Mark Millard
markmi at dsl-only.net



