Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 Nov 2017 18:06:11 +0000
From:      Andrew Turner <andrew@fubar.geek.nz>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r326218 - head/sys/kern
Message-ID:  <F4676F95-5AA5-4838-92D2-0A164F7EB79C@fubar.geek.nz>
In-Reply-To: <62864c22-e458-2f13-2b33-9dc075ed2ef2@freebsd.org>
References:  <201711252341.vAPNf5Qx001464@repo.freebsd.org> <59383A7D-FC99-4748-9561-53D968C4B150@fubar.geek.nz> <62864c22-e458-2f13-2b33-9dc075ed2ef2@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

> On 26 Nov 2017, at 17:07, Nathan Whitehorn <nwhitehorn@freebsd.org> =
wrote:
>=20
>=20
>=20
> On 11/26/17 01:46, Andrew Turner wrote:
>> On 25 Nov 2017, at 23:41, Nathan Whitehorn <nwhitehorn@freebsd.org> =
wrote:
>>> Author: nwhitehorn
>>> Date: Sat Nov 25 23:41:05 2017
>>> New Revision: 326218
>>> URL: https://svnweb.freebsd.org/changeset/base/326218
>>>=20
>>> Log:
>>>  Remove some, but not all, assumptions that the BSP is CPU 0 and =
that CPUs
>>>  are numbered densely from there to n_cpus.
>>>=20
>>>  MFC after:	1 month
>>>=20
>>> Modified:
>>>  head/sys/kern/kern_clock.c
>>>  head/sys/kern/kern_clocksource.c
>>>  head/sys/kern/kern_shutdown.c
>>>  head/sys/kern/kern_timeout.c
>>>  head/sys/kern/sched_ule.c
>>>  head/sys/kern/subr_pcpu.c
>>>=20
>> ...
>>> Modified: head/sys/kern/subr_pcpu.c
>>> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D
>>> --- head/sys/kern/subr_pcpu.c	Sat Nov 25 23:23:24 2017	=
(r326217)
>>> +++ head/sys/kern/subr_pcpu.c	Sat Nov 25 23:41:05 2017	=
(r326218)
>>> @@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu)
>>> struct pcpu *
>>> pcpu_find(u_int cpuid)
>>> {
>>> +	KASSERT(cpuid_to_pcpu[cpuid] !=3D NULL,
>>> +	    ("Getting uninitialized PCPU %d", cpuid));
>>>=20
>>> 	return (cpuid_to_pcpu[cpuid]);
>>> }
>> This breaks on one arm64 simulator I have where the device tree lists =
8 cpus, but only 2 are enabled in the simulation. The ofw_cpu device =
nodes for these are cpu0 and cpu4 so the call to cpu_find in =
ofw_cpu_attach will hit this KASSERT when the CPU has not been enabled.
>>=20
>> Andrew
>>=20
>> cpu0: <Open Firmware CPU> on cpulist0
>> cpu1: <Open Firmware CPU> on cpulist0
>> panic: Getting uninitialized PCPU 1
>> cpuid =3D 0
>> time =3D 1
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>> 	 pc =3D 0xffff0000005f03e8  lr =3D 0xffff000000087048
>> 	 sp =3D 0xffff0000000105a0  fp =3D 0xffff0000000107b0
>=20
> That's unfortunate. Removing the new KASSERT is probably not the right =
option, since all it is doing is indicating a pre-existing bug. =
Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU =
4, which I suppose it was previously happily doing (it would only get =
dereferenced if you had cpufreq support, hence no previous panic).

I would expect cpu4 to have a non-NULL value as its cpu pointer will =
have been set. It=E2=80=99s cpu1 that is the issue as it=E2=80=99s in =
the device tree, but doesn=E2=80=99t exist in =E2=80=9Chardware=E2=80=9D. =
Because of this it fails when we try to enable it so free it=E2=80=99s =
pcpu data clear the pointer.

>=20
> A super quick-and-dirty fix would be to be fail attach on =
CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy =
behavior but without the panic. If you do not actually use ofw_cpu for =
anything, we could also just disable it temporarily on your platform =
(it's off for some PPC systems, you will note, for similar reasons).

We used to use it to find CPUs to start, however this is no longer the =
case.

>=20
> A real fix is somewhat complex, since the driver relies on being able =
to get a mapping from platform-specific numbers in the device tree to an =
entry in pcpu, which intrinsically relies on reverse-engineering the =
platform's mapping between some kind of hardware CPU ID and the kernel =
CPU numbering. I can't think of a way to do that internally. We could =
make some kind of platform macro, but the whole concept is even somewhat =
dubious at the moment since a number of IBM systems with SMT don't even =
have a 1:1 relationship between CPUs as FreeBSD defines them and device =
tree nodes, so it's possible we need a serious rearchitecture of the =
driver.

On arm64 we use the ID from the device tree to assign the cpuid so there =
is a 1:1 mapping between the two. In most cases both are identical, the =
only case that=E2=80=99s not correct is when we boot from a non-zero =
CPU, and I know of only one SoC where this is true.

Andrew




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?F4676F95-5AA5-4838-92D2-0A164F7EB79C>