Date:      Fri, 16 Jul 2021 21:32:49 +0200
From:      Michael Tuexen <tuexen@freebsd.org>
To:        Andrew Turner <andrew@fubar.geek.nz>
Cc:        Mark Millard <marklmi@yahoo.com>, freebsd-arm@freebsd.org
Subject:   Re: register x18
Message-ID:  <06B96A5D-AF14-4EEC-8D11-B91F9683A0E8@freebsd.org>
In-Reply-To: <D18F32F8-9BFD-4192-BC9E-59ABAC98EB88@fubar.geek.nz>
References:  <86EC9C12-F90C-4D0C-BFA3-41986C9F07B5@freebsd.org> <BFF3BCE7-3387-4A7C-A71C-890223CDDF18@yahoo.com> <32C24DDC-C8A1-43CD-9220-8009B229E452@freebsd.org> <ACD1D84A-5923-4106-AAE4-35FB7A182B0F@fubar.geek.nz> <4361A215-BB47-4166-BC3F-386E7834B788@freebsd.org> <D18F32F8-9BFD-4192-BC9E-59ABAC98EB88@fubar.geek.nz>

> On 16. Jul 2021, at 17:53, Andrew Turner <andrew@fubar.geek.nz> wrote:
>
>
>> On 16 Jul 2021, at 17:07, Michael Tuexen <tuexen@freebsd.org> wrote:
>>
>>> On 16. Jul 2021, at 14:51, Andrew Turner <andrew@fubar.geek.nz> wrote:
>>>
>>>
>>>> On 16 Jul 2021, at 13:08, tuexen@freebsd.org wrote:
>>>>
>>>>> On 16. Jul 2021, at 04:06, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2021-Jul-15, at 17:40, Michael Tuexen <tuexen at freebsd.org> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> register x18 seems to be special. What is it used for in FreeBSD?
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>
>>>>> https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers
>>>>>
>>>>> reports:
>>>>>
>>>>> QUOTE
>>>>> 	• X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don't assign a special meaning to it.
>>>>> END QUOTE
>>>>>
>>>>> So, special, yes. But I do not know what the "platform ABI" usage
>>>>> for it might be on FreeBSD. So, for the most part, this does not
>>>>> well-answer your question. Sorry.
>>>> Yepp, I found the above text. However, x18 seems to be used when accessing
>>>> global variables. I am looking at a panic, where the system panics on accessing
>>>> a global variable, which can be controlled by sysctl.
>>>> It seems that x18 does not have the expected value, but it is also not set in
>>>> the function...
>>>
>>> X18 is used to store the pointer to the pcpu data. It should only ever be
>>> set when we enter the kernel from userland by the exception handler.
>> Hi Andrew,
>>
>> thanks for the response. Hmm. I was hoping that the answer would help me to understand
>> a panic that I'm observing when stress testing the TCP RACK stack. I'm transferring
>> 10GB via scp and at some point in time (not right at the beginning), the machine panics.
>> The machine is an eMAG system.
>>
>> Here is what I know:
>>
>> Initially it panics multiple times (always at the same place) in
>> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16540
>> when it is trying to read V_tcp_map_entries_limit.
>>
>> I discussed this with rrs@ and since we had no clue, I tried to just compile
>> out the if condition.
>>
>> Then it panicked in
>> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16928
>> at
>> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n15664
>> which is basically the next place where a V_ variable is accessed.
>>
>> Please note that for debugging I'm using a kernel without VIMAGE support,
>> since we initially thought that it might be related to a VNET bug.
>>
>> So I decided to look at the disassembly of rack_sndbuf_autoscale (I added some comments):
>>
>>   0xffff000001388a6c <+0>:	stp	x29, x30, [sp, #-32]!
>>   0xffff000001388a70 <+4>:	str	x19, [sp, #16]
>>   0xffff000001388a74 <+8>:	mov	x29, sp
>>   0xffff000001388a78 <+12>:	ldr	x9, [x0, #24]	// x9 = rack->tp;
>>   0xffff000001388a7c <+16>:	ldr	w8, [x0, #188]	// w8 = rack->r_ctl.cwnd_to_use
>>   0xffff000001388a80 <+20>:	adrp	x12, 0xffff0000013ac000
>>   0xffff000001388a84 <+24>:	ldr	w10, [x9, #52]	// w10 = tp->snd_wnd;
>>   0xffff000001388a88 <+28>:	ldr	x11, [x18]
>>   0xffff000001388a8c <+32>:	ldr	x11, [x11, #1256]
>>   0xffff000001388a90 <+36>:	cmp	w8, w10
>>   0xffff000001388a94 <+40>:	csel	w10, w8, w10, cc  // cc = lo, ul, last	// min(rack->r_ctl.cwnd_to_use, tp->snd_wnd);
>> => 0xffff000001388a98 <+44>:	ldr	x11, [x11, #40]
>>   0xffff000001388a9c <+48>:	ldr	x12, [x12, #2752]
>>   0xffff000001388aa0 <+52>:	ldr	w11, [x11, x12]	// w11 = V_tcp_do_autosndbuf ???
>>   0xffff000001388aa4 <+56>:	cbz	w11, 0xffff000001388be0 <rack_sndbuf_autoscale+372>
>>   0xffff000001388aa8 <+60>:	ldr	x8, [x0, #32]	// x8 = rack->rc_inp
>>   0xffff000001388aac <+64>:	ldr	x19, [x8, #120]	// x19 = so = x8->inp_socket
>>   0xffff000001388ab0 <+68>:	ldrb	w8, [x19, #817]	// w8 = (x19->so_snd.sb_flags >> 8) & 0xff
>>   0xffff000001388ab4 <+72>:	tbz	w8, #3, 0xffff000001388be0 <rack_sndbuf_autoscale+372>	// (so->so_snd.sb_flags & SB_AUTOSIZE) == 0
>>   0xffff000001388ab8 <+76>:	ldr	w11, [x9, #52]	// w11 = tp->snd_wnd
>>   0xffff000001388abc <+80>:	ldr	w8, [x19, #740]	// w8 = so->so_snd.sb_hiwat
>>   0xffff000001388ac0 <+84>:	lsr	w11, w11, #2
>>   0xffff000001388ac4 <+88>:	add	w11, w11, w11, lsl #2
>>   0xffff000001388ac8 <+92>:	cmp	w11, w8
>>   0xffff000001388acc <+96>:	b.cc	0xffff000001388be0 <rack_sndbuf_autoscale+372>  // b.lo, b.ul, b.last
>>   0xffff000001388ad0 <+100>:	ldr	w11, [x19, #736]
>>   0xffff000001388ad4 <+104>:	lsr	w8, w8, #3
>>   0xffff000001388ad8 <+108>:	lsl	w12, w8, #3
>>   0xffff000001388adc <+112>:	sub	w8, w12, w8
>>   0xffff000001388ae0 <+116>:	cmp	w11, w8
>>   0xffff000001388ae4 <+120>:	b.cc	0xffff000001388be0 <rack_sndbuf_autoscale+372>  // b.lo, b.ul, b.last
>>   0xffff000001388ae8 <+124>:	ldr	x8, [x18]
>>   0xffff000001388aec <+128>:	ldr	x8, [x8, #1256]
>>   0xffff000001388af0 <+132>:	ldr	x12, [x8, #40]
>>   0xffff000001388af4 <+136>:	adrp	x8, 0xffff0000013ac000
>>   0xffff000001388af8 <+140>:	ldr	x8, [x8, #2760]
>>   0xffff000001388afc <+144>:	ldr	w12, [x12, x8]
>>   0xffff000001388b00 <+148>:	cmp	w11, w12
>>
>> So it seems that the code accessing V_tcp_do_autosndbuf is:
>>
>>   0xffff000001388a80 <+20>:	adrp	x12, 0xffff0000013ac000
>> ...
>>   0xffff000001388a88 <+28>:	ldr	x11, [x18]
>>   0xffff000001388a8c <+32>:	ldr	x11, [x11, #1256]
>> ...
>> => 0xffff000001388a98 <+44>:	ldr	x11, [x11, #40]
>>   0xffff000001388a9c <+48>:	ldr	x12, [x12, #2752]
>>   0xffff000001388aa0 <+52>:	ldr	w11, [x11, x12]	// w11 = V_tcp_do_autosndbuf ???
>>
>> and for V_tcp_autosndbuf_max it is:
>>   0xffff000001388ae8 <+124>:	ldr	x8, [x18]
>>   0xffff000001388aec <+128>:	ldr	x8, [x8, #1256]
>>   0xffff000001388af0 <+132>:	ldr	x12, [x8, #40]
>>   0xffff000001388af4 <+136>:	adrp	x8, 0xffff0000013ac000
>>   0xffff000001388af8 <+140>:	ldr	x8, [x8, #2760]
>>   0xffff000001388afc <+144>:	ldr	w12, [x12, x8]
>>
>> The #2752 versus #2760 could be the offset of the variable.
>>
>> Does the above code make sense to you? The code relevant for the crash seems to be:
>>
>> 0xffff000001388a88 <+28>:	ldr	x11, [x18]
>> 0xffff000001388a8c <+32>:	ldr	x11, [x11, #1256]
>> 0xffff000001388a98 <+44>:	ldr	x11, [x11, #40]
>>
>> Since it is crashing at 0xffff000001388a98 <+44>, my assumption was that x18 is wrong...
>> But does this use fit to your description?
>
> This code is loading curthread from the pcpu data, then loading whatever is
> 1256 bytes within struct thread. I checked the offset of td_vnet and found it
> was at the correct location so it would appear to be using VIMAGE and has a
> bad vnet pointer.
>
> The other assembly above also looks like it's using VIMAGE as they have
> similar code with the same offsets.
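
For readers following along, the load sequence described above can be modelled in userland C. This is only a sketch under stated assumptions: the struct names (pcpu_model, thread_model, vnet_model), the helper demo_read and the variable template_do_autosndbuf are hypothetical stand-ins, and only the three offsets visible in the disassembly (0, 1256, 40) are taken from the thread. The final address computation (data base plus the address of a template symbol) is one reading of the `ldr w11, [x11, x12]` instruction and has not been checked against the kernel sources here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel structures. Only the offsets
 * seen in the disassembly (0, 1256 and 40) are modelled.
 */
struct vnet_model {
	char		pad[40];
	uintptr_t	vnet_data_base;		/* offset 40 */
};

struct thread_model {
	char			pad[1256];
	struct vnet_model	*td_vnet;	/* offset 1256 */
};

struct pcpu_model {
	struct thread_model	*pc_curthread;	/* offset 0 */
};

_Static_assert(offsetof(struct vnet_model, vnet_data_base) == 40,
    "vnet_data_base offset");
_Static_assert(offsetof(struct thread_model, td_vnet) == 1256,
    "td_vnet offset");

/* Assumed "template" copy of a virtualized int variable. */
static int template_do_autosndbuf = 1;

/* Mirrors the five loads; the x18 argument plays the role of register x18. */
static int
read_virtualized(const struct pcpu_model *x18)
{
	struct thread_model *td = x18->pc_curthread;	/* ldr x11, [x18] */
	struct vnet_model *vnet = td->td_vnet;		/* ldr x11, [x11, #1256] */
	uintptr_t base = vnet->vnet_data_base;		/* ldr x11, [x11, #40] */
	uintptr_t sym = (uintptr_t)&template_do_autosndbuf; /* adrp + ldr x12 */

	return (*(int *)(base + sym));			/* ldr w11, [x11, x12] */
}

/* Build a one-CPU, one-thread, one-vnet chain and read the private copy. */
static int
demo_read(int private_value)
{
	int private_copy = private_value;
	struct vnet_model vnet = { .vnet_data_base =
	    (uintptr_t)&private_copy - (uintptr_t)&template_do_autosndbuf };
	struct thread_model td = { .td_vnet = &vnet };
	struct pcpu_model pcpu = { .pc_curthread = &td };

	return (read_virtualized(&pcpu));
}
```

In this model, a stale or corrupted td_vnet makes the third load fault. That matches the instruction marked `=>` at <+44>, while x18 itself and curthread are dereferenced without trouble before it.
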
>
>>
>> I'm trying to debug this on arm64, since I can reproduce it on arm64. But there is
>> also a bug report that this happens on amd64: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257195
>>
>> Any idea what can be wrong? Any hint how to progress?
>
> If you can reproduce on amd64 it might pay to test with KASAN.
>
> How stable is the bad pointer value? It might pay to add KASSERTs to the code
> to check curvnet (the macro to get td_vnet) is not the bad value, or at least
> greater than VM_MIN_KERNEL_ADDRESS.
Thank you very much!

I double checked my kernel config, and after disabling VIMAGE, it was enabled again.
So, yes this is a VIMAGE kernel and I guess the problem is related to it.

Your explanations were very helpful.

Best regards
Michael
>
> Andrew



