Date:      Fri, 20 Nov 2020 12:02:16 +0100
From:      Peter Blok <pblok@bsd4all.org>
To:        Kristof Provost <kp@FreeBSD.org>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: Commit 367705+367706 causes a panic
Message-ID:  <665757BF-DA06-4503-9ACD-8A4630E23FF4@bsd4all.org>
In-Reply-To: <1753B4A3-2FFC-47A5-9D0C-DC0B71BA22E8@FreeBSD.org>
References:  <CD3B0F62-3790-4C63-A92C-9694256823CD@bsd4all.org> <1753B4A3-2FFC-47A5-9D0C-DC0B71BA22E8@FreeBSD.org>

Hi Kristof,

This is 12-stable. With the previous bridge epochification, which was backed out, my config had a panic too.

I don’t have any local modifications. I did a clean rebuild after removing /usr/obj/usr.

My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and nmdm.ko as modules. Everything else is statically linked. I have removed all drivers not needed for the hardware at hand.

My bridge connects two VLANs from the same trunk, the jail epair devices, and the bhyve tap devices.

The panic happens when the jails are starting.

I can try to narrow it down over the weekend and make the crash dump available for analysis.

Previously I had the following crash with 363492:

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0xffffffff00000410
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80692326
stack pointer	        = 0x28:0xfffffe00c06097b0
frame pointer	        = 0x28:0xfffffe00c06097f0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 2030 (ifconfig)
trap number		= 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0xffffffff80698165 at kdb_backtrace+0x65
#1 0xffffffff8064d67b at vpanic+0x17b
#2 0xffffffff8064d4f3 at panic+0x43
#3 0xffffffff809cc311 at trap_fatal+0x391
#4 0xffffffff809cc36f at trap_pfault+0x4f
#5 0xffffffff809cb9b6 at trap+0x286
#6 0xffffffff809a5b28 at calltrap+0x8
#7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d
#8 0xffffffff8069213a at epoch_wait_preempt+0xaa
#9 0xffffffff807615b7 at ipsec_ioctl+0x3a7
#10 0xffffffff8075274f at ifioctl+0x47f
#11 0xffffffff806b5ea7 at kern_ioctl+0x2b7
#12 0xffffffff806b5b4a at sys_ioctl+0xfa
#13 0xffffffff809ccec7 at amd64_syscall+0x387
#14 0xffffffff809a6450 at fast_syscall_common+0x101
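
Both this backtrace and the one quoted below trap in ck_epoch_synchronize_wait, called from epoch_wait_preempt, although the callers differ (ipsec_ioctl here, vnet_if_init below). A minimal sketch for comparing two KDB backtraces, assuming frames follow the "#N 0xADDR at function+0xOFFSET" layout shown in this message; the helper names parse_frames and shared_frames are hypothetical, not part of any FreeBSD tool:

```python
import re

# Matches KDB frames such as
# "#7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d".
# The pattern is unanchored, so quoted lines starting with ">> " also match.
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)")

def parse_frames(text):
    """Return the function names of a KDB backtrace, innermost frame first."""
    return [m.group(3) for m in FRAME_RE.finditer(text)]

def shared_frames(trace_a, trace_b):
    """Return functions present in both traces, in trace_a's order."""
    in_b = set(parse_frames(trace_b))
    return [f for f in parse_frames(trace_a) if f in in_b]

# Frames 6-9 of the 363492 crash (copied from this message).
old_trace = """
#6 0xffffffff809a5b28 at calltrap+0x8
#7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d
#8 0xffffffff8069213a at epoch_wait_preempt+0xaa
#9 0xffffffff807615b7 at ipsec_ioctl+0x3a7
"""

# Frames 6-9 of the newer crash (copied from the quoted text).
new_trace = """
>> #6 0xffffffff809a98c8 at calltrap+0x8
>> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
>> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
>> #9 0xffffffff80757d40 at vnet_if_init+0x120
"""

print(shared_frames(old_trace, new_trace))
# → ['calltrap', 'ck_epoch_synchronize_wait', 'epoch_wait_preempt']
```

The overlap only shows that both panics fault at the same point inside the epoch wait path; it says nothing about what damaged the epoch state.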




> On 20 Nov 2020, at 11:30, Kristof Provost <kp@FreeBSD.org> wrote:
>
> On 20 Nov 2020, at 11:18, peter.blok@bsd4all.org wrote:
>> I’m afraid the last Epoch fix for bridge is not solving the problem (or perhaps creates a new one).
>>
> We’re talking about the stable/12 branch, right?
>
>> This seems to happen when the jail epair is added to the bridge.
>>
> There must be something more to it than that. I’ve run the bridge tests on stable/12 without issue, and this is a problem we didn’t see when the bridge epochification initially went into stable/12.
>
> Do you have a custom kernel config? Other patches? What exact commands do you run to trigger the panic?
>
>> kernel trap 12 with interrupts disabled
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 6; apic id = 06
>> fault virtual address	= 0xc10
>> fault code		= supervisor read data, page not present
>> instruction pointer	= 0x20:0xffffffff80695e76
>> stack pointer	        = 0x28:0xfffffe00bf14e6e0
>> frame pointer	        = 0x28:0xfffffe00bf14e720
>> code segment		= base 0x0, limit 0xfffff, type 0x1b
>> 			= DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags	= resume, IOPL = 0
>> current process		= 1686 (jail)
>> trap number		= 12
>> panic: page fault
>> cpuid = 6
>> time = 1605811310
>> KDB: stack backtrace:
>> #0 0xffffffff8069bb85 at kdb_backtrace+0x65
>> #1 0xffffffff80650a4b at vpanic+0x17b
>> #2 0xffffffff806508c3 at panic+0x43
>> #3 0xffffffff809d0351 at trap_fatal+0x391
>> #4 0xffffffff809d03af at trap_pfault+0x4f
>> #5 0xffffffff809cf9f6 at trap+0x286
>> #6 0xffffffff809a98c8 at calltrap+0x8
>> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
>> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
>> #9 0xffffffff80757d40 at vnet_if_init+0x120
>> #10 0xffffffff8078c994 at vnet_alloc+0x114
>> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7
>> #12 0xffffffff80620190 at sys_jail_set+0x40
>> #13 0xffffffff809d0f07 at amd64_syscall+0x387
>> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8
>
> This panic is rather odd. This isn’t even the bridge code. This is during initial creation of the vnet. I don’t really see how this could even trigger panics.
> That panic looks as if something corrupted the net_epoch_preempt, by overwriting the epoch->e_epoch. The bridge patches only access this variable through the well-established functions and macros. I see no obvious way that they could corrupt it.
>
> Best regards,
> Kristof

