Date: Sun, 25 Aug 2019 12:15:13 -0400
From: Mark Johnston <markj@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Rebecca Cran <rebecca@bsdio.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: Panic on boot with r351461 (AMD ThreadRipper 2990WX)
Message-ID: <20190825161513.GA33382@raichu>
In-Reply-To: <20190825143034.GO71821@kib.kiev.ua>
References: <6e5687b2-ab3f-a570-37ab-72c8a9776167@bsdio.com>
    <20190824203305.GF71821@kib.kiev.ua>
    <d7200dbc-62b3-fd86-ca61-32d559987338@bsdio.com>
    <20190824230801.GK71821@kib.kiev.ua>
    <f15ba651-28ef-d9db-3646-ab8cb49b3d18@bsdio.com>
    <20190825062407.GL71821@kib.kiev.ua>
    <9e94aea8-7d63-0f9e-2f1e-c1492e9dc455@bsdio.com>
    <20190825143034.GO71821@kib.kiev.ua>

On Sun, Aug 25, 2019 at 05:30:34PM +0300, Konstantin Belousov wrote:
> On Sun, Aug 25, 2019 at 07:17:20AM -0600, Rebecca Cran wrote:
> > On 2019-08-25 00:24, Konstantin Belousov wrote:
> > > What are the panic messages ?
> >
> > Fatal trap 18: integer divide fault while in kernel mode
> >
> > instruction pointer = 0x20:0xffffffff80f1027c
> >
> > stack pointer = 0x28:0xffffffff845809f0
> >
> > frame pointer = 0x28:0xffffffff84580a00
> >
> > code segment = base 0x0, limit 0xffffff, type 0x1b
> >
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> >
> > processor eflags = resume, IOPL = 0
> >
> > current process = 0 ()
> >
> > trap number = 18
> >
> > panic: integer divide fault
> >
> > cpuid = 0
> >
> > time = 1
> >
> >
> > > What is the source line ?
> >
> > (gdb) info line *0xffffffff80f1027c
> > Line 102 of "/usr/src/sys/vm/vm_domainset.c" starts at address
> > 0xffffffff80f10267 <vm_domainset_iter_first+151>
> > and ends at 0xffffffff80f1027f <vm_domainset_iter_first+175>.
>
> There was one more source line I asked about.
>
> So what happens, IMO, is that for memory-less domains ds_cnt is zero
> because ds_mask is zero, which causes the exception on divide. You
> can try the following combined patch, but I really dislike the fact
> that I cannot safely use DOMAINSET_FIXED (if my diagnosis is correct).

I think this is simply a bug. Something like the following hack should
work: we want to leave the _FIXED domainsets unmodified, but they should
be removed from the global list (to ensure that userspace cannot specify
impossible policies).

diff --git a/sys/kern/kern_cpuset.c b/sys/kern/kern_cpuset.c
index 87f9333bf43b..931fe7e157e5 100644
--- a/sys/kern/kern_cpuset.c
+++ b/sys/kern/kern_cpuset.c
@@ -503,9 +503,17 @@ domainset_empty_vm(struct domainset *domain)
 	int i, j, max;
 
 	max = DOMAINSET_FLS(&domain->ds_mask) + 1;
-	for (i = 0; i < max; i++)
-		if (DOMAINSET_ISSET(i, &domain->ds_mask) && VM_DOMAIN_EMPTY(i))
+	for (i = 0; i < max; i++) {
+		if (DOMAINSET_ISSET(i, &domain->ds_mask) &&
+		    VM_DOMAIN_EMPTY(i)) {
+			/*
+			 * Leave the domainset unmodified, in case it is a
+			 * static policy defined for use by the kernel.
+			 */
+			if (domain->ds_cnt == 1)
+				return (true);
 			DOMAINSET_CLR(i, &domain->ds_mask);
+		}
 	domain->ds_cnt = DOMAINSET_COUNT(&domain->ds_mask);
 	max = DOMAINSET_FLS(&domain->ds_mask) + 1;
 	for (i = j = 0; i < max; i++) {

> I would prefer for kmem_malloc_domainset(DOMAINSET_FIXED(unpopulated domain))
> to fail with NULL result, and then I would manually fall-back to
> DOMAINSET_PREF().
>
> OTOH, I think the chunk for mp_realloc_cpu() is the final fix.

Looks ok to me.
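
For readers following the thread, the failure mode and the idea behind the hack can be sketched in userland C. This is only an illustration under stated assumptions: the toy_* names, the 4-domain limit, and the plain unsigned bitmask are hypothetical stand-ins and not the kernel's struct domainset or DOMAINSET_* macros, and the modulo in toy_pick_rr() is assumed to resemble the division that faulted rather than copied from vm_domainset.c.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NDOMAINS 4	/* hypothetical domain limit for this sketch */

/* Hypothetical stand-in for a domainset: a bitmask and its bit count. */
struct toy_domainset {
	unsigned int mask;	/* bit i set: domain i is a member */
	int cnt;		/* number of bits set in mask */
};

static int
toy_popcount(unsigned int mask)
{
	int n = 0;

	for (; mask != 0; mask &= mask - 1)
		n++;
	return (n);
}

/*
 * Drop memory-less domains (bits set in empty_mask) from the set.
 * A single-domain set is left unmodified and reported as unusable,
 * mirroring the ds_cnt == 1 check in the patch.  Returns true when
 * the set should not be offered to callers.
 */
static bool
toy_prune_empty(struct toy_domainset *ds, unsigned int empty_mask)
{
	assert(ds->cnt == toy_popcount(ds->mask));
	for (int i = 0; i < TOY_NDOMAINS; i++) {
		unsigned int bit = 1u << i;

		if ((ds->mask & bit) != 0 && (empty_mask & bit) != 0) {
			if (ds->cnt == 1)
				return (true);
			ds->mask &= ~bit;
		}
	}
	ds->cnt = toy_popcount(ds->mask);
	return (ds->cnt == 0);
}

/*
 * Round-robin pick of a member domain.  Without the cnt == 0 guard,
 * the modulo below would divide by zero for an emptied set.
 */
static int
toy_pick_rr(const struct toy_domainset *ds, unsigned int iter)
{
	unsigned int want;

	if (ds->cnt == 0)
		return (-1);
	want = iter % (unsigned int)ds->cnt;
	for (int i = 0; i < TOY_NDOMAINS; i++) {
		if ((ds->mask & (1u << i)) == 0)
			continue;
		if (want-- == 0)
			return (i);
	}
	return (-1);
}
```

In this sketch a two-domain set loses its memory-less member and stays usable, while a single-domain ("fixed") set keeps its mask and count intact, so a later modulo by cnt never sees zero.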