Date:      Mon, 18 Oct 2010 09:40:31 +0300
From:      Andriy Gapon <avg@freebsd.org>
To:        Jeff Roberson <jeff@freebsd.org>, freebsd-current@freebsd.org
Subject:   Re: panic in uma_startup for many-core amd64 system
Message-ID:  <4CBBEBDF.3060905@freebsd.org>
In-Reply-To: <4C9B9B9C.6000807@freebsd.org>
References:  <4C9B9B9C.6000807@freebsd.org>

on 23/09/2010 21:25 Andriy Gapon said the following:
> 
> Jeff,
> 
> just for kicks I tried to emulate a machine with 64 logical CPUs using
> the qemu-devel port:
> qemu-system-x86_64 -smp sockets=4,cores=8,threads=2 ...
> 
> It seems that FreeBSD agreed to recognize only the first 32 CPUs, but it panicked anyway.
> 
> Here's a backtrace:
> #34 0xffffffff804fe7f5 in zone_alloc_item (zone=0xffffffff80be1554,
> udata=0xffffffff80be1550, flags=1924) at /usr/src/sys/vm/uma_core.c:2506
> #35 0xffffffff804ff35d in hash_alloc (hash=0xffffff001ffdb030) at
> /usr/src/sys/vm/uma_core.c:483
> #36 0xffffffff804ff642 in keg_ctor (mem=Variable "mem" is not available.
> ) at /usr/src/sys/vm/uma_core.c:1396
> #37 0xffffffff804fe91b in zone_alloc_item (zone=0xffffffff80a1f300,
> udata=0xffffffff80be1b60, flags=2) at /usr/src/sys/vm/uma_core.c:2544
> #38 0xffffffff804ff92e in zone_ctor (mem=Variable "mem" is not available.
> ) at /usr/src/sys/vm/uma_core.c:1832
> #39 0xffffffff804ffca4 in uma_startup (bootmem=0xffffff001ffac000, boot_pages=48)
> at /usr/src/sys/vm/uma_core.c:1741
> #40 0xffffffff80514822 in vm_page_startup (vaddr=18446744071576817664) at
> /usr/src/sys/vm/vm_page.c:360
> #41 0xffffffff805060c5 in vm_mem_init (dummy=Variable "dummy" is not available.
> ) at /usr/src/sys/vm/vm_init.c:118
> #42 0xffffffff803258b9 in mi_startup () at /usr/src/sys/kern/init_main.c:253
> #43 0xffffffff8017177c in btext () at /usr/src/sys/amd64/amd64/locore.S:81
> [[[
> Note:
> 1. Frame numbers are high because the backtrace was obtained via gdb remotely
> connected to qemu, and there is also a bunch of extra frames from DDB, etc.
> 2. Line numbers in uma_core.c won't match those in the FreeBSD tree, because I've
> been doing some unrelated hacking in the file.
> ]]]
> 
> The problem seems to be with the creation of the "UMA Zones" zone and keg.
> Because of the large number of processors, the size argument in the following
> snippet is set to a value of 4480:
> 
> args.name = "UMA Zones";
> args.size = sizeof(struct uma_zone) +
>     (sizeof(struct uma_cache) * (mp_maxid + 1));
> 
> Because of this, keg_ctor() calls keg_large_init():
> 
> else if ((keg->uk_size+UMA_FRITM_SZ) >
>     (UMA_SLAB_SIZE - sizeof(struct uma_slab)))
>         keg_large_init(keg);
> else
>         keg_small_init(keg);
> 
> keg_large_init() sets the UMA_ZONE_OFFPAGE and UMA_ZONE_HASH flags for this keg.
> This leads to hash_alloc() being invoked from keg_ctor():
> 
> if (keg->uk_flags & UMA_ZONE_HASH)
>         hash_alloc(&keg->uk_hash);
> 
> But the problem is that the "UMA Hash" zone has not been created yet, and thus the
> call leads to the panic.  The "UMA Hash" zone is the last of the system zones created.
> 
> Not sure what the proper fix here could/should be.
> Would it work to simply not set the UMA_ZONE_HASH flag when UMA_ZFLAG_INTERNAL is set?
> 
> 
> And some final calculations.
> On the test system sizeof(struct uma_cache) is 128 bytes and (mp_maxid + 1) is 32,
> so the per-CPU caches alone already take 128 * 32 = 4096 bytes, which equals
> UMA_SLAB_SIZE = PAGE_SIZE = 4096.
> 
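To make the arithmetic above easy to check, here is a minimal user-space sketch in C. It is not kernel code. The 128-byte cache size, the 4096-byte slab size and the 4480 total come from the quoted message; the 384 bytes for struct uma_zone is only inferred as 4480 - 32 * 128. The toy_* names are made up for this example, and the UMA_FRITM_SZ and slab header sizes are left as parameters because the message does not give them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figures from the message, except TOY_ZONE_SIZE which is inferred. */
#define TOY_SLAB_SIZE   4096u  /* UMA_SLAB_SIZE = PAGE_SIZE on the test system */
#define TOY_CACHE_SIZE  128u   /* sizeof(struct uma_cache) on the test system */
#define TOY_ZONE_SIZE   384u   /* inferred: 4480 - 32 * 128 */

/* Mirrors args.size for "UMA Zones": zone header plus one cache per CPU. */
static size_t
toy_zones_item_size(unsigned ncpus)
{
	assert(ncpus > 0);
	return (TOY_ZONE_SIZE + (size_t)TOY_CACHE_SIZE * ncpus);
}

/*
 * Mirrors the size test quoted from keg_ctor().  fritm_size and
 * slab_hdr_size stand in for UMA_FRITM_SZ and sizeof(struct uma_slab),
 * whose real values are not stated in the message.
 */
static bool
toy_needs_large_init(size_t item_size, size_t fritm_size, size_t slab_hdr_size)
{
	assert(slab_hdr_size <= TOY_SLAB_SIZE);
	return (item_size + fritm_size > TOY_SLAB_SIZE - slab_hdr_size);
}
```

With these assumed sizes, toy_zones_item_size(32) gives 4480, which is over the slab size whatever the two unknown overheads are, so the large-init path is taken; with 8 CPUs the item would be 1408 bytes.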

Here is a simple solution that seems to work:
http://people.freebsd.org/~avg/uma-many-cpus.diff
Not sure if it's the best we can do.
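
The diff itself is not reproduced here. Purely as an illustration of the question asked in the quoted message (skip UMA_ZONE_HASH when UMA_ZFLAG_INTERNAL is set), here is a toy user-space model in C. The flag values and toy_* functions are hypothetical and only mimic the ordering problem; they are not the real UMA definitions and may not match what the diff actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; the real ones are in the UMA headers. */
enum {
	TOY_ZONE_OFFPAGE   = 1 << 0,
	TOY_ZONE_HASH      = 1 << 1,
	TOY_ZFLAG_INTERNAL = 1 << 2
};

/*
 * Models the flag changes described for keg_large_init().
 * With skip_hash_for_internal set, internal kegs do not get the HASH flag,
 * which is the idea floated in the quoted message.
 */
static unsigned
toy_large_init_flags(unsigned flags, bool skip_hash_for_internal)
{
	flags |= TOY_ZONE_OFFPAGE;
	if (!(skip_hash_for_internal && (flags & TOY_ZFLAG_INTERNAL)))
		flags |= TOY_ZONE_HASH;
	return (flags);
}

/*
 * Models the hash_alloc() step in keg_ctor(): a keg with the HASH flag
 * needs the "UMA Hash" zone to exist already.  Returns false where the
 * real kernel would panic.
 */
static bool
toy_keg_ctor_ok(unsigned flags, bool hash_zone_ready)
{
	if (flags & TOY_ZONE_HASH)
		return (hash_zone_ready);
	return (true);
}
```

In this model an internal keg that goes through large init before the hash zone exists fails without the guard and succeeds with it. Whether a real keg can work as OFFPAGE without the hash is a separate question that the model does not answer.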

-- 
Andriy Gapon


