Date:      Fri, 16 Dec 2016 01:45:00 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Enabling NUMA in BIOS stop booting FreeBSD
Message-ID:  <20161215224500.GM98176@zxy.spb.ru>
In-Reply-To: <20161215135656.GS94325@kib.kiev.ua>
References:  <20161214102711.GF94325@kib.kiev.ua> <20161214105211.GC98176@zxy.spb.ru> <20161214113927.GG94325@kib.kiev.ua> <20161214121336.GD98176@zxy.spb.ru> <20161214152627.GF98176@zxy.spb.ru> <20161214190349.GJ94325@kib.kiev.ua> <20161215105118.GK98176@zxy.spb.ru> <20161215123330.GQ94325@kib.kiev.ua> <20161215131624.GL98176@zxy.spb.ru> <20161215135656.GS94325@kib.kiev.ua>

On Thu, Dec 15, 2016 at 03:56:56PM +0200, Konstantin Belousov wrote:

> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD.  This might be long.
> > 
> > Do you need all (262144?) lines?
> > 
> > Testing system
> > memory........................................................................................................................pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> > 
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >  			 */
> > >  			*(int *)ptr = tmp;
> > >  
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >  			/*
> > >  			 * Adjust array of valid/good pages.
> > 
> > PS: memtest86 hung at test 128-130G (the server has 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors.  E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for PCIe config
> window, PCI BARs, APICs, and other I/O register pages.  Intel chipsets
> allow the RAM hidden by the I/O pages to be remapped, which the BIOS is
> probably not doing correctly.
> 
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM; this is 4G-130G.  The very primitive memory test in the kernel
> does not like any of the pages starting at 129G.  A possibly important
> detail is that the kernel memory test only touches the first 4 bytes of
> each page.  So if the BIOS erroneously mapped any I/O registers into
> that range, the memory test might luckily avoid touching anything
> critical, while still noting that the page does not behave as RAM.
> 
> Update the BIOS, and if the issue persists, contact Supermicro.  This
> interesting detail adds even more evidence that the BIOS is problematic.
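
To make the "first 4 bytes of each page" point concrete, here is a
minimal userland sketch of that kind of probe.  It is not the code from
machdep.c: the names page_is_bad() and count_bad_pages(), the PAGE_BYTES
constant and the pattern list are assumptions for illustration, and it
walks an ordinary allocated buffer rather than physical pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB, as on amd64). */
#define PAGE_BYTES ((size_t)4096)

/*
 * Probe only the first 32-bit word of a page: write each pattern,
 * read it back, and report the page as bad if any pattern does not
 * stick.  The original word is restored afterwards.
 */
int
page_is_bad(volatile uint32_t *ptr)
{
	/* Illustrative pattern set, not taken from the kernel source. */
	static const uint32_t patterns[] = {
		0xaaaaaaaau, 0x55555555u, 0xffffffffu, 0x00000000u
	};
	uint32_t saved = *ptr;
	int bad = 0;

	for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
		*ptr = patterns[i];
		if (*ptr != patterns[i])
			bad = 1;
	}
	*ptr = saved;
	return (bad);
}

/* Walk a page-aligned buffer page by page and count failing pages. */
size_t
count_bad_pages(void *base, size_t npages)
{
	unsigned char *p = base;
	size_t nbad = 0;

	for (size_t i = 0; i < npages; i++)
		nbad += page_is_bad((volatile uint32_t *)(p + i * PAGE_BYTES));
	return (nbad);
}
```

On normal RAM count_bad_pages() should return 0.  The remaining 4092
bytes of each page are never touched, which is why a probe like this
could flag a page backed by device registers without hitting whatever
else lives in that page.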

Updating the BIOS did not solve this.
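
For reference, the numbers quoted in this thread are consistent with
each other, which can be checked with a small sketch.  The helper names
to_gib() and pages_between() and the 4 KiB page size are assumptions
made for this example; the addresses are the ones from the SMAP segment
and the "pb" lines above.

```c
#include <stdint.h>

#define GIB        (UINT64_C(1) << 30)
#define PAGE_BYTES UINT64_C(4096)	/* assumed 4 KiB pages */

/* Physical address rounded down to whole GiB. */
uint64_t
to_gib(uint64_t pa)
{
	return (pa / GIB);
}

/* Number of pages from first_pa to last_pa, both page-aligned, inclusive. */
uint64_t
pages_between(uint64_t first_pa, uint64_t last_pa)
{
	return ((last_pa - first_pa) / PAGE_BYTES + 1);
}
```

With these, to_gib(0x100000000) is 4 and to_gib(0x2080000000) is 130,
matching the 4G-130G segment; the first reported bad page 0x2040000000
is at 129G; and pages_between(0x2040000000, 0x207ffff000) is 262144,
the number of "pb" lines, i.e. exactly the last 1G of that segment.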


