Date:      Tue, 24 Feb 2009 13:34:25 +1100
From:      Lawrence Stewart <lstewart@freebsd.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        kib@freebsd.org, current@freebsd.org
Subject:   Re: Recent versions of pxeboot hang/panic on AMD platform.
Message-ID:  <49A35CB1.4050304@freebsd.org>
In-Reply-To: <20081121231400.GA94863@onelab2.iet.unipi.it>
References:  <20081121231400.GA94863@onelab2.iet.unipi.it>

Luigi Rizzo wrote:
> [copying some people involved with recent related commits]
> 
> As reported in kern/118222, recent versions of pxeboot hang/panic
> on AMD platform.
> 
> Initial reports mentioned that the RELENG_6 versions worked well;
> however, I found that even the recent RELENG_6 code is problematic.
> 
> Specifically, the problem I see on two machines with AMD CPUs (one
> has an Asus M2N-VM motherboard) netbooting with pxeboot is that the
> loading of config files or binary modules (kernel, etc.) randomly
> hangs with recent versions of pxeboot (RELENG_6, RELENG_7 and HEAD
> all give the same behaviour).
> 
> The same system works fine with an old version of pxeboot from RELENG_6.
> 
> Things seem to work fine on i386 (tried a Pentium4, N270 and on qemu) 
> with all the versions below.
> 
> To investigate, I started with a reliable version
> (RELENG_6, early 2008) and moved forward to figure out where the
> problem was introduced. I found the following:
> 
>         RELENG_6 as of 2008.03.01 (svn 176674)  works
>         RELENG_6 as of 2008.03.15 (svn 177190)  works
>                 (same as previous)
>         RELENG_6 as of 2008.03.31 (svn 177768)  does NOT work
>             changed files:
>                 Index: RELENG_6/sys/boot/i386/boot2/boot2.c
>                 Index: RELENG_6/sys/boot/i386/btx/btx/Makefile
>                 Index: RELENG_6/sys/boot/i386/btx/btx/btx.S
>                 Index: RELENG_6/sys/boot/i386/gptboot/gptboot.c
>                 Index: RELENG_6/sys/boot/i386/libi386/biossmap.c
>                 Index: RELENG_6/sys/boot/i386/libi386/biosmem.c
> 
> There is a recent, related change (August 2008) which, however,
> does not seem to fix the bug.
> 
> (All of the above is basically an MFC of something applied slightly earlier to
> head and RELENG_7. I have experienced the exact same bug with a fresh
> head and RELENG_7, even though I have not found the exact point there
> where the problem arose.)
> 
> The fact that the failure occurs at random times, even quite early 
> (e.g. while reading the Forth config files) suggests that the problem
> may be related to interrupts coming at the wrong time. 
> 
> Unfortunately, the changes to btx.S (which I believe may be related to
> the problem, as the changes to the other files seem innocuous or unrelated)
> are beyond my knowledge. 
> 
> So, does anyone have ideas on what could be happening here, and especially
> how likely it is that we might see the same problem with disk- or USB-based
> booting?

Just adding a "me too" with pxeboot built from head r188509. Running
with pxeboot from AMD64 6.3-RELEASE, as Luigi's research suggested, seems to
resolve the issue for me too. I haven't yet tried pxeboot built from
r177768 to see whether it also fails.

To quickly touch on symptoms: I've never seen a panic. I experience
permanent hangs roughly 50% of the time (possibly more) when I reboot
or cold start the machine. The only option when it hangs is to reboot.
Rebooting a few times eventually lets the boot process finish, and once
the kernel starts probing devices, all is good.

Hardware is an Intel 865GM chipset based Gigabyte mainboard with a 3GHz 
HTT P4 CPU (HTT enabled).

Happy to help debug further if anyone has ideas to try.
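
One possible starting point, given that the hang is intermittent: a single
clean boot does not prove a revision is good, so any bisection between the
last working revision (r177190) and the first failing one (r177768) would
need several boot attempts per candidate. The Python below is only a rough
sketch of that idea. The boots_ok callback is hypothetical (it stands for
"build pxeboot at this revision, netboot once, report whether the boot
finished"), and the list of candidate revisions is assumed to be supplied by
whoever runs it.

```python
def false_good_probability(hang_rate, trials):
    """Chance that a bad revision survives `trials` boots in a row.

    Assumes each boot hangs independently with probability `hang_rate`.
    """
    return (1.0 - hang_rate) ** trials


def first_bad(revs, boots_ok, trials=8):
    """Bisect an ordered list of revisions for the first one that hangs.

    Assumes revs[0] is known good and revs[-1] is known bad.
    `boots_ok(rev)` is a hypothetical callback returning True when one
    boot attempt with pxeboot built from `rev` completes.
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] good, revs[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Treat a revision as good only if every trial boot completes;
        # all() stops at the first hang, so bad revisions fail fast.
        if all(boots_ok(revs[mid]) for _ in range(trials)):
            lo = mid
        else:
            hi = mid
    return revs[hi]
```

With a hang rate of about 50%, eight clean boots in a row would leave
roughly a 0.4% chance (0.5^8) of wrongly calling a bad revision good, if the
hangs really are independent from one boot to the next.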

Cheers,
Lawrence


