Date:      Tue, 5 Jun 2007 20:32:17 +0200
From:      Peter Holm <peter@holm.cc>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-acpi@freebsd.org
Subject:   Re: Possible ACPI related panic with Tyan S2720
Message-ID:  <20070605183216.GA23211@peter.osted.lan>
In-Reply-To: <200706051326.22581.jhb@freebsd.org>
References:  <20070604183419.GA73268@peter.osted.lan> <200706051027.29879.jhb@freebsd.org> <20070605164402.GA18091@peter.osted.lan> <200706051326.22581.jhb@freebsd.org>

On Tue, Jun 05, 2007 at 01:26:22PM -0400, John Baldwin wrote:
> On Tuesday 05 June 2007 12:44:02 pm Peter Holm wrote:
> > On Tue, Jun 05, 2007 at 10:27:29AM -0400, John Baldwin wrote:
> > > On Tuesday 05 June 2007 04:44:54 am Nate Lawson wrote:
> > > > Peter Holm wrote:
> > > > > On Mon, Jun 04, 2007 at 12:45:23PM -0700, Nate Lawson wrote:
> > > > >> This is a really confusing issue.  All the trace you have shows is that
> > > > >> it occurs while transitioning the system from legacy to ACPI mode.
> > > > >> Unfortunately, the details of what is going on are hidden in the BIOS
> > > > >> since that write to a port triggers an SMI and the BIOS does the rest.
> > > > >>
> > > > >> However, it seems like the BIOS is reserving more memory, using memory
> > > > >> it didn't reserve, or FreeBSD is using memory we shouldn't.  John, any
> > > > >> insight on the SMAP output?
> > > > >>
> > > > >>> SMAP type=01 base=0000000000000000 len=000000000009fc00
> > > > >>> SMAP type=02 base=000000000009fc00 len=0000000000000400
> > > > >>> SMAP type=02 base=00000000000e0000 len=0000000000020000
> > > > >>> SMAP type=01 base=0000000000100000 len=000000003fef0000
> > > > >>> SMAP type=03 base=000000003fff0000 len=000000000000f000
> > > > >>> SMAP type=04 base=000000003ffff000 len=0000000000001000
> > > > >>> SMAP type=02 base=00000000fec00000 len=0000000000100000
> > > > >>> SMAP type=02 base=00000000fee00000 len=0000000000001000
> > > > >>> SMAP type=02 base=00000000fff80000 len=0000000000080000
> > > > >> Peter, can you figure out what phys address is getting overwritten?
> > > > >> Seems like it's the loader that sets up the module list and the loader's
> > > > >> allocator may be using RAM it shouldn't.
> > > > >>
> > > > > 
> > > > > If I did it right (I used a vtophys() on the address):
> > > > > 
> > > > > Address of mod->name(if_tun): 0xc3eed5ec, phys: 0x985ec
> > > > 
> > > > So it's somewhere near 620K and the first region goes to 640K - 1 K.
> > > > The last 1 K is type 2 (reserved).  Nothing seems to show why switching
> > > > to acpi mode results in an overwrite of data at 620K.  I'm not sure
> > > > where to look.
> > > > 
> > > > There should be some way to write a guard pattern to that area but I'll
> > > > have to think about it a bit first.  Can you see if a BIOS update is
> > > > available and try it out?  What about seeing if you can pre-alloc (by
> > > > hacking loader's SMAP code to reserve more of the first 640 K) and
> > > > writing a pattern there, then verifying it at various points during boot
> > > > to be sure we know exactly where the BIOS is writing?
> > > 
> > > Err, the loader should not be storing modules that low.  Did you kldload the
> > > module or load it via the loader?
> > > 
> > 
> > I did not load the module. It's loaded automatically by the loader.
> > 
> > This is my /boot/loader.conf
> > 
> > kernel_options="-D"
> > machdep.hyperthreading_allowed=1
> > hw.ata.atapi_dma=0
> 
> Are you sure it isn't loaded by ifconfig during boot and thus via an implicit 
> kldload?  The loader only loads modules into memory > KERNLOAD (2MB for PAE, 
> 4MB for non-PAE).
> 
No, I'm not sure at all!

I have tried to manually load acpi.ko at the loader prompt and also to
add acpi_load="YES" to /boot/loader.conf. This still overwrites the
if_tun entry in the modules list.
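
For reference, the physical address of that entry (0x985ec, from the
vtophys() result quoted above) can be checked against the SMAP table
with a small lookup. This is only a userland sketch: struct smap_entry
and smap_lookup() are names I made up for illustration, not the
kernel's own SMAP code, and the table is copied by hand from the boot
output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One BIOS SMAP entry, as printed at boot (hypothetical layout). */
struct smap_entry {
	uint64_t base;
	uint64_t len;
	uint32_t type;	/* 1 = usable RAM, 2 = reserved, 3 = ACPI reclaim, 4 = ACPI NVS */
};

/* Table copied from the SMAP lines quoted earlier in this thread. */
static const struct smap_entry smap[] = {
	{ 0x0000000000000000ULL, 0x000000000009fc00ULL, 1 },
	{ 0x000000000009fc00ULL, 0x0000000000000400ULL, 2 },
	{ 0x00000000000e0000ULL, 0x0000000000020000ULL, 2 },
	{ 0x0000000000100000ULL, 0x000000003fef0000ULL, 1 },
	{ 0x000000003fff0000ULL, 0x000000000000f000ULL, 3 },
	{ 0x000000003ffff000ULL, 0x0000000000001000ULL, 4 },
	{ 0x00000000fec00000ULL, 0x0000000000100000ULL, 2 },
	{ 0x00000000fee00000ULL, 0x0000000000001000ULL, 2 },
	{ 0x00000000fff80000ULL, 0x0000000000080000ULL, 2 },
};

#define SMAP_COUNT (sizeof(smap) / sizeof(smap[0]))

/*
 * Return the entry covering physical address pa, or NULL if pa falls
 * into a hole that the BIOS did not describe at all.
 */
static const struct smap_entry *
smap_lookup(const struct smap_entry *tbl, size_t n, uint64_t pa)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++) {
		/* pa - base < len avoids overflow in base + len. */
		if (pa >= tbl[i].base && pa - tbl[i].base < tbl[i].len)
			return (&tbl[i]);
	}
	return (NULL);
}
```

smap_lookup(smap, SMAP_COUNT, 0x985ec) lands in the first entry
(type 1), since 0x985ec is below 0x9fc00. So according to the BIOS
the overwritten address is ordinary usable RAM, not a reserved range.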

Typing unset acpi_load at the loader prompt works and I can then later
load acpi:

$ kldstat
Id Refs Address    Size     Name
 1    1 0xc0400000 889124   kernel
$ kldload acpi.ko
$ kldstat
Id Refs Address    Size     Name
 1    3 0xc0400000 889124   kernel
 2    1 0xc48af000 57000    acpi.ko

Just to summarize the problem:

The memory corruption comes and goes depending on the kernel config
file. I first identified the "cause" to be files committed by scottl
at 2007/05/14 21:48, which just introduce new malloc types.

Right now GENERIC works fine again, but if I remove the newly added:

nodevice         fwip            # IP over FireWire (RFC 2734,3146)
nodevice         dcons           # Dumb console driver
nodevice         dcons_crom      # Configuration ROM for dcons

the problem pops up again.
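
Since the corruption moves around like this, the guard pattern Nate
suggested may be the easiest way to pin down what gets written. Below
is a rough sketch of the fill and verify helpers, assuming the region
to watch is already mapped and reachable through a plain pointer.
guard_fill(), guard_check() and GUARD_SEED are hypothetical names; the
part that actually reserves low memory in the loader is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_SEED 0xdeadc0deU	/* arbitrary value, chosen for illustration */

/* Expected content of word i: depends on the index, so shifted data is caught too. */
static uint32_t
guard_word(size_t i)
{
	return (GUARD_SEED ^ (uint32_t)i);
}

/* Write the guard pattern over nwords 32-bit words starting at buf. */
static void
guard_fill(uint32_t *buf, size_t nwords)
{
	assert(buf != NULL);
	for (size_t i = 0; i < nwords; i++)
		buf[i] = guard_word(i);
}

/*
 * Verify the pattern.  Returns the index of the first word that no
 * longer matches, or nwords if the whole region is intact.
 */
static size_t
guard_check(const uint32_t *buf, size_t nwords)
{
	assert(buf != NULL);
	for (size_t i = 0; i < nwords; i++) {
		if (buf[i] != guard_word(i))
			return (i);
	}
	return (nwords);
}
```

Calling guard_check() at several points during boot, for example
before and after the switch to ACPI mode, would narrow down both when
the write happens and at which offset it starts.
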
-- 
Peter
