FreeBSD Mail Archives

Date:      Wed, 7 Nov 2007 22:21:34 +0100
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Alan Cox <alc@cs.rice.edu>
Cc:        alc@FreeBSD.org, Kris Kennaway <kris@FreeBSD.org>, freebsd-sparc64@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>
Subject:   Re: 7.0 broken on e4500
Message-ID:  <20071107212134.GL36824@alchemy.franken.de>
In-Reply-To: <473019E8.3070203@cs.rice.edu>
References:  <46FEADFD.8020105@FreeBSD.org> <20071003132944.GA17342@alchemy.franken.de> <200710060222.31023.jhb@freebsd.org> <20071006132620.GF24840@alchemy.franken.de> <472DFC18.3080000@FreeBSD.org> <472E4573.3090708@FreeBSD.org> <20071104224618.GD36824@alchemy.franken.de> <472E54D0.8070807@FreeBSD.org> <473019E8.3070203@cs.rice.edu>

On Tue, Nov 06, 2007 at 01:38:16AM -0600, Alan Cox wrote:
> Kris Kennaway wrote:
> 
> >Marius Strobl wrote:
> >
> >>On Sun, Nov 04, 2007 at 11:19:31PM +0100, Kris Kennaway wrote:
> >>
> >>>Kris Kennaway wrote:
> >>>
> >>>>Marius Strobl wrote:
> >>>>
> >>>>>On Sat, Oct 06, 2007 at 02:22:30AM -0400, John Baldwin wrote:
> >>>>>
> >>>>>>On Wednesday 03 October 2007 09:29:44 am Marius Strobl wrote:
> >>>>>>
> >>>>>>>On Sat, Sep 29, 2007 at 09:56:45PM +0200, Kris Kennaway wrote:
> >>>>>>>
> >>>>>>>>I get this early during boot with a CVS kernel (updated from last
> >>>>>>>>December):
> >>>>>>>>
> >>>>>>>>>FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
> >>>>>>>>>panic: tsb_tte_enter: replacing valid kernel mapping
> >>>>>>>>>cpuid = 0
> >>>>>>>>>KDB: enter: panic
> >>>>>>>>>[thread pid 0 tid 0 ]
> >>>>>>>>>Stopped at      kdb_enter+0x68: ta              %xcc, 1
> >>>>>>>>>db> wh
> >>>>>>>>>Tracing pid 0 tid 0 td 0xc0744f80
> >>>>>>>>>panic() at panic+0x204
> >>>>>>>>>tsb_tte_enter() at tsb_tte_enter+0xdc
> >>>>>>>>>pmap_enter_locked() at pmap_enter_locked+0x2d0
> >>>>>>>>>pmap_enter() at pmap_enter+0x64
> >>>>>>>>>kmem_malloc() at kmem_malloc+0x6e0
> >>>>>>>>>page_alloc() at page_alloc+0x28
> >>>>>>>>>uma_large_malloc() at uma_large_malloc+0x44
> >>>>>>>>>malloc() at malloc+0x1b0
> >>>>>>>>>sf_buf_init() at sf_buf_init+0xf8
> >>>>>>>>>mi_startup() at mi_startup+0x18c
> >>>>>>>>>btext() at btext+0x34
> >>>>>>>>
> >>>>>>>Do you by chance load the new kernel manually via the loader
> >>>>>>>prompt, with the old kernel being <= 8MB in size and the new
> >>>>>>>one > 8MB?
> >>>>>>
> >>>>>>I get this panic on an E220R at work, but my "new" kernel is smaller.
> >>>>>>
> >>>>>If the actual panic string is "vm_phys_paddr_to_vm_page: paddr <foo>
> >>>>>is not in any segment" then that's the problem I had in mind when
> >>>>>replying to Kris but unfortunately failed to describe the right
> >>>>>way around.
> >>>>>
> >>>>>>>ll /boot/kernel/kernel* /boot/test/kernel*
> >>>>>>
> >>>>>>-r-xr-xr-x  1 root  wheel   7821094 Feb  6  2007 /boot/kernel/kernel
> >>>>>>-r-xr-xr-x  1 root  wheel  13902501 Feb  6  2007 /boot/kernel/kernel.symbols
> >>>>>>-r-xr-xr-x  1 root  wheel   4534968 Oct  6 00:20 /boot/test/kernel
> >>>>>>-r-xr-xr-x  1 root  wheel  10101980 Oct  6 00:20 /boot/test/kernel.symbols
> >>>>>>
> >>>>>>The working kernel (~7MB) is the GENERIC kernel, and the "test"
> >>>>>>kernel is the stripped down kernel for this machine.  In my case
> >>>>>>I'm panicking in pmap_remove_tte() called from
> >>>>>>pmap_enter_locked().  I added some KTR traces to the pmap code to
> >>>>>>try and investigate, but I'm guessing the root problem is that the
> >>>>>>loader doesn't properly handle telling OFW about needing to change
> >>>>>>the mappings when unloading and then loading a new kernel?
> >>>>>>
> >>>>>>Hmm, it looks like currently the loader doesn't do any sort of MD
> >>>>>>callback when unloading a file, so the loader isn't going to free
> >>>>>>up the RAM it asked for from OFW for the old kernel.
> >>>>>>
> >>>>>Correct, the immediate problem (which I had a patch for somewhere)
> >>>>>is that in case the "old" kernel required more TLB slots to be used
> >>>>>than the "new" one, one can't use the kernel end in order to determine
> >>>>>how many slots are used for the kernel map. As you describe, the real
> >>>>>problem lies within the loader though. The funny thing is that no
> >>>>>arch except sparc64 and sun4v seems to rely on the kernel end
> >>>>>provided by the loader.
> >>>>>I've no idea what's the cause of the problem Kris is seeing though.
> >>>>>
> >>>>>Marius
> >>>>>
> >>>>>
> >>>>FYI one of the e4500's is now booting again but another is still 
> >>>>failing with the same panic:
> >>>>
> >>>>FreeBSD 8.0-CURRENT #44: Mon Nov  5 01:52:42 JST 2007
> >>>>   root@e4500-2.allbsd.org:/usr/src/sys/sparc64/compile/E4500_2
> >>>>real memory  = 9663676416 (9216 MB)
> >>>>avail memory = 9433554944 (8996 MB)
> >>>>cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu2: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu3: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu4: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu5: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu6: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu7: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu8: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu9: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
> >>>>panic: tsb_tte_enter: replacing valid kernel mapping
> >>>>db> wh
> >>>>Tracing pid 0 tid 0 td 0xc056ad30
> >>>>panic() at panic+0x248
> >>>>tsb_tte_enter() at tsb_tte_enter+0xdc
> >>>>pmap_enter_locked() at pmap_enter_locked+0x318
> >>>>pmap_enter() at pmap_enter+0x64
> >>>>kmem_malloc() at kmem_malloc+0x644
> >>>>page_alloc() at page_alloc+0x28
> >>>>uma_large_malloc() at uma_large_malloc+0x44
> >>>>malloc() at malloc+0x1a0
> >>>>sf_buf_init() at sf_buf_init+0xe8
> >>>>mi_startup() at mi_startup+0x1e8
> >>>>btext() at btext+0x34
> >>>>
> 
> Can anyone tell me more about the "vm_phys_paddr_to_vm_page: paddr <foo> 
> is not in any segment" panic?
> 

The relevant info should also be above: if one unloads a kernel
in the loader and loads another one which occupies fewer TLB
slots than the previous one, the excess slots aren't flushed.
The kernel, however, relies on the MODINFOMD_KERNEND provided
by the loader (i.e. the ekva supplied to pmap_bootstrap()) for
calculating the start of KVA, and that value doesn't include
the excess slots with locked entries entered by the loader.
KVA allocated early during boot can therefore collide with
one of these stale locked mappings. Typical panics look like:
cpu0: Sun Microsystems UltraSparc-IIi Processor (440.16 MHz CPU)
panic: vm_phys_paddr_to_vm_page: paddr 0x1e01a000 is not in any segment
cpuid = 0
KDB: enter: panic
[thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x68: ta              %xcc, 1
db> bt
Tracing pid 0 tid 0 td 0xc06a2780
panic() at panic+0x204
vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
pmap_remove_tte() at pmap_remove_tte+0x44
pmap_enter_locked() at pmap_enter_locked+0x1b4
pmap_enter() at pmap_enter+0x94
kmem_malloc() at kmem_malloc+0x69c
page_alloc() at page_alloc+0x28
uma_large_malloc() at uma_large_malloc+0x44
malloc() at malloc+0xc4
sf_buf_init() at sf_buf_init+0xf8
mi_startup() at mi_startup+0x18c
btext() at btext+0x34
db>

Marius
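
For illustration only, the slot arithmetic described above can be
sketched in C. This is not the sparc64 pmap code: the 4 MB size of a
locked mapping, the rounding of the kernel end, and the helper names
slots_for(), kva_start() and kva_overlaps_stale() are all assumptions
made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one locked kernel TLB mapping (4 MB); illustrative only. */
#define LOCKED_PAGE_SIZE ((uint64_t)4 << 20)

/* Hypothetical helper: locked slots needed to cover `size` bytes of kernel. */
static inline unsigned
slots_for(uint64_t size)
{
	assert(size > 0);
	return (unsigned)((size + LOCKED_PAGE_SIZE - 1) / LOCKED_PAGE_SIZE);
}

/*
 * Hypothetical helper: start of KVA when it is derived only from the
 * kernel end reported by the loader, rounded up to a locked page.
 */
static inline uint64_t
kva_start(uint64_t kernbase, uint64_t kern_size)
{
	return kernbase + (uint64_t)slots_for(kern_size) * LOCKED_PAGE_SIZE;
}

/*
 * Hypothetical helper: returns 1 if the previously loaded kernel left
 * more locked slots behind than the new kernel accounts for, so that
 * the computed start of KVA falls inside a stale locked mapping.
 */
static inline int
kva_overlaps_stale(uint64_t old_size, uint64_t new_size)
{
	return slots_for(old_size) > slots_for(new_size);
}
```

With made-up in-memory sizes of 9 MB for the unloaded kernel and 5 MB
for the new one, the old kernel would have used three slots and the new
one two, so the first KVA page computed from the new kernel end would
land in the third, still locked, slot.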

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071107212134.GL36824>
