Date:      Fri, 8 Mar 2013 17:16:13 +0100
From:      Marius Strobl <marius@alchemy.franken.de>
To:        YongHyeon PYUN <pyunyh@gmail.com>
Cc:        Jeremy Chadwick <jdc@koitsu.org>, Loïc Blot <loic.blot@unix-experience.fr>, freebsd-stable@freebsd.org, yongari@freebsd.org
Subject:   Re: Strange reboot since 9.1
Message-ID:  <20130308161613.GA82746@alchemy.franken.de>
In-Reply-To: <20130308023254.GC3246@michelle.cdnetworks.com>
References:  <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <CAJ-UWtTA2P26PUa=6%2B3xR4idC5RqeXnK2s-jw3815Y6Dif-Sng@mail.gmail.com> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> <51389ED5.6030207@bsdinfo.com.br> <1362670734.16808.48.camel@iMac-LBlot.domain.iogs> <20130307163827.GA96983@icarus.home.lan> <20130308023254.GC3246@michelle.cdnetworks.com>

On Fri, Mar 08, 2013 at 11:32:54AM +0900, YongHyeon PYUN wrote:
> On Thu, Mar 07, 2013 at 08:38:27AM -0800, Jeremy Chadwick wrote:
> > On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:
> > > Hi Marcelo, thanks. Here is a better trace:
> > > 
> > > ---------------------------------
> > > 
> > > kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you
> > > are
> > > welcome to change it and/or distribute copies of it under certain
> > > conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for
> > > details.
> > > This GDB was configured as "amd64-marcel-freebsd"...
> > > 
> > > Unread portion of the kernel message buffer:
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address	= 0x0
> > > fault code		= supervisor read data, page not present
> > > instruction pointer	= 0x20:0xffffffff80a84414
> > > stack pointer	        = 0x28:0xffffff822fc267a0
> > > frame pointer	        = 0x28:0xffffff822fc26830
> > > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > > 			= DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags	= interrupt enabled, resume, IOPL = 0
> > > current process		= 12 (irq265: bce0)
> > > trap number		= 12
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > #0 0xffffffff809208a6 at kdb_backtrace+0x66
> > > #1 0xffffffff808ea8be at panic+0x1ce
> > > #2 0xffffffff80bd8240 at trap_fatal+0x290
> > > #3 0xffffffff80bd857d at trap_pfault+0x1ed
> > > #4 0xffffffff80bd8b9e at trap+0x3ce
> > > #5 0xffffffff80bc315f at calltrap+0x8
> > > #6 0xffffffff80a861d5 at udp_input+0x475
> > > #7 0xffffffff80a043dc at ip_input+0xac
> > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > #9 0xffffffff809a35cd at ether_demux+0x14d
> > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4
> > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > #12 0xffffffff80438fd7 at bce_intr+0x487
> > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> > > #14 0xffffffff808c0076 at ithread_loop+0xa6
> > > #15 0xffffffff808bb9ef at fork_exit+0x11f
> > > #16 0xffffffff80bc368e at fork_trampoline+0xe
> > > Uptime: 27m20s
> > > Dumping 1265 out of 8162
> > > MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92%
> > > 
> > > #0  doadump (textdump=Variable "textdump" is not available.
> > > ) at pcpu.h:224
> > > 224	pcpu.h: No such file or directory.
> > > 	in pcpu.h
> > > (kgdb) bt f
> > > #0  doadump (textdump=Variable "textdump" is not available.
> > > ) at pcpu.h:224
> > > No locals.
> > > #1  0xffffffff808ea3a1 in kern_reboot (howto=260)
> > > at /usr/src/sys/kern/kern_shutdown.c:448
> > > 	_ep = Variable "_ep" is not available.
> > > (kgdb) bt
> > > #0  doadump (textdump=Variable "textdump" is not available.
> > > ) at pcpu.h:224
> > > #1  0xffffffff808ea3a1 in kern_reboot (howto=260)
> > > at /usr/src/sys/kern/kern_shutdown.c:448
> > > #2  0xffffffff808ea897 in panic (fmt=0x1 <Address 0x1 out of bounds>)
> > > at /usr/src/sys/kern/kern_shutdown.c:636
> > > #3  0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is
> > > not available.
> > > ) at /usr/src/sys/amd64/amd64/trap.c:857
> > > #4  0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0,
> > > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
> > > #5  0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0)
> > > at /usr/src/sys/amd64/amd64/trap.c:456
> > > #6  0xffffffff80bc315f in calltrap ()
> > > at /usr/src/sys/amd64/amd64/exception.S:228
> > > #7  0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000,
> > > ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20,
> > > udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252
> > > #8  0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable
> > > "off" is not available.
> > > ) at /usr/src/sys/netinet/udp_usrreq.c:618
> > > #9  0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00)
> > > at /usr/src/sys/netinet/ip_input.c:760
> > > #10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable
> > > "source" is not available.
> > > ) at /usr/src/sys/net/netisr.c:1013
> > > #11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000,
> > > m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940
> > > #12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not
> > > available.
> > > ) at /usr/src/sys/net/if_ethersubr.c:759
> > > #13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable
> > > "source" is not available.
> > > ) at /usr/src/sys/net/netisr.c:1013
> > > #14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available.
> > > ) at /usr/src/sys/dev/bce/if_bce.c:6903
> > > #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is
> > > not available.
> > > ) at /usr/src/sys/kern/kern_intr.c:1262
> > > #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0)
> > > at /usr/src/sys/kern/kern_intr.c:1275
> > > #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0
> > > <ithread_loop>, arg=0xfffffe00057424e0, frame=0xffffff822fc26c40)
> > > at /usr/src/sys/kern/kern_fork.c:992
> > > #18 0xffffffff80bc368e in fork_trampoline ()
> > > at /usr/src/sys/amd64/amd64/exception.S:602
> > > #19 0x0000000000000000 in ?? ()
> > > #20 0x0000000000000000 in ?? ()
> > > #21 0x0000000000000001 in ?? ()
> > > #22 0x0000000000000000 in ?? ()
> > > #23 0x0000000000000000 in ?? ()
> > > #24 0x0000000000000000 in ?? ()
> > > #25 0x0000000000000000 in ?? ()
> > > #26 0x0000000000000000 in ?? ()
> > > #27 0x0000000000000000 in ?? ()
> > > #28 0x0000000000000000 in ?? ()
> > > #29 0x0000000000000000 in ?? ()
> > > #30 0x0000000000000000 in ?? ()
> > > #31 0x0000000000000000 in ?? ()
> > > #32 0x0000000000000000 in ?? ()
> > > #33 0x0000000000000000 in ?? ()
> > > #34 0x0000000000000000 in ?? ()
> > > #35 0x0000000000000000 in ?? ()
> > > #36 0x0000000000000000 in ?? ()
> > > #37 0x0000000000000000 in ?? ()
> > > #38 0x0000000000000000 in ?? ()
> > > #39 0x0000000000000000 in ?? ()
> > > #40 0x0000000000000000 in ?? ()
> > > #41 0x0000000000000000 in ?? ()
> > > #42 0x0000000000000000 in ?? ()
> > > #43 0x0000000000000002 in ?? ()
> > > #44 0xffffffff81241c00 in tdq_cpu ()
> > > #45 0xfffffe0005501000 in ?? ()
> > > #46 0x0000000000000000 in ?? ()
> > > #47 0xffffff822fc266d0 in ?? ()
> > > #48 0xffffff822fc26678 in ?? ()
> > > #49 0xfffffe019ed11470 in ?? ()
> > > #50 0xffffffff8091352e in sched_switch (td=0x0,
> > > newtd=0xfffffe00057424e0, flags=Variable "flags" is not available.
> > > ) at /usr/src/sys/kern/sched_ule.c:1921
> > > Previous frame inner to this frame (corrupt stack?)
> > > 
> 
> [...]
> 
> > CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it
> > looks to me the issue is there.  He may have some advice.
> 
> I recall there have been a couple of bce(4)-related crash reports
> (e.g. kern/171739), but the root cause of the issue has not been
> identified yet. Given that most of the crash reports point to bce(4)'s
> RX path, I suspect the driver modifies mbufs passed to the upper stack.
> I still have to revive one of my boxes that can host quad-port bce(4)
> controllers, but I couldn't find the time or a new MB.

I see a possible path leading to exactly that, but it's a bit of a
shot in the dark, as I don't know in detail how a) the hardware and
b) the x86 bus_dmamap_load_buffer(9) behave.
Loic, could you please give the following patch a try? It's against
the 9.1-RELEASE version of if_bce.c but probably also works with
stable/9.
http://people.freebsd.org/~marius/bce_cleanup2.diff9.1

Marius



