From: Loïc BLOT
To: freebsd-stable@freebsd.org
Subject: Re: Strange reboot since 9.1
Date: Sat, 09 Mar 2013 09:53:54 +0100
Message-ID: <1362819234.30912.2.camel@Nerz-PC.home>
In-Reply-To: <20130308161613.GA82746@alchemy.franken.de>
Organization: UNIX Experience Fr
Reply-To: loic.blot@unix-experience.fr

Hi Marius,

Thanks for your patch, but it has no effect on stability. The server
rebooted last night after 8 hours of uptime, and the same backtrace appears.

-- 
Best regards,
Loïc BLOT,
UNIX systems, security and network expert
http://www.unix-experience.fr

On Friday, 08 March 2013 at 17:16 +0100, Marius Strobl wrote:
> On Fri, Mar 08, 2013 at 11:32:54AM +0900, YongHyeon PYUN wrote:
> > On Thu, Mar 07, 2013 at 08:38:27AM -0800, Jeremy Chadwick wrote:
> > > On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:
> > > > Hi Marcelo, thanks. Here is a better trace:
> > > >
> > > > ---------------------------------
> > > >
> > > > kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
> > > > GNU gdb 6.1.1 [FreeBSD]
> > > > Copyright 2004 Free Software Foundation, Inc.
> > > > GDB is free software, covered by the GNU General Public License, and you
> > > > are
> > > > welcome to change it and/or distribute copies of it under certain
> > > > conditions.
> > > > Type "show copying" to see the conditions.
> > > > There is absolutely no warranty for GDB. Type "show warranty" for
> > > > details.
> > > > This GDB was configured as "amd64-marcel-freebsd"...
> > > >
> > > > Unread portion of the kernel message buffer:
> > > >
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > cpuid = 0; apic id = 00
> > > > fault virtual address = 0x0
> > > > fault code = supervisor read data, page not present
> > > > instruction pointer = 0x20:0xffffffff80a84414
> > > > stack pointer = 0x28:0xffffff822fc267a0
> > > > frame pointer = 0x28:0xffffff822fc26830
> > > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > processor eflags = interrupt enabled, resume, IOPL = 0
> > > > current process = 12 (irq265: bce0)
> > > > trap number = 12
> > > > panic: page fault
> > > > cpuid = 0
> > > > KDB: stack backtrace:
> > > > #0 0xffffffff809208a6 at kdb_backtrace+0x66
> > > > #1 0xffffffff808ea8be at panic+0x1ce
> > > > #2 0xffffffff80bd8240 at trap_fatal+0x290
> > > > #3 0xffffffff80bd857d at trap_pfault+0x1ed
> > > > #4 0xffffffff80bd8b9e at trap+0x3ce
> > > > #5 0xffffffff80bc315f at calltrap+0x8
> > > > #6 0xffffffff80a861d5 at udp_input+0x475
> > > > #7 0xffffffff80a043dc at ip_input+0xac
> > > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > > #9 0xffffffff809a35cd at ether_demux+0x14d
> > > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4
> > > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > > #12 0xffffffff80438fd7 at bce_intr+0x487
> > > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> > > > #14 0xffffffff808c0076 at ithread_loop+0xa6
> > > > #15 0xffffffff808bb9ef at fork_exit+0x11f
> > > > #16 0xffffffff80bc368e at fork_trampoline+0xe
> > > > Uptime: 27m20s
> > > > Dumping 1265 out of 8162
> > > > MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92%
> > > >
> > > > #0 doadump (textdump=Variable "textdump" is not available.
> > > > ) at pcpu.h:224
> > > > 224 pcpu.h: No such file or directory.
> > > > in pcpu.h
> > > > (kgdb) bt f
> > > > #0 doadump (textdump=Variable "textdump" is not available.
> > > > ) at pcpu.h:224
> > > > No locals.
> > > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260)
> > > > at /usr/src/sys/kern/kern_shutdown.c:448
> > > > _ep = Variable "_ep" is not available.
> > > > (kgdb) bt
> > > > #0 doadump (textdump=Variable "textdump" is not available.
> > > > ) at pcpu.h:224
> > > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260)
> > > > at /usr/src/sys/kern/kern_shutdown.c:448
> > > > #2 0xffffffff808ea897 in panic (fmt=0x1)
> > > > at /usr/src/sys/kern/kern_shutdown.c:636
> > > > #3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is
> > > > not available.
> > > > ) at /usr/src/sys/amd64/amd64/trap.c:857
> > > > #4 0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0,
> > > > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
> > > > #5 0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0)
> > > > at /usr/src/sys/amd64/amd64/trap.c:456
> > > > #6 0xffffffff80bc315f in calltrap ()
> > > > at /usr/src/sys/amd64/amd64/exception.S:228
> > > > #7 0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000,
> > > > ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20,
> > > > udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252
> > > > #8 0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable
> > > > "off" is not available.
> > > > ) at /usr/src/sys/netinet/udp_usrreq.c:618
> > > > #9 0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00)
> > > > at /usr/src/sys/netinet/ip_input.c:760
> > > > #10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable
> > > > "source" is not available.
> > > > ) at /usr/src/sys/net/netisr.c:1013
> > > > #11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000,
> > > > m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940
> > > > #12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not
> > > > available.
> > > > ) at /usr/src/sys/net/if_ethersubr.c:759
> > > > #13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable
> > > > "source" is not available.
> > > > ) at /usr/src/sys/net/netisr.c:1013
> > > > #14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available.
> > > > ) at /usr/src/sys/dev/bce/if_bce.c:6903
> > > > #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is
> > > > not available.
> > > > ) at /usr/src/sys/kern/kern_intr.c:1262
> > > > #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0)
> > > > at /usr/src/sys/kern/kern_intr.c:1275
> > > > #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0
> > > > , arg=0xfffffe00057424e0, frame=0xffffff822fc26c40)
> > > > at /usr/src/sys/kern/kern_fork.c:992
> > > > #18 0xffffffff80bc368e in fork_trampoline ()
> > > > at /usr/src/sys/amd64/amd64/exception.S:602
> > > > #19 0x0000000000000000 in ?? ()
> > > > #20 0x0000000000000000 in ?? ()
> > > > #21 0x0000000000000001 in ?? ()
> > > > #22 0x0000000000000000 in ?? ()
> > > > #23 0x0000000000000000 in ?? ()
> > > > #24 0x0000000000000000 in ?? ()
> > > > #25 0x0000000000000000 in ?? ()
> > > > #26 0x0000000000000000 in ?? ()
> > > > #27 0x0000000000000000 in ?? ()
> > > > #28 0x0000000000000000 in ?? ()
> > > > #29 0x0000000000000000 in ?? ()
> > > > #30 0x0000000000000000 in ?? ()
> > > > #31 0x0000000000000000 in ?? ()
> > > > #32 0x0000000000000000 in ?? ()
> > > > #33 0x0000000000000000 in ?? ()
> > > > #34 0x0000000000000000 in ?? ()
> > > > #35 0x0000000000000000 in ?? ()
> > > > #36 0x0000000000000000 in ?? ()
> > > > #37 0x0000000000000000 in ?? ()
> > > > #38 0x0000000000000000 in ?? ()
> > > > #39 0x0000000000000000 in ?? ()
> > > > #40 0x0000000000000000 in ?? ()
> > > > #41 0x0000000000000000 in ?? ()
> > > > #42 0x0000000000000000 in ?? ()
> > > > #43 0x0000000000000002 in ?? ()
> > > > #44 0xffffffff81241c00 in tdq_cpu ()
> > > > #45 0xfffffe0005501000 in ?? ()
> > > > #46 0x0000000000000000 in ?? ()
> > > > #47 0xffffff822fc266d0 in ?? ()
> > > > #48 0xffffff822fc26678 in ?? ()
> > > > #49 0xfffffe019ed11470 in ?? ()
> > > > #50 0xffffffff8091352e in sched_switch (td=0x0,
> > > > newtd=0xfffffe00057424e0, flags=Variable "flags" is not available.
> > > > ) at /usr/src/sys/kern/sched_ule.c:1921
> > > > Previous frame inner to this frame (corrupt stack?)
> > > >
> >
> > [...]
> >
> > > CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it
> > > looks to me like the issue is there. He may have some advice.
> >
> > I recall there had been a couple of bce(4)-related crash reports
> > (e.g. kern/171739) but the root cause of the issue was not
> > identified yet. Given that most of the crash reports indicate bce(4)'s
> > RX path, I suspect the driver modifies mbufs passed to the upper stack.
> > I still have to revive one of my boxes that can host quad-port bce(4)
> > controllers but couldn't find the time and a new MB.
>
> I see a possible path leading to exactly that but it's a bit of a
> shot in the dark as I don't know how a) the hardware and b) the x86
> bus_dmamap_load_buffer(9) behave in detail.
> Loic, could you please give the following patch a try (it's against
> the 9.1-RELEASE version of if_bce.c but probably also works with
> stable/9)?
> http://people.freebsd.org/~marius/bce_cleanup2.diff9.1
>
> Marius