From owner-freebsd-stable@FreeBSD.ORG Sat Oct 23 08:21:42 2010
Return-Path:
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
        by hub.freebsd.org (Postfix) with ESMTP id B62B4106564A
        for ; Sat, 23 Oct 2010 08:21:42 +0000 (UTC)
        (envelope-from mike@sentex.net)
Received: from smarthost2.sentex.ca (smarthost2-6.sentex.ca [IPv6:2607:f3e0:80:80::2])
        by mx1.freebsd.org (Postfix) with ESMTP id EF1878FC0A
        for ; Sat, 23 Oct 2010 08:21:41 +0000 (UTC)
Received: from lava.sentex.ca (pyroxene.sentex.ca [199.212.134.18])
        by smarthost2.sentex.ca (8.14.4/8.14.4) with ESMTP id o9N8LXJg052168
        (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
        for ; Sat, 23 Oct 2010 04:21:33 -0400 (EDT)
        (envelope-from mike@sentex.net)
Received: from mdt-xp.sentex.net (simeon.sentex.ca [192.168.43.27])
        by lava.sentex.ca (8.14.4/8.14.4) with ESMTP id o9N8LVuR001382;
        Sat, 23 Oct 2010 04:21:31 -0400 (EDT)
        (envelope-from mike@sentex.net)
Message-Id: <201010230821.o9N8LVuR001382@lava.sentex.ca>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Sat, 23 Oct 2010 04:21:31 -0400
To: Jack Vogel
From: Mike Tancsa
In-Reply-To:
References: <201010221416.o9MEGSa0094817@lava.sentex.ca>
        <201010221425.o9MEPcWC094867@lava.sentex.ca>
        <201010221848.o9MIm7WF096197@lava.sentex.ca>
        <4CC1F3B8.3010302@bogus.com>
        <4CC225D3.1030502@ops-netman.net>
        <7.1.0.9.0.20101022210145.06fe25e8@sentex.net>
        <201010230159.o9N1xGGF098363@lava.sentex.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Scanned-By: MIMEDefang 2.67 on 205.211.164.50
Cc: Chris Morrow, Joel Jaeggli, stable, warren@kumari.net, Randy Bush
Subject: Re: repeating crashes with 8.1
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 23 Oct 2010 08:21:42 -0000

At 12:41 AM 10/23/2010,
Jack Vogel wrote:
>Odd, can you make any connection between this and the em complaints??

I don't think so. This is on an igb nic and a different
panic/behaviour. I have the box sitting at the debugger prompt in the
FreeBSD netperf cluster, so hopefully someone can take a look and see
what the issue is.

         ---Mike

>Jack
>
>
>On Fri, Oct 22, 2010 at 6:59 PM, Mike Tancsa <mike@sentex.net> wrote:
>At 09:11 PM 10/22/2010, Mike Tancsa wrote:
>At 08:01 PM 10/22/2010, Chris Morrow wrote:
>Note, Warren and I attempted to test this this evening on a 10.04 Ubuntu
>box, no crashy-crashy...
>
>
>I was able to trigger the issue on box (c). I was ping6ing box (a)
>when I did a hard down of (d)'s connected interface. The box then
>dropped to debugger
>
>
>Fatal trap 9: general protection fault while in kernel mode
>cpuid = 0; apic id = 00
>instruction pointer     = 0x20:0xffffffff80740a50
>stack pointer           = 0x28:0xffffff800005a890
>frame pointer           = 0x28:0xffffff800005a930
>code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>processor eflags        = interrupt enabled, resume, IOPL = 0
>current process         = 12 (swi4: clock)
>[thread pid 12 tid 100007 ]
>Stopped at      in6_cksum+0x410:        movzwl  (%rsi),%r10d
>db> bt
>Tracing pid 12 tid 100007 td 0xffffff00025083e0
>in6_cksum() at in6_cksum+0x410
>icmp6_reflect() at icmp6_reflect+0x312
>icmp6_error() at icmp6_error+0x1ec
>nd6_llinfo_timer() at nd6_llinfo_timer+0x208
>softclock() at softclock+0x2a6
>intr_event_execute_handlers() at intr_event_execute_handlers+0x66
>ithread_loop() at ithread_loop+0xb2
>fork_exit() at fork_exit+0x12a
>fork_trampoline() at fork_trampoline+0xe
>--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---
>db>
>
>
>I was able to do it, but not the box I expected
>
>4 boxes
>
>(a) Attacking host 2001:db8:1:1/64
>(b) victim, not on a connected interface with a).
>Outside interface
>- em0 - 2001:db8::2:1/64, inside interface - em1 - 2001:db8::3:1/64
>(c) a host behind (b) 2001:db8::3:c/64
>(d) a host behind (b), 2001:db8::3:d/64
>
>
>hosts (c) and (d) have default gateways to b). (c) however, has a
>next hop for (a) via (d). So rather than go out its normal default
>gateway, it takes an extra hop via (d).
>
>Start a ping6 from (a) to (c). Then down (d)'s interface so that
>the ping6 fails. Let the ping keep running for an hour or
>two. Eventually (b) gets error messages like
>
>Oct 22 18:38:32 zoo kernel: em1: discard frame w/o packet header
>
>and crashes.
>
>Unfortunately, I thought it would be (c) that crapped out, not (b)
>and I didn't have crash dumps enabled on the host. Just in the
>process of setting up a better environment.
>
>         ---Mike
>
>-chris
>
>On 10/22/10 16:27, Joel Jaeggli wrote:
> > Ok I'll try testing that on some box I can reach with both hands.
> >
> > fyi nagasaki is:
> >
> > [root@nagasaki ~]# uname -a
> > FreeBSD nagasaki.bogus.com 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #13:
> > Sun May 30 22:19:23 UTC 2010
> > root@nagasaki.bogus.com:/usr/obj/usr/src/sys/GENERIC i386
> > [root@nagasaki ~]#
> >
> >
> > On 10/22/10 1:17 PM, Randy Bush wrote:
> >>>>>>> Do you know how this panic is triggered ? Are you able to
> >>>>>>> create it on demand ?
> >>>>>>
> >>>>>> no i do not. bring server up and it'll happen in half an hour.
> >>>>>> and the server was happy for two months. so i am thinking hardware.
> >>>>>
> >>>>> Perhaps. The reason I ask is that I had a box go down last night with
> >>>>> the same set of errors. The box has a number of ipv6 routes, but its
> >>>>> next hop was down and the problems started soon after. So I wonder if
> >>>>> it has something to do with that. Do you have ipv6 on this box and
> >>>>> are all the next hop addresses correct / reachable ?
> >>>>>
> >>>>> Oct 22 02:06:02 i4 kernel: em1: discard frame w/o packet header
> >>>>> Oct 22 02:06:10 i4 kernel: em2: discard frame w/o packet header
> >>>>> Oct 22 02:06:21 i4 kernel: em1: discard frame w/o packet header
> >>>>
> >>>> it was co-incident with a border router being taken down for new router
> >>>> install. that router was the v6 exit the servers was using. i have now
> >>>> pointed default6 to a different exit. the server seems happy.
> >>>
> >>>
> >>> Are your servers still up ? I guess the question now is how to
> >>> trigger this problem on demand. Perhaps lots of inbound ipv6 traffic
> >>> with a bad next hop out ? How recent are your sources ? The kernel
> >>> said Oct 21st. Were the sources from then too ?
> >>
> >> yes, kernel and world from 21 oct
> >>
> >> chris had an idea on retrigger, install a static for a small dest that
> >> points to a hole. send a packet to the small dest.
> >>
> >> randy
> >>
> >
>--------------------------------------------------------------------
>Mike Tancsa,                                      tel +1 519 651 3400
>Sentex Communications,                            mike@sentex.net
>Providing Internet since 1994                     www.sentex.net
>Cambridge, Ontario Canada                         www.sentex.net/mike
>
>_______________________________________________
>freebsd-stable@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                     www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike
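
For reference, the reproduction steps quoted in this message (a static route on (c) that reaches (a) via (d), a long-running ping6 from (a) to (c), then taking (d)'s interface down) can be written out as a per-host command plan. The following is only a minimal sketch in Python that prints the commands without running anything. Two values are assumptions and not taken from the thread: the prefix used for (a), because the message writes it as "2001:db8:1:1/64", which is not a complete IPv6 address, and the interface name on (d), which the message never gives. The helper name repro_plan is hypothetical.

```python
import ipaddress

# Addresses for (c) and (d) are copied from the message.
C_ADDR = ipaddress.ip_address("2001:db8::3:c")
D_ADDR = ipaddress.ip_address("2001:db8::3:d")

# ASSUMPTION: the message gives (a) as "2001:db8:1:1/64", which does not
# parse as IPv6; this prefix is a stand-in chosen only for illustration.
A_PREFIX = ipaddress.ip_network("2001:db8:1::/64")

# ASSUMPTION: the message does not name (d)'s interface.
D_IFACE = "em0"


def repro_plan():
    """Return (host, command) pairs in the order the message describes."""
    return [
        # (c) reaches (a) through (d) instead of its default gateway (b).
        ("c", f"route add -inet6 {A_PREFIX} {D_ADDR}"),
        # (a) pings (c) continuously; the message lets this run for hours.
        ("a", f"ping6 {C_ADDR}"),
        # (d)'s interface goes down, so the next hop used by (c) disappears.
        ("d", f"ifconfig {D_IFACE} down"),
    ]


if __name__ == "__main__":
    # Dry run: print what would be typed on each host, execute nothing.
    for host, cmd in repro_plan():
        print(f"({host}) # {cmd}")
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands themselves would have to be entered by hand on the respective FreeBSD hosts, with the assumed values replaced by the real ones.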