Date:      Thu, 29 Oct 2009 09:49:09 -0700
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Mark Atkinson <atkin901@yahoo.com>
Cc:        freebsd-net@freebsd.org, bug-followup@FreeBSD.org
Subject:   Re: kern/124127: [msk] watchdog timeout (missed Tx interrupts) -- recovering
Message-ID:  <20091029164909.GA13275@michelle.cdnetworks.com>
In-Reply-To: <hcc6n3$761$1@ger.gmane.org>
References:  <200910290010.n9T0A3cV083541@freefall.freebsd.org> <hcc6n3$761$1@ger.gmane.org>

On Thu, Oct 29, 2009 at 06:52:34AM -0700, Mark Atkinson wrote:
> Wow, not sure what to blame for that charset nightmare.  Apologies.
> Here's the original message:
> 
> On the unpatched -current kernel, built
> 
> FreeBSD hellfire.filament.org 9.0-CURRENT FreeBSD 9.0-CURRENT #14: Mon
> Oct 19 09:12:03 PDT 2009
> 
> I received the following panic today related to this:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address  = 0xdeadc10a
> fault code              = supervisor read, page not present
> instruction pointer    = 0x20:0xc0987410
> stack pointer          = 0x28:0xd533dac0
> frame pointer          = 0x28:0xd533dae8
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process        = 0 (mskc0 taskq)
> Physical memory: 495 MB
> Dumping 132 MB: 117 101 85 69 53 37 21 5
> 
> Reading symbols from /boot/kernel/linux.ko...Reading symbols from
> /boot/kernel/linux.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/linux.ko
> #0  0xc08907a9 in doadump () at /usr/src/sys/kern/kern_shutdown.c:254
> 254    }
> (kgdb) bt
> #0  0xc08907a9 in doadump () at /usr/src/sys/kern/kern_shutdown.c:254
> #1  0xc04f7e37 in db_fncall (dummy1=-1067299898, dummy2=0,
> dummy3=-718022488,
>     dummy4=0xd533d898 "\200%t?") at /usr/src/sys/ddb/db_command.c:548
> #2  0xc04f8214 in db_command (last_cmdp=0xc0da059c, cmd_table=0x0,
> dopager=1)
>     at /usr/src/sys/ddb/db_command.c:445
> #3  0xc04f8352 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
> #4  0xc04fa05e in db_trap (type=12, code=0) at
> /usr/src/sys/ddb/db_main.c:229
> #5  0xc08bf2d2 in kdb_reenter () at /usr/src/sys/kern/subr_kdb.c:398
> #6  0xc0ba9b62 in trap_fatal (frame=0x1, eva=3735929098)
>     at /usr/src/sys/i386/i386/trap.c:938
> #7  0xc0baa483 in trap (frame=0xd533da80) at
> /usr/src/sys/i386/i386/trap.c:339
> #8  0xc0b8e4ab in Xlcall_syscall () at
> /usr/src/sys/i386/i386/exception.s:241
> #9  0xc0987410 in in_lltable_lookup (llt=0xc39e1000, flags=Variable
> "flags" is not available.
> )
>     at /usr/src/sys/netinet/in.c:1380
> #10 0xc0982470 in arpintr (m=0xc3baeb00) at
> /usr/src/sys/netinet/if_ether.c:642
> #11 0xc094227a in netisr_dispatch_src (proto=7, source=0, m=0xc0de)
>     at /usr/src/sys/net/netisr.c:932
> #12 0xc09424dd in netisr_unregister (nhp=0xc0de)
>     at /usr/src/sys/net/netisr.c:583
> #13 0xc093ac69 in ether_demux (ifp=0x0, m=0xc3baeb00)
>     at /usr/src/sys/net/if_ethersubr.c:911
> #14 0xc093b1ce in ether_output (ifp=0xc36ad400, m=0xc3baeb00,
> dst=0xc0c55c27,
>     ro=0x301010a) at /usr/src/sys/net/if_ethersubr.c:181
> ---Type <return> to continue, or q <return> to quit---
> #15 0xc070b032 in msk_handle_events (sc=0xc3686c00)
>     at /usr/src/sys/dev/msk/if_msk.c:3048
> #16 0xc070b828 in msk_int_task (arg=0xc3686c00, pending=1)
>     at /usr/src/sys/dev/msk/if_msk.c:3625
> #17 0xc08cac8c in taskqueue_run (queue=0xc36bf380)
>     at /usr/src/sys/kern/subr_taskqueue.c:72
> #18 0xc08cadcc in taskqueue_thread_loop (arg=0xc3686c8c)
>     at /usr/src/sys/kern/subr_taskqueue.c:90
> #19 0xc0869271 in fork_exit (callout=0xc08cad67 <taskqueue_thread_loop+64>,
>     arg=0xc3686c8c, frame=0xd533dd38) at /usr/src/sys/kern/kern_fork.c:854
> #20 0xc0b8e520 in Xatpic_intr0 () at atpic_vector.s:62
> #21 0x00000000 in ?? ()
> 

I don't think this is a bug in msk(4). The page fault is in
in_lltable_lookup(), called from arpintr(), and Qin Li fixed that
bug in the ARP code. See r198301.
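
As a side note, the fault address itself points away from the driver.
The sketch below assumes the kernel was built with INVARIANTS (the
default for -CURRENT) and that freed memory is therefore filled with
the 0xdeadc0de junk pattern; the helper name is made up for
illustration. It only shows that the faulting address is that pattern
plus a small offset, which is what a dereference through a stale
pointer would look like. It does not prove which structure was freed.

```python
# Junk pattern that FreeBSD debug kernels write into freed memory
# (assumption: this kernel was built with INVARIANTS).
FREED_FILL = 0xDEADC0DE


def offset_from_fill(fault_addr, fill=FREED_FILL):
    """Return the distance between a fault address and the fill pattern."""
    return fault_addr - fill


# "fault virtual address" from the panic message.
fault = 0xDEADC10A
# "eva" argument of trap_fatal() in frame #6, printed in decimal.
eva = 3735929098

# Both values describe the same address.
assert fault == eva

# A small positive offset suggests a field read through a pointer
# that was loaded from already-freed memory.
print(hex(offset_from_fill(fault)))
```

The m=0xc0de and nhp=0xc0de arguments in frames #11 and #12 show the
low half of the same pattern.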

For the watchdog timeout issues on the 88E8053 controller, have you
tried disabling MSI? msk(4) has changed a lot since 7.0-RELEASE to
support newer controllers, and several workarounds for silicon bugs
have been added. So don't blindly apply experimental patches to your
controller. The 88E8053 also has a couple of hardware bugs, but I
believe msk(4) already incorporates the required workarounds. If you
can reliably reproduce the watchdog timeouts, please let me know.
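
To test with MSI disabled, something like the following should work.
It assumes the hw.msk.msi_disable loader tunable described in the
msk(4) manual page is present in your version of the driver. Add the
line to /boot/loader.conf and reboot:

```
# /boot/loader.conf
# Make msk(4) fall back to legacy INTx interrupts instead of MSI.
hw.msk.msi_disable="1"
```

If the watchdog timeouts go away with this setting, that would point
at MSI handling rather than at the Tx path itself.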


