Date: Fri, 19 Aug 2011 00:17:12 -0700
From: Garrett Cooper <yanegomi@gmail.com>
To: mdf@freebsd.org
Cc: FreeBSD Current <freebsd-current@freebsd.org>, Pyun YongHyeon <yongari@freebsd.org>
Subject: Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces
Message-ID: <CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com>
In-Reply-To: <CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com>
References: <CAGH67wRWVu0qtae7fZjAi9r1H=Tt2QYpgJgF=1stUuWe1dg%2BSw@mail.gmail.com> <CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com>

On Thu, Aug 18, 2011 at 9:31 PM, <mdf@freebsd.org> wrote:
> On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
>>    When loading if_alc as a module on my netbook and running
>> /etc/rc.d/netif restart, I can deterministically panic my netbook with
>> the following message:

These repro steps were overly simplified. The complete steps are:

1. Attach ethernet cable to alc(4) enabled NIC.
2. Boot up machine.
3. Log in.
4. Physically remove ethernet cable from alc(4) enabled NIC.
5. Run `/etc/rc.d/netif restart' as root.

>> ) at _bus_dmamap_sync+0x51
>> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
>> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
>> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
>> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
>> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
>> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
>> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
>> syscall(e6ca3d28) at syscall+0x2e
>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>> --- syscall (54kernel trap 12 with interrupts disabled
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
>> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
>> KDB: stack backtrace:
>> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
>> db_trace_self_wrapper+0x26
>> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
>> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
>> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
>> trap(e6ca32dc) at trap+0x15a
>> calltrap() at calltrap+0x6
>>
>>    I tried to track down what the exact issue was, but I got lost
>> (the locking sort of looks ok to me, but I'm still not an expert with
>> mutex(9)).
>>    I still have the vmcore and can provide more helpful details when requested.
>
> The locking itself is almost certainly fine.  The error message is not
> very helpful, but what went wrong was the page fault.  You just happen
> to panic on a witness warning before vm_fault can panic due to a bad
> address.
>
> The alc(4) maintainer would probably like info on the trap (line of
> code and where the bad pointer came from).

I talked to Xin a bit and, as he noted, the panic was just a symptom of
the actual issue. I think the problem is that the rx ring's rx_m value
isn't set to NULL when an error occurs. As for the immediate failure,
the following call is failing:

    if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag,    // <-- HERE
        sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
            m_freem(m);
            return (ENOBUFS);
    }

It's failing with ENOMEM. I'm still trying to determine the exact reason
for the ENOMEM from the x86 busdma code, though.

Thanks,
-Garrett
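
To make the rx_m hypothesis above concrete, here is a minimal user-space
sketch of the pattern. It is not the if_alc.c code: fake_mbuf, fake_rxdesc,
fake_dmamap_load, fake_newbuf and fake_stop are hypothetical stand-ins, and
the `fail' flag only simulates the ENOMEM return from the mapping call. The
idea it illustrates is that a slot whose buffer setup failed should end up
with rx_m == NULL, so a later stop path that checks for NULL skips it
instead of touching a stale pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real alc(4) structures. */
struct fake_mbuf {
	int len;
};

struct fake_rxdesc {
	struct fake_mbuf *rx_m;	/* buffer attached to this ring slot, or NULL */
};

/* Simulated mapping step; a non-zero 'fail' forces the ENOMEM path. */
static int
fake_dmamap_load(struct fake_mbuf *m, int fail)
{
	(void)m;
	return (fail ? ENOMEM : 0);
}

/* Attach a fresh buffer to a slot; on any failure the slot is left NULL. */
static int
fake_newbuf(struct fake_rxdesc *rxd, int fail)
{
	struct fake_mbuf *m;

	m = malloc(sizeof(*m));
	if (m == NULL) {
		rxd->rx_m = NULL;
		return (ENOBUFS);
	}
	m->len = 0;
	if (fake_dmamap_load(m, fail) != 0) {
		free(m);
		rxd->rx_m = NULL;	/* do not leave a stale pointer behind */
		return (ENOBUFS);
	}
	rxd->rx_m = m;
	return (0);
}

/* Release every populated slot; returns how many buffers were freed. */
static int
fake_stop(struct fake_rxdesc *ring, int cnt)
{
	int i, released = 0;

	assert(ring != NULL);
	for (i = 0; i < cnt; i++) {
		if (ring[i].rx_m != NULL) {	/* failed slots are skipped here */
			free(ring[i].rx_m);
			ring[i].rx_m = NULL;
			released++;
		}
	}
	return (released);
}
```

With a three-slot ring where the second fake_newbuf() call is forced to
fail, fake_stop() should release only the two populated slots, and a second
fake_stop() should find nothing left to release. Whether the real driver
needs this exact change is still an open question at this point.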
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q>