Date:      Sun, 21 Aug 2011 19:39:26 -0700
From:      Garrett Cooper <yanegomi@gmail.com>
To:        pyunyh@gmail.com
Cc:        mdf@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Pyun YongHyeon <yongari@freebsd.org>
Subject:   Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces
Message-ID:  <CAGH67wTrgDDSFvzVbkz6a%2BAAEKQSoTSA7KUZnmAknRE2QdBr_w@mail.gmail.com>
In-Reply-To: <20110822015502.GE1755@michelle.cdnetworks.com>
References:  <CAGH67wRWVu0qtae7fZjAi9r1H=Tt2QYpgJgF=1stUuWe1dg%2BSw@mail.gmail.com> <CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com> <CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com> <20110821234856.GB1755@michelle.cdnetworks.com> <CAGH67wTsSViuSsTgxcUT2gY2Jy=D3HNN2iPdhba9v=e8_4buuA@mail.gmail.com> <20110822015502.GE1755@michelle.cdnetworks.com>

On Sun, Aug 21, 2011 at 6:55 PM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
> On Sun, Aug 21, 2011 at 06:26:45PM -0700, Garrett Cooper wrote:
>> On Sun, Aug 21, 2011 at 4:48 PM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
>> > On Fri, Aug 19, 2011 at 12:17:12AM -0700, Garrett Cooper wrote:
>> >> On Thu, Aug 18, 2011 at 9:31 PM, <mdf@freebsd.org> wrote:
>> >> > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
>> >> >>    When loading if_alc as a module on my netbook and running
>> >> >> /etc/rc.d/netif restart, I can deterministically panic my netbook
>> >> >> with the following message:
>> >>
>> >>     These repro steps were overly simplified. The complete steps are:
>> >>
>> >> 1. Attach ethernet cable to alc(4) enabled NIC.
>> >> 2. Boot up machine.
>> >> 3. Login.
>> >> 4. Physically remove ethernet cable from alc(4) enabled NIC.
>> >> 5. Run `/etc/rc.d/netif restart' as root.
>> >>
>> >
>> > I can't reproduce this with AR8151 sample board. Could you give me
>> > dmesg output to know exact controller revision?
>> > One issue I'm aware of is lack of re-establishing link when
>> > controller firmware put its PHY to deep sleep mode.  The deep sleep
>> > mode seems to be automatically activated by firmware when it
>> > detects no energy signal(i.e. cable unplugged) so I had to down and
>> > up the interface again to take the PHY out of the sleep mode.
>> >
>> >> >> ) at _bus_dmamap_sync+0x51
>> >> >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
>> >> >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
>> >> >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
>> >> >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
>> >> >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
>> >> >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
>> >> >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
>> >> >> syscall(e6ca3d28) at syscall+0x2e
>> >> >> Xint0x80_syscall() at Xint0x80_syscall+0x21
>> >> >> --- syscall (54kernel trap 12 with interrupts disabled
>> >> >> Kernel page fault with the following non-sleepable locks held:
>> >> >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
>> >> >> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
>> >> >> KDB: stack backtrace:
>> >> >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at db_trace_self_wrapper+0x26
>> >> >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
>> >> >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
>> >> >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
>> >> >> trap(e6ca32dc) at trap+0x15a
>> >> >> calltrap() at calltrap+0x6
>> >> >>
>> >> >>    I tried to track down what the exact issue was, but I got lost
>> >> >> (the locking sort of looks ok to me, but I'm still not an expert with
>> >> >> mutex(9)).
>> >> >>    I still have the vmcore and can provide more helpful details when requested.
>> >> >
>> >> > The locking itself is almost certainly fine.  The error message is not
>> >> > very helpful, but what went wrong was the page fault.  You just happen
>> >> > to panic on a witness warning before vm_fault can panic due to a bad
>> >> > address.
>> >> >
>> >> > The alc(4) maintainer would probably like info on the trap (line of
>> >> > code and where the bad pointer came from).
>> >>
>> >>     I talked to Xin a bit and as he noted the panic was just a symptom
>> >> of the actual issue at hand. I think the problem is that the rx ring's
>> >> rx_m value isn't set to NULL when an error occurred, but getting to
>> >> the exact problem at hand, the following call is failing:
>> >>
>> >>         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
>> >>             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
>> >>                 m_freem(m);
>> >>                 return (ENOBUFS);
>> >>         }
>> >>
>> >>     It's failing with ENOMEM. Still trying to determine what the exact
>> >
>> > Even if bus_dmamap_load_mbuf_sg(9) fails driver should not panic.
>> > Could you show me full back-trace?
>>
>>     I tried to hack the kernel to get it to dump properly, but that
>> inevitably failed (one of the buffers or the stack data associated
>> probably got stomped on when the system panicked).
>>     Here are some pics.
>
> Thanks a lot. I see that alc(4) failed to allocate RX buffers and
> it seems the panic happened in alc_stop().  But I can't understand
> how it could be triggered.  When RX buffer allocation failed, the
> mbuf pointer would have been NULL such that bus_dmamap_sync(9)
> wouldn't be invoked in alc_stop().
> I also see you have wireless network setup in the back trace. Could
> you also reproduce alc(4) panic without wireless network
> configuration?

Unfortunately, disabling wireless and if_ath still yields the panic.
-Garrett
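
A minimal userland sketch of the guard discussed above, where a stop
routine skips any RX slot whose buffer pointer is NULL so that the sync
step never runs on an empty slot. This is not the alc(4) driver code:
rx_ring, rx_desc, rx_fill, rx_stop, fake_sync_and_unload and RX_RING_CNT
are hypothetical names, and malloc/free stand in for mbuf allocation.
Only the field name rx_m is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_CNT 4	/* hypothetical ring size, not alc(4)'s real value */

/* Hypothetical stand-in for the software state of one RX slot. */
struct rx_desc {
	void *rx_m;	/* buffer attached to this slot, or NULL if empty */
};

struct rx_ring {
	struct rx_desc desc[RX_RING_CNT];
	int sync_calls;	/* how many times the "sync" path ran */
};

/* Stand-in for a DMA sync + unload; only valid on a slot with a buffer. */
static void
fake_sync_and_unload(struct rx_ring *r, struct rx_desc *d)
{
	assert(d->rx_m != NULL);
	r->sync_calls++;
}

/*
 * Fill an empty ring (the caller must have run rx_stop first, or be
 * using a zeroed ring). fail_at simulates a buffer load failure at that
 * slot; pass -1 for no simulated failure. Every slot is cleared up
 * front, so slots that were never filled stay NULL after an error.
 */
static int
rx_fill(struct rx_ring *r, int fail_at)
{
	for (int i = 0; i < RX_RING_CNT; i++)
		r->desc[i].rx_m = NULL;

	for (int i = 0; i < RX_RING_CNT; i++) {
		if (i == fail_at)
			return (-1);	/* simulated load failure */
		void *m = malloc(64);
		if (m == NULL)
			return (-1);
		r->desc[i].rx_m = m;
	}
	return (0);
}

/* Release every attached buffer; empty slots are skipped entirely. */
static void
rx_stop(struct rx_ring *r)
{
	for (int i = 0; i < RX_RING_CNT; i++) {
		struct rx_desc *d = &r->desc[i];

		if (d->rx_m == NULL)
			continue;	/* nothing to sync or free */
		fake_sync_and_unload(r, d);
		free(d->rx_m);
		d->rx_m = NULL;	/* keep the invariant for the next stop */
	}
}
```

Under these assumptions, a fill that fails at slot 2 leaves slots 2 and 3
NULL, and rx_stop only touches slots 0 and 1. If some failure path left a
stale non-NULL pointer in a slot instead, rx_stop would act on it, which
is the kind of bug suspected earlier in the thread.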


