Date:      Tue, 22 Oct 2013 09:07:57 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        claudiu vasadi <claudiu.vasadi@gmail.com>,  "freebsd-wireless@freebsd.org" <freebsd-wireless@freebsd.org>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: 9.2-STABLE r255918 with GENERIC and iwn - core dump
Message-ID:  <CAJ-VmomY16hJSJgj5jAq9ywxCsY67AmmesPSSgsb1OekTGikww@mail.gmail.com>
In-Reply-To: <CAM-i3iiem_3-tv90k0NWeJjocx77RhT%2BCeZPHZRHZS3_AsgZkQ@mail.gmail.com>
References:  <CAM-i3iiem_3-tv90k0NWeJjocx77RhT%2BCeZPHZRHZS3_AsgZkQ@mail.gmail.com>

I know what's causing this!

It's because when a management frame finishes transmitting, there's a
callback mbuf tag (M_TXCB) on it that causes the driver to call the net80211
TX completion callback.

Now, because some drivers call the net80211 TX completion callback while
holding their driver locks, this causes locking issues. So someone (I don't
know or really care who) changed things so that whenever a TX completion
occurs, the net80211 code schedules a callout instead. The callout then runs
outside of the driver locks, which solves that issue.
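To make the deferral scheme concrete, here is a minimal user-space model in C. All names (`sim_vap`, `sim_node`, `sim_tx_complete()`, `sim_callout_fire()`) are hypothetical and this is not the net80211 code: the single pending slot per VAP stands in for the per-VAP callout, and the stored raw node pointer stands in for the callout argument that `ieee80211_tx_mgt_timeout()` presumably receives later.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for a net80211 node and VAP.
 * This is a user-space model of the deferral scheme, not kernel code.
 */
struct sim_node {
	int id;
};

struct sim_vap {
	/* One deferred-completion slot per VAP, modelling a single callout. */
	int callout_armed;
	struct sim_node *callout_arg;	/* raw pointer: no reference held */
	int callout_status;
};

/*
 * Driver TX-completion path (possibly running with driver locks held):
 * instead of calling the completion handler directly, arm the VAP's
 * callout. Re-arming silently replaces whatever was pending before.
 */
static void
sim_tx_complete(struct sim_vap *vap, struct sim_node *ni, int status)
{
	vap->callout_armed = 1;
	vap->callout_arg = ni;
	vap->callout_status = status;
}

/*
 * Later, from the clock software interrupt and outside the driver locks,
 * the callout fires. Returns the id of the node whose completion was
 * handled, or -1 if nothing was pending.
 */
static int
sim_callout_fire(struct sim_vap *vap)
{
	if (!vap->callout_armed)
		return -1;
	vap->callout_armed = 0;
	assert(vap->callout_arg != NULL);
	return vap->callout_arg->id;
}
```

If two frames complete before the callout runs, the second `sim_tx_complete()` call overwrites the first, and `sim_callout_fire()` only reports node 2; that is the first problem in the list that follows.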

This has a bunch of problems.

* Firstly, if multiple management frames complete before the callout runs,
only the most recent completion is acted upon. Tsk. There's only one
callout, and it's per VAP.
* Secondly, no node reference is taken before scheduling the callout, so if
the node is destroyed (e.g. because the BSS node is freed during a channel
scan or reset) and the callout still fires, it'll dereference a freed node.
This is the cause of the crash.
* Thirdly, the callout is cancelled in the VAP state change path, which
doesn't know about the node(s) that just received TX completions. Since the
callout is per VAP, there's no way to figure out which node needs
dereferencing, so things blow up.
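The second and third points can be modelled the same way. In this sketch (hypothetical names again, with a `freed` flag standing in for the real free so the use-after-free is detected instead of crashing), `take_ref = false` mirrors the behaviour described above and `take_ref = true` shows what holding a node reference for the lifetime of the pending callout would look like. The reference-taking variant is only an illustration; it is not the fix proposed below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted node; "freeing" only sets a flag. */
struct ref_node {
	int refcnt;
	bool freed;
};

static void
node_ref(struct ref_node *ni)
{
	assert(!ni->freed);
	ni->refcnt++;
}

static void
node_unref(struct ref_node *ni)
{
	assert(ni->refcnt > 0);
	if (--ni->refcnt == 0)
		ni->freed = true;	/* stands in for the real free */
}

/* One pending deferred completion per VAP. */
struct ref_vap {
	struct ref_node *pending;	/* holds a reference if take_ref */
	bool take_ref;			/* false models the current code */
};

static void
vap_arm(struct ref_vap *vap, struct ref_node *ni)
{
	if (vap->take_ref) {
		node_ref(ni);
		if (vap->pending != NULL)	/* re-arm replaces old entry */
			node_unref(vap->pending);
	}
	vap->pending = ni;
}

/*
 * Callout handler. Returns true if it touched a node that had already
 * been freed, i.e. the situation in the quoted backtrace.
 */
static bool
vap_fire(struct ref_vap *vap)
{
	struct ref_node *ni = vap->pending;
	bool use_after_free;

	if (ni == NULL)
		return false;
	vap->pending = NULL;
	use_after_free = ni->freed;
	if (vap->take_ref && !use_after_free)
		node_unref(ni);	/* drop the reference taken in vap_arm() */
	return use_after_free;
}

/* State-change path: cancel the pending callout and drop its reference. */
static void
vap_cancel(struct ref_vap *vap)
{
	if (vap->pending != NULL && vap->take_ref)
		node_unref(vap->pending);
	vap->pending = NULL;
}
```

With `take_ref` false, releasing the BSS node before the callout fires leaves the callout holding a dangling pointer. With it true, either the callout or the cancel path drops the last reference. Note that `vap_cancel()` can only do this because the model remembers the pending node; per the third point, the real state change path has no way to know which node that is.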

The fix is simply to undo this brain-damaged workaround and require that
drivers call the TX completion callback with no driver locks held. That's on
my TODO list, but it'll take a little more time. Now that 10 has branched,
I'll be happy to just flip that switch in -HEAD and deal with the locking
fallout.
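One way a driver could call the TX completion callback with no driver locks held is to detach the completed frames while holding the lock and run the callbacks only after unlocking. The sketch below uses a POSIX mutex in place of the kernel driver lock; `drv_softc`, `txdone`, `drv_tx_done()` and `tx_complete_cb()` are hypothetical names, and the callback uses `pthread_mutex_trylock()` to detect, rather than deadlock on, a lock that is already held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical completed-frame record and driver softc. */
struct txdone {
	struct txdone *next;
	int status;
};

struct drv_softc {
	pthread_mutex_t lock;	/* models the driver lock */
	struct txdone *done;	/* completed frames, protected by lock */
	int cb_got_lock;	/* callbacks that could take the lock */
	int cb_blocked;		/* callbacks that found it already held */
};

/*
 * Stand-in for the net80211 completion callback. A real callback may
 * re-enter the driver (e.g. to send the next management frame) and need
 * the driver lock; trylock lets the model report that instead of hanging.
 */
static void
tx_complete_cb(struct drv_softc *sc, struct txdone *td)
{
	(void)td;
	if (pthread_mutex_trylock(&sc->lock) == 0) {
		sc->cb_got_lock++;
		pthread_mutex_unlock(&sc->lock);
	} else {
		sc->cb_blocked++;
	}
}

/*
 * TX-completion processing: detach the completed list under the driver
 * lock, then invoke the callbacks after the lock has been released.
 */
static void
drv_tx_done(struct drv_softc *sc)
{
	struct txdone *list, *td;

	pthread_mutex_lock(&sc->lock);
	list = sc->done;
	sc->done = NULL;
	pthread_mutex_unlock(&sc->lock);

	for (td = list; td != NULL; td = td->next)
		tx_complete_cb(sc, td);
}
```

Because the callbacks run after the unlock, each one can take the driver lock itself. Calling `tx_complete_cb()` while the lock is held makes the trylock fail, which is the situation that would deadlock or recurse in a driver whose callback re-enters it.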

Thanks,

-adrian



On 22 October 2013 07:28, claudiu vasadi <claudiu.vasadi@gmail.com> wrote:

> Hi everyone,
>
> I have a Lenovo ThinkPad T420s with an Intel Core i7 @ 2.70GHz, 8GB RAM, a
> 160GB Intel SSD and iwn0: <Intel Centrino Ultimate-N 6300> mem
> 0xf4200000-0xf4201fff irq 17 at device 0.0 on pci3
>
> Today, while connecting to different APs, I noticed at one point that I
> was not getting an IP address although the wifi card was associated. Within
> "wifimgr", I did a "Save and Reconnect" and then got a core dump.
>
> Below is the backtrace:
>
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0xffffff801e5f7000
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80a10431
> stack pointer        = 0x28:0xffffff8000276980
> frame pointer        = 0x28:0xffffff8000276a20
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 12 (swi4: clock)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff80948a06 at kdb_backtrace+0x66
> #1 0xffffffff8090e50e at panic+0x1ce
> #2 0xffffffff80cf3440 at trap_fatal+0x290
> #3 0xffffffff80cf37a1 at trap_pfault+0x211
> #4 0xffffffff80cf3d54 at trap+0x344
> #5 0xffffffff80cdd093 at calltrap+0x8
> #6 0xffffffff808dfddd at intr_event_execute_handlers+0xfd
> #7 0xffffffff808e15cd at ithread_loop+0x9d
> #8 0xffffffff808dc82f at fork_exit+0x11f
> #9 0xffffffff80cdd5be at fork_trampoline+0xe
> Uptime: 8h20m28s
> Dumping 952 out of 8106 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /boot-mount/boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /boot-mount/boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/geom_eli.ko...Reading symbols from
> /boot-mount/boot/kernel/geom_eli.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/geom_eli.ko
> Reading symbols from /boot/kernel/crypto.ko...Reading symbols from
> /boot-mount/boot/kernel/crypto.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/crypto.ko
> Reading symbols from /boot/kernel/linux.ko...Reading symbols from
> /boot-mount/boot/kernel/linux.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/linux.ko
> Reading symbols from /boot/kernel/drm.ko...Reading symbols from
> /boot-mount/boot/kernel/drm.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/drm.ko
> Reading symbols from /boot/modules/nvidia.ko...done.
> Loaded symbols for /boot/modules/nvidia.ko
> Reading symbols from /boot/kernel/mmc.ko...Reading symbols from
> /boot-mount/boot/kernel/mmc.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/mmc.ko
> Reading symbols from /boot/kernel/mmcsd.ko...Reading symbols from
> /boot-mount/boot/kernel/mmcsd.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/mmcsd.ko
> Reading symbols from /boot/kernel/acpi_call.ko...done.
> Loaded symbols for /boot/kernel/acpi_call.ko
> Reading symbols from /boot/kernel/umodem.ko...Reading symbols from
> /boot-mount/boot/kernel/umodem.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/umodem.ko
> Reading symbols from /boot/modules/vboxnetflt.ko...done.
> Loaded symbols for /boot/modules/vboxnetflt.ko
> Reading symbols from /boot/modules/vboxdrv.ko...done.
> Loaded symbols for /boot/modules/vboxdrv.ko
> Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from
> /boot-mount/boot/kernel/netgraph.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/netgraph.ko
> Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from
> /boot-mount/boot/kernel/ng_ether.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ng_ether.ko
> Reading symbols from /boot/modules/vboxnetadp.ko...done.
> Loaded symbols for /boot/modules/vboxnetadp.ko
> #0  doadump (textdump=<value optimized out>) at pcpu.h:234
> 234 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0  doadump (textdump=<value optimized out>) at pcpu.h:234
> #1  0xffffffff8090dfe6 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
> #2  0xffffffff8090e4e7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:637
> #3  0xffffffff80cf3440 in trap_fatal (frame=0xc, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:879
> #4  0xffffffff80cf37a1 in trap_pfault (frame=0xffffff80002768d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:795
> #5  0xffffffff80cf3d54 in trap (frame=0xffffff80002768d0) at /usr/src/sys/amd64/amd64/trap.c:463
> #6  0xffffffff80cdd093 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
> #7  0xffffffff80a10431 in ieee80211_tx_mgt_timeout (arg=0xffffff801e5f7000)
>     at /usr/src/sys/net80211/ieee80211_output.c:2487
> #8  0xffffffff809246e8 in softclock (arg=<value optimized out>) at /usr/src/sys/kern/kern_timeout.c:518
> #9  0xffffffff808dfddd in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffffe0007221b00)
>     at /usr/src/sys/kern/kern_intr.c:1272
> #10 0xffffffff808e15cd in ithread_loop (arg=0xfffffe0007209460) at /usr/src/sys/kern/kern_intr.c:1285
> #11 0xffffffff808dc82f in fork_exit (callout=0xffffffff808e1530 <ithread_loop>, arg=0xfffffe0007209460,
>     frame=0xffffff8000276b00) at /usr/src/sys/kern/kern_fork.c:990
> #12 0xffffffff80cdd5be in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606
> #13 0x0000000000000000 in ?? ()
>
>
> One thing to keep in mind: since I started using geli+ZFS (installed with
> the PC-BSD 9.1 CD), I have always gotten "Cannot reset interface wlan0 - exit
> status 1" from "wifimgr", whatever action I take (e.g. reconnect, rescan,
> up/down, etc.).
>
>
> I would appreciate some help in debugging this.
>
>
> --
> Best regards,
> Claudiu Vasadi
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>


