Date:      Fri, 25 Apr 2014 10:58:33 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        "Andrey V. Elsukov" <ae@freebsd.org>
Cc:        svn-src-head@freebsd.org, Adrian Chadd <adrian@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, Alan Somers <asomers@freebsd.org>
Subject:   Re: svn commit: r253687 - head/sys/net
Message-ID:  <CAOtMX2i5jwgPfTfXbb9Su5Yd-t=3-p%2BVnAeAPfrFwh7NDzabNA@mail.gmail.com>
In-Reply-To: <535A9093.6010201@FreeBSD.org>
References:  <201307261941.r6QJfEMO087844@svn.freebsd.org> <CAOtMX2iXzXY1zAebqxJYGXw-_HRmRGmkpw7fgyLkvavJGcZy=g@mail.gmail.com> <535A9093.6010201@FreeBSD.org>

On Fri, Apr 25, 2014 at 10:42 AM, Andrey V. Elsukov <ae@freebsd.org> wrote:
> On 25.04.2014 19:58, Alan Somers wrote:
>> On Fri, Jul 26, 2013 at 1:41 PM, Adrian Chadd <adrian@freebsd.org> wrote:
>>> Author: adrian
>>> Date: Fri Jul 26 19:41:13 2013
>>> New Revision: 253687
>>> URL: http://svnweb.freebsd.org/changeset/base/253687
>>>
>>> Log:
>>>   Break out the static, global LACP debug options into a per-lagg unit
>>>   sysctl tree.
>>>
>>>   * Create a net.link.lagg.X.lacp node
>>
>> I think this introduced a lock order reversal.
>>
>>>   * Add a debug node under that for tx_test and rx_test
>>>   * Add lacp_strict_mode, defaulting to 1
>>>
>>>   tx_test and rx_test are still a bitmap of unit numbers for now.
>>>   At some point it would be nice to create child nodes of the lagg bundle
>>>   for each sub-interface, and then populate those with various knobs
>>>   and statistics.
>>>
>>>   Sponsored by: Netflix
>>>
>>> Modified:
>>>   head/sys/net/ieee8023ad_lacp.c
>>>   head/sys/net/ieee8023ad_lacp.h
>>>   head/sys/net/if_lagg.c
>>>   head/sys/net/if_lagg.h
>>>
>>> Modified: head/sys/net/ieee8023ad_lacp.c
>>> ==============================================================================
>>> --- head/sys/net/ieee8023ad_lacp.c      Fri Jul 26 19:11:08 2013        (r253686)
>>> +++ head/sys/net/ieee8023ad_lacp.c      Fri Jul 26 19:41:13 2013        (r253687)
>>
>> <Extra chunks elided>
>>> @@ -765,10 +791,19 @@ lacp_attach(struct lagg_softc *sc)
>>>
>>>         lsc->lsc_hashkey = arc4random();
>>>         lsc->lsc_active_aggregator = NULL;
>>> +       lsc->lsc_strict_mode = 1;
>>>         LACP_LOCK_INIT(lsc);
>>>         TAILQ_INIT(&lsc->lsc_aggregators);
>>>         LIST_INIT(&lsc->lsc_ports);
>>>
>>> +       /* Create a child of the parent lagg interface */
>>> +       oid = SYSCTL_ADD_NODE(&sc->ctx, SYSCTL_CHILDREN(sc->sc_oid),
>>> +           OID_AUTO, "lacp", CTLFLAG_RD, NULL, "LACP");
>>
>> This line grabs a sleepable lock, but we already hold a non-sleepable
>> lock further up the stack, acquired in lagg_ioctl().
>>
>>> +
>>> +       /* Attach sysctl nodes */
>>> +       lacp_attach_sysctl(lsc, oid);
>>> +       lacp_attach_sysctl_debug(lsc, oid);
>>> +
>>>         callout_init_mtx(&lsc->lsc_transit_callout, &lsc->lsc_mtx, 0);
>>>         callout_init_mtx(&lsc->lsc_callout, &lsc->lsc_mtx, 0);
>>>
>>
>> Here's the warning from Witness, as well as a warning from UMA.  Many
>> more UMA warnings followed.
>>
>> lock order reversal: (sleepable after non-sleepable)
>> 1st 0xfffff8000252ca08 if_lagg rmlock (if_lagg rmlock) @
>> /usr/home/alans/freebsd/head/sys/modules/if_lagg/../../net/if_lagg.c:1040
>> 2nd 0xffffffff814ef4e0 sysctl lock (sysctl lock) @
>> /usr/home/alans/freebsd/head/sys/kern/kern_sysctl.c:474
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00977485b0
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0097748660
>> witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe00977486f0
>> _sx_xlock() at _sx_xlock+0x75/frame 0xfffffe0097748730
>> sysctl_add_oid() at sysctl_add_oid+0x4a/frame 0xfffffe0097748780
>> lacp_attach() at lacp_attach+0xf7/frame 0xfffffe00977487f0
>> lagg_lacp_attach() at lagg_lacp_attach+0x88/frame 0xfffffe0097748810
>> lagg_ioctl() at lagg_ioctl+0x98a/frame 0xfffffe00977488f0
>> in_control() at in_control+0x38e/frame 0xfffffe0097748970
>> ifioctl() at ifioctl+0xba2/frame 0xfffffe0097748a30
>> kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe0097748a90
>> sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe0097748ae0
>> amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe0097748bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0097748bf0
>> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fa045a, rsp =
>> 0x7fffffffe118, rbp = 0x7fffffffe1a0 ---
>> uma_zalloc_arg: zone "128" with the following non-sleepable locks held:
>> exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0
>> (0xfffff8000252ca08) locked @
>> /usr/home/alans/freebsd/head/sys/modules/if_lagg/../../net/if_lagg.c:1040
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0097748500
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00977485b0
>> witness_warn() at witness_warn+0x4b5/frame 0xfffffe0097748670
>> uma_zalloc_arg() at uma_zalloc_arg+0x3b/frame 0xfffffe00977486e0
>> malloc() at malloc+0x194/frame 0xfffffe0097748730
>> sysctl_add_oid() at sysctl_add_oid+0x11f/frame 0xfffffe0097748780
>> lacp_attach() at lacp_attach+0xf7/frame 0xfffffe00977487f0
>> lagg_lacp_attach() at lagg_lacp_attach+0x88/frame 0xfffffe0097748810
>> lagg_ioctl() at lagg_ioctl+0x98a/frame 0xfffffe00977488f0
>> in_control() at in_control+0x38e/frame 0xfffffe0097748970
>> ifioctl() at ifioctl+0xba2/frame 0xfffffe0097748a30
>> kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe0097748a90
>> sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe0097748ae0
>> amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe0097748bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0097748bf0
>> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fa045a, rsp =
>> 0x7fffffffe118, rbp = 0x7fffffffe1a0 ---
>>
>>
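To make the Witness complaint concrete, here is a small userspace model of the rule it reports here: a sleepable lock (the sysctl sx lock) must not be acquired while a non-sleepable lock (the if_lagg rmlock) is held. This is only a sketch, not Witness itself; the struct and function names (model_lock, model_acquire, replay_lacp_attach_order, ...) are hypothetical, and real Witness additionally maintains a global lock-order graph across all threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock class: sleepable (like the sysctl sx lock) or
 * non-sleepable (like the if_lagg rmlock in the trace above). */
struct model_lock {
    const char *name;
    bool sleepable;
};

#define MODEL_MAX_HELD 8

/* Locks held by one thread, in acquisition order. */
struct model_thread {
    const struct model_lock *held[MODEL_MAX_HELD];
    int nheld;
    int reversals;
};

/* Record an acquisition; report a reversal if a sleepable lock is
 * taken while any non-sleepable lock is already held. */
static void model_acquire(struct model_thread *td, const struct model_lock *lk)
{
    if (lk->sleepable) {
        for (int i = 0; i < td->nheld; i++) {
            if (!td->held[i]->sleepable) {
                printf("lock order reversal: (sleepable after non-sleepable)\n"
                    " 1st %s\n 2nd %s\n", td->held[i]->name, lk->name);
                td->reversals++;
                break;
            }
        }
    }
    assert(td->nheld < MODEL_MAX_HELD);
    td->held[td->nheld++] = lk;
}

/* Release the most recent acquisition of lk. */
static void model_release(struct model_thread *td, const struct model_lock *lk)
{
    for (int i = td->nheld - 1; i >= 0; i--) {
        if (td->held[i] == lk) {
            for (int j = i; j < td->nheld - 1; j++)
                td->held[j] = td->held[j + 1];
            td->nheld--;
            return;
        }
    }
    assert(!"released a lock that was not held");
}

/* Replays the order seen in the backtrace: lagg_ioctl() holds the
 * rmlock, then lacp_attach() -> sysctl_add_oid() takes the sysctl lock.
 * Returns the number of reversals detected. */
int replay_lacp_attach_order(void)
{
    static const struct model_lock lagg_rm = { "if_lagg rmlock", false };
    static const struct model_lock sysctl_sx = { "sysctl lock", true };
    struct model_thread td = { .nheld = 0, .reversals = 0 };

    model_acquire(&td, &lagg_rm);
    model_acquire(&td, &sysctl_sx);
    model_release(&td, &sysctl_sx);
    model_release(&td, &lagg_rm);
    return td.reversals;
}
```

Calling replay_lacp_attach_order() reports exactly one reversal, matching the single "sleepable after non-sleepable" message in the trace.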
>> # uname -a
>> FreeBSD alans-fbsd-head 11.0-CURRENT FreeBSD 11.0-CURRENT #49
>> r264887M: Thu Apr 24 17:21:48 MDT 2014
>> alans@ns1.eng.sldomain.com:/vmpool/obj/usr/home/alans/freebsd/head/sys/GENERIC
>>  amd64
>>
>> To reproduce:
>> ifconfig tap0 create
>> ifconfig tap1 create
>> ifconfig tap2 create
>> ifconfig lagg0 create
>> ifconfig lagg0 up laggproto lacp laggport tap0 laggport tap1 laggport tap2 192.0.0.2/24
>>
>> If I create and destroy the lagg in a tight loop, while running
>> "ifconfig -am" in a tight loop in another terminal, I eventually hit a
>> general protection fault in __mtx_lock_sleep.  I think it might be
>> related.
>
> Do you have a backtrace from this panic?

I can't get a backtrace because every time I panic, my terminal gets
screwed up and unresponsive.  And I have neither a serial port nor
FireWire set up on this machine.  I'll see if I can get automatic core
dumps to work.

>
>> Can you reproduce this?  Do you have any good ideas for a solution?
>
> I can reproduce a lot of LOR messages, but no panic.

When I commented out the sysctl statements in lacp_attach, the LOR
messages went away.  So did the GPF, but instead I got a page fault.
It seems that there are many bugs in the lagg creation/destruction
code.  I shall continue to investigate...
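That the LOR disappears once the sysctl calls are removed is consistent with the general rule that nothing which may sleep (malloc(M_WAITOK), the sysctl sx lock) should run while a non-sleepable lock is held. One general way to honor it is to do the sleepable work before taking the lock, or after dropping it. The sketch below models only that idea in userspace: all names (sleep_model, model_add_sysctl_node, attach_under_lock, attach_before_lock) are hypothetical, it is not how UMA or Witness are implemented, and whether lacp_attach() can actually be restructured this way in if_lagg.c is an assumption not established here. It also says nothing about the GPF or page fault.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-thread state: number of non-sleepable locks held and
 * number of "may sleep with non-sleepable lock held" warnings issued. */
struct sleep_model {
    int nonsleepable_held;
    int warnings;
};

/* Flag any operation that may sleep while a non-sleepable lock is held,
 * in the spirit of the uma_zalloc_arg warning above. */
static void model_may_sleep(struct sleep_model *td, const char *what)
{
    if (td->nonsleepable_held > 0) {
        printf("%s with non-sleepable locks held\n", what);
        td->warnings++;
    }
}

static void model_lock(struct sleep_model *td)
{
    td->nonsleepable_held++;
}

static void model_unlock(struct sleep_model *td)
{
    assert(td->nonsleepable_held > 0);
    td->nonsleepable_held--;
}

/* Hypothetical stand-in for adding the "lacp" sysctl node: it allocates
 * memory and takes the sleepable sysctl lock, so it may sleep. */
static void *model_add_sysctl_node(struct sleep_model *td)
{
    model_may_sleep(td, "sysctl_add_oid");
    return malloc(64);
}

/* Ordering that mirrors the reported trace: node created under the lock. */
int attach_under_lock(void)
{
    struct sleep_model td = { 0, 0 };

    model_lock(&td);
    void *oid = model_add_sysctl_node(&td);
    model_unlock(&td);
    free(oid);
    return td.warnings;
}

/* Reordered: create the node first, then hold the lock only for the
 * part of the attach that never sleeps. */
int attach_before_lock(void)
{
    struct sleep_model td = { 0, 0 };

    void *oid = model_add_sysctl_node(&td);
    model_lock(&td);
    /* ... non-sleeping state setup would go here ... */
    model_unlock(&td);
    free(oid);
    return td.warnings;
}
```

With this model, attach_under_lock() produces one warning and attach_before_lock() produces none; the same reasoning applies in reverse on detach, where sysctl teardown would move after the unlock.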

>
> --
> WBR, Andrey V. Elsukov


