From: Alan Somers
To: "Andrey V. Elsukov"
Cc: svn-src-head@freebsd.org, Adrian Chadd, src-committers@freebsd.org,
    svn-src-all@freebsd.org, Alan Somers
Date: Fri, 25 Apr 2014 10:58:33 -0600
Subject: Re: svn commit: r253687 - head/sys/net
In-Reply-To: <535A9093.6010201@FreeBSD.org>
References: <201307261941.r6QJfEMO087844@svn.freebsd.org>
    <535A9093.6010201@FreeBSD.org>

On Fri, Apr 25, 2014 at 10:42 AM, Andrey V. Elsukov wrote:
> On 25.04.2014 19:58, Alan Somers wrote:
>> On Fri, Jul 26, 2013 at 1:41 PM, Adrian Chadd wrote:
>>> Author: adrian
>>> Date: Fri Jul 26 19:41:13 2013
>>> New Revision: 253687
>>> URL: http://svnweb.freebsd.org/changeset/base/253687
>>>
>>> Log:
>>> Break out the static, global LACP debug options into a per-lagg unit
>>> sysctl tree.
>>>
>>> * Create a net.link.lagg.X.lacp node
>>
>> I think this introduced a lock order reversal.
>>
>>> * Add a debug node under that for tx_test and rx_test
>>> * Add lacp_strict_mode, defaulting to 1
>>>
>>> tx_test and rx_test are still a bitmap of unit numbers for now.
>>> At some point it would be nice to create child nodes of the lagg bundle
>>> for each sub-interface, and then populate those with various knobs
>>> and statistics.
>>>
>>> Sponsored by: Netflix
>>>
>>> Modified:
>>> head/sys/net/ieee8023ad_lacp.c
>>> head/sys/net/ieee8023ad_lacp.h
>>> head/sys/net/if_lagg.c
>>> head/sys/net/if_lagg.h
>>>
>>> Modified: head/sys/net/ieee8023ad_lacp.c
>>> ==============================================================================
>>> --- head/sys/net/ieee8023ad_lacp.c Fri Jul 26 19:11:08 2013 (r253686)
>>> +++ head/sys/net/ieee8023ad_lacp.c Fri Jul 26 19:41:13 2013 (r253687)
>>
>>
>>> @@ -765,10 +791,19 @@ lacp_attach(struct lagg_softc *sc)
>>>
>>>  	lsc->lsc_hashkey = arc4random();
>>>  	lsc->lsc_active_aggregator = NULL;
>>> +	lsc->lsc_strict_mode = 1;
>>>  	LACP_LOCK_INIT(lsc);
>>>  	TAILQ_INIT(&lsc->lsc_aggregators);
>>>  	LIST_INIT(&lsc->lsc_ports);
>>>
>>> +	/* Create a child of the parent lagg interface */
>>> +	oid = SYSCTL_ADD_NODE(&sc->ctx, SYSCTL_CHILDREN(sc->sc_oid),
>>> +	    OID_AUTO, "lacp", CTLFLAG_RD, NULL, "LACP");
>>
>> This line grabs a sleepable lock, but we already had a nonsleepable
>> lock further up the stack, acquired in lagg_ioctl().
>>
>>> +
>>> +	/* Attach sysctl nodes */
>>> +	lacp_attach_sysctl(lsc, oid);
>>> +	lacp_attach_sysctl_debug(lsc, oid);
>>> +
>>>  	callout_init_mtx(&lsc->lsc_transit_callout, &lsc->lsc_mtx, 0);
>>>  	callout_init_mtx(&lsc->lsc_callout, &lsc->lsc_mtx, 0);
>>>
>>
>> Here's the warning from Witness, as well as a warning from UMA. Many
>> more UMA warnings followed.
>>
>> lock order reversal: (sleepable after non-sleepable)
>> 1st 0xfffff8000252ca08 if_lagg rmlock (if_lagg rmlock) @
>> /usr/home/alans/freebsd/head/sys/modules/if_lagg/../../net/if_lagg.c:1040
>> 2nd 0xffffffff814ef4e0 sysctl lock (sysctl lock) @
>> /usr/home/alans/freebsd/head/sys/kern/kern_sysctl.c:474
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00977485b0
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0097748660
>> witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe00977486f0
>> _sx_xlock() at _sx_xlock+0x75/frame 0xfffffe0097748730
>> sysctl_add_oid() at sysctl_add_oid+0x4a/frame 0xfffffe0097748780
>> lacp_attach() at lacp_attach+0xf7/frame 0xfffffe00977487f0
>> lagg_lacp_attach() at lagg_lacp_attach+0x88/frame 0xfffffe0097748810
>> lagg_ioctl() at lagg_ioctl+0x98a/frame 0xfffffe00977488f0
>> in_control() at in_control+0x38e/frame 0xfffffe0097748970
>> ifioctl() at ifioctl+0xba2/frame 0xfffffe0097748a30
>> kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe0097748a90
>> sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe0097748ae0
>> amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe0097748bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0097748bf0
>> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fa045a, rsp =
>> 0x7fffffffe118, rbp = 0x7fffffffe1a0 ---
>> uma_zalloc_arg: zone "128" with the following non-sleepable locks held:
>> exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0
>> (0xfffff8000252ca08) locked @
>> /usr/home/alans/freebsd/head/sys/modules/if_lagg/../../net/if_lagg.c:1040
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0097748500
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00977485b0
>> witness_warn() at witness_warn+0x4b5/frame 0xfffffe0097748670
>> uma_zalloc_arg() at uma_zalloc_arg+0x3b/frame 0xfffffe00977486e0
>> malloc() at malloc+0x194/frame 0xfffffe0097748730
>> sysctl_add_oid() at sysctl_add_oid+0x11f/frame 0xfffffe0097748780
>> lacp_attach() at lacp_attach+0xf7/frame 0xfffffe00977487f0
>> lagg_lacp_attach() at lagg_lacp_attach+0x88/frame 0xfffffe0097748810
>> lagg_ioctl() at lagg_ioctl+0x98a/frame 0xfffffe00977488f0
>> in_control() at in_control+0x38e/frame 0xfffffe0097748970
>> ifioctl() at ifioctl+0xba2/frame 0xfffffe0097748a30
>> kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe0097748a90
>> sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe0097748ae0
>> amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe0097748bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0097748bf0
>> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fa045a, rsp =
>> 0x7fffffffe118, rbp = 0x7fffffffe1a0 ---
>>
>>
>> # uname -a
>> FreeBSD alans-fbsd-head 11.0-CURRENT FreeBSD 11.0-CURRENT #49
>> r264887M: Thu Apr 24 17:21:48 MDT 2014
>> alans@ns1.eng.sldomain.com:/vmpool/obj/usr/home/alans/freebsd/head/sys/GENERIC
>> amd64
>>
>> To reproduce:
>> ifconfig tap0 create
>> ifconfig tap1 create
>> ifconfig tap2 create
>> ifconfig lagg0 create
>> ifconfig lagg0 up laggproto lacp laggport tap0 laggport tap1 laggport
>> tap2 192.0.0.2/24
>>
>> If I create and destroy the lagg in a tight loop, while running
>> "ifconfig -am" in a tight loop in another terminal, I eventually hit a
>> general protection fault in __mtx_lock_sleep. I think it might be
>> related.
>
> Do you have a backtrace from this panic?

I can't get a backtrace because every time the machine panics, my
terminal gets screwed up and becomes unresponsive. And I have neither a
serial port nor FireWire set up on this machine. I'll see if I can get
automatic core dumps to work.

>
>> Can you reproduce this? Do you have any good ideas for a solution?
>
> I can reproduce a lot of LOR messages, but no panic.

When I commented out the sysctl statements in lacp_attach, the LOR
messages went away. So did the GPF, but instead I got a page fault. It
seems that there are many bugs in the lagg creation/destruction code.
I shall continue to investigate...

>
> --
> WBR, Andrey V. Elsukov
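
For readers unfamiliar with the "sleepable after non-sleepable" rule
behind the Witness report above, here is a minimal userland sketch of the
check. It is a toy model only, not the kernel's WITNESS code: every name
in it (toy_lock, toy_held, toy_acquire, toy_release) is hypothetical, and
it tracks nothing beyond the set of locks one thread currently holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userland model of the rule; all names here are hypothetical. */
enum lock_kind { LK_NONSLEEPABLE, LK_SLEEPABLE };

struct toy_lock {
	const char *name;
	enum lock_kind kind;
};

#define TOY_MAX_HELD 8

/* Locks currently held by one (simulated) thread. */
struct toy_held {
	const struct toy_lock *locks[TOY_MAX_HELD];
	int count;
};

/*
 * Record the acquisition of lk.  Returns false, after printing a
 * warning, if lk is sleepable and a non-sleepable lock is already held.
 * The lock is recorded either way, since the check only warns.
 */
static bool
toy_acquire(struct toy_held *h, const struct toy_lock *lk)
{
	bool ok = true;

	assert(h->count < TOY_MAX_HELD);
	if (lk->kind == LK_SLEEPABLE) {
		for (int i = 0; i < h->count; i++) {
			if (h->locks[i]->kind == LK_NONSLEEPABLE) {
				printf("toy LOR: sleepable \"%s\" after "
				    "non-sleepable \"%s\"\n",
				    lk->name, h->locks[i]->name);
				ok = false;
			}
		}
	}
	h->locks[h->count++] = lk;
	return (ok);
}

/* Forget lk if it is held; the order of the remaining entries is not kept. */
static void
toy_release(struct toy_held *h, const struct toy_lock *lk)
{
	for (int i = 0; i < h->count; i++) {
		if (h->locks[i] == lk) {
			h->locks[i] = h->locks[--h->count];
			return;
		}
	}
}
```

In this model, taking a lock marked LK_SLEEPABLE (standing in for the
sysctl lock) while one marked LK_NONSLEEPABLE (standing in for the
if_lagg rmlock) is held returns false, while the opposite order is
accepted. Whether reordering the calls is the right fix for lagg is not
something this sketch, or the thread so far, settles.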