Date:      Thu, 26 Jan 2017 10:20:17 -0500
From:      "J.R. Oldroyd" <fbsd@opal.com>
To:        Adrian Chadd <adrian.chadd@gmail.com>
Cc:        "freebsd-wireless@freebsd.org" <freebsd-wireless@freebsd.org>
Subject:   Re: Boot freeze 11.0p3 during network initialization
Message-ID:  <20170126102017.26e9a3eb@shibato>
In-Reply-To: <CAJ-Vmo=DLnco7yJgM4_EsC7p2ya%2BQ3qffV1Tf_u%2BUPMvTaROTA@mail.gmail.com>
References:  <20161208095719.30f3c60e@shibato> <op.yr523ie8iew4ia@localhost> <20161208171926.7e182754@shibato> <20161220111808.5c277e21@shibato> <CAJ-Vmom_JWDa5C24YWoXbzhE866ndi6d5=pbWgxB7g_NXzqe1g@mail.gmail.com> <20161223143741.0cad961e@shibato> <CAJ-VmomDm-xMoPvTjsdXoYmH-2C6RyL-Zj%2BCxU78XpW3xt4ENA@mail.gmail.com> <20161227173012.1feb0c2f@shibato> <CAJ-Vmo=DLnco7yJgM4_EsC7p2ya%2BQ3qffV1Tf_u%2BUPMvTaROTA@mail.gmail.com>


Sorry for the time gap; I had to deal with family matters.

OK, I patched if_lagg.c to drop and re-acquire the lagg lock around
the SIOCADDMULTI/SIOCDELMULTI ioctl call into the underlying driver.
I've been running this for some weeks now and haven't seen the boot
hang since.  Hopefully I have tested long enough.

Someone more familiar with this driver and its use of this lock
should review this patch and comment.
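
For readers unfamiliar with the pattern, here is a minimal userland
sketch of "drop and re-acquire around the call" using POSIX mutexes.
It is an illustration only, not the kernel code: agg_lock, drv_lock,
drv_ioctl(), agg_port_ioctl_locked() and agg_ioctl() are hypothetical
stand-ins for the lagg lock, the driver mutex and the ioctl paths.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins: agg_lock ~ lagg lock, drv_lock ~ driver mutex. */
static pthread_mutex_t agg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t drv_lock = PTHREAD_MUTEX_INITIALIZER;
static int drv_calls;		/* number of times the driver was entered */
static int agg_held_in_driver;	/* set if agg_lock was held during the call */

/* Driver entry point: takes its own lock, much as a NIC driver ioctl would. */
static int
drv_ioctl(int cmd)
{
	pthread_mutex_lock(&drv_lock);
	drv_calls++;
	/* Probe only: trylock fails if the caller still holds agg_lock. */
	if (pthread_mutex_trylock(&agg_lock) == 0)
		pthread_mutex_unlock(&agg_lock);
	else
		agg_held_in_driver = 1;
	pthread_mutex_unlock(&drv_lock);
	return (cmd < 0 ? EINVAL : 0);
}

/* Called with agg_lock held; returns with agg_lock held. */
static int
agg_port_ioctl_locked(int cmd)
{
	int error;

	pthread_mutex_unlock(&agg_lock);	/* drop before entering the driver */
	error = drv_ioctl(cmd);
	pthread_mutex_lock(&agg_lock);		/* re-acquire for the caller */
	return (error);
}

/* Outer path: holds agg_lock across everything except the driver call. */
static int
agg_ioctl(int cmd)
{
	int error;

	pthread_mutex_lock(&agg_lock);
	error = agg_port_ioctl_locked(cmd);
	pthread_mutex_unlock(&agg_lock);
	return (error);
}
```

The trade-off is that anything protected by the outer lock may change
while it is dropped, so state read before the call should be
revalidated after re-acquiring.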

	-jr


Index: sys/net/if_lagg.c
===================================================================
--- sys/net/if_lagg.c	(revision 307319)
+++ sys/net/if_lagg.c	(working copy)
@@ -995,6 +995,21 @@
 		LAGG_RUNLOCK(sc, &tracker);
 		break;
 
+	case SIOCADDMULTI:
+	case SIOCDELMULTI:
+		/*
+		 * Drivers like if_re.c cause a LOR on WLOCK, so we must
+		 * drop and re-acquire the lock around the call.
+		 */
+		if (lp->lp_ioctl == NULL) {
+			error = EINVAL;
+			break;
+		}
+		LAGG_WUNLOCK(sc);
+		error = (*lp->lp_ioctl)(ifp, cmd, data);
+		LAGG_WLOCK(sc);
+		break;
+
 	case SIOCSIFCAP:
 		if (lp->lp_ioctl == NULL) {
 			error = EINVAL;


On Wed, 28 Dec 2016 00:24:09 -0800 Adrian Chadd <adrian.chadd@gmail.com> wrote:
>
> hi,
>
> yes, the LOR is why the boot hang occurs :(
>
> -a
>
>
> On 27 December 2016 at 14:30, J.R. Oldroyd <fbsd@opal.com> wrote:
> > Sorry, Adrian, I'm missing the back-story here and I'm not that
> > familiar with the lagg code.
> >
> > Are you saying that this LOR is likely relevant to this boot hang,
> > or are you saying that this is a known problem that's not relevant?
> >
> > Jan Kokemüller posted some lagg patches.  I don't know if they are
> > likely applicable to this problem, but I could try those.
> >
> > https://reviews.freebsd.org/D6845
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211689#c4
> >
> > The first removes an RLOCK, but not the one referenced in the LOR
> > report.  The second is a patch for the ath/iwm panic.  If you're
> > unfamiliar with them, I will study up on this code and patches
> > to get up to speed on it.
> >
> >         -jr
> >
> >
> > On Fri, 23 Dec 2016 11:41:33 -0800 Adrian Chadd <adrian.chadd@gmail.com> wrote:
> >>
> >> Right, that's the known lock order issue with lagg. :(
> >>
> >>
> >> -adrian
> >>
> >>
> >> > On 23 December 2016 at 11:37, J.R. Oldroyd <fbsd@opal.com> wrote:
> >> > On Fri, 23 Dec 2016 10:17:34 -0800 Adrian Chadd <adrian.chadd@gmail.com> wrote:
> >> >>
> >> >> On 20 December 2016 at 08:18, J.R. Oldroyd <fbsd@opal.com> wrote:
> >> >> > On Thu, 8 Dec 2016 17:19:26 -0500 "J.R. Oldroyd" <fbsd@opal.com> wrote:
> >> >> >>
> >> >> >> On Thu, 08 Dec 2016 21:29:32 +0200 "Andriy Voskoboinyk" <s3erios@gmail.com> wrote:
> >> >> >> >
> >> >> >> > On Thu, 08 Dec 2016 16:57:19 +0200, J.R. Oldroyd <fbsd@opal.com> wrote:
> >> >> >> >
> >> >> >> > Is there any additional output with
> >> >> >> > wlandebug_wlan0="scan+state+auth+assoc"
> >> >> >> > in /etc/rc.conf ?
> >> >> >> >
> >> >> >>
> >> >> >> I have put that in and rebooted several times, all times OK.
> >> >> >> I will report back again in due course when it next hangs.
> >> >> >>
> >> >> >>       -jr
> >> >> >>
> >> >> >
> >> >> > The boot hang occurred again today.  I noted the point of the hang and
> >> >> > rebooted; the log from the good boot with annotation of the previous hang
> >> >> > point is here [1].
> >> >> >
> >> >> >         -jr
> >> >> >
> >> >> > [1] http://opal.com/jr/freebsd/20161220-fbsd11.3-boot_hang_wlan_debug.txt
> >> >> > _______________________________________________
> >> >> > freebsd-wireless@freebsd.org mailing list
> >> >> > https://lists.freebsd.org/mailman/listinfo/freebsd-wireless
> >> >> > To unsubscribe, send any mail to "freebsd-wireless-unsubscribe@freebsd.org"
> >> >>
> >> >>
> >> >> can you compile with witness and invariants? I'd like to see if it's
> >> >> locking related.
> >> >>
> >> >> thanks
> >> >>
> >> >>
> >> >> -adrian
> >> >>
> >> >>
> >> >
> >> > Hmm, maybe:
> >> >
> >> > Dec 23 14:30:34 shibato kernel: wlan0: ieee80211_swscan_add_scan: chan  11g min dwell met (2146895553 > 2146895553)
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_mindwell: called
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan  11g ->   7g [active, dwell min 20ms max 200ms]
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> >> > Dec 23 14:30:34 shibato kernel: re0: link state changed to UP
> >> > Dec 23 14:30:34 shibato kernel: lagg0: link state changed to UP
> >> > Dec 23 14:30:34 shibato kernel: lock order reversal:
> >> > Dec 23 14:30:34 shibato kernel: 1st 0xfffff800095d2208 if_lagg rmlock (if_lagg rmlock) @ /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:1530
> >> > Dec 23 14:30:34 shibato kernel: 2nd 0xfffffe0000e10218 re0 (network driver) @ dev/re/if_re.c:3433
> >> > Dec 23 14:30:34 shibato kernel: stack backtrace:
> >> > Dec 23 14:30:34 shibato kernel: #0 0xffffffff80a98b60 at witness_debugger+0x70
> >> > Dec 23 14:30:34 shibato kernel: #1 0xffffffff80a98a54 at witness_checkorder+0xe54
> >> > Dec 23 14:30:34 shibato kernel: #2 0xffffffff80a1c794 at __mtx_lock_flags+0xa4
> >> > Dec 23 14:30:34 shibato kernel: #3 0xffffffff8078c279 at re_ioctl+0x3a9
> >> > Dec 23 14:30:34 shibato kernel: #4 0xffffffff8222428e at lagg_port_ioctl+0xde
> >> > Dec 23 14:30:34 shibato kernel: #5 0xffffffff80b20bbf at if_addmulti+0x39f
> >> > Dec 23 14:30:34 shibato kernel: #6 0xffffffff82224708 at lagg_ether_cmdmulti+0x158
> >> > Dec 23 14:30:34 shibato kernel: #7 0xffffffff822219dd at lagg_ioctl+0xdd
> >> > Dec 23 14:30:34 shibato kernel: #8 0xffffffff80b20bbf at if_addmulti+0x39f
> >> > Dec 23 14:30:34 shibato kernel: #9 0xffffffff80c35a97 at in6_mc_join_locked+0x1d7
> >> > Dec 23 14:30:34 shibato kernel: #10 0xffffffff80c35715 at in6_joingroup+0x75
> >> > Dec 23 14:30:34 shibato kernel: #11 0xffffffff80c2f9e9 at in6_update_ifa+0x1339
> >> > Dec 23 14:30:34 shibato kernel: #12 0xffffffff80c33eb3 at in6_ifattach+0x413
> >> > Dec 23 14:30:34 shibato kernel: #13 0xffffffff80b1fd84 at ifioctl+0xfe4
> >> > Dec 23 14:30:34 shibato kernel: #14 0xffffffff80a9d946 at kern_ioctl+0x246
> >> > Dec 23 14:30:34 shibato kernel: #15 0xffffffff80a9d691 at sys_ioctl+0x171
> >> > Dec 23 14:30:34 shibato kernel: #16 0xffffffff80e9d40b at amd64_syscall+0x2db
> >> > Dec 23 14:30:34 shibato kernel: #17 0xffffffff80e7d8ab at Xfast_syscall+0xfb
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan   7g ->  36a [active, dwell min 20ms max 200ms]
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> >> >
> >> > This boot then continued normally, no hang.
> >> >
> >> >         -jr
> >




