Date:      Mon, 20 Jun 2016 11:55:55 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>, rrs@FreeBSD.org
Cc:        hselasky@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject:   Re: panic with tcp timers
Message-ID:  <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>
In-Reply-To: <20160620073917.GI1076@FreeBSD.org>
References:  <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org>

 Hi,

On 6/20/16 9:39 AM, Gleb Smirnoff wrote:
> On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
> J> > Comparing stable/10 and head, I see two changes that could
> J> > affect that:
> J> >
> J> > - callout_async_drain
> J> > - switch to READ lock for inp info in tcp timers
> J> >
> J> > That's why you are in To, Julien and Hans :)
> J> >
> J> > We continue investigating, and I will keep you updated.
> J> > However, any help is welcome. I can share cores.
>
> Now, after spending some time with cores and adding a bunch of
> extra CTRs, I have a sequence of events that leads to the
> panic. In short, the bug is in the callout system. It does not
> seem to be related to callout_async_drain, at least for
> now. The transition to the READ lock unmasked the problem; that's
> why NetflixBSD 10 doesn't panic.
>
> The panic requires heavy contention on the TCP info lock.
>=20
> [CPU 1] the callout fires, tcp_timer_keep entered
> [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
> [CPU 2] schedules the callout
> [CPU 2] tcp_discardcb called
> [CPU 2] callout successfully canceled
> [CPU 2] tcpcb freed
> [CPU 1] unblocks... panic
>=20
> When the lock was a WLOCK, all contenders were resumed in the
> order they came to the lock. Now that they are readers,
> once the lock is released, readers are resumed in a "random"
> order, and this allows tcp_discardcb to go before the old
> running callout, which unmasks the panic.
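The sequence above can be replayed as a toy, single-threaded C model. Everything in it is a hypothetical stand-in (model_stop(), model_discard(), the pending/running flags); it does not use the real callout(9) or rwlock(9) interfaces, and it only marks the tcpcb as freed instead of freeing it. The discard_first parameter picks the wake-up order once the info lock is released: false is the arrival order the WLOCK used to give, true is an order that reader wake-up can now produce.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the reported interleaving. Hypothetical names only;
 * this is not the FreeBSD callout(9) or rwlock(9) implementation.
 */
struct model_callout {
	bool pending;	/* an invocation is queued on the callout wheel */
	bool running;	/* a handler was entered and has not returned */
};

struct model_tcpcb {
	struct model_callout keep;	/* stands in for the keepalive timer */
	bool freed;			/* set instead of really freeing */
};

/*
 * Cancel the callout. Mirrors the behaviour discussed in the thread:
 * removing a pending invocation counts as success (1), even if an
 * older invocation of the same callout is still running.
 */
static int model_stop(struct model_callout *c)
{
	if (!c->pending)
		return 0;
	c->pending = false;
	return 1;
}

/* tcp_discardcb stand-in: free the tcpcb if the cancel "succeeded". */
static void model_discard(struct model_tcpcb *tp)
{
	if (model_stop(&tp->keep) == 1) {
		assert(!tp->freed);	/* catch double frees in the model */
		tp->freed = true;
	}
}

/* tcp_timer_keep stand-in resuming after the lock; true = saw freed tcpcb. */
static bool model_timer_resume(struct model_tcpcb *tp)
{
	bool use_after_free = tp->freed;
	tp->keep.running = false;	/* handler returns */
	return use_after_free;
}

/* Replay the report; returns true if the timer touches a freed tcpcb. */
static bool replay_race(bool discard_first)
{
	struct model_tcpcb tp = { { true, false }, false };	/* timer armed */

	tp.keep.pending = false;	/* CPU 1: the callout fires ... */
	tp.keep.running = true;		/* ... tcp_timer_keep entered, blocks */
	tp.keep.pending = true;		/* CPU 2: schedules the callout */

	if (discard_first) {
		model_discard(&tp);		/* CPU 2: cancel "succeeds", free */
		return model_timer_resume(&tp);	/* CPU 1 resumes on freed tcpcb */
	}
	bool uaf = model_timer_resume(&tp);	/* CPU 1 (arrived first) runs */
	model_discard(&tp);			/* then CPU 2 frees safely */
	return uaf;
}
```

With discard_first set, the cancel reports success because it removed the invocation CPU 2 had just scheduled, while the older invocation on CPU 1 is still marked running; the timer then resumes on a freed tcpcb. In arrival order the same events are harmless.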

 Highly interesting.  I should be able to reproduce that (it will be useful
for testing the corresponding fix).

 Fix proposal:  If callout_async_drain() returned 0 (failure) here, instead
of 1 (success), when the callout cancellation succeeds _but_ the callout is
currently running, that should fix it.
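A minimal sketch of the proposed fix, using the same kind of toy model: the drain call reports 0 whenever a handler is still running, so the discard path counts an outstanding drain and defers the free to the drain callback. The names (model_async_drain(), model_discardcb(), draincnt) are hypothetical and only illustrate the deferred-free idea; they are not the FreeBSD sources. As a simplification, a return of 1 here means "no handler can touch the tcpcb any more", including the idle case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names do not match the FreeBSD sources. */
struct model_callout {
	bool pending;	/* an invocation is queued */
	bool running;	/* a handler is executing */
};

struct model_tcpcb {
	struct model_callout keep;
	int draincnt;	/* outstanding drain callbacks (hypothetical) */
	bool freed;	/* set instead of really freeing */
};

/*
 * Proposed semantics: removing a pending invocation is NOT reported
 * as success while an older invocation is still running.
 */
static int model_async_drain(struct model_callout *c)
{
	c->pending = false;
	return c->running ? 0 : 1;
}

static void model_free(struct model_tcpcb *tp)
{
	assert(!tp->freed);	/* catch double frees in the model */
	tp->freed = true;
}

/* Drain callback: runs once the in-flight handler has returned. */
static void model_drain_done(struct model_tcpcb *tp)
{
	assert(tp->draincnt > 0);
	if (--tp->draincnt == 0)
		model_free(tp);
}

/* Discard path: free now only if no handler can still touch tp. */
static void model_discardcb(struct model_tcpcb *tp)
{
	if (model_async_drain(&tp->keep) == 0)
		tp->draincnt++;
	if (tp->draincnt == 0)
		model_free(tp);
}

/* The in-flight handler finally returns. */
static void model_handler_return(struct model_tcpcb *tp)
{
	tp->keep.running = false;
	if (tp->draincnt > 0)
		model_drain_done(tp);	/* run the deferred drain callback */
}

/* Reported case: callout re-armed while the old invocation runs. */
static bool replay_with_fix(void)
{
	struct model_tcpcb tp = { { true, true }, 0, false };

	model_discardcb(&tp);			/* CPU 2: discard */
	bool freed_too_early = tp.freed;	/* must stay false */
	model_handler_return(&tp);		/* CPU 1 finishes */
	return !freed_too_early && tp.freed;
}
```

The free now happens exactly once, after the running handler has returned, instead of while it is still blocked on the lock.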

 For the record:  this comes back to my old callout question:

 Is _callout_stop_safe() allowed to return 1 (success) even if the
callout is still currently running?  In other words, the fact that you
successfully cancelled a callout does not mean that the callout is not
currently running.

 We did propose a patch to make _callout_stop_safe() return 0 (failure)
when the callout is currently running:

callout_stop() should return 0 when the callout is currently being
serviced and indeed unstoppable
https://reviews.freebsd.org/differential/changeset/?ref=62513&whitespace=ignore-most
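The difference between the two return-value conventions can be tabulated with a small, hypothetical pair of functions. They are not the real _callout_stop_safe(), which takes more arguments and tracks more states; 'pending' means an invocation was queued and 'running' means a handler is executing at that moment.

```c
#include <assert.h>
#include <stdbool.h>

/* Convention questioned in this thread: only the queue matters. */
static int stop_as_observed(bool pending, bool running)
{
	(void)running;		/* running state does not affect the result */
	return pending ? 1 : 0;
}

/* Convention from the abandoned patch: a running handler means failure. */
static int stop_as_proposed(bool pending, bool running)
{
	if (running)
		return 0;	/* handler in flight: not truly stopped */
	return pending ? 1 : 0;
}

/* Count the (pending, running) combinations where the two disagree. */
static int count_disagreements(void)
{
	int n = 0;
	for (int p = 0; p < 2; p++)
		for (int r = 0; r < 2; r++)
			if (stop_as_observed(p, r) != stop_as_proposed(p, r))
				n++;
	return n;
}
```

The two conventions disagree only when an invocation is both pending and running, which is the scheduled-while-running case in the sequence reported above.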

 But this change affected too many old code paths, was useful only
for TCP timers, and thus was abandoned.

 My 2 cents.

--
Julien

