Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 20 Jun 2016 18:48:27 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>, Hans Petter Selasky <hps@selasky.org>
Cc:        rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject:   Re: panic with tcp timers
Message-ID:  <7294457d-07c0-2d6e-617b-15fa48bf2bb9@freebsd.org>
In-Reply-To: <20160620103015.GK1076@FreeBSD.org>
References:  <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org> <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org> <20160620095822.GJ1076@FreeBSD.org> <b81aa248-4b97-3162-7636-da655163cdcc@selasky.org> <20160620103015.GK1076@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc
Content-Type: multipart/mixed; boundary="Mvq9GL89TXjDlopGgmEC0W8JMOWLNG1QO"
From: Julien Charbon <jch@freebsd.org>
To: Gleb Smirnoff <glebius@FreeBSD.org>, Hans Petter Selasky <hps@selasky.org>
Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Message-ID: <7294457d-07c0-2d6e-617b-15fa48bf2bb9@freebsd.org>
Subject: Re: panic with tcp timers
References: <20160617045319.GE1076@FreeBSD.org>
 <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org>
 <20160620073917.GI1076@FreeBSD.org>
 <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>
 <20160620095822.GJ1076@FreeBSD.org>
 <b81aa248-4b97-3162-7636-da655163cdcc@selasky.org>
 <20160620103015.GK1076@FreeBSD.org>
In-Reply-To: <20160620103015.GK1076@FreeBSD.org>

--Mvq9GL89TXjDlopGgmEC0W8JMOWLNG1QO
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


 Hi,

On 6/20/16 12:30 PM, Gleb Smirnoff wrote:
> On Mon, Jun 20, 2016 at 12:14:18PM +0200, Hans Petter Selasky wrote:
> H> On 06/20/16 11:58, Gleb Smirnoff wrote:
> H> > The fix I am working on now is doing exactly that. callout_reset m=
ust
> H> > return 0 if the callout is currently running.
> H> >
> H> > What are the old paths impacted?
> H>=20
> H> I'll dig into the matter aswell and give some comments. Thanks for t=
he=20
> H> analysis, Gleb.
> H>=20
> H> FYI: This class of problems wouldn't exist if the callout could be=20
> H> associated with a mutex!
>=20
> Exactly! I am convinced that all callouts should be locked, and non-loc=
ked
> one should simply go away, as well as async drain.
>=20
> What does prevent us from converting TCP timeouts to locked? To my
> understanding it is the lock order of taking pcbinfo after pcb lock.
>=20
> I'm now trying to understand Julien's conversion from pcbinfo lock
> to pcbinfo + pcblist locks. It seems to me that we actually can convert=

> TCP timers to locked callouts.
>=20
> What for do we need pcbinfo read lock in a tcp timer? The timer works
> only on the pcb and doesn't modify global lists, does it?

 tcp_timer_keep() for example can modify global pcb list, see the call
stack below:

tcp_timer_keep()
tcp_drop()
tcp_close()
sofree()
tcp_usr_detach() (via pr->pr_usrreqs->pru_detach() in sofree())
tcp_detach()
in_pcbfree()
in_pcbremlists()


 Anyway, a bit of history here:

 o In stable/10 the TCP locking order is:

ipi_lock (before) inpcb lock

 and in stable/10 ipi_lock is protecting the global pcb list.  Then,
only in the cases where you were absolutely sure you are _not_ going to
modify the global pcb list you are allowed to take the inpcb lock only.
For all the other cases, you have to take the write lock on ipi_lock
first due to lock order.

 And in context of short-lived TCP connection of the 5 received packets
for a TCP connection, 4 require the write ipi_lock lock, and that's does
not scale well.

 Lesson learned:  For scaling perspective, in lock ordering always put
the most global lock last.


 It was improved in a lean way in 11:

 o In 11 the TCP lock order became:

ipi_lock (before) inpcb lock (before) ipi_list

 And in 11 ipi_list protects global pcb list, and only the code actually
modifying the global list is protected by a global write lock, e.g.:

https://github.com/freebsd/freebsd/blob/master/sys/netinet/in_pcb.c#L1285=


 Then why keeping the ipi_lock lock at all now?

 It is solely for one case:  When you need the complete stability of the
full pcb list while traversing it.  These full pcb list traversals are
done in functions like:  in_pcbnotifyall(), in_pcbpurgeif0(),
inp_apply_all(), tcp_ccalgounload(), tcp_drain(), etc.

 Thus in 11 ipi_lock write lock is required only when you do this full
traversal with list stability requirement, and the ipi_lock read lock in
all other cases like tcp_input() that then scales better.

 Sadly in 11, you cannot use the inpcb lock as is for the TCP timers
callout, because like tcp_timer_keep() most TCP timers callout can
modify the global pcb list and then you need the read lock ipi_lock in
top of inpcb lock...


 o Future/12:

 The next natural step is to remove the ipi_lock from the picture to get:=


inpcb lock (before) ipi_list

 It /just/ requires a global pcb list that allows addition/deletion
while doing a full traversal concurrently.  A classical way to achieve
that, is to use a RCU-based list.  As soon as RCU (or anything
equivalent like lwref) are supported in kernel, we will implement this
change.

 Just history here, as I already did a presentation on this exact topic
at BSDCan 2015:
https://wiki.freebsd.org/201506DevSummit#line-83

 It was recorded but I never saw the footage/presentation actually
published. :)

--
Julien


--Mvq9GL89TXjDlopGgmEC0W8JMOWLNG1QO--

--1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJXaB5jAAoJEKVlQ5Je6dhxuN8IAKeytbxsVb7bEK4iGl3HDY8X
RRD4rBqamc4cVTzzLn75LpOkIXQZhj6SzaZKO5AfTjZ5eUSuabYubM3wcpai9swx
UEoBCnjocNWjxJcElRwdN0CNgj0VtGPlNycWe+d8p3hRGqy14IdGEQ74+ru7ryC8
uGg/eJ0e73YmjVOZCthHHqyH1K+JLild16ZSH+1uFJt+KaPJjLBg5iQfXvM7t1HM
yP7HvACAaQXFrUdw/L51pffdxOAwG1FHieQMw2Lu8GuoCn7dJ4hlxumETJpcPJY9
Torn0bCwg0fdSLQiHJbGLE5VywXzcIdb0KAxLBO+k6leCKbnU9lEkxkp7uihB6U=
=xbUt
-----END PGP SIGNATURE-----

--1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7294457d-07c0-2d6e-617b-15fa48bf2bb9>