Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 14 Oct 2011 16:49:43 +0800
From:      dave jones <s.dave.jones@gmail.com>
To:        Mikolaj Golub <trociny@freebsd.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, "K. Macy" <kmacy@freebsd.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, bz@freebsd.org, Robert Watson <rwatson@freebsd.org>, Arnaud Lacombe <lacombar@gmail.com>
Subject:   Re: Kernel panic on FreeBSD 9.0-beta2
Message-ID:  <CANf5e8at2fxU8V=jie-g6ifeGpH1gsaJRqEoQjGs=iniW-Sqgg@mail.gmail.com>
In-Reply-To: <86fwiy91ty.fsf@in138.ua3>
References:  <CANf5e8aG4go4M_vsRExUsJB_sjaN5x-QK-TCDAhSH64JSo0mdQ@mail.gmail.com> <CACqU3MXStMMEoppvDtZS6hV4WGttbdJiF8E-ORwJ%2BQSmnTy-Yg@mail.gmail.com> <CACqU3MV-t4Va6VWUoXy1Y9FYnNJTUw1X%2BE7ik-2%2BtMVuVOV3RA@mail.gmail.com> <CAJ-Vmom-177OkdUXjz%2BZLqbaqn=p%2BuTGypiVuMqdeXgdOgb4hQ@mail.gmail.com> <CAHM0Q_Mmn3z1V6AtZHQMpgbdY7oQqOChiNt=8NJrZQDnravb7A@mail.gmail.com> <CACqU3MU9ZZtOsdBOa%2BF3SqUaYgO%2BEo0v1ACjY0S4rY4fRQyv5Q@mail.gmail.com> <CAHM0Q_PZD9_0ZkELZ5XL8Ebh8eD-uFuSjXWKKVpGDeM_JDaqMA@mail.gmail.com> <8662kcigif.fsf@kopusha.home.net> <alpine.BSF.2.00.1109301432570.65269@fledge.watson.org> <CANf5e8ab=mUw-AJuRXZy1T6%2BZcryxjKfuCOsakPPfqatuA3HdA@mail.gmail.com> <86y5x0ooik.fsf@in138.ua3> <CANf5e8YtQ5P2euF7E-D6Wt7U38UuLc8KVU-NCehq74XV_WTvBg@mail.gmail.com> <CANf5e8bLcxYDe%2BmHssUndOqh2B0j-V28Ox2dZCfy6%2Bo7aURw=w@mail.gmail.com> <86fwiy91ty.fsf@in138.ua3>

next in thread | previous in thread | raw e-mail | index | archive | help
2011/10/12 Mikolaj Golub wrote:
>
> On Wed, 12 Oct 2011 09:53:34 +0800 dave jones wrote:
>
> =A0dj> On Fri, Oct 7, 2011 at 9:12 AM, dave jones =A0wrote:
> =A0>> 2011/10/4 Mikolaj Golub :
> =A0>>>
> =A0>>> On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote:
> =A0>>>
> =A0>>> =A0dj> On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote:
> =A0>>> =A0>>
> =A0>>> =A0>> On Wed, 28 Sep 2011, Mikolaj Golub wrote:
> =A0>>> =A0>>
> =A0>>> =A0>>> On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:
> =A0>>> =A0>>>
> =A0>>> =A0>>> KM> Sorry, didn't look at the images (limited bw), I've see=
n something KM>
> =A0>>> =A0>>> like this before in timewait. This "can't happen" with UDP =
so will be KM>
> =A0>>> =A0>>> interested in learning more about the bug.
> =A0>>> =A0>>>
> =A0>>> =A0>>> The panic can be easily triggered by this:
> =A0>>> =A0>>
> =A0>>> =A0>> Hi:
> =A0>>> =A0>>
> =A0>>> =A0>> Just catching up on this thread. =A0I think the analysis her=
e is generally
> =A0>>> =A0>> right: in 9.0, you're much more likely to see an inpcb with =
its in_socket
> =A0>>> =A0>> pointer cleared in the hash list than in prior releases, and
> =A0>>> =A0>> in_pcbbind_setup() trips over this.
> =A0>>> =A0>>
> =A0>>> =A0>> However, at least on first glance (and from the perspective =
of invariants
> =A0>>> =A0>> here), I think the bug is actualy that in_pcbbind_setup() is=
 asking
> =A0>>> =A0>> in_pcblookup_local() for an inpcb and then access the return=
ed inpcb's
> =A0>>> =A0>> in_socket pointer without acquiring a lock on the inpcb. =A0=
Structurally, it
> =A0>>> =A0>> can't acquire this lock for lock order reasons -- it already=
 holds the lock
> =A0>>> =A0>> on its own inpcb. =A0Therefore, we should only access fields=
 that are safe to
> =A0>>> =A0>> follow in an inpcb when you hold a reference via the hash lo=
ck and not a
> =A0>>> =A0>> lock on the inpcb itself, which appears generally OK (+/-) f=
or all the
> =A0>>> =A0>> fields in that clause but the t->inp_socket->so_options dere=
ference.
> =A0>>> =A0>>
> =A0>>> =A0>> A preferred fix would cache the SO_REUSEPORT flag in an inpc=
b-layer field,
> =A0>>> =A0>> such as inp_flags2, giving us access to its value without ha=
ving to walk
> =A0>>> =A0>> into the attached (or not) socket.
> =A0>>> =A0>>
> =A0>>> =A0>> This raises another structural question, which is whether we=
 need a new
> =A0>>> =A0>> inp_foo flags field that is protected explicitly by the hash=
 lock, and not
> =A0>>> =A0>> by the inpcb lock, which could hold fields relevant to addre=
ss binding. =A0I
> =A0>>> =A0>> don't think we need to solve that problem in this context, a=
s a slightly
> =A0>>> =A0>> race on SO_REUSEPORT is likely acceptable.
> =A0>>> =A0>>
> =A0>>> =A0>> The suggested fix does perform the desired function of expli=
citly detaching
> =A0>>> =A0>> the inpcb from the hash list before the socket is disconnect=
ed from the
> =A0>>> =A0>> inpcb. However, it's incomplete in that the invariant that's=
 being broken is
> =A0>>> =A0>> also relied on for other protocols (such as raw sockets). =
=A0The correct
> =A0>>> =A0>> invariant is that inp_socket is safe to follow unconditional=
ly if an inpcb
> =A0>>> =A0>> is locked and INP_DROPPED isn't set -- the bug is in "locked=
" not in
> =A0>>> =A0>> "INP_DROPPED", which is why I think this is the wrong fix, e=
ven though it
> =A0>>> =A0>> prevents a panic :-).
> =A0>>>
> =A0>>> =A0dj> Hello Robert,
> =A0>>>
> =A0>>> =A0dj> Thank you for taking your valuable time to find out the pro=
blem.
> =A0>>> =A0dj> Since I don't have idea about network internals, would you =
have a patch
> =A0>>> =A0dj> about this? I'd be glad to test it, thanks again.
> =A0>>>
> =A0>>> Here is the patch that implements what Robert suggests.
> =A0>>>
> =A0>>> Dave, could you test it?
> =A0>>
> =A0>> Sure. Thanks for cooking the patch.
> =A0>> Machines have been running two days now without panic.
>
> Thank you for testing it.
>
> =A0dj> Is there any plan to commit your fix? Thank you.
> =A0dj> I'd upgrade to 9.0-release from beta-2 once it's released.
>
> I have an upgraded version of the patch, which is under review now. I hav=
e
> been waiting for the response before asking you to test it, but it would =
be
> great if you try it not waiting :-).
>
> As it was pointed by Robert the previous version introduced a regression:
> SO_REUSEPORT was ignored if setsockopt was called after bind (the old cac=
hed
> value was still used). So the updated version fixes this and also contain=
s
> several other fixes, the most important among them is that it fixes the p=
anic
> for IPv6 bind case too.

Thanks for cooking the patch. Machines have been running up days
without any panic.

> --
> Mikolaj Golub

Regards,
Dave.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANf5e8at2fxU8V=jie-g6ifeGpH1gsaJRqEoQjGs=iniW-Sqgg>