Date:      Wed, 15 Jun 2016 20:45:24 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Matthew Macy <mmacy@nextbsd.org>
Cc:        Peter Holm <peter@holm.cc>, Eric Badger <eric@badgerio.us>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Kqueue races causing crashes
Message-ID:  <20160615174524.GF38613@kib.kiev.ua>
In-Reply-To: <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org>
References:  <34035bf6-8b3c-d15c-765b-94bcc919ea2e@badgerio.us> <20160615081143.GS38613@kib.kiev.ua> <20160615115000.GA23198@x2.osted.lan> <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org>

On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
> You can use dwarf4 if you use GDB from ports
How would it help?

The problem for kgdb is that %rip is zero, because a function pointer was
set to NULL in a destroyed knlist.  Neither version of kgdb would find
code or unwind annotations for the zero address.
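
To illustrate the failure mode, here is a minimal userland sketch, not the
kernel code.  It assumes a list object that stores its lock operations as
function pointers and that destroying it clears them, as described above.
All names (toy_knlist, toy_knlist_destroy, toy_knlist_lock_checked) are
hypothetical.  The checked helper only makes visible the point at which an
unchecked call would jump to address 0.

```c
#include <stddef.h>

/* Toy stand-in for a list whose locking is done through callbacks. */
struct toy_knlist {
    void (*lock)(void *);
    void (*unlock)(void *);
    void *lockarg;
};

/* Toy lock operations: a counter stands in for a real mutex. */
static void toy_mtx_lock(void *arg)   { ++*(int *)arg; }
static void toy_mtx_unlock(void *arg) { --*(int *)arg; }

static void toy_knlist_init(struct toy_knlist *knl, int *counter)
{
    knl->lock = toy_mtx_lock;
    knl->unlock = toy_mtx_unlock;
    knl->lockarg = counter;
}

/* Assumption: destruction clears the callbacks, as the message describes. */
static void toy_knlist_destroy(struct toy_knlist *knl)
{
    knl->lock = NULL;
    knl->unlock = NULL;
    knl->lockarg = NULL;
}

/*
 * Returns 1 if the lock callback was invoked, 0 if the pointer was already
 * NULL.  An unchecked knl->lock(knl->lockarg) in the second case would
 * transfer control to address 0, where no code or unwind data exists.
 */
static int toy_knlist_lock_checked(struct toy_knlist *knl)
{
    if (knl->lock == NULL)
        return 0;
    knl->lock(knl->lockarg);
    return 1;
}
```

Calling toy_knlist_lock_checked() after toy_knlist_destroy() returns 0
where the unchecked call would fault with a zero instruction pointer.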

But the issue is understood and we are working on a version of the fix.


> ---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm <peter@holm.cc> wrote ----
> On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> > On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > > I believe they all have more or less the same cause. The crashes occur
> > > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
> > > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
> > > been cleared by another thread. Thus we are unable to unlock the
> > > previously acquired lock and hold it until something causes us to crash
> > > (such as the witness code noticing that we're returning to userland with
> > > the lock still held).
> > ...
> > > I believe there's also a small window where the KN_LIST_LOCK macro
> > > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
> > > actually dereferences it, it has become NULL. This would produce the
> > > "page fault while in kernel mode" crash.
> > >
> > > If someone familiar with this code sees an obvious fix, I'll be happy to
> > > test it. Otherwise, I'd appreciate any advice on fixing this. My first
> > > thought is that a "struct knote" ought to have its own mutex for
> > > controlling access to the flag fields and ideally the "kn_knlist" field.
> > > I.e., you would first acquire a knote's lock and then the knlist lock,
> > > thus ensuring that no one could clear the kn_knlist variable while you
> > > hold the knlist lock. The knlist lock, however, usually comes from
> > > whichever event producing entity the knote tracks, so getting lock
> > > ordering right between the per-knote mutex and this other lock seems
> > > potentially hard. (Sometimes we call into functions in kern_event.c with
> > > the knlist lock already held, having been acquired in code outside of
> > > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
> > > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
> > > process lock, also serving
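
The lock/unlock mismatch described in the quoted report can be reproduced
in a single-threaded toy model.  This is only a sketch under stated
assumptions: the types and functions below are hypothetical stand-ins, not
the KN_LIST_LOCK/KN_LIST_UNLOCK macros, and the detach_between flag
simulates another thread clearing the pointer.  The snapshot variant shows
why re-reading the pointer at unlock time is the problem in this model.  It
is not the fix being worked on, and it does nothing about the lifetime of
the list itself.

```c
#include <stddef.h>

/* Toy stand-ins; 'held' counts outstanding acquisitions. */
struct toy_lock { int held; };
struct toy_list { struct toy_lock lock; };
struct toy_note { struct toy_list *list; }; /* plays the role of kn->kn_knlist */

static void toy_acquire(struct toy_lock *l) { l->held++; }
static void toy_release(struct toy_lock *l) { l->held--; }

/*
 * Re-reads note->list at unlock time.  If the pointer is cleared in
 * between, the unlock is skipped and the lock stays held.
 * Returns the number of holds leaked on the original list.
 */
static int section_reread(struct toy_note *n, int detach_between)
{
    struct toy_list *orig = n->list;

    if (n->list != NULL)
        toy_acquire(&n->list->lock);
    if (detach_between)
        n->list = NULL;          /* simulated concurrent clear */
    if (n->list != NULL)
        toy_release(&n->list->lock);
    return orig != NULL ? orig->lock.held : 0;
}

/*
 * Reads note->list once and uses that snapshot for both lock and unlock,
 * so the pair always matches.  Returns the number of leaked holds.
 */
static int section_snapshot(struct toy_note *n, int detach_between)
{
    struct toy_list *snap = n->list;

    if (snap != NULL)
        toy_acquire(&snap->lock);
    if (detach_between)
        n->list = NULL;          /* simulated concurrent clear */
    if (snap != NULL)
        toy_release(&snap->lock);
    return snap != NULL ? snap->lock.held : 0;
}
```

With detach_between set, section_reread() leaves one hold outstanding while
section_snapshot() leaves none.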



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160615174524.GF38613>