Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 9 Oct 2007 13:02:59 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Jeff Roberson <jroberson@chesapeake.net>
Cc:        arch@freebsd.org
Subject:   Re: Abolishing sleeps in issignal()
Message-ID:  <20071009100259.GW2180@deviant.kiev.zoral.com.ua>
In-Reply-To: <20071008142928.Y912@10.0.0.1>
References:  <20071008142928.Y912@10.0.0.1>

next in thread | previous in thread | raw e-mail | index | archive | help

--ZljC5FVPx7rxDQQ8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Oct 08, 2007 at 02:40:00PM -0700, Jeff Roberson wrote:
> During the work on thread lock I observed that there is a significant=20
> amount of locking involved in our signal paths right now.  And these lock=
s=20
> also show up contended in many workloads.  Furthermore, requiring a DEF=
=20
> mutex complicates sleep queues by forcing them to drop the spinlock to=20
> check for signals and then check for races.
>=20
> The current issignal() code will actually msleep in the case of a=20
> stopevent() requested by the debugger.  This is fine for signals that=20
> would normally abort the sleep anyway, but SIGSTOP actually leaves the=20
> thread on the sleep queue and tries to resume the sleep after the stop ha=
s=20
> cleared.  So SIGSTOP combined with a stopevent() actually breaks because=
=20
> the stopevent() removes the thread from the sleep queue.  I'm not certain=
=20
> what the failure mode is currently, but I'm certain that it's wrong.
>=20
> What I'd like to do is stop sleeping in issignal() all together.  For=20
> regular restartable syscalls this would mean failing back out to ast()=20
> where we'd then handle the signals including SIGSTOP.  After SIGCONT we'd=
=20
> then restart the syscall.  For non-restartable syscalls we could have a=
=20
> special issignal variant that is called when msleep/cv_timedwait_sig=20
> return interrupted that would check for SIGSTOP/debugger events and sleep=
=20
> within a loop retrying the operation.  This would preserve the behavior o=
f=20
> debugging events and SIGSTOP not aborting non-restartable syscalls as the=
y=20
> do now.
>=20
> Once we have moved the location of the sleeps it will be possible to chec=
k=20
> for signals using a spinlock without dropping the sleep queue lock in=20
> sleepq_catch_signals().
>=20
Another failure mode we have in our code is the NFS intr mounts. NFS client
often sleeps interruptible while holding vnode lock(s). Allowing the
SIGSTOP to stop such process inside corresponding msleep() call causes
vnode lock cascade.

> What I'd like from readers on arch@ is for you to consider if there are=
=20
> other cases than non-restartable syscalls that will break if=20
> msleep/sleepqs return EINTR from SIGSTOP and debug events.  Also, is ther=
e=20
> an authoritative list of non-restartable syscalls anywhere?  It's just=20
> those involving timevals right?  nanosleep/poll/select/kqueue.. others?
>=20
> I intend to do this work for 8.0 and hopefully very early on so we have=
=20
> plenty of time to shake out bugs as this signal code tends to be very=20
> delicate.
>=20
> Thanks,
> Jeff

--ZljC5FVPx7rxDQQ8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHC1HTC3+MBN1Mb4gRAnWzAJ9aYq8pUFaAiJQ2gxEYJPA4U7S0RQCeMl2r
nkfEvZbOUZYUdEMOZsPRTOQ=
=068I
-----END PGP SIGNATURE-----

--ZljC5FVPx7rxDQQ8--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071009100259.GW2180>