Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 18 Nov 2014 20:43:09 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Andriy Gapon <avg@freebsd.org>, Jung-uk Kim <jkim@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: suspending threads before devices
Message-ID:  <87FFDA99-ADDC-4F56-A3E8-56CCAA544542@bsdimp.com>
In-Reply-To: <201411181721.56505.jhb@freebsd.org>
References:  <201203202037.q2KKbNfK037014@svn.freebsd.org> <54676BA6.7000202@FreeBSD.org> <20141115180014.GK17068@kib.kiev.ua> <201411181721.56505.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252


On Nov 18, 2014, at 3:21 PM, John Baldwin <jhb@FreeBSD.org> wrote:

> On Saturday, November 15, 2014 1:00:15 pm Konstantin Belousov wrote:
>> On Sat, Nov 15, 2014 at 05:05:10PM +0200, Andriy Gapon wrote:
>>> On 15/11/2014 12:58, Konstantin Belousov wrote:
>>>> On Fri, Nov 14, 2014 at 11:10:45PM +0200, Andriy Gapon wrote:
>>>>> On 22/03/2012 16:14, Konstantin Belousov wrote:
>>>>>> I already noted this to Jung-uk, I think that current suspend =
handling
>>>>>> is (somewhat) wrong. We shall not stop other CPUs for suspension =
when
>>>>>> they are executing some random kernel code. Rather, CPUs should =
be safely
>>>>>> stopped at the kernel->user boundary, or at sleep point, or at =
designated
>>>>>> suspend point like idle loop.
>>>>>>=20
>>>>>> We already are engaged into somewhat doubtful actions like =
restoring of %cr2,
>>>>>> since we might, for instance, preemt page fault handler with =
suspend IPI.
>>>>>=20
>>>>> I recently revisited this issue in the context of some =
suspend+resume problems
>>>>> that I am having with radeonkms driver.  What surprised me is that =
the driver's
>>>>> suspend code has no synchronization whatsoever with its other code =
paths.  So, I
>>>>> looked first at the Linux code and then at the illumos code to see =
how suspend
>>>>> is implemented there.
>>>>> As far as I can see, those kernels do exactly what you suggest =
that we do.
>>>>> Before suspending devices they first suspend all threads except =
for one that
>>>>> initiates the suspend.  For userland threads a signal-like =
mechanism is used to
>>>>> put them in a state similar to SIGSTOP-ed one.  With the kernel =
threads
>>>>> mechanisms are different between the kernels.  Also, illumos =
freezes kernel
>>>>> threads after suspending the devices, not before.
>>>>>=20
>>>>> I think that we could start with only the userland threads =
initially.  Do you
>>>>> think the SIGSTOP-like approach would be hard to implement for us?
>>>> We have most, if not all, parts of the stopping code
>>>> already implemented. I mean the single-threading code, see
>>>> thread_single(SINGLE_BOUNDARY). The code ensures that other threads =
in
>>>> the current process are stopped either at the kernel->user =
boundary, or
>>>> at the safe kernel sleep point.
>>>>=20
>>>> This is not immediately applicable, since the caller is supposed to =
be
>>>> a thread in the suspended process, but modifications to allow =
external
>>>> process to do the same are really small comparing with the =
complexity
>>>> of the code.  I suspect that all what is needed is change of
>>>> 	while/if (remaining !=3D 1)
>>>> to
>>>> 	while/if ((p =3D=3D curproc && remaining !=3D 1) ||
>>>> 	    (p !=3D curproc && remaining !=3D 0))
>>>> together with explicit passing of struct proc *p to thread_single.
>>>=20
>>> Thank you for the pointer!
>>> I think that maybe even more changes are required for that code to =
be usable for
>>> suspending.  E.g. maybe a different p_flag bit should be used, =
because I think
>>> that we would like to avoid interaction between the process level =
suspend and
>>> the global suspend.  I.e. the global suspend might encounter a =
multi-threaded
>>> process in a single thread mode and would need to suspend its =
remaining thread.
>>=20
>> Thread which is a p_singlethread, is not at the safe point; in other
>> words, a process which is under the singlethreading, should prevent
>> the system from entering sleep state. The singlethreading is the
>> temporal state anyway, it is established during exec() or exit(), so
>> it is fine to wait for in-process singlethreading to end before outer
>> singlethreading is done.
>>=20
>> Anyway, this requires real coding to experiment.  I started looking =
at
>> it since I did somewhat related changes now.
>=20
> I would certainly like a way to quiesce threads before entering the =
real suspend
> path.  I would also like to cleanly unmount filesystems during suspend =
as well and
> the thread issue is a prerequisite for that.  However, reusing "stop =
at boundary"
> may not be quite correct because you probably don't want to block =
suspend because
> you have an NFS request that is retrying due to a down NFS server.  =
For NFS I
> think you want any threads asleep to just not get a chance to run =
again until
> after resume completes.

I=92m almost certain you don=92t want to =93unmount=94 the filesystems. =
This would invalidate
all open file handles and would be mondo-bado, and would only succeed if =
you forced
this issue due to all the open references. Perhaps you=92re being =
imprecise.

I think you want to pause all the user land threads, sync the =
filesystems, which should
mark them as clean and allow for the battery to drain w/o too much =
trouble. It would
also mean that all the threads that start up again won=92t have to =
reopen files, etc. Once
you=92ve done this, you can proceed to kernel threads and suspending the =
hardware.

Filesystems implemented in userland, however, would be a problem. As =
would logging
to stable media once the suspend begins (since you=92d have already =
suspended syslogd
and even if you hand=92t, you=92d then be dirtying the very disks you =
want to keep clean). There=92s
currently hooks into devd that would need attention that are (were?) =
used to put the video
mode into a state that one can resume from.

And then there=92s CardBus / PC Card which detach the devices since they =
power them down.
They could easily be changed to not detach the devices (but they have to =
power them down
to avoid interrupt storms), but then all the attachments would need to =
ensure that their
=91resume=92 routines did everything that attached did to initialize the =
hardware. And the only way to
know if you=92re suspended and resumed with hardware that=92s the same =
type, but different that
would need to be treated differently than hardware that=92s the same =
type but actually the same.
The device interface would likely need to grow a UUID function that =
would be the hash of the
network cards MAC or the disks=92 serial number (Swapping CF cards while =
suspended today is
completely safe since we force a detach, if that were to be fixed, it =
could be come dangerous
as the new disk is unlikely to be a bitwise copy of the old). This =
functionality would need to live
in the driver level, not the bus level, because PCI Config space doesn=92t=
 have the MAC, nor
does it have enough knowledge to know about serial numbers. PC Card CIS =
space doesn=92t
typically have a serial number (a vanishingly small number of cards have =
it, 99%+ or more
don=92t). Oh, and it doesn=92t help that the disk isn=92t a direct child =
of the PC Card bus, but a child of
a child (possibly without any device_t in the case of CAM devices). Oh, =
did I mention that PC Card
and Cardbus hot plug require kernel threads to be active during suspend =
/ resume to work properly
in some rare cases that time has deserted from my memory.

And then there=92s USB, which implemented things differently and =
possibly unsafely. I haven=92t looked
into that in as much detail as I have PC Card and Cardbus.

Warner

--Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJUbBHNAAoJEGwc0Sh9sBEAFDMQAO0LoVHTr+4yzegb6yHZKZ1g
L9hMLYJizHd+zGaOOBilTx7Dms3MTg2wPuO9csYtQoor3Y1G97slP/DRDajwvrtg
Go4LYLilTPtZ4WIVt3UGxj+ntxAJx9syRqzcLu5sptIH55s8t03HE1HRRrmbhtmk
bn9XPNVbfxll2uVVDDyho2rq8Go+G5j2N3x88g6uH5E1/mInYzH56L7nOSTdT6o4
TpAyFVXYjxm2PpljImKLu4HyP2cKFiJS7JlIOZu9jfLCojlIf+tp2+6EbgvHB6HF
zW6NhCYzC0YKTZdytYAakuSdMFV5DmEdKV47krE24DCY1f027MOtgtyvwWQUsH0Y
ScHLCRhnaz40JDcevDgT4o9KoLV1/ad9iwK/sZQXM68ekwLhIQKMZBULoocF1PLA
huh3PUfjnrZ8SPLcx56M61quYpcxGRytK23dK2dWUIw2taVOJh/cK2X5gJzRwDID
jjdNZQPdVqNUPix6RM9SSXmgIQQwXPwGgnexrfhro8dC2H4QS40s2UbBf2RUkuck
+1r0BHDUI4BbaCCTiUa5eCZg1BX9Tdd/oDLj8pzGFj2gVBATP12Fkoi/RX3aWY6t
mKxcUQQa7FzWdK0ZYnBBQHGiCuHF10x7t3vafRdmnpcjfrawsWPXcCaEC6KACNux
hkL4oyW5MUapGkAolVzq
=2vDO
-----END PGP SIGNATURE-----

--Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?87FFDA99-ADDC-4F56-A3E8-56CCAA544542>