From owner-freebsd-arch@FreeBSD.ORG Wed Nov 19 03:43:20 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A694F50F for ; Wed, 19 Nov 2014 03:43:20 +0000 (UTC) Received: from mail-pd0-f169.google.com (mail-pd0-f169.google.com [209.85.192.169]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 71583133 for ; Wed, 19 Nov 2014 03:43:20 +0000 (UTC) Received: by mail-pd0-f169.google.com with SMTP id fp1so7343366pdb.14 for ; Tue, 18 Nov 2014 19:43:14 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=s34g/PTgG0UWcVtZTtinal9HrPwQcOcbdjMrcUPFeT8=; b=S6WKsSyDlrvqG+PnSjb7W+/whjA2/fpDLsYYs+M50PJaOg9wF+vV/PW4tANwvvGsKq Alw22tlfn6Kxnwpm6o7RxZkKWZ3nbo3jRepVM7W3SzGiU0jwa/B8XqKLuyjoB8ufzKh7 7YZ5ecqkOl6B1P/I6TQKi+jC8IqP7Jlt9vOoldQDuD+yWhDSKI8nMN8b+su+94bai6Ml 7M4bOvpeX1WSnHMVwTIO4YbiDVNhxlr56oUW5Cf+ioQ6o6AvqqX+o/CSkPscihJIun4A dhI9OIm9d7GcQXNWghm1JS22quraSMuhr8urb9a3lnBEdKHb3HkJQwsHga6Rgtt1FVk6 fBwg== X-Gm-Message-State: ALoCoQmAu3bvbQNuQ6KkJEh1OFZj7Hfm6QcgdEP2RmrqdGYSiFs7aC1M9ItwsGZeB8tr2tJaXMVq X-Received: by 10.70.126.134 with SMTP id my6mr42532476pdb.14.1416368594711; Tue, 18 Nov 2014 19:43:14 -0800 (PST) Received: from lgmac-zdu.corp.netflix.com ([69.53.236.236]) by mx.google.com with ESMTPSA id oe10sm280846pdb.66.2014.11.18.19.43.12 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 18 Nov 2014 19:43:13 -0800 (PST) Sender: Warner Losh Content-Type: multipart/signed; boundary="Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0"; protocol="application/pgp-signature"; micalg=pgp-sha512 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: suspending threads before devices From: Warner Losh In-Reply-To: <201411181721.56505.jhb@freebsd.org> Date: Tue, 18 Nov 2014 20:43:09 -0700 Message-Id: <87FFDA99-ADDC-4F56-A3E8-56CCAA544542@bsdimp.com> References: <201203202037.q2KKbNfK037014@svn.freebsd.org> <54676BA6.7000202@FreeBSD.org> <20141115180014.GK17068@kib.kiev.ua> <201411181721.56505.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1878.6) Cc: Konstantin Belousov , Andriy Gapon , Jung-uk Kim , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 03:43:20 -0000 --Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Nov 18, 2014, at 3:21 PM, John Baldwin wrote: > On Saturday, November 15, 2014 1:00:15 pm Konstantin Belousov wrote: >> On Sat, Nov 15, 2014 at 05:05:10PM +0200, Andriy Gapon wrote: >>> On 15/11/2014 12:58, Konstantin Belousov wrote: >>>> On Fri, Nov 14, 2014 at 11:10:45PM +0200, Andriy Gapon wrote: >>>>> On 22/03/2012 16:14, Konstantin Belousov wrote: >>>>>> I already noted this to Jung-uk, I think that current suspend = handling >>>>>> is (somewhat) wrong. We shall not stop other CPUs for suspension = when >>>>>> they are executing some random kernel code. Rather, CPUs should = be safely >>>>>> stopped at the kernel->user boundary, or at sleep point, or at = designated >>>>>> suspend point like idle loop. >>>>>>=20 >>>>>> We already are engaged into somewhat doubtful actions like = restoring of %cr2, >>>>>> since we might, for instance, preemt page fault handler with = suspend IPI. >>>>>=20 >>>>> I recently revisited this issue in the context of some = suspend+resume problems >>>>> that I am having with radeonkms driver. What surprised me is that = the driver's >>>>> suspend code has no synchronization whatsoever with its other code = paths. So, I >>>>> looked first at the Linux code and then at the illumos code to see = how suspend >>>>> is implemented there. >>>>> As far as I can see, those kernels do exactly what you suggest = that we do. >>>>> Before suspending devices they first suspend all threads except = for one that >>>>> initiates the suspend. For userland threads a signal-like = mechanism is used to >>>>> put them in a state similar to SIGSTOP-ed one. With the kernel = threads >>>>> mechanisms are different between the kernels. Also, illumos = freezes kernel >>>>> threads after suspending the devices, not before. >>>>>=20 >>>>> I think that we could start with only the userland threads = initially. Do you >>>>> think the SIGSTOP-like approach would be hard to implement for us? >>>> We have most, if not all, parts of the stopping code >>>> already implemented. I mean the single-threading code, see >>>> thread_single(SINGLE_BOUNDARY). The code ensures that other threads = in >>>> the current process are stopped either at the kernel->user = boundary, or >>>> at the safe kernel sleep point. >>>>=20 >>>> This is not immediately applicable, since the caller is supposed to = be >>>> a thread in the suspended process, but modifications to allow = external >>>> process to do the same are really small comparing with the = complexity >>>> of the code. I suspect that all what is needed is change of >>>> while/if (remaining !=3D 1) >>>> to >>>> while/if ((p =3D=3D curproc && remaining !=3D 1) || >>>> (p !=3D curproc && remaining !=3D 0)) >>>> together with explicit passing of struct proc *p to thread_single. >>>=20 >>> Thank you for the pointer! >>> I think that maybe even more changes are required for that code to = be usable for >>> suspending. E.g. maybe a different p_flag bit should be used, = because I think >>> that we would like to avoid interaction between the process level = suspend and >>> the global suspend. I.e. the global suspend might encounter a = multi-threaded >>> process in a single thread mode and would need to suspend its = remaining thread. >>=20 >> Thread which is a p_singlethread, is not at the safe point; in other >> words, a process which is under the singlethreading, should prevent >> the system from entering sleep state. The singlethreading is the >> temporal state anyway, it is established during exec() or exit(), so >> it is fine to wait for in-process singlethreading to end before outer >> singlethreading is done. >>=20 >> Anyway, this requires real coding to experiment. I started looking = at >> it since I did somewhat related changes now. >=20 > I would certainly like a way to quiesce threads before entering the = real suspend > path. I would also like to cleanly unmount filesystems during suspend = as well and > the thread issue is a prerequisite for that. However, reusing "stop = at boundary" > may not be quite correct because you probably don't want to block = suspend because > you have an NFS request that is retrying due to a down NFS server. = For NFS I > think you want any threads asleep to just not get a chance to run = again until > after resume completes. I=92m almost certain you don=92t want to =93unmount=94 the filesystems. = This would invalidate all open file handles and would be mondo-bado, and would only succeed if = you forced this issue due to all the open references. Perhaps you=92re being = imprecise. I think you want to pause all the user land threads, sync the = filesystems, which should mark them as clean and allow for the battery to drain w/o too much = trouble. It would also mean that all the threads that start up again won=92t have to = reopen files, etc. Once you=92ve done this, you can proceed to kernel threads and suspending the = hardware. Filesystems implemented in userland, however, would be a problem. As = would logging to stable media once the suspend begins (since you=92d have already = suspended syslogd and even if you hand=92t, you=92d then be dirtying the very disks you = want to keep clean). There=92s currently hooks into devd that would need attention that are (were?) = used to put the video mode into a state that one can resume from. And then there=92s CardBus / PC Card which detach the devices since they = power them down. They could easily be changed to not detach the devices (but they have to = power them down to avoid interrupt storms), but then all the attachments would need to = ensure that their =91resume=92 routines did everything that attached did to initialize the = hardware. And the only way to know if you=92re suspended and resumed with hardware that=92s the same = type, but different that would need to be treated differently than hardware that=92s the same = type but actually the same. The device interface would likely need to grow a UUID function that = would be the hash of the network cards MAC or the disks=92 serial number (Swapping CF cards while = suspended today is completely safe since we force a detach, if that were to be fixed, it = could be come dangerous as the new disk is unlikely to be a bitwise copy of the old). This = functionality would need to live in the driver level, not the bus level, because PCI Config space doesn=92t= have the MAC, nor does it have enough knowledge to know about serial numbers. PC Card CIS = space doesn=92t typically have a serial number (a vanishingly small number of cards have = it, 99%+ or more don=92t). Oh, and it doesn=92t help that the disk isn=92t a direct child = of the PC Card bus, but a child of a child (possibly without any device_t in the case of CAM devices). Oh, = did I mention that PC Card and Cardbus hot plug require kernel threads to be active during suspend = / resume to work properly in some rare cases that time has deserted from my memory. And then there=92s USB, which implemented things differently and = possibly unsafely. I haven=92t looked into that in as much detail as I have PC Card and Cardbus. Warner --Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJUbBHNAAoJEGwc0Sh9sBEAFDMQAO0LoVHTr+4yzegb6yHZKZ1g L9hMLYJizHd+zGaOOBilTx7Dms3MTg2wPuO9csYtQoor3Y1G97slP/DRDajwvrtg Go4LYLilTPtZ4WIVt3UGxj+ntxAJx9syRqzcLu5sptIH55s8t03HE1HRRrmbhtmk bn9XPNVbfxll2uVVDDyho2rq8Go+G5j2N3x88g6uH5E1/mInYzH56L7nOSTdT6o4 TpAyFVXYjxm2PpljImKLu4HyP2cKFiJS7JlIOZu9jfLCojlIf+tp2+6EbgvHB6HF zW6NhCYzC0YKTZdytYAakuSdMFV5DmEdKV47krE24DCY1f027MOtgtyvwWQUsH0Y ScHLCRhnaz40JDcevDgT4o9KoLV1/ad9iwK/sZQXM68ekwLhIQKMZBULoocF1PLA huh3PUfjnrZ8SPLcx56M61quYpcxGRytK23dK2dWUIw2taVOJh/cK2X5gJzRwDID jjdNZQPdVqNUPix6RM9SSXmgIQQwXPwGgnexrfhro8dC2H4QS40s2UbBf2RUkuck +1r0BHDUI4BbaCCTiUa5eCZg1BX9Tdd/oDLj8pzGFj2gVBATP12Fkoi/RX3aWY6t mKxcUQQa7FzWdK0ZYnBBQHGiCuHF10x7t3vafRdmnpcjfrawsWPXcCaEC6KACNux hkL4oyW5MUapGkAolVzq =2vDO -----END PGP SIGNATURE----- --Apple-Mail=_D0699E9D-0B0A-4746-B966-7F61F1A5F7E0--