From owner-freebsd-arch@freebsd.org Thu Dec 15 11:41:15 2016
Return-Path:
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6117DC81CBC for ; Thu, 15 Dec 2016 11:41:15 +0000 (UTC) (envelope-from prvs=150a29c11=roger.pau@citrix.com)
Received: from SMTP.EU.CITRIX.COM (smtp.ctxuk.citrix.com [185.25.65.24]) (using TLSv1.2 with cipher RC4-SHA (128/128 bits)) (Client CN "mail.citrix.com", Issuer "DigiCert SHA2 Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B8A01A90; Thu, 15 Dec 2016 11:41:14 +0000 (UTC) (envelope-from prvs=150a29c11=roger.pau@citrix.com)
X-IronPort-AV: E=Sophos;i="5.33,351,1477958400"; d="scan'208";a="36869369"
Date: Thu, 15 Dec 2016 11:40:33 +0000
From: Roger Pau Monné
To:
Subject: Order of device suspend/resume
Message-ID: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: NeoMutt/20161126 (1.7.1)
X-ClientProxiedBy: AMSPEX02CAS01.citrite.net (10.69.22.112) To AMSPEX02CL02.citrite.net (10.69.22.126)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 15 Dec 2016 11:41:15 -0000

Hello,

I'm currently dealing with a bug in the Xen suspend/resume sequence, and I've
found that lacking a way to order device priority during suspend/resume is
proving quite harmful for Xen (and maybe other systems too). The current
suspend/resume code simply scans the root bus, and suspends/resumes every device
based on the order they are attached to their parents.
The problem here is that
there's no way to tell that some devices should be resumed before others, for
example the event timers/time counters/uarts should definitely be resumed before
other devices, but that seems to happen mostly by chance.

Currently most time-related devices are attached directly to the nexus, which
means they will get resumed first, but for example the uart is currently
attached to the pci bus IIRC, which means it gets resumed quite late. On Xen
systems, this is even worse. The Xen PV bus (that contains all Xen-related
devices) is attached last (because it tends to pick up unused memory
regions for its own usage) and this bus also contains the PV timecounter which
should be resumed _before_ other devices, or else timecounting will be
completely screwed and things can get stuck in indefinitely long loops (due to
the fact that the timecounter is implemented based on the uptime of the host,
and that changes from host-to-host).

In order to solve this I could add a hack to the Xen resume process (which is
already different from the ACPI one), but this looks gross. I could also attach
the Xen PV timer to the nexus directly (as it was done before), but I also
prefer to keep all Xen-related devices in the same bus for coherency. The last
option would be to add some kind of suspend/resume priorities to the devices,
and do more than one suspend/resume pass. This is more complex and requires more
changes, so I would like to know if it would be helpful for other systems, or if
someone has already attempted to do it.

Thanks, Roger.
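
The ordering problem and the priority idea from the message above can be
modelled in a few lines of user-space Python. This is only an illustrative
sketch, not kernel code: the Device class, the numeric priority levels, and
device names such as xentimer0 are assumptions made up for the example and
are not taken from the FreeBSD tree.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    # Hypothetical levels: 0 = buses, 1 = timers, 2 = consoles, 3 = the rest.
    priority: int = 3
    children: list = field(default_factory=list)


def attach_order(dev):
    """Depth-first walk in attach order (the behaviour described as current)."""
    yield dev
    for child in dev.children:
        yield from attach_order(child)


def resume_order(root):
    """One pass per priority level; attach order is preserved within a level."""
    devices = list(attach_order(root))
    levels = sorted({d.priority for d in devices})
    return [d.name for level in levels for d in devices if d.priority == level]


def suspend_order(root):
    """Suspend is simply the resume order walked backwards."""
    return list(reversed(resume_order(root)))


# A toy tree: the Xen PV bus is attached last, and it holds a timer.
nexus = Device("nexus", 0, [
    Device("attimer0", 1),
    Device("pci0", 0, [Device("uart0", 2), Device("em0", 3)]),
    Device("xenpv0", 0, [Device("xentimer0", 1), Device("xbd0", 3)]),
])

print("attach order: ", [d.name for d in attach_order(nexus)])
print("resume order: ", resume_order(nexus))
print("suspend order:", suspend_order(nexus))
```

In plain attach order the toy xentimer0 comes after em0; with one pass per
priority level it is resumed right after the buses, wherever its parent bus
was attached.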
From owner-freebsd-arch@freebsd.org Thu Dec 15 21:50:58 2016
From: John Baldwin
To: Roger Pau Monné
Cc: freebsd-arch@freebsd.org
Subject: Re: Order of device suspend/resume
Date: Thu, 15 Dec 2016 13:38:11 -0800
Message-ID: <7469755.xT5lfhErkd@ralph.baldwin.cx>
In-Reply-To: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com>
References: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com>

On Thursday, December 15, 2016 11:40:33 AM Roger Pau Monné wrote:
> Hello,
>
> I'm currently dealing with a bug in the Xen
> suspend/resume sequence, and I've
> found that lacking a way to order device priority during suspend/resume is
> proving quite harmful for Xen (and maybe other systems too). The current
> suspend/resume code simply scans the root bus, and suspends/resumes every device
> based on the order they are attached to their parents. The problem here is that
> there's no way to tell that some devices should be resumed before others, for
> example the event timers/time counters/uarts should definitely be resumed before
> other devices, but that seems to happen mostly by chance.
>
> Currently most time-related devices are attached directly to the nexus, which
> means they will get resumed first, but for example the uart is currently
> attached to the pci bus IIRC, which means it gets resumed quite late. On Xen
> systems, this is even worse. The Xen PV bus (that contains all Xen-related
> devices) is attached last (because it tends to pick up unused memory
> regions for its own usage) and this bus also contains the PV timecounter which
> should be resumed _before_ other devices, or else timecounting will be
> completely screwed and things can get stuck in indefinitely long loops (due to
> the fact that the timecounter is implemented based on the uptime of the host,
> and that changes from host-to-host).
>
> In order to solve this I could add a hack to the Xen resume process (which is
> already different from the ACPI one), but this looks gross. I could also attach
> the Xen PV timer to the nexus directly (as it was done before), but I also
> prefer to keep all Xen-related devices in the same bus for coherency. The last
> option would be to add some kind of suspend/resume priorities to the devices,
> and do more than one suspend/resume pass.
> This is more complex and requires more
> changes, so I would like to know if it would be helpful for other systems, or if
> someone has already attempted to do it.

I think Justin Hibbits had some patches to make use of the boot-time new-bus
passes for suspend and resume which I think would help with this. You suspend
things in the reverse order of boot and resume operates in the same order as
boot.

-- 
John Baldwin

From owner-freebsd-arch@freebsd.org Fri Dec 16 04:34:54 2016
From: Justin Hibbits
To: Roger Pau Monné
Cc: FreeBSD Arch
Subject: Re: Order of device suspend/resume
Date: Thu, 15 Dec 2016 22:34:51 -0600
Message-Id:
In-Reply-To: <7469755.xT5lfhErkd@ralph.baldwin.cx>
References: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com> <7469755.xT5lfhErkd@ralph.baldwin.cx>

On Dec 15, 2016, at 3:38 PM, John Baldwin wrote:

> On Thursday, December 15, 2016 11:40:33 AM Roger Pau Monné wrote:
>> Hello,
>>
>> I'm currently dealing with a bug in the Xen suspend/resume sequence, and I've
>> found that lacking
>> a way to order device priority during suspend/resume is
>> proving quite harmful for Xen (and maybe other systems too). The current
>> suspend/resume code simply scans the root bus, and suspends/resumes every device
>> based on the order they are attached to their parents. The problem here is that
>> there's no way to tell that some devices should be resumed before others, for
>> example the event timers/time counters/uarts should definitely be resumed before
>> other devices, but that seems to happen mostly by chance.
>>
>> Currently most time-related devices are attached directly to the nexus, which
>> means they will get resumed first, but for example the uart is currently
>> attached to the pci bus IIRC, which means it gets resumed quite late. On Xen
>> systems, this is even worse. The Xen PV bus (that contains all Xen-related
>> devices) is attached last (because it tends to pick up unused memory
>> regions for its own usage) and this bus also contains the PV timecounter which
>> should be resumed _before_ other devices, or else timecounting will be
>> completely screwed and things can get stuck in indefinitely long loops (due to
>> the fact that the timecounter is implemented based on the uptime of the host,
>> and that changes from host-to-host).
>>
>> In order to solve this I could add a hack to the Xen resume process (which is
>> already different from the ACPI one), but this looks gross. I could also attach
>> the Xen PV timer to the nexus directly (as it was done before), but I also
>> prefer to keep all Xen-related devices in the same bus for coherency. The last
>> option would be to add some kind of suspend/resume priorities to the devices,
>> and do more than one suspend/resume pass.
>> This is more complex and requires more
>> changes, so I would like to know if it would be helpful for other systems, or if
>> someone has already attempted to do it.
>
> I think Justin Hibbits had some patches to make use of the boot-time new-bus
> passes for suspend and resume which I think would help with this. You suspend
> things in the reverse order of boot and resume operates in the same order as
> boot.
>
> -- 
> John Baldwin

John is right. I have a (somewhat abandoned due to time and focus)
branch, https://svnweb.freebsd.org/base/projects/pmac_pmu/ which has
the necessary code working mostly on PowerPC. The diff can be found
at https://reviews.freebsd.org/D203 too.

- Justin

From owner-freebsd-arch@freebsd.org Fri Dec 16 05:25:07 2016
Sender: wlosh@bsdimp.com
From: Warner Losh
To: Justin Hibbits
Cc: Roger Pau Monné, FreeBSD Arch
Subject: Re: Order of device suspend/resume
Date: Thu, 15 Dec 2016 21:25:05 -0800
Message-ID:
In-Reply-To:
References: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com> <7469755.xT5lfhErkd@ralph.baldwin.cx>

On Thu, Dec 15, 2016 at 8:34 PM, Justin Hibbits wrote:
>
> On Dec 15, 2016, at 3:38 PM,
> John Baldwin wrote:
>
>> On Thursday, December 15, 2016 11:40:33 AM Roger Pau Monné wrote:
>>>
>>> Hello,
>>>
>>> I'm currently dealing with a bug in the Xen suspend/resume sequence, and
>>> I've found that lacking a way to order device priority during suspend/resume
>>> is proving quite harmful for Xen (and maybe other systems too). The current
>>> suspend/resume code simply scans the root bus, and suspends/resumes every
>>> device based on the order they are attached to their parents. The problem here
>>> is that there's no way to tell that some devices should be resumed before
>>> others, for example the event timers/time counters/uarts should definitely be
>>> resumed before other devices, but that seems to happen mostly by chance.
>>>
>>> Currently most time-related devices are attached directly to the nexus,
>>> which means they will get resumed first, but for example the uart is currently
>>> attached to the pci bus IIRC, which means it gets resumed quite late. On
>>> Xen systems, this is even worse. The Xen PV bus (that contains all
>>> Xen-related devices) is attached last (because it tends to pick up unused
>>> memory regions for its own usage) and this bus also contains the PV
>>> timecounter which should be resumed _before_ other devices, or else
>>> timecounting will be completely screwed and things can get stuck in
>>> indefinitely long loops (due to the fact that the timecounter is implemented
>>> based on the uptime of the host, and that changes from host-to-host).
>>>
>>> In order to solve this I could add a hack to the Xen resume process
>>> (which is already different from the ACPI one), but this looks gross. I could
>>> also attach the Xen PV timer to the nexus directly (as it was done before),
>>> but I also
>>> prefer to keep all Xen-related devices in the same bus for coherency.
>>> The last option would be to add some kind of suspend/resume priorities to the
>>> devices, and do more than one suspend/resume pass. This is more complex and
>>> requires more changes, so I would like to know if it would be helpful for other
>>> systems, or if someone has already attempted to do it.
>>
>>
>> I think Justin Hibbits had some patches to make use of the boot-time
>> new-bus passes for suspend and resume which I think would help with this. You
>> suspend things in the reverse order of boot and resume operates in the same
>> order as boot.
>>
>> --
>> John Baldwin
>
>
> John is right. I have a (somewhat abandoned due to time and focus) branch,
> https://svnweb.freebsd.org/base/projects/pmac_pmu/ which has the necessary
> code working mostly on PowerPC. The diff can be found at
> https://reviews.freebsd.org/D203 too.

Cool. Does it have a mechanism similar to the attach code that lets you run
again at each pass?

Warner
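
For readers unfamiliar with the idea, a "run again at each pass" scheme for
suspend/resume could look roughly like the user-space model below. It is a
sketch under stated assumptions, not the code in D203: the thread does not
say how that patch works, and the Node class, the pass constants, and the
device names are all hypothetical. Each node is re-entered on every pass and
acts only when the current pass matches its own level.

```python
# Hypothetical pass levels; the lowest level is resumed first.
PASS_BUS, PASS_TIMER, PASS_DEFAULT = 10, 50, 100
PASSES = (PASS_BUS, PASS_TIMER, PASS_DEFAULT)


class Node:
    def __init__(self, name, pass_level, children=()):
        self.name = name
        self.pass_level = pass_level
        self.children = list(children)
        self.suspended = False

    def resume_pass(self, current, log):
        """Called once per pass: act only at our own level, always recurse."""
        if self.pass_level == current and self.suspended:
            self.suspended = False
            log.append(self.name)
        for child in self.children:
            child.resume_pass(current, log)

    def suspend_pass(self, current, log):
        """Mirror image of resume_pass: children first, in reverse attach order."""
        for child in reversed(self.children):
            child.suspend_pass(current, log)
        if self.pass_level == current and not self.suspended:
            self.suspended = True
            log.append(self.name)


def suspend_all(root):
    """Walk the whole tree once per pass, highest pass level first."""
    log = []
    for level in reversed(PASSES):
        root.suspend_pass(level, log)
    return log


def resume_all(root):
    """Walk the whole tree once per pass, lowest pass level first."""
    log = []
    for level in PASSES:
        root.resume_pass(level, log)
    return log


root = Node("nexus", PASS_BUS, [
    Node("pci0", PASS_BUS, [Node("uart0", PASS_DEFAULT)]),
    Node("xenpv0", PASS_BUS, [
        Node("xentimer0", PASS_TIMER),
        Node("xbd0", PASS_DEFAULT),
    ]),
])

suspend_log = suspend_all(root)
resume_log = resume_all(root)
print("suspend:", suspend_log)
print("resume: ", resume_log)
```

In this model the suspend log is the exact reverse of the resume log, which
matches the "reverse order of boot" description quoted above.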