From owner-freebsd-arch@FreeBSD.ORG Mon Oct 8 21:37:25 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 02BF516A41A for ; Mon, 8 Oct 2007 21:37:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id C5A7B13C448 for ; Mon, 8 Oct 2007 21:37:24 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.103] (c-67-160-44-208.hsd1.wa.comcast.net [67.160.44.208]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id l98LbMJq006480 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO) for ; Mon, 8 Oct 2007 17:37:23 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Mon, 8 Oct 2007 14:40:00 -0700 (PDT) From: Jeff Roberson X-X-Sender: jroberson@10.0.0.1 To: arch@freebsd.org Message-ID: <20071008142928.Y912@10.0.0.1> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Subject: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2007 21:37:25 -0000 During the work on thread lock I observed that there is a significant amount of locking involved in our signal paths right now. And these locks also show up contended in many workloads. Furthermore, requiring a DEF mutex complicates sleep queues by forcing them to drop the spinlock to check for signals and then check for races. The current issignal() code will actually msleep in the case of a stopevent() requested by the debugger. 
This is fine for signals that would normally abort the sleep anyway, but SIGSTOP actually leaves the thread on the sleep queue and tries to resume the sleep after the stop has cleared. So SIGSTOP combined with a stopevent() actually breaks because the stopevent() removes the thread from the sleep queue. I'm not certain what the failure mode is currently, but I'm certain that it's wrong.

What I'd like to do is stop sleeping in issignal() altogether. For regular restartable syscalls this would mean failing back out to ast() where we'd then handle the signals including SIGSTOP. After SIGCONT we'd then restart the syscall. For non-restartable syscalls we could have a special issignal variant that is called when msleep/cv_timedwait_sig return interrupted that would check for SIGSTOP/debugger events and sleep within a loop retrying the operation. This would preserve the behavior of debugging events and SIGSTOP not aborting non-restartable syscalls as they do now.

Once we have moved the location of the sleeps it will be possible to check for signals using a spinlock without dropping the sleep queue lock in sleepq_catch_signals().

What I'd like from readers on arch@ is for you to consider if there are other cases than non-restartable syscalls that will break if msleep/sleepqs return EINTR from SIGSTOP and debug events. Also, is there an authoritative list of non-restartable syscalls anywhere? It's just those involving timevals, right? nanosleep/poll/select/kqueue.. others?

I intend to do this work for 8.0 and hopefully very early on so we have plenty of time to shake out bugs as this signal code tends to be very delicate.
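The "sleep within a loop retrying the operation" idea can be illustrated from userland, where a timed call such as nanosleep() reports EINTR together with the time still remaining. What follows is only a minimal sketch under that assumption: it is not the proposed kernel change, and sleep_full() is a hypothetical helper name that does not appear anywhere in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* expose nanosleep() under strict -std= modes */
#endif
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (illustration only): sleep for the full requested
 * interval.  Whenever nanosleep() is interrupted (EINTR), retry with the
 * remaining time it reported instead of giving up.
 * Returns 0 on success, -1 on any error other than EINTR.
 */
int
sleep_full(const struct timespec *req)
{
	struct timespec want = *req;
	struct timespec left;

	while (nanosleep(&want, &left) == -1) {
		if (errno != EINTR)
			return (-1);	/* e.g. EINVAL for a bad timespec */
		want = left;		/* resume with the unslept remainder */
	}
	return (0);
}
```

A caller would pass the total interval once, e.g. a struct timespec of { 1, 0 } for one second; the loop hides interruptions the same way the proposed in-kernel variant is meant to hide SIGSTOP and debugger events from non-restartable syscalls.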
Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Mon Oct 8 21:50:12 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13F2B16A417 for ; Mon, 8 Oct 2007 21:50:12 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 0974613C474 for ; Mon, 8 Oct 2007 21:50:11 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id DE44B1A4D83; Mon, 8 Oct 2007 14:50:11 -0700 (PDT) Date: Mon, 8 Oct 2007 14:50:11 -0700 From: Alfred Perlstein To: Jeff Roberson Message-ID: <20071008215011.GI31826@elvis.mu.org> References: <20071008142928.Y912@10.0.0.1> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071008142928.Y912@10.0.0.1> User-Agent: Mutt/1.4.2.3i Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2007 21:50:12 -0000 * Jeff Roberson [071008 14:39] wrote: > > What I'd like from readers on arch@ is for you to consider if there are > other cases than non-restartable syscalls that will break if > msleep/sleepqs return EINTR from SIGSTOP and debug events. Also, is there > an authoritative list of non-restartable syscalls anywhere? It's just > those involving timevals right? nanosleep/poll/select/kqueue.. others? > > I intend to do this work for 8.0 and hopefully very early on so we have > plenty of time to shake out bugs as this signal code tends to be very > delicate. > Is there precedent for this work from other OSes, Linux, Solaris that shows moving to this model works? 
-- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Mon Oct 8 23:04:05 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DB2516A418; Mon, 8 Oct 2007 23:04:05 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 4A1B513C481; Mon, 8 Oct 2007 23:04:05 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.103] (c-67-160-44-208.hsd1.wa.comcast.net [67.160.44.208]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id l98N42ww026437 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Mon, 8 Oct 2007 19:04:03 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Mon, 8 Oct 2007 16:06:40 -0700 (PDT) From: Jeff Roberson X-X-Sender: jroberson@10.0.0.1 To: Alfred Perlstein In-Reply-To: <20071008215011.GI31826@elvis.mu.org> Message-ID: <20071008160422.K912@10.0.0.1> References: <20071008142928.Y912@10.0.0.1> <20071008215011.GI31826@elvis.mu.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2007 23:04:05 -0000 On Mon, 8 Oct 2007, Alfred Perlstein wrote: > * Jeff Roberson [071008 14:39] wrote: >> >> What I'd like from readers on arch@ is for you to consider if there are >> other cases than non-restartable syscalls that will break if >> msleep/sleepqs return EINTR from SIGSTOP and debug events. Also, is there >> an authoritative list of non-restartable syscalls anywhere? It's just >> those involving timevals right? 
nanosleep/poll/select/kqueue.. others? >> >> I intend to do this work for 8.0 and hopefully very early on so we have >> plenty of time to shake out bugs as this signal code tends to be very >> delicate. >> > > Is there precedent for this work from other OSes, Linux, Solaris > that shows moving to this model works? I forgot to mention that. These two both use this model. Linux sets up a complicated syscall restart state so that the normal syscall restart mechanism can be used. I didn't fully understand what Solaris does but they don't sleep in issignal. They do it later as well. Jeff. > > -- > - Alfred Perlstein > From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 01:00:09 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F76316A417 for ; Tue, 9 Oct 2007 01:00:09 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (tarsier.geekcn.org [210.51.165.229]) by mx1.freebsd.org (Postfix) with ESMTP id E06F713C448 for ; Tue, 9 Oct 2007 01:00:08 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from localhost (tarsier.geekcn.org [210.51.165.229]) by tarsier.geekcn.org (Postfix) with ESMTP id DB993EBB713; Tue, 9 Oct 2007 08:43:25 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([210.51.165.229]) by localhost (mail.geekcn.org [210.51.165.229]) (amavisd-new, port 10024) with ESMTP id yOOsB73pIGlt; Tue, 9 Oct 2007 08:43:15 +0800 (CST) Received: from LI-Xins-MacBook.local (71.5.7.139.ptr.us.xo.net [71.5.7.139]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTP id B9D3AEBB709; Tue, 9 Oct 2007 08:43:14 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: 
mime-version:to:subject:x-enigmail-version:openpgp:content-type; b=O8fZ9KPE0TcZ6WE5qYPAg+wfewihEJkiN3Rz6BUR4VZ1lgpiOWAV3a+GgVXmbWjsR 7u6EruplVc583xQxbf+Xg== Message-ID: <470ACEA1.3030309@delphij.net> Date: Mon, 08 Oct 2007 17:43:13 -0700 From: LI Xin Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: freebsd-arch@freebsd.org X-Enigmail-Version: 0.95.3 OpenPGP: url=http://www.delphij.net/delphij.asc Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="------------enig8151AA50F19832207723B335" Subject: Why we optimize by time by default for < -O2 case? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 01:00:09 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig8151AA50F19832207723B335 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hi, I wonder why we want to optimize by time by default and not by space by default for the system compiler, does the reasoning still hold true? The commit log said: Modified files: contrib/gcc toplev.c Log: Clarify revision 1.14: Gcc 3.1's -O0 and -O1 actually optimized alignment for space, but we feel it should optimize alignment for time like Gcc 2.95 used to. Optimization for space should give 1-byte alignment on i386's, but doesn't quite. Revision Changes Path 1.15 +0 -0 src/contrib/gcc/toplev.c Cheers, -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! 
--------------enig8151AA50F19832207723B335 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHCs6hOfuToMruuMARClTXAJ94HdQgZ3q7KUKRlbNhzDoeW99cBgCfQ1rn ItCdcunWUXLhPdSq9abGeBo= =8lGL -----END PGP SIGNATURE----- --------------enig8151AA50F19832207723B335-- From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 01:28:05 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 143ED16A41B for ; Tue, 9 Oct 2007 01:28:05 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id B459F13C48D for ; Tue, 9 Oct 2007 01:28:04 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.1/8.14.1/NETPLEX) with ESMTP id l9919Bh2004239; Mon, 8 Oct 2007 21:09:11 -0400 (EDT) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-3.0 (mail.netplex.net [204.213.176.10]); Mon, 08 Oct 2007 21:09:12 -0400 (EDT) Date: Mon, 8 Oct 2007 21:09:11 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Jeff Roberson In-Reply-To: <20071008142928.Y912@10.0.0.1> Message-ID: References: <20071008142928.Y912@10.0.0.1> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 01:28:05 -0000 On Mon, 8 Oct 2007, Jeff Roberson wrote: > During the work on thread lock I observed that there is a significant amount > of locking involved in our signal paths right now. And these locks also show > up contended in many workloads. Furthermore, requiring a DEF mutex > complicates sleep queues by forcing them to drop the spinlock to check for > signals and then check for races. > > The current issignal() code will actually msleep in the case of a stopevent() > requested by the debugger. This is fine for signals that would normally > abort the sleep anyway, but SIGSTOP actually leaves the thread on the sleep > queue and tries to resume the sleep after the stop has cleared. So SIGSTOP > combined with a stopevent() actually breaks because the stopevent() removes > the thread from the sleep queue. I'm not certain what the failure mode is > currently, but I'm certain that it's wrong. > > What I'd like to do is stop sleeping in issignal() all together. For regular > restartable syscalls this would mean failing back out to ast() where we'd > then handle the signals including SIGSTOP. After SIGCONT we'd then restart > the syscall. For non-restartable syscalls we could have a special issignal > variant that is called when msleep/cv_timedwait_sig return interrupted that > would check for SIGSTOP/debugger events and sleep within a loop retrying the > operation. This would preserve the behavior of debugging events and SIGSTOP > not aborting non-restartable syscalls as they do now. > > Once we have moved the location of the sleeps it will be possible to check > for signals using a spinlock without dropping the sleep queue lock in > sleepq_catch_signals(). > > What I'd like from readers on arch@ is for you to consider if there are other > cases than non-restartable syscalls that will break if msleep/sleepqs return > EINTR from SIGSTOP and debug events. 
Also, is there an authoritative list of > non-restartable syscalls anywhere? It's just those involving timevals right? > nanosleep/poll/select/kqueue.. others? I would consult the POSIX spec; it may be of some value to you. Generally, any syscall that can block should be able to restart the syscall without the application handling EINTR. On the flip side, any syscall that can be guaranteed to complete in a short time should not return EINTR, but should delay signal delivery until after syscall completion. Some functions are guaranteed by POSIX to not return EINTR (e.g., getpid & getuid). See: http://www.opengroup.org/onlinepubs/000095399/xrat/xsh_chap02.html#tag_03_02_04 I don't know if that link will work... -- DE From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 02:28:36 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D153C16A420 for ; Tue, 9 Oct 2007 02:28:36 +0000 (UTC) (envelope-from karels@redrock.karels.net) Received: from redrock.karels.net (redrock.karels.net [206.196.45.2]) by mx1.freebsd.org (Postfix) with ESMTP id 8357C13C447 for ; Tue, 9 Oct 2007 02:28:36 +0000 (UTC) (envelope-from karels@redrock.karels.net) Received: from redrock.karels.net (localhost.karels.net [127.0.0.1]) by redrock.karels.net (8.13.8/8.13.6) with ESMTP id l9923HTT011918; Mon, 8 Oct 2007 21:03:17 -0500 (CDT) (envelope-from karels@redrock.karels.net) Message-Id: <200710090203.l9923HTT011918@redrock.karels.net> To: Jeff Roberson From: Mike Karels In-reply-to: Your message of Mon, 08 Oct 2007 14:40:00 -0700. 
<20071008142928.Y912@10.0.0.1> Date: Mon, 08 Oct 2007 21:03:17 -0500 Sender: karels@karels.net Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: karels@karels.net List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 02:28:36 -0000 > What I'd like from readers on arch@ is for you to consider if there are > other cases than non-restartable syscalls that will break if > msleep/sleepqs return EINTR from SIGSTOP and debug events. Also, is there > an authoritative list of non-restartable syscalls anywhere? It's just > those involving timevals right? nanosleep/poll/select/kqueue.. others? Don't forget about siginterrupt, which can make specified signals interrupt read/write etc. Mike From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 02:34:14 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A0BD16A41A for ; Tue, 9 Oct 2007 02:34:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51]) by mx1.freebsd.org (Postfix) with ESMTP id A5E6F13C47E for ; Tue, 9 Oct 2007 02:34:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-235-248.carlnfd3.nsw.optusnet.com.au [220.239.235.248]) by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id l992XurJ012893 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 9 Oct 2007 12:33:59 +1000 Date: Tue, 9 Oct 2007 12:33:56 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: d@delphij.net In-Reply-To: <470ACEA1.3030309@delphij.net> Message-ID: <20071009122751.X54949@besplex.bde.org> References: <470ACEA1.3030309@delphij.net> MIME-Version: 1.0 Content-Type: 
TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-arch@freebsd.org Subject: Re: Why we optimize by time by default for < -O2 case? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 02:34:14 -0000 On Mon, 8 Oct 2007, LI Xin wrote: > I wonder why we want to optimize by time by default and not by space by > default for the system compiler, does the reasoning still hold true? -O means optimize for time if possible. > The commit log said: > > Modified files: > contrib/gcc toplev.c > Log: > Clarify revision 1.14: > Gcc 3.1's -O0 and -O1 actually optimized alignment for space, but we feel > it should optimize alignment for time like Gcc 2.95 used to. Optimization > for space should give 1-byte alignment on i386's, but doesn't quite. > > Revision Changes Path > 1.15 +0 -0 src/contrib/gcc/toplev.c Without this change, -O pessimizes for time. 
Bruce From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 02:53:58 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3D21916A419 for ; Tue, 9 Oct 2007 02:53:58 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (tarsier.geekcn.org [210.51.165.229]) by mx1.freebsd.org (Postfix) with ESMTP id 21A7213C467 for ; Tue, 9 Oct 2007 02:53:56 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from localhost (tarsier.geekcn.org [210.51.165.229]) by tarsier.geekcn.org (Postfix) with ESMTP id EFCA7EB90FA; Tue, 9 Oct 2007 10:53:55 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([210.51.165.229]) by localhost (mail.geekcn.org [210.51.165.229]) (amavisd-new, port 10024) with ESMTP id r+zxMessgE+D; Tue, 9 Oct 2007 10:53:44 +0800 (CST) Received: from LI-Xins-MacBook.local (c-67-161-39-180.hsd1.ca.comcast.net [67.161.39.180]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTP id 8D681EB569B; Tue, 9 Oct 2007 10:53:43 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: mime-version:to:cc:subject:references:in-reply-to: x-enigmail-version:openpgp:content-type; b=Z3/AL8IrkP3nrOaLg44LA5XLztQrldEbd5MKSZOYuhzxdFFRuw5xdUCBjnUle3pmR cMESdYVzwg5zBybS7wRoQ== Message-ID: <470AED31.60201@delphij.net> Date: Mon, 08 Oct 2007 19:53:37 -0700 From: LI Xin Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Bruce Evans References: <470ACEA1.3030309@delphij.net> <20071009122751.X54949@besplex.bde.org> In-Reply-To: <20071009122751.X54949@besplex.bde.org> X-Enigmail-Version: 0.95.3 OpenPGP: url=http://www.delphij.net/delphij.asc Content-Type: multipart/signed; micalg=pgp-sha512; 
protocol="application/pgp-signature"; boundary="------------enig21800BF69FE11B6A0A24B616" Cc: d@delphij.net, freebsd-arch@freebsd.org Subject: Re: Why we optimize by time by default for < -O2 case? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 02:53:58 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig21800BF69FE11B6A0A24B616 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Bruce Evans wrote: > On Mon, 8 Oct 2007, LI Xin wrote: > >> I wonder why we want to optimize by time by default and not by space by >> default for the system compiler, does the reasoning still hold true? > > -O means optimize for time if possible. > >> The commit log said: >> >> Modified files: >> contrib/gcc toplev.c >> Log: >> Clarify revision 1.14: >> Gcc 3.1's -O0 and -O1 actually optimized alignment for space, but we >> feel >> it should optimize alignment for time like Gcc 2.95 used to. >> Optimization >> for space should give 1-byte alignment on i386's, but doesn't quite. >> >> Revision Changes Path >> 1.15 +0 -0 src/contrib/gcc/toplev.c > > Without this change, -O pessimizes for time. Does this affect all platforms or is it i386 only, on the latest GCC version? Cheers, -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! 
--------------enig21800BF69FE11B6A0A24B616 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHCu0yOfuToMruuMARCqq6AJ9kgt+/seMxXRUvLOos4Q/N2aRIXQCeI0A7 DvXzq5qtLgwhZQx1iG5Bnl0= =IXVY -----END PGP SIGNATURE----- --------------enig21800BF69FE11B6A0A24B616-- From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 10:03:11 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4AF5616A419 for ; Tue, 9 Oct 2007 10:03:11 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from gnome.kiev.sovam.com (gnome.kiev.sovam.com [212.109.32.24]) by mx1.freebsd.org (Postfix) with ESMTP id EAC0313C46A for ; Tue, 9 Oct 2007 10:03:10 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com ([62.64.120.197]) by gnome.kiev.sovam.com with esmtp (Exim 4.67 (FreeBSD)) (envelope-from ) id 1IfBvh-0002gh-RC for arch@freebsd.org; Tue, 09 Oct 2007 13:03:09 +0300 Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1IfBvg-0008bw-E9 for arch@freebsd.org; Tue, 09 Oct 2007 13:03:09 +0300 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.1/8.14.1) with ESMTP id l99A2xRU048940; Tue, 9 Oct 2007 13:02:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.1/8.14.1/Submit) id l99A2x0J048825; Tue, 9 Oct 2007 13:02:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 9 Oct 2007 13:02:59 +0300 
From: Kostik Belousov To: Jeff Roberson Message-ID: <20071009100259.GW2180@deviant.kiev.zoral.com.ua> References: <20071008142928.Y912@10.0.0.1> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ZljC5FVPx7rxDQQ8" Content-Disposition: inline In-Reply-To: <20071008142928.Y912@10.0.0.1> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: ca614d1e091d53a46adb2ef7a730c6a6 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1576 [Oct 09 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 10:03:11 -0000 --ZljC5FVPx7rxDQQ8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Oct 08, 2007 at 02:40:00PM -0700, Jeff Roberson wrote: > During the work on thread lock I observed that there is a significant > amount of locking involved in our signal paths right now. And these locks > also show up contended in many workloads. Furthermore, requiring a DEF > mutex complicates sleep queues by forcing them to drop the spinlock to > check for signals and then check for races. > > The current issignal() code will actually msleep in the case of a > stopevent() requested by the debugger. 
This is fine for signals that > would normally abort the sleep anyway, but SIGSTOP actually leaves the > thread on the sleep queue and tries to resume the sleep after the stop has > cleared. So SIGSTOP combined with a stopevent() actually breaks because > the stopevent() removes the thread from the sleep queue. I'm not certain > what the failure mode is currently, but I'm certain that it's wrong. > > What I'd like to do is stop sleeping in issignal() all together. For > regular restartable syscalls this would mean failing back out to ast() > where we'd then handle the signals including SIGSTOP. After SIGCONT we'd > then restart the syscall. For non-restartable syscalls we could have a > special issignal variant that is called when msleep/cv_timedwait_sig > return interrupted that would check for SIGSTOP/debugger events and sleep > within a loop retrying the operation. This would preserve the behavior of > debugging events and SIGSTOP not aborting non-restartable syscalls as they > do now. > > Once we have moved the location of the sleeps it will be possible to check > for signals using a spinlock without dropping the sleep queue lock in > sleepq_catch_signals(). > Another failure mode we have in our code is NFS intr mounts. The NFS client often sleeps interruptibly while holding vnode lock(s). Allowing SIGSTOP to stop such a process inside the corresponding msleep() call causes a vnode lock cascade. > What I'd like from readers on arch@ is for you to consider if there are > other cases than non-restartable syscalls that will break if > msleep/sleepqs return EINTR from SIGSTOP and debug events. Also, is there > an authoritative list of non-restartable syscalls anywhere? It's just > those involving timevals right? nanosleep/poll/select/kqueue.. others? 
>=20 > I intend to do this work for 8.0 and hopefully very early on so we have= =20 > plenty of time to shake out bugs as this signal code tends to be very=20 > delicate. >=20 > Thanks, > Jeff --ZljC5FVPx7rxDQQ8 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHC1HTC3+MBN1Mb4gRAnWzAJ9aYq8pUFaAiJQ2gxEYJPA4U7S0RQCeMl2r nkfEvZbOUZYUdEMOZsPRTOQ= =068I -----END PGP SIGNATURE----- --ZljC5FVPx7rxDQQ8-- From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 23:43:21 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1E80216A418 for ; Tue, 9 Oct 2007 23:43:21 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id C50C213C447 for ; Tue, 9 Oct 2007 23:43:20 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.1/8.13.7) with ESMTP id l99NX1Xe073288; Tue, 9 Oct 2007 16:33:02 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.1/8.13.4/Submit) id l99NX0qZ073285; Tue, 9 Oct 2007 16:33:00 -0700 (PDT) Date: Tue, 9 Oct 2007 16:33:00 -0700 (PDT) From: Matthew Dillon Message-Id: <200710092333.l99NX0qZ073285@apollo.backplane.com> To: Kostik Belousov References: <20071008142928.Y912@10.0.0.1> <20071009100259.GW2180@deviant.kiev.zoral.com.ua> Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 23:43:21 -0000 I noticed the sigstop cascade with NFS a year or two ago and changed the way DragonFly handles SIGSTOP to 
no longer actually stop the process in the sleep code. What it does instead is set the process (or thread) state to SSTOP, but does not actually stop the process until the process tries to return to userland. All the stop handling was moved to userret(). It works just dandy.

The only issue that cropped up from doing things this way is that when you ^Z a program that is blocked on I/O, the program will complete the I/O before actually going to sleep. This seems to only have a visible effect for programs outputting a lot of junk to stdout. One additional line will be written to stdout after the ^Z is delivered before the process actually stops. 'ps' output will also show the process not going into an immediate stop state, but since the condition has to be flagged it is really easy to adjust ps to report that the process is stopped even though it isn't quite stopped yet.

This change saved us a lot of headaches and simplified a number of code paths. Frankly, userret is the ONLY safe place where you can actually stop a process these days.
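The deferred-stop scheme described above can be modelled in a few lines. This is a toy sketch only, not DragonFly code: struct toy_thread and the toy_* functions are hypothetical names invented for illustration, and the real userret() does much more than this.

```c
#include <stdbool.h>

/* Hypothetical model of a thread; not a real kernel structure. */
struct toy_thread {
	bool stop_pending;	/* set when SIGSTOP is posted */
	bool stopped;		/* true only once parked at the boundary */
	int  io_done;		/* count of completed blocking operations */
};

/* Posting the stop only flags the request; the thread keeps running. */
void
toy_post_stop(struct toy_thread *td)
{
	td->stop_pending = true;
}

/* A blocking operation runs to completion even with a stop pending. */
void
toy_do_io(struct toy_thread *td)
{
	td->io_done++;
}

/* The stop takes effect only at the return-to-userland boundary. */
void
toy_userret(struct toy_thread *td)
{
	if (td->stop_pending) {
		td->stop_pending = false;
		td->stopped = true;
	}
}
```

In this model a thread that receives the stop while blocked still finishes its I/O (the "one additional line written to stdout" effect) and is parked only once it reaches toy_userret(), so it never sits stopped while holding a lock inside a sleep.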
-Matt From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 23:46:33 2007 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1349816A418 for ; Tue, 9 Oct 2007 23:46:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id 8D31B13C457 for ; Tue, 9 Oct 2007 23:46:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c220-239-235-248.carlnfd3.nsw.optusnet.com.au (c220-239-235-248.carlnfd3.nsw.optusnet.com.au [220.239.235.248]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id l99NkGGT000772 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 10 Oct 2007 09:46:21 +1000 Date: Wed, 10 Oct 2007 09:46:16 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: d@delphij.net In-Reply-To: <470AED31.60201@delphij.net> Message-ID: <20071010092255.O36751@delplex.bde.org> References: <470ACEA1.3030309@delphij.net> <20071009122751.X54949@besplex.bde.org> <470AED31.60201@delphij.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-arch@freebsd.org Subject: Re: Why we optimize by time by default for < -O2 case? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 23:46:33 -0000 On Mon, 8 Oct 2007, LI Xin wrote: > Bruce Evans wrote: >>> Revision Changes Path >>> 1.15 +0 -0 src/contrib/gcc/toplev.c >> >> Without this change, -O pessimizes for time. > > Does this affect all platforms or is it i386 only, on the latest GCC > version? I don't know, but guess it affects some. 
In general, the pessimization works by breaking lookup of a table that gives the best alignment for the current target, so it affects all platforms that have such a table. On platforms with stricter alignment requirements than i386, gcc would have to adjust any -falign-foo settings that are too small to work. Then the pessimization might give minimal alignment != 1 and thus have no effect if the minimal alignment happens to equal the best alignment. Bruce From owner-freebsd-arch@FreeBSD.ORG Tue Oct 9 23:49:08 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 040E216A419 for ; Tue, 9 Oct 2007 23:49:08 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id B1D9D13C44B for ; Tue, 9 Oct 2007 23:49:07 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.1/8.13.7) with ESMTP id l99Nn0la073432; Tue, 9 Oct 2007 16:49:00 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.1/8.13.4/Submit) id l99Nn01S073431; Tue, 9 Oct 2007 16:49:00 -0700 (PDT) Date: Tue, 9 Oct 2007 16:49:00 -0700 (PDT) From: Matthew Dillon Message-Id: <200710092349.l99Nn01S073431@apollo.backplane.com> To: Kostik Belousov References: <20071008142928.Y912@10.0.0.1> <20071009100259.GW2180@deviant.kiev.zoral.com.ua> Cc: arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2007 23:49:08 -0000 I think it's a bad idea to have SIGSTOP generate an EINTR-like event. 
The two have totally different mechanics and the system call behavior will be different because there are some system calls which simply cannot be restarted even if you wanted to. System calls which loop on UIO's, for example, cannot be restarted. EINTR and STOP have two totally different behaviors for such calls and there is nothing you can do about it. Basically any system call which maintains state through multiple blocking events cannot be restarted after having returned unless no cumulative operations have been performed. For example, if read()ing from a socket the read() is restartable ONLY if no data has yet been read. But if some data HAS been read and EINTR occurs, the system call will simply terminate early and return a short-read, and NOT restart. That same system call when presented with a STOP, however, will not terminate early. Instead it (in FreeBSD now) stops in tsleep and when it is CONTed again the system call resumes. It's simply not possible (without a LOT of work) to have such a system call return all the way to userland or even return to the kernel syscall trap layer and be able to restart it. The restart code only works if no cumulative events have occurred... for example, if a UIO has not been filled at all (0 bytes read or written). ERESTART literally moves the program counter back to the start of the system call and causes userland to re-execute it. The best compromise that I found, which I implemented for DragonFly a while back, was to ignore SIGSTOP in the kernel entirely and process the event in userret() instead. Except for certain process control cases like the debugger, SIGSTOP is handled asynchronously anyway. e.g. when you signal a SIGSTOP the kill() system call will return before the target process(es) have actually stopped. It's just that the window of opportunity is fairly small when SIGSTOP is handled in tsleep, and somewhat bigger when it is handled in userret. That's the only hangup. 
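From the userland side, the short-read and EINTR behaviour described above is why portable code usually wraps read(2) in a loop. A minimal sketch, assuming a hypothetical helper named read_full() (not part of any library):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_full() is a hypothetical helper for illustration only.
 * It keeps calling read(2) until len bytes have arrived, EOF is
 * reached, or a real error occurs.  Returns the byte count, or -1.
 */
ssize_t
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);
	while (done < len) {
		ssize_t n = read(fd, p + done, len - done);

		if (n > 0)
			done += (size_t)n;	/* possibly a short read: keep going */
		else if (n == 0)
			break;			/* EOF */
		else if (errno == EINTR)
			continue;		/* interrupted with nothing transferred */
		else
			return (-1);		/* real error */
	}
	return ((ssize_t)done);
}
```

The caller carries the cumulative state (done) across calls, which is exactly the state the kernel cannot reconstruct if it were to restart a partially completed read() from the beginning.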
-Matt From owner-freebsd-arch@FreeBSD.ORG Wed Oct 10 01:23:26 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 596B216A417 for ; Wed, 10 Oct 2007 01:23:26 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id F395B13C46A for ; Wed, 10 Oct 2007 01:23:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.103] (c-67-160-44-208.hsd1.wa.comcast.net [67.160.44.208]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id l9A1NKjW083995 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Tue, 9 Oct 2007 21:23:21 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Tue, 9 Oct 2007 18:25:54 -0700 (PDT) From: Jeff Roberson X-X-Sender: jroberson@10.0.0.1 To: Matthew Dillon In-Reply-To: <200710092349.l99Nn01S073431@apollo.backplane.com> Message-ID: <20071009182046.J912@10.0.0.1> References: <20071008142928.Y912@10.0.0.1> <20071009100259.GW2180@deviant.kiev.zoral.com.ua> <200710092349.l99Nn01S073431@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kostik Belousov , arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2007 01:23:26 -0000 On Tue, 9 Oct 2007, Matthew Dillon wrote: > I think it's a bad idea to have SIGSTOP generate an EINTR-like event. > The two have totally different mechanics and the system call behavior > will be different because there are some system calls which simply cannot > be restarted even if you wanted to. 
System calls which loop on UIO's, > for example, cannot be restarted. EINTR and STOP have two totally > different behaviors for such calls and there is nothing you can do about > it. Basically any system call which maintains state through multiple > blocking events cannot be restarted after having returned unless no > cumulative operations have been performed. > > For example, if read()ing from a socket the read() is restartable ONLY > if no data has yet been read. But if some data HAS been read and EINTR > occurs, the system call will simply terminate early and return a > short-read, and NOT restart. That same system call when presented with > a STOP, however, will not terminate early. Instead it (in FreeBSD now) > stops in tsleep and when it is CONTed again the system call resumes. > It's simply not possible (without a LOT of work) to have such a system > call return all the way to userland or even return to the kernel syscall > trap layer and be able to restart it. Sure, however, we already deal with interrupting these system calls now either with short reads or syscall restart. The question is whether changing the behavior to the same for SIGSTOP is a big enough change to break things. I will see what POSIX has to say about it soon. > > The restart code only works if no cumulative events have occurred... for > example, if a UIO has not been filled at all (0 bytes read or written). > ERESTART literally moves the program counter back to the start of the > system call and causes userland to re-execute it. > > The best compromise that I found, which I implemented for DragonFly a > while back, was to ignore SIGSTOP in the kernel entirely and process > the event in userret() instead. Except for certain process control > cases like the debugger, SIGSTOP is handled asynchronously anyway. e.g. > when you signal a SIGSTOP the kill() system call will return before > the target process(es) have actually stopped. 
It's just that the window > of opportunity is fairly small when SIGSTOP is handled in tsleep, and > somewhat bigger when it is handled in userret. That's the only hangup. Yes this is a very good idea. However, it's also a change in behavior. The question is, which is more disruptive? Causing restart behavior or allowing the syscalls to continue further than they originally would have. I will consult POSIX and see what Linux and Solaris do in more detail. Thanks, Jeff > > -Matt > From owner-freebsd-arch@FreeBSD.ORG Wed Oct 10 01:40:37 2007 Return-Path: Delivered-To: arch@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1114F16A419 for ; Wed, 10 Oct 2007 01:40:37 +0000 (UTC) (envelope-from davidxu@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id EA78213C455; Wed, 10 Oct 2007 01:40:36 +0000 (UTC) (envelope-from davidxu@FreeBSD.org) Received: from [127.0.0.1] (root@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l9A1eYYp051139; Wed, 10 Oct 2007 01:40:35 GMT (envelope-from davidxu@freebsd.org) Message-ID: <470C2DC5.3040601@freebsd.org> Date: Wed, 10 Oct 2007 09:41:25 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20070516 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Roberson References: <20071008142928.Y912@10.0.0.1> In-Reply-To: <20071008142928.Y912@10.0.0.1> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2007 01:40:37 -0000 I think the performance problem here is that while 
the sleep queue lock is already hash scaled, interruptible msleep still has to pass through a single lock, the process lock. This is a serious serialization problem on SMP machines, especially when the machine has 4 or more cores. Also, for most of its runtime a process has few signals and job control events, so locking and unlocking the process lock should be avoided by checking a flag on the thread itself instead, which uses the thread lock. I once worked out a patch to avoid the lock contention: http://people.freebsd.org/~davidxu/patch/PCATCH_optimize.patch A mysql benchmark shows that on a dual PIII machine it can improve performance by about 1 or 2 percent; I have not tested it on a 4 core machine. Regards, David Xu From owner-freebsd-arch@FreeBSD.ORG Wed Oct 10 06:13:06 2007 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3996416A41A; Wed, 10 Oct 2007 06:13:06 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 040B513C467; Wed, 10 Oct 2007 06:13:05 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.103] (c-67-160-44-208.hsd1.wa.comcast.net [67.160.44.208]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id l9A6D32c028080 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Wed, 10 Oct 2007 02:13:05 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Tue, 9 Oct 2007 23:15:38 -0700 (PDT) From: Jeff Roberson X-X-Sender: jroberson@10.0.0.1 To: David Xu In-Reply-To: <470C2DC5.3040601@freebsd.org> Message-ID: <20071009231203.I912@10.0.0.1> References: <20071008142928.Y912@10.0.0.1> <470C2DC5.3040601@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2007 06:13:06 -0000 On Wed, 10 Oct 2007, David Xu wrote: > I think the performance problem here is that while sleep queue lock is > already hash scaled, but interruptable msleep still has to pass through > a single lock, the process lock, this is a serious serialization problem on > SMP machine, especially when machine has 4 or more cores. > Also in most most runtime, process has few signals and job controls, > process lock and unlock should be avoided by checking a thread self's > flag instead, which uses thread lock. > This is an interesting approach as well and may have some value, however, I hope to fix a number of bugs in the current mechanism at the same time. This will still require moving the sleeps. I also intend to experiment with making the sigaction lock a spinlock and using that entirely to protect signal delivery for the process. This will make a single spinlock which can be checked while the sleepq chain lock is held. Solaris actually does this without a lock entirely. If the race is lost the sender of the signal will be blocking on the sleepq chain lock while the thread is going to sleep anyway. I think this is possible to implement without much difficulty. > I had ever worked out a patch to avoid the lock contention: > http://people.freebsd.org/~davidxu/patch/PCATCH_optimize.patch > > mysql benchmark shows that on dual PIII machine it can improve > performance about 1 or 2 percentage, I had not tested it on 4 core > machine. We have seen that faster machines suffer much more from the contention problems so it's likely that the effect would be significantly greater on a 4-8 core faster machine. If you update the patch I can try it on an 8way opteron or xeon for you. 
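The idea of checking a cheap per-thread flag before touching the contended lock can be sketched in portable C11. This is a userland toy under stated assumptions: the toy_* types and functions are hypothetical, the spinlock is a stand-in for the process or sigaction lock, and none of this is taken from the patch linked above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy types; not FreeBSD kernel structures. */
struct toy_proc {
	atomic_flag lock;	/* stands in for the contended per-process lock */
	int         pending;	/* pending signal number, protected by lock */
	int         slow_paths;	/* how many checks had to take the lock */
};

struct toy_thread {
	struct toy_proc *proc;
	atomic_bool      sig_hint;	/* per-thread "something is pending" flag */
};

static void
toy_lock(struct toy_proc *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void
toy_unlock(struct toy_proc *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Sender: queue the signal under the lock, then raise the thread's hint. */
void
toy_post_signal(struct toy_thread *td, int sig)
{
	assert(sig > 0);
	toy_lock(td->proc);
	td->proc->pending = sig;
	toy_unlock(td->proc);
	atomic_store(&td->sig_hint, true);
}

/* Sleeper: return the pending signal number, or 0 if there is none. */
int
toy_check_signal(struct toy_thread *td)
{
	int sig;

	if (!atomic_load(&td->sig_hint))
		return (0);	/* fast path: the lock is never touched */
	toy_lock(td->proc);
	td->proc->slow_paths++;
	sig = td->proc->pending;
	td->proc->pending = 0;
	atomic_store(&td->sig_hint, false);
	toy_unlock(td->proc);
	return (sig);
}
```

The sketch deliberately does not close the window where a signal is posted just after the flag is read; in the discussion above that window is what holding the sleepq chain lock while going to sleep is meant to cover.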
> > Regards, > David Xu > Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Wed Oct 10 14:28:06 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4851A16A41A for ; Wed, 10 Oct 2007 14:28:06 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 1DEF113C448 for ; Wed, 10 Oct 2007 14:28:06 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 850DB1A4D82; Wed, 10 Oct 2007 07:28:05 -0700 (PDT) Date: Wed, 10 Oct 2007 07:28:05 -0700 From: Alfred Perlstein To: Jeff Roberson Message-ID: <20071010142805.GU31826@elvis.mu.org> References: <20071008142928.Y912@10.0.0.1> <20071009100259.GW2180@deviant.kiev.zoral.com.ua> <200710092349.l99Nn01S073431@apollo.backplane.com> <20071009182046.J912@10.0.0.1> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071009182046.J912@10.0.0.1> User-Agent: Mutt/1.4.2.3i Cc: Kostik Belousov , arch@freebsd.org Subject: Re: Abolishing sleeps in issignal() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2007 14:28:06 -0000 * Jeff Roberson [071009 18:24] wrote: > On Tue, 9 Oct 2007, Matthew Dillon wrote: > > > The restart code only works if no cumulative events have occured... for > > example, if a UIO has not been filled at all (0 bytes read or written). > > ERESTART literally moves the program counter back to the start of the > > system call and causes userland to re-execute it. 
> > > > The best compromise that I found, which I implemented for Dragonfly a > > while back, was to ignore SIGSTOP in the kernel entirely and process > > the event in userret() instead. Except for certain process control > > cases like the debugger, SIGSTOP is handled asynchronously anyway. e.g. > > when you signal a SIGSTOP the kill() system call will return before > > the target process(es) have actually stopped. It's just that the window > > of opportunity is fairly small when SIGSTOP is handled in tsleep, and > > somewhat bigger when it is handled in userret. That's the only hangup. > > Yes this is a very good idea. However, it's also a change in behavior. > The question is, which is more disruptive? Causing restart behavior or > allowing the syscalls to continue further than they original would've. I > will consult posix and see what Linux and Solaris do in more detail. You may be able to fix those situations by manually calling into "check_sstop()" in those code paths. -- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Thu Oct 11 17:22:58 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BFA816A417 for ; Thu, 11 Oct 2007 17:22:58 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outU.internet-mail-service.net (outU.internet-mail-service.net [216.240.47.244]) by mx1.freebsd.org (Postfix) with ESMTP id 07D5513C48E for ; Thu, 11 Oct 2007 17:22:57 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 11 Oct 2007 10:22:56 -0700 X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 4CC57126674; Thu, 11 Oct 2007 10:22:56 -0700 (PDT) Message-ID: <470E5BFB.4050903@elischer.org> Date: Thu, 
11 Oct 2007 10:23:07 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marko Zec Subject: kernel level virtualisation requirements. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2007 17:22:58 -0000 After PHK added jails, FreeBSD found a multitude of new applications, and they have served us well for quite some time. However, since then Solaris and Linux have provided newer and more extensive virtualised abstractions, and it's probably time to think about where we go from here. Marko Zec has been working on his network virtualisation, and Andre has spoken of what would be a subset of that, with policy based routing capacities (multiple routing tables etc.). I have been doing some private work on machines with multiple routing universes but that is not generally applicable. Some people have talked about CPU partitioning, resource sub-partitioning and other aspects that could be considered to be part of presenting the appearance of many machines in one way or another. My reason for writing this is to see if, as a group, we can come to a definition of what is needed, and how it can be organised. I'll start the ball rolling by stating that I'd like to see the vimage code merged with a general framework (it already has some aspects of this.. Marko has done a great job) and put in the new head branch. What I'd like to see is a bit of an 'a-la-carte' virtualisation ability. I'd like to be able to say.. I want to share the filesystem, and unix domain sockets but have a separate routing domain for my processes, or maybe just for some sockets. 
But someone else may want to have complete separation with everything up to and including separate userID spaces. My question to you, the reader, is: what aspects of virtualisation (the appearance of multiple instances of some resource) would you like to see in the system? Even how to frame this question is up for discussion. We don't even have a taxonomy to discuss the issue. Julian From owner-freebsd-arch@FreeBSD.ORG Fri Oct 12 20:06:38 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F4E016A468; Fri, 12 Oct 2007 20:06:38 +0000 (UTC) (envelope-from jamie@gritton.org) Received: from gritton.org (gritton.org [161.58.222.4]) by mx1.freebsd.org (Postfix) with ESMTP id C864713C47E; Fri, 12 Oct 2007 20:06:37 +0000 (UTC) (envelope-from jamie@gritton.org) Received: from [10.20.12.66] (fw.oremut02.us.wh.verio.net [198.65.168.24]) (authenticated bits=0) by gritton.org (8.13.6.20060614/8.13.6) with ESMTP id l9CJsAol091865; Fri, 12 Oct 2007 13:54:10 -0600 (MDT) Message-ID: <470FD0DC.5080503@gritton.org> Date: Fri, 12 Oct 2007 13:54:04 -0600 From: James Gritton User-Agent: Thunderbird 1.5.0.2 (X11/20060512) MIME-Version: 1.0 To: arch@freebsd.org References: <470E5BFB.4050903@elischer.org> In-Reply-To: <470E5BFB.4050903@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marko Zec , Julian Elischer Subject: Re: kernel level virtualisation requirements. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Oct 2007 20:06:38 -0000 Julian Elischer wrote: > What I'd like to see is a bit of a 'a-la-carte' virtualisation > ability. ... 
> My question to you, the reader, is: > what aspects of virtualisation (the appearance of multiple instances > of some resource) would you like to see in the system? Of course everything jail has now, and all the network bits that vimage offers. CPU scheduling, in particular schedule the CPU first by jail, and then by processes within jail. Filesystem quotas, without the need for each jail to have its own mount point. A lot of things that fall under the IPC category: UNIX domain sockets (part of jail chroot I suppose), PTYs, tunnel devices, SYSV IPC, file locks. Swap space and resident memory limits. The sysctl mechanism seems a good way to declare jails as having one capability or the other. This would alleviate the need to keep updating the jail structure when someone has a new idea, especially handy since the single structure makes it very hard to work on more than one new idea at a time. - Jamie From owner-freebsd-arch@FreeBSD.ORG Sat Oct 13 07:44:25 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EA9016A419; Sat, 13 Oct 2007 07:44:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com [216.240.101.25]) by mx1.freebsd.org (Postfix) with ESMTP id 2029013C457; Sat, 13 Oct 2007 07:44:25 +0000 (UTC) (envelope-from jroberson@chesapeake.net) Received: from [192.168.1.104] (cpe-66-91-190-165.hawaii.res.rr.com [66.91.190.165]) (authenticated bits=0) by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id l9D7iBOh033255 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Sat, 13 Oct 2007 03:44:16 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Sat, 13 Oct 2007 00:46:58 -0700 (PDT) From: Jeff Roberson X-X-Sender: jroberson@10.0.0.1 To: James Gritton In-Reply-To: <470FD0DC.5080503@gritton.org> Message-ID: <20071013004539.R1002@10.0.0.1> References: 
<470E5BFB.4050903@elischer.org> <470FD0DC.5080503@gritton.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, Marko Zec , Julian Elischer Subject: Re: kernel level virtualisation requirements. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2007 07:44:25 -0000 On Fri, 12 Oct 2007, James Gritton wrote: > Julian Elischer wrote: > >> What I'd like to see is a bit of a 'a-la-carte' virtualisation >> ability. > ... >> My question to you, the reader, is: >> what aspects of virtualisation (the appearance of multiple instances >> of some resource) would you like to see in the system? > > Of course everything jail has now, and all the network bits that vimage > offers. > > CPU scheduling, in particular schedule the CPU first by jail, and then > by processes within jail. So the question I have is; why do all of these things instead of vmware/xen/other full virtualization? We can implement these technologies. Specifically, I could do the CPU scheduling. However, why not just fix Xen? There may be a very good answer to this, I just don't know it. Thanks, Jeff > > Filesystem quotas, without the need for each jail to have its own mount > point. > > A lot of things that fall under the IPC category: UNIX domain sockets (part > of > jail chroot I suppose), PTYs, tunnel devices, SYSV IPC, file locks. > > Swap space and resident memory limits. > > > The sysctl mechanism seems a good way to declare jails as having one > capability > or the other. This would alleviate the need to keep updating the jail > structure when someone has a new idea, especially handy since the single > structure makes it very hard to work on more than one new idea at a time. 
> > - Jamie > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Sat Oct 13 07:45:42 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EEC6B16A418 for ; Sat, 13 Oct 2007 07:45:42 +0000 (UTC) (envelope-from qpadla@gmail.com) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.168]) by mx1.freebsd.org (Postfix) with ESMTP id 7CCA313C447 for ; Sat, 13 Oct 2007 07:45:42 +0000 (UTC) (envelope-from qpadla@gmail.com) Received: by ug-out-1314.google.com with SMTP id y2so120396uge for ; Sat, 13 Oct 2007 00:45:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:from:reply-to:to:subject:date:user-agent:cc:references:in-reply-to:mime-version:content-type:content-transfer-encoding:message-id; bh=MqWtxFv862oXlEHLc1chMntl1nXBSlU0zzwFUSG5DV8=; b=E5tQEnL+AbCBajRWjBqslFqaOEg+Sd8cevV0JCtcjIg1GVWfAmkYSgMHvYH2+ilUn1K2/CG6Xj8HQHNniXCjFMTZsgrM3no+hx1ibVrsEk8O48UIw3Guu5JsMj/y0eXdAx87yzJGJYiH496ZY+tZ5M0lO1m4aObZPOyLlhs/JBc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:from:reply-to:to:subject:date:user-agent:cc:references:in-reply-to:mime-version:content-type:content-transfer-encoding:message-id; b=KEv3JsTOAolViXweVqYRMFBy/wf0SA+sASjKex+L+XQij/5KBJWPPYcZoQ7AvUIumTRkMIXSPjEddrGLVY57viZ17HAqYp7I4XhD3PiBjsILD0PpDpMTAntanh7iZ+tXpXGnAVAH1w8Iae0gmSnMU4Sf/mP7gPoadph7rLHr3Do= Received: by 10.67.20.11 with SMTP id x11mr5254337ugi.1192260069439; Sat, 13 Oct 2007 00:21:09 -0700 (PDT) Received: from orion ( [89.162.141.1]) by mx.google.com with ESMTPS id k1sm3824946ugf.2007.10.13.00.21.06 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 13 Oct 2007 00:21:07 -0700 
(PDT) From: Nikolay Pavlov To: freebsd-arch@freebsd.org Date: Sat, 13 Oct 2007 10:20:58 +0300 User-Agent: KMail/1.9.7 References: <470E5BFB.4050903@elischer.org> <470FD0DC.5080503@gritton.org> In-Reply-To: <470FD0DC.5080503@gritton.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1379809.SUtEzzziul"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200710131021.03861.qpadla@gmail.com> Cc: Marko Zec , arch@freebsd.org, James Gritton , Julian Elischer Subject: Re: kernel level virtualisation requirements. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: qpadla@gmail.com List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2007 07:45:43 -0000 --nextPart1379809.SUtEzzziul Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Friday 12 October 2007 22:54:04 James Gritton wrote: > Julian Elischer wrote: > > What I'd like to see is a bit of a 'a-la-carte' virtualisation > > ability. > > ... > > > My question to you, the reader, is: > > what aspects of virtualisation (the appearance of multiple instances > > of some resource) would you like to see in the system? > > Of course everything jail has now, and all the network bits that vimage > offers. > > CPU scheduling, in particular schedule the CPU first by jail, and then > by processes within jail. This is an absolute "MUST HAVE" feature, I think. > > Filesystem quotas, without the need for each jail to have its own mount > point. Strange, but IMHO it would be better to slightly reverse this statement: Filesystem quotas _with_ the need for each jail to have its own mount point, but without the need to maintain them in fstab (as it is in ZFS). 
Because you gain the ability to maintain jails at the filesystem level (snapshots, cloning, dump, restore and so on). > > A lot of things that fall under the IPC category: UNIX domain sockets > (part of > jail chroot I suppose), PTYs, tunnel devices, SYSV IPC, file locks. > > Swap space and resident memory limits. > > > The sysctl mechanism seems a good way to declare jails as having one > capability > or the other. This would alleviate the need to keep updating the jail > structure when someone has a new idea, especially handy since the single > structure makes it very hard to work on more than one new idea at a > time. -- ====================================================================== - Best regards, Nikolay Pavlov. <<<----------------------------------- ====================================================================== --nextPart1379809.SUtEzzziul Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQBHEHHf/2R6KvEYGaIRAh2SAJ44rcUr2J5eB3f3FvkqHA7XIOlFqQCbBeyt Rvi0dYRzYZbOo20RUXEPdvw= =dH9k -----END PGP SIGNATURE----- --nextPart1379809.SUtEzzziul--
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 13 07:53:47 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29B2616A46B for ; Sat, 13 Oct 2007 07:53:47 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outL.internet-mail-service.net (outL.internet-mail-service.net [216.240.47.235]) by mx1.freebsd.org (Postfix) with ESMTP id 119A213C480 for ; Sat, 13 Oct 2007 07:53:46 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Sat, 13 Oct 2007 00:53:45 -0700 X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 0C38B1267DE; Sat, 13 Oct 2007 00:53:44 -0700 (PDT) Message-ID: <47107996.5090607@elischer.org> Date: Sat, 13 Oct 2007 00:53:58 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Jeff Roberson References: <470E5BFB.4050903@elischer.org> <470FD0DC.5080503@gritton.org> <20071013004539.R1002@10.0.0.1> In-Reply-To: <20071013004539.R1002@10.0.0.1> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marko Zec , arch@freebsd.org, James Gritton Subject: Re: kernel level virtualisation requirements.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2007 07:53:47 -0000

Jeff Roberson wrote:
> On Fri, 12 Oct 2007, James Gritton wrote:
>
>> Julian Elischer wrote:
>>
>>> What I'd like to see is a bit of a 'a-la-carte' virtualisation
>>> ability.
>> ...
>>> My question to you, the reader, is:
>>> what aspects of virtualisation (the appearance of multiple instances
>>> of some resource) would you like to see in the system?
>>
>> Of course everything jail has now, and all the network bits that
>> vimage offers.
>>
>> CPU scheduling, in particular schedule the CPU first by jail, and then
>> by processes within jail.
>
> So the question I have is; why do all of these things instead of
> vmware/xen/other full virtualization? We can implement these
> technologies. Specifically, I could do the CPU scheduling. However,
> why not just fix Xen? There may be a very good answer to this, I just
> don't know it.

Generally, you can run several hundred (or more) virtual jail/vimage
style machines. Xen/VMware use so many more resources that you are
usually limited to some number like 20.

It is also possible in a virtual networking setup to have a single
process spanning several virtual environments (for example, one process
with a socket in each of the child universes).

It is a valid question, but there is, I think, a place for both types
of partitioning.

>
> Thanks,
> Jeff
>
>>
>> Filesystem quotas, without the need for each jail to have its own
>> mount point.
>>
>> A lot of things that fall under the IPC category: UNIX domain sockets
>> (part of jail chroot I suppose), PTYs, tunnel devices, SYSV IPC, file
>> locks.
>>
>> Swap space and resident memory limits.
>>
>>
>> The sysctl mechanism seems a good way to declare jails as having one
>> capability or the other. This would alleviate the need to keep
>> updating the jail structure when someone has a new idea, especially
>> handy since the single structure makes it very hard to work on more
>> than one new idea at a time.
>>
>> - Jamie
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>

From owner-freebsd-arch@FreeBSD.ORG Sat Oct 13 10:49:37 2007 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DA3016A420 for ; Sat, 13 Oct 2007 10:49:37 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [82.208.36.70]) by mx1.freebsd.org (Postfix) with ESMTP id DA4EB13C448 for ; Sat, 13 Oct 2007 10:49:36 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from localhost (localhost.codelab.cz [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id E4ABC19E02A; Sat, 13 Oct 2007 12:33:31 +0200 (CEST) Received: from [192.168.1.2] (r3a200.net.upc.cz [213.220.192.200]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTP id 334E119E027; Sat, 13 Oct 2007 12:33:29 +0200 (CEST) Message-ID: <47109F59.30602@quip.cz> Date: Sat, 13 Oct 2007 12:35:05 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cz, cs, en, en-us MIME-Version: 1.0 To: arch@freebsd.org References: <470E5BFB.4050903@elischer.org> In-Reply-To: <470E5BFB.4050903@elischer.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-jail@freebsd.org, Julian Elischer Subject: Re: kernel level virtualisation requirements.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2007 10:49:37 -0000

Julian Elischer wrote:

[...]

> I'd like to be able to say..
> I want to share the filesystem, and unix domain sockets but have a
> separate routing domain for my processes, or maybe just
> for some sockets. But someone else may want to have
> complete separation with everything up to and including
> separate userID spaces.
>
> My question to you, the reader, is:
> what aspects of virtualisation (the appearance of multiple instances
> of some resource) would you like to see in the system?
>
> Even a discussion as to how to frame this question is up for discussion.
>
> We don't even have a taxonomy to discuss the issue.
>
> Julian

It would be nice to have something from vserver, something from zones,
from xen, from jails etc.

From my point of view:

CPU limits - specified either as a relative share (a container can get
more CPU power if the CPU is not 100% loaded) or as an absolute limit (a
container can't get more than the specified CPU power, so one can use it
to test applications on slow CPUs etc.)

Memory limits - same as CPU

Disk - it would be nice if I could set how much disk space each
container can use (with an interface similar to disk quotas - soft+hard
limits and space+inodes). Maybe also limits on disk I/O, in the same
style as the CPU and memory limits above.

UIDs - independent UIDs in containers. In relation to UIDs, one could
use disk quotas inside containers.

Network bandwidth - same as CPU and memory

Each container can have multiple IPs, and can have its own routing and
firewalling (vimage is a nice project)

Hierarchical structure - a container can contain other containers.
Nested containers inherit/share resources from the parent container, or
can be limited to some part of them. For example, container1 could have
5 IPs, 40% of CPU, 200MB of memory, 50GB of disk space; container1A
could have 2 IPs, 50% of the CPU of its parent (container1), 50MB
memory, 10GB disk space; container1B could have no IP, 10% CPU of its
parent, 100MB memory, no disk space limits. Any resources not specified,
and all resources for container1C, are shared within the parent
container.
Nested containers could be used to set some limits (CPU, memory, disk,
bandwidth) on more than one container at a time: I could set some limits
on container2 without bothering to set any limits/partitioning on
container2A and container2B.

host OS --- container1 --- container1A
        |              |-- container1B
        |              \-- container1C
        |
        +-- container2 --- container2A
        |              \-- container2B
        |
        \-- container3

Others as said by James Gritton.

I know my view is too complex, but it is only a subject for discussion.

I am CCing freebsd-jail@freebsd.org, as it is related to Jails.
(discussion continues on arch@freebsd.org)

Miroslav Lachman
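
The nested-limit arithmetic in the container1 example above can be made concrete with a small sketch. This is only an illustration of one possible reading, not an existing jail or vimage interface: the Container class, its field names, and both methods are hypothetical. It assumes that a CPU figure is a percentage of the parent's effective share (as in "50% of CPU of its parent"), that memory figures are absolute caps, that a child can never exceed the tightest cap among its ancestors, and that a container with no limit of its own shares whatever its parent has.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    """One node in a hypothetical container hierarchy (illustrative names only)."""
    name: str
    cpu_pct: Optional[float] = None   # percentage of the PARENT's CPU; None = no own limit
    memory_mb: Optional[int] = None   # absolute cap in MB; None = no own limit
    parent: Optional["Container"] = None

    def effective_cpu_pct(self) -> float:
        """Upper bound on this container's CPU, as a percentage of the host."""
        parent_pct = 100.0 if self.parent is None else self.parent.effective_cpu_pct()
        if self.cpu_pct is None:
            return parent_pct  # shares whatever the parent is allowed to use
        return self.cpu_pct * parent_pct / 100

    def effective_memory_mb(self) -> Optional[int]:
        """Tightest memory cap on the path to the root (assumed semantics)."""
        caps = []
        node = self
        while node is not None:
            if node.memory_mb is not None:
                caps.append(node.memory_mb)
            node = node.parent
        return min(caps) if caps else None


# The figures from the container1 example in the message above.
host = Container("host OS")
c1 = Container("container1", cpu_pct=40, memory_mb=200, parent=host)
c1a = Container("container1A", cpu_pct=50, memory_mb=50, parent=c1)
c1b = Container("container1B", cpu_pct=10, memory_mb=100, parent=c1)
c1c = Container("container1C", parent=c1)  # nothing specified: shares with parent

for c in (c1, c1a, c1b, c1c):
    print(c.name, c.effective_cpu_pct(), c.effective_memory_mb())
```

Under these assumptions container1A is bounded at 20% of the host CPU (50% of container1's 40%) and container1B at 4%, while container1C has no limit of its own and is bounded only by container1's 40% and 200MB, which it shares with its siblings. A real implementation would also have to decide whether sibling limits may oversubscribe the parent; the sketch does not address that.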