From: "Jeffrey Carver"
Date: Mon, 20 Jun 2011 14:56:06 -0500
Subject: REMINDER: Participation Requested: Survey about Open-Source Software Development
List-Id: Discussion related to FreeBSD architecture

Hi,

Apologies for any inconvenience, and thank you to those who have already completed the survey. We will keep the survey open for another couple of weeks, but we do hope you will consider responding to the email request below (sent 2 weeks ago).

Thanks,
Dr. Jeffrey Carver
Assistant Professor
University of Alabama
(v) 205-348-9829
(f) 205-348-0219
http://www.cs.ua.edu/~carver

-----Original Message-----
From: Jeffrey Carver [mailto:opensourcesurvey@cs.ua.edu]
Sent: Monday, June 13, 2011 11:47 AM
To: 'perl6-compiler@perl.org'
Subject: Participation Requested: Survey about Open-Source Software Development

Hi,

Drs. Jeffrey Carver, Rosanna Guadagno, and Debra McCallum and Mr. Amiangshu Bosu (University of Alabama), and Dr. Lorin Hochstein (University of Southern California) are conducting a survey of open-source software developers. This survey seeks to understand how developers on distributed, virtual teams, such as open-source projects, interact with each other to accomplish their tasks. You must be at least 19 years of age to complete the survey. The survey should take approximately 15 minutes to complete.

If you are actively participating as a developer, please consider completing our survey.

Here is the link to the survey: http://goo.gl/HQnux

We apologize for any inconvenience, and if you receive multiple copies of this email. This survey has been approved by The University of Alabama IRB.

Thanks,
Dr. Jeffrey Carver

From: Andriy Gapon
Date: Wed, 22 Jun 2011 19:09:13 +0300
To: freebsd-arch@FreeBSD.org
Subject: stop scheduler in panic context

I would like to present the following diff for review and discussion:
http://people.freebsd.org/~avg/stop_scheduler_on_panic.diff

The idea is to stop the scheduler in a panic context and to provide a special environment for the only running thread, the one that called panic(9).

I tried to make this diff as minimal as possible; it doesn't include changes that I consider to be useful improvements and [even] bug fixes, but which generated controversy in non-public discussions.
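The idea of a "special environment" for the sole surviving thread can be illustrated with a small userland model. This is a sketch under stated assumptions, not the code in the diff: `scheduler_stopped`, `model_panic()` and `model_mtx_lock()` are hypothetical names, and a "thread" is just an integer id. Once the scheduler is stopped, a lock held by any other (now frozen) thread can never be released, so instead of blocking forever the model skips lock operations for the panicking thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state; not FreeBSD's actual data structures. */
static bool scheduler_stopped;  /* set once, by the panicking thread */
static int  panic_thread = -1;  /* id of the only thread still running */

struct model_mtx {
	int owner;              /* -1 when unowned */
};

enum lock_result { LOCK_ACQUIRED, LOCK_WOULD_BLOCK, LOCK_SKIPPED };

/*
 * Take the mutex on behalf of thread 'tid'.  A real kernel would put the
 * thread to sleep on contention; this model only reports LOCK_WOULD_BLOCK.
 */
static enum lock_result
model_mtx_lock(struct model_mtx *m, int tid)
{
	if (scheduler_stopped) {
		/* Nobody else will ever run, so a contended lock stays contended. */
		assert(tid == panic_thread);
		return (LOCK_SKIPPED);
	}
	if (m->owner == -1) {
		m->owner = tid;
		return (LOCK_ACQUIRED);
	}
	return (LOCK_WOULD_BLOCK);
}

static void
model_panic(int tid, const char *msg)
{
	/* In a kernel, other CPUs would be hard-stopped and interrupts disabled here. */
	scheduler_stopped = true;
	panic_thread = tid;
	printf("panic: %s\n", msg);
}
```

For example, if thread 1 owns `m`, `model_mtx_lock(&m, 0)` returns `LOCK_WOULD_BLOCK` before `model_panic(0, ...)` and `LOCK_SKIPPED` afterwards, while `m.owner` still records thread 1. Skipping (rather than stealing ownership) keeps the model honest: the data that the lock protects may be inconsistent, and the panicking thread proceeds at its own risk.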
If there is no negative feedback within the next few days, I plan to post the patch to current@ to solicit some testing. I will definitely wait for positive feedback before committing this change. I hope that I will be able to sneak it into the 9 release (unless there are objections to this).

Thank you!
--
Andriy Gapon

From: Andriy Gapon
Date: Wed, 22 Jun 2011 19:26:11 +0300
To: arch@FreeBSD.org
Subject: stop_cpus*() interface

I would like to propose narrowing the stop_cpus*() interface:

1. Remove the cpu mask/set parameter. The rationale for this is presented below in a forwarded message from a private discussion.
   You may also see that stop_cpus*() functions are currently always called with either (1) the other_cpus mask or (2) the other_cpus & ~stopped_cpus mask, where (2) is really equivalent to (1) because of (1).

2. Change the return type to void. Currently the return value of stop_cpus*() is never checked, and it cannot really be handled meaningfully: a simple boolean or errno return value cannot convey which target CPUs were already stopped, which failed to stop, and why. I think that it's better to assume that stop_cpus*() should never fail and to add the necessary diagnostics to catch cases where it does fail.

The forwarded message below provides my thoughts on CPU stopping semantics and additionally presents my analysis of the CPU stopping code in OpenSolaris.

-------- Original Message --------
on 12/05/2011 21:17 Andriy Gapon said the following:
> cpu_hard_stop does stop other CPUs in a hard way. At least on some archs it is
> really so, e.g. x86 NMI. This means that stopped CPUs, or rather the threads that
> were running on them, can be stopped in any kind of context with any kind of lock
> held, including spinlocks. Given that fact, it is really unsafe to continue
> using any locks after even one CPU is hard-stopped. So any remaining running
> CPUs should be put into a special non-locking mode. This is the reason that we
> invent things like THREAD_PANICED() and use polling mode in kdb context, etc.
> But having more than one CPU, in fact even more than one thread, running in
> non-locking mode is unsafe again - if those CPUs continue execution without any
> synchronization, then they would corrupt shared data.
> Thus, I argue that hard stopping should leave only one CPU and thread running.

Some more thoughts.

I think that the above reasoning applies even to the current soft stopping, to a certain degree. Soft stopping would not leave any spinlocks held, true, but it can still leave other kinds of locks held, e.g. regular mutexes and sx locks.
And that also produces a very special environment in the end. So in my opinion the current soft stopping should also always stop all other CPUs.

I think that eventually we will need a "really soft", graceful stopping mechanism. That mechanism would rebind all interrupts away from a CPU being stopped, migrate all (non-special) threads away from the CPU, instruct the scheduler not to run any threads on the CPU, remove it from any active CPU sets, etc. Now, this mechanism should really be of a targeted variety, no doubt.

I would also like to share some of my observations of the OpenSolaris code. This is not to try to give any support to my proposals - after all we are not Solaris, but FreeBSD - but simply to share some ideas.

In OpenSolaris I've noticed three separate CPU stopping mechanisms so far. I am sure that they have more :-)

1. Stopping by debugger. This is very similar to our hard stopping (in their x86 code[*]). All other CPUs are always stopped. One difference is that the stopped CPUs run a special command loop while spinning. The master CPU can send a few commands to the slave CPUs. Examples: the master can tell a slave, if it's a BSP, to reset the system; the master can tell a slave to become the new master (I think that this is somewhat equivalent to the "thread N" command in gdb). All commands:

#define KMDB_DPI_CMD_RESUME_ALL     1 /* Resume all CPUs */
#define KMDB_DPI_CMD_RESUME_MASTER  2 /* Resume only master CPU */
#define KMDB_DPI_CMD_RESUME_UNLOAD  3 /* Resume for debugger unload */
#define KMDB_DPI_CMD_SWITCH_CPU     4 /* Switch to another CPU */
#define KMDB_DPI_CMD_FLUSH_CACHES   5 /* Flush slave caches */
#define KMDB_DPI_CMD_REBOOT         6 /* Reboot the machine */

2. Stopping for panic. This is very similar to our hard stopping (in their x86 code[*]). All other CPUs are always stopped. But this is done via different code than what the debugger uses; I am not sure why, maybe some historic legacy.
The difference from our code and the debugger code is that the stopped CPUs run a different stop loop and may do some useful panic work. E.g. my understanding is that they can be used for compressing a dump image (yes, they compress their dumps, for disk writing speed I guess).

3. Something remotely similar to our current soft stopping. The big difference is that they have special "pause" threads per CPU. This mechanism activates those threads; the threads make themselves non-preemptible, disable interrupts and block on some sort of semaphore until they are told to resume. I am not sure what advantage, if any, this mechanism gives them compared to our approach. The mechanism is invoked via the pause_cpus() call. It is used mainly to change the state of CPUs (some per-CPU data), e.g. configuring idle hooks or power management.

[!] BTW, they also use this mechanism when onlining/offlining CPUs to avoid locking in normal paths. That is, for instance, they stop/pause all CPUs, mark a target CPU as offline, and then restart all CPUs. This way they don't need any locking when checking (and changing) CPU status. Of course, they also do all the reasonable things - unbinding interrupts, moving threads away, etc. The mechanism is also used for their checkpoint-resume code (which is used by suspend/resume) and in their shutdown/reboot path. This CPU stopping mechanism also always stops all other CPUs.

[*] Another difference to note is that they don't use NMI for their equivalents of our hard stopping. They still have the notion of interrupt levels and various spl* stuff. So they just use a normal interrupt with the highest priority to penetrate protected contexts. E.g. in their equivalent of spinlock_enter() they do not outright disable interrupts, but set the current level to a special 'LOCK' level which inhibits all typical (hardware and IPI) interrupts.
This mechanism adds another degree of freedom to their implementation; as such it complicates the code and logic, but it also adds some flexibility.

I hope that there is something useful for you and FreeBSD in this lengthy overview.

--
Andriy Gapon

From: mdf@FreeBSD.org
Date: Wed, 22 Jun 2011 09:58:34 -0700
To: Andriy Gapon
Cc: freebsd-arch@freebsd.org
Subject: Re: stop scheduler in panic context

On Wed, Jun 22, 2011 at 9:09 AM, Andriy Gapon wrote:
>
> I would like to present the following diff for review and discussion:
> http://people.freebsd.org/~avg/stop_scheduler_on_panic.diff

The idea seems sound to me, but I don't see any bits in sched_4bsd.c or sched_ule.c to prevent other threads from running. Or does that already happen when panicstr != NULL?

Thanks,
matthew
_______________________________________________
freebsd-arch@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

From: Andriy Gapon
Date: Wed, 22 Jun 2011 20:03:49 +0300
To: mdf@FreeBSD.org
Cc: freebsd-arch@FreeBSD.org
Subject: Re: stop scheduler in panic context

on 22/06/2011 19:58 mdf@FreeBSD.org said the following:
> On Wed, Jun 22, 2011 at 9:09 AM, Andriy Gapon wrote:
>>
>> I would like to present the following diff for review and discussion:
>> http://people.freebsd.org/~avg/stop_scheduler_on_panic.diff
>
> The idea seems sound to me, but I don't see any bits in sched_4bsd.c
> or sched_ule.c to prevent other threads from running. Or does that
> already happen when panicstr != NULL?

I think that should happen automatically as a result of stop_cpus_hard plus disabling interrupts on the panicking CPU.

--
Andriy Gapon
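Andriy's answer rests on stop_cpus_hard parking every other CPU. The userland model below sketches a stop/restart handshake with the narrowed interface proposed in the stop_cpus*() message above: no CPU-mask argument, a void return, and a diagnostic instead of an error code. It is only an illustration under stated assumptions: POSIX threads stand in for CPUs, polling an atomic word stands in for the stop IPI, and every name (`model_stop_other_cpus()`, `started_cpus`, and so on) is hypothetical rather than FreeBSD's actual code; `stopped_cpus` merely mirrors the mask named in the proposal.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_NCPU     4u
#define MODEL_ALL_CPUS ((1u << MODEL_NCPU) - 1u)

/* Hypothetical model state; bit i describes "CPU" i. */
static atomic_uint stop_pending;  /* stop "IPI" posted, not yet taken */
static atomic_uint stopped_cpus;  /* CPU is parked in its stop loop */
static atomic_uint started_cpus;  /* CPU has been told to resume */
static atomic_bool model_exit;
static pthread_t cpu_threads[MODEL_NCPU];

/* What a target CPU does when it takes the stop IPI: park until restarted. */
static void
model_cpustop_handler(unsigned cpu)
{
	unsigned bit = 1u << cpu;

	atomic_fetch_or(&stopped_cpus, bit);
	while ((atomic_load(&started_cpus) & bit) == 0)
		sched_yield();
	atomic_fetch_and(&started_cpus, ~bit);
	atomic_fetch_and(&stopped_cpus, ~bit);
}

static void *
model_cpu_loop(void *arg)
{
	unsigned cpu = (unsigned)(uintptr_t)arg;
	unsigned bit = 1u << cpu;

	while (!atomic_load(&model_exit)) {
		/* fetch_and both tests and clears this CPU's pending request. */
		if (atomic_fetch_and(&stop_pending, ~bit) & bit)
			model_cpustop_handler(cpu);
		sched_yield();
	}
	return (NULL);
}

/* Narrowed interface: no mask, no return value; always stops all others. */
static void
model_stop_other_cpus(unsigned self)
{
	unsigned others = MODEL_ALL_CPUS & ~(1u << self);
	unsigned long spins = 0;

	assert((atomic_load(&stopped_cpus) & (1u << self)) == 0);
	atomic_fetch_or(&stop_pending, others);
	while ((atomic_load(&stopped_cpus) & others) != others) {
		/* Diagnostic instead of an error return: report stragglers once. */
		if (++spins == 1000000UL)
			fprintf(stderr, "stop: CPUs 0x%x still running\n",
			    others & ~atomic_load(&stopped_cpus));
		sched_yield();
	}
}

static void
model_restart_other_cpus(unsigned self)
{
	unsigned others = MODEL_ALL_CPUS & ~(1u << self);

	atomic_fetch_or(&started_cpus, others);
	while ((atomic_load(&stopped_cpus) & others) != 0)
		sched_yield();
}

static void
model_start_cpus(unsigned self)
{
	atomic_store(&model_exit, false);
	for (unsigned cpu = 0; cpu < MODEL_NCPU; cpu++) {
		if (cpu == self)
			continue;
		int error = pthread_create(&cpu_threads[cpu], NULL,
		    model_cpu_loop, (void *)(uintptr_t)cpu);
		assert(error == 0);
		(void)error;
	}
}

/* Only valid after a restart: parked CPUs never look at model_exit. */
static void
model_join_cpus(unsigned self)
{
	atomic_store(&model_exit, true);
	for (unsigned cpu = 0; cpu < MODEL_NCPU; cpu++)
		if (cpu != self)
			pthread_join(cpu_threads[cpu], NULL);
}
```

With "CPU" 0 as the caller, `model_stop_other_cpus(0)` returns only once `stopped_cpus` is 0xe (CPUs 1 to 3 parked), and `model_restart_other_cpus(0)` returns once it is back to 0. Because the caller has nothing meaningful to do with a partial result, the waiting loop reports stragglers itself rather than handing a status back.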
From: John Baldwin
Date: Thu, 23 Jun 2011 08:51:57 -0400
To: Andriy Gapon
Cc: arch@freebsd.org
Subject: Re: stop_cpus*() interface

On Wednesday, June 22, 2011 12:26:11 pm Andriy Gapon wrote:
>
> I would like to propose to narrow stop_cpus*() interface:
> [...]
> I hope that there is something useful for you and FreeBSD in this lengthy overview.

I really like the OpenSolaris model. It sounds like you could perhaps merge 1) and 2). The pause thread idea for handling online/offline is quite nice.

On x86 you could have IPI_STOP be non-NMI if we adjusted the TPR (%cr8 on amd64) instead of using cli/sti for spinlock_enter/exit. However, older i386 CPUs do not support this, so I think this is only practical on amd64 if we were to go that route.
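The TPR idea (and the OpenSolaris 'LOCK' level described earlier in the thread) can be sketched as a model in which spinlock_enter() raises a priority class instead of executing cli. The acceptance rule follows the x86 local APIC's priority-class comparison for fixed interrupts, simplified by ignoring interrupts already in service; on amd64, bits 3:0 of %cr8 mirror TPR bits 7:4. The vector numbers, `MODEL_LOCK_LEVEL` and the `model_*` functions are hypothetical, and real code would write %cr8 with privileged instructions, which this userland model cannot do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector layout, chosen only for this example. */
#define MODEL_DEVICE_VECTOR 0x40  /* ordinary device interrupt, class 4 */
#define MODEL_IPI_VECTOR    0xe0  /* ordinary IPI, class 14 */
#define MODEL_STOP_VECTOR   0xf0  /* stop IPI, class 15 */
#define MODEL_LOCK_LEVEL    0xe   /* blocks every class up to 14 */

static uint8_t  model_cr8;        /* current priority class; 0 allows all */
static unsigned spin_count;       /* spinlock nesting depth */
static uint8_t  spin_saved_cr8;   /* level to restore on the outermost exit */

/*
 * Simplified local APIC rule: a fixed interrupt is accepted only if its
 * priority class (vector >> 4) is above the current class.
 */
static bool
vector_deliverable(uint8_t vector, uint8_t cr8)
{
	return ((vector >> 4) > (cr8 & 0xf));
}

static void
model_spinlock_enter(void)
{
	if (spin_count++ == 0) {
		spin_saved_cr8 = model_cr8;
		model_cr8 = MODEL_LOCK_LEVEL;  /* instead of cli */
	}
}

static void
model_spinlock_exit(void)
{
	assert(spin_count > 0);
	if (--spin_count == 0)
		model_cr8 = spin_saved_cr8;    /* instead of sti */
}
```

Inside a (possibly nested) spinlock section, `vector_deliverable()` is false for the device vector and the ordinary IPI but true for `MODEL_STOP_VECTOR`, so a stop request could still reach a CPU spinning with a lock held, without needing an NMI. The saved level is restored only when the outermost section exits.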
OTOH, I think using an NMI is actually fine (though we need to do a better job of providing a way to register NMI handlers instead of the various hacks we currently have).

--
John Baldwin

From: John Baldwin
Date: Thu, 23 Jun 2011 08:54:59 -0400
To: freebsd-arch@freebsd.org
Cc: Andriy Gapon
Subject: Re: stop scheduler in panic context

On Wednesday, June 22, 2011 12:09:13 pm Andriy Gapon wrote:
>
> I would like to present the following diff for review and discussion:
> http://people.freebsd.org/~avg/stop_scheduler_on_panic.diff

If it makes your life easier, go ahead and kill the RESTARTABLE_PANICS option (perhaps do that as a separate commit first?).

--
John Baldwin

From: Andriy Gapon
Date: Thu, 23 Jun 2011 16:06:11 +0300
To: John Baldwin
Cc: freebsd-arch@FreeBSD.org
Subject: Re: stop scheduler in panic context

on 23/06/2011 15:54 John Baldwin said the following:
> On Wednesday, June 22, 2011 12:09:13 pm Andriy Gapon wrote:
>>
>> I would like to present the following diff for review and discussion:
>> http://people.freebsd.org/~avg/stop_scheduler_on_panic.diff
>
> If it makes your life easier, go ahead and kill the RESTARTABLE_PANICS
> option (perhaps do that as a separate commit first?).

I don't see much of an issue with this option, except for a few extra lines of code. I would be happier about a similar offer on sync_on_panic :-) I think that sync wouldn't really work with the scheduler stopped.

--
Andriy Gapon

From: Matthew Jacob
Organization: Feral Software
Date: Thu, 23 Jun 2011 06:37:22 -0700
To: freebsd-arch@freebsd.org
Subject: Re: stop scheduler in panic context

On 6/23/2011 5:54 AM, John Baldwin wrote:
> On Wednesday, June 22, 2011 12:09:13 pm Andriy Gapon wrote:
>> I would like to present the following diff for review and discussion:
>> http://people.freebsd.org/~avg/stop_scheduler_on_panic.diff
> If it makes your life easier, go ahead and kill the RESTARTABLE_PANICS
> option (perhaps do that as a separate commit first?).
>
Please do.

From owner-freebsd-arch@FreeBSD.ORG Fri Jun 24 02:47:41 2011
Message-ID: <20110624021537.GD10304@pjdesk.au.alcatel-lucent.com>
Date: Fri, 24 Jun 2011 12:15:37 +1000
From: Peter Jeremy
To: freebsd-arch@freebsd.org
In-Reply-To: <20110623053051.GL65891@pjdesk.au.alcatel-lucent.com>
Subject: Context Switching Oddities

I originally asked a variant of this on -sparc64, since the oddities
initially seemed to be related to sparc64, but it was suggested that this
might be a more appropriate list. I've also added more results.

I have a tool that measures the rate at which a single-byte token can be
passed between two processes via a socketpair - basically counting

	while (1) {
		read(fd[0], buf, 1);
		write(fd[1], "a", 1);
	}

I was initially running multiple copies of it on an otherwise idle V890
(16-CPU, 64GB RAM, running -current from about a week ago), capturing
'vmstat -s' output. In the process, I found several oddities. That prompted
me to also test it on a V440 (4-CPU, 8GB RAM, same world as the V890), an
SB1500 (1 CPU, 4GB RAM, -current from a few days ago) and an Athlon-64
(dual-core, 8GB RAM, -stable amd64 from a few months ago). The SPARC systems
all use the 4BSD scheduler and the amd64 uses ULE. I don't have access to
any large x86 boxes to cross-check.

1) The number of context switches doesn't match my expectations. See
   http://i.imgur.com/28OHu.jpg (1 CPU)
   http://i.imgur.com/6YRh8.jpg (2 core)
   http://i.imgur.com/r0v7M.jpg (4 CPU)
   http://i.imgur.com/hkCA2.jpg (16 CPU)
   http://i.imgur.com/9Tt9Q.gif (combined)

   Since passing a token from one process to the other should require one
   context switch, I would expect the number of context switches to roughly
   match the green (based on token-passing rate) or blue (based on syscall
   rate) lines. Instead, it is generally far too low, although the 4- and
   16-CPU graphs start out unexpectedly high.

2) The transfer rate doesn't gradually tail off. See
   http://i.imgur.com/YoDQ5.jpg (1 CPU)
   http://i.imgur.com/0Wl1Y.jpg (2 core)
   http://i.imgur.com/699zr.jpg (4 CPU)
   http://i.imgur.com/0ujRN.jpg (16 CPU)
   http://i.imgur.com/omxG1.gif (combined & scaled)

   I would expect a fairly flat peak from 1 to about n pairs on an n-CPU
   system (since that many execution threads are available), which would
   then tail off as scheduler overheads increase. Instead, the 4- and 16-CPU
   tests show an initial dip, then rise to a peak before tailing off. Each
   graph shows both the rate reported by the program and the rate estimated
   from the syscall rate.
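A minimal sketch of the token-passing loop described above, for readers who want to reproduce the measurement. It assumes a single AF_UNIX stream socketpair with one end held by each process; the original tool's exact descriptor layout and timing method are not given, and `pass_tokens` and `echo_tokens` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Child side (hypothetical helper): echo each byte back until the parent closes its end. */
static void echo_tokens(int fd)
{
    char buf[1];

    while (read(fd, buf, 1) == 1) {
        if (write(fd, buf, 1) != 1)
            break;
    }
}

/*
 * Parent side (hypothetical helper): send a one-byte token and wait for it
 * to come back, `rounds` times. Returns the number of completed round trips,
 * or -1 if socketpair() or fork() fails. If `elapsed` is non-NULL it receives
 * the wall-clock time spent in the loop, in seconds.
 */
long pass_tokens(long rounds, double *elapsed)
{
    int sv[2];
    char buf[1];
    long done = 0;
    struct timespec t0, t1;
    pid_t pid;

    assert(rounds > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {            /* child keeps sv[1] */
        close(sv[0]);
        echo_tokens(sv[1]);
        _exit(0);
    }
    close(sv[1]);              /* parent keeps sv[0] */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < rounds) {
        /* One round trip: write + read here, read + write in the child. */
        if (write(sv[0], "a", 1) != 1 || read(sv[0], buf, 1) != 1)
            break;
        done++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(sv[0]);              /* child's read() returns 0 and it exits */
    waitpid(pid, NULL, 0);

    if (elapsed != NULL)
        *elapsed = (double)(t1.tv_sec - t0.tv_sec) +
                   (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return done;
}
```

Each round trip is two token passes and four system calls (one write and one read in each process), so on an otherwise idle machine the syscall delta from `vmstat -s` divided by two roughly approximates the number of tokens passed.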

Can anyone offer an explanation for this behaviour?

--
Peter Jeremy