From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 11:06:58 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C03CD106567E
	for <freebsd-arch@hub.freebsd.org>;
	Mon, 10 Mar 2008 11:06:58 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id AB8F78FC20
	for <freebsd-arch@hub.freebsd.org>;
	Mon, 10 Mar 2008 11:06:58 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2AB6wVC086485
	for <freebsd-arch@FreeBSD.org>; Mon, 10 Mar 2008 11:06:58 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m2AB6v5E086481
	for freebsd-arch@FreeBSD.org; Mon, 10 Mar 2008 11:06:57 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 10 Mar 2008 11:06:57 GMT
Message-Id: <200803101106.m2AB6v5E086481@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 11:06:58 -0000

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.


From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 11:36:31 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63AED1065677
	for <arch@FreeBSD.org>; Mon, 10 Mar 2008 11:36:31 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 53E718FC17
	for <arch@FreeBSD.org>; Mon, 10 Mar 2008 11:36:31 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id B6ECE46B1E;
	Mon, 10 Mar 2008 06:36:30 -0500 (EST)
Date: Mon, 10 Mar 2008 12:36:30 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: arch@FreeBSD.org
Message-ID: <20080310122338.T29929@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: net@FreeBSD.org
Subject: netatm removal warning
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 11:36:31 -0000


Dear all,

This is another of those boring e-mails about kernel subsystems that still 
require Giant.  Sorry about that!

As previously published, netatm is a non-MPSAFE protocol stack largely 
superseded by our two other ATM stacks, netnatm and netgraph/atm (both 
MPSAFE).  netatm is currently non-functional and uncompilable because it 
depends on the Giant compatibility shims for the protocol stack, which were 
removed in FreeBSD 7.0.  We left the code in place to make it easier 
for any interested third parties to distribute patches against it (in 
particular, patches to make it MPSAFE).

The current plan is that we will remove the netatm code from HEAD and RELENG_7 
before FreeBSD 7.1.  A specific schedule for 7.1 hasn't been published yet, but 
in order to give plenty of warning, here's the proposed netatm removal 
schedule:

10 March 2008			E-mail warning to arch@/net@
10 April 2008			E-mail warning to arch@/net@
10 May 2008			Removal of netatm from HEAD
20 May 2008			Removal of netatm from RELENG_7

Obviously, netatm will remain in the revision control history should anyone 
wish to resurrect it after that date.  However, I suspect that those 
interested in ATM on FreeBSD have long since been using Harti's netgraph ATM 
framework.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 13:34:35 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4ED37106566B
	for <freebsd-arch@FreeBSD.org>; Mon, 10 Mar 2008 13:34:35 +0000 (UTC)
	(envelope-from skalla.raabjorn@gmx.de)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.freebsd.org (Postfix) with SMTP id F33DF8FC12
	for <freebsd-arch@FreeBSD.org>; Mon, 10 Mar 2008 13:34:34 +0000 (UTC)
	(envelope-from skalla.raabjorn@gmx.de)
Received: (qmail invoked by alias); 10 Mar 2008 13:07:53 -0000
Received: from g227178023.adsl.alicedsl.de (EHLO sol.hackerzberg.local)
	[92.227.178.23]
	by mail.gmx.net (mp055) with SMTP; 10 Mar 2008 14:07:53 +0100
X-Authenticated: #8038066
X-Provags-ID: V01U2FsdGVkX1965s4uXBV6PqIdRlIIJisGpBpO+rWaC1ALF1deH1
	pjor9ICct/9qyr
Date: Mon, 10 Mar 2008 14:07:53 +0100
From: Skalla Raabjorn <skalla.raabjorn@gmx.de>
To: freebsd-arch@FreeBSD.org
Message-ID: <20080310140753.24630bda@sol.hackerzberg.local>
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.12.8; i386-portbld-freebsd7.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
Cc: 
Subject: If GIANT is locked can the MPSAFE parts run in parallel?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 13:34:35 -0000

Hi all,

if GIANT is locked can the MPSAFE parts run in parallel?
Like networking for example, as they have their own locks.

regards
Skalla

From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 13:45:37 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1F6BC1065677
	for <freebsd-arch@FreeBSD.org>; Mon, 10 Mar 2008 13:45:37 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 0D4D18FC16
	for <freebsd-arch@FreeBSD.org>; Mon, 10 Mar 2008 13:45:37 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 983FA46B0D;
	Mon, 10 Mar 2008 08:45:36 -0500 (EST)
Date: Mon, 10 Mar 2008 14:45:36 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Skalla Raabjorn <skalla.raabjorn@gmx.de>
In-Reply-To: <20080310140753.24630bda@sol.hackerzberg.local>
Message-ID: <20080310143919.V50827@fledge.watson.org>
References: <20080310140753.24630bda@sol.hackerzberg.local>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@FreeBSD.org
Subject: Re: If GIANT is locked can the MPSAFE parts run in parallel?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 13:45:37 -0000


On Mon, 10 Mar 2008, Skalla Raabjorn wrote:

> if GIANT is locked can the MPSAFE parts run in parallel? Like networking for 
> example, as they have their own locks.

Dear Skalla,

Yes.  Giant is [almost] a mutex like any other mutex, so as long as the MPSAFE 
subsystem isn't being invoked by something holding Giant, it generally won't 
run with it.  Even if the network stack is sometimes executed with Giant held 
(for example, when receiving a packet from SLIP), that doesn't prevent the 
network stack from executing in parallel on other CPUs, it just serializes 
with respect to other Giant holders executing.
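
A minimal userland sketch of the point above, assuming POSIX threads as a
stand-in for kernel mutexes: giant models Giant and net_mtx models a
hypothetical MPSAFE subsystem lock (both names are illustrative, not kernel
symbols).  While one thread holds Giant, a second thread cannot take Giant,
yet it can still acquire the subsystem lock and make progress.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userland model only: "giant" stands in for Giant and "net_mtx" for a
 * hypothetical MPSAFE subsystem lock.  Neither is the kernel's mtx(9).
 */
static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t net_mtx = PTHREAD_MUTEX_INITIALIZER;

struct probe {
	int giant_busy;	/* 1 if the second thread could not take Giant */
	int net_ok;	/* 1 if the subsystem lock was still available */
};

/* Runs on a second thread ("another CPU") while the caller holds Giant. */
static void *
mpsafe_worker(void *arg)
{
	struct probe *p = arg;
	int rc;

	/* A second Giant consumer is serialized behind the holder... */
	rc = pthread_mutex_trylock(&giant);
	if (rc == 0)
		pthread_mutex_unlock(&giant);
	p->giant_busy = (rc == EBUSY);

	/* ...but code that needs only its own lock proceeds in parallel. */
	if (pthread_mutex_trylock(&net_mtx) == 0) {
		p->net_ok = 1;
		pthread_mutex_unlock(&net_mtx);
	}
	return (NULL);
}

/* Hold Giant on this thread and probe both locks from a second thread. */
struct probe
giant_demo(void)
{
	struct probe p = { 0, 0 };
	pthread_t t;
	int error;

	pthread_mutex_lock(&giant);
	error = pthread_create(&t, NULL, mpsafe_worker, &p);
	assert(error == 0);
	pthread_join(t, NULL);
	pthread_mutex_unlock(&giant);
	return (p);
}
```

Compiled with -pthread, giant_demo() returns giant_busy = 1 and net_ok = 1:
only other Giant consumers wait, not code that takes only its own lock.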

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 14:18:52 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 71E601065674
	for <freebsd-arch@freebsd.org>; Mon, 10 Mar 2008 14:18:52 +0000 (UTC)
	(envelope-from skalla.raabjorn@gmx.de)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.freebsd.org (Postfix) with SMTP id CF1AC8FC28
	for <freebsd-arch@freebsd.org>; Mon, 10 Mar 2008 14:18:51 +0000 (UTC)
	(envelope-from skalla.raabjorn@gmx.de)
Received: (qmail invoked by alias); 10 Mar 2008 14:18:50 -0000
Received: from g227178023.adsl.alicedsl.de (EHLO sol.hackerzberg.local)
	[92.227.178.23]
	by mail.gmx.net (mp010) with SMTP; 10 Mar 2008 15:18:50 +0100
X-Authenticated: #8038066
X-Provags-ID: V01U2FsdGVkX19ZjaZYMRNZ6doAILG23VidGSARLxIGYpEpSTThAZ
	uoh0wDPrGHyUfe
Date: Mon, 10 Mar 2008 15:18:50 +0100
From: Skalla Raabjorn <skalla.raabjorn@gmx.de>
To: freebsd-arch@freebsd.org
Message-ID: <20080310151850.6d8451ff@sol.hackerzberg.local>
In-Reply-To: <20080310143919.V50827@fledge.watson.org>
References: <20080310140753.24630bda@sol.hackerzberg.local>
	<20080310143919.V50827@fledge.watson.org>
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.12.8; i386-portbld-freebsd7.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
Subject: Re: If GIANT is locked can the MPSAFE parts run in parallel?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 14:18:52 -0000

On Mon, 10 Mar 2008 14:45:36 +0100 (BST)
Robert Watson <rwatson@FreeBSD.org> wrote:

> 
> On Mon, 10 Mar 2008, Skalla Raabjorn wrote:
> 
> > if GIANT is locked can the MPSAFE parts run in parallel? Like networking for 
> > example, as they have their own locks.
> 
> Dear Skalla,
> 
> Yes.  Giant is [almost] a mutex like any other mutex, so as long as the MPSAFE 
> subsystem isn't being invoked by something holding Giant, it generally won't 
> run with it.  Even if the network stack is sometimes executed with Giant held 
> (for example, when receiving a packet from SLIP), that doesn't prevent the 
> network stack from executing in parallel on other CPUs, it just serializes 
> with respect to other Giant holders executing.

Thanks, that's all I wanted to know :)

From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 17:50:47 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4A4A21065671
	for <freebsd-arch@freebsd.org>; Mon, 10 Mar 2008 17:50:47 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id E1F9E8FC31
	for <freebsd-arch@freebsd.org>; Mon, 10 Mar 2008 17:50:46 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 234968788-1834499 
	for multiple; Mon, 10 Mar 2008 13:51:55 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2AHoCCn087969;
	Mon, 10 Mar 2008 13:50:12 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: Jeff Roberson <jroberson@chesapeake.net>
Date: Mon, 10 Mar 2008 13:13:03 -0400
User-Agent: KMail/1.9.7
References: <20080307020626.G920@desktop> <20080307124038.I920@desktop>
	<20080307234452.U1091@desktop>
In-Reply-To: <20080307234452.U1091@desktop>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803101313.03526.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Mon, 10 Mar 2008 13:50:13 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6192/Mon Mar 10 10:54:00 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: freebsd-arch@freebsd.org
Subject: Re: Getting rid of the static msleep priority boost
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 17:50:47 -0000

On Saturday 08 March 2008 04:46:32 am Jeff Roberson wrote:
> On Fri, 7 Mar 2008, Jeff Roberson wrote:
> 
> > On Fri, 7 Mar 2008, John Baldwin wrote:
> >
> >> On Friday 07 March 2008 08:42:37 am John Baldwin wrote:
> >>> On Friday 07 March 2008 07:16:30 am Jeff Roberson wrote:
> >>>> Hello,
> >>>> 
> >>>> I've been studying some problems with recent scheduler improvements that
> >>>> help a lot on some workloads and hurt on others.  I've tracked the
> >>>> problem down to static priority boosts handed out by
> >>>> msleep/cv_broadcastpri.  The basic problem is that a user thread will be
> >>>> woken with a kernel priority thus allowing it to preempt a thread running
> >>>> on any processor with a lesser priority.  The lesser priority thread may
> >>>> in fact hold some resource that the higher priority thread requires.
> >>>> Thus we context switch several times and perhaps go through priority
> >>>> propagation as well.
> >>>> 
> >>>> I have verified that disabling these static priority boosts entirely
> >>>> fixes the performance problem I've run into on at least one workload.
> >>>> There are probably others that it helps and hopefully we can discover
> >>>> that.
> >>>> 
> >>>> I'd like to know if anyone has a strong preference to keep this feature.
> >>>> It is likely that it helps in some interactive situations.  I'm not sure
> >>>> how much however.  I propose that we make a sysctl that disables it and
> >>>> turn it off by default.  If we see complaints on current@ we can suggest
> >>>> that they toggle the sysctl to see if it alleviates problems.
> >>>> 
> >>>> Based on feedback from that experiment and some testing we can then
> >>>> choose a few options:
> >>>> 
> >>>> 1)  Disable the static boosts entirely.  Leave kernel priorities for
> >>>> kernel threads and priority propagation.  Most other kernels do this.
> >>>> Would make my life in ULE much easier as well.
> >>>> 
> >>>> 2)  Leave the support for static boosts but remove it from all but a few
> >>>> key locations.  Leaving it in the api would give some flexibility but
> >>>> might confuse developers.
> >>>> 
> >>>> 3)  Leave things as they are.  undesirable.
> >>>> 
> >>>> I'm leaning towards #2 based on the information I have presently.  This
> >>>> is almost a significant change to historic BSD behavior so we might want
> >>>> to tread lightly.
> >>> 
> >>> One thing to note is that we actually depend on the priority boost (evilly)
> >>> to pick processes to swap out.  (I think we check for <= PSOCK and don't
> >>> swap those out).  One thing that I've wanted to happen for a while is that
> >>> the sleep priority for msleep() just be a parameter available to the
> >>> scheduler that the scheduler can use to calculate the real internal
> >>> priority rather than just being a set.  That is, I imagine having:
> >>> 
> >>> void	sched_set_sleep_prio(struct thread *td, u_char pri);
> >>> u_char	sched_get_sleep_prio(struct thread *td);
> >>> 
> >>> (The swap check would use the get call).  The 4BSD scheduler's
> >>> implementation of sched_set_sleep_prio would look like this:
> >>> 
> >>> void
> >>> sched_set_sleep_prio(struct thread *td, u_char pri)
> >>> {
> >>>
> >>> 	td->td_sched->sleep_pri = pri;
> >>> 	sched_prio(td, pri);
> >>> }
> >>> 
> >>> void
> >>> sched_userret(..)
> >>> {
> >>>
> >>> 	...
> >>> 	td->td_sched->sleep_pri = 0;	/* not in the kernel anymore */
> >>> }
> >>> 
> >>> but other schedulers may just save it and recalculate the priority where
> >>> the priority calculation just considers the sleep priority as one among
> >>> many factors.  If nothing else, this allows it to be a scheduler decision
> >>> to ignore it (so 4BSD could continue to do what it does now, but ULE may
> >>> ignore it, or ignore certain levels, etc.)
> >> 
> >> One thing to clarify: I'm not opposed to replacing the PSOCK check with
> >> something more suitable in the swap code, (in fact, that would be desirable),
> >> but it might take a good bit of work to do that and is probably easier to
> >> work on that as a separate change.  I also think there can be some merit in
> >> having code paths hint to the scheduler the relative interactivity/priority
> >> of a sleep.
> >
> > Couple of notes..
> >
> > The priority argument to sleep is a reasonable way for the code to hint at 
> > the relative priority/interactivity.  So that argues for leaving these 
> > arguments in place and making them more advisory.  I don't think we have to
> > change the api to take advantage of that.
> >
> > I'll look more closely for places like the swap that care about the absolute
> > priority of a process and see what I can come up with.  Thanks for raising 
> > that concern.
> >
> > I'd like to avoid apis that require the sched lock in separate steps like 
> > msleep does now to elevate the priority.  So far all sched* apis require the
> > thread lock on enter and I'd hate to deviate from that norm.  But another 
> > option may be just to make a globally visible td_sleep_pri that doesn't 
> > require the lock for write but does for read.  The other option is to bubble
> > the argument down through the sleepq code and into sched_sleep() and 
> > sched_wakeup().  I like that the best but it's the most api churn.
> 
> http://people.freebsd.org/~jeff/sleeppri.diff
> 
> What do you think of this?  I added another parameter to sleepq_add() and 
> sched_sleep().  So the scheduler is responsible for adjusting the 
> priority.  We could do the same thing for wakeup time adjustments like 
> sleepq_broadcastpri() but we'd have to pass it through setrunnable() as 
> well.

The cv_broadcastpri() thing is a hack and I wish there was a better way to do 
it.  I.e., I don't like having wakeup setting the priority at all.  I think 
it's a good idea to pass this to sched_sleep(), but I'd rather leave 
sched_sleep() where it is and pass the prio arg to the sleepq_wait() routines 
instead so you don't get a bump unless you actually sleep.  I think it's 
probably a bug that we bump the prio on threads that may not sleep now.
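
A hypothetical userland model of the approach described above: the sleep
priority is recorded through sched_set_sleep_prio() and read back through
sched_get_sleep_prio() (the accessor names proposed earlier in this thread),
applied only on the path where the thread really blocks, cleared on return to
userland, and consulted by a swap-eligibility check with a cutoff such as
PSOCK.  struct thread_model, sleepq_wait_model() and swap_eligible() are
invented for illustration, and none of the bodies are the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model only.  Lower numbers are better priorities, as in
 * FreeBSD; a sleep priority of 0 means "none recorded".
 */
struct thread_model {
	unsigned char pri;		/* current scheduling priority */
	unsigned char user_pri;		/* priority to restore in userland */
	unsigned char sleep_pri;	/* recorded sleep priority hint */
};

/* 4BSD-like policy: remember the hint and apply it directly. */
void
sched_set_sleep_prio(struct thread_model *td, unsigned char pri)
{
	td->sleep_pri = pri;
	if (pri != 0)
		td->pri = pri;
}

unsigned char
sched_get_sleep_prio(const struct thread_model *td)
{
	return (td->sleep_pri);
}

/* Leaving the kernel: forget the sleep priority. */
void
sched_userret(struct thread_model *td)
{
	td->sleep_pri = 0;
	td->pri = td->user_pri;
}

/*
 * Apply the hint only once the thread is committed to blocking, so a
 * caller whose condition is already satisfied is never boosted.
 */
void
sleepq_wait_model(struct thread_model *td, unsigned char pri, bool will_block)
{
	if (!will_block)
		return;
	sched_set_sleep_prio(td, pri);
	/* ...a real implementation would block here until woken... */
}

/* Swap check through the accessor: keep threads sleeping at <= cutoff. */
bool
swap_eligible(const struct thread_model *td, unsigned char cutoff)
{
	unsigned char sp = sched_get_sleep_prio(td);

	assert(cutoff != 0);
	return (sp == 0 || sp > cutoff);
}
```

A scheduler like ULE could instead store the hint and fold it into its own
priority calculation, or ignore it, which is the flexibility described
earlier in this message.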

> I'd like to normalize the other pri arguments in sleepq to use the same 0 
> is not set vs -1 that msleep did.  I realize that 0 is a valid priority 
> but for practical purposes this makes things consistent and does not 
> really restrict the api.

Sounds fine to me.  I think we should even formally make 0 an invalid priority 
(via a comment or something).
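
A small sketch of the convention being agreed on, assuming an int priority
argument whose low byte carries the priority (PRIMASK_MODEL below is a
hypothetical stand-in modeled loosely on msleep(9)'s priority mask): a value
of 0 means "leave the priority alone", and a hypothetical helper maps the
older -1 "not set" value onto 0.

```c
#include <assert.h>

/* Hypothetical stand-ins; values chosen for illustration only. */
#define	PRIMASK_MODEL	0xff	/* low byte holds the priority */
#define	PRI_UNSET	0	/* 0 is reserved to mean "not set" */

/* Map the old "-1 means not set" convention onto the 0 sentinel. */
int
normalize_pri_arg(int pri)
{
	return (pri < 0 ? PRI_UNSET : pri);
}

/* Priority a sleeping thread ends up with for a given argument. */
int
effective_pri(int cur_pri, int pri_arg)
{
	int pri = normalize_pri_arg(pri_arg) & PRIMASK_MODEL;

	assert(cur_pri != PRI_UNSET);	/* 0 is never a real priority here */
	return (pri == PRI_UNSET ? cur_pri : pri);
}
```

Without the normalization step, -1 masked with 0xff would become 255, a
valid-looking priority, which is one reason a single sentinel is easier to
reason about.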

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 10 22:22:09 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6771610656D0;
	Mon, 10 Mar 2008 22:22:09 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 0D1578FC26;
	Mon, 10 Mar 2008 22:22:08 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2AMM324089149; Mon, 10 Mar 2008 18:22:06 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Mon, 10 Mar 2008 12:22:54 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <200803101313.03526.jhb@freebsd.org>
Message-ID: <20080310121527.F1091@desktop>
References: <20080307020626.G920@desktop> <20080307124038.I920@desktop>
	<20080307234452.U1091@desktop> <200803101313.03526.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Getting rid of the static msleep priority boost
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Mar 2008 22:22:09 -0000

On Mon, 10 Mar 2008, John Baldwin wrote:

> On Saturday 08 March 2008 04:46:32 am Jeff Roberson wrote:
>> On Fri, 7 Mar 2008, Jeff Roberson wrote:
>>
>>> On Fri, 7 Mar 2008, John Baldwin wrote:
>>>
>>>> On Friday 07 March 2008 08:42:37 am John Baldwin wrote:
>>>>> On Friday 07 March 2008 07:16:30 am Jeff Roberson wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I've been studying some problems with recent scheduler improvements that
>>>>>> help a lot on some workloads and hurt on others.  I've tracked the
>>>>>> problem down to static priority boosts handed out by
>>>>>> msleep/cv_broadcastpri.  The basic problem is that a user thread will be
>>>>>> woken with a kernel priority thus allowing it to preempt a thread running
>>>>>> on any processor with a lesser priority.  The lesser priority thread may
>>>>>> in fact hold some resource that the higher priority thread requires.
>>>>>> Thus we context switch several times and perhaps go through priority
>>>>>> propagation as well.
>>>>>>
>>>>>> I have verified that disabling these static priority boosts entirely
>>>>>> fixes the performance problem I've run into on at least one workload.
>>>>>> There are probably others that it helps and hopefully we can discover
>>>>>> that.
>>>>>>
>>>>>> I'd like to know if anyone has a strong preference to keep this feature.
>>>>>> It is likely that it helps in some interactive situations.  I'm not sure
>>>>>> how much however.  I propose that we make a sysctl that disables it and
>>>>>> turn it off by default.  If we see complaints on current@ we can suggest
>>>>>> that they toggle the sysctl to see if it alleviates problems.
>>>>>>
>>>>>> Based on feedback from that experiment and some testing we can then
>>>>>> choose a few options:
>>>>>>
>>>>>> 1)  Disable the static boosts entirely.  Leave kernel priorities for
>>>>>> kernel threads and priority propagation.  Most other kernels do this.
>>>>>> Would make my life in ULE much easier as well.
>>>>>>
>>>>>> 2)  Leave the support for static boosts but remove it from all but a few
>>>>>> key locations.  Leaving it in the api would give some flexibility but
>>>>>> might confuse developers.
>>>>>>
>>>>>> 3)  Leave things as they are.  undesirable.
>>>>>>
>>>>>> I'm leaning towards #2 based on the information I have presently.  This
>>>>>> is almost a significant change to historic BSD behavior so we might want
>>>>>> to tread lightly.
>>>>>
>>>>> One thing to note is that we actually depend on the priority boost (evilly)
>>>>> to pick processes to swap out.  (I think we check for <= PSOCK and don't
>>>>> swap those out).  One thing that I've wanted to happen for a while is that
>>>>> the sleep priority for msleep() just be a parameter available to the
>>>>> scheduler that the scheduler can use to calculate the real internal
>>>>> priority rather than just being a set.  That is, I imagine having:
>>>>>
>>>>> void	sched_set_sleep_prio(struct thread *td, u_char pri);
>>>>> u_char	sched_get_sleep_prio(struct thread *td);
>>>>>
>>>>> (The swap check would use the get call).  The 4BSD scheduler's
>>>>> implementation of sched_set_sleep_prio would look like this:
>>>>>
>>>>> void
>>>>> sched_set_sleep_prio(struct thread *td, u_char pri)
>>>>> {
>>>>>
>>>>> 	td->td_sched->sleep_pri = pri;
>>>>> 	sched_prio(td, pri);
>>>>> }
>>>>>
>>>>> void
>>>>> sched_userret(..)
>>>>> {
>>>>>
>>>>> 	...
>>>>> 	td->td_sched->sleep_pri = 0;	/* not in the kernel anymore */
>>>>> }
>>>>>
>>>>> but other schedulers may just save it and recalculate the priority where
>>>>> the priority calculation just considers the sleep priority as one among
>>>>> many factors.  If nothing else, this allows it to be a scheduler decision
>>>>> to ignore it (so 4BSD could continue to do what it does now, but ULE may
>>>>> ignore it, or ignore certain levels, etc.)
>>>>
>>>> One thing to clarify: I'm not opposed to replacing the PSOCK check with
>>>> something more suitable in the swap code, (in fact, that would be desirable),
>>>> but it might take a good bit of work to do that and is probably easier to
>>>> work on that as a separate change.  I also think there can be some merit in
>>>> having code paths hint to the scheduler the relative interactivity/priority
>>>> of a sleep.
>>>
>>> Couple of notes..
>>>
>>> The priority argument to sleep is a reasonable way for the code to hint at
>>> the relative priority/interactivity.  So that argues for leaving these
>>> arguments in place and making them more advisory.  I don't think we have to
>>> change the api to take advantage of that.
>>>
>>> I'll look more closely for places like the swap that care about the absolute
>>> priority of a process and see what I can come up with.  Thanks for raising
>>> that concern.
>>>
>>> I'd like to avoid apis that require the sched lock in separate steps like
>>> msleep does now to elevate the priority.  So far all sched* apis require the
>>> thread lock on enter and I'd hate to deviate from that norm.  But another
>>> option may be just to make a globally visible td_sleep_pri that doesn't
>>> require the lock for write but does for read.  The other option is to bubble
>>> the argument down through the sleepq code and into sched_sleep() and
>>> sched_wakeup().  I like that the best but it's the most api churn.
>>
>> http://people.freebsd.org/~jeff/sleeppri.diff
>>
>> What do you think of this?  I added another parameter to sleepq_add() and
>> sched_sleep().  So the scheduler is responsible for adjusting the
>> priority.  We could do the same thing for wakeup time adjustments like
>> sleepq_broadcastpri() but we'd have to pass it through setrunnable() as
>> well.
>
> The cv_broadcastpri() thing is a hack and I wish there was a better way to do
> it.  I.e., I don't like having wakeup setting the priority at all.  I think
> it's a good idea to pass this to sched_sleep(), but I'd rather leave
> sched_sleep() where it is and pass the prio arg to the sleepq_wait() routines
> instead so you don't get a bump unless you actually sleep.  I think it's
> probably a bug that we bump the prio on threads that may not sleep now.

Ok, I preferred not to move sched_sleep() as well but I also didn't want 
to add those arguments to the stack everywhere.  I'll do that however.

>
>> I'd like to normalize the other pri arguments in sleepq to use the same 0
>> is not set vs -1 that msleep did.  I realize that 0 is a valid priority
>> but for practical purposes this makes things consistent and does not
>> really restrict the api.
>
> Sounds fine to me.  I think we should even formally make 0 an invalid priority
> (via a comment or something).

Ok, I'll consider that.

I'm just going to commit this when it's tested and working.  It's simple 
enough that I don't think it warrants further review.

Thanks,
Jeff

>
> -- 
> John Baldwin
>

From owner-freebsd-arch@FreeBSD.ORG  Tue Mar 11 02:25:27 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B494F1065675
	for <arch@freebsd.org>; Tue, 11 Mar 2008 02:25:27 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 5F9368FC16
	for <arch@freebsd.org>; Tue, 11 Mar 2008 02:25:27 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2B2PPaL045126
	for <arch@freebsd.org>; Mon, 10 Mar 2008 22:25:26 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Mon, 10 Mar 2008 16:26:17 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: arch@freebsd.org
Message-ID: <20080310161115.X1091@desktop>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: 
Subject: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Mar 2008 02:25:27 -0000

http://people.freebsd.org/~jeff/amd64.diff

At the above address there is an implementation of cpu_switch() and 
cpu_throw() for amd64 almost entirely in C.  I'm posting this for 
discussion and eventual commit.  There are numerous reasons to do this; I
will outline some of them.

Implementing the bulk of the code in C allows us to add/modify higher 
level features more easily.  For example, we can change the pmap active 
bits to use a cpuset_t so we can support more than 64 CPUs.  It makes the
code faster because we can do more complicated checks to save time, such
as avoiding writing the fs/gsbase MSRs if they have not changed.  It also
makes the code faster because infrequently used options can be moved out
of the normal code paths.

In fact, the C version is ~10% faster than the assembly version at a two
thread sched_yield() test on a single-CPU Opteron:

x asm.yield
+ csw.yield
+------------------------------------------------------------------------------+
|     ++                                              x  x                     |
|+ ++ ++ +  + +          +  +   ++ +x    x     x      x  xxx                  x|
| |______M_____A___________|               |__________AM__________|            |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10          5.17          5.88           5.5         5.479    0.19272606
+  15          4.58          5.16          4.71     4.8126667    0.20738049
Difference at 95.0% confidence
         -0.666333 +/- 0.170431
         -12.1616% +/- 3.11062%
         (Student's t, pooled s = 0.201773)

This test measures the total time to call sched_yield() 10,000,000 times 
between two threads.  Two threads are needed to be sure that the scheduler 
doesn't pick the same thread twice and skip cpu_switch().  The 10% speedup
is notable because the cpu_switch() routine was consuming less than 40% of
the CPU before the change, so cpu_switch() itself is almost 1/3 faster.

Peter also suggested that we can delay portions of the switch until the 
user boundary.  For workloads that involve heavy kernel activity on the
user's part, with multiple switches per syscall, this would be a big saving.
We could also use this as a framework to implement custom switch routines 
if we want to switch directly to ithreads or taskqueue threads in the 
future.

The C routine is supplemented by two assembly routines which are 
responsible for saving the core architecture state and manipulating the 
stack.  These total approximately 50 assembly instructions and are similar 
to savecontext/swapcontext.

The C code saves the old thread's context but still runs on its stack as it
continues the switch.  This is safe because the old thread is locked until 
we call "cpu_switchin()" which is similar to swapcontext.

The only appreciable downside is that it lowers the barrier of entry for 
modifying a very sensitive piece of code.  Still, I think the flexibility 
it gives us outweighs those concerns.

Comments?

Thanks,
Jeff

From owner-freebsd-arch@FreeBSD.ORG  Tue Mar 11 09:56:10 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7923E1065674
	for <arch@freebsd.org>; Tue, 11 Mar 2008 09:56:10 +0000 (UTC)
	(envelope-from peterjeremy@optushome.com.au)
Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au
	[211.29.132.196])
	by mx1.freebsd.org (Postfix) with ESMTP id EFB6D8FC36
	for <arch@freebsd.org>; Tue, 11 Mar 2008 09:56:09 +0000 (UTC)
	(envelope-from peterjeremy@optushome.com.au)
Received: from server.vk2pj.dyndns.org
	(c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82])
	by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2B9twW8016345
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 11 Mar 2008 20:55:59 +1100
Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1])
	by server.vk2pj.dyndns.org (8.14.2/8.14.1) with ESMTP id m2B9twSQ042761;
	Tue, 11 Mar 2008 20:55:58 +1100 (EST)
	(envelope-from peter@server.vk2pj.dyndns.org)
Received: (from peter@localhost)
	by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m2B9twDi042760;
	Tue, 11 Mar 2008 20:55:58 +1100 (EST) (envelope-from peter)
Date: Tue, 11 Mar 2008 20:55:58 +1100
From: Peter Jeremy <peterjeremy@optushome.com.au>
To: Jeff Roberson <jroberson@chesapeake.net>
Message-ID: <20080311095557.GX68971@server.vk2pj.dyndns.org>
References: <20080310161115.X1091@desktop>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="213E7WwkW+nU62+Y"
Content-Disposition: inline
In-Reply-To: <20080310161115.X1091@desktop>
X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc
User-Agent: Mutt/1.5.17 (2007-11-01)
Cc: arch@freebsd.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Mar 2008 09:56:10 -0000


--213E7WwkW+nU62+Y
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Mar 10, 2008 at 04:26:17PM -1000, Jeff Roberson wrote:
>In fact, the C version is ~10% faster than the assembly version at a two
>thread sched_yield() test on a single-CPU Opteron:

That sounds wonderful.  How does it compare on an SMP system?  Are
there any locking issues that might change that performance difference
with lots of CPUs?

>The only appreciable downside is that it lowers the barrier of entry for
>modifying a very sensitive piece of code.

IMHO, this isn't a valid reason.  Increasing both the legibility
and performance of a very sensitive piece of code is a good thing.
Having more people understand the code is also a good thing.  FreeBSD
already implements a substantial barrier of entry to code modification
(commit bits) and I don't believe this should be further raised by
unnecessarily hiding critical code in a language that the majority of
committers are not expert in.  I've seen relatively few examples of
drive-by commits breaking critical code in the past and doubt that
converting cpu_switch()/cpu_throw() into C will suddenly make them the
target of a "how can I break FreeBSD in an obscure manner" competition.
In any case, there is nothing stopping anyone with a src commit bit
mangling the existing assembler implementation.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.

--213E7WwkW+nU62+Y
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.8 (FreeBSD)

iEYEARECAAYFAkfWVy0ACgkQ/opHv/APuIc9mwCgiz3QPF4lOauPkYpWHtaVkQ0h
JboAnjNs/04TBin4fag0B10tX254eo4O
=n3b8
-----END PGP SIGNATURE-----

--213E7WwkW+nU62+Y--

From owner-freebsd-arch@FreeBSD.ORG  Tue Mar 11 10:02:39 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 219141065672
	for <arch@freebsd.org>; Tue, 11 Mar 2008 10:02:39 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id C73748FC2B
	for <arch@freebsd.org>; Tue, 11 Mar 2008 10:02:38 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.61.3])
	by phk.freebsd.dk (Postfix) with ESMTP id 7B4F217104;
	Tue, 11 Mar 2008 10:02:37 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2BA2aWi005547;
	Tue, 11 Mar 2008 10:02:36 GMT (envelope-from phk@critter.freebsd.dk)
To: Peter Jeremy <peterjeremy@optushome.com.au>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Tue, 11 Mar 2008 20:55:58 +1100."
	<20080311095557.GX68971@server.vk2pj.dyndns.org> 
Date: Tue, 11 Mar 2008 10:02:35 +0000
Message-ID: <5546.1205229755@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: arch@freebsd.org
Subject: Re: amd64 cpu_switch in C. 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Mar 2008 10:02:39 -0000

In message <20080311095557.GX68971@server.vk2pj.dyndns.org>, Peter Jeremy writes:

>>The only appreciable downside is that it lowers the barrier of entry for
>>modifying a very sensitive piece of code.
>
>IMHO, this isn't a valid reason.  Increasing both the legibility
>and performance of a very sensitive piece of code is a good thing.
>Having more people understand the code is also a good thing.

This is not a legal inference, and that's exactly the point Jeff made:

Just because it is written in C doesn't mean people will understand
it, it merely means that they will _think_ they understand it.

Nonetheless, we have plenty of
	/* You ARE supposed to understand this */
C-code already, so I don't see it as an objection to Jeff's patch.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG  Tue Mar 11 20:04:04 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EFFCE1065670
	for <arch@freebsd.org>; Tue, 11 Mar 2008 20:04:04 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.246])
	by mx1.freebsd.org (Postfix) with ESMTP id B74118FC1A
	for <arch@freebsd.org>; Tue, 11 Mar 2008 20:04:04 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: by an-out-0708.google.com with SMTP id c14so774180anc.13
	for <arch@freebsd.org>; Tue, 11 Mar 2008 13:04:04 -0700 (PDT)
Received: by 10.100.6.13 with SMTP id 13mr13925043anf.16.1205264106316;
	Tue, 11 Mar 2008 12:35:06 -0700 (PDT)
Received: by 10.100.8.6 with HTTP; Tue, 11 Mar 2008 12:35:06 -0700 (PDT)
Message-ID: <e7db6d980803111235n706f7dct9b65804916a95a03@mail.gmail.com>
Date: Tue, 11 Mar 2008 12:35:06 -0700
From: "Peter Wemm" <peter@wemm.org>
To: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: <5546.1205229755@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080311095557.GX68971@server.vk2pj.dyndns.org>
	<5546.1205229755@critter.freebsd.dk>
Cc: arch@freebsd.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Mar 2008 20:04:05 -0000

On Tue, Mar 11, 2008 at 3:02 AM, Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> In message <20080311095557.GX68971@server.vk2pj.dyndns.org>, Peter Jeremy writes:
>
>
>  >>The only appreciable downside is that it lowers the barrier of entry for
>  >>modifying a very sensitive piece of code.
>  >
>  >IMHO, this isn't a valid reason.  Increasing both the legibility
>  >and performance of a very sensitive piece of code is a good thing.
>  >Having more people understand the code is also a good thing.
>
>  This is not a legal inference, and that's exactly the point Jeff made:
>
>  Just because it is written in C doesn't mean people will understand
>  it, it merely means that they will _think_ they understand it.

I'd like to point out that if I hadn't converted the run queue parts
of cpu_switch into C, then KSE might never have happened.  At least,
not in the form that hit the tree.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 04:13:19 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 880DF106567F
	for <arch@hub.freebsd.org>; Wed, 12 Mar 2008 04:13:19 +0000 (UTC)
	(envelope-from davidxu@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 451BB8FC12;
	Wed, 12 Mar 2008 04:13:19 +0000 (UTC)
	(envelope-from davidxu@FreeBSD.org)
Received: from apple.my.domain (root@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2C4DGK7003275;
	Wed, 12 Mar 2008 04:13:17 GMT (envelope-from davidxu@freebsd.org)
Message-ID: <47D758AC.2020605@freebsd.org>
Date: Wed, 12 Mar 2008 12:14:36 +0800
From: David Xu <davidxu@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.9 (X11/20071211)
MIME-Version: 1.0
To: Jeff Roberson <jroberson@chesapeake.net>
References: <20080310161115.X1091@desktop>
In-Reply-To: <20080310161115.X1091@desktop>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@FreeBSD.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 04:13:24 -0000

Jeff Roberson wrote:
> http://people.freebsd.org/~jeff/amd64.diff

This is a good idea. In fact, according to the calling convention, some
registers do not need to be saved across a function call, e.g. on
i386, eax, edx, and ecx. :-) But gdb may need them to dig out a
stack variable's value.

Regards,
David Xu


From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 08:25:18 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0B0AA1065672
	for <arch@freebsd.org>; Wed, 12 Mar 2008 08:25:18 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.251])
	by mx1.freebsd.org (Postfix) with ESMTP id B91498FC22
	for <arch@freebsd.org>; Wed, 12 Mar 2008 08:25:17 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: by an-out-0708.google.com with SMTP id c14so852913anc.13
	for <arch@freebsd.org>; Wed, 12 Mar 2008 01:25:17 -0700 (PDT)
Received: by 10.100.94.14 with SMTP id r14mr15661286anb.23.1205310316735;
	Wed, 12 Mar 2008 01:25:16 -0700 (PDT)
Received: by 10.100.8.6 with HTTP; Wed, 12 Mar 2008 01:25:16 -0700 (PDT)
Message-ID: <e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
Date: Wed, 12 Mar 2008 01:25:16 -0700
From: "Peter Wemm" <peter@wemm.org>
To: "David Xu" <davidxu@freebsd.org>
In-Reply-To: <47D758AC.2020605@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 08:25:18 -0000

On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
> Jeff Roberson wrote:
>  > http://people.freebsd.org/~jeff/amd64.diff
>
>  This is a good idea. In fact, according to the calling convention, some
>  registers do not need to be saved across a function call, e.g. on
>  i386, eax, edx, and ecx. :-) But gdb may need them to dig out a
>  stack variable's value.

Jeff and I have been having a friendly "competition" today.

With a UP kernel and INVARIANTS, my initial counter-patch response had
nearly double the gain on my machine.  (Jeff 7%, mine: 13.5%).
I switched to compiling kernels the same way he did (no invariants, SMP
kernel, but kern.smp.disabled=1).  After that, our patch sets were the
same again - both at about 10% gain over baseline.

I've made a few more changes and am now at 23% improvement over baseline.

I'm not confident of the testing methodology.  More tests are in progress.

The good news is that this tuning is finally being done.  It should
have been done in 2003 though...

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 08:51:22 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 51EA51065676;
	Wed, 12 Mar 2008 08:51:22 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 07B008FC36;
	Wed, 12 Mar 2008 08:51:21 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2C8pH2W075419; Wed, 12 Mar 2008 04:51:20 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Tue, 11 Mar 2008 22:52:16 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: Peter Wemm <peter@wemm.org>
In-Reply-To: <e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
Message-ID: <20080311224903.V1091@desktop>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 08:51:22 -0000

On Wed, 12 Mar 2008, Peter Wemm wrote:

> On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
>> Jeff Roberson wrote:
>> > http://people.freebsd.org/~jeff/amd64.diff
>>
>>  This is a good idea. In fact, according to the calling convention, some
>>  registers do not need to be saved across a function call, e.g. on
>>  i386, eax, edx, and ecx. :-) But gdb may need them to dig out a
>>  stack variable's value.
>
> Jeff and I have been having a friendly "competition" today.
>
> With a UP kernel and INVARIANTS, my initial counter-patch response had
> nearly double the gain on my machine.  (Jeff 7%, mine: 13.5%).
> I switched to compiling kernels the same way he did (no invariants, SMP
> kernel, but kern.smp.disabled=1).  After that, our patch sets were the
> same again - both at about 10% gain over baseline.
>
> I've made a few more changes and am now at 23% improvement over baseline.

The question is whether we care to have it in C or not.  Given a C and 
assembly version with similar optimizations the assembly version will 
always win.  However, it's easier to write the optimizations in C.

>
> I'm not confident of the testing methodology.  More tests are in progress.

To keep everyone else up to date, we're using:
http://people.freebsd.org/~jeff/yield.c & yield.sh

Given two processes and the way sched_yield() picks the next thread, every
yield should trigger a context switch to the other process.

>
> The good news is that this tuning is finally being done.  It should
> have been done in 2003 though...

Yes indeed, better late than never.

>
> -- 
> Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
> "All of this is for nothing if we don't go to the stars" - JMS/B5
> "If Java had true garbage collection, most programs would delete
> themselves upon execution." -- Robert Sewell
>

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 09:20:05 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5D6AD1065678
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:20:05 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 0D3758FC23
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:20:05 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from root by ciao.gmane.org with local (Exim 4.43)
	id 1JZN7y-00046e-Io
	for freebsd-arch@freebsd.org; Wed, 12 Mar 2008 09:20:02 +0000
Received: from 195.208.174.178 ([195.208.174.178])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:20:02 +0000
Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1
	(Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:20:02 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-arch@freebsd.org
From: Vadim Goncharov <vadim_nuclight@mail.ru>
Date: Wed, 12 Mar 2008 09:13:22 +0000 (UTC)
Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel
Lines: 32
Message-ID: <slrnftf7li.1m0l.vadim_nuclight@hostel.avtf.net>
References: <86odacc04t.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: 195.208.174.178
X-Comment-To: Dag-Erling =?koi8-r?Q?Sm=F8rgrav?=
User-Agent: slrn/0.9.8.1 (FreeBSD)
Sender: news <news@ger.gmane.org>
Subject: Re: dev.* analogue for interfaces
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: vadim_nuclight@mail.ru
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 09:20:05 -0000

Hi Dag-Erling Smørgrav!

On Tue, 19 Feb 2008 18:43:46 +0100; Dag-Erling Smørgrav wrote about 'dev.* analogue for interfaces':

> What I propose is to add a similar sysctl tree for interfaces.  It would
> look a little different.  For instance, some interfaces (bridge, vlan)
> have parents or children, but most don't.

> Just as it is for devices, creation and destruction of the interface's
> sysctl node and context would be hidden inside if_{attach,detach}() and
> completely transparent to the driver, and there will be an API that
> drivers can use if they want to add their own nodes.

> Since interfaces don't all have parents, the API will include a function
> to specify one for those that do.

> This is *not* intended to replace ifconfig; it is intended for infor-
> mation which isn't available through ifconfig and which it wouldn't be
> natural to place there.  For instance, every wlan interface already has
> a sysctl tree under net.wlan.

Will this make it easier to add new per-interface network stack
configuration features without bloating ifconfig? For example, a
per-interface output DSCP->CoS map could be implemented via a sysctl subtree.

Also, I'm not sure, but I think it will help virtualization, multiple routing
tables, VRF and other things which can be bound to an interface. So I agree
with the general idea; only the actual information and its position in the
tree need to be discussed.

-- 
WBR, Vadim Goncharov. ICQ#166852181       mailto:vadim_nuclight@mail.ru
[Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]


From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 09:44:34 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9B8C4106566B
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:44:34 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 235178FC24
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:44:34 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1JZNVc-0005Ga-Dz
	for freebsd-arch@freebsd.org; Wed, 12 Mar 2008 09:44:28 +0000
Received: from 195.208.174.178 ([195.208.174.178])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:44:28 +0000
Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1
	(Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 09:44:28 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-arch@freebsd.org
From: Vadim Goncharov <vadim_nuclight@mail.ru>
Followup-To: gmane.os.freebsd.architechture
Date: Wed, 12 Mar 2008 09:44:19 +0000 (UTC)
Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel
Lines: 41
Message-ID: <slrnftf9fi.1m0l.vadim_nuclight@hostel.avtf.net>
References: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com>
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: 195.208.174.178
X-Comment-To: Attilio Rao
User-Agent: slrn/0.9.8.1 (FreeBSD)
Sender: news <news@ger.gmane.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: [RFC] Remove NTFS kernel support
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: vadim_nuclight@mail.ru
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 09:44:34 -0000

Hi Attilio Rao! 

On Thu, 7 Feb 2008 02:00:41 +0100; Attilio Rao wrote about '[RFC] Remove NTFS kernel support':

> As exposed by several users, NTFS seems to be broken even before the first
> VFS commits happening around the end of December. Those commits exposed
> some problems about NTFS which are currently under investigation.
> Ultimately, this filesystem is also unmaintained at the moment.

> Speaking with jeff, we agreed on what can be a possible compromise:
> remove the kernel support for NTFS and maybe take care of the FUSE
> implementation.
> What I now propose is a small survey which can shed some light on
> what you think about this idea and its implications:
> - Do you use NTFS?

Yes, occasionally. And I have had situations where I needed it without
Internet access, FUSE, etc.

> - Are you interested in maintaining it?

Not in the 8.0 timeline :)

> - Do you know a good reason not to use the FUSE ntfs implementation? What
> does the kernel counterpart add?

Localization: ntfs-3g supports only UTF-8 as the locale, and FreeBSD is not
good at supporting UTF-8 everywhere (syscons, UFS2, etc.), while the kernel
implementation supports recoding names to the current locale's codepage. This
is valuable for people with non-Latin-1 character sets.
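A user-space sketch of the kind of recoding described above, using POSIX iconv(3). The helper recode_name() is hypothetical and KOI8-R is only an example target codepage; this is not the kernel NTFS code, just an illustration of turning a UTF-8 name into a single-byte locale charset.

```c
#include <iconv.h>
#include <string.h>

/*
 * Recode a UTF-8 file name into a legacy 8-bit codepage, the way a
 * filesystem with charset conversion could present names to a
 * non-UTF-8 locale.  Returns the number of bytes written to 'out'
 * (NUL-terminated), or -1 if the charset is unknown, a character cannot
 * be represented, or 'out' is too small.
 */
static long
recode_name(const char *from_cs, const char *to_cs, const char *in,
    char *out, size_t outsz)
{
	iconv_t cd;
	char *inp, *outp;
	size_t inleft, outleft, rc;

	if (outsz == 0)
		return (-1);
	cd = iconv_open(to_cs, from_cs);
	if (cd == (iconv_t)-1)
		return (-1);

	inp = (char *)in;	/* iconv() only reads through this pointer */
	inleft = strlen(in);
	outp = out;
	outleft = outsz - 1;	/* reserve room for the terminating NUL */

	rc = iconv(cd, &inp, &inleft, &outp, &outleft);
	iconv_close(cd);
	if (rc == (size_t)-1)
		return (-1);
	*outp = '\0';
	return ((long)(outp - out));
}
```

For example, recode_name("UTF-8", "KOI8-R", ...) turns the 12-byte UTF-8 spelling of "Привет" into 6 single-byte KOI8-R characters.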

> - Do you think axing the kernel support a good idea?

No. FAT32 was mentioned as the most popular FS for file exchange, but look at
the new USB flash devices with 4G+ sizes. People want to store 4G+ files on
them (e.g. DVD images), so I have already seen some of them formatted with NTFS
instead of FAT32. Having support for them out of the box is good.
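The 4G ceiling comes from FAT32 recording each file's size in a 32-bit directory-entry field, so no single file can exceed 2^32 - 1 bytes. A minimal check, with fits_on_fat32() as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* FAT32 keeps the file size in 32 bits: the maximum is 4 GiB - 1 byte. */
#define FAT32_MAX_FILE_SIZE	UINT64_C(0xFFFFFFFF)

static bool
fits_on_fat32(uint64_t file_size)
{
	return (file_size <= FAT32_MAX_FILE_SIZE);
}
```

A 4.7 GB single-layer DVD image (4700000000 bytes) does not fit, while anything up to 4294967295 bytes does.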

-- 
WBR, Vadim Goncharov. ICQ#166852181       mailto:vadim_nuclight@mail.ru
[Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]


From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 10:02:32 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EA1A6106566B
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 10:02:32 +0000 (UTC)
	(envelope-from gary.jennejohn@freenet.de)
Received: from mout3.freenet.de (mout3.freenet.de [IPv6:2001:748:100:40::2:5])
	by mx1.freebsd.org (Postfix) with ESMTP id 8118D8FC45
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 10:02:32 +0000 (UTC)
	(envelope-from gary.jennejohn@freenet.de)
Received: from [195.4.92.11] (helo=1.mx.freenet.de)
	by mout3.freenet.de with esmtpa (Exim 4.69)
	(envelope-from <gary.jennejohn@freenet.de>)
	id 1JZNn4-00064J-OZ; Wed, 12 Mar 2008 11:02:30 +0100
Received: from x167d.x.pppool.de ([89.59.22.125]:56425
	helo=peedub.jennejohn.org)
	by 1.mx.freenet.de with esmtpa (ID gary.jennejohn@freenet.de) (port 25)
	(Exim 4.69 #12) id 1JZNn4-0002oL-F1; Wed, 12 Mar 2008 11:02:30 +0100
Date: Wed, 12 Mar 2008 11:02:29 +0100
From: Gary Jennejohn <gary.jennejohn@freenet.de>
Message-ID: <20080312110229.5aeefc1f@peedub.jennejohn.org>
In-Reply-To: <47D758AC.2020605@freebsd.org>
References: <20080310161115.X1091@desktop>
	<47D758AC.2020605@freebsd.org>
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.10.14; amd64-portbld-freebsd8.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: arch@FreeBSD.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: gary.jennejohn@freenet.de
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 10:02:33 -0000

On Wed, 12 Mar 2008 12:14:36 +0800
David Xu <davidxu@FreeBSD.org> wrote:

> Jeff Roberson wrote:
> > http://people.freebsd.org/~jeff/amd64.diff
> 
> This is a good idea. In fact, according to the calling convention, some 
> registers do not need to be saved across a function call, e.g. on
> i386, eax, edx, and ecx. :-) But gdb may need them to dig out
> stack variables' values.
> 
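To make the calling-convention point concrete for amd64: under the System V AMD64 ABI the callee-saved general-purpose registers are %rbx, %rbp, %r12-%r15 and the stack pointer, so a switch routine that is called like an ordinary C function only has to preserve those plus a resume address. The struct below is a hypothetical save area for illustration, not the layout used in Jeff's diff.

```c
#include <stdint.h>

/*
 * Hypothetical context-switch save area on amd64.  %rax, %rcx, %rdx,
 * %rsi, %rdi and %r8-%r11 are caller-saved, so the compiler already
 * spills anything it still needs before calling the switch routine.
 */
struct switch_regs {
	uint64_t r15;
	uint64_t r14;
	uint64_t r13;
	uint64_t r12;
	uint64_t rbp;
	uint64_t rsp;	/* stack pointer of the outgoing thread */
	uint64_t rbx;
	uint64_t rip;	/* where the thread resumes when switched back in */
};

/* Number of 64-bit registers the switch must preserve. */
enum { SWITCH_NREGS = sizeof(struct switch_regs) / sizeof(uint64_t) };
```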

I applied this patch yesterday on an AMD64 X2 box and got this panic
today after I started X:

Unread portion of the kernel message buffer:
panic: smp_tlb_shootdown: interrupts disabled
cpuid = 0
Uptime: 47s
Physical memory: 3062 MB
Dumping 169 MB: 154 138 122 106 90 74 58 42 26 10

That's all the useful information I have, because the backtrace
is corrupted.

BTW I'm using SCHED_ULE.

Maybe I shouldn't have tried this patch yet since it doesn't seem to be SMP
ready.

---
Gary Jennejohn

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 10:04:05 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7BBEC1065676;
	Wed, 12 Mar 2008 10:04:05 +0000 (UTC)
	(envelope-from bsd.luigi@alshome.be)
Received: from csmtp1.b-one.net (csmtp1.one.com [195.47.247.21])
	by mx1.freebsd.org (Postfix) with ESMTP id 40AD98FC31;
	Wed, 12 Mar 2008 10:04:05 +0000 (UTC)
	(envelope-from bsd.luigi@alshome.be)
Received: from [128.70.15.100] (85.248-78-194.adsl-static.isp.belgacom.be
	[194.78.248.85])
	by csmtp1.b-one.net (Postfix) with ESMTP id A2D1BE00B9E7;
	Wed, 12 Mar 2008 10:36:03 +0100 (CET)
Message-ID: <47D7A387.5020707@alshome.be>
Date: Wed, 12 Mar 2008 10:33:59 +0100
From: Luigi <bsd.luigi@alshome.be>
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: freebsd-arch@freebsd.org, freebsd-doc <freebsd-doc@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: study about Kernels
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: bsd.luigi@alshome.be
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 10:04:05 -0000

Hi all,

I am doing a study of kernels. The goal is to compare the architectures
of open-source kernels. I would like to examine the BSD, Darwin and Linux kernels.

Who can help me, and where can I find documentation?

Thank you very much for your help.

Luigi

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 10:06:44 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3979C1065670
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 10:06:44 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id E70A48FC1D
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 10:06:43 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1JZNr4-0006B8-4T
	for freebsd-arch@freebsd.org; Wed, 12 Mar 2008 10:06:38 +0000
Received: from 195.208.174.178 ([195.208.174.178])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 10:06:38 +0000
Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1
	(Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 10:06:38 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-arch@freebsd.org
From: Vadim Goncharov <vadim_nuclight@mail.ru>
Date: Wed, 12 Mar 2008 10:06:28 +0000 (UTC)
Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel
Lines: 23
Message-ID: <slrnftfap4.1m0l.vadim_nuclight@hostel.avtf.net>
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: 195.208.174.178
X-Comment-To: All
User-Agent: slrn/0.9.8.1 (FreeBSD)
Sender: news <news@ger.gmane.org>
Subject: sysctl vs procfs
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: vadim_nuclight@mail.ru
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 10:06:44 -0000

Hi!

While it is a good idea to prefer the more consistent sysctl interface over
procfs, sysctl has some drawbacks. For example, procfs has a good file
interface for big objects, like the VM map of a process. Imagine 800 MB of it:
the in-kernel sysctl interface locks the value, then copies it into kernel
memory, and only then can it be copied out to userspace. Of course, that is not
suitable as an alternative to procfs for reading big files, so we cannot get
rid of procfs that way.

So, what about adding sysctl interfaces that allow a userland application to
read large buffers in parts, without copying the whole thing first? The
application, of course, should be aware that the underlying buffer can change
while it is being copied, but many of our base utilities (like netstat)
already work under these conditions.
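The proposal can be modeled in user space. chunk_read() below is a hypothetical interface, not an existing sysctl(3) call: the caller passes an offset and a small buffer and loops until nothing more comes back, so the producer never has to stage the whole object in one temporary copy.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a large kernel-side object (e.g. a VM map as text). */
struct big_object {
	const char	*data;
	size_t		 len;
};

/*
 * Hypothetical chunked read: copy at most 'bufsz' bytes starting at
 * 'offset'.  Returns the number of bytes copied, 0 at end of data.
 */
static size_t
chunk_read(const struct big_object *obj, size_t offset, char *buf,
    size_t bufsz)
{
	size_t n;

	if (offset >= obj->len)
		return (0);
	n = obj->len - offset;
	if (n > bufsz)
		n = bufsz;
	memcpy(buf, obj->data + offset, n);
	return (n);
}

/*
 * Caller side: pull the object through a small fixed buffer.  If the
 * object changes between calls the pieces may be inconsistent; like
 * netstat today, the reader has to tolerate that.
 */
static size_t
read_all(const struct big_object *obj, char *dst, size_t dstsz)
{
	char chunk[16];		/* deliberately tiny to force many calls */
	size_t off = 0, n;

	while (off < dstsz &&
	    (n = chunk_read(obj, off, chunk, sizeof(chunk))) > 0) {
		if (n > dstsz - off)
			n = dstsz - off;
		memcpy(dst + off, chunk, n);
		off += n;
	}
	return (off);
}
```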

Another proposal concerns human-readable conversions. We already have code for
parsing and unparsing C structs and arrays in netgraph (/sys/netgraph/ng_parse.c).
What about porting it to userland (or leaving it in the kernel; this should be
thought through) to let users interpret opaque blobs, which are hidden from
them unless they use sysctl -A? This and the previous proposal could improve
our KVM interactions, I think.
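The table-driven approach can be sketched briefly. Everything below (field_desc, format_blob(), the stats struct) is hypothetical and only in the spirit of ng_parse, not its actual API: a descriptor lists each field's name and offset, and one generic routine renders an opaque blob as text. Only 32-bit unsigned fields are handled to keep it short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct field_desc {
	const char	*name;
	size_t		 offset;
};

/* Render 'blob' as "{ name=value ... }"; returns length or -1. */
static int
format_blob(const void *blob, const struct field_desc *fields,
    size_t nfields, char *out, size_t outsz)
{
	const unsigned char *base = blob;
	size_t used;
	int n;

	n = snprintf(out, outsz, "{");
	if (n < 0 || (size_t)n >= outsz)
		return (-1);
	used = (size_t)n;
	for (size_t i = 0; i < nfields; i++) {
		uint32_t v;

		memcpy(&v, base + fields[i].offset, sizeof(v));
		n = snprintf(out + used, outsz - used, " %s=%u",
		    fields[i].name, (unsigned)v);
		if (n < 0 || (size_t)n >= outsz - used)
			return (-1);
		used += (size_t)n;
	}
	n = snprintf(out + used, outsz - used, " }");
	if (n < 0 || (size_t)n >= outsz - used)
		return (-1);
	return ((int)(used + (size_t)n));
}

/* Example blob type and its (hypothetical) descriptor table. */
struct hypothetical_stats {
	uint32_t packets;
	uint32_t drops;
};

static const struct field_desc stats_fields[] = {
	{ "packets", offsetof(struct hypothetical_stats, packets) },
	{ "drops",   offsetof(struct hypothetical_stats, drops) },
};
```

Formatting a hypothetical_stats of { 42, 3 } with stats_fields yields "{ packets=42 drops=3 }".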

-- 
WBR, Vadim Goncharov. ICQ#166852181       mailto:vadim_nuclight@mail.ru
[Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]


From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 10:16:30 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7055F1065673
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 10:16:30 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 442EF8FC13
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 10:16:30 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id F15F246B43;
	Wed, 12 Mar 2008 06:16:29 -0400 (EDT)
Date: Wed, 12 Mar 2008 10:16:29 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Luigi <bsd.luigi@alshome.be>
In-Reply-To: <47D7A387.5020707@alshome.be>
Message-ID: <20080312101301.X29518@fledge.watson.org>
References: <47D7A387.5020707@alshome.be>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-doc <freebsd-doc@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: study about Kernels
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 10:16:30 -0000

On Wed, 12 Mar 2008, Luigi wrote:

> I am doing a study of kernels. The goal is to compare the architectures of
> open-source kernels. I would like to examine the BSD, Darwin and Linux kernels.
>
> Who can help me, and where can I find documentation?
>
> Thank you very much for your help.

I don't think you'll find much in the way of serious documentation of the 
differences.  If doing a comparative study, I'd encourage you also to take a 
look at the OpenSolaris kernel.  However, you can find all the source trees 
here:

   http://fxr.watson.org/

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 10:22:38 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 881A0106566B;
	Wed, 12 Mar 2008 10:22:38 +0000 (UTC)
	(envelope-from bsd.luigi@alshome.be)
Received: from csmtp3.b-one.net (csmtp3.one.com [195.47.247.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 47BF28FC1C;
	Wed, 12 Mar 2008 10:22:38 +0000 (UTC)
	(envelope-from bsd.luigi@alshome.be)
Received: from [128.70.15.100] (85.248-78-194.adsl-static.isp.belgacom.be
	[194.78.248.85])
	by csmtp3.b-one.net (Postfix) with ESMTP id 8B143100EC8D;
	Wed, 12 Mar 2008 11:22:36 +0100 (CET)
Message-ID: <47D7AE70.3020305@alshome.be>
Date: Wed, 12 Mar 2008 11:20:32 +0100
From: Luigi <bsd.luigi@alshome.be>
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.org>, 
	freebsd-arch <freebsd-arch@freebsd.org>,
	freebsd-doc <freebsd-doc@freebsd.org>
References: <47D7A387.5020707@alshome.be>
	<20080312101301.X29518@fledge.watson.org>
In-Reply-To: <20080312101301.X29518@fledge.watson.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Cc: 
Subject: Re: study about Kernels
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: bsd.luigi@alshome.be
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 10:22:38 -0000

OK, thank you very much. I'll examine the OpenSolaris kernel too.

Robert Watson wrote:
> On Wed, 12 Mar 2008, Luigi wrote:
>
>> I am doing a study of kernels. The goal is to compare the architectures
>> of open-source kernels. I would like to examine the BSD, Darwin and Linux
>> kernels.
>>
>> Who can help me, and where can I find documentation?
>>
>> Thank you very much for your help.
>
> I don't think you'll find much in the way of serious documentation of 
> the differences.  If doing a comparative study, I'd encourage you also 
> to take a look at the OpenSolaris kernel.  However, you can find all 
> the source trees here:
>
>   http://fxr.watson.org/
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge


From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 11:30:48 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D98241065670
	for <arch@freebsd.org>; Wed, 12 Mar 2008 11:30:48 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.250])
	by mx1.freebsd.org (Postfix) with ESMTP id A0D788FC28
	for <arch@freebsd.org>; Wed, 12 Mar 2008 11:30:48 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: by an-out-0708.google.com with SMTP id c14so873755anc.13
	for <arch@freebsd.org>; Wed, 12 Mar 2008 04:30:47 -0700 (PDT)
Received: by 10.100.108.20 with SMTP id g20mr15988543anc.8.1205321447364;
	Wed, 12 Mar 2008 04:30:47 -0700 (PDT)
Received: by 10.100.8.6 with HTTP; Wed, 12 Mar 2008 04:30:47 -0700 (PDT)
Message-ID: <e7db6d980803120430n3103588dh22b160979c60827e@mail.gmail.com>
Date: Wed, 12 Mar 2008 04:30:47 -0700
From: "Peter Wemm" <peter@wemm.org>
To: "David Xu" <davidxu@freebsd.org>
In-Reply-To: <e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
Cc: arch@freebsd.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 11:30:49 -0000

On Wed, Mar 12, 2008 at 1:25 AM, Peter Wemm <peter@wemm.org> wrote:
> On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
>  > Jeff Roberson wrote:
>  >  > http://people.freebsd.org/~jeff/amd64.diff
>  >
>  >  This is a good idea. In fact, according to the calling convention, some
>  >  registers do not need to be saved across a function call, e.g. on
>  >  i386, eax, edx, and ecx. :-) But gdb may need them to dig out
>  >  stack variables' values.
>
>  Jeff and I have been having a friendly "competition" today.
>
>  With a UP kernel and INVARIANTS, my initial counter-patch response had
>  nearly double the gain on my machine.  (Jeff 7%, mine: 13.5%).
>  I changed to compile kernels the same as he did (no invariants, SMP
>  kernel, but kern.smp.disabled=1).  After that, our patch sets were the
>  same again - both at about 10% gain over baseline.
>
>  I've made a few more changes and am now at 23% improvement over baseline.
>
>  I'm not confident in the testing methodology.  More tests are in progress.

I've found a couple of pthreads test cases where Jeff's version is a
couple of percent slower than the baseline, and mine is either the
same or a couple of percent faster.

His:
Difference at 95.0% confidence
        0.0921053 +/- 0.0648113
        2.6455% +/- 1.86155%
(2.6% longer to run the test)
Mine:
No difference proven at 95.0% confidence

Same test, different kernel options:
His:
No difference proven at 95.0% confidence
Mine:
Difference at 95.0% confidence
        -0.2055 +/- 0.204382
        -4.06086% +/- 4.03877%

But my favourite one is Jeff's preferred test configuration:
His:
Difference at 95.0% confidence
        -0.668 +/- 0.047188
        -10.9896% +/- 0.776309%
Mine:
Difference at 95.0% confidence
        -1.457 +/- 0.0290925
        -23.9697% +/- 0.478613%

(11% less time vs 24% less time for the test)
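The percentage lines can be cross-checked against the absolute ones. Assuming ministat(1) divides the difference by the mean of the first (baseline) data set, the two numbers together recover that baseline mean; baseline_mean() is a hypothetical helper for the arithmetic.

```c
/*
 * Recover the baseline mean from an absolute difference and the
 * matching percentage difference: mean = diff / (pct / 100).
 */
static double
baseline_mean(double abs_diff, double pct_diff)
{
	return (abs_diff / (pct_diff / 100.0));
}
```

For Jeff's preferred configuration, -0.668 / -10.9896% and -1.457 / -23.9697% both give a baseline mean of about 6.08 (in whatever unit the test reported), so the two results were measured against the same baseline.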

This stuff directly affects latency with ithreads, kthreads, task
queues, etc and should show up on networking benchmarks.

I'm moving over to testing in an otherwise virgin cvs tree, since my
p4 tree is somewhat polluted.  More numbers tomorrow.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 12:05:05 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D21491065680
	for <arch@freebsd.org>; Wed, 12 Mar 2008 12:05:05 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from QMTA08.westchester.pa.mail.comcast.net
	(qmta08.westchester.pa.mail.comcast.net [76.96.62.80])
	by mx1.freebsd.org (Postfix) with ESMTP id 6FC138FC1D
	for <arch@freebsd.org>; Wed, 12 Mar 2008 12:05:04 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from OMTA01.westchester.pa.mail.comcast.net ([76.96.62.11])
	by QMTA08.westchester.pa.mail.comcast.net with comcast
	id 0B911Z0030EZKEL5802v00; Wed, 12 Mar 2008 11:48:24 +0000
Received: from discordia ([24.61.189.203])
	by OMTA01.westchester.pa.mail.comcast.net with comcast
	id 0Bp41Z0054PktZC3M00000; Wed, 12 Mar 2008 11:49:04 +0000
X-Authority-Analysis: v=1.0 c=1 a=Pj4_Y536NHEA:10 a=0RHiSNdv7K1cofAIcD8A:9
	a=c4nhsv1tePooTTT5wNglNuvk8YcA:4 a=50e4U0PicR4A:10
	a=-rtcXVvtY7498SX-e9UA:9
	a=34nm3S-1UQFYRJjl--IA:7 a=XnLui3lyQwI9Xcj7tu-HLo7BWC8A:4
	a=NfA2RSpTaHsA:10
Received: by discordia (Postfix, from userid 103)
	id 1C5351636F9; Wed, 12 Mar 2008 07:49:04 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia
X-Spam-Level: 
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.1.8-gr1
Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by discordia (Postfix) with ESMTP id 7440D1636F8
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 07:48:53 -0400 (EDT)
Message-ID: <47D7C25D.5070908@cokane.org>
Date: Wed, 12 Mar 2008 07:45:33 -0400
From: Coleman Kane <cokane@cokane.org>
User-Agent: Thunderbird 2.0.0.12 (X11/20080304)
MIME-Version: 1.0
To: arch@FreeBSD.org
Content-Type: multipart/mixed; boundary="------------000300090804080100080401"
Cc: 
Subject: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 12:05:05 -0000

This is a multi-part message in MIME format.
--------------000300090804080100080401
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

I was poking around SMPTODO for some work during an idle night, and I 
decided to fix the non-MPSAFE use of timeout(9) in ffs_softdep.c, and 
learn more about the callout_* API in the kernel. I'm attaching a patch 
of what I've done, which I am running in my current kernel at the moment 
(and I am using softupdates on a number of filesystems on this SMP machine).

Can anyone else try it out / review it / give feedback?

--
Coleman Kane


--------------000300090804080100080401
Content-Type: text/x-patch;
 name="ffs_softdep.c-newcallout.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="ffs_softdep.c-newcallout.diff"

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index 3e8ba26..3e9122f 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -664,7 +664,7 @@ static int maxindirdeps = 50;	/* max number of indirdeps before slowdown */
 static int tickdelay = 2;	/* number of ticks to pause during slowdown */
 static int proc_waiting;	/* tracks whether we have a timeout posted */
 static int *stat_countp;	/* statistic to count in proc_waiting timeout */
-static struct callout_handle handle; /* handle on posted proc_waiting timeout */
+static struct callout softdep_callout;
 static int req_pending;
 static int req_clear_inodedeps;	/* syncer process flush some inodedeps */
 #define FLUSH_INODES		1
@@ -1394,6 +1394,9 @@ softdep_initialize()
 	bioops.io_complete = softdep_disk_write_complete;
 	bioops.io_deallocate = softdep_deallocate_dependencies;
 	bioops.io_countdeps = softdep_count_dependencies;
+
+	/* Initialize the callout with an mtx. */
+	callout_init_mtx(&softdep_callout, &lk, 0);
 }
 
 /*
@@ -1403,7 +1406,9 @@ softdep_initialize()
 void
 softdep_uninitialize()
 {
-
+	ACQUIRE_LOCK(&lk);
+	callout_drain(&softdep_callout);
+	FREE_LOCK(&lk);
 	hashdestroy(pagedep_hashtbl, M_PAGEDEP, pagedep_hash);
 	hashdestroy(inodedep_hashtbl, M_INODEDEP, inodedep_hash);
 	hashdestroy(newblk_hashtbl, M_NEWBLK, newblk_hash);
@@ -5858,8 +5863,16 @@ request_cleanup(mp, resource)
 	 * We wait at most tickdelay before proceeding in any case.
 	 */
 	proc_waiting += 1;
-	if (handle.callout == NULL)
-		handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
+	ACQUIRE_LOCK(&lk);
+	if(callout_active(&softdep_callout) == FALSE) {
+		/* 
+			 should always return zero due to callout_active being called to verify that no active
+			 timeout already exists, which is the case where this would return non-zero (and
+			 callout_active(&softdep_callout) would be TRUE.
+    */
+		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
+	}
+	FREE_LOCK(&lk);
 	msleep((caddr_t)&proc_waiting, &lk, PPAUSE, "softupdate", 0);
 	proc_waiting -= 1;
 	return (1);
@@ -5873,15 +5886,17 @@ static void
 pause_timer(arg)
 	void *arg;
 {
-
-	ACQUIRE_LOCK(&lk);
+	/* Implied by callout_* API */
+	/* ACQUIRE_LOCK(&lk); */
 	*stat_countp += 1;
 	wakeup_one(&proc_waiting);
-	if (proc_waiting > 0)
-		handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
-	else
-		handle.callout = NULL;
-	FREE_LOCK(&lk);
+	if (proc_waiting > 0) {
+		/* We don't care about the return value here. */
+		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
+	} else {
+		callout_deactivate(&softdep_callout);
+	}
+	/* FREE_LOCK(&lk); */
 }
 
 /*

--------------000300090804080100080401--

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 13:53:54 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D9DE7106567D
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 13:53:54 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id C38F68FC14
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 13:53:54 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from zion.baldwin.cx (66-23-211-162.clients.speedfactory.net
	[66.23.211.162]) by elvis.mu.org (Postfix) with ESMTP id C89881A4D8B;
	Wed, 12 Mar 2008 06:53:01 -0700 (PDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Wed, 12 Mar 2008 09:45:28 -0400
User-Agent: KMail/1.9.7
References: <47D7C25D.5070908@cokane.org>
In-Reply-To: <47D7C25D.5070908@cokane.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803120945.29018.jhb@freebsd.org>
Cc: Coleman Kane <cokane@cokane.org>
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 13:53:55 -0000

On Wednesday 12 March 2008 07:45:33 am Coleman Kane wrote:
> Hi all,
>
> I was poking around SMPTODO for some work during an idle night, and I
> decided to fix the non-MPSAFE use of timeout(9) in ffs_softdep.c, and
> learn more about the callout_* API in the kernel. I'm attaching a patch
> of what I've done, which I am running in my current kernel at the moment
> (and I am using softupdates on a number of filesystems on this SMP
> machine).
>
> Can anyone else try it out / review it / give feedback?
> 
> @@ -1403,7 +1406,9 @@ softdep_initialize()
>  void
>  softdep_uninitialize()
>  {
> -
> +       ACQUIRE_LOCK(&lk);
> +       callout_drain(&softdep_callout);
> +       FREE_LOCK(&lk);
>         hashdestroy(pagedep_hashtbl, M_PAGEDEP, pagedep_hash);
>         hashdestroy(inodedep_hashtbl, M_INODEDEP, inodedep_hash);
>         hashdestroy(newblk_hashtbl, M_NEWBLK, newblk_hash);

Don't hold the mutex over a drain, and keep the blank line at the start of
the function (style(9)).

> @@ -5858,8 +5863,16 @@ request_cleanup(mp, resource)
>          * We wait at most tickdelay before proceeding in any case.
>          */
>         proc_waiting += 1;
> -       if (handle.callout == NULL)
> -               handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
> +       ACQUIRE_LOCK(&lk);
> +       if(callout_active(&softdep_callout) == FALSE) {
> +               /* 
> +                        should always return zero due to callout_active being called to verify that no active
> +                        timeout already exists, which is the case where this would return non-zero (and
> +                        callout_active(&softdep_callout) would be TRUE.
> +    */
> +               callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
> +       }
> +       FREE_LOCK(&lk);
>         msleep((caddr_t)&proc_waiting, &lk, PPAUSE, "softupdate", 0);
>         proc_waiting -= 1;
>         return (1);

The lock is already held, so there is no need to lock it again.  Also, put a
space after 'if'.  I'm not sure the new comment is needed, as the reader can
already infer that from the callout_active() test.  Also, I think you really
want callout_pending() rather than callout_active(), since if pause_timer()
executes normally without rescheduling itself, the callout will still be
marked active, and the next time this function is invoked it won't schedule
the callout.
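The difference between the two flags can be shown with a toy user-space model. It follows the callout(9) manual page, which says the active flag is cleared by callout_stop(), callout_drain() and callout_deactivate() but not when a callout expires normally; the struct and functions below are hypothetical and are not the kernel implementation.

```c
#include <stdbool.h>

struct toy_callout {
	bool pending;	/* armed and waiting to fire */
	bool active;	/* set by reset, cleared only by stop/drain/deactivate */
};

/* Model of callout_reset(): arms the timer and marks it active. */
static void
toy_reset(struct toy_callout *c)
{
	c->pending = true;
	c->active = true;
}

/* The timer fires and the handler does not reschedule itself. */
static void
toy_fire_without_reschedule(struct toy_callout *c)
{
	c->pending = false;	/* 'active' is deliberately left set */
}

/* The two candidate guards for re-arming in request_cleanup(). */
static bool
would_schedule_if_not_active(const struct toy_callout *c)
{
	return (!c->active);
}

static bool
would_schedule_if_not_pending(const struct toy_callout *c)
{
	return (!c->pending);
}
```

In this model, once the handler has run without rescheduling, a !callout_active() guard never arms the timer again, while a !callout_pending() guard does.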

> @@ -5873,15 +5886,17 @@ static void
>  pause_timer(arg)
>         void *arg;
>  {
> -
> -       ACQUIRE_LOCK(&lk);
> +       /* Implied by callout_* API */
> +       /* ACQUIRE_LOCK(&lk); */
>         *stat_countp += 1;
>         wakeup_one(&proc_waiting);
> -       if (proc_waiting > 0)
> -               handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
> -       else
> -               handle.callout = NULL;
> -       FREE_LOCK(&lk);
> +       if (proc_waiting > 0) {
> +               /* We don't care about the return value here. */
> +               callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
> +       } else {
> +               callout_deactivate(&softdep_callout);
> +       }
> +       /* FREE_LOCK(&lk); */
>  }

No need to use callout_deactivate() here, the callout is already deactivated
when it is invoked.  I think you can also leave out the comment about the
return value as the vast majority of places in the kernel that call
callout_reset() ignore the return value, so it is a common practice.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 14:30:06 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 67C42106567B
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 14:30:06 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from QMTA04.westchester.pa.mail.comcast.net
	(qmta04.westchester.pa.mail.comcast.net [76.96.62.40])
	by mx1.freebsd.org (Postfix) with ESMTP id 0544D8FC19
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 14:30:05 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from OMTA04.westchester.pa.mail.comcast.net ([76.96.62.35])
	by QMTA04.westchester.pa.mail.comcast.net with comcast
	id 08yg1Z00D0ldTLk540US00; Wed, 12 Mar 2008 14:19:09 +0000
Received: from discordia ([24.61.189.203])
	by OMTA04.westchester.pa.mail.comcast.net with comcast
	id 0EL31Z00R4PktZC3Q00000; Wed, 12 Mar 2008 14:20:04 +0000
X-Authority-Analysis: v=1.0 c=1 a=yWIViUiLWPYA:10 a=CUMa_SbteGx0BZgX1pgA:9
	a=HrSz8c6paNKFlKRnIwkA:7 a=LGVNa8fhoMo1oVVE5VAFUJlUgBwA:4
	a=zUBsD6tbDSsA:10
	a=-rtcXVvtY7498SX-e9UA:9 a=lEhTTu5oXNJMJ4XxhzkA:7
	a=DneVQdTm6Pk93g1E2aIrDb4HF0cA:4 a=NfA2RSpTaHsA:10
Received: by discordia (Postfix, from userid 103)
	id BEB961636F9; Wed, 12 Mar 2008 10:20:03 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia
X-Spam-Level: 
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.1.8-gr1
Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by discordia (Postfix) with ESMTP id F31B31636F8;
	Wed, 12 Mar 2008 10:19:51 -0400 (EDT)
Message-ID: <47D7E5BF.2060102@cokane.org>
Date: Wed, 12 Mar 2008 10:16:31 -0400
From: Coleman Kane <cokane@cokane.org>
User-Agent: Thunderbird 2.0.0.12 (X11/20080304)
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
In-Reply-To: <200803120945.29018.jhb@freebsd.org>
Content-Type: multipart/mixed; boundary="------------080002040909020109040007"
Cc: freebsd-arch@freebsd.org
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 14:30:06 -0000

This is a multi-part message in MIME format.
--------------080002040909020109040007
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

John Baldwin wrote:
> On Wednesday 12 March 2008 07:45:33 am Coleman Kane wrote:
>   
>> Hi all,
>>
>> I was poking around SMPTODO for some work during an idle night, and I
>> decided to fix the non-MPSAFE use of timeout(9) in ffs_softdep.c, and
>> learn more about the callout_* API in the kernel. I'm attaching a patch
>> of what I've done, which I am running in my current kernel at the moment
>> (and I am using softupdates on a number of filesystems on this SMP
>> machine).
>>
>> Can anyone else try it out / review it / give feedback?
>>
>> @@ -1403,7 +1406,9 @@ softdep_initialize()
>>  void
>>  softdep_uninitialize()
>>  {
>> -
>> +       ACQUIRE_LOCK(&lk);
>> +       callout_drain(&softdep_callout);
>> +       FREE_LOCK(&lk);
>>         hashdestroy(pagedep_hashtbl, M_PAGEDEP, pagedep_hash);
>>         hashdestroy(inodedep_hashtbl, M_INODEDEP, inodedep_hash);
>>         hashdestroy(newblk_hashtbl, M_NEWBLK, newblk_hash);
>>     
>
> Don't hold the mutex over a drain and leave the blank line at the start of the
> function (style(9)).
>   
Thanks. This point was not completely clear from the man page (whether 
to hold the lock around the drain or not). I went looking for examples 
of this; had I looked further, I would have found my answer in 
bge_detach() in if_bge.c.
>   
>> @@ -5858,8 +5863,16 @@ request_cleanup(mp, resource)
>>          * We wait at most tickdelay before proceeding in any case.
>>          */
>>         proc_waiting += 1;
>> -       if (handle.callout == NULL)
>> -               handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
>> +       ACQUIRE_LOCK(&lk);
>> +       if(callout_active(&softdep_callout) == FALSE) {
>> +               /* 
>> +                        should always return zero due to callout_active being called to verify that no active
>> +                        timeout already exists, which is the case where this would return non-zero (and
>> +                        callout_active(&softdep_callout) would be TRUE.
>> +    */
>> +               callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
>> +       }
>> +       FREE_LOCK(&lk);
>>         msleep((caddr_t)&proc_waiting, &lk, PPAUSE, "softupdate", 0);
>>         proc_waiting -= 1;
>>         return (1);
>>     
>
> The lock is already held, so no need to lock it again.  Also, space after
> 'if'.  I'm not sure the new comment is needed as the reader can already
> infer that from the callout_active() test.  Also, I think you really want
> callout_pending() rather than callout_active(), because if pause_timer()
> executes normally without rescheduling itself, the callout will still be
> marked active, and the next time this function is invoked it won't
> schedule the callout.
>   
Thanks, I see this now. Every caller of request_cleanup() seems to 
already hold lk. This also resolves the callout_deactivate() question below.
>   
>> @@ -5873,15 +5886,17 @@ static void
>>  pause_timer(arg)
>>         void *arg;
>>  {
>> -
>> -       ACQUIRE_LOCK(&lk);
>> +       /* Implied by callout_* API */
>> +       /* ACQUIRE_LOCK(&lk); */
>>         *stat_countp += 1;
>>         wakeup_one(&proc_waiting);
>> -       if (proc_waiting > 0)
>> -               handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
>> -       else
>> -               handle.callout = NULL;
>> -       FREE_LOCK(&lk);
>> +       if (proc_waiting > 0) {
>> +               /* We don't care about the return value here. */
>> +               callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
>> +       } else {
>> +               callout_deactivate(&softdep_callout);
>> +       }
>> +       /* FREE_LOCK(&lk); */
>>  }
>>     
>
> No need to use callout_deactivate() here, the callout is already deactivated
> when it is invoked.  I think you can also leave out the comment about the
> return value as the vast majority of places in the kernel that call
> callout_reset() ignore the return value, so it is a common practice.
>   
Technically, the callout is merely no longer "pending"; according to 
the man page, it is not deactivated when pause_timer() returns. 
Nonetheless, your pointer above about s/callout_active/callout_pending/ 
makes the check here unnecessary, and I'm sure that's what you 
meant by this comment.

I am attaching the revised patch.

--
Coleman Kane


--------------080002040909020109040007
Content-Type: text/x-patch;
 name="ffs_softdep.c-newcallout2.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="ffs_softdep.c-newcallout2.diff"

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index 3e8ba26..d5c8536 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -664,7 +664,7 @@ static int maxindirdeps = 50;	/* max number of indirdeps before slowdown */
 static int tickdelay = 2;	/* number of ticks to pause during slowdown */
 static int proc_waiting;	/* tracks whether we have a timeout posted */
 static int *stat_countp;	/* statistic to count in proc_waiting timeout */
-static struct callout_handle handle; /* handle on posted proc_waiting timeout */
+static struct callout softdep_callout;
 static int req_pending;
 static int req_clear_inodedeps;	/* syncer process flush some inodedeps */
 #define FLUSH_INODES		1
@@ -1394,6 +1394,9 @@ softdep_initialize()
 	bioops.io_complete = softdep_disk_write_complete;
 	bioops.io_deallocate = softdep_deallocate_dependencies;
 	bioops.io_countdeps = softdep_count_dependencies;
+
+	/* Initialize the callout with an mtx. */
+	callout_init_mtx(&softdep_callout, &lk, 0);
 }
 
 /*
@@ -1404,6 +1407,7 @@ void
 softdep_uninitialize()
 {
 
+	callout_drain(&softdep_callout);
 	hashdestroy(pagedep_hashtbl, M_PAGEDEP, pagedep_hash);
 	hashdestroy(inodedep_hashtbl, M_INODEDEP, inodedep_hash);
 	hashdestroy(newblk_hashtbl, M_NEWBLK, newblk_hash);
@@ -5858,8 +5862,9 @@ request_cleanup(mp, resource)
 	 * We wait at most tickdelay before proceeding in any case.
 	 */
 	proc_waiting += 1;
-	if (handle.callout == NULL)
-		handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
+	if (callout_pending(&softdep_callout) == FALSE) {
+		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
+	}
 	msleep((caddr_t)&proc_waiting, &lk, PPAUSE, "softupdate", 0);
 	proc_waiting -= 1;
 	return (1);
@@ -5874,14 +5879,12 @@ pause_timer(arg)
 	void *arg;
 {
 
-	ACQUIRE_LOCK(&lk);
+	/* The callout_ API has acquired mtx and will hold it around this function call. */
 	*stat_countp += 1;
 	wakeup_one(&proc_waiting);
-	if (proc_waiting > 0)
-		handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
-	else
-		handle.callout = NULL;
-	FREE_LOCK(&lk);
+	if (proc_waiting > 0) {
+		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
+	}
 }
 
 /*

--------------080002040909020109040007--

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 15:10:33 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F03071065679
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 15:10:33 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id 7899E8FC1E
	for <freebsd-arch@freebsd.org>; Wed, 12 Mar 2008 15:10:32 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235191287-1834499 
	for multiple; Wed, 12 Mar 2008 11:08:33 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2CFAMTJ015215;
	Wed, 12 Mar 2008 11:10:23 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: Coleman Kane <cokane@cokane.org>
Date: Wed, 12 Mar 2008 10:58:03 -0400
User-Agent: KMail/1.9.7
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
In-Reply-To: <47D7E5BF.2060102@cokane.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803121058.04096.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Wed, 12 Mar 2008 11:10:23 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6206/Wed Mar 12 07:16:10 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: freebsd-arch@freebsd.org
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 15:10:34 -0000

On Wednesday 12 March 2008 10:16:31 am Coleman Kane wrote:
> I am attaching the revised patch.

Looks good.  I would perhaps not add the extra {}'s around the single-line if 
clauses as it slightly obfuscates the diff (style(9) actually suggests no 
{}'s in that case, but I think in practice our sources have a mixture of 
both).

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 15:28:06 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 863D4106566B
	for <arch@freebsd.org>; Wed, 12 Mar 2008 15:28:06 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from QMTA09.emeryville.ca.mail.comcast.net
	(qmta09.emeryville.ca.mail.comcast.net [76.96.30.96])
	by mx1.freebsd.org (Postfix) with ESMTP id 65DA88FC20
	for <arch@freebsd.org>; Wed, 12 Mar 2008 15:28:05 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from OMTA12.emeryville.ca.mail.comcast.net ([76.96.30.44])
	by QMTA09.emeryville.ca.mail.comcast.net with comcast
	id 0EVA1Z0010x6nqcA902u00; Wed, 12 Mar 2008 15:11:13 +0000
Received: from discordia ([24.61.189.203])
	by OMTA12.emeryville.ca.mail.comcast.net with comcast
	id 0FC21Z00H4PktZC8Y00000; Wed, 12 Mar 2008 15:12:03 +0000
X-Authority-Analysis: v=1.0 c=1 a=yWIViUiLWPYA:10 a=i5KzmmKIZ9Ex48Ii2mMA:9
	a=QSIy1mGstWZsU_lFX9-BSgP05_sA:4 a=oltf0pfCdT4A:10
	a=-rtcXVvtY7498SX-e9UA:9
	a=lEhTTu5oXNJMJ4XxhzkA:7 a=n-E81eKMzrkRIg9F2d6QuEMDThgA:4
	a=NfA2RSpTaHsA:10
Received: by discordia (Postfix, from userid 103)
	id 429AC1636FA; Wed, 12 Mar 2008 11:12:02 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia
X-Spam-Level: 
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.1.8-gr1
Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by discordia (Postfix) with ESMTP id D1E001636F8;
	Wed, 12 Mar 2008 11:11:50 -0400 (EDT)
Message-ID: <47D7F1EC.6040802@cokane.org>
Date: Wed, 12 Mar 2008 11:08:28 -0400
From: Coleman Kane <cokane@cokane.org>
User-Agent: Thunderbird 2.0.0.12 (X11/20080304)
MIME-Version: 1.0
To: obrien@FreeBSD.org
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
In-Reply-To: <20080312145734.GB26812@dragon.NUXI.org>
Content-Type: multipart/mixed; boundary="------------020807070709030700060709"
Cc: arch@FreeBSD.org, "jh >> John Baldwin" <jhb@FreeBSD.org>
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 15:28:06 -0000

This is a multi-part message in MIME format.
--------------020807070709030700060709
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

David O'Brien wrote:
> On Wed, Mar 12, 2008 at 10:16:31AM -0400, Coleman Kane wrote:
>   
>> I am attaching the revised patch.
>>     
> ..
>   
>> +		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
>>     
>
> Wrap long line.
>
>   
>> +	/* The callout_ API has acquired mtx and will hold it around this function call. */
>>     
>
> Ditto.
>
>   
>> +		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2, pause_timer, 0);
>>     
>
> Ditto.
>   
Third try at the patch, properly adjusting my vim tabs to 8 spaces as 
they should be so that I can follow style(9).

--
Coleman Kane


--------------020807070709030700060709
Content-Type: text/x-patch;
 name="ffs_softdep.c-newcallout3.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="ffs_softdep.c-newcallout3.diff"

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index 3e8ba26..457ed49 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -664,7 +664,7 @@ static int maxindirdeps = 50;	/* max number of indirdeps before slowdown */
 static int tickdelay = 2;	/* number of ticks to pause during slowdown */
 static int proc_waiting;	/* tracks whether we have a timeout posted */
 static int *stat_countp;	/* statistic to count in proc_waiting timeout */
-static struct callout_handle handle; /* handle on posted proc_waiting timeout */
+static struct callout softdep_callout;
 static int req_pending;
 static int req_clear_inodedeps;	/* syncer process flush some inodedeps */
 #define FLUSH_INODES		1
@@ -1394,6 +1394,9 @@ softdep_initialize()
 	bioops.io_complete = softdep_disk_write_complete;
 	bioops.io_deallocate = softdep_deallocate_dependencies;
 	bioops.io_countdeps = softdep_count_dependencies;
+
+	/* Initialize the callout with an mtx. */
+	callout_init_mtx(&softdep_callout, &lk, 0);
 }
 
 /*
@@ -1404,6 +1407,7 @@ void
 softdep_uninitialize()
 {
 
+	callout_drain(&softdep_callout);
 	hashdestroy(pagedep_hashtbl, M_PAGEDEP, pagedep_hash);
 	hashdestroy(inodedep_hashtbl, M_INODEDEP, inodedep_hash);
 	hashdestroy(newblk_hashtbl, M_NEWBLK, newblk_hash);
@@ -5858,8 +5862,10 @@ request_cleanup(mp, resource)
 	 * We wait at most tickdelay before proceeding in any case.
 	 */
 	proc_waiting += 1;
-	if (handle.callout == NULL)
-		handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
+	if (callout_pending(&softdep_callout) == FALSE) {
+		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2,
+		    pause_timer, 0);
+	}
 	msleep((caddr_t)&proc_waiting, &lk, PPAUSE, "softupdate", 0);
 	proc_waiting -= 1;
 	return (1);
@@ -5874,14 +5880,16 @@ pause_timer(arg)
 	void *arg;
 {
 
-	ACQUIRE_LOCK(&lk);
+	/*
+	 * The callout_ API has acquired mtx and will hold it around this
+	 * function call.
+	 */
 	*stat_countp += 1;
 	wakeup_one(&proc_waiting);
-	if (proc_waiting > 0)
-		handle = timeout(pause_timer, 0, tickdelay > 2 ? tickdelay : 2);
-	else
-		handle.callout = NULL;
-	FREE_LOCK(&lk);
+	if (proc_waiting > 0) {
+		callout_reset(&softdep_callout, tickdelay > 2 ? tickdelay : 2,
+		    pause_timer, 0);
+	}
 }
 
 /*

--------------020807070709030700060709--

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 22:23:12 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B143A1065673
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 22:23:12 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 7EB7A8FC14
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 22:23:12 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2CMN5VK077024; Wed, 12 Mar 2008 18:23:11 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Wed, 12 Mar 2008 12:24:07 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: Gary Jennejohn <gary.jennejohn@freenet.de>
In-Reply-To: <20080312110229.5aeefc1f@peedub.jennejohn.org>
Message-ID: <20080312122300.Y1091@desktop>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<20080312110229.5aeefc1f@peedub.jennejohn.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@FreeBSD.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 22:23:12 -0000


On Wed, 12 Mar 2008, Gary Jennejohn wrote:

> On Wed, 12 Mar 2008 12:14:36 +0800
> David Xu <davidxu@FreeBSD.org> wrote:
>
>> Jeff Roberson wrote:
>>> http://people.freebsd.org/~jeff/amd64.diff
>>
>> This is a good idea. In fact, according to the calling convention, some
>> registers need not be saved across a function call, e.g. on
>> i386, eax, edx, and ecx. :-) But gdb may need them to dig out
>> stack variables' values.
>>
>
> I applied this patch yesterday on an AMD64 X2 box and got this panic
> today after I started X:
>
> Unread portion of the kernel message buffer:
> panic: smp_tlb_shootdown: interrupts disabled
> cpuid = 0
> Uptime: 47s
> Physical memory: 3062 MB
> Dumping 169 MB: 154 138 122 106 90 74 58 42 26 10
>
> That's all the useful information which I have because the back trace
> is corrupted.
>
> BTW I'm using SCHED_ULE.
>
> Maybe I shouldn't have tried this patch yet since it doesn't seem to be SMP
> ready.

Thanks for testing.  I just ran into that panic myself.  I don't think 
it's an SMP problem.  In general, things on arch@ are sometimes more 
experimental than things we mail to current@ asking for people to test.

Thanks,
Jeff

>
> ---
> Gary Jennejohn
>

From owner-freebsd-arch@FreeBSD.ORG  Wed Mar 12 23:51:27 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 07C341065675
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 23:51:27 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from mail.farley.org (farley.org [67.64.95.201])
	by mx1.freebsd.org (Postfix) with ESMTP id A0FE28FC24
	for <arch@FreeBSD.org>; Wed, 12 Mar 2008 23:51:26 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from thor.farley.org (thor.farley.org [192.168.1.5])
	by mail.farley.org (8.14.2/8.14.2) with ESMTP id m2CNOL5d035933;
	Wed, 12 Mar 2008 18:24:21 -0500 (CDT) (envelope-from scf@FreeBSD.org)
Date: Wed, 12 Mar 2008 18:24:21 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: Coleman Kane <cokane@cokane.org>
In-Reply-To: <47D7F1EC.6040802@cokane.org>
Message-ID: <alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
	<47D7F1EC.6040802@cokane.org>
User-Agent: Alpine 1.00 (BSF 882 2007-12-20)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.4
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on mail.farley.org
Cc: arch@FreeBSD.org
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Mar 2008 23:51:27 -0000

On Wed, 12 Mar 2008, Coleman Kane wrote:

> Third try at the patch, properly adjusting my vim tabs to 8 spaces as
> they should be so that I can follow style(9).

I wrote a function[1] last year to configure vim to follow style(9).
Just run ':call FreeBSD_Style()' while editing a file.

Sean
   1. http://www.farley.org/freebsd/tmp/VIM/FreeBSD.vim
-- 
scf@FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 01:41:24 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1F5ED106566C
	for <arch@freebsd.org>; Thu, 13 Mar 2008 01:41:24 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from QMTA02.emeryville.ca.mail.comcast.net
	(qmta02.emeryville.ca.mail.comcast.net [76.96.30.24])
	by mx1.freebsd.org (Postfix) with ESMTP id 0C54E8FC24
	for <arch@freebsd.org>; Thu, 13 Mar 2008 01:41:24 +0000 (UTC)
	(envelope-from cokane@cokane.org)
Received: from OMTA10.emeryville.ca.mail.comcast.net ([76.96.30.28])
	by QMTA02.emeryville.ca.mail.comcast.net with comcast
	id 0Ppk1Z0020cQ2SLA205y00; Thu, 13 Mar 2008 01:40:35 +0000
Received: from discordia ([24.61.189.203])
	by OMTA10.emeryville.ca.mail.comcast.net with comcast
	id 0RhN1Z0064PktZC8W00000; Thu, 13 Mar 2008 01:41:23 +0000
X-Authority-Analysis: v=1.0 c=1 a=yWIViUiLWPYA:10 a=cgcLz6ojAAAA:8
	a=WsdIhoTgK6fTm0dQQWAA:9 a=1IvbqGLK4eI6TWeiJ0fnpeGqzLUA:4
	a=BDXKcin-EtgA:10
Received: by discordia (Postfix, from userid 103)
	id 2C0AF1636FA; Wed, 12 Mar 2008 21:41:22 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia
X-Spam-Level: 
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.1.8-gr1
Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by discordia (Postfix) with ESMTP id 3747B1636F8;
	Wed, 12 Mar 2008 21:41:05 -0400 (EDT)
Message-ID: <47D88568.7000105@cokane.org>
Date: Wed, 12 Mar 2008 21:37:44 -0400
From: Coleman Kane <cokane@cokane.org>
User-Agent: Thunderbird 2.0.0.12 (X11/20080304)
MIME-Version: 1.0
To: "Sean C. Farley" <scf@FreeBSD.org>
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
	<47D7F1EC.6040802@cokane.org>
	<alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
In-Reply-To: <alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@FreeBSD.org
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 01:41:24 -0000

Sean C. Farley wrote:
> On Wed, 12 Mar 2008, Coleman Kane wrote:
>
>> Third try at the patch, properly adjusting my vim tabs to 8 spaces as
>> they should be so that I can follow style(9).
>
> I wrote a function[1] last year to configure vim to follow style(9).
> Just run ':call FreeBSD_Style()' while editing a file.
>
> Sean
>   1. http://www.farley.org/freebsd/tmp/VIM/FreeBSD.vim
Rock on.

This should be in the committers' guide or something.

--
Coleman

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 03:39:01 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 87AB41065670
	for <arch@FreeBSD.org>; Thu, 13 Mar 2008 03:39:01 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from mail.farley.org (farley.org [67.64.95.201])
	by mx1.freebsd.org (Postfix) with ESMTP id 642EA8FC1F
	for <arch@FreeBSD.org>; Thu, 13 Mar 2008 03:39:01 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from thor.farley.org (thor.farley.org [192.168.1.5])
	by mail.farley.org (8.14.2/8.14.2) with ESMTP id m2D3cwwf040244
	for <arch@freebsd.org>; Wed, 12 Mar 2008 22:38:58 -0500 (CDT)
	(envelope-from scf@FreeBSD.org)
Date: Wed, 12 Mar 2008 22:38:58 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: arch@FreeBSD.org
Message-ID: <alpine.BSF.1.00.0803122219250.75171@thor.farley.org>
User-Agent: Alpine 1.00 (BSF 882 2007-12-20)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.4
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on mail.farley.org
Cc: 
Subject: [RFC] struct grp related additions to libutil
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 03:39:01 -0000

I have written four functions related to struct grp processing that I
would like to add to libutil.  They are modeled in part after similar
calls in libutil/pw_util.c.  The calls are:
1. int gr_equal(const struct group *gr1, const struct group *gr2)
    Compares the values of two group structures.  It does a thorough, yet
    unoptimized comparison of all the members regardless of order.

2. char *gr_make(const struct group *gr)
    Creates a string (as would exist within /etc/group) from a group
    structure.

3. struct group *gr_dup(const struct group *gr)
    Duplicates a group structure.  The returned value is a contiguous block
    of memory.

4. struct group *gr_scan(const char *line)
    Creates a group structure from a string (as produced by gr_make()).


Questions:
1. What requirements are there for making additions/changes to libutil?
2. Will there be any issues with having gr_equal() return a bool?
    Currently, it is returning an int.

I made patches with regression tests for both HEAD[1] and RELENG_7[2].

Sean
   1. http://www.farley.org/freebsd/tmp/gr_util/libutil-grp-HEAD.patch
   2. http://www.farley.org/freebsd/tmp/gr_util/libutil-grp-RELENG_7.patch
-- 
scf@FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 06:22:52 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9BEC6106566C;
	Thu, 13 Mar 2008 06:22:52 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
	(fallbackmx07.syd.optusnet.com.au [211.29.132.9])
	by mx1.freebsd.org (Postfix) with ESMTP id E4B8C8FC25;
	Thu, 13 Mar 2008 06:22:46 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail18.syd.optusnet.com.au (mail18.syd.optusnet.com.au
	[211.29.132.199])
	by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2D2m0Qi018588; Thu, 13 Mar 2008 13:48:00 +1100
Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au
	(c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11])
	by mail18.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2D2lut1025522
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 13 Mar 2008 13:47:58 +1100
Date: Thu, 13 Mar 2008 13:47:56 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Peter Wemm <peter@wemm.org>
In-Reply-To: <e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
Message-ID: <20080313124213.J31200@delplex.bde.org>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 06:22:52 -0000

On Wed, 12 Mar 2008, Peter Wemm wrote:

> On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
>> Jeff Roberson wrote:
>> > http://people.freebsd.org/~jeff/amd64.diff
>>
>>  This is a good idea.

I wouldn't have expected it to make much difference.  On i386 UP,
cpu_switch() normally executes only 48 instructions for in-kernel
context switches in my version of 5.2 and only 61 instructions in
-current.  ~5.2 differs from 5.2 here only in not having to
switch %eflags.  This saves 4 instructions but much more in cycles,
especially in P4 where accesses to %eflags are very slow.  5.2 would
take 52 instructions, and -current has bloated by 9 instructions
relative to 5.2.

In-kernel switches are not a very typical case since they don't load
%cr3.  The 50-60 instructions might take as few as 20 cycles when
pipelined through 3 ALUs, but they are only moderately parallelizable
so would take more like 50-60 cycles on an Athlon.  The only very slow
instructions in them for the usual in-kernel case are the loads of
%eflags and %gs.  At least the latter is easy to optimize away, but
the former is associated with spin locking hard-disabling interrupts.
For userland context switches, there is also an ltr in the usual path
of execution.  But 100 or so cycles for the simple instructions is
noise compared with the cost of the TLB flush and other cache misses
caused by loading %cr3 for userland context switches.  Userland code
that does useful work will do more than sched_yield() so it will suffer
more from cache misses.

Layers above cpu_switch() have become very bloated and make a full
context switch take several hundred cycles for the simple instructions
on machines where the simple instructions in cpu_switch() take only
100.  Their overhead may be almost significant relative to the cache
misses.  However, this is another reason why the speed of the simple
instructions in cpu_switch() doesn't matter.

>>  In fact, according to the calling convention, some
>>  registers need not be saved across a function call, e.g. on
>>  i386, eax, edx, and ecx. :-) But gdb may need them to dig out
>>  a stack variable's value.

The asm code already saves only call-saved registers for both i386 and
amd64.  It saves call-saved registers even when it apparently doesn't
use them (lots more of these on amd64, while on i386 it uses more
call-saved registers than it needs to, apparently since this is free
after saving all call-saved registers).  I think saving more than is
needed is the result of confusion about what needs to be saved and/or
what is needed for debugging.

> Jeff and I have been having a friendly "competition" today.
>
> With a UP kernel and INVARIANTS, my initial counter-patch response had
> nearly double the gain on my machine.  (Jeff 7%, mine: 13.5%).
> I changed to compile kernels the same as he did (no invariants, SMP
> kernel, but kern.smp.disabled=1).  After that, our patch sets were the
> same again - both at about 10% gain over baseline.
>
> I've made a few more changes and am now at 23% improvement over baseline.
>
> I'm not confident of testing methodology.  More tests are in progress.
>
> The good news is that this tuning is finally being done.  It should
> have been done in 2003 though...

How is this possible with (according to my theory) most of the context
switch cost being for %cr3 and upper layers?  Unchanged amd64 has only
a few more costs than i386.  Mainly 3 unconditional wrmsr's and 2
unconditional rdmsr's for managing gsbase and fsbase.  I thought that
these were hard to avoid and anyway not nearly as expensive as %cr3 loads.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 07:28:20 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 229651065677;
	Thu, 13 Mar 2008 07:28:20 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id E4E798FC27;
	Thu, 13 Mar 2008 07:28:19 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2D7SDUk061861; Thu, 13 Mar 2008 03:28:14 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Wed, 12 Mar 2008 21:29:18 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20080313124213.J31200@delplex.bde.org>
Message-ID: <20080312211834.T1091@desktop>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
	<20080313124213.J31200@delplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, David Xu <davidxu@freebsd.org>,
	Peter Wemm <peter@wemm.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 07:28:20 -0000

On Thu, 13 Mar 2008, Bruce Evans wrote:

> On Wed, 12 Mar 2008, Peter Wemm wrote:
>
>> On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
>>> Jeff Roberson wrote:
>>> > http://people.freebsd.org/~jeff/amd64.diff
>>>
>>>  This is a good idea.
>
> I wouldn't have expected it to make much difference.  On i386 UP,
> cpu_switch() normally executes only 48 instructions for in-kernel
> context switches in my version of 5.2 and only 61 instructions in
> -current.  ~5.2 differs from 5.2 here only in not having to
> switch %eflags.  This saves 4 instructions but much more in cycles,
> especially in P4 where accesses to %eflags are very slow.  5.2 would
> take 52 instructions, and -current has bloated by 9 instructions
> relative to 5.2.

More expensive than the raw instruction count is:

1)  The mispredicted branches to deal with all of the optional state and 
features that are not always saved.
2)  The cost of the extra icache footprint from all of those unused
instructions, unaligned jumps, etc.

I haven't looked at i386 very closely lately but on amd64 the wrmsrs for 
fs/gsbase are very expensive.  On my 2GHz dual-core Opteron the optimized
switch seems to take about 100ns.  The total switch from userspace to 
userspace is about 4x that.

>
> In-kernel switches are not a very typical case since they don't load
> %cr3.  The 50-60 instructions might take as few as 20 cycles when
> pipelined through 3 ALUs, but they are only moderately parallelizable
> so would take more like 50-60 cycles on an Athlon.  The only very slow
> instructions in them for the usual in-kernel case are the loads of
> %eflags and %gs.  At least the latter is easy to optimize away, but
> the former is associated with spin locking hard-disabling interrupts.
> For userland context switches, there is also an ltr in the usual path
> of execution.  But 100 or so cycles for the simple instructions is
> noise compared with the cost of the TLB flush and other cache misses
> caused by loading %cr3 for userland context switches.  Userland code
> that does useful work will do more than sched_yield() so it will suffer
> more from cache misses.
>

We've been working on amd64 so I can't comment specifically about i386 
costs.  However, I definitely agree that cpu_switch() is not the greatest 
overhead in the path.  Also, you have to load cr3 even for kernel threads 
because the page directory page or page directory pointer table at %cr3 
can go away once you've switched out the old thread.

> Layers above cpu_switch() have become very bloated and make a full
> context switch take several hundred cycles for the simple instructions
> on machines where the simple instructions in cpu_switch() take only
> 100.  Their overhead may be almost significant relative to the cache
> misses.  However, this is another reason why the speed of the simple
> instructions in cpu_switch() doesn't matter.
>
>>>  In fact, according to the calling convention, some
>>>  registers need not be saved across a function call, e.g. on
>>>  i386, eax, edx, and ecx. :-) But gdb may need them to dig out
>>>  a stack variable's value.
>
> The asm code already saves only call-saved registers for both i386 and
> amd64.  It saves call-saved registers even when it apparently doesn't
> use them (lots more of these on amd64, while on i386 it uses more
> call-saved registers than it needs to, apparently since this is free
> after saving all call-saved registers).  I think saving more than is
> needed is the result of confusion about what needs to be saved and/or
> what is needed for debugging.

It has to save all of the callee-saved registers in the PCB because they
will likely differ from thread to thread.  Failing to save and restore 
them could leave you returning with the registers having different values 
and corrupt the calling function.

>
>> Jeff and I have been having a friendly "competition" today.
>> 
>> With a UP kernel and INVARIANTS, my initial counter-patch response had
>> nearly double the gain on my machine.  (Jeff 7%, mine: 13.5%).
>> I changed to compile kernels the same as he did (no invariants, SMP
>> kernel, but kern.smp.disabled=1).  After that, our patch sets were the
>> same again - both at about 10% gain over baseline.
>> 
>> I've made a few more changes and am now at 23% improvement over baseline.
>> 
>> I'm not confident of testing methodology.  More tests are in progress.
>> 
>> The good news is that this tuning is finally being done.  It should
>> have been done in 2003 though...
>
> How is this possible with (according to my theory) most of the context
> switch cost being for %cr3 and upper layers?  Unchanged amd64 has only
> a few more costs than i386.  Mainly 3 unconditional wrmsr's and 2
> unconditional rdmsr's for managing gsbase and fsbase.  I thought that
> these were hard to avoid and anyway not nearly as expensive as %cr3 loads.

%cr3 is actually a lot less expensive these days with page table flush 
filters and the PG_G bit.  We were able to optimize away setting the msrs 
in the case that the previous values match the new values.  Apparently the 
hardware doesn't optimize this case so we have to do comparisons 
ourselves.

That was a big chunk of the optimization.  Static branch hints, reordering 
code, possibly reordering for better pipeline scheduling in Peter's asm,
etc. provide the rest.

My primary motivation is to get ithread/kthread/taskqueue switch costs 
down for interrupt heavy applications.  There is a lot of unnecessary fat 
there.

Jeff

>
> Bruce
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 13:25:06 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 30D041065678
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 13:25:06 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id D5CA98FC29
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 13:25:05 +0000 (UTC)
	(envelope-from freebsd-arch@m.gmane.org)
Received: from root by ciao.gmane.org with local (Exim 4.43)
	id 1JZnQc-0005wG-6x
	for freebsd-arch@freebsd.org; Thu, 13 Mar 2008 13:25:02 +0000
Received: from 195.208.174.178 ([195.208.174.178])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 13:25:02 +0000
Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1
	(Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 13:25:02 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-arch@freebsd.org
From: Vadim Goncharov <vadim_nuclight@mail.ru>
Date: Thu, 13 Mar 2008 13:24:49 +0000 (UTC)
Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel
Lines: 20
Message-ID: <slrnftiap1.106o.vadim_nuclight@hostel.avtf.net>
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
	<47D7F1EC.6040802@cokane.org>
	<alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
	<47D88568.7000105@cokane.org>
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: 195.208.174.178
X-Comment-To: Coleman Kane
User-Agent: slrn/0.9.8.1 (FreeBSD)
Sender: news <news@ger.gmane.org>
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: vadim_nuclight@mail.ru
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 13:25:06 -0000

Hi Coleman Kane! 

On Wed, 12 Mar 2008 21:37:44 -0400; Coleman Kane wrote about 'Re: SMPTODO: remove timeout(9) from ffs_softdep.c':

>>> Third try at the patch, properly adjusting my vim tabs to 8 spaces as
>>> they should be so that I can follow style(9).
>>
>> I wrote a function[1] last year to configure vim to follow style(9).
>> Just run ':call FreeBSD_Style()' while editing a file.
>>
>> Sean
>>   1. http://www.farley.org/freebsd/tmp/VIM/FreeBSD.vim
> Rock on.
> This should be in the committers' guide or something.

I vote for this too :)

-- 
WBR, Vadim Goncharov. ICQ#166852181       mailto:vadim_nuclight@mail.ru
[Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]


From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 13:34:54 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2A1BD106566C;
	Thu, 13 Mar 2008 13:34:54 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 933038FC19;
	Thu, 13 Mar 2008 13:34:53 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au
	(c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2DDYRFk006412
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 14 Mar 2008 00:34:29 +1100
Date: Fri, 14 Mar 2008 00:34:27 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Jeff Roberson <jroberson@chesapeake.net>
In-Reply-To: <20080312211834.T1091@desktop>
Message-ID: <20080313230809.W32527@delplex.bde.org>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
	<20080313124213.J31200@delplex.bde.org> <20080312211834.T1091@desktop>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, Peter Wemm <peter@wemm.org>,
	David Xu <davidxu@freebsd.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 13:34:54 -0000

On Wed, 12 Mar 2008, Jeff Roberson wrote:

> On Thu, 13 Mar 2008, Bruce Evans wrote:
>
>> On Wed, 12 Mar 2008, Peter Wemm wrote:
>> 
>>> On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
>>>> Jeff Roberson wrote:
>>>> > http://people.freebsd.org/~jeff/amd64.diff
>>>> 
>>>>  This is a good idea.
>> 
>> I wouldn't have expected it to make much difference.  On i386 UP,
>> cpu_switch() normally executes only 48 instructions for in-kernel
>> context switches in my version of 5.2 and only 61 instructions in
>> -current.  ~5.2 differs from 5.2 here only in not having to
>> switch %eflags.  This saves 4 instructions but much more in cycles,
>> especially in P4 where accesses to %eflags are very slow.  5.2 would
>> take 52 instructions, and -current has bloated by 9 instructions
>> relative to 5.2.
>
> More expensive than the raw instruction count is:
>
> 1)  The mispredicted branches to deal with all of the optional state and 
> features that are not always saved.

This is unlikely to matter, and apparently doesn't, at least in simple
benchmarks, since the C version has even more branches.  Features that
are rarely used cause branches that are usually perfectly predicted.

> 2)  The cost of the extra icache footprint from all of those unused
> instructions, unaligned jumps, etc.

Again, if this were the cause of slowness then it would affect the C
version more, since the C version is larger.

In fact, the benchmark is probably too simple to show the cost of
branches.  Just doing sched_yield() in a loop gives the following
atypical behaviour, which may be atypical enough that the larger branch
and cache costs of the C version have little effect:
- it doesn't go near most of the special cases, so branches are
   predictable (always non-special) and are thus predicted provided
   (a) the CPU actually does reasonably good branch prediction, and
   (b) the branch predictions fit in the branch prediction cache
       (reasonably good branch prediction probably requires such a
       cache).
- it doesn't touch much icache or dcache or branch-cache, so
   everything probably stays cached.

If just the branch-cache were thrashed, then reasonably good dynamic
branch prediction would be impossible and things would be slow.  In the C
version, you use predict_true() and predict_false() a lot.  This
might improve static branch prediction but makes little difference
if the branch cache is working.
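The predict_true()/predict_false() hints discussed above can be made concrete with a minimal sketch, assuming GCC or Clang. The two macros are modeled on FreeBSD's `__predict_true()`/`__predict_false()` wrappers around `__builtin_expect()`; the `TF_*` flags and the `switch_*_sketch()` functions are hypothetical. The hint mainly changes code layout so the expected path falls through; it does not change what the code computes.

```c
#include <assert.h>

/* Modeled on __predict_true/__predict_false; !! normalizes to 0 or 1. */
#define	predict_true(exp)	__builtin_expect(!!(exp), 1)
#define	predict_false(exp)	__builtin_expect(!!(exp), 0)

/* Hypothetical flags for optional state that is rarely switched. */
#define	TF_DEBUGREGS	0x01
#define	TF_FPUSTATE	0x02

static int slow_paths;		/* how often the rare path ran */

/* Placeholder for rarely needed work. */
static void
switch_rare_state(void)
{
	slow_paths++;
}

/*
 * The hints ask the compiler to make "no optional state" the
 * fall-through path.  With a warm branch cache the CPU would predict
 * these branches correctly either way.
 */
static void
switch_sketch(int flags)
{
	if (predict_true(flags == 0))
		return;			/* common case: nothing optional */
	if (predict_false(flags & TF_DEBUGREGS))
		switch_rare_state();	/* e.g. debug registers */
	if (predict_false(flags & TF_FPUSTATE))
		switch_rare_state();	/* e.g. extra FPU state */
}
```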

The C version uses lots of non-inline function calls.  Just the
branches for this would have a significant overhead if the branches
are mispredicted.  I think you are depending on gcc's auto-inlining
of static functions which are only called once to avoid the full
cost of the function calls.

> I haven't looked at i386 very closely lately but on amd64 the wrmsrs for 
> fs/gsbase are very expensive.  On my 2GHz dual-core Opteron the optimized
> switch seems to take about 100ns.  The total switch from userspace to 
> userspace is about 4x that.

Probably avoiding these is the only significant difference between all
the versions.  You use predict_false() for executing them.  Are fsbase
and gsbase really usually constant across processes?

400nS is about what I get for i386 on 2.2GHz A64 UP too (6.17 S for
./yield 1000000 10).  getpid() on this machine takes 180nS so it is
unreasonable to expect sched_yield() to take much less than a few hundred
nS.

Some perfmon output for ./yield 100000 10:

% # s/kx-ls-microarchitectural-resync-by-self-mod-code 
% 0
% # s/kx-ls-buffer2-full 
% 909905
% # s/kx-ls-retired-cflush-instructions 
% 0
% # s/kx-ls-retired-cpuid-instructions 
% 0
% # s/kx-dc-accesses 
% 496436422
% # s/kx-dc-misses 
% 11102024

11 dcache misses per yield.  Probably the main cause of slowness (main
memory latency on this machine is 42 nsec, so 11 cache misses take
462 of the 617 nS per call?).

% # s/kx-dc-refills-from-l2 
% 0
% # s/kx-dc-refills-from-system 
% 0
% # s/kx-dc-writebacks 
% 0
% # s/kx-dc-l1-dtlb-miss-and-l2-dtlb-hits 
% 3459100
% # s/kx-dc-l1-and-l2-dtlb-misses 
% 2138231
% # s/kx-dc-misaligned-references 
% 87
% # s/kx-dc-microarchitectural-late-cancel-of-an-access 
% 73146415
% # s/kx-dc-microarchitectural-early-cancel-of-an-access 
% 236927303
% # s/kx-bu-cpu-clk-unhalted 
% 1303921314
% # s/kx-ic-fetches 
% 236207869
% # s/kx-ic-misses 
% 22988

Insignificant icache misses.

% # s/kx-ic-refill-from-l2 
% 18979
% # s/kx-ic-refill-from-system 
% 4191
% # s/kx-ic-l1-itlb-misses 
% 0
% # s/kx-ic-l1-l2-itlb-misses 
% 1619297
% # s/kx-ic-instruction-fetch-stall 
% 1034570822
% # s/kx-ic-return-stack-hit 
% 20822416
% # s/kx-ic-return-stack-overflow 
% 5870
% # s/kx-fr-retired-instructions 
% 701240247
% # s/kx-fr-retired-ops 
% 1163464391
% # s/kx-fr-retired-branches 
% 121636370
% # s/kx-fr-retired-branches-mispredicted 
% 2761910
% # s/kx-fr-retired-taken-branches 
% 93488548
% # s/kx-fr-retired-taken-branches-mispredicted 
% 2848315

2.8 branches mispredicted per call.

% # s/kx-fr-retired-far-control-transfers 
% 2000934

1 int 0x80 and 1 iret per sched_yield(), and apparently not much else.

% # s/kx-fr-retired-resync-branches 
% 936968
% # s/kx-fr-retired-near-returns 
% 19008374
% # s/kx-fr-retired-near-returns-mispredicted 
% 784103

0.8 returns mispredicted per call.

% # s/kx-fr-retired-taken-branches-mispred-by-addr-miscompare 
% 721241
% # s/kx-fr-interrupts-masked-cycles 
% 658462615

Ugh, this is from spinlocks bogusly masking interrupts.  More than half
the cycles have interrupts masked.  This at least shows that lots of
time is being spent near cpu_switch() with a spinlock held.

% # s/kx-fr-interrupts-masked-while-pending-cycles 
% 9365

Since the CPU is reasonably fast, interrupts aren't masked for very long
each time.  This maximum is still 4.5 uS.

% # s/kx-fr-hardware-interrupts 
% 63
% # s/kx-fr-decoder-empty 
% 247898696
% # s/kx-fr-dispatch-stalls 
% 589228741
% # s/kx-fr-dispatch-stall-from-branch-abort-to-retire 
% 39894120
% # s/kx-fr-dispatch-stall-for-serialization 
% 44037193
% # s/kx-fr-dispatch-stall-for-segment-load 
% 134520281

134 cycles per call.  This may be mostly from the ones in syscall() generally.
I think each segreg load still costs ~20 cycles.  Since this is on
i386, there are 6 per call (%ds, %es and %fs save and restore), plus
%ss save and restore, which might not be counted here.  134 is a lot -- about
60nS of the 180nS for getpid().

% # s/kx-fr-dispatch-stall-when-reorder-buffer-is-full 
% 18648001
% # s/kx-fr-dispatch-stall-when-reservation-stations-are-full 
% 121485247
% # s/kx-fr-dispatch-stall-when-fpu-is-full 
% 19
% # s/kx-fr-dispatch-stall-when-ls-is-full 
% 203578275
% # s/kx-fr-dispatch-stall-when-waiting-for-all-to-be-quiet 
% 63136307
% # s/kx-fr-dispatch-stall-when-far-xfer-or-resync-br-pending 
% 6994131
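The per-call figures in the annotations above come from dividing each counter by the number of sched_yield() calls. A small C sketch of that arithmetic: the call count of 1,000,000 is inferred (from `./yield 100000 10` and from the 2 far control transfers per call), while the counter values, the 2.2GHz clock and the 42 nsec memory latency are taken from the message; `struct per_call` and `derive_per_call()` are hypothetical.

```c
#include <assert.h>

#define	CALLS		1000000.0	/* inferred: ./yield 100000 10 */
#define	CLOCK_GHZ	2.2		/* 2.2GHz A64, from the text */
#define	MEM_LAT_NS	42.0		/* main memory latency, from the text */

struct per_call {
	double	dc_misses;		/* kx-dc-misses */
	double	mispredicts;		/* kx-fr-retired-branches-mispredicted */
	double	far_xfers;		/* kx-fr-retired-far-control-transfers */
	double	seg_stall_cycles;	/* kx-fr-dispatch-stall-for-segment-load */
	double	masked_fraction;	/* masked cycles / unhalted cycles */
	double	miss_ns;		/* dc_misses * memory latency */
	double	seg_stall_ns;		/* seg_stall_cycles at CLOCK_GHZ */
};

static struct per_call
derive_per_call(void)
{
	struct per_call pc;

	pc.dc_misses = 11102024 / CALLS;
	pc.mispredicts = 2761910 / CALLS;
	pc.far_xfers = 2000934 / CALLS;
	pc.seg_stall_cycles = 134520281 / CALLS;
	pc.masked_fraction = 658462615.0 / 1303921314.0;
	pc.miss_ns = pc.dc_misses * MEM_LAT_NS;
	pc.seg_stall_ns = pc.seg_stall_cycles / CLOCK_GHZ;

	/* Sanity check on CALLS: one int 0x80 plus one iret per call. */
	assert(pc.far_xfers > 1.99 && pc.far_xfers < 2.01);
	return (pc);
}
```

With the unrounded 11.1 misses the memory-latency estimate comes to about 466 nS (the text's 462 nS uses 11), and the segment-load stalls to about 61 nS, matching "about 60nS".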

>> In-kernel switches are not a very typical case since they don't load
>> %cr3...
>
> We've been working on amd64 so I can't comment specifically about i386 costs. 
> However, I definitely agree that cpu_switch() is not the greatest overhead in 
> the path.  Also, you have to load cr3 even for kernel threads because the 
> page directory page or page directory pointer table at %cr3 can go away once 
> you've switched out the old thread.

I don't see this.  The %cr3 load is avoided if %cr3 wouldn't change, which
I think usually or always happens for switches between kernel threads.

>> The asm code already saves only call-saved registers for both i386 and
>> amd64.  It saves call-saved registers even when it apparently doesn't
>> use them (lots more of these on amd64, while on i386 it uses more
>> call-saved registers than it needs to, apparently since this is free
>> after saving all call-saved registers).  I think saving more than is
>> needed is the result of confusion about what needs to be saved and/or
>> what is needed for debugging.
>
> It has to save all of the callee-saved registers in the PCB because they will 
> likely differ from thread to thread.  Failing to save and restore them could 
> leave you returning with the registers having different values and corrupt 
> the calling function.

Yes, I had forgotten the detail of how the non-local flow of control can
change the registers (the next call to the function in the context of
the switched-to-process may have different values in the registers due
to changes to the registers in callers).

All that can be done differently here is saving all the registers on the
stack (except %esp) in the usual way.  This would probably be faster on
old i386's using pushal or pushl, but on amd64 pushal is not available,
and on Athlons generally (before Barcelona?) it is faster not to use pushl,
so on amd64 the registers should be saved using movl and then it is just
as easy to put them in the pcb as on the stack.
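What has to survive a switch can be pictured as one save slot per callee-saved register. A hypothetical amd64 layout follows (`struct pcb_sketch` is not FreeBSD's `struct pcb`). Under the SysV AMD64 ABI the callee-saved registers are %rbx, %rbp, %rsp and %r12-%r15, and a switch routine also needs the resume address. Caller-saved registers need no slot, since the caller of the switch function already assumes they are clobbered. On i386 the callee-saved set is %ebx, %esi, %edi and %ebp, plus %esp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-thread save area: one 8-byte slot per callee-saved
 * register, each filled by a single mov store, plus the resume %rip.
 */
struct pcb_sketch {
	uint64_t	r15;
	uint64_t	r14;
	uint64_t	r13;
	uint64_t	r12;
	uint64_t	rbp;
	uint64_t	rsp;
	uint64_t	rbx;
	uint64_t	rip;	/* resume address, not a callee-saved GPR */
};

/* C11 static_assert from <assert.h>: slots are packed with no padding. */
static_assert(sizeof(struct pcb_sketch) == 8 * sizeof(uint64_t),
    "one 8-byte slot per saved register");
```

Since every slot is written with a plain store at a fixed offset, it costs the same whether the base pointer refers to the pcb or to the stack.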

>>> The good news is that this tuning is finally being done.  It should
>>> have been done in 2003 though...
>> 
>> How is this possible with (according to my theory) most of the context
>> switch cost being for %cr3 and upper layers?  Unchanged amd64 has only
>> a few more costs than i386.  Mainly 3 unconditional wrmsr's and 2
>> unconditional rdmsr's for managing gsbase and fsbase.  I thought that
>> these were hard to avoid and anyway not nearly as expensive as %cr3 loads.
>
> %cr3 is actually a lot less expensive these days with page table flush 
> filters and the PG_G bit.  We were able to optimize away setting the msrs in 
> the case that the previous values match the new values.  Apparently the 
> hardware doesn't optimize this case so we have to do comparisons ourselves.
>
> That was a big chunk of the optimization.  Static branch hints, reordering 
> code, possibly reordering for better pipeline scheduling in Peter's asm, etc. 
> provide the rest.

All the old i386 asm and probably clones of it on amd64 are certainly not
optimized globally for anything newer than an i386 (barely even an i486).
This rarely matters, however.  It lost more on Pentium-1's, but now out-of-
order execution and better branch prediction hide most inefficiencies.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 13:50:38 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C0FD1065705
	for <arch@freebsd.org>; Thu, 13 Mar 2008 13:50:38 +0000 (UTC)
	(envelope-from ed@hoeg.nl)
Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211])
	by mx1.freebsd.org (Postfix) with ESMTP id DBB818FC1B
	for <arch@freebsd.org>; Thu, 13 Mar 2008 13:50:36 +0000 (UTC)
	(envelope-from ed@hoeg.nl)
Received: by palm.hoeg.nl (Postfix, from userid 1000)
	id 567711CC44; Thu, 13 Mar 2008 14:50:35 +0100 (CET)
Date: Thu, 13 Mar 2008 14:50:35 +0100
From: Ed Schouten <ed@80386.nl>
To: FreeBSD Arch <arch@freebsd.org>
Message-ID: <20080313135035.GB80576@hoeg.nl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="ht9V8wKec6a3w1Ef"
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)
Cc: 
Subject: New TTY layer: condvar(9) and Giant
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 13:50:38 -0000


--ht9V8wKec6a3w1Ef
Content-Type: multipart/mixed; boundary="FexDM9E/OpjgUmaq"
Content-Disposition: inline


--FexDM9E/OpjgUmaq
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello everyone,

Almost a month ago I started working on my assignment for my internship,
to implement a new TTY layer that fixes a lot of architectural
problems. So far, things are going quite fast:

- I've already implemented a basic TTY layer, which has support for
  canonical and non-canonical mode. It still misses important features
  including flow control, but it seems to work quite well. Unlike the
  old layer, it doesn't buffer data as much, which should hopefully mean
  it's a bit faster.
- I'm using a new PTY driver called pts(4). It works quite well, but it
  lacks the compatibility bits we'll need to support older FreeBSD or
  Linux binaries.
- Some of you may have read I'm working on syscons now. I've got syscons
  working with the new TTY layer; I'm typing this message through
  syscons. ;-)

A lot of drivers that are used by the old TTY layer aren't mpsafe yet.
Of course, I'm willing to fix this, but this cannot be done in the
near future. This is why the new TTY layer should still allow TTYs to
be run under Giant.

In my initial implementation, each TTY device had its own mutex. In
theory, this is great. The PTY driver already uses this and it works
fine. There will be a lot of drivers, however, that want to use a
per-class mutex to lock all related TTY devices down at once (e.g.
syscons, which allocates 16 virtual TTYs). This is why I introduced a
per-class lock. When it is set to Giant, all TTY instances will acquire
Giant when entering the TTY layer.
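The per-device versus per-class locking described above can be modeled with a lock pointer in each TTY. A hypothetical user-space sketch using POSIX threads: `struct tty_sketch`, `tty_init_sketch()` and the lock macros are invented names, and a pthread mutex stands in for the kernel mutex (or Giant) a driver would supply. Entering the layer always locks through `t_mtx`, so pointing all of syscons' virtual terminals at one mutex serializes them together.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical TTY with a lock pointer, private or shared per class. */
struct tty_sketch {
	pthread_mutex_t	 t_mtxobj;	/* per-device mutex, used by default */
	pthread_mutex_t	*t_mtx;		/* the mutex that actually protects it */
};

/* classlock == NULL: use the TTY's own mutex; else share classlock. */
static void
tty_init_sketch(struct tty_sketch *tp, pthread_mutex_t *classlock)
{
	assert(tp != NULL);
	pthread_mutex_init(&tp->t_mtxobj, NULL);
	tp->t_mtx = (classlock != NULL) ? classlock : &tp->t_mtxobj;
}

/* Entering the TTY layer always goes through t_mtx. */
#define	tty_lock_sketch(tp)	pthread_mutex_lock((tp)->t_mtx)
#define	tty_unlock_sketch(tp)	pthread_mutex_unlock((tp)->t_mtx)
```

A PTY keeps its own mutex and stays independent, while locking one shared-class TTY blocks all of its siblings.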

Unfortunately, I discovered that condvar(9) can't properly unlock/lock
Giant, which causes the system to panic. The condvar routines already
call DROP_GIANT before unlocking the lock itself, so when that lock is
Giant, it ends up being released twice.

I've attached a patch that adds support for Giant to condvar(9). I had
to patch sys/mutex.h a little, because DROP_GIANT() now only needs to be
called under certain conditions. The macros didn't allow that, because
DROP_GIANT() opens a code block that is only closed by PICKUP_GIANT().

I'm sending this to arch@, because I want to know if I'm doing something
silly. It seems to work properly on my machine, but I'm not an SMP
expert. ;-)

-- 
 Ed Schouten <ed@fxq.nl>
 WWW: http://g-rave.nl/

--FexDM9E/OpjgUmaq
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="condvar-giant.diff"
Content-Transfer-Encoding: quoted-printable

--- sys/kern/kern_condvar.c
+++ sys/kern/kern_condvar.c
@@ -95,6 +95,7 @@
 _cv_wait(struct cv *cvp, struct lock_object *lock)
 {
 	WITNESS_SAVE_DECL(lock_witness);
+	PARTIAL_DROP_GIANT_DECL();
 	struct lock_class *class;
 	struct thread *td;
 	int lock_state;
@@ -123,7 +124,8 @@
 	sleepq_lock(cvp);
 
 	cvp->cv_waiters++;
-	DROP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_DROP_GIANT();
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR, 0);
 	if (class->lc_flags & LC_SLEEPABLE)
@@ -137,7 +139,8 @@
 	if (KTRPOINT(td, KTR_CSW))
 		ktrcsw(0, 0);
 #endif
-	PICKUP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_PICKUP_GIANT();
 	class->lc_lock(lock, lock_state);
 	WITNESS_RESTORE(lock, lock_witness);
 }
@@ -149,6 +152,7 @@
 void
 _cv_wait_unlock(struct cv *cvp, struct lock_object *lock)
 {
+	PARTIAL_DROP_GIANT_DECL();
 	struct lock_class *class;
 	struct thread *td;
 
@@ -176,7 +180,8 @@
 	sleepq_lock(cvp);
 
 	cvp->cv_waiters++;
-	DROP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_DROP_GIANT();
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR, 0);
 	if (class->lc_flags & LC_SLEEPABLE)
@@ -190,7 +195,8 @@
 	if (KTRPOINT(td, KTR_CSW))
 		ktrcsw(0, 0);
 #endif
-	PICKUP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_PICKUP_GIANT();
 }
 
 /*
@@ -203,6 +209,7 @@
 _cv_wait_sig(struct cv *cvp, struct lock_object *lock)
 {
 	WITNESS_SAVE_DECL(lock_witness);
+	PARTIAL_DROP_GIANT_DECL();
 	struct lock_class *class;
 	struct thread *td;
 	struct proc *p;
@@ -233,7 +240,8 @@
 	sleepq_lock(cvp);
 
 	cvp->cv_waiters++;
-	DROP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_DROP_GIANT();
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR |
 	    SLEEPQ_INTERRUPTIBLE, 0);
@@ -248,7 +256,8 @@
 	if (KTRPOINT(td, KTR_CSW))
 		ktrcsw(0, 0);
 #endif
-	PICKUP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_PICKUP_GIANT();
 	class->lc_lock(lock, lock_state);
 	WITNESS_RESTORE(lock, lock_witness);
 
@@ -264,6 +273,7 @@
 _cv_timedwait(struct cv *cvp, struct lock_object *lock, int timo)
 {
 	WITNESS_SAVE_DECL(lock_witness);
+	PARTIAL_DROP_GIANT_DECL();
 	struct lock_class *class;
 	struct thread *td;
 	int lock_state, rval;
@@ -293,7 +303,8 @@
 	sleepq_lock(cvp);
 
 	cvp->cv_waiters++;
-	DROP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_DROP_GIANT();
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR, 0);
 	sleepq_set_timeout(cvp, timo);
@@ -308,7 +319,8 @@
 	if (KTRPOINT(td, KTR_CSW))
 		ktrcsw(0, 0);
 #endif
-	PICKUP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_PICKUP_GIANT();
 	class->lc_lock(lock, lock_state);
 	WITNESS_RESTORE(lock, lock_witness);
 
@@ -325,6 +337,7 @@
 _cv_timedwait_sig(struct cv *cvp, struct lock_object *lock, int timo)
 {
 	WITNESS_SAVE_DECL(lock_witness);
+	PARTIAL_DROP_GIANT_DECL();
 	struct lock_class *class;
 	struct thread *td;
 	struct proc *p;
@@ -356,7 +369,8 @@
 	sleepq_lock(cvp);
 
 	cvp->cv_waiters++;
-	DROP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_DROP_GIANT();
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR |
 	    SLEEPQ_INTERRUPTIBLE, 0);
@@ -372,7 +386,8 @@
 	if (KTRPOINT(td, KTR_CSW))
 		ktrcsw(0, 0);
 #endif
-	PICKUP_GIANT();
+	if (lock != (struct lock_object *)&Giant)
+		PARTIAL_PICKUP_GIANT();
 	class->lc_lock(lock, lock_state);
 	WITNESS_RESTORE(lock, lock_witness);
 
--- sys/sys/mutex.h
+++ sys/sys/mutex.h
@@ -368,26 +368,33 @@
 #ifndef DROP_GIANT
 #define DROP_GIANT()							\
 do {									\
+	PARTIAL_DROP_GIANT_DECL();					\
+	PARTIAL_DROP_GIANT();
+
+#define PARTIAL_DROP_GIANT_DECL()					\
 	int _giantcnt = 0;						\
-	WITNESS_SAVE_DECL(Giant);					\
-									\
+	WITNESS_SAVE_DECL(Giant);
+
+#define PARTIAL_DROP_GIANT() do {					\
 	if (mtx_owned(&Giant)) {					\
 		WITNESS_SAVE(&Giant.lock_object, Giant);		\
 		for (_giantcnt = 0; mtx_owned(&Giant); _giantcnt++)	\
 			mtx_unlock(&Giant);				\
-	}
+	}								\
+} while (0)
 
 #define PICKUP_GIANT()							\
 	PARTIAL_PICKUP_GIANT();						\
 } while (0)
 
-#define PARTIAL_PICKUP_GIANT()						\
+#define PARTIAL_PICKUP_GIANT() do {					\
 	mtx_assert(&Giant, MA_NOTOWNED);				\
 	if (_giantcnt > 0) {						\
 		while (_giantcnt--)					\
 			mtx_lock(&Giant);				\
 		WITNESS_RESTORE(&Giant.lock_object, Giant);		\
-	}
+	}								\
+} while(0)
 #endif
 
 #define	UGAR(rval) do {							\

--FexDM9E/OpjgUmaq--

--ht9V8wKec6a3w1Ef
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (FreeBSD)

iEYEARECAAYFAkfZMSsACgkQ52SDGA2eCwXMFQCfYwaKBVHU7xCBZv/D+yglHfmk
7dEAn1mXTouuc66FGFTiiVnM6ylfLri5
=5MNx
-----END PGP SIGNATURE-----

--ht9V8wKec6a3w1Ef--

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 14:26:29 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 52B9C106566C
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 14:26:29 +0000 (UTC)
	(envelope-from cokane@freebsd.org)
Received: from QMTA08.emeryville.ca.mail.comcast.net
	(qmta08.emeryville.ca.mail.comcast.net [76.96.30.80])
	by mx1.freebsd.org (Postfix) with ESMTP id 24B058FC1E
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 14:26:29 +0000 (UTC)
	(envelope-from cokane@freebsd.org)
Received: from OMTA03.emeryville.ca.mail.comcast.net ([76.96.30.27])
	by QMTA08.emeryville.ca.mail.comcast.net with comcast
	id 0dTj1Z0040b6N64A804T00; Thu, 13 Mar 2008 14:15:46 +0000
Received: from discordia ([24.61.189.203])
	by OMTA03.emeryville.ca.mail.comcast.net with comcast
	id 0eGG1Z0074PktZC8P00000; Thu, 13 Mar 2008 14:16:17 +0000
X-Authority-Analysis: v=1.0 c=1 a=yWIViUiLWPYA:10 a=c5sTgUsrrxMA:10
	a=pW49DAFhvdKtc-utfFoA:9 a=pMymjJw0Sgto9zC7YjIuad9EabMA:4
	a=zUBsD6tbDSsA:10
Received: by discordia (Postfix, from userid 103)
	id 2870B1636F9; Thu, 13 Mar 2008 10:16:16 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia
X-Spam-Level: 
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.1.8-gr1
Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by discordia (Postfix) with ESMTP id C8ABE1636F8;
	Thu, 13 Mar 2008 10:16:02 -0400 (EDT)
Message-ID: <47D93656.2000203@FreeBSD.org>
Date: Thu, 13 Mar 2008 10:12:38 -0400
From: Coleman Kane <cokane@FreeBSD.org>
Organization: The FreeBSD Project
User-Agent: Thunderbird 2.0.0.12 (X11/20080312)
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>, freebsd-arch@freebsd.org
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org> <200803121058.04096.jhb@freebsd.org>
In-Reply-To: <200803121058.04096.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Cc: imp@FreeBSD.org, obrien@FreeBSD.org
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: cokane@FreeBSD.org
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 14:26:29 -0000

John Baldwin wrote:
> On Wednesday 12 March 2008 10:16:31 am Coleman Kane wrote:
>   
>> I am attaching the revised patch.
>>     
>
> Looks good.  I would perhaps not add the extra {}'s around the single-line if 
> clauses as it slightly obfuscates the diff (style(9) actually suggests no 
> {}'s in that case, but I think in practice our sources have a mixture of 
> both).
>   
I'm going to commit the attached patch to ffs_softdep.c this afternoon
(EDT) if there aren't any objections. Thanks for the thrice-over, guys;
you've been very helpful.

--
Coleman Kane


From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 14:50:40 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B0AB41065671
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 14:50:40 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id 968B68FC16
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 14:50:40 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from zion.baldwin.cx (66-23-211-162.clients.speedfactory.net
	[66.23.211.162]) by elvis.mu.org (Postfix) with ESMTP id C99201A4D7C;
	Thu, 13 Mar 2008 07:49:43 -0700 (PDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Thu, 13 Mar 2008 10:49:57 -0400
User-Agent: KMail/1.9.7
References: <20080313135035.GB80576@hoeg.nl>
In-Reply-To: <20080313135035.GB80576@hoeg.nl>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803131049.58051.jhb@freebsd.org>
Cc: Ed Schouten <ed@80386.nl>
Subject: Re: New TTY layer: condvar(9) and Giant
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 14:50:40 -0000

On Thursday 13 March 2008 09:50:35 am Ed Schouten wrote:
> Hello everyone,
>
> Almost a month ago I started working on my assignment for my internship,
> to reimplement a new TTY layer that fixes a lot of architectural
> problems. So far, things are going quite fast:
>
> - I've already implemented a basic TTY layer, which has support for
>   canonical and non-canonical mode. It still misses important features
>   including flow control, but it seems to work quite good. Unlike the
>   old layer, it doesn't buffer data as much, which should hopefully mean
>   it's a bit faster.
> - I'm using a new PTY driver called pts(4). It works quite good, but it
>   misses the compatibility bits, which we'll need to have to support
>   older FreeBSD or Linux binaries.
> - Some of you may have read I'm working on syscons now. I've got syscons
>   working with the new TTY layer; I'm typing this message through
>   syscons. ;-)
>
> A lot of drivers that are used by the old TTY layer aren't mpsafe yet.
> Of course, I'm willing to fix this, but this cannot be done in the
> nearby future. This is why the new TTY layer should still allow TTY's to
> be run under Giant.
>
> In my initial implementation, each TTY device had its own mutex. In
> theory, this is great. The PTY driver already uses this and it works
> fine. There will be a lot of drivers, however, that want to use a
> per-class mutex to lock all related TTY devices down at once (i.e.
> syscons, which allocates 16 virtual TTY's). This is why I introduced a
> per-class lock. When set to Giant, all TTY instances will lock down the
> Giant lock when entering the TTY layer.
>
> Unfortunately, I discovered condvar(9) can't properly unlock/lock the
> Giant, which causes the system to panic. The condvar routines already
> call DROP_GIANT before unlocking the lock itself.
>
> I've attached a patch that adds support for Giant to condvar(9). I had
> to patch sys/mutex.h a little, because we now only need to call
> DROP_GIANT() under certain conditions. The macro's didn't allow that,
> because DROP_GIANT starts a new code block.
>
> I'm sending this to arch@, because I want to know if I'm doing something
> silly. It seems to work properly on my machine, but I'm not an SMP
> expert. ;-)

In general this sort of thing is discouraged, as explicit use of Giant is
discouraged.  Its magical properties (being implicitly dropped in places)
can make it unsuitable for use as a regular mutex (though in practice any
regular mutex would need to be dropped in the same places to avoid problems).
In other driver locking cases the need for this has been avoided, although
what I sort of forced CAM to do probably isn't quite right.

Also, your patches won't work if Giant is recursed (it will only drop Giant
once, and the sleeping thread will still own Giant).  If you do want to make
this work, my suggestion would be to make lc_unlock and lc_lock do nothing
for Giant.  You could do this either by 1) patching kern_condvar.c so it
does something like this:

	if (lock != &Giant.lock_object)
		cookie = class->lc_unlock(lock);

or 2) patching the lc_lock/lc_unlock routines to just not do anything for
Giant, like so:

Index: kern_mutex.c
===================================================================
RCS file: /host/cvs/usr/cvs/src/sys/kern/kern_mutex.c,v
retrieving revision 1.205
diff -u -r1.205 kern_mutex.c
--- kern_mutex.c        13 Feb 2008 23:39:05 -0000      1.205
+++ kern_mutex.c        13 Mar 2008 14:49:04 -0000
@@ -134,6 +134,8 @@
 lock_mtx(struct lock_object *lock, int how)
 {

+       if (lock == &Giant.lock_object)
+               return;
        mtx_lock((struct mtx *)lock);
 }

@@ -149,6 +151,8 @@
 {
        struct mtx *m;

+       if (lock == &Giant.lock_object)
+               return (0);
        m = (struct mtx *)lock;
        mtx_assert(m, MA_OWNED | MA_NOTRECURSED);
        mtx_unlock(m);

I still don't like the idea of letting Giant work with msleep/cv_*wait*() 
because I think it will be abused.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 14:53:17 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 13F321065670
	for <arch@freebsd.org>; Thu, 13 Mar 2008 14:53:17 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from fk-out-0910.google.com (fk-out-0910.google.com [209.85.128.187])
	by mx1.freebsd.org (Postfix) with ESMTP id 8E2348FC15
	for <arch@freebsd.org>; Thu, 13 Mar 2008 14:53:16 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: by fk-out-0910.google.com with SMTP id b27so4085545fka.11
	for <arch@freebsd.org>; Thu, 13 Mar 2008 07:53:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	bh=9l5zCdBHDeBRsdJ3CEnWMvKiABKfwJVi/E0GqdE9SyI=;
	b=xBv4hqW68xOPQk7nKGtfNrZf7m1P18Zr0u4HX+/Z1AVA41lz3ZZn6TwY8pkt6xqWaVEhnQyUK0qx5Lm3p/HLwFwUXN5/51VH/s3SQuaaZ7Q+dk3xDQ6D/cWcTnWbh0Wn4MFHveAjC+xQ2tAf1lduZSXGtCx0x90xqdtQGhQBXTs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth;
	b=o16UPuV/qNub4jl2/HtcsNJcphlMPtx2tt4r40bg/+ext9ecc04KkiVNr1Gc7ztYWqVJeWLJzpmJU6dfmbfaERqFkZYTARtr1MPYx09pit94Q/7vMVLkCWYlN5A/plqq9FTCd/xW7g4XmEn6lZp3InlGHpMOQ0ADjGGBr7SjYQg=
Received: by 10.82.127.14 with SMTP id z14mr23371281buc.3.1205419994230;
	Thu, 13 Mar 2008 07:53:14 -0700 (PDT)
Received: by 10.86.30.17 with HTTP; Thu, 13 Mar 2008 07:53:14 -0700 (PDT)
Message-ID: <3bbf2fe10803130753p623867d8j3cbb65e0c78a2164@mail.gmail.com>
Date: Thu, 13 Mar 2008 15:53:14 +0100
From: "Attilio Rao" <attilio@freebsd.org>
Sender: asmrookie@gmail.com
To: "Ed Schouten" <ed@80386.nl>
In-Reply-To: <20080313135035.GB80576@hoeg.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080313135035.GB80576@hoeg.nl>
X-Google-Sender-Auth: 90bab4486d957c24
Cc: FreeBSD Arch <arch@freebsd.org>
Subject: Re: New TTY layer: condvar(9) and Giant
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 14:53:17 -0000

2008/3/13, Ed Schouten <ed@80386.nl>:
> Hello everyone,
>
>  Almost a month ago I started working on my assignment for my internship,
>  to reimplement a new TTY layer that fixes a lot of architectural
>  problems. So far, things are going quite fast:
>
>  - I've already implemented a basic TTY layer, which has support for
>   canonical and non-canonical mode. It still misses important features
>   including flow control, but it seems to work quite good. Unlike the
>   old layer, it doesn't buffer data as much, which should hopefully mean
>   it's a bit faster.
>  - I'm using a new PTY driver called pts(4). It works quite good, but it
>   misses the compatibility bits, which we'll need to have to support
>   older FreeBSD or Linux binaries.
>  - Some of you may have read I'm working on syscons now. I've got syscons
>   working with the new TTY layer; I'm typing this message through
>   syscons. ;-)
>
>  A lot of drivers that are used by the old TTY layer aren't mpsafe yet.
>  Of course, I'm willing to fix this, but this cannot be done in the
>  nearby future. This is why the new TTY layer should still allow TTY's to
>  be run under Giant.
>
>  In my initial implementation, each TTY device had its own mutex. In
>  theory, this is great. The PTY driver already uses this and it works
>  fine. There will be a lot of drivers, however, that want to use a
>  per-class mutex to lock all related TTY devices down at once (i.e.
>  syscons, which allocates 16 virtual TTY's). This is why I introduced a
>  per-class lock. When set to Giant, all TTY instances will lock down the
>  Giant lock when entering the TTY layer.
>
>  Unfortunately, I discovered condvar(9) can't properly unlock/lock the
>  Giant, which causes the system to panic. The condvar routines already
>  call DROP_GIANT before unlocking the lock itself.

I don't think we should allow this.
Giant is already too hidden inside other locking primitives, creating a
lot of misunderstandings, misconceptions and wrong assumptions.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 15:17:14 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7D5A7106566B;
	Thu, 13 Mar 2008 15:17:14 +0000 (UTC) (envelope-from ed@hoeg.nl)
Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211])
	by mx1.freebsd.org (Postfix) with ESMTP id 3829C8FC29;
	Thu, 13 Mar 2008 15:17:14 +0000 (UTC) (envelope-from ed@hoeg.nl)
Received: by palm.hoeg.nl (Postfix, from userid 1000)
	id 872171CE0E; Thu, 13 Mar 2008 16:17:13 +0100 (CET)
Date: Thu, 13 Mar 2008 16:17:13 +0100
From: Ed Schouten <ed@80386.nl>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20080313151713.GD80576@hoeg.nl>
References: <20080313135035.GB80576@hoeg.nl>
	<200803131049.58051.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="hNweOTLwwbnii4NA"
Content-Disposition: inline
In-Reply-To: <200803131049.58051.jhb@freebsd.org>
User-Agent: Mutt/1.5.17 (2007-11-01)
Cc: freebsd-arch@freebsd.org
Subject: Re: New TTY layer: condvar(9) and Giant
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 15:17:14 -0000


--hNweOTLwwbnii4NA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

* John Baldwin <jhb@freebsd.org> wrote:
> Also, your patches won't work if Giant is recursed (it will only drop Giant
> once, and the sleeping thread will still own Giant).  If you do want to make
> this work, my suggestion would be to make lc_unlock and lc_lock do nothing
> for Giant.  You could do this either by 1) patching kern_condvar.c so it
> does something like this:
>
> 	if (lock != &Giant.lock_object)
> 		cookie = class->lc_unlock(lock);
>
> or 2) patching the lc_lock/lc_unlock routines to just not do anything for
> Giant, like so:
>
> Index: kern_mutex.c
> ===================================================================
> RCS file: /host/cvs/usr/cvs/src/sys/kern/kern_mutex.c,v
> retrieving revision 1.205
> diff -u -r1.205 kern_mutex.c
> --- kern_mutex.c        13 Feb 2008 23:39:05 -0000      1.205
> +++ kern_mutex.c        13 Mar 2008 14:49:04 -0000
> @@ -134,6 +134,8 @@
>  lock_mtx(struct lock_object *lock, int how)
>  {
>
> +       if (lock == &Giant.lock_object)
> +               return;
>         mtx_lock((struct mtx *)lock);
>  }
>
> @@ -149,6 +151,8 @@
>  {
>         struct mtx *m;
>
> +       if (lock == &Giant.lock_object)
> +               return (0);
>         m = (struct mtx *)lock;
>         mtx_assert(m, MA_OWNED | MA_NOTRECURSED);
>         mtx_unlock(m);

Indeed, those solutions look a lot better. The reason I just disabled
DROP/PICKUP_GIANT was that I only wanted to allow those interfaces to
work when Giant was held only once (not recursed).

> I still don't like the idea of letting Giant work with msleep/cv_*wait*()
> because I think it will be abused.

I don't like it either, but we'll need a mechanism like this to make the
transition easier. I would rather have syscons MPSAFE, but it just
depends on too many other components that aren't MPSAFE either (keyboard
and mouse input, etc.).

I'm personally not worried about it being abused, because people who are
writing new drivers shouldn't be using Giant anyway.

-- 
 Ed Schouten <ed@fxq.nl>
 WWW: http://g-rave.nl/

--hNweOTLwwbnii4NA
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (FreeBSD)

iEYEARECAAYFAkfZRXkACgkQ52SDGA2eCwWNmQCfQFxfN0qyrvU2BF1xOXCfmF5V
X3wAni0zPlW+3eO5i1vaCCDQg5YAOCzf
=pJ4B
-----END PGP SIGNATURE-----

--hNweOTLwwbnii4NA--

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 15:21:26 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EAF881065679
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 15:21:26 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from mail.farley.org (farley.org [67.64.95.201])
	by mx1.freebsd.org (Postfix) with ESMTP id A4BBB8FC1C
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 15:21:26 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from thor.farley.org (thor.farley.org [192.168.1.5])
	by mail.farley.org (8.14.2/8.14.2) with ESMTP id m2DF0lgM056068
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 10:00:47 -0500 (CDT)
	(envelope-from scf@FreeBSD.org)
Date: Thu, 13 Mar 2008 10:00:47 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
In-Reply-To: <slrnftiap1.106o.vadim_nuclight@hostel.avtf.net>
Message-ID: <alpine.BSF.1.00.0803130953350.72703@thor.farley.org>
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
	<47D7F1EC.6040802@cokane.org>
	<alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
	<47D88568.7000105@cokane.org>
	<slrnftiap1.106o.vadim_nuclight@hostel.avtf.net>
User-Agent: Alpine 1.00 (BSF 882 2007-12-20)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.4
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on mail.farley.org
Cc: 
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 15:21:27 -0000

On Thu, 13 Mar 2008, Vadim Goncharov wrote:

> Hi Coleman Kane!
>
> On Wed, 12 Mar 2008 21:37:44 -0400; Coleman Kane wrote about 'Re: SMPTODO: remove timeout(9) from ffs_softdep.c':
>
>>>> Third try at the patch, properly adjusting my vim tabs to 8 spaces
>>>> as they should be so that I can follow style(9).
>>>
>>> I wrote a function[1] last year to configure vim to follow style(9).
>>> Just run ':call FreeBSD_Style()' while editing a file.
>>>
>>> Sean
>>>   1. http://www.farley.org/freebsd/tmp/VIM/FreeBSD.vim
>> Rock on.
>> This should be in the committers' guide or something.
>
> I vote for this too :)

I asked my mentor to allow me to commit the file to tools/tools/editing
next to freebsd.el.  There the Emacs and Vim scripts can battle it out
for all eternity.  :)

I tweaked the file a bit by adding a few comments and a mapping to
<Leader>f for easy calling.  BTW, I am thinking about commenting out the
mapping before committing and letting people manually activate it.  This
is to avoid conflicts that may arise with existing mappings in a person's
environment.

Sean
-- 
scf@FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 18:08:05 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AF8501065671
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 18:08:05 +0000 (UTC)
	(envelope-from obrien@NUXI.org)
Received: from dragon.nuxi.org (trang.nuxi.org [74.95.12.85])
	by mx1.freebsd.org (Postfix) with ESMTP id 8DE828FC12
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 18:08:05 +0000 (UTC)
	(envelope-from obrien@NUXI.org)
Received: from dragon.nuxi.org (obrien@localhost [127.0.0.1])
	by dragon.nuxi.org (8.14.1/8.14.1) with ESMTP id m2DI85v5083430
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 11:08:05 -0700 (PDT)
	(envelope-from obrien@dragon.nuxi.org)
Received: (from obrien@localhost)
	by dragon.nuxi.org (8.14.2/8.14.1/Submit) id m2DI85ti083429
	for freebsd-arch@freebsd.org; Thu, 13 Mar 2008 11:08:05 -0700 (PDT)
	(envelope-from obrien)
Date: Thu, 13 Mar 2008 11:08:05 -0700
From: "David O'Brien" <obrien@freebsd.org>
To: freebsd-arch@freebsd.org
Message-ID: <20080313180805.GA83406@dragon.NUXI.org>
Mail-Followup-To: obrien@freebsd.org, freebsd-arch@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Operating-System: FreeBSD 8.0-CURRENT
User-Agent: Mutt/1.5.16 (2007-06-09)
Subject: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: obrien@freebsd.org
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 18:08:05 -0000

Hi folks,
Some folks at Juniper have submitted these changes to hwpmc(4).
I am sending them here for public review.

Their thoughts are:
    mp_ncpus is the count of active CPUs, whereas mp_maxid is the highest
    CPU ID in the system.  Using mp_ncpus in the cpu_id range check of the
    hwpmc module assumes that the active CPUs are not interleaved, i.e.
    that their IDs are exactly 0 through mp_ncpus - 1.  On some platforms,
    however, active and inactive CPUs can be interleaved, so hwpmc does not
    work for CPUs whose cpu_id is greater than or equal to the active-CPU
    count.

-- 
-- David  (obrien@FreeBSD.org)

Index: sys/dev/hwpmc/hwpmc_amd.c
===================================================================
RCS file: /cvs/junos-2001/src/sys/dev/hwpmc/hwpmc_amd.c,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -p -r1.1.1.1 -r1.4
--- sys/dev/hwpmc/hwpmc_amd.c	21 Jun 2006 03:30:02 -0000	1.1.1.1
+++ sys/dev/hwpmc/hwpmc_amd.c	30 Oct 2007 18:00:43 -0000	1.4
@@ -265,7 +265,7 @@ amd_read_pmc(int cpu, int ri, pmc_value_
 	const struct pmc_hw *phw;
 	pmc_value_t tmp;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -320,7 +320,7 @@ amd_write_pmc(int cpu, int ri, pmc_value
 	const struct pmc_hw *phw;
 	enum pmc_mode mode;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -367,7 +367,7 @@ amd_config_pmc(int cpu, int ri, struct p
 
 	PMCDBG(MDP,CFG,1, "cpu=%d ri=%d pm=%p", cpu, ri, pm);
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -449,7 +449,7 @@ amd_allocate_pmc(int cpu, int ri, struct
 
 	(void) cpu;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row index %d", __LINE__, ri));
@@ -543,7 +543,7 @@ amd_release_pmc(int cpu, int ri, struct 
 
 	(void) pmc;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -575,7 +575,7 @@ amd_start_pmc(int cpu, int ri)
 	struct pmc_hw *phw;
 	const struct amd_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -624,7 +624,7 @@ amd_stop_pmc(int cpu, int ri)
 	const struct amd_descr *pd;
 	uint64_t config;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -676,7 +676,7 @@ amd_intr(int cpu, uintptr_t eip, int use
 	struct pmc_hw *phw;
 	pmc_value_t v;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] out of range CPU %d", __LINE__, cpu));
 
 	PMCDBG(MDP,INT,1, "cpu=%d eip=%p um=%d", cpu, (void *) eip,
@@ -756,7 +756,7 @@ amd_describe(int cpu, int ri, struct pmc
 	const struct amd_descr *pd;
 	struct pmc_hw *phw;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < AMD_NPMCS,
 	    ("[amd,%d] row-index %d out of range", __LINE__, ri));
@@ -825,7 +825,7 @@ amd_init(int cpu)
 	struct amd_cpu *pcs;
 	struct pmc_hw  *phw;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] insane cpu number %d", __LINE__, cpu));
 
 	PMCDBG(MDP,INI,1,"amd-init cpu=%d", cpu);
@@ -868,7 +868,7 @@ amd_cleanup(int cpu)
 	uint32_t evsel;
 	struct pmc_cpu *pcs;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] insane cpu number (%d)", __LINE__, cpu));
 
 	PMCDBG(MDP,INI,1,"amd-cleanup cpu=%d", cpu);
Index: sys/dev/hwpmc/hwpmc_mod.c
===================================================================
RCS file: /cvs/junos-2001/src/sys/dev/hwpmc/hwpmc_mod.c,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -p -r1.1.1.1 -r1.4
--- sys/dev/hwpmc/hwpmc_mod.c	21 Jun 2006 03:30:03 -0000	1.1.1.1
+++ sys/dev/hwpmc/hwpmc_mod.c	30 Oct 2007 18:00:43 -0000	1.4
@@ -615,7 +615,7 @@ pmc_restore_cpu_binding(struct pmc_bindi
 static void
 pmc_select_cpu(int cpu)
 {
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[pmc,%d] bad cpu number %d", __LINE__, cpu));
 
 	/* never move to a disabled CPU */
@@ -1167,7 +1167,7 @@ pmc_process_csw_in(struct thread *td)
 	PMCDBG(CSW,SWI,1, "cpu=%d proc=%p (%d, %s) pp=%p", cpu, p,
 	    p->p_pid, p->p_comm, pp);
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[pmc,%d] wierd CPU id %d", __LINE__, cpu));
 
 	pc = pmc_pcpu[cpu];
@@ -1292,7 +1292,7 @@ pmc_process_csw_out(struct thread *td)
 	PMCDBG(CSW,SWO,1, "cpu=%d proc=%p (%d, %s) pp=%p", cpu, p,
 	    p->p_pid, p->p_comm, pp);
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[pmc,%d wierd CPU id %d", __LINE__, cpu));
 
 	pc = pmc_pcpu[cpu];
@@ -2313,7 +2313,7 @@ pmc_stop(struct pmc *pm)
 
 	cpu = PMC_TO_CPU(pm);
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[pmc,%d] illegal cpu=%d", __LINE__, cpu));
 
 	if (pmc_cpu_is_disabled(cpu))
@@ -2478,7 +2478,7 @@ pmc_syscall_handler(struct thread *td, v
 		struct pmc_op_getcpuinfo gci;
 
 		gci.pm_cputype = md->pmd_cputype;
-		gci.pm_ncpu    = mp_ncpus;
+		gci.pm_ncpu    = mp_maxid + 1;
 		gci.pm_npmc    = md->pmd_npmc;
 		gci.pm_nclass  = md->pmd_nclass;
 		bcopy(md->pmd_classes, &gci.pm_classes,
@@ -2546,7 +2546,7 @@ pmc_syscall_handler(struct thread *td, v
 		if ((error = copyin(&gpi->pm_cpu, &cpu, sizeof(cpu))) != 0)
 			break;
 
-		if (cpu >= (unsigned int) mp_ncpus) {
+		if (cpu > (unsigned int) mp_maxid) {
 			error = EINVAL;
 			break;
 		}
@@ -2641,7 +2641,7 @@ pmc_syscall_handler(struct thread *td, v
 
 		cpu = pma.pm_cpu;
 
-		if (cpu < 0 || cpu >= mp_ncpus) {
+		if (cpu < 0 || cpu > mp_maxid) {
 			error = EINVAL;
 			break;
 		}
@@ -2734,7 +2734,7 @@ pmc_syscall_handler(struct thread *td, v
 
 		if ((mode != PMC_MODE_SS  &&  mode != PMC_MODE_SC  &&
 		     mode != PMC_MODE_TS  &&  mode != PMC_MODE_TC) ||
-		    (cpu != (u_int) PMC_CPU_ANY && cpu >= (u_int) mp_ncpus)) {
+		    (cpu != (u_int) PMC_CPU_ANY && cpu > (u_int) mp_maxid)) {
 			error = EINVAL;
 			break;
 		}
@@ -3973,16 +3973,16 @@ pmc_initialize(void)
 		return ENOSYS;
 
 	/* allocate space for the per-cpu array */
-	MALLOC(pmc_pcpu, struct pmc_cpu **, mp_ncpus * sizeof(struct pmc_cpu *),
-	    M_PMC, M_WAITOK|M_ZERO);
+	MALLOC(pmc_pcpu, struct pmc_cpu **,
+	    (mp_maxid + 1) * sizeof(struct pmc_cpu *), M_PMC, M_WAITOK|M_ZERO);
 
 	/* per-cpu 'saved values' for managing process-mode PMCs */
 	MALLOC(pmc_pcpu_saved, pmc_value_t *,
-	    sizeof(pmc_value_t) * mp_ncpus * md->pmd_npmc, M_PMC, M_WAITOK);
+	    sizeof(pmc_value_t) * (mp_maxid + 1) * md->pmd_npmc, M_PMC, M_WAITOK);
 
 	/* perform cpu dependent initialization */
 	pmc_save_cpu_binding(&pb);
-	for (cpu = 0; cpu < mp_ncpus; cpu++) {
+	for (cpu = 0; cpu <= mp_maxid; cpu++) {
 		if (pmc_cpu_is_disabled(cpu))
 			continue;
 		pmc_select_cpu(cpu);
@@ -3995,7 +3995,7 @@ pmc_initialize(void)
 		return error;
 
 	/* allocate space for the sample array */
-	for (cpu = 0; cpu < mp_ncpus; cpu++) {
+	for (cpu = 0; cpu <= mp_maxid; cpu++) {
 		if (pmc_cpu_is_disabled(cpu))
 			continue;
 		MALLOC(sb, struct pmc_samplebuffer *,
@@ -4156,7 +4156,7 @@ pmc_cleanup(void)
 	    ("[pmc,%d] Global SS count not empty", __LINE__));
 
 	/* free the per-cpu sample buffers */
-	for (cpu = 0; cpu < mp_ncpus; cpu++) {
+	for (cpu = 0; cpu <= mp_maxid; cpu++) {
 		if (pmc_cpu_is_disabled(cpu))
 			continue;
 		KASSERT(pmc_pcpu[cpu]->pc_sb != NULL,
@@ -4170,7 +4170,7 @@ pmc_cleanup(void)
 	PMCDBG(MOD,INI,3, "%s", "md cleanup");
 	if (md) {
 		pmc_save_cpu_binding(&pb);
-		for (cpu = 0; cpu < mp_ncpus; cpu++) {
+		for (cpu = 0; cpu <= mp_maxid; cpu++) {
 			PMCDBG(MOD,INI,1,"pmc-cleanup cpu=%d pcs=%p",
 			    cpu, pmc_pcpu[cpu]);
 			if (pmc_cpu_is_disabled(cpu))
Index: sys/dev/hwpmc/hwpmc_piv.c
===================================================================
RCS file: /cvs/junos-2001/src/sys/dev/hwpmc/hwpmc_piv.c,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -p -r1.1.1.1 -r1.4
--- sys/dev/hwpmc/hwpmc_piv.c	21 Jun 2006 03:30:03 -0000	1.1.1.1
+++ sys/dev/hwpmc/hwpmc_piv.c	30 Oct 2007 18:00:43 -0000	1.4
@@ -585,7 +585,7 @@ p4_init(int cpu)
 	struct p4_logicalcpu *plcs;
 	struct pmc_hw *phw;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] insane cpu number %d", __LINE__, cpu));
 
 	PMCDBG(MDP,INI,0, "p4-init cpu=%d logical=%d", cpu,
@@ -737,7 +737,7 @@ p4_read_pmc(int cpu, int ri, pmc_value_t
 	struct pmc_hw *phw;
 	pmc_value_t tmp;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[p4,%d] illegal row-index %d", __LINE__, ri));
@@ -815,7 +815,7 @@ p4_write_pmc(int cpu, int ri, pmc_value_
 	const struct pmc_hw *phw;
 	const struct p4pmc_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[amd,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[amd,%d] illegal row-index %d", __LINE__, ri));
@@ -889,7 +889,7 @@ p4_config_pmc(int cpu, int ri, struct pm
 	struct p4_cpu *pc;
 	int cfgflags, cpuflag;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[p4,%d] illegal row-index %d", __LINE__, ri));
@@ -1026,7 +1026,7 @@ p4_allocate_pmc(int cpu, int ri, struct 
 	struct p4_event_descr *pevent;
 	const struct p4pmc_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[p4,%d] illegal row-index value %d", __LINE__, ri));
@@ -1273,7 +1273,7 @@ p4_start_pmc(int cpu, int ri)
 	struct pmc_hw *phw;
 	struct p4pmc_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[p4,%d] illegal row-index %d", __LINE__, ri));
@@ -1425,7 +1425,7 @@ p4_stop_pmc(int cpu, int ri)
 	struct p4pmc_descr *pd;
 	pmc_value_t tmp;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[p4,%d] illegal row index %d", __LINE__, ri));
@@ -1694,7 +1694,7 @@ p4_describe(int cpu, int ri, struct pmc_
 	struct pmc_hw *phw;
 	const struct p4pmc_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P4_NPMCS,
 	    ("[p4,%d] row-index %d out of range", __LINE__, ri));
Index: sys/dev/hwpmc/hwpmc_ppro.c
===================================================================
RCS file: /cvs/junos-2001/src/sys/dev/hwpmc/hwpmc_ppro.c,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -p -r1.1.1.1 -r1.4
--- sys/dev/hwpmc/hwpmc_ppro.c	21 Jun 2006 03:30:03 -0000	1.1.1.1
+++ sys/dev/hwpmc/hwpmc_ppro.c	30 Oct 2007 18:00:43 -0000	1.4
@@ -331,7 +331,7 @@ p6_init(int cpu)
 	struct p6_cpu *pcs;
 	struct pmc_hw *phw;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p6,%d] bad cpu %d", __LINE__, cpu));
 
 	PMCDBG(MDP,INI,0,"p6-init cpu=%d", cpu);
@@ -361,7 +361,7 @@ p6_cleanup(int cpu)
 {
 	struct pmc_cpu *pcs;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p6,%d] bad cpu %d", __LINE__, cpu));
 
 	PMCDBG(MDP,INI,0,"p6-cleanup cpu=%d", cpu);
@@ -507,7 +507,7 @@ p6_allocate_pmc(int cpu, int ri, struct 
 
 	(void) cpu;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p4,%d] illegal CPU %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P6_NPMCS,
 	    ("[p4,%d] illegal row-index value %d", __LINE__, ri));
@@ -611,7 +611,7 @@ p6_release_pmc(int cpu, int ri, struct p
 
 	PMCDBG(MDP,REL,1, "p6-release cpu=%d ri=%d pm=%p", cpu, ri, pm);
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p6,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P6_NPMCS,
 	    ("[p6,%d] illegal row-index %d", __LINE__, ri));
@@ -633,7 +633,7 @@ p6_start_pmc(int cpu, int ri)
 	struct pmc_hw *phw;
 	const struct p6pmc_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p6,%d] illegal CPU value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P6_NPMCS,
 	    ("[p6,%d] illegal row-index %d", __LINE__, ri));
@@ -677,7 +677,7 @@ p6_stop_pmc(int cpu, int ri)
 	struct pmc_hw *phw;
 	struct p6pmc_descr *pd;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p6,%d] illegal cpu value %d", __LINE__, cpu));
 	KASSERT(ri >= 0 && ri < P6_NPMCS,
 	    ("[p6,%d] illegal row index %d", __LINE__, ri));
@@ -719,7 +719,7 @@ p6_intr(int cpu, uintptr_t eip, int user
 	struct pmc_hw *phw;
 	pmc_value_t v;
 
-	KASSERT(cpu >= 0 && cpu < mp_ncpus,
+	KASSERT(cpu >= 0 && cpu <= mp_maxid,
 	    ("[p6,%d] CPU %d out of range", __LINE__, cpu));
 
 	retval = 0;
Index: usr.sbin/pmccontrol/pmccontrol.c
===================================================================
RCS file: /cvs/junos-2001/src/usr.sbin/pmccontrol/pmccontrol.c,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -p -r1.1.1.1 -r1.4
--- usr.sbin/pmccontrol/pmccontrol.c	3 Nov 2006 01:43:32 -0000	1.1.1.1
+++ usr.sbin/pmccontrol/pmccontrol.c	29 Nov 2007 22:47:14 -0000	1.4
@@ -207,10 +207,16 @@ pmcc_do_enable_disable(struct pmcc_op_li
 			else if (b == PMCC_OP_DISABLE)
 				error = pmc_disable(i, j);
 
-			if (error < 0)
+			if (error < 0) {
+				if (errno == ENXIO) {
+					/* This cpu wasn't configured. */
+					error = 0;
+					continue;
+				}
 				err(EX_OSERR, "%s of PMC %d on CPU %d failed",
 				    b == PMCC_OP_ENABLE ? "Enable" :
 				    "Disable", j, i);
+			}
 		}
 
 	return error;
@@ -242,9 +248,14 @@ pmcc_do_list_state(void)
 		    (logical_cpus_mask & (1 << cpu)))
 			continue; /* skip P4-style 'logical' cpus */
 #endif
-		if (pmc_pmcinfo(cpu, &pi) < 0)
+		if (pmc_pmcinfo(cpu, &pi) < 0) {
+			if (errno == ENXIO) {
+				/* This cpu wasn't enabled. */
+				continue;
+			}
 			err(EX_OSERR, "Unable to get PMC status for CPU %d",
 			    cpu);
+		}
 
 		printf("#CPU %d:\n", c++);
 		npmc = pmc_npmc(cpu);
Index: usr.sbin/pmcstat/pmcstat.c
===================================================================
RCS file: /cvs/junos-2001/src/usr.sbin/pmcstat/pmcstat.c,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -p -r1.1.1.1 -r1.4
--- usr.sbin/pmcstat/pmcstat.c	3 Nov 2006 01:43:32 -0000	1.1.1.1
+++ usr.sbin/pmcstat/pmcstat.c	30 Aug 2007 15:03:02 -0000	1.4
@@ -692,6 +692,7 @@ main(int argc, char **argv)
 		if ((args.pa_logparser = pmclog_open(args.pa_logfd)) == NULL)
 			err(EX_OSERR, "ERROR: Cannot create parser");
 		pmcstat_process_log(&args);
+		pmcstat_shutdown_logging();
 		exit(EX_OK);
 	}
 

-- 
-- David  (obrien@FreeBSD.org)
Q: Because it reverses the logical flow of conversation.
A: Why is top-posting (putting a reply at the top of the message) frowned upon?
Let's not play "Jeopardy-style quoting"

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 18:54:32 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 93BF71065671
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 18:54:32 +0000 (UTC)
	(envelope-from obrien@NUXI.org)
Received: from dragon.nuxi.org (trang.nuxi.org [74.95.12.85])
	by mx1.freebsd.org (Postfix) with ESMTP id 678B38FC19
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 18:54:32 +0000 (UTC)
	(envelope-from obrien@NUXI.org)
Received: from dragon.nuxi.org (obrien@localhost [127.0.0.1])
	by dragon.nuxi.org (8.14.1/8.14.1) with ESMTP id m2DIsVQb085250;
	Thu, 13 Mar 2008 11:54:31 -0700 (PDT)
	(envelope-from obrien@dragon.nuxi.org)
Received: (from obrien@localhost)
	by dragon.nuxi.org (8.14.2/8.14.1/Submit) id m2DIsVik085249;
	Thu, 13 Mar 2008 11:54:31 -0700 (PDT) (envelope-from obrien)
Date: Thu, 13 Mar 2008 11:54:31 -0700
From: "David O'Brien" <obrien@FreeBSD.org>
To: "Sean C. Farley" <scf@FreeBSD.org>
Message-ID: <20080313185431.GB85022@dragon.NUXI.org>
Mail-Followup-To: obrien@freebsd.org, "Sean C. Farley" <scf@FreeBSD.org>,
	freebsd-arch@FreeBSD.org
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
	<47D7F1EC.6040802@cokane.org>
	<alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
	<47D88568.7000105@cokane.org>
	<slrnftiap1.106o.vadim_nuclight@hostel.avtf.net>
	<alpine.BSF.1.00.0803130953350.72703@thor.farley.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.BSF.1.00.0803130953350.72703@thor.farley.org>
X-Operating-System: FreeBSD 8.0-CURRENT
User-Agent: Mutt/1.5.16 (2007-06-09)
Cc: freebsd-arch@FreeBSD.org
Subject: Re: SMPTODO: remove timeout(9) from ffs_softdep.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: obrien@FreeBSD.org
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 18:54:32 -0000

On Thu, Mar 13, 2008 at 10:00:47AM -0500, Sean C. Farley wrote:
> I asked my mentor to allow me to commit the file to tools/tools/editing
> next to freebsd.el.  There the Emacs and Vim scripts can battle it out
> for all eternity.  :)

This is too valuable to not have ASAP.
I've committed it as tools/tools/editing/freebsd.vim (the case matches
the Emacs freebsd.el).

Let the embellishment of freebsd.vim begin!
 
> I tweaked the file a bit by adding a few comments and a mapping to
> <Leader>f for easy calling.  BTW, I am thinking about commenting out the
> mapping before committing and letting people manually activate it.  This
> is to avoid conflicts that may arise with existing mappings in a person's
> environment.

I don't see a need - no one should be blindly using dot files without
reviewing them.

-- 
-- David  (obrien@FreeBSD.org)

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 19:16:54 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5DB8D1065673
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 19:16:54 +0000 (UTC)
	(envelope-from jhb@FreeBSD.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id A04068FC12
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 19:16:53 +0000 (UTC)
	(envelope-from jhb@FreeBSD.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235372110-1834499 
	for multiple; Thu, 13 Mar 2008 15:14:52 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2DJGS7t031052;
	Thu, 13 Mar 2008 15:16:40 -0400 (EDT) (envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org, obrien@FreeBSD.org
Date: Thu, 13 Mar 2008 15:16:12 -0400
User-Agent: KMail/1.9.7
References: <20080313180805.GA83406@dragon.NUXI.org>
In-Reply-To: <20080313180805.GA83406@dragon.NUXI.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803131516.12284.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Thu, 13 Mar 2008 15:16:41 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6225/Thu Mar 13 10:52:37 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: 
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 19:16:54 -0000

On Thursday 13 March 2008 02:08:05 pm David O'Brien wrote:
> Hi folks,
> Some folks at Juniper have submitted these changes to hwpmc(4).
> I am sending them here for public review.
> 
> Their thoughts are:
>     mp_ncpus is the number of active CPUs, whereas mp_maxid is the
>     highest CPU ID in the system (valid IDs run from 0 to mp_maxid).
>     Using mp_ncpus in the cpu_id range checks of the hwpmc module
>     assumes that the active CPUs are numbered contiguously, with no
>     inactive CPUs interleaved among them.  On some platforms, however,
>     active and inactive CPUs can be interleaved, and hwpmc then fails
>     on every CPU whose cpu_id is greater than or equal to the
>     active-CPU count.

This is correct, but you need to handle CPUs that are absent.  It might be 
sufficient to update pmc_cpu_is_disabled() in kern_pmc.c to check 
CPU_ABSENT(cpu) and report the CPU as disabled if it is absent, but I'm not 
sure that will catch everything, since that function seems aimed at handling 
a CPU that is present but halted (such as when HTT is disabled on i386).

> @@ -331,7 +331,7 @@ p6_init(int cpu)
>  	struct p6_cpu *pcs;
>  	struct pmc_hw *phw;
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p6,%d] bad cpu %d", __LINE__, cpu));
>  
>  	PMCDBG(MDP,INI,0,"p6-init cpu=%d", cpu);
> @@ -361,7 +361,7 @@ p6_cleanup(int cpu)
>  {
>  	struct pmc_cpu *pcs;
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p6,%d] bad cpu %d", __LINE__, cpu));
>  
>  	PMCDBG(MDP,INI,0,"p6-cleanup cpu=%d", cpu);
> @@ -507,7 +507,7 @@ p6_allocate_pmc(int cpu, int ri, struct 
>  
>  	(void) cpu;
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p4,%d] illegal CPU %d", __LINE__, cpu));
>  	KASSERT(ri >= 0 && ri < P6_NPMCS,
>  	    ("[p4,%d] illegal row-index value %d", __LINE__, ri));
> @@ -611,7 +611,7 @@ p6_release_pmc(int cpu, int ri, struct p
>  
>  	PMCDBG(MDP,REL,1, "p6-release cpu=%d ri=%d pm=%p", cpu, ri, pm);
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p6,%d] illegal CPU value %d", __LINE__, cpu));
>  	KASSERT(ri >= 0 && ri < P6_NPMCS,
>  	    ("[p6,%d] illegal row-index %d", __LINE__, ri));
> @@ -633,7 +633,7 @@ p6_start_pmc(int cpu, int ri)
>  	struct pmc_hw *phw;
>  	const struct p6pmc_descr *pd;
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p6,%d] illegal CPU value %d", __LINE__, cpu));
>  	KASSERT(ri >= 0 && ri < P6_NPMCS,
>  	    ("[p6,%d] illegal row-index %d", __LINE__, ri));
> @@ -677,7 +677,7 @@ p6_stop_pmc(int cpu, int ri)
>  	struct pmc_hw *phw;
>  	struct p6pmc_descr *pd;
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p6,%d] illegal cpu value %d", __LINE__, cpu));
>  	KASSERT(ri >= 0 && ri < P6_NPMCS,
>  	    ("[p6,%d] illegal row index %d", __LINE__, ri));
> @@ -719,7 +719,7 @@ p6_intr(int cpu, uintptr_t eip, int user
>  	struct pmc_hw *phw;
>  	pmc_value_t v;
>  
> -	KASSERT(cpu >= 0 && cpu < mp_ncpus,
> +	KASSERT(cpu >= 0 && cpu <= mp_maxid,
>  	    ("[p6,%d] CPU %d out of range", __LINE__, cpu));
>  
>  	retval = 0;
> Index: usr.sbin/pmccontrol/pmccontrol.c
> ===================================================================
> RCS file: /cvs/junos-2001/src/usr.sbin/pmccontrol/pmccontrol.c,v
> retrieving revision 1.1.1.1
> retrieving revision 1.4
> diff -u -p -r1.1.1.1 -r1.4
> --- usr.sbin/pmccontrol/pmccontrol.c	3 Nov 2006 01:43:32 -0000	1.1.1.1
> +++ usr.sbin/pmccontrol/pmccontrol.c	29 Nov 2007 22:47:14 -0000	1.4
> @@ -207,10 +207,16 @@ pmcc_do_enable_disable(struct pmcc_op_li
>  			else if (b == PMCC_OP_DISABLE)
>  				error = pmc_disable(i, j);
>  
> -			if (error < 0)
> +			if (error < 0) {
> +				if (errno == ENXIO) {
> +					/* This cpu wasn't configured. */
> +					error = 0;
> +					continue;
> +				}
>  				err(EX_OSERR, "%s of PMC %d on CPU %d failed",
>  				    b == PMCC_OP_ENABLE ? "Enable" :
>  				    "Disable", j, i);
> +			}
>  		}
>  
>  	return error;
> @@ -242,9 +248,14 @@ pmcc_do_list_state(void)
>  		    (logical_cpus_mask & (1 << cpu)))
>  			continue; /* skip P4-style 'logical' cpus */
>  #endif
> -		if (pmc_pmcinfo(cpu, &pi) < 0)
> +		if (pmc_pmcinfo(cpu, &pi) < 0) {
> +			if (errno == ENXIO) {
> +				/* This cpu wasn't enabled. */
> +				continue;
> +			}
>  			err(EX_OSERR, "Unable to get PMC status for CPU %d",
>  			    cpu);
> +		}
>  
>  		printf("#CPU %d:\n", c++);
>  		npmc = pmc_npmc(cpu);
> Index: usr.sbin/pmcstat/pmcstat.c
> ===================================================================
> RCS file: /cvs/junos-2001/src/usr.sbin/pmcstat/pmcstat.c,v
> retrieving revision 1.1.1.1
> retrieving revision 1.4
> diff -u -p -r1.1.1.1 -r1.4
> --- usr.sbin/pmcstat/pmcstat.c	3 Nov 2006 01:43:32 -0000	1.1.1.1
> +++ usr.sbin/pmcstat/pmcstat.c	30 Aug 2007 15:03:02 -0000	1.4
> @@ -692,6 +692,7 @@ main(int argc, char **argv)
>  		if ((args.pa_logparser = pmclog_open(args.pa_logfd)) == NULL)
>  			err(EX_OSERR, "ERROR: Cannot create parser");
>  		pmcstat_process_log(&args);
> +		pmcstat_shutdown_logging();
>  		exit(EX_OK);
>  	}
>  
> 
> -- 
> -- David  (obrien@FreeBSD.org)
> Q: Because it reverses the logical flow of conversation.
> A: Why is top-posting (putting a reply at the top of the message) frowned upon?
> Let's not play "Jeopardy-style quoting"
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
> 
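The hwpmc changes above replace `cpu < mp_ncpus` with `cpu <= mp_maxid` because CPU IDs need not be contiguous. `mp_ncpus` counts CPUs, while `mp_maxid` is the highest valid ID. On a machine with holes in its numbering, the old bound stops short of the last CPUs.

The matching pmccontrol change treats ENXIO as "this CPU wasn't configured" rather than as a fatal error. A minimal user-space sketch of both halves follows. The CPU table, `SK_MAXID`, `sk_ncpus` and `sk_pmcinfo()` are hypothetical stand-ins for illustration, not the hwpmc or libpmc API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical machine: CPU IDs 0, 1 and 4 exist; 2 and 3 are holes. */
#define SK_MAXID	4			/* plays the role of mp_maxid */
static const bool sk_cpu_present[SK_MAXID + 1] = {
	true, true, false, false, true
};
static const int sk_ncpus = 3;			/* plays the role of mp_ncpus */

/* New-style walk: visit every ID up to the max ID, skipping holes. */
static int
sk_count_by_maxid(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu <= SK_MAXID; cpu++) {
		if (!sk_cpu_present[cpu])
			continue;
		n++;
	}
	return (n);
}

/* Old-style walk: stops at ID 2, so CPU 4 is never visited. */
static int
sk_count_by_ncpus(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < sk_ncpus; cpu++) {
		if (!sk_cpu_present[cpu])
			continue;
		n++;
	}
	return (n);
}

/* Hypothetical query that fails with ENXIO for an unconfigured ID. */
static int
sk_pmcinfo(int cpu)
{
	if (cpu < 0 || cpu > SK_MAXID || !sk_cpu_present[cpu]) {
		errno = ENXIO;
		return (-1);
	}
	return (0);
}

/* pmccontrol-style caller: ENXIO means "skip this CPU", anything else fails. */
static int
sk_list_cpus(void)
{
	int cpu, listed = 0;

	for (cpu = 0; cpu <= SK_MAXID; cpu++) {
		if (sk_pmcinfo(cpu) < 0) {
			if (errno == ENXIO)
				continue;	/* this CPU wasn't configured */
			return (-1);
		}
		printf("#CPU %d\n", cpu);
		listed++;
	}
	return (listed);
}
```

With this table, the `mp_ncpus`-style loop finds only 2 CPUs, while the `mp_maxid`-style loop finds all 3.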


-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 23:12:41 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5DC021065670
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 23:12:41 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from mail.farley.org (farley.org [67.64.95.201])
	by mx1.freebsd.org (Postfix) with ESMTP id 19D338FC19
	for <freebsd-arch@FreeBSD.org>; Thu, 13 Mar 2008 23:12:41 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from thor.farley.org (thor.farley.org [192.168.1.5])
	by mail.farley.org (8.14.2/8.14.2) with ESMTP id m2DNCdTl065151
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 18:12:39 -0500 (CDT)
	(envelope-from scf@FreeBSD.org)
Date: Thu, 13 Mar 2008 18:12:39 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
In-Reply-To: <20080313185431.GB85022@dragon.NUXI.org>
Message-ID: <alpine.BSF.1.00.0803131804270.2408@thor.farley.org>
References: <47D7C25D.5070908@cokane.org> <200803120945.29018.jhb@freebsd.org>
	<47D7E5BF.2060102@cokane.org>
	<20080312145734.GB26812@dragon.NUXI.org>
	<47D7F1EC.6040802@cokane.org>
	<alpine.BSF.1.00.0803121820220.75171@thor.farley.org>
	<47D88568.7000105@cokane.org>
	<slrnftiap1.106o.vadim_nuclight@hostel.avtf.net>
	<alpine.BSF.1.00.0803130953350.72703@thor.farley.org>
	<20080313185431.GB85022@dragon.NUXI.org>
User-Agent: Alpine 1.00 (BSF 882 2007-12-20)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.4
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on mail.farley.org
Cc: 
Subject: FreeBSD Vim style(9) plugin (was Re: SMPTODO: remove timeout(9) from
 ffs_softdep.c)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 23:12:41 -0000

On Thu, 13 Mar 2008, David O'Brien wrote:

> On Thu, Mar 13, 2008 at 10:00:47AM -0500, Sean C. Farley wrote:
>> I asked my mentor to allow me to commit the file to
>> tools/tools/editing next to freebsd.el.  There the Emacs and Vim
>> scripts can battle it out for all eternity.  :)
>
> This is too valuable to not have ASAP.
> I've committed it as tools/tools/editing/freebsd.vim.  (case matches
> emacs .el)
>
> Let the embellishment of freebsd.vim begin!

Thank you for the commit.

Before I wrote it, I looked around for something similar, assuming that
some other FreeBSD developers must be using Vim.  I assumed others had
written similar plugins, or that their default was FreeBSD style(9).

>> I tweaked the file a bit by adding a few comments and a mapping to
>> <Leader>f for easy calling.  BTW, I am thinking about commenting out
>> the mapping before committing and letting people manually activate
>> it.  This is to avoid conflicts that may arise with existing mappings
>> in a person's environment.
>
> I don't see a need - no one should be blindly using dot files without
> reviewing them.

OK.  Just wondered.

Sean
-- 
scf@FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 23:35:47 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C3D4E106567E;
	Thu, 13 Mar 2008 23:35:47 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 70B6D8FC1F;
	Thu, 13 Mar 2008 23:35:47 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2DNZdBd081656; Thu, 13 Mar 2008 19:35:40 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Thu, 13 Mar 2008 13:36:47 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20080313230809.W32527@delplex.bde.org>
Message-ID: <20080313132152.Y1091@desktop>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
	<20080313124213.J31200@delplex.bde.org> <20080312211834.T1091@desktop>
	<20080313230809.W32527@delplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, David Xu <davidxu@freebsd.org>,
	Peter Wemm <peter@wemm.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 23:35:47 -0000


On Fri, 14 Mar 2008, Bruce Evans wrote:

> On Wed, 12 Mar 2008, Jeff Roberson wrote:
>
>> On Thu, 13 Mar 2008, Bruce Evans wrote:
>> 
>>> On Wed, 12 Mar 2008, Peter Wemm wrote:
>>> 
>>>> On Tue, Mar 11, 2008 at 9:14 PM, David Xu <davidxu@freebsd.org> wrote:
>>>>> Jeff Roberson wrote:
>>>>> > http://people.freebsd.org/~jeff/amd64.diff
>>>>>
>>>>>  This is a good idea.
>>> 
>>> I wouldn't have expected it to make much difference.  On i386 UP,
>>> cpu_switch() normally executes only 48 instructions for in-kernel
>>> context switches in my version of 5.2 and only 61 instructions in
>>> -current.  ~5.2 differs from 5.2 here only in not having to
>>> switch %eflags.  This saves 4 instructions but much more in cycles,
>>> especially on P4 where accesses to %eflags are very slow.  5.2 would
>>> take 52 instructions, and -current has bloated by 9 instructions
>>> relative to 5.2.
>> 
>> More expensive than the raw instruction count is:
>> 
>> 1)  The mispredicted branches to deal with all of the optional state and 
>> features that are not always saved.
>
> This is unlikely to matter, and apparently doesn't, at least in simple
> benchmarks, since the C version has even more branches.  Features that
> are rarely used cause branches that are usually perfectly predicted.

The C version has two fewer branches because it tests for two unlikely
features together.  It has a few more branches than the asm version in CVS
and the same number of extra branches as Peter's asm version to support
conditional gs/fsbase setting.  The other extra branches have to do with
supporting cpu_switch() and cpu_throw() together.
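Testing two rarely used features with a single branch, and marking that branch as unlikely, can be sketched as follows. FreeBSD's `__predict_false()` in <sys/cdefs.h> expands to GCC's `__builtin_expect(..., 0)`. The sketch defines its own `SK_PREDICT_FALSE()` so that it does not clash with that header.

The flag names and thread structure are hypothetical, since the message does not say which two features are combined.

```c
#include <stdint.h>

/* Same idea as FreeBSD's __predict_false(): a GCC/Clang builtin hint. */
#define SK_PREDICT_FALSE(e)	__builtin_expect(!!(e), 0)

/* Hypothetical flags for two rarely used pieces of per-thread state. */
#define SK_F_DBREGS	0x01u	/* e.g. hardware debug registers in use */
#define SK_F_EXTRA	0x02u	/* e.g. some other optional state */
#define SK_F_RARE	(SK_F_DBREGS | SK_F_EXTRA)

struct sk_thread {
	uint32_t	flags;
};

/* Counters so the behaviour can be observed. */
static int sk_slow_entries;	/* times the unlikely path was taken */
static int sk_dbregs_work;	/* times debug-register state was handled */
static int sk_extra_work;	/* times the other optional state was handled */

/* Slow path: only here are the individual features tested. */
static void
sk_switch_rare(const struct sk_thread *td_old, const struct sk_thread *td_new)
{
	uint32_t merged = td_old->flags | td_new->flags;

	sk_slow_entries++;
	if (merged & SK_F_DBREGS)
		sk_dbregs_work++;	/* save old / load new debug registers */
	if (merged & SK_F_EXTRA)
		sk_extra_work++;	/* save old / load new optional state */
}

static void
sk_switch(const struct sk_thread *td_old, const struct sk_thread *td_new)
{
	/* One predicted-not-taken test covers both unlikely features. */
	if (SK_PREDICT_FALSE((td_old->flags | td_new->flags) & SK_F_RARE))
		sk_switch_rare(td_old, td_new);
	/* ... common-case register save/restore would follow here ... */
}
```

In the common case, where neither thread uses either feature, the switch pays for exactly one well-predicted branch.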

>
>> 2)  The cost of extra icache for getting over all of those unused 
>> instructions, unaligned jumps, etc.
>
> Again, if this were the cause of slowness then it would affect the C
> version more, since the C version is larger.

The C version is not larger than the asm version at high optimization 
levels when you consider the total instruction count that is brought into 
the icache.  It's worth noting that my C version is slower in some cases 
other than the microbenchmark due to extra instructions for optimizations 
that don't matter.  Peter's asm version is tight enough that the extra 
compares don't cost more than the compacted code wins.  The C version 
touches more distinct icache lines but makes up for it in other 
optimizations in the common case.

>
> In fact, the benchmark is probably too simple to show the cost of
> branches.  Just doing sched_yield() in a loop gives the following
> behaviour, which may be atypical enough for the larger branch
> and cache costs of the C version not to have much effect:
> - it doesn't go near most of the special cases, so branches are
>  predictable (always non-special) and are thus predicted provided
>  (a) the CPU actually does reasonably good branch prediction, and
>  (b) the branch predictions fit in the branch prediction cache
>      (reasonably good branch prediction probably requires such a
>      cache).

This cache is surely virtual as it happens in the first few stages of the 
pipeline.  That means it's flushed on every switch.  We're probably coming 
in cold every time.

> - it doesn't touch much icache or dcache or branch-cache, so
>  everything probably stays cached.
>
> If just the branch-cache were thrashed, then reasonably good dynamic
> branch prediction is impossible and things would be slow.  In the C
> version, you use predict_true() and predict_false() a lot.  This
> might improve static branch prediction but makes little difference
> if the branch cache is working.

I doubt there are any cases where the branch cache is effective here.  I 
don't know that for certain but it seems unlikely that it would be 
preserved across switches due to the complexity in validating addresses.

>
> The C version uses lots of non-inline function calls.  Just the
> branches for this would have a significant overhead if the branches
> are mispredicted.  I think you are depending on gcc's auto-inlining
> of static functions which are only called once to avoid the full
> cost of the function calls.

I depend on it not inlining them to avoid polluting the icache with unused 
instructions.  I broke that with my most recent patch by moving the calls 
back into C.
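Jeff's point is the opposite of relying on auto-inlining. GCC may inline a static function that has only one caller (its -finline-functions-called-once optimization), which would drag rarely executed instructions into the hot path.

A sketch of keeping such code out of line with the GCC/Clang `noinline` attribute follows. The function names and the work they do are hypothetical.

```c
/*
 * Hypothetical cold path.  noinline keeps its instructions out of the
 * caller, so the common case occupies fewer icache lines even though
 * this static function has only one call site.
 */
static __attribute__((noinline)) long
sk_rare_fixup(long v)
{
	/* Stand-in for rarely needed work. */
	return (v * 2 + 1);
}

/* Hypothetical hot path: the common case is a short straight-line sequence. */
static long
sk_hot_path(long v, int need_fixup)
{
	if (__builtin_expect(need_fixup != 0, 0))
		return (sk_rare_fixup(v));
	return (v + 1);
}
```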

>
>> I haven't looked at i386 very closely lately but on amd64 the wrmsrs for 
>> fs/gsbase are very expensive.  On my 2GHz dual-core Opteron the optimized
>> switch seems to take about 100ns.  The total switch from userspace to 
>> userspace is about 4x that.
>
> Probably avoiding these is the only significant difference between
> the versions.  You use predict_false() for executing them.  Are fsbase
> and gsbase really usually constant across processes?

If they are non threaded, yes.
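The fsbase/gsbase optimization described in this thread skips the expensive wrmsr when the incoming thread's value equals the outgoing one. That is the usual case for non-threaded processes.

A user-space sketch follows, with these assumptions:

- The MSRs are simulated by an array plus a write counter.
- `sk_wrmsr()` and `struct sk_pcb` are hypothetical.
- The hardware register always holds the outgoing thread's value at switch time.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two base MSRs. */
enum { SK_MSR_FSBASE, SK_MSR_GSBASE, SK_NMSR };

static uint64_t sk_msr[SK_NMSR];	/* simulated register contents */
static int sk_wrmsr_count;		/* number of "expensive" writes */

static void
sk_wrmsr(int msr, uint64_t val)
{
	sk_msr[msr] = val;
	sk_wrmsr_count++;
}

/* Hypothetical per-thread saved state. */
struct sk_pcb {
	uint64_t	fsbase;
	uint64_t	gsbase;
};

/*
 * Compare first: the comparison is cheap and, for non-threaded
 * processes, usually finds the values equal, so the write is skipped.
 */
static void
sk_switch_bases(const struct sk_pcb *oldpcb, const struct sk_pcb *newpcb)
{
	if (__builtin_expect(oldpcb->fsbase != newpcb->fsbase, 0))
		sk_wrmsr(SK_MSR_FSBASE, newpcb->fsbase);
	if (__builtin_expect(oldpcb->gsbase != newpcb->gsbase, 0))
		sk_wrmsr(SK_MSR_GSBASE, newpcb->gsbase);
}
```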

>
> 400nS is about what I get for i386 on 2.2GHz A64 UP too (6.17 S for
> ./yield 1000000 10).  getpid() on this machine takes 180nS so it is
> unreasonable to expect sched_yield() to take much less than a few hundred
> nS.
>
> Some perfmon output for ./yield 100000 10:
>
> % # s/kx-ls-microarchitectural-resync-by-self-mod-code % 0
> % # s/kx-ls-buffer2-full % 909905
> % # s/kx-ls-retired-cflush-instructions % 0
> % # s/kx-ls-retired-cpuid-instructions % 0
> % # s/kx-dc-accesses % 496436422
> % # s/kx-dc-misses % 11102024
>
> 11 cache dmisses per yield.  Probably the main cause of slowness (main
> memory latency on this machine is 42 nsec so 11 cache misses takes
> 462 of the 617 nS per call?).

Yes, I reduced that recently by reordering struct tdq and td_sched somewhat.
It would be even better if we could group the scheduling-related fields of
td_* near the bottom with td_sched.  This would require more tedious
initialization in fork and would be prone to being disturbed by people
adding fields to struct thread wherever they please.  Ultimately it
doesn't matter that much except in these microbenchmarks anyway.
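Grouping the fields the switch path touches, as described here for struct tdq and td_sched, reduces the number of distinct cache lines each switch has to pull in. The sketch below counts the 64-byte lines touched by the same five hot fields in two layouts of a hypothetical thread structure.

The structures and field names are illustrative, not FreeBSD's struct thread. The count assumes each structure starts on a cache-line boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_LINE	64	/* assumed cache-line size in bytes */

/* Hypothetical layout with the switch-path ("hot") fields first. */
struct sk_td_grouped {
	void		*td_lock;
	int		 td_priority;
	int		 td_flags;
	uint64_t	 td_runtime;
	void		*td_runq_next;
	char		 td_name[64];		/* cold */
	uint64_t	 td_stats[16];		/* cold */
};

/* The same fields with cold data interleaved. */
struct sk_td_scattered {
	void		*td_lock;
	char		 td_name[64];		/* cold */
	int		 td_priority;
	uint64_t	 td_stats[16];		/* cold */
	int		 td_flags;
	uint64_t	 td_runtime;
	void		*td_runq_next;
};

struct sk_field {
	size_t	off;
	size_t	len;
};

/* Count distinct cache lines covered by a set of fields (structs < 4 KiB). */
static size_t
sk_count_lines(const struct sk_field *f, size_t n)
{
	unsigned char seen[4096 / SK_LINE] = { 0 };
	size_t i, l, count = 0;

	for (i = 0; i < n; i++) {
		for (l = f[i].off / SK_LINE;
		    l <= (f[i].off + f[i].len - 1) / SK_LINE; l++) {
			if (!seen[l]) {
				seen[l] = 1;
				count++;
			}
		}
	}
	return (count);
}

/* The hot fields, described by offset and size, for either layout. */
#define SK_HOT_FIELDS(type) {					\
	{ offsetof(type, td_lock), sizeof(void *) },		\
	{ offsetof(type, td_priority), sizeof(int) },		\
	{ offsetof(type, td_flags), sizeof(int) },		\
	{ offsetof(type, td_runtime), sizeof(uint64_t) },	\
	{ offsetof(type, td_runq_next), sizeof(void *) },	\
}

static size_t
sk_hot_lines_grouped(void)
{
	const struct sk_field f[] = SK_HOT_FIELDS(struct sk_td_grouped);

	return (sk_count_lines(f, sizeof(f) / sizeof(f[0])));
}

static size_t
sk_hot_lines_scattered(void)
{
	const struct sk_field f[] = SK_HOT_FIELDS(struct sk_td_scattered);

	return (sk_count_lines(f, sizeof(f) / sizeof(f[0])));
}
```

On both ILP32 and LP64 the grouped layout keeps the hot fields in one line, while the scattered layout spreads them over three.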

>
> % # s/kx-dc-refills-from-l2 % 0
> % # s/kx-dc-refills-from-system % 0
> % # s/kx-dc-writebacks % 0
> % # s/kx-dc-l1-dtlb-miss-and-l2-dtlb-hits % 3459100
> % # s/kx-dc-l1-and-l2-dtlb-misses % 2138231
> % # s/kx-dc-misaligned-references % 87
> % # s/kx-dc-microarchitectural-late-cancel-of-an-access % 73146415
> % # s/kx-dc-microarchitectural-early-cancel-of-an-access % 236927303
> % # s/kx-bu-cpu-clk-unhalted % 1303921314
> % # s/kx-ic-fetches % 236207869
> % # s/kx-ic-misses % 22988
>
> Insignificant icache misses.
>
> % # s/kx-ic-refill-from-l2 % 18979
> % # s/kx-ic-refill-from-system % 4191
> % # s/kx-ic-l1-itlb-misses % 0
> % # s/kx-ic-l1-l2-itlb-misses % 1619297
> % # s/kx-ic-instruction-fetch-stall % 1034570822
> % # s/kx-ic-return-stack-hit % 20822416
> % # s/kx-ic-return-stack-overflow % 5870
> % # s/kx-fr-retired-instructions % 701240247
> % # s/kx-fr-retired-ops % 1163464391
> % # s/kx-fr-retired-branches % 121636370
> % # s/kx-fr-retired-branches-mispredicted % 2761910
> % # s/kx-fr-retired-taken-branches % 93488548
> % # s/kx-fr-retired-taken-branches-mispredicted % 2848315
>
> 2.8 branches mispredicted per call.
>
> % # s/kx-fr-retired-far-control-transfers % 2000934
>
> 1 int 0x80 and 1 iret per sched_yield(), and apparently not much else.
>
> % # s/kx-fr-retired-resync-branches % 936968
> % # s/kx-fr-retired-near-returns % 19008374
> % # s/kx-fr-retired-near-returns-mispredicted % 784103
>
> 0.8 returns mispredicted per call.
>
> % # s/kx-fr-retired-taken-branches-mispred-by-addr-miscompare % 721241
> % # s/kx-fr-interrupts-masked-cycles % 658462615
>
> Ugh, this is from spinlocks bogusly masking interrupts.  More than half
> the cycles have interrupts masked.  This at least shows that lots of
> time is being spent near cpu_switch() with a spinlock held.
>

I'm not sure why you feel masking interrupts in spinlocks is bogus.  It's 
central to our SMP strategy.  Unless you think we should do it lazily like 
we do with critical_*.  I know jhb had that working at one point but it 
was abandoned.
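The per-call figures in the perfmon analysis above are plain ratios of the raw counters. The sketch below reproduces them under two assumptions:

- `./yield 100000 10` makes about 1,000,000 sched_yield() calls. This is consistent with the 2000934 far control transfers at one int 0x80 and one iret per call.
- Each cache miss costs the quoted 42 ns memory latency.

The helper names are hypothetical.

```c
/* Assumed number of sched_yield() calls in the measured run. */
#define SK_CALLS		1000000.0

/* Raw counters copied from the perfmon output quoted above. */
#define SK_DC_MISSES		11102024.0
#define SK_BR_MISPREDICTED	2761910.0
#define SK_RET_MISPREDICTED	784103.0
#define SK_CLK_UNHALTED		1303921314.0
#define SK_INTR_MASKED_CYCLES	658462615.0

/* Events per sched_yield() call. */
static double
sk_per_call(double count)
{
	return (count / SK_CALLS);
}

/* Fraction of unhalted cycles spent with interrupts masked. */
static double
sk_masked_fraction(void)
{
	return (SK_INTR_MASKED_CYCLES / SK_CLK_UNHALTED);
}

/* Rough cost of cache misses if each one pays the full memory latency. */
static double
sk_miss_cost_ns(double misses_per_call, double latency_ns)
{
	return (misses_per_call * latency_ns);
}
```

For example, `sk_per_call(SK_DC_MISSES)` gives about 11.1 misses per call, and `sk_miss_cost_ns(11.0, 42.0)` gives the 462 ns estimate.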

> % # s/kx-fr-interrupts-masked-while-pending-cycles % 9365
>
> Since the CPU is reasonably fast, interrupts aren't masked for very long
> each time.  This maximum is still 4.5 uS.
>
> % # s/kx-fr-hardware-interrupts % 63
> % # s/kx-fr-decoder-empty % 247898696
> % # s/kx-fr-dispatch-stalls % 589228741
> % # s/kx-fr-dispatch-stall-from-branch-abort-to-retire % 39894120
> % # s/kx-fr-dispatch-stall-for-serialization % 44037193
> % # s/kx-fr-dispatch-stall-for-segment-load % 134520281
>
> 134 cycles per call.  This may be more for ones in syscall() generally.
> I think each segreg load still costs ~20 cycles.  Since this is on
> i386, there are 6 per call (%ds, %es and %fs save and restore), plus
> %ss save and restore, which might not be counted here.  134 is a lot -- about
> 60nS of the 180nS for getpid().
>
> % # s/kx-fr-dispatch-stall-when-reorder-buffer-is-full % 18648001
> % # s/kx-fr-dispatch-stall-when-reservation-stations-are-full % 121485247
> % # s/kx-fr-dispatch-stall-when-fpu-is-full % 19
> % # s/kx-fr-dispatch-stall-when-ls-is-full % 203578275
> % # s/kx-fr-dispatch-stall-when-waiting-for-all-to-be-quiet % 63136307
> % # s/kx-fr-dispatch-stall-when-far-xfer-or-resync-br-pending % 6994131
>
>>> In-kernel switches are not a very typical case since they don't load
>>> %cr3...
>> 
>> We've been working on amd64 so I can't comment specifically about i386 
>> costs. However, I definitely agree that cpu_switch() is not the greatest 
>> overhead in the path.  Also, you have to load cr3 even for kernel threads 
>> because the page directory page or page directory pointer table at %cr3 can 
>> go away once you've switched out the old thread.
>
> I don't see this.  The switch is avoided if %cr3 wouldn't change, which
> I think usually or always happens for switches between kernel threads.

I see, you're saying 'between kernel threads'.  There was some discussion 
of allowing kernel threads to use the page tables of whichever thread was 
last switched in, to avoid loading %cr3 in all cases for them.  This requires other
changes to be safe however.

>
>>> The asm code already saves only call-saved registers for both i386 and
>>> amd64.  It saves call-saved registers even when it apparently doesn't
>>> use them (lots more of these on amd64, while on i386 it uses more
>>> call-saved registers than it needs to, apparently since this is free
>>> after saving all call-saved registers).  I think saving more than is
>>> needed is the result of confusion about what needs to be saved and/or
>>> what is needed for debugging.
>> 
>> It has to save all of the callee saved registers in the PCB because they 
>> will likely differ from thread to thread.  Failing to save and restore them 
>> could leave you returning with the registers having different values and 
>> corrupt the calling function.
>
> Yes, I had forgotten the detail of how the non-local flow of control can
> change the registers (the next call to the function in the context of
> the switched-to-process may have different values in the registers due
> to changes to the registers in callers).
>
> All that can be done differently here is saving all the registers on the
> stack (except %esp) in the usual way.  This would probably be faster on
> old i386's using pushal or pushl, but on amd64 pushal is not available,
> and on Athlons generally (before Barcelona?) it is faster not to use pushl,
> so on amd64 the registers should be saved using movl and then it is just
> as easy to put them in the pcb as on the stack.
>
>>>> The good news is that this tuning is finally being done.  It should
>>>> have been done in 2003 though...
>>> 
>>> How is this possible with (according to my theory) most of the context
>>> switch cost being for %cr3 and upper layers?  Unchanged amd64 has only
>>> a few more costs than i386.  Mainly 3 unconditional wrmsr's and 2
>>> unconditional rdmsr's for managing gsbase and fsbase.  I thought that
>>> these were hard to avoid and anyway not nearly as expensive as %cr3 loads.
>> 
>> %cr3 is actually a lot less expensive these days with page table flush 
>> filters and the PG_G bit.  We were able to optimize away setting the msrs 
>> in the case that the previous values match the new values.  Apparently the 
>> hardware doesn't optimize this case so we have to do comparisons ourselves.
>> 
>> That was a big chunk of the optimization.  Static branch hints, reordering 
>> code, possibly reordering for better pipeline scheduling in Peter's asm,
>> etc. provide the rest.
>
> All the old i386 asm and probably clones of it on amd64 is certainly not
> optimized globally for anything newer than an i386 (barely even an i486).
> This rarely matters however.  It lost more on Pentium-1's, but now out of
> order execution and better branch prediction hides most inefficiencies.
>
> Bruce
>
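Every callee-saved register must be kept per thread. The reason can be seen in user space with getcontext()/makecontext()/swapcontext(). These routines were removed from POSIX.1-2008 but are still provided by FreeBSD's libc and glibc. They save the whole register set of one context and restore another, much as cpu_switch() does with the PCB.

In this sketch each context resumes with its own local value intact. It is only an analogy, not how the kernel switches threads, and the names are hypothetical.

```c
#include <ucontext.h>

static ucontext_t sk_main_ctx, sk_thr_ctx;
static char sk_stack[64 * 1024];	/* stack for the second context */
static int sk_trace[8];			/* order in which the steps ran */
static int sk_ntrace;

static void
sk_thread(void)
{
	int local = 100;	/* must survive the switch away and back */

	sk_trace[sk_ntrace++] = local + 1;
	swapcontext(&sk_thr_ctx, &sk_main_ctx);	/* switch back to main */
	sk_trace[sk_ntrace++] = local + 2;	/* resumed with local intact */
}	/* returning follows uc_link back to sk_main_ctx */

static int
sk_run(void)
{
	int local = 0;

	if (getcontext(&sk_thr_ctx) == -1)
		return (-1);
	sk_thr_ctx.uc_stack.ss_sp = sk_stack;
	sk_thr_ctx.uc_stack.ss_size = sizeof(sk_stack);
	sk_thr_ctx.uc_link = &sk_main_ctx;
	makecontext(&sk_thr_ctx, sk_thread, 0);

	sk_trace[sk_ntrace++] = local + 1;
	swapcontext(&sk_main_ctx, &sk_thr_ctx);	/* run sk_thread */
	sk_trace[sk_ntrace++] = local + 2;
	swapcontext(&sk_main_ctx, &sk_thr_ctx);	/* resume sk_thread */
	sk_trace[sk_ntrace++] = local + 3;
	return (sk_ntrace);
}
```

The steps interleave as 1, 101, 2, 102, 3. Each side sees its own `local` after every switch.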

Jeff

From owner-freebsd-arch@FreeBSD.ORG  Thu Mar 13 23:51:18 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1B4461065671
	for <arch@freebsd.org>; Thu, 13 Mar 2008 23:51:18 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outY.internet-mail-service.net (outY.internet-mail-service.net
	[216.240.47.248])
	by mx1.freebsd.org (Postfix) with ESMTP id AE19F8FC22
	for <arch@freebsd.org>; Thu, 13 Mar 2008 23:51:17 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160)
	by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP;
	Thu, 13 Mar 2008 16:51:17 -0700
Received: from julian-mac.elischer.org (localhost [127.0.0.1])
	by idiom.com (Postfix) with ESMTP id 832402D600F;
	Thu, 13 Mar 2008 16:51:16 -0700 (PDT)
Message-ID: <47D9BDF3.80409@elischer.org>
Date: Thu, 13 Mar 2008 16:51:15 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213)
MIME-Version: 1.0
To: Jeff Roberson <jroberson@chesapeake.net>
References: <20080310161115.X1091@desktop>
	<47D758AC.2020605@freebsd.org>	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>	<20080313124213.J31200@delplex.bde.org>
	<20080312211834.T1091@desktop>	<20080313230809.W32527@delplex.bde.org>
	<20080313132152.Y1091@desktop>
In-Reply-To: <20080313132152.Y1091@desktop>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@freebsd.org, David Xu <davidxu@freebsd.org>,
	Peter Wemm <peter@wemm.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 Mar 2008 23:51:18 -0000

Jeff Roberson wrote:
>
> 
> I'm not sure why you feel masking interrupts in spinlocks is bogus.  
> It's central to our SMP strategy.  Unless you think we should do it 
> lazily like we do with critical_*.  I know jhb had that working at one 
> point but it was abandoned.
> 
>

My memory is that we used to mask interrupts lazily in 4.x.
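Lazy interrupt masking, in the critical_*-style approach mentioned above, can be sketched with a per-CPU nesting count:

1. Entering a section only increments the count.
2. An interrupt that arrives inside a section is recorded as pending instead of serviced.
3. Leaving the outermost section replays the pending interrupt.

The structure and function names are hypothetical. A real kernel would also have to mask the interrupting source in hardware when it defers one. The single pending flag here coalesces repeated interrupts.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for lazy ("soft") interrupt masking. */
struct sk_cpu {
	int	nesting;	/* depth of spinlock/critical sections */
	bool	pending;	/* an interrupt arrived while soft-masked */
	int	serviced;	/* interrupts whose handler has run */
};

static void
sk_service(struct sk_cpu *c)
{
	c->serviced++;		/* stand-in for running the handler */
}

/* Entering a section only bumps a counter: no cli, no hardware access. */
static void
sk_enter(struct sk_cpu *c)
{
	c->nesting++;
}

/* Interrupt entry: service now if not soft-masked, otherwise defer. */
static void
sk_interrupt(struct sk_cpu *c)
{
	if (c->nesting > 0) {
		/* A real kernel would also mask the source in hardware here. */
		c->pending = true;
		return;
	}
	sk_service(c);
}

/* Leaving the outermost section replays an interrupt that was deferred. */
static void
sk_exit(struct sk_cpu *c)
{
	if (--c->nesting == 0 && c->pending) {
		c->pending = false;
		sk_service(c);
	}
}
```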

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 02:07:07 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0FC731065672
	for <arch@hub.freebsd.org>; Fri, 14 Mar 2008 02:07:07 +0000 (UTC)
	(envelope-from davidxu@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id EBE958FC17;
	Fri, 14 Mar 2008 02:07:06 +0000 (UTC)
	(envelope-from davidxu@FreeBSD.org)
Received: from apple.my.domain (root@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2E272FR055536;
	Fri, 14 Mar 2008 02:07:03 GMT (envelope-from davidxu@freebsd.org)
Message-ID: <47D9DE17.7030605@freebsd.org>
Date: Fri, 14 Mar 2008 10:08:23 +0800
From: David Xu <davidxu@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.9 (X11/20071211)
MIME-Version: 1.0
To: Jeff Roberson <jroberson@chesapeake.net>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
	<20080313124213.J31200@delplex.bde.org>
	<20080312211834.T1091@desktop>
	<20080313230809.W32527@delplex.bde.org>
	<20080313132152.Y1091@desktop>
In-Reply-To: <20080313132152.Y1091@desktop>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@FreeBSD.org, Peter Wemm <peter@wemm.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 02:07:07 -0000

Jeff Roberson wrote:

>> Ugh, this is from spinlocks bogusly masking interrupts.  More than half
>> the cycles have interrupts masked.  This at least shows that lots of
>> time is being spent near cpu_switch() with a spinlock held.
>>
> 
> I'm not sure why you feel masking interrupts in spinlocks is bogus.  
> It's central to our SMP strategy.  Unless you think we should do it 
> lazily like we do with critical_*.  I know jhb had that working at one 
> point but it was abandoned.

It may be that the general mutex already spins, so a spinlock should be
used only where interrupts must be disabled and re-enabled, which is
expensive.  I don't know how many spinlocks are misused in the CURRENT
source code.

Regards,
David Xu


From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 02:19:09 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 42C121065671;
	Fri, 14 Mar 2008 02:19:09 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail14.syd.optusnet.com.au (mail14.syd.optusnet.com.au
	[211.29.132.195])
	by mx1.freebsd.org (Postfix) with ESMTP id CA3268FC14;
	Fri, 14 Mar 2008 02:19:08 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au
	(c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11])
	by mail14.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2E2Ii5F014565
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 14 Mar 2008 13:18:49 +1100
Date: Fri, 14 Mar 2008 13:18:44 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <47D9BDF3.80409@elischer.org>
Message-ID: <20080314115225.G34431@delplex.bde.org>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
	<20080313124213.J31200@delplex.bde.org> <20080312211834.T1091@desktop>
	<20080313230809.W32527@delplex.bde.org> <20080313132152.Y1091@desktop>
	<47D9BDF3.80409@elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Peter Wemm <peter@wemm.org>, David Xu <davidxu@freebsd.org>,
	arch@freebsd.org
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 02:19:09 -0000

On Thu, 13 Mar 2008, Julian Elischer wrote:

> Jeff Roberson wrote:
>> I'm not sure why you feel masking interrupts in spinlocks is bogus.  It's 
>> central to our SMP strategy.  Unless you think we should do it lazily like 
>> we do with critical_*.  I know jhb had that working at one point but it was 
>> abandoned.

Masking interrupts in spinlocks breaks fast interrupts among other things.

Yes, I think it should be done like in critical_*.  My version has
done this for 6 years or so, but I don't really care about SMP and
never made it work right for SMP.  Its main impact is on fast interrupt
handlers.  Interrupt handlers cannot access any data that is not locked,
and for non-broken fast interrupt handlers, in practice this means not
accessing any global data, since locking global data would be too hard
and/or slow.  Global data includes all per-CPU-data, and I enforce
non-access to this by loading %fs with 0 in fast interrupt handlers.
This makes fast interrupt handlers quite difficult to write.  An
interrupt handler like hardclock(), which stomps around in global data,
in some places without even locking the data, is far too large and
complicated to be a non-broken fast interrupt handler.  I use normal
interrupt handlers for hardclock() and statclock() so my lower interrupt
latency costs performance.

> My memory is that we used to mask interrupts lazily in 4.x

Right.  Only for i386.  The masking is in the PIC so it only affects
devices on non-fast interrupts, which should only be slow devices.
Lazy masking for critical_*() has the same results (it only affects
non-fast interrupts) although its mechanism is different.
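
A minimal userland sketch of the lazy scheme follows. It assumes one simulated CPU and uses hypothetical names (struct vcpu, lazy_enter(), lazy_exit(), raise_intr()). It is not the 4.x spl code or FreeBSD's critical_*() implementation.

- Entering a section only bumps a nesting count.
- A non-fast interrupt that arrives inside the section is recorded as pending instead of being run.
- The outermost exit replays the pending interrupt.
- Fast interrupts still run immediately. This is why they must not touch data the interrupted code may hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for one simulated CPU (not struct pcpu). */
struct vcpu {
	int	crit_nest;	/* critical-section nesting depth */
	bool	intr_pending;	/* non-fast interrupt deferred while masked */
	int	handled;	/* number of handler runs so far */
};

/* Stand-in for an interrupt handler. */
static void
run_handler(struct vcpu *c)
{
	c->handled++;
}

/* Enter: no cli and no PIC access, just count. */
static void
lazy_enter(struct vcpu *c)
{
	c->crit_nest++;
}

/* Delivery: fast interrupts always run; others are deferred if masked. */
static void
raise_intr(struct vcpu *c, bool fast)
{
	if (fast || c->crit_nest == 0)
		run_handler(c);
	else
		c->intr_pending = true;
}

/* Exit: the outermost exit replays a deferred interrupt. */
static void
lazy_exit(struct vcpu *c)
{
	assert(c->crit_nest > 0);
	if (--c->crit_nest == 0 && c->intr_pending) {
		c->intr_pending = false;
		run_handler(c);
	}
}
```

A real implementation would track pending interrupts per source rather than with a single flag.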

I implemented this in 386BSD and am unhappy that it was broken in
SMPng, though with CPUs hundreds of times faster than they were when
386BSD was new, and with devices not so much faster and/or with larger
buffers, the extra latency rarely matters in practice; also, with SMP
there is only extra latency if all CPUs happen to hold a spinlock at
the same time.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 03:00:00 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 82B9D1065675;
	Fri, 14 Mar 2008 03:00:00 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au
	[211.29.133.51])
	by mx1.freebsd.org (Postfix) with ESMTP id 013138FC15;
	Fri, 14 Mar 2008 02:59:59 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au
	(c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11])
	by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2E2xk4O002740
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 14 Mar 2008 13:59:47 +1100
Date: Fri, 14 Mar 2008 13:59:46 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Jeff Roberson <jroberson@chesapeake.net>
In-Reply-To: <20080313132152.Y1091@desktop>
Message-ID: <20080314132033.I34431@delplex.bde.org>
References: <20080310161115.X1091@desktop> <47D758AC.2020605@freebsd.org>
	<e7db6d980803120125y41926333hb2724ecd07c0ac92@mail.gmail.com>
	<20080313124213.J31200@delplex.bde.org> <20080312211834.T1091@desktop>
	<20080313230809.W32527@delplex.bde.org> <20080313132152.Y1091@desktop>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, Peter Wemm <peter@wemm.org>,
	David Xu <davidxu@freebsd.org>
Subject: Re: amd64 cpu_switch in C.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 03:00:00 -0000

On Thu, 13 Mar 2008, Jeff Roberson wrote:

Please trim quotes more.

> On Fri, 14 Mar 2008, Bruce Evans wrote:
>
>> On Wed, 12 Mar 2008, Jeff Roberson wrote:

>>> More expensive than the raw instruction count is:
>>> 
>>> 1)  The mispredicted branches to deal with all of the optional state and 
>>> features that are not always saved.
>> 
>> This is unlikely to matter, and apparently doesn't, at least in simple
>> benchmarks, since the C version has even more branches.  Features that
>> are rarely used cause branches that are usually perfectly predicted.
>
> The c version has two fewer branches because it tests for two unlikely 
> features together.  It has a few more branches than the asm version in CVS
> and the same number of extra branches as peter's asm version to support 
> conditional gs/fsbase setting.  The other extra branches have to do with 
> supporting cpu_switch() and cpu_throw() together.

Testing features together is probably best here, but it might not
always be.  Executing more branches might be faster because each
individual branch is easier to predict.
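
The trade-off can be illustrated with a sketch. One test of a combined flag mask keeps the common path at a single, well-predicted branch. The rare path then pays for the individual tests. The flag names and struct are hypothetical. `__builtin_expect` is the GCC/Clang builtin that FreeBSD's `__predict_false()` in <sys/cdefs.h> wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	F_DEBUGREGS	0x01u	/* hypothetical: thread uses debug registers */
#define	F_SEGBASE	0x02u	/* hypothetical: thread needs fs/gs base reload */
#define	F_RARE		(F_DEBUGREGS | F_SEGBASE)

/* Hypothetical thread; only the flags word matters here. */
struct fake_thread {
	uint32_t flags;
};

/* Counters standing in for the real (expensive) state switches. */
static int dbreg_loads, segbase_loads;

static void
switch_optional_state(const struct fake_thread *td)
{
	assert(td != NULL);
	/* Common case: one branch, hinted as not taken. */
	if (__builtin_expect((td->flags & F_RARE) != 0, 0)) {
		/* Rare case: now pay for the individual tests. */
		if (td->flags & F_DEBUGREGS)
			dbreg_loads++;
		if (td->flags & F_SEGBASE)
			segbase_loads++;
	}
}
```

With separate tests instead, each branch might predict slightly better, at the cost of one extra branch on every switch.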

>>> 2)  The cost of extra icache for getting over all of those unused 
>>> instructions, unaligned jumps, etc.
>> 
>> Again, if this were the cause of slowness then it would affect the C
>> version more, since the C version is larger.
>
> The C version is not larger than the asm version at high optimization levels 
> when you consider the total instruction count that is brought into the 
> icache.  It's worth noting that my C version is slower in some cases other 
> than the microbenchmark due to extra instructions for optimizations that 
> don't matter.  Peter's asm version is tight enough that the extra compares 
> don't cost more than the compacted code wins.  The C version touches more 
> distinct icache lines but makes up for it in other optimizations in the 
> common case.

Are calls to rarely-called functions getting auto-inlined for your C
version?  The asm version doesn't worry about this.  Even with
auto-inlining of static functions that are only called once (a new
bugfeature in gcc-4.1 which breaks profiling and debugging), at some
optimization levels gcc will place code for the unusual case far away
so as not to pollute the i-cache in the usual case although this may
cost an extra branch in the unusual case.  For rarely-called functions,
it must be better to not inline too.

>> In fact, the benchmark is probably too simple to show the cost of
>> branches.  Just doing sched_yield() in a loop gives the following
>> atypical behaviour which may be atypical enough for the larger branch
>> and cache costs for the C version to not have much effect:
>> - it doesn't go near most of the special cases, so branches are
>>  predictable (always non-special) and are thus predicted provided
>>  (a) the CPU actually does reasonably good branch prediction, and
>>  (b) the branch predictions fit in the branch prediction cache
>>      (reasonably good branch prediction probably requires such a
>>      cache).
>
> This cache is surely virtual as it happens in the first few stages of the 
> pipeline.  That means it's flushed on every switch.  We're probably coming in 
> cold every time.

Which cache?  My perfmon results show that the branch cache is far from cold.

>> The C version uses lots of non-inline function calls.  Just the
>> branches for this would have a significant overhead if the branches
>> are mispredicted.  I think you are depending on gcc's auto-inlining
>> of static functions which are only called once to avoid the full
>> cost of the function calls.
>
> I depend on it not inlining them to avoid polluting the icache with unused 
> instructions.  I broke that with my most recent patch by moving the calls 
> back into C.

:-) Maybe I only looked at the most recent patch.  It seemed to have lots
of calls.

To prevent inlining you probably need to use the noinline attribute for
some functions.  I don't see how the C version can be both simpler and
(as|more) optimal than the asm version.  It already has magic somewhat
self-documenting macros for branch prediction and magic undocumented 
layout for the function calls etc. to improve branch prediction and
icache use.  For even-more-micro optimizations in libm, I try to do
everything in C, but the only way I can get near the efficiency that
I want is to look at the asm output and then figure out how to trick
the compiler into not being so stupid. I could optimize it in asm
with less work (starting with the asm output, especially at first to
learn what works for SSE scheduling), but only for a single CPU type.
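
The following is a sketch of the noinline approach with GCC/Clang, using hypothetical function names. `__attribute__((noinline))` keeps a once-called static helper out of line even where gcc-4.1 and later would otherwise auto-inline it. FreeBSD's <sys/cdefs.h> spells this attribute `__noinline`.

```c
#include <assert.h>

static int slow_calls;	/* counts trips through the unusual case */

/*
 * Hypothetical rarely-called path.  noinline stops gcc from folding this
 * once-called static function into its hot caller.  That keeps its
 * instructions off the i-cache lines used by the common path.
 */
static __attribute__((noinline)) void
handle_unusual_case(int arg)
{
	assert(arg < 0);
	slow_calls++;
}

/* Hot path: the unusual case costs a call only when it actually happens. */
static int
hot_path(int arg)
{
	if (__builtin_expect(arg < 0, 0)) {
		handle_unusual_case(arg);
		return (0);
	}
	return (arg + 1);
}
```

Whether the compiler then lays out the unusual branch far from the common one still depends on optimization level. Checking the asm output remains the only way to be sure.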

>> Some perfmon output for ./yield 100000 10:
>> ...
>> % # s/kx-fr-dispatch-stall-for-segment-load % 134520281
>> 
>> 134 cycles per call.  This may be more for ones in syscall() generally.
>> I think each segreg load still costs ~20 cycles.  Since this is on
>> i386, there are 6 per call (%ds, %es and %fs save and restore), plus
>> %ss save and restore, which might not be counted here.  134 is a lot -- about
>> 60nS of the 180nS for getpid().

I forgot about parallelism.  With 3-way execution on an Athlon, there
is at least a chance that all 3 segment registers are loaded in parallel,
taking only ~20 cycles for all 3, but no chance of proceeding with
other instructions if so.  OTOH, if only 1 or 2 ALUs can do segreg
loads, then the other ALUs may be able to proceed with independent
instructions.  We have some nearby instructions that depend on %ds
(these might benefit from using %ss) but few or no nearby dependencies
on %es and %fs.  Kernel code mostly doesn't worry about dependencies
at all.  Dependencies don't matter as much in integer code as in SSE/FPU
code.
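
The arithmetic behind the "134 cycles per call" figure can be written out as a sketch. It rests on two assumptions:

- `./yield 100000 10` makes 10^6 sched_yield() calls in total (100000 x 10).
- The clock rate is inferred from the quoted "134 cycles is about 60 ns" rather than from any stated CPU speed.

```c
#include <assert.h>

/* Stall cycles reported by perfmon for the whole run (from the post). */
#define	STALL_CYCLES	134520281.0
/* Assumption: 100000 iterations x 10 = 10^6 calls in total. */
#define	CALLS		(100000.0 * 10.0)

static double
cycles_per_call(void)
{
	return (STALL_CYCLES / CALLS);
}

/* Clock rate in GHz implied by "N cycles is about M ns" (an inference). */
static double
implied_ghz(double cycles, double ns)
{
	assert(ns > 0.0);
	return (cycles / ns);
}

/* Serial estimate: n segment-register operations at ~c cycles each. */
static double
serial_segload_cycles(int n, double c)
{
	return (n * c);
}
```

With these numbers, six serialized ~20-cycle segment-register operations account for about 120 of the ~134 stall cycles per call. If the three loads overlapped on a 3-way Athlon, that group would cost closer to ~20 cycles.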

>>> We've been working on amd64 so I can't comment specifically about i386 
>>> costs. However, I definitely agree that cpu_switch() is not the greatest 
>>> overhead in the path.  Also, you have to load cr3 even for kernel threads 
>>> because the page directory page or page directory pointer table at %cr3 
>>> can go away once you've switched out the old thread.
>> 
>> I don't see this.  The switch is avoided if %cr3 wouldn't change, which
>> I think usually or always happens for switches between kernel threads.
>
> I see, you're saying 'between kernel threads'.  There was some discussion of 
> allowing kernel threads to use the page tables of whichever thread was last 
> switched in to avoid cr3 in all cases for them.  This requires other changes 
> to be safe however.

Probably a good idea.
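
The idea discussed above can be sketched in a way similar in spirit to the lazy-TLB handling Linux uses for kernel threads:

- A kernel thread has no address space of its own, so it keeps running on whatever page tables are loaded.
- %cr3 is reloaded only when the next user thread's pmap differs from the loaded one.
- The "other changes" Jeff mentions are modeled here as a reference held by the CPU on the loaded pmap, so it cannot be freed while still in %cr3.

All structures and names are hypothetical, and the code only counts would-be %cr3 loads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pmap: only a reference count is modeled. */
struct fake_pmap {
	int	refs;		/* holders, including a CPU that has it loaded */
};

/* Hypothetical thread: pmap is NULL for a kernel thread. */
struct fake_thread {
	struct fake_pmap *pmap;
};

/* Hypothetical CPU state. */
struct fake_cpu {
	struct fake_pmap *loaded;	/* pmap whose tables are in %cr3 */
	int	cr3_loads;		/* simulated %cr3 writes */
};

static void
switch_address_space(struct fake_cpu *cpu, const struct fake_thread *next)
{
	/* Kernel threads borrow whatever is loaded; same pmap needs nothing. */
	if (next->pmap == NULL || next->pmap == cpu->loaded)
		return;
	/* The CPU holds a reference so a borrowed pmap cannot go away. */
	next->pmap->refs++;
	if (cpu->loaded != NULL)
		cpu->loaded->refs--;
	cpu->loaded = next->pmap;
	cpu->cr3_loads++;
}
```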

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 05:58:23 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 13B791065676
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 05:58:23 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.156])
	by mx1.freebsd.org (Postfix) with ESMTP id 8CA9B8FC25
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 05:58:22 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so3271767fgg.35
	for <freebsd-arch@freebsd.org>; Thu, 13 Mar 2008 22:58:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	bh=ba2RW5BQ+x3ZpKBwn6UASgrnMPMgo60kJVs+Ojjo2Lg=;
	b=caT+5axDR36Znl8/hXU6ZpvYxlxikwT0gCmJ8VTDQZym9squ++4unFfDXTZ4zMhXsPWK9PrTDj6RhZ5QNYfdKYWhFph2F+vQa62R46GpEYCOAyZHp2heK0LSSOuPI6Cu5zIxClirFL/lmyiONIbY9GGbpFKPPAKB5gxXMMj2uPg=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=DpvGQU5I4tXIOvC2YQg38dLo9oK3zI9TIHkKpr3w0TgacKTPSAYH6vHcfptbUtwDhQ35QaSNSPE0hHzYaX8Jy0Nvco58LAz864RL37et7edRfdzmrLu/iuF86O7OTme13hq11ky3vZ7nTYzuuTHULUSFUsTSUnJo4gwet5yThrA=
Received: by 10.86.68.20 with SMTP id q20mr2306814fga.59.1205472763829;
	Thu, 13 Mar 2008 22:32:43 -0700 (PDT)
Received: by 10.86.99.18 with HTTP; Thu, 13 Mar 2008 22:32:43 -0700 (PDT)
Message-ID: <84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
Date: Fri, 14 Mar 2008 11:02:43 +0530
From: "Joseph Koshy" <joseph.koshy@gmail.com>
To: "John Baldwin" <jhb@freebsd.org>
In-Reply-To: <200803131516.12284.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080313180805.GA83406@dragon.NUXI.org>
	<200803131516.12284.jhb@freebsd.org>
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 05:58:23 -0000

On Fri, Mar 14, 2008 at 12:46 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday 13 March 2008 02:08:05 pm David O'Brien wrote:
>  > Hi folks,
>  > Some folks at Juniper have submitted these changes to hwpmc(4).
>  > I am sending them here for public review.
>  >
>  > Their thoughts are:
>  >     The mp_ncpus refers to the count of the active CPUs, whereas
>  >     mp_maxid refers to the count of all the cpus on the SMP.  Using
>  >     mp_ncpus in the cpu_id range-check of hwpmc module would lead to the
>  >     assumption that all the active CPUs in the SMP are not interleaved.
>  >     But for running on some platforms, the active and inactive cpus could
>  >     be interleaved making hwpmc not work for the cpus whose cpu_id is
>  >     greater than the active-cpu count.

jhb>  This is correct, but you need to handle CPUs that are absent.  It might be
jhb>  sufficient to update pmc_cpu_is_disabled() in kern_pmc.c to check
jhb>  CPU_ABSENT(cpu) and claim the CPU is disabled if it is absent, but I'm not
jhb>  sure that will catch everything as that seems aimed at handling having a
jhb>  non-absent CPU halted (such as disabling HTT on i386).

That is in line with the feedback (and sample patch to kern_pmc.c) that I
had sent in to O'Brien.
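
jhb's suggestion can be sketched as a userland mock. The pieces are:

- `all_cpus` follows the kernel mask named below.
- `halted_cpus` is a hypothetical stand-in for the machdep.hlt_cpus mask.
- CPU_ABSENT() is written in the single-word bitmask style.
- mock_pmc_cpu_is_disabled() is a hypothetical stand-in for pmc_cpu_is_disabled(), not the kern_pmc.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t mock_cpumask_t;	/* one bit per CPU id (<= 32 CPUs) */

static mock_cpumask_t all_cpus;		/* mock: CPUs physically present */
static mock_cpumask_t halted_cpus;	/* hypothetical: present but halted */

#define	CPU_ABSENT(cpu)	((all_cpus & ((mock_cpumask_t)1 << (cpu))) == 0)

/* Hypothetical stand-in for pmc_cpu_is_disabled(). */
static bool
mock_pmc_cpu_is_disabled(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	if (CPU_ABSENT(cpu))
		return (true);	/* a hole in the numbering: treat as disabled */
	return ((halted_cpus & ((mock_cpumask_t)1 << cpu)) != 0);
}
```

As jhb notes, this alone may not catch every caller. Any code that sizes arrays or loops by the CPU count still has to change.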

But there are other problems with the patch at various levels,
probably not obvious to someone who is just looking at the kernel
code.

First, the relevance.  My understanding is that these changes are for
a proprietary SMP platform that uses non-mainstream (Tier3 or
Tier4) CPUs.  It so happens that Juniper decided to number CPUs
'sparsely' in their kernel variant and that is the motivation for this
patch.

IMO, as a policy, code changes for exotic hardware need to be
maintained by vendors of said exotic hardware and not dumped on
volunteers.

Second, when I designed the PMCTools API I didn't consider that CPU
numbers could be 'sparse'.  [They aren't sparsely allocated
on the i386/amd64---the code I looked at when I was designing
PmcTools.]  So there are assumptions sprinkled throughout userland
that the integers 0..hw.ncpus can select a valid CPU.  While
all that can be tracked down and changed, and documentation updated,
it is still work that I would prefer to defer until there is a chance
that someone in the general public can use it.  I do need to prioritize
how I spend my volunteer hours.

Third, IFF we as a project are going to support 'sparse CPU numbering',
I would like to see the form that takes before making changes to
HWPMC and tools.  For example:
- How will userland and in-kernel modules find out which CPUs are
  physically present?  Would there be a bitmask along the lines of today's
  machdep.hlt_cpus that we could query?  Could we make the
  'all_cpus' bitmask visible to userland?  What happens when we
  start supporting systems with more than 32 processors?
- Will sysctl hw.ncpus represent the count of present CPUs or will it
  represent the maximum CPU id?
- How will userland distinguish between absent CPUs and those that
  could be temporarily administratively disabled?
- Are we going to support 'transient' CPUs [that come and go]?  Why
  would we want sparse CPU numbering otherwise?

Nit: 'mp_maxid' appears to be an index, not a count as claimed above.
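
The nit matters for range checks. mp_ncpus counts CPUs, while mp_maxid is the largest valid CPU id. With sparse ids, a check therefore has to bound by `cpu <= mp_maxid` and then reject holes, instead of testing `cpu < mp_ncpus`. Loops must run to mp_maxid inclusive. Below is a userland sketch with mock globals named after the kernel's. The helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t all_cpus;	/* mock: bit n set if CPU id n is present */
static int mp_ncpus;		/* mock: number of present CPUs (a count) */
static int mp_maxid;		/* mock: largest present CPU id (an index) */

#define	CPU_ABSENT(cpu)	((all_cpus & ((uint32_t)1 << (cpu))) == 0)

/* Dense-numbering check: rejects valid high ids once ids are sparse. */
static bool
cpu_id_ok_dense(int cpu)
{
	return (cpu >= 0 && cpu < mp_ncpus);
}

/* Sparse-aware check: bound by the maximum id, then reject holes. */
static bool
cpu_id_ok_sparse(int cpu)
{
	return (cpu >= 0 && cpu <= mp_maxid && !CPU_ABSENT(cpu));
}

/* Iterate with an inclusive bound, skipping absent ids. */
static int
count_present(void)
{
	int cpu, n = 0;

	assert(mp_maxid < 32);
	for (cpu = 0; cpu <= mp_maxid; cpu++) {
		if (CPU_ABSENT(cpu))
			continue;
		n++;
	}
	return (n);
}
```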

If support for sparse CPU numbering is something useful, I feel the
correct sequence should be to discuss it here, add sparse CPU
numbering to the base i386/amd64 kernels (say) first and then
propagate the feature to auxiliary code like HWPMC and userland.

Changing HWPMC and its userland before the base kernel itself
changes does not seem to be the right thing to do.

Regards,
Koshy

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 06:13:23 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F086F106566B;
	Fri, 14 Mar 2008 06:13:22 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from webaccess-cl.virtdom.com (webaccess-cl.virtdom.com
	[216.240.101.25])
	by mx1.freebsd.org (Postfix) with ESMTP id B4FFE8FC1D;
	Fri, 14 Mar 2008 06:13:22 +0000 (UTC)
	(envelope-from jroberson@chesapeake.net)
Received: from [192.168.1.107] (cpe-24-94-75-93.hawaii.res.rr.com
	[24.94.75.93]) (authenticated bits=0)
	by webaccess-cl.virtdom.com (8.13.6/8.13.6) with ESMTP id
	m2E6DHdG045317; Fri, 14 Mar 2008 02:13:18 -0400 (EDT)
	(envelope-from jroberson@chesapeake.net)
Date: Thu, 13 Mar 2008 20:14:27 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@desktop
To: Joseph Koshy <joseph.koshy@gmail.com>
In-Reply-To: <84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
Message-ID: <20080313200839.S1091@desktop>
References: <20080313180805.GA83406@dragon.NUXI.org>
	<200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 06:13:23 -0000

On Fri, 14 Mar 2008, Joseph Koshy wrote:

> On Fri, Mar 14, 2008 at 12:46 AM, John Baldwin <jhb@freebsd.org> wrote:
>> On Thursday 13 March 2008 02:08:05 pm David O'Brien wrote:
>> > Hi folks,
>> > Some folks at Juniper have submitted these changes to hwpmc(4).
>> > I am sending them here for public review.
>> >
>> > Their thoughts are:
>> >     The mp_ncpus refers to the count of the active CPUs, whereas
>> >     mp_maxid refers to the count of all the cpus on the SMP.  Using
>> >     mp_ncpus in the cpu_id range-check of hwpmc module would lead to the
>> >     assumption that all the active CPUs in the SMP are not interleaved.
>> >     But for running on some platforms, the active and inactive cpus could
>> >     be interleaved making hwpmc not work for the cpus whose cpu_id is
>> >     greater than the active-cpu count.
>
> jhb>  This is correct, but you need to handle CPUs that are absent.  It might be
> jhb>  sufficient to update pmc_cpu_is_disabled() in kern_pmc.c to check
> jhb>  CPU_ABSENT(cpu) and claim the CPU is disabled if it is absent, but I'm not
> jhb>  sure that will catch everything as that seems aimed at handling having a
> jhb>  non-absent CPU halted (such as disabling HTT on i386).
>
> That is in line with the feedback (and sample patch to kern_pmc.c) that I
> had sent in to O'Brien.
>
> But there are other problems with the patch at various levels,
> probably not obvious to someone who is just looking at the kernel
> code.
>
> First, the relevance.  My understanding is that these changes are for
> a proprietary SMP platform that uses non-mainstream (Tier3 or
> Tier4) CPUs.  It so happens that Juniper decided to number CPUs
> 'sparsely' in their kernel variant and that is the motivation for this
> patch.
>
> IMO, as a policy, code changes for exotic hardware need to be
> maintained by vendors of said exotic hardware and not dumped on
> volunteers.

In general we accept vendor patches that are not disruptive even in the 
case that the general community doesn't perceive the real value.  It is 
important for us to work with and encourage vendors.

>
> Second, when I designed the PMCTools API I didn't consider that CPU
> numbers could be 'sparse'.  [They aren't sparsely allocated
> on the i386/amd64---the code I looked at when I was designing
> PmcTools.]  So there are assumptions sprinkled throughout userland
> that the integers 0..hw.ncpus can select a valid CPU.  While
> all that can be tracked down and changed, and documentation updated,
> it is still work that I would prefer to defer until there is a chance
> that someone in the general public can use it.  I do need to prioritize
> how I spend my volunteer hours.

We're not asking you to support the feature.  It looks like Juniper 
already has it tested and working.  We just need someone to review the 
patches and commit them.

>
> Third, IFF we as a project are going to support 'sparse CPU numbering',
> I would like to see the form that takes before making changes to
> HWPMC and tools.  For example:

The majority of the kernel already deals with sparse cpu mappings.  That's 
why we have CPU_ABSENT().  Please look at UMA and ULE for examples of code 
that I have written which use this macro correctly.  I'm sure there are 
other places that do as well that I'm not familiar with.

> - How will userland and in-kernel modules find out which CPUs are
>  physically present?  Would there be a bitmask along the lines of today's
>  machdep.hlt_cpus that we could query?  Could we make the
>  'all_cpus' bitmask visible to userland?  What happens when we
>  start supporting systems with more than 32 processors?

The kernel has the various cpumasks available in sys/smp.h.  Userland can 
now use cpusets to find out what processors are available to it.  In the 
future we are going to replace simple cpumasks with the cpuset_t structure 
from cpusets so on machines that support more than sizeof(register) * 8 
processors we will use arrays.
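
The array form described above can be sketched as follows. Once a mask no longer fits in one register-sized word, a CPU set becomes an array of words indexed by id / bits-per-word. The type and function names below are hypothetical. The real interface is FreeBSD's cpuset_t with the CPU_SET()/CPU_ISSET() macros in <sys/cpuset.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define	MY_SETSIZE	128	/* hypothetical maximum number of CPU ids */
#define	MY_WORDBITS	(sizeof(unsigned long) * CHAR_BIT)
#define	MY_NWORDS	((MY_SETSIZE + MY_WORDBITS - 1) / MY_WORDBITS)

/* Hypothetical multi-word CPU set. */
typedef struct {
	unsigned long bits[MY_NWORDS];
} my_cpuset_t;

static void
my_set_zero(my_cpuset_t *s)
{
	memset(s, 0, sizeof(*s));
}

/* Word index is cpu / bits-per-word; bit index is the remainder. */
static void
my_set_add(my_cpuset_t *s, unsigned cpu)
{
	assert(cpu < MY_SETSIZE);
	s->bits[cpu / MY_WORDBITS] |= 1UL << (cpu % MY_WORDBITS);
}

static bool
my_set_has(const my_cpuset_t *s, unsigned cpu)
{
	assert(cpu < MY_SETSIZE);
	return ((s->bits[cpu / MY_WORDBITS] >> (cpu % MY_WORDBITS)) & 1UL);
}
```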

> - Will sysctl hw.ncpus represent the count of present CPUs or will it
>  represent the maximum CPU id?

That is the number of cpus, not the maximum id.

> - How will userland distinguish between absent CPUs and those that
>  could be temporarily administratively disabled?

We don't presently make the distinction to the user.

> - Are we going to support 'transient' CPUs [that come and go]?  Why
>  would we want sparse CPU numbering otherwise?

That is a much more difficult problem and one which we have discussed for 
virtualization purposes.   This patch would further that eventual goal 
although obviously there is more work to do to get there.

>
> Nit: 'mp_maxid' appears to be an index, not a count as claimed above.

Yes, that is unfortunate for these purposes.

>
> If support for sparse CPU numbering is something useful, I feel the
> correct sequence should be to discuss it here, add sparse CPU
> numbering to the base i386/amd64 kernels (say) first and then
> propagate the feature to auxiliary code like HWPMC and userland.
>
> Changing HWPMC and its userland before the base kernel itself
> changes does not seem to be the right thing to do.

The rest of the generic code in the kernel already supports this.  Juniper 
claims to have tested and is using this feature.  Furthermore, it will get 
us a tiny step closer to being able to support pluggable cpus in a 
virtualized environment.

Thanks,
Jeff

>
> Regards,
> Koshy

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 06:40:36 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 789501065671
	for <freebsd-arch@FreeBSD.ORG>; Fri, 14 Mar 2008 06:40:36 +0000 (UTC)
	(envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85])
	by mx1.freebsd.org (Postfix) with ESMTP id 215C98FC1C
	for <freebsd-arch@FreeBSD.ORG>; Fri, 14 Mar 2008 06:40:36 +0000 (UTC)
	(envelope-from imp@bsdimp.com)
Received: from localhost (localhost [127.0.0.1])
	by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m2E6b7u8084978;
	Fri, 14 Mar 2008 00:37:07 -0600 (MDT) (envelope-from imp@bsdimp.com)
Date: Fri, 14 Mar 2008 00:37:49 -0600 (MDT)
Message-Id: <20080314.003749.-432746071.imp@bsdimp.com>
To: jroberson@chesapeake.net
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <20080313200839.S1091@desktop>
References: <200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<20080313200839.S1091@desktop>
X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
 'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 06:40:36 -0000

In message: <20080313200839.S1091@desktop>
            Jeff Roberson <jroberson@chesapeake.net> writes:
: In general we accept vendor patches that are not disruptive even in the 
: case that the general community doesn't perceive the real value.  It is 
: important for us to work with and encourage vendors.
...
: The rest of the generic code in the kernel already supports this.  Juniper 
: claims to have tested and is using this feature.  Furthermore, it will get 
: us a tiny step closer to being able to support pluggable cpus in a 
: virtualized environment.

I'd like to echo these sentiments.  We've generally been willing to
accept code from vendors that makes their lives easier, even when that
code doesn't directly benefit the project.  We do this on the theory
that if we make their life easy, they will contribute to the project.
Juniper has certainly given a large chunk of code to the project (a
fairly complete MIPS port that has been integrated with the so-called
"mips2" port and will be headed into the tree soonish), which is
certainly a lot more code than has been given from vendors whom we've
made much bigger accommodations to.

In this case a vendor came forward with a patch that introduces no
real additional burden to the volunteers who are maintaining the
code.  It seems like a no-brainer to me to commit it.  There's
certainly no compelling technical argument against it.

I work for Cisco.  Cisco has no love for Juniper, and vice versa.
However, I put that aside for the good of the project and work with
people from Juniper all the time to make the project better by
focusing on the technology.  The project has similar expectations for
all its developers: if there's a technical reason to not do something,
then that's OK.  If there's a political reason, especially one that
isn't shared honestly and openly, then the bar is much much higher to
exclude the technology from the tree.  What would people think if I
were to block the MIPS stuff from Juniper just because it came from
Juniper and I work for a company that is in competition with Juniper?
I don't think it would be too favorable.

Warner

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 11:40:05 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3DBF91065672
	for <freebsd-arch@FreeBSD.ORG>; Fri, 14 Mar 2008 11:40:05 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 285458FC20
	for <freebsd-arch@FreeBSD.ORG>; Fri, 14 Mar 2008 11:40:04 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 1945D46B8F;
	Fri, 14 Mar 2008 07:40:04 -0400 (EDT)
Date: Fri, 14 Mar 2008 11:40:04 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <20080314.003749.-432746071.imp@bsdimp.com>
Message-ID: <20080314112104.I60466@fledge.watson.org>
References: <200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<20080313200839.S1091@desktop>
	<20080314.003749.-432746071.imp@bsdimp.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-List-Received-Date: Fri, 14 Mar 2008 11:40:05 -0000

On Fri, 14 Mar 2008, M. Warner Losh wrote:

> I'd like to echo these sentiments.  We've generally been willing to accept 
> code from vendors that makes their lives easier, even when that code doesn't 
> directly benefit the project.  We do this on the theory that if we make 
> their life easy, they will contribute to the project. Juniper has certainly 
> given a large chunk of code to the project (a fairly complete MIPS port that 
> has been integrated with the so-called "mips2" port and will be headed into 
> the tree soonish), which is certainly a lot more code than has been given 
> from vendors whom we've made much bigger accommodations to.
>
> In this case a vendor came forward with a patch that introduces no real 
> additional burden to the volunteers who are maintaining the code.  It seems
> like a no brainer to me to commit it.  There's certainly no compelling 
> technical argument against it.

I think (hope?) everyone here would generally agree on the point regarding 
vendors.  However, I think there is a technical point being made as well, and 
we're at risk of losing track of it.

Koshy has pointed out that changing just the kernel parts is *insufficient* to 
remove the assumption of non-sparse CPU identifiers, because the kernel parts 
are not all there is to hwpmc.  The KASSERT()s document not just the 
assumptions of the kernel code, which are updated by the proposed patch, but 
also relate to the guarantees made by the user APIs for hwpmc libraries, 
tools, and documentation.  They are directly affected by the proposed change 
because they both expose and rely on the non-sparse CPU identifier assumption, 
and also need to be updated to reflect the changed assumption.
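To make the non-sparse assumption concrete, here is a minimal userland C sketch (not hwpmc code and not the proposed patch). It assumes a machine where CPUs 1 and 3 are absent, and it imitates the kernel's mp_ncpus, mp_maxid and CPU_ABSENT() with local stand-ins. A loop bounded by mp_ncpus silently assumes dense IDs; a loop bounded by mp_maxid that skips absent entries does not.

```c
#include <assert.h>

/*
 * Userland simulation only: all_cpus, mp_ncpus, mp_maxid and CPU_ABSENT()
 * are local stand-ins modeled on the kernel names, not the kernel's own
 * definitions. CPUs 1 and 3 are treated as absent (for instance, disabled
 * hyperthreads), so the live CPU IDs form the sparse set {0, 2}.
 */
static unsigned int all_cpus = (1u << 0) | (1u << 2);
static int mp_ncpus = 2;	/* number of live CPUs */
static int mp_maxid = 2;	/* highest live CPU ID */

#define CPU_ABSENT(id)	((all_cpus & (1u << (id))) == 0)

/* Dense assumption: live IDs are exactly 0 .. mp_ncpus - 1. */
static unsigned int
visit_dense(void)
{
	unsigned int seen = 0;
	int cpu;

	for (cpu = 0; cpu < mp_ncpus; cpu++)
		seen |= 1u << cpu;	/* visits absent CPU 1, never reaches CPU 2 */
	return (seen);
}

/* Sparse-safe: walk 0 .. mp_maxid and skip the holes. */
static unsigned int
visit_sparse(void)
{
	unsigned int seen = 0;
	int cpu;

	assert(mp_maxid < 32);	/* the stand-in mask is a single 32-bit word */
	for (cpu = 0; cpu <= mp_maxid; cpu++) {
		if (CPU_ABSENT(cpu))
			continue;
		seen |= 1u << cpu;
	}
	return (seen);
}
```

With these stand-in values, visit_dense() yields 0x3 (CPUs 0 and 1, one of which is absent) while visit_sparse() yields 0x5 (CPUs 0 and 2).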

FWIW, we should reemphasize here that sparse CPU identifiers, although not all 
that well-supported by our kernel, do exist and function today on all the SMP 
architectures that we support.  The hyperthreading disable frob introduced a 
few years ago leads to sparse identifiers for live CPUs on i386 and amd64, and 
triggered problems in several pieces of code (now believed to mostly be 
resolved?).  We do need a better general infrastructure for handling CPU 
information, and the cpuset(2) API starts to address this.  I understand that 
a man page for this will materialize soon :-).

Still missing, and something to discuss in detail at the devsummit since it 
will require non-trivial architectural changes, is how to handle live CPU 
reconfiguration, which is increasingly relevant due to hypervisor-driven 
virtualization.  It became rapidly clear when the HTT frob was a run-time 
changeable sysctl (no longer true, I hope) that changing the set of "absent" 
CPUs at run time caused our kernel to behave in relatively catastrophic ways, 
and should be avoided, and that's just a hint in the direction of the changes 
we'll need to make to fully support hotplug.  Universal support for sparse CPU 
identifiers throughout the system is just one prerequisite for getting to 
hotplug.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 16:22:42 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B805D106566C
	for <arch@FreeBSD.org>; Fri, 14 Mar 2008 16:22:42 +0000 (UTC)
	(envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85])
	by mx1.freebsd.org (Postfix) with ESMTP id 88E5B8FC15
	for <arch@FreeBSD.org>; Fri, 14 Mar 2008 16:22:42 +0000 (UTC)
	(envelope-from imp@bsdimp.com)
Received: from localhost (localhost [127.0.0.1])
	by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m2EGKrjh098511
	for <arch@FreeBSD.ORG>; Fri, 14 Mar 2008 10:20:54 -0600 (MDT)
	(envelope-from imp@bsdimp.com)
Date: Fri, 14 Mar 2008 10:21:37 -0600 (MDT)
Message-Id: <20080314.102137.-2034679600.imp@bsdimp.com>
To: arch@FreeBSD.org
From: "M. Warner Losh" <imp@bsdimp.com>
X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Multipart/Mixed;
	boundary="--Next_Part(Fri_Mar_14_10_21_37_2008_059)--"
Content-Transfer-Encoding: 7bit
Cc: 
Subject: BUS_DMA_ISA unused, planning on removing
X-List-Received-Date: Fri, 14 Mar 2008 16:22:42 -0000

----Next_Part(Fri_Mar_14_10_21_37_2008_059)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Greetings,

It appears that BUS_DMA_ISA is unused:

find . -name \*.c -o -name \*.h | xargs egrep BUS_DMA_ISA
./ia64/isa/isa_dma.c:                          /*flags*/BUS_DMA_ISA,
./sys/bus_dma.h:#define BUS_DMA_ISA             0x400   /* map memory for AXP ISA dma */

I talked to Marcel, and he's cool with removing it.  Can anybody see a
reason not to GC this?

Warner

----Next_Part(Fri_Mar_14_10_21_37_2008_059)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="bus-dma-isa.diff"

Index: ia64/isa/isa_dma.c
===================================================================
RCS file: /pe/ncvs/src/sys/ia64/isa/isa_dma.c,v
retrieving revision 1.10
diff -u -r1.10 isa_dma.c
--- ia64/isa/isa_dma.c	9 Jul 2007 04:58:16 -0000	1.10
+++ ia64/isa/isa_dma.c	14 Mar 2008 16:17:39 -0000
@@ -106,7 +106,7 @@
 			       /*filter*/NULL, /*filterarg*/NULL,
 			       /*maxsize*/bouncebufsize,
 			       /*nsegments*/1, /*maxsegz*/0x3ffff,
-			       /*flags*/BUS_DMA_ISA,
+			       /*flags*/0,
 			       /*lockfunc*/busdma_lock_mutex,
 			       /*lockarg*/&Giant,
 			       &dma_tag[chan]) != 0) {
Index: sys/bus_dma.h
===================================================================
RCS file: /pe/ncvs/src/sys/sys/bus_dma.h,v
retrieving revision 1.30
diff -u -r1.30 bus_dma.h
--- sys/bus_dma.h	3 Sep 2006 00:26:17 -0000	1.30
+++ sys/bus_dma.h	14 Mar 2008 16:17:17 -0000
@@ -101,7 +101,6 @@
  */
 #define	BUS_DMA_NOWRITE		0x100
 #define	BUS_DMA_NOCACHE		0x200
-#define	BUS_DMA_ISA		0x400	/* map memory for AXP ISA dma */
 
 /* Forwards needed by prototypes below. */
 struct mbuf;

----Next_Part(Fri_Mar_14_10_21_37_2008_059)----

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 16:37:05 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CD5D2106567F;
	Fri, 14 Mar 2008 16:37:05 +0000 (UTC) (envelope-from sam@freebsd.org)
Received: from ebb.errno.com (ebb.errno.com [69.12.149.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 9E1228FC1C;
	Fri, 14 Mar 2008 16:37:05 +0000 (UTC) (envelope-from sam@freebsd.org)
Received: from trouble.errno.com (trouble.errno.com [10.0.0.248])
	(authenticated bits=0)
	by ebb.errno.com (8.13.6/8.12.6) with ESMTP id m2EGClcF058175
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 14 Mar 2008 09:12:48 -0700 (PDT) (envelope-from sam@freebsd.org)
Message-ID: <47DAA3FF.9040906@freebsd.org>
Date: Fri, 14 Mar 2008 09:12:47 -0700
From: Sam Leffler <sam@freebsd.org>
Organization: FreeBSD Project
User-Agent: Thunderbird 2.0.0.9 (X11/20071125)
MIME-Version: 1.0
To: Robert Watson <rwatson@freebsd.org>
References: <200803131516.12284.jhb@freebsd.org>	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>	<20080313200839.S1091@desktop>	<20080314.003749.-432746071.imp@bsdimp.com>
	<20080314112104.I60466@fledge.watson.org>
In-Reply-To: <20080314112104.I60466@fledge.watson.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-DCC--Metrics: ebb.errno.com; whitelist
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead
	of	'mp_ncpus'.
X-List-Received-Date: Fri, 14 Mar 2008 16:37:06 -0000

Robert Watson wrote:
> On Fri, 14 Mar 2008, M. Warner Losh wrote:
>
>> I'd like to echo these sentiments.  We've generally been willing to 
>> accept code from vendors that makes their lives easier, even when 
>> that code doesn't directly benefit the project.  We do this on the 
>> theory that if we make their life easy, they will contribute to the 
>> project. Juniper has certainly given a large chunk of code to the 
>> project (a fairly complete MIPS port that has been integrated with 
>> the so-called "mips2" port and will be headed into the tree soonish), 
>> which is certainly a lot more code than has been given from vendors 
>> whom we've made much bigger accommodations to.
>>
>> In this case a vendor came forward with a patch that introduces no 
>> real additional burden to the volunteers who are maintaining the
>> code.  It seems like a no brainer to me to commit it.  There's 
>> certainly no compelling technical argument against it.
>
> I think (hope?) everyone here would generally agree on the point 
> regarding vendors.  However, I think there is a technical point being 
> made as well, and we're at risk of losing track of it.
>
> Koshy has pointed out that changing just the kernel parts is 
> *insufficient* to remove the assumption of non-sparse CPU identifiers, 
> because the kernel parts are not all there is to hwpmc.  The 
> KASSERT()s document not just the assumptions of the kernel code, which 
> are updated by the proposed patch, but also relate to the guarantees 
> made by the user APIs for hwpmc libraries, tools, and documentation.  
> They are directly affected by the proposed change because they both 
> expose and rely on the non-sparse CPU identifier assumption, and also 
> need to be updated to reflect the changed assumption.
>
> FWIW, we should reemphasize here that sparse CPU identifiers, although 
> not all that well-supported by our kernel, do exist and function today 
> on all the SMP architectures that we support.  The hyperthreading 
> disable frob introduced a few years ago leads to sparse identifiers 
> for live CPUs on i386 and amd64, and triggered problems in several 
> pieces of code (now believed to mostly be resolved?).  We do need a 
> better general infrastructure for handling CPU information, and the 
> cpuset(2) API starts to address this.  I understand that a man page 
> for this will materialize soon :-).
>
> Still missing, and something to discuss in detail at the devsummit 
> since it will require non-trivial architectural changes, is how to 
> handle live CPU reconfiguration, which is increasingly relevant due to 
> hypervisor-driven virtualization.  It became rapidly clear when the 
> HTT frob was a run-time changeable sysctl (no longer true, I hope) 
> that changing the set of "absent" CPUs at run time caused our kernel 
> to behave in relatively catastrophic ways, and should be avoided, and 
> that's just a hint in the direction of the changes we'll need to make 
> to fully support hotplug.  Universal support for sparse CPU 
> identifiers throughout the system is just one prerequisite for getting 
> to hotplug.
>

hwpmc is a useful tool and needs to be improved.  It appears there are 
multiple groups/people interested in doing that and we need to leverage 
that, not discourage it (given the rate of progress on the existing 
implementation I can only guess it's too much work for one individual).  
Getting the kernel changes in will allow other work to go on in parallel 
and doesn't appear to impact any existing usage.

Please commit these changes and let's move on.

    Sam


From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 16:45:54 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2E8381065673
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 16:45:54 +0000 (UTC)
	(envelope-from gnn@neville-neil.com)
Received: from proxy.meer.net (proxy.meer.net [64.13.141.13])
	by mx1.freebsd.org (Postfix) with ESMTP id 1D1958FC1A
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 16:45:53 +0000 (UTC)
	(envelope-from gnn@neville-neil.com)
Received: from outbound0.mx.meer.net (outbound0.mx.meer.net [209.157.153.23])
	by proxy.meer.net (8.14.2/8.14.2) with ESMTP id m2EFt1wO041774
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 08:55:06 -0700 (PDT)
	(envelope-from gnn@neville-neil.com)
Received: from mail.meer.net (mail.meer.net [209.157.152.14])
	by outbound0.mx.meer.net (8.12.10/8.12.6) with ESMTP id m2EFh0iJ000580; 
	Fri, 14 Mar 2008 07:43:34 -0800 (PST)
	(envelope-from gnn@neville-neil.com)
Received: from mail2.meer.net (mail2.meer.net [64.13.141.16])
	by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id m2EFgnxd078983;
	Fri, 14 Mar 2008 08:42:49 -0700 (PDT)
	(envelope-from gnn@neville-neil.com)
Received: from minion.myhome.westell.com.neville-neil.com
	(209-45-135-131.dia.static.qwest.net [209.45.135.131])
	(authenticated bits=0)
	by mail2.meer.net (8.14.1/8.14.1) with ESMTP id m2EFgmI4029698;
	Fri, 14 Mar 2008 08:42:49 -0700 (PDT)
	(envelope-from gnn@neville-neil.com)
Date: Fri, 14 Mar 2008 11:42:48 -0400
Message-ID: <m263vp9urr.wl%gnn@neville-neil.com>
From: gnn@freebsd.org
To: Robert Watson <rwatson@freebsd.org>
In-Reply-To: <20080314112104.I60466@fledge.watson.org>
References: <200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<20080313200839.S1091@desktop>
	<20080314.003749.-432746071.imp@bsdimp.com>
	<20080314112104.I60466@fledge.watson.org>
User-Agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka)
	FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/22.1.50
	(i386-apple-darwin8.10.1) MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
X-Bayes-Prob: 0.5 (Score 0)
X-Spam-Score: 0.70 () [Tag at 5.00] COMBINED_FROM,NO_REAL_NAME
X-CanItPRO-Stream: default
X-Canit-Stats-ID: 44615 - 8278e499340a
X-Scanned-By: CanIt (www . roaringpenguin . com) on 64.13.141.13
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead
	of	'mp_ncpus'.
X-List-Received-Date: Fri, 14 Mar 2008 16:45:54 -0000

At Fri, 14 Mar 2008 11:40:04 +0000 (GMT),
rwatson wrote:
> 
> On Fri, 14 Mar 2008, M. Warner Losh wrote:
> 
> > I'd like to echo these sentiments.  We've generally been willing to accept 
> > code from vendors that makes their lives easier, even when that code doesn't 
> > directly benefit the project.  We do this on the theory that if we make 
> > their life easy, they will contribute to the project. Juniper has certainly 
> > given a large chunk of code to the project (a fairly complete MIPS port that 
> > has been integrated with the so-called "mips2" port and will be headed into 
> > the tree soonish), which is certainly a lot more code than has been given 
> > from vendors whom we've made much bigger accommodations to.
> >
> > In this case a vendor came forward with a patch that introduces no real 
> > additional burden to the volunteers who are maintaining the code.  It seems
> > like a no brainer to me to commit it.  There's certainly no compelling 
> > technical argument against it.
> 
> I think (hope?) everyone here would generally agree on the point
> regarding vendors.  However, I think there is a technical point
> being made as well, and we're at risk of losing track of it.
> 
> Koshy has pointed out that changing just the kernel parts is
> *insufficient* to remove the assumption of non-sparse CPU
> identifiers, because the kernel parts are not all there is to hwpmc.
> The KASSERT()s document not just the assumptions of the kernel code,
> which are updated by the proposed patch, but also relate to the
> guarantees made by the user APIs for hwpmc libraries, tools, and
> documentation.  They are directly affected by the proposed change
> because they both expose and rely on the non-sparse CPU identifier
> assumption, and also need to be updated to reflect the changed
> assumption.
> 
> FWIW, we should reemphasize here that sparse CPU identifiers,
> although not all that well-supported by our kernel, do exist and
> function today on all the SMP architectures that we support.  The
> hyperthreading disable frob introduced a few years ago leads to
> sparse identifiers for live CPUs on i386 and amd64, and triggered
> problems in several pieces of code (now believed to mostly be
> resolved?).  We do need a better general infrastructure for handling
> CPU information, and the cpuset(2) API starts to address this.  I
> understand that a man page for this will materialize soon :-).
> 
> Still missing, and something to discuss in detail at the devsummit
> since it will require non-trivial architectural changes, is how to
> handle live CPU reconfiguration, which is increasingly relevant due
> to hypervisor-driven virtualization.  It became rapidly clear when
> the HTT frob was a run-time changeable sysctl (no longer true, I
> hope) that changing the set of "absent" CPUs at run time caused our
> kernel to behave in relatively catastrophic ways, and should be
> avoided, and that's just a hint in the direction of the changes
> we'll need to make to fully support hotplug.  Universal support for
> sparse CPU identifiers throughout the system is just one
> prerequisite for getting to hotplug.

Just to jump in on this quickly.  I'm looking at the patches and at
hwpmc in general and I'll try to massage all of this stuff together so
that we can get this up and running on the newer processors.  So, if
people have patches out there please post links and/or email them to
me, and I'll review them and get them reviewed and try to get them
into the tree.  I think everyone agrees that we want hwpmc to keep
advancing with newer chips as it's one of the tools we have to really
understand and improve the performance of our systems.

Best,
George


From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 17:03:09 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3193D106567E
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 17:03:09 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.157])
	by mx1.freebsd.org (Postfix) with ESMTP id C18298FC2B
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 17:03:08 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so3471540fgg.35
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 10:03:07 -0700 (PDT)
Received: by 10.82.107.15 with SMTP id f15mr26999518buc.39.1205514187158;
	Fri, 14 Mar 2008 10:03:07 -0700 (PDT)
Received: by 10.86.99.18 with HTTP; Fri, 14 Mar 2008 10:03:07 -0700 (PDT)
Message-ID: <84dead720803141003p386f10e3y9f0a8aeceada53c4@mail.gmail.com>
Date: Fri, 14 Mar 2008 22:33:07 +0530
From: "Joseph Koshy" <joseph.koshy@gmail.com>
To: "Robert Watson" <rwatson@freebsd.org>
In-Reply-To: <20080314112104.I60466@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<20080313200839.S1091@desktop>
	<20080314.003749.-432746071.imp@bsdimp.com>
	<20080314112104.I60466@fledge.watson.org>
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-List-Received-Date: Fri, 14 Mar 2008 17:03:09 -0000

rw>  Koshy has pointed out that changing just the kernel parts is
rw>  *insufficient* to remove the assumption of non-sparse CPU
rw>  identifiers, because the kernel parts are not all there is to
rw>  hwpmc.  The KASSERT()s document not just the assumptions of the
rw>  kernel code, which are updated by the proposed patch, but also
rw>  relate to the guarantees made by the user APIs for hwpmc
rw>  libraries, tools, and documentation.  They are directly affected
rw>  by the proposed change because they both expose and rely on the
rw>  non-sparse CPU identifier assumption, and also need to be updated
rw>  to reflect the changed assumption.

Thank you Robert, for keeping the focus on the technical issues.

Regards,
Koshy

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 17:19:55 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F3DAD1065671
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 17:19:54 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.152])
	by mx1.freebsd.org (Postfix) with ESMTP id 92E1C8FC29
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 17:19:54 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so3476846fgg.35
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 10:19:53 -0700 (PDT)
Received: by 10.86.36.11 with SMTP id j11mr2740740fgj.5.1205515192945;
	Fri, 14 Mar 2008 10:19:52 -0700 (PDT)
Received: by 10.86.99.18 with HTTP; Fri, 14 Mar 2008 10:19:52 -0700 (PDT)
Message-ID: <84dead720803141019j5b3d6cbfyf23583596ba97f88@mail.gmail.com>
Date: Fri, 14 Mar 2008 22:49:52 +0530
From: "Joseph Koshy" <joseph.koshy@gmail.com>
To: "Jeff Roberson" <jroberson@chesapeake.net>
In-Reply-To: <20080313200839.S1091@desktop>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080313180805.GA83406@dragon.NUXI.org>
	<200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<20080313200839.S1091@desktop>
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-List-Received-Date: Fri, 14 Mar 2008 17:19:55 -0000

jr>  In general we accept vendor patches that are not disruptive even in the
jr>  case that the general community doesn't perceive the real value.  It is
jr>  important for us to work with and encourage vendors.

Well, that's OK, but we need to keep the quality bar too and 'do the
right thing'.

jr>  We're not asking you to support the feature.  It looks like juniper
jr>  already has it tested and working.  We just need someone to review the
jr>  patches and commit them.

The patch offers userland a way to get the kernel to schedule threads
on non-existent CPUs.
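A minimal sketch of the kind of check that rules this out, assuming the kernel validates a userland-supplied CPU number against mp_maxid and CPU_ABSENT() before acting on it. It is plain userland C: check_user_cpu() is a hypothetical helper and the globals are stand-ins that only imitate the kernel's names, so this is neither hwpmc code nor the proposed patch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Userland sketch. all_cpus, mp_maxid and CPU_ABSENT() are local stand-ins
 * for the kernel names; check_user_cpu() is a hypothetical helper, not part
 * of hwpmc(4). CPUs 0 and 2 are live; CPUs 1 and 3 are absent.
 */
static unsigned int all_cpus = (1u << 0) | (1u << 2);
static int mp_maxid = 2;

#define CPU_ABSENT(id)	((all_cpus & (1u << (id))) == 0)

/*
 * Validate a CPU number that came from userland before acting on it.
 * Out-of-range values are rejected first, so CPU_ABSENT() is only ever
 * evaluated for 0 .. mp_maxid; holes in the ID space are rejected too.
 */
static int
check_user_cpu(int cpu)
{
	assert(mp_maxid < 32);	/* KASSERT-style: stand-in mask is one 32-bit word */
	if (cpu < 0 || cpu > mp_maxid)
		return (EINVAL);
	if (CPU_ABSENT(cpu))
		return (EINVAL);
	return (0);
}
```

Checking the range before the mask keeps the shift inside CPU_ABSENT() well defined; a request for CPU 1 or CPU 3 is refused rather than handed to the scheduler.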

So I'm curious to know how it was 'tested' in Juniper.

As for support, I'm the one currently answering questions and fielding
the bug reports about PmcTools.

>  The majority of the kernel already deals with sparse cpu mappings.  That's
>  why we have CPU_ABSENT().  Please look at UMA and ULE for examples of code
>  that I have written which use this macro correctly.  I'm sure there are
>  other places that do as well that I'm not familiar with.

Yes, I suggested changes to kern_pmc.c that use CPU_ABSENT().

>  The kernel has the various cpumasks available in sys/smp.h.  Userland can
>  now use cpusets to find out what processors are available to it.  In the
>  future we are going to replace simple cpumasks with the cpuset_t structure
>  from cpusets so on machines that support more than sizeof(register) * 8
>  processors we will use arrays.

Ok, will read up about cpusets.  A manual page would help.
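As a rough illustration of the plan quoted above, the sketch below stores a CPU mask as an array of words so it can describe more CPUs than there are bits in a register. All names (cpumask_sketch_t, MAXCPU_SKETCH, mask_set() and friends) are hypothetical, and the layout is not taken from sys/cpuset.h; it only shows the word-index/bit-index arithmetic such a structure needs.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Hypothetical multi-word CPU mask. MAXCPU_SKETCH and every name here are
 * illustrative only, not the real sys/cpuset.h definitions.
 */
#define MAXCPU_SKETCH	256
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NWORDS		((MAXCPU_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

typedef struct {
	unsigned long bits[NWORDS];
} cpumask_sketch_t;

static void
mask_zero(cpumask_sketch_t *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

static void
mask_set(cpumask_sketch_t *m, unsigned int cpu)
{
	assert(cpu < MAXCPU_SKETCH);
	/* The quotient picks the array slot; the remainder picks the bit. */
	m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static int
mask_isset(const cpumask_sketch_t *m, unsigned int cpu)
{
	assert(cpu < MAXCPU_SKETCH);
	return ((int)((m->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL));
}

/* Count set bits, e.g. the number of live CPUs in a sparse mask. */
static unsigned int
mask_count(const cpumask_sketch_t *m)
{
	unsigned int n = 0;
	unsigned long w;
	size_t i;

	for (i = 0; i < NWORDS; i++)
		for (w = m->bits[i]; w != 0; w &= w - 1)	/* clear lowest set bit */
			n++;
	return (n);
}
```

For example, after setting CPUs 0, 2 and 200 on an otherwise empty mask, mask_isset() reports CPU 1 as clear and mask_count() returns 3, regardless of whether unsigned long is 32 or 64 bits wide.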

>  > - How will userland distinguish between absent CPUs and those that
>  >  could be temporarily administratively disabled?
>
>  We don't presently make the distinction to the user.

Ok, we can treat them both as 'missing'.  HWPMC cannot deal with CPUs
that come and go though.

>  The rest of the generic code in the kernel already supports this.

The MD layers need to catch up then?

> Juniper claims to have tested and is using this feature.

Define 'tested'.

> Furthermore, it will get  us a tiny step closer to being able to support
> pluggable cpus in a  virtualized environment.

Ok, but that isn't really relevant to HWPMC.   Virtualized environments do
not usually emulate PMCs.

Thanks,
Koshy

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 17:25:46 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C3FF41065676
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 17:25:46 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: from fk-out-0910.google.com (fk-out-0910.google.com [209.85.128.190])
	by mx1.freebsd.org (Postfix) with ESMTP id 639198FC1A
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 17:25:46 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: by fk-out-0910.google.com with SMTP id b27so4832723fka.11
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 10:25:45 -0700 (PDT)
Received: by 10.82.113.6 with SMTP id l6mr27066281buc.20.1205515544570;
	Fri, 14 Mar 2008 10:25:44 -0700 (PDT)
Received: by 10.86.99.18 with HTTP; Fri, 14 Mar 2008 10:25:44 -0700 (PDT)
Message-ID: <84dead720803141025y543da4d6r2f91a5db1bcf2e34@mail.gmail.com>
Date: Fri, 14 Mar 2008 22:55:44 +0530
From: "Joseph Koshy" <joseph.koshy@gmail.com>
To: gnn@freebsd.org
In-Reply-To: <m263vp9urr.wl%gnn@neville-neil.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<20080313200839.S1091@desktop>
	<20080314.003749.-432746071.imp@bsdimp.com>
	<20080314112104.I60466@fledge.watson.org>
	<m263vp9urr.wl%gnn@neville-neil.com>
Cc: Robert Watson <rwatson@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 17:25:46 -0000

>  Just to jump in on this quickly.  I'm looking at the patches and at
>  hwpmc in general and I'll try to massage all of this stuff together so
>  that we can get this up and running on the newer processors.

FYI, here is documentation about how to go about adding new PMC support:

  http://wiki.freebsd.org/PmcTools/PmcHardwareHowTo

Regards,
Koshy

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 18:32:18 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DD3741065670
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 18:32:18 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id 677E38FC15
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 18:32:18 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235507521-1834499 
	for multiple; Fri, 14 Mar 2008 14:30:29 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2EIW7e8043494;
	Fri, 14 Mar 2008 14:32:12 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Fri, 14 Mar 2008 14:13:44 -0400
User-Agent: KMail/1.9.7
References: <20080314.102137.-2034679600.imp@bsdimp.com>
In-Reply-To: <20080314.102137.-2034679600.imp@bsdimp.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803141413.44367.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Fri, 14 Mar 2008 14:32:13 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6232/Fri Mar 14 12:43:44 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: 
Subject: Re: BUS_DMA_ISA unused, planning on removing
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 18:32:19 -0000

On Friday 14 March 2008 12:21:37 pm M. Warner Losh wrote:
> Greetings,
> 
> It appears that BUS_DMA_ISA is unused:
> 
> find . -name \*.c -o -name \*.h | xargs egrep BUS_DMA_ISA
> ./ia64/isa/isa_dma.c:                          /*flags*/BUS_DMA_ISA,
> ./sys/bus_dma.h:#define BUS_DMA_ISA             0x400   /* map memory for AXP ISA dma */
> 
> I talked to Marcel, and he's cool with removing it.  Can anybody see a
> reason not to GC this?

It was for Alpha, and ia64 probably cut and pasted it.  (Alpha had a separate
sort of IOMMU for ISA DMA.)  You can axe it.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Fri Mar 14 18:32:42 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2A79A106566B
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 18:32:42 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id C63F38FC19
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 18:32:41 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235507532-1834499 
	for multiple; Fri, 14 Mar 2008 14:30:33 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2EIW7e9043494;
	Fri, 14 Mar 2008 14:32:16 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: "Joseph Koshy" <joseph.koshy@gmail.com>
Date: Fri, 14 Mar 2008 14:31:53 -0400
User-Agent: KMail/1.9.7
References: <20080313180805.GA83406@dragon.NUXI.org>
	<200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
In-Reply-To: <84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803141431.53846.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Fri, 14 Mar 2008 14:32:16 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6232/Fri Mar 14 12:43:44 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Mar 2008 18:32:42 -0000

On Friday 14 March 2008 01:32:43 am Joseph Koshy wrote:
> On Fri, Mar 14, 2008 at 12:46 AM, John Baldwin <jhb@freebsd.org> wrote:
> > On Thursday 13 March 2008 02:08:05 pm David O'Brien wrote:
> >  > Hi folks,
> >  > Some folks at Juniper have submitted these changes to hwpmc(4).
> >  > I am sending them here for public review.
> >  >
> >  > Their thoughts are:
> >  >     The mp_ncpus refers to the count of the active CPUs, whereas
> >  >     mp_maxid refers to the count of all the CPUs on the SMP.  Using
> >  >     mp_ncpus in the cpu_id range-check of the hwpmc module would lead
> >  >     to the assumption that the active CPUs in the SMP are not
> >  >     interleaved.  But on some platforms, the active and inactive CPUs
> >  >     could be interleaved, making hwpmc not work for the CPUs whose
> >  >     cpu_id is greater than the active-CPU count.
> 
> jhb>  This is correct, but you need to handle CPUs that are absent.  It might be
> jhb>  sufficient to update pmc_cpu_is_disabled() in kern_pmc.c to check
> jhb>  CPU_ABSENT(cpu) and claim the CPU is disabled if it is absent, but I'm not
> jhb>  sure that will catch everything as that seems aimed at handling having a
> jhb>  non-absent CPU halted (such as disabling HTT on i386).
> 
> That is in line with the feedback (and sample patch to kern_pmc.c) that I
> had sent in to O'Brien.
> 
> But there are other problems with the patch at various levels,
> probably not obvious to someone who is just looking at the kernel
> code.
> 
> First, the relevance.  My understanding is that these changes are for
> a proprietary SMP platform that uses non-mainstream (Tier3 or
> Tier4) CPUs.  It so happens that Juniper decided to number CPUs
> 'sparsely' in their kernel variant and that is the motivation for this
> patch.
> 
> IMO, as a policy, code changes for exotic hardware need to be
> maintained by vendors of said exotic hardware and not dumped on
> volunteers.

I would respond with two things:

1) I committed an overhaul of the x86 new-bus code to make it easier
for "exotic" embedded x86 hardware platforms in use at companies such as
NetApp to hook into new-bus more cleanly.  By making it easier for companies
to use FreeBSD we a) make it possible for them to even consider using
FreeBSD, and b) for companies that use FreeBSD and devote resources
(employees) to working on FreeBSD, those resources (e.g. grehan@ at NetApp)
can spend more of their time working on stuff that might be able to be given
back to FreeBSD than coming up with hacks to work around deficiencies in
FreeBSD.

2) All the sparse CPU stuff actually dates back to 5.0 and was there to
support Alpha, which originally numbered the CPUs using the HWRPB CPU IDs,
which were not dense at all.  (I think my DS20 has CPUs 6 and 7 or some
such.)  So this was actually done to support a Tier-1 platform (at the time).

Also, note that the comments in sys/smp.h for CPU_ABSENT() and cpu_setmaxid() 
specifically refer to mp_maxid's purpose and the fact that sparse CPU ID sets 
are expected and should be handled by code in the kernel.
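A minimal sketch of the iteration idiom described above, modeled in userland C: walk every ID from 0 through mp_maxid inclusive and skip IDs that are absent, instead of assuming 0..mp_ncpus-1 are all valid. all_cpus, halted_cpus, mp_maxid and this CPU_ABSENT() are hypothetical stand-ins, not the sys/smp.h definitions, and cpu_is_disabled() only illustrates the kind of check suggested for pmc_cpu_is_disabled(); it is not the hwpmc code.

```c
#include <stdint.h>

/*
 * Hypothetical userland stand-ins for the kernel's all_cpus mask,
 * mp_maxid and CPU_ABSENT(); the real definitions live in sys/smp.h.
 */
static uint32_t all_cpus;	/* bit N set: CPU N is present */
static uint32_t halted_cpus;	/* bit N set: CPU N is present but halted */
static int mp_maxid;		/* highest valid CPU ID (an index, not a count) */

#define CPU_ABSENT(cpu)	((all_cpus & (UINT32_C(1) << (cpu))) == 0)

/* Sketch of the suggested change: report absent CPUs as disabled too. */
static int
cpu_is_disabled(int cpu)
{
	return (CPU_ABSENT(cpu) || (halted_cpus & (UINT32_C(1) << cpu)) != 0);
}

/* Scan every ID from 0 through mp_maxid inclusive, skipping unusable CPUs. */
static int
count_usable_cpus(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu <= mp_maxid; cpu++) {
		if (cpu_is_disabled(cpu))
			continue;
		n++;	/* per-CPU setup would go here */
	}
	return (n);
}
```

With CPUs 0, 2, 6 and 7 present (loosely modeled on the DS20 example), mp_ncpus would be 4 while mp_maxid is 7; a loop bounded by mp_ncpus would never reach CPUs 6 and 7, while the loop above does.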

> Second, when I designed the PMCTools API I didn't consider that CPU
> numbers could be 'sparse'.  [They aren't sparsely allocated
> on the i386/amd64---the code I looked at when I was designing
> PmcTools.]  So there are assumptions sprinkled throughout userland
> that the integers 0..hw.ncpus can select a valid CPU.  While
> all that can be tracked down and changed, and documentation updated,
> it is still work that I would prefer to defer until there is a chance
> that someone in the general public can use it.  I do need to prioritize
> how I spend my volunteer hours.

FreeBSD has been trying to not be quite as i386-centric as it used to be.  If 
you look at other code in the kernel that handles per-cpu data such as UMA 
you will see that it uses mp_maxid and CPU_ABSENT().  There are other places 
in the kernel that are broken though (such as ndis(4)).

> Third, IFF we as a project are going to support 'sparse CPU numbering',
> I would like to see the form that takes before making changes to
> HWPMC and tools.  For example:
> - How will userland and in-kernel modules find out which CPUs are
>   physically present?   Would there be a bitmask on the lines of today's
>   machdep.hlt_cpus that we could query?  Could we make the
>   'all_cpus' bitmask visible to userland?  What happens when we
>   start supporting systems with more than 32 processors?

Yes, we can certainly export more stuff to userland.  The all_cpus mask would 
be good as would a MI online_cpus mask, though at this point they would be 
cpusets to handle > 32 rather than cpumask_t.  Note that machdep.hlt_cpus is 
x86-only and would be superseded by a MI online_cpus mask.

> - Will sysctl hw.ncpus represent the count of present CPUs or will it
>   represent the maximum CPU id?

hw.ncpus is always mp_ncpus
kern.smp.cpus is also mp_ncpus
kern.smp.maxcpus is MAX_CPUS.

Userland can just iterate from 0 to kern.smp.maxcpus while handling absent 
CPUs.  (For example, the kern.cp_time[] sysctl just writes out all 0's for 
absent CPUs so that is how userland can determine an absent CPU in that 
case.)
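A sketch of detecting absent CPUs from an all-zero per-CPU row, as described above. It assumes the data has already been fetched into a flat array holding CPUSTATES counters per CPU slot, laid out consecutively; fetching the real array (for example with sysctlbyname(3)) is left out, and the function names are hypothetical.

```c
#include <stddef.h>

#ifndef CPUSTATES
#define CPUSTATES 5	/* user, nice, sys, intr, idle, as in FreeBSD's <sys/resource.h> */
#endif

/* Return nonzero if every counter for this CPU slot is zero, which is
 * how an absent CPU shows up in the per-CPU array. */
static int
cpu_row_is_absent(const long *row)
{
	int i;

	for (i = 0; i < CPUSTATES; i++)
		if (row[i] != 0)
			return (0);
	return (1);
}

/* Walk ncpu_slots rows of CPUSTATES counters and count present CPUs. */
static int
count_present_cpus(const long *times, size_t ncpu_slots)
{
	size_t cpu;
	int present = 0;

	for (cpu = 0; cpu < ncpu_slots; cpu++)
		if (!cpu_row_is_absent(times + cpu * CPUSTATES))
			present++;
	return (present);
}
```

One caveat of this heuristic: it cannot tell an absent CPU from a present one that has not yet been charged a single tick.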

> - How will userland distinguish between absent CPUs and those that
>   could be temporarily administratively disabled?

See above re: all_cpus and online_cpus cpu sets.

> - Are we going to support 'transient' CPUs [that come and go]?  Why
>   would we want sparse CPU numbering otherwise?

Yes.

> Nit: 'mp_maxid' appears to be an index, not a count as claimed above.

Correct, and documented as such in sys/smp.h.

> If support for sparse CPU numbering is something useful, I feel the
> correct sequence should be to discuss it here, add sparse CPU
> numbering to the base i386/amd64 kernels (say) first and then
> propagate the feature to auxiliary code like HWPMC and userland.
> 
> Changing HWPMC and its userland before the base kernel itself
> changes does not seem to be the right thing to do.

While the userland interface is somewhat lacking, all of the in-kernel 
infrastructure has been in place for at least the past 4 years, and there is 
no excuse for any in-kernel code not properly handling sparse CPU IDs.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 05:43:02 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 190D31065673
	for <freebsd-arch@freebsd.org>; Sat, 15 Mar 2008 05:43:02 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.158])
	by mx1.freebsd.org (Postfix) with ESMTP id 814BC8FC23
	for <freebsd-arch@freebsd.org>; Sat, 15 Mar 2008 05:43:01 +0000 (UTC)
	(envelope-from joseph.koshy@gmail.com)
Received: by fg-out-1718.google.com with SMTP id 16so3696780fgg.35
	for <freebsd-arch@freebsd.org>; Fri, 14 Mar 2008 22:43:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	bh=1LDe7embEZsf2PAgnps0sas/wEp08h11p+VgJvmqArs=;
	b=Nh2/gSdvSQ7UetRjd3jhNFO65Pe0X9ZHuGgOfOPWxy8ncDkLelbIMVcVpMcxvGcYg1BePkr7SxuUYysJW+tFEsvLFUbL/edAGsy2SIz5+kX6Gal6B2b0aKAYkYWRnKjhsc6ZRwcolo3t9G/6ax+RD0U8TCUlmix9jGMtdwuxnmU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=esk+2crhO2k4Ut9bpuCn2sMeVD2x7ntzexv2PRAWLjxNd2U/HO3O8lsuzWv5glkWTcpSkVc0vjbxGQfYrLxFi6YI0M3w7UsR9RmEJ6Wns+eXCH5/d5gOHxNFlqSj0mtZvu76AReTLax1BNxJA/It+nQ3a+teoOh6NweFm8oQSFk=
Received: by 10.86.26.11 with SMTP id 11mr11266492fgz.74.1205559780337;
	Fri, 14 Mar 2008 22:43:00 -0700 (PDT)
Received: by 10.86.99.18 with HTTP; Fri, 14 Mar 2008 22:43:00 -0700 (PDT)
Message-ID: <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com>
Date: Sat, 15 Mar 2008 11:13:00 +0530
From: "Joseph Koshy" <joseph.koshy@gmail.com>
To: "John Baldwin" <jhb@freebsd.org>
In-Reply-To: <200803141431.53846.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080313180805.GA83406@dragon.NUXI.org>
	<200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<200803141431.53846.jhb@freebsd.org>
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of
	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 05:43:02 -0000

>  FreeBSD has been trying to not be quite as i386-centric as it used to be.  If
>  you look at other code in the kernel that handles per-cpu data such as UMA
>  you will see that it uses mp_maxid and CPU_ABSENT().  There are other places
>  in the kernel that are broken though (such as ndis(4)).

HWPMC is very x86 centric, for obvious reasons.

>  Yes, we can certainly export more stuff to userland.  The all_cpus mask would
>  be good as would a MI online_cpus mask, though at this point they would be
>  cpusets to handle > 32 rather than cpumask_t.  Note that machdep.hlt_cpus is
>  x86-only and would be superseded by a MI online_cpus mask.

Sure, an MI counter is a good idea.

>  > - Will sysctl hw.ncpus represent the count of present CPUs or will it
>  >   represent the maximum CPU id?
>
>  hw.ncpus is always mp_ncpus
>  kern.smp.cpus is also mp_ncpus
>  kern.smp.maxcpus is MAX_CPUS.

>  Userland can just iterate from 0 to kern.smp.maxcpus while handling absent
>  CPUs.  (For example, the kern.cp_time[] sysctl just writes out all 0's for
>  absent CPUs so that is how userland can determine an absent CPU in that
>  case.)

I thought of that.   For PMCTools use, using the proposed 'online_cpus' mask
would be a better option.   MAX_CPUS is a compile time value and could be
large, whereas most machines will have far fewer CPUs than that limit.
Why waste cycles needlessly?
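If an online_cpus mask were exported, a consumer could visit only the CPUs actually present, so the cost scales with the number of online CPUs rather than with MAXCPUS. The 32-bit mask below is hypothetical (the thread notes a real interface would likely be a cpuset to go beyond 32 CPUs); ffs(3) returns the 1-based index of the lowest set bit, or 0 if none is set.

```c
#include <stdint.h>
#include <strings.h>

/*
 * Store the IDs of the CPUs set in a (hypothetical) 32-bit online mask
 * into ids[], lowest first, and return how many were stored.  The loop
 * runs once per online CPU, not once per possible CPU.
 */
static int
online_cpu_ids(uint32_t online, int *ids, int maxids)
{
	int n = 0;

	while (online != 0 && n < maxids) {
		ids[n++] = ffs((int)online) - 1;	/* ffs() is 1-based */
		online &= online - 1;			/* clear the lowest set bit */
	}
	return (n);
}
```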

Now it appears to me that in the scheme of things described
above one of mp_maxid and mp_ncpus is superfluous.

Here is the reasoning:

0) We need a compile time limit for the kernel; this is kern.smp.maxcpus.

1) A given machine has a maximum number of CPUs that can fit in it.
   This is always <= MAXCPUS, and usually much smaller.  Let us call
   this {MACHINE-MAX}.  We need to scale kernel data structures based on
   {MACHINE-MAX}, since using {MAXCPUS} is probably wasteful.  We cannot
   just count the current number of CPUs, as we do today, because more
   could be hotplugged in later.

2) At any given instant a subset of CPUs 0..{MACHINE-MAX} will be
   online.  This would be tracked by the kern.smp.online_cpus/all_cpus
   bitmask.

Therefore we could represent {MACHINE-MAX} with either a count
(mp_ncpus) or a maximum ID (mp_maxid); either one would do.
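To make the count-versus-maximum-ID distinction concrete: in the current kernel, as described earlier in the thread, mp_ncpus counts the CPUs present while mp_maxid is the highest ID, so the two carry the same information only when IDs are dense (mp_maxid == mp_ncpus - 1). A small sketch over a hypothetical 32-bit presence mask; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Number of present CPUs (what mp_ncpus counts). */
static int
mask_ncpus(uint32_t mask)
{
	int n = 0;

	for (; mask != 0; mask &= mask - 1)	/* clear one set bit per pass */
		n++;
	return (n);
}

/* Highest present CPU ID (what mp_maxid records), or -1 for an empty mask. */
static int
mask_maxid(uint32_t mask)
{
	int id = -1;

	for (; mask != 0; mask >>= 1)
		id++;
	return (id);
}

/* IDs are dense exactly when maxid + 1 == ncpus; only then can a per-CPU
 * array sized by the count be indexed by every present CPU ID. */
static int
mask_is_dense(uint32_t mask)
{
	return (mask_maxid(mask) + 1 == mask_ncpus(mask));
}
```

For the sparse mask 0xC5 (CPUs 0, 2, 6 and 7) the count is 4 but the highest ID is 7, so an array sized by the count could not be indexed by CPU ID.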

However, x86 MD code uses both, with newer code seeming to prefer
mp_maxid.  So I am puzzled.  There are far more uses of mp_ncpus
there though.

jk> Changing HWPMC and its userland before the base kernel itself
jk> changes does not seem to be the right thing to do.

jb>  While the userland interface is somewhat lacking, all of the in-kernel
jb>  infrastructure has been in place for at least the past 4 years, and there is
jb>  no excuse for any in-kernel code not properly handling sparse CPU IDs.

I try to keep userland, kernel and documentation associated with PmcTools
in sync.

Looking around, there appear to be lots of nits that need correction.
For one,  the kern.smp sysctl hierarchy is undocumented.


Thanks,
Koshy

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 10:43:23 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4B1A21065674
	for <freebsd-arch@freebsd.org>; Sat, 15 Mar 2008 10:43:23 +0000 (UTC)
	(envelope-from hselasky@c2i.net)
Received: from swip.net (mailfe05.swip.net [212.247.154.129])
	by mx1.freebsd.org (Postfix) with ESMTP id 5D97D8FC26
	for <freebsd-arch@freebsd.org>; Sat, 15 Mar 2008 10:43:22 +0000 (UTC)
	(envelope-from hselasky@c2i.net)
X-Cloudmark-Score: 0.000000 []
Received: from [62.113.132.89] (account mc467741@c2i.net [62.113.132.89]
	verified) by mailfe05.swip.net (CommuniGate Pro SMTP 5.1.13)
	with ESMTPA id 751509790; Sat, 15 Mar 2008 10:43:18 +0100
From: Hans Petter Selasky <hselasky@c2i.net>
To: freebsd-arch@freebsd.org
Date: Sat, 15 Mar 2008 10:44:23 +0100
User-Agent: KMail/1.9.7
References: <86ve4s9357.fsf@ds4.des.no> <47B3EB4E.40508@elischer.org>
	<20080217.113340.390436320.imp@bsdimp.com>
In-Reply-To: <20080217.113340.390436320.imp@bsdimp.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803151044.25764.hselasky@c2i.net>
Cc: des@des.no, julian@elischer.org, ed@fxq.nl
Subject: Re: Proposal for redesigning the TTY layer
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 10:43:23 -0000

Hi Ed,

Just some ideas:

Maybe you can add some more functionality to the TTY layer so that it becomes
symmetric with regard to the host and device sides. For example, you could send
a "RING" to a modem, and not only receive one. This can be very
interesting for embedded products where you want to emulate a modem through
a USB device-side driver. The official FreeBSD USB stack does not support
device-side drivers, but the one in P4 does.

--HPS

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 12:40:10 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 43AEA1065688
	for <arch@freebsd.org>; Sat, 15 Mar 2008 12:40:10 +0000 (UTC)
	(envelope-from ed@hoeg.nl)
Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211])
	by mx1.freebsd.org (Postfix) with ESMTP id 035928FC17
	for <arch@freebsd.org>; Sat, 15 Mar 2008 12:40:09 +0000 (UTC)
	(envelope-from ed@hoeg.nl)
Received: by palm.hoeg.nl (Postfix, from userid 1000)
	id 46E6C1CC44; Sat, 15 Mar 2008 13:40:08 +0100 (CET)
Date: Sat, 15 Mar 2008 13:40:08 +0100
From: Ed Schouten <ed@80386.nl>
To: FreeBSD Arch <arch@freebsd.org>
Message-ID: <20080315124008.GF80576@hoeg.nl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="3M7QbeJEF900HlmX"
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)
Cc: 
Subject: vgone() calling VOP_CLOSE() -> blocked threads?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 12:40:10 -0000


--3M7QbeJEF900HlmX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello everyone,

For the last couple of days I've been seeing some strange things in my
mpsafetty branch related to terminal revocation.

In my current TTY design, I hold a count (t_ldisccnt) of the number of
threads that are sleeping in the line discipline. I need to store such a
count, because it's not possible to change line disciplines while some
threads are still blocked inside the discipline. This means that when
d_close() is called on a TTY, t_ldisccnt should always be 0: there
cannot be any threads stuck inside the line discipline when there aren't
any descriptors referencing it.
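The t_ldisccnt bookkeeping described above could look roughly like this in a userland model. struct tty_model, ldisc_enter(), ldisc_leave() and ldisc_change() are hypothetical names, not the mpsafetty code, and returning EBUSY instead of waiting for the count to drain is an arbitrary choice for the sketch.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical userland model of a TTY that counts threads inside its
 * line discipline. */
struct tty_model {
	pthread_mutex_t	lock;
	int		ldisccnt;	/* threads currently inside the discipline */
	int		ldisc;		/* current line discipline number */
};

/* Called on entry to a line discipline routine that may sleep. */
static void
ldisc_enter(struct tty_model *tp)
{
	pthread_mutex_lock(&tp->lock);
	tp->ldisccnt++;
	pthread_mutex_unlock(&tp->lock);
}

/* Called on the way out, whether the routine succeeded or not. */
static void
ldisc_leave(struct tty_model *tp)
{
	pthread_mutex_lock(&tp->lock);
	tp->ldisccnt--;
	pthread_mutex_unlock(&tp->lock);
}

/* Refuse to switch disciplines while any thread is still inside. */
static int
ldisc_change(struct tty_model *tp, int newdisc)
{
	int error = 0;

	pthread_mutex_lock(&tp->lock);
	if (tp->ldisccnt > 0)
		error = EBUSY;
	else
		tp->ldisc = newdisc;
	pthread_mutex_unlock(&tp->lock);
	return (error);
}
```

The invariant the design relies on is that every ldisc_enter() is paired with an ldisc_leave(), so the count is back at 0 by the time the last descriptor is closed.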

Unfortunately, this isn't entirely true with the current VFS/devfs
design. When vgone() is called, a VOP_CLOSE() is performed, which means
there could be a dozen threads still stuck inside a device driver while
the close routine has already been called to clean up. A *lot* of
drivers blindly clean up their state in the d_close() routine, expecting
that the device is completely unused. This can easily be demonstrated by
revoking a bpf device while running tcpdump.

To be honest, I'm not completely sure how to solve this issue, though I
know it should at least do something similar to this:

- The device driver should have a separate routine (d_revoke) to wake
  up any blocked threads, to make sure they leave the device driver
  properly.

- Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the
  vnode into deadfs, with the exception of the close() routine. Maybe
  it's better to add a new function to do this, vrevoke().

This means that when a revoke() call is performed, all blocked threads
are woken up and leave the driver, finding out that their terminal has been
revoked. Further system calls will fail, because the vnode is in deadfs,
but when the processes close the descriptor, the device driver can still
clean up everything.
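A userland sketch of the proposed d_revoke idea: revocation marks the device dead and wakes every sleeper so it leaves the driver with an error, while nothing is torn down until the last close. struct dev_model, dev_read() and dev_revoke() are hypothetical, and ENXIO is an arbitrary choice of error; this is not the cdevsw interface.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical userland model of a device with blocking reads. */
struct dev_model {
	pthread_mutex_t	lock;
	pthread_cond_t	readable;
	int		revoked;	/* set once by dev_revoke() */
	int		nbytes;		/* data waiting to be read */
};

/* Block until data arrives or the device is revoked. */
static int
dev_read(struct dev_model *dp, int *out)
{
	int error = 0;

	pthread_mutex_lock(&dp->lock);
	while (dp->nbytes == 0 && !dp->revoked)
		pthread_cond_wait(&dp->readable, &dp->lock);
	if (dp->revoked) {
		error = ENXIO;	/* leave the driver; caller sees revocation */
	} else {
		*out = dp->nbytes;
		dp->nbytes = 0;
	}
	pthread_mutex_unlock(&dp->lock);
	return (error);
}

/* d_revoke analogue: mark the device dead and wake every sleeper, but
 * free nothing; teardown is left for the final close. */
static void
dev_revoke(struct dev_model *dp)
{
	pthread_mutex_lock(&dp->lock);
	dp->revoked = 1;
	pthread_cond_broadcast(&dp->readable);
	pthread_mutex_unlock(&dp->lock);
}
```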

In theory these changes would also make it easier for other filesystems
to support the revoke() call. A generic vop_revoke could just call
vrevoke(), which means the current system calls aren't interrupted, but
calls later on will fail. This will be sufficient for most filesystems.

I'm not a VFS guru, so it will probably take me some time and will
probably dogfood some of my filesystems. I could probably use some
help. ;-)

-- 
 Ed Schouten <ed@fxq.nl>
 WWW: http://g-rave.nl/

--3M7QbeJEF900HlmX
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (FreeBSD)

iEYEARECAAYFAkfbw6gACgkQ52SDGA2eCwWHvQCeP/wk8sTFNFsgKM2kdVhGN6PS
3zQAniPruoouxd1GnjDDq6al+rWk+pBb
=zwA/
-----END PGP SIGNATURE-----

--3M7QbeJEF900HlmX--

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 16:55:38 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A27F91065675
	for <arch@freebsd.org>; Sat, 15 Mar 2008 16:55:38 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au
	[211.29.132.197])
	by mx1.freebsd.org (Postfix) with ESMTP id 331C28FC1F
	for <arch@freebsd.org>; Sat, 15 Mar 2008 16:55:37 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au
	(c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11])
	by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m2FGtIjX022004
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 16 Mar 2008 03:55:21 +1100
Date: Sun, 16 Mar 2008 03:55:18 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Ed Schouten <ed@80386.nl>
In-Reply-To: <20080315124008.GF80576@hoeg.nl>
Message-ID: <20080316015903.N39516@delplex.bde.org>
References: <20080315124008.GF80576@hoeg.nl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: FreeBSD Arch <arch@freebsd.org>
Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 16:55:38 -0000

On Sat, 15 Mar 2008, Ed Schouten wrote:

> The last couple of days I'm seeing some strange things in my mpsafetty
> branch related to terminal revocation.
>
> In my current TTY design, I hold a count (t_ldisccnt) of the amount of
> threads that are sleeping in the line discipline. I need to store such a
> count, because it's not possible to change line disciplines while some
> threads are still blocked inside the discipline. This means that when
> d_close() is called on a TTY, t_ldisccnt should always be 0. There
> cannot be any threads stuck inside the line discipline when there aren't
> any descriptors referencing it.
>
> Unfortunately, this isn't entirely true with the current VFS/devfs
> design. When vgone() is called, a VOP_CLOSE() is performed, which means
> there could be a dozen threads still stuck inside a device driver, but
> the close routine is already called to clean up stuff. There are a
> *real* lot of drivers that blindly clean up their stuff in the d_close()
> routine, expecting that the device is completely unused. This can
> easily be demonstrated by revoking a bpf device, while running tcpdump.

Yes, most drivers are broken here, but the problem is rarely noticed
because revoke() isn't normally applied to any devices except ttys.
Even ordinary close() can cause problems when a thread is sleeping
in device open, but this too is only common for ttys (for callin and
callout devices).

The tty driver is about the only driver that handles this problem
almost correctly.  It uses a generation count.  All tty drivers are
supposed to sleep using only ttysleep().  ttysleep() checks the
generation count and returns ERESTART if the generation is new.  All
tty drivers should consider this error to be fatal and propagate it
up to the syscall level where the syscall is restarted.  This tends
to happen naturally, but in some places (in device close IIRC) the driver
ignores the error and does more i/o (to finish cleaning up in close;
close and open can easily pass each other and clobber each other's
state when this happens).  More I/O also tends to occur if a revoke()
happens when a thread is blocked but not sleeping.  Then ttysleep()
isn't in sight, so the thread has no idea that the generation count
changed.  Giant locking limits this problem.
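A userland model of the generation-count check described above. It is not the kernel's ttysleep(): ERESTART_MODEL is a made-up stand-in for the kernel-private ERESTART, and the caller passes in the generation it sampled (an assumption of the sketch) so the check can be exercised without a second thread.

```c
#include <pthread.h>

#define ERESTART_MODEL	(-1)	/* hypothetical stand-in for the kernel's ERESTART */

struct tty_gen {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	unsigned	gen;	/* incremented on every revoke */
	int		ready;	/* the condition a reader waits for */
};

/*
 * Wait (with tp->lock held) until the TTY is ready.  gen is the generation
 * the caller sampled when its operation began; if it has changed, the
 * caller must unwind and let the syscall restart instead of doing more I/O.
 */
static int
ttysleep_model(struct tty_gen *tp, unsigned gen)
{
	while (!tp->ready && tp->gen == gen)
		pthread_cond_wait(&tp->cv, &tp->lock);
	return (tp->gen != gen ? ERESTART_MODEL : 0);
}

/* revoke() analogue: bump the generation and wake every sleeper. */
static void
tty_revoke_model(struct tty_gen *tp)
{
	pthread_mutex_lock(&tp->lock);
	tp->gen++;
	pthread_cond_broadcast(&tp->cv);
	pthread_mutex_unlock(&tp->lock);
}
```

Checking the generation before the ready flag means a revoke wins even if data also arrived, in keeping with treating this error as fatal and propagating it up to the syscall level.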

> To be honest, I'm not completely sure how to solve this issue, though I
> know it should at least do something similar to this:
>
> - The device driver should have a seperate routine (d_revoke) to wake
>  up any blocked threads, to make sure they leave the device driver
>  properly.

Something is needed to signal blocked but non-sleeping threads.  I
think the wakeup for ttys now normally occurs as a side effect of
flushing i/o.  revoke() normally calls ttyclose() which calls ttyldclose()
which normally calls ttylclose() which flushes i/o which wakes up
threads waiting on the i/o.  I don't see how ttylclose() can work right
in the usual !FNONBLOCK case.  Maybe revoke() sets FNONBLOCK.  The
generation count stuff doesn't help here because the flush is done
before incrementing the generation count.

There is an obvious race here for threads doing i/o instead of waiting
for it.  These must be blocked (by Giant now for ttys, or by your
MPSAFE locking).  They will run when revoke() releases the lock and
find the i/o flushed and maybe the generation count incremented, but
they normally won't check these states and will just blunder on doing
more i/o.  It would be painful to check these states every time the
lock is acquired, but this seems to be necessary.  Magic Giant locking
makes the places where the lock is acquired hard to see.
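For threads that are blocked on the lock rather than asleep, the check has to move to the point where the lock is acquired. A minimal single-threaded sketch, assuming the generation is recorded at syscall entry and compared after every lock acquisition; dev_model, dev_lock_checked(), dev_write_step() and MODEL_ERESTART are hypothetical names, and the lock itself is elided so the order of events can be replayed deterministically.

```c
#include <assert.h>

#define MODEL_ERESTART (-1)   /* stand-in for the kernel-internal ERESTART */

/* Hypothetical device state; the real lock (Giant or a driver mutex) is elided. */
struct dev_model {
	unsigned gen;       /* incremented by revoke */
	int      io_done;   /* units of i/o actually performed */
};

/* revoke() analogue: flush (not modelled), then start a new generation. */
static void dev_revoke(struct dev_model *d)
{
	d->gen++;
}

/*
 * Check to run right after (re)acquiring the device lock.  A thread that was
 * blocked on the lock rather than asleep learns about a revoke only here.
 */
static int dev_lock_checked(const struct dev_model *d, unsigned entry_gen)
{
	return (d->gen == entry_gen ? 0 : MODEL_ERESTART);
}

/* One i/o step of a write; entry_gen was recorded at syscall entry. */
static int dev_write_step(struct dev_model *d, unsigned entry_gen)
{
	int error = dev_lock_checked(d, entry_gen);

	if (error != 0)
		return error;   /* do not blunder on doing more i/o */
	d->io_done++;
	return 0;
}
```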

> - Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the
>  vnode into deadfs, with the exception of the close() routine. Maybe
>  it's better to add a new function to do this, vrevoke().
>
> This means that when a revoke() call is performed, all blocked threads
> are woken up, will leave the driver, to find out their terminal has been
> revoked. Further system calls will fail, because the vnode is in deadfs,
> but when the processes close the descriptor, the device driver can still
> clean up everything.

I think vfs already moves the vnode to deadfs.  It doesn't do anything
to synchronize with threads running in device drivers.  The forced
last-close() should complete synchronously as part of revoke().  Then
other threads leave the device driver asynchronously, hopefully not
much later.  Then if the generation count stuff is working right, the
syscall is restarted, but now file descriptors point to deadfs so the
syscall normally fails.  I think the async completion is OK provided
it is done right (don't delay it indefinitely, and don't do more
i/o on completion).  It doesn't seem to be useful to make revoke()
wait for the completions.
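The sequence above (forced last-close, the woken sleeper returns a restart code, the syscall runs again and now reaches the dead operations) can be traced with a small dispatch-table sketch. Everything here is hypothetical: file_model and the two op tables are illustrative, the do/while loop stands in for the kernel re-executing the syscall, and ENXIO as the dead-ops error is an assumption, not a statement about what deadfs returns for each operation.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_ERESTART (-1)   /* stand-in for the kernel-internal ERESTART */

struct file_model;

/* Hypothetical per-file operations vector (vnode ops / fileops analogue). */
struct ops_model {
	int (*read)(struct file_model *fp);
};

struct file_model {
	const struct ops_model *ops;
	int ncalls;               /* how many times a read op was entered */
};

/* Dead ops: every call fails; ENXIO is an assumed choice of error. */
static int dead_read(struct file_model *fp)
{
	fp->ncalls++;
	return ENXIO;
}

static const struct ops_model dead_ops = { dead_read };

/*
 * Live driver read.  To stay single-threaded, the revoke "happens" while
 * this thread sleeps: the file is switched to the dead ops and the sleeper
 * wakes to find a new generation.
 */
static int live_read(struct file_model *fp)
{
	fp->ncalls++;
	fp->ops = &dead_ops;        /* revoke() moves the file to deadfs */
	return MODEL_ERESTART;      /* the sleep reported a new generation */
}

static const struct ops_model live_ops = { live_read };

/* Syscall layer: on MODEL_ERESTART, dispatch through fp->ops again. */
static int sys_read_model(struct file_model *fp)
{
	int error;

	do
		error = fp->ops->read(fp);
	while (error == MODEL_ERESTART);
	return error;
}
```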

I don't think it would work well to move everything except d_close to
deadfs.

Other problems near here:
- neither vfs nor drivers currently know how many threads are in a
   driver.  vfs uses vp->v_rdev->si_usecount, but this doesn't quite work
   since it doesn't count threads sleeping in open.  Maybe ones executing
   last close too -- this would be more of a problem.  revoke() just
   uses vcount(), which just acquires the device lock and returns
   si_usecount after releasing the device lock.  (I don't understand
   this locking -- what stops the count changing after the lock is
   released, or if it cannot change then why acquire the lock?)  This
   can result in revoke() not calling device close when it should.
   Drivers can obviously keep count of their activities using large
   code.  I can't see any way for vfs to keep count short of asking
   drivers for their counts.
- there can be any number of threads in device open and close concurrently,
   even without the complications for revoke().  The most problematic
   cases happen when last-close blocks, as is common for ttys waiting
   for output to drain (since no one cares about their output actually
   working and ensures draining it using tcdrain() -- normal losing
   programs finish up with something like printf(); exit(); and depend
   on the close() in exit(); blocking to drain the output).  Then new
   opens are allowed, and this is useful for doing ioctls() to unblock
   blocked closes.  If the new open or fcntl sets non-blocking mode, then
   the last-close for the new open may pass the blocked last close.  If
   the new mode is blocking, then the last-close for the new open may block
   too.  The number of threads in last-close is thus unlimited.  A thundering
   herd of them tends to stomp on each other when they are all unblocked at
   the same time.

   The connections of this with revoke() are:
   - it takes vfs's failure to count all threads in the device driver to
     allow the useful behaviour of opens while a close is blocked and
     the necessary behaviour of last-close while another last-close is
     executing (drivers should be aware of this possibility and merge
     the closes, but don't).
   - I think revoke() sets FNONBLOCK somewhere.  Thus it tends to unblock
     any thread waiting in last-close for output to drain.

   Less problematic cases occur when opens block.  ttyopen() understands
   this possibility and handles it almost right using its t_wopeners
   count.  ttyopen() uses various sleeps where it should use ttysleep()
   or check the generation count itself; this results in it looping
   internally instead of restarting the syscall, which is only a small
   error since for open() alone, restarting the syscall would call back
   to the same non-dead device open except in unusual cases where there
   was a signal and syscalls are not restarted, or the device name went
   away.  There is still a problem with the vfs usage counting -- in
   one case involving callin and callout devices whose details I forget,
   last-close is not called when it needs to be called to wake up all
   the threads sleeping in open so that they can enter a new state.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 17:16:47 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D57101065675;
	Sat, 15 Mar 2008 17:16:47 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 94DC78FC18;
	Sat, 15 Mar 2008 17:16:47 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 1FC1046C23;
	Sat, 15 Mar 2008 13:16:46 -0400 (EDT)
Date: Sat, 15 Mar 2008 17:16:46 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Joseph Koshy <joseph.koshy@gmail.com>
In-Reply-To: <84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com>
Message-ID: <20080315170411.A42065@fledge.watson.org>
References: <20080313180805.GA83406@dragon.NUXI.org>
	<200803131516.12284.jhb@freebsd.org>
	<84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
	<200803141431.53846.jhb@freebsd.org>
	<84dead720803142243r6c8cc68dm325e7fb925189fd@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead
	of	'mp_ncpus'.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 17:16:48 -0000

On Sat, 15 Mar 2008, Joseph Koshy wrote:

> Therefore we can use either a count (mp_ncpus) or a maximum id (mp_maxid) to 
> represent {MACHINE-MAX}, but either one would do.
>
> However, x86 MD code uses both, with newer code seeming to prefer mp_maxid. 
> So I am puzzled.  There are far more uses of mp_ncpus there though.

I suspect that's because kernel code wants to index into a data structure 
using the CPU ID, i.e., curcpu, but doesn't want to size the array at MAXCPU, 
which will be an increasingly large compile-time constant over time.  This 
relies on the relative non-sparseness of CPU IDs to be of benefit, and 
generally, this does hold.  For example, on the HTT boxes, CPU IDs might be 
0..3 with 0 and 2 being used, and that's still less than 16 or 32.  However, 
in some cases we size kernel arrays to MAXCPU, and sometimes to mp_maxid. 
There's a reasonable argument that sizing arrays this way is a dubious 
practice as you more ideally want to store per-CPU data hung off the percpu 
block to avoid adjacent per-cpu data in the same cache line.
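A sketch of the sizing choice above: an array indexed directly by CPU ID and sized by the largest ID (maxid + 1) instead of a compile-time MAXCPU, with each slot padded to its own cache line to keep per-CPU updates out of each other's lines. This is padding, not the per-CPU-block approach preferred above. The 64-byte line size and all names are assumptions; in the kernel, maxid would come from mp_maxid and the index from curcpu.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed cache-line size */

/* One slot per CPU ID, aligned so neighbouring CPUs never share a line. */
struct pcpu_stat {
	_Alignas(CACHE_LINE) uint64_t count;
};

/* Allocate maxid + 1 zeroed slots; IDs with no CPU behind them stay zero. */
static struct pcpu_stat *pcpu_stat_alloc(unsigned maxid)
{
	size_t size = ((size_t)maxid + 1) * sizeof(struct pcpu_stat);
	struct pcpu_stat *s = aligned_alloc(CACHE_LINE, size);

	if (s != NULL)
		memset(s, 0, size);
	return s;
}

/* Index directly by CPU ID, as kernel code would with curcpu. */
static void pcpu_stat_add(struct pcpu_stat *s, unsigned cpu, uint64_t n)
{
	s[cpu].count += n;
}

static uint64_t pcpu_stat_sum(const struct pcpu_stat *s, unsigned maxid)
{
	uint64_t total = 0;

	for (unsigned cpu = 0; cpu <= maxid; cpu++)
		total += s[cpu].count;
	return total;
}
```

With the HTT example from the text (IDs 0..3, only 0 and 2 in use) this costs four slots rather than MAXCPU.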

I ran into some similar concerns when trying to figure out how best to export 
memory allocator statistics from the kernel.  In the end what I concluded was 
that I would export contiguous CPU data up to mp_maxid from the kernel, and 
that userspace would try to avoid any compile-time knowledge of CPU limits so 
that it doesn't matter if a kernel is compiled for UP (MAXCPU=1) or SMP 
(MAXCPU=(n), where n is often 16, I believe).  I do end up exporting data for 
absent CPUs under mp_maxid.
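The export arrangement just described might look like this sketch: the producer writes one fixed-size record for every CPU ID from 0 through maxid, absent CPUs included, and the consumer derives the number of records from the buffer length rather than from any compile-time limit. The record layout, the present flag and both function names are hypothetical, not the real allocator statistics format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU export record. */
struct cpu_record {
	uint32_t present;   /* 0 for an ID with no CPU behind it */
	uint32_t pad;
	uint64_t allocs;
};

/* Producer: emit maxid + 1 contiguous records; returns bytes written. */
static size_t export_cpu_records(const uint64_t *allocs, const int *present,
    unsigned maxid, struct cpu_record *out)
{
	for (unsigned id = 0; id <= maxid; id++) {
		out[id].present = present[id] ? 1 : 0;
		out[id].pad = 0;
		out[id].allocs = present[id] ? allocs[id] : 0;
	}
	return ((size_t)maxid + 1) * sizeof(struct cpu_record);
}

/* Consumer: no MAXCPU anywhere; the record count comes from the length. */
static uint64_t sum_cpu_records(const struct cpu_record *buf, size_t len,
    unsigned *npresent)
{
	size_t n = len / sizeof(struct cpu_record);
	uint64_t total = 0;

	*npresent = 0;
	for (size_t i = 0; i < n; i++) {
		if (!buf[i].present)
			continue;
		(*npresent)++;
		total += buf[i].allocs;
	}
	return total;
}
```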

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 19:48:07 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8105106564A
	for <arch@freebsd.org>; Sat, 15 Mar 2008 19:48:07 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id 8812B8FC1A
	for <arch@freebsd.org>; Sat, 15 Mar 2008 19:48:07 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.64.3])
	by phk.freebsd.dk (Postfix) with ESMTP id 7AFBF17104;
	Sat, 15 Mar 2008 19:48:05 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m2FJm4U5006719;
	Sat, 15 Mar 2008 19:48:04 GMT (envelope-from phk@critter.freebsd.dk)
To: Ed Schouten <ed@80386.nl>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Sat, 15 Mar 2008 13:40:08 +0100."
	<20080315124008.GF80576@hoeg.nl> 
Date: Sat, 15 Mar 2008 19:48:04 +0000
Message-ID: <6718.1205610484@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: FreeBSD Arch <arch@freebsd.org>
Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 19:48:07 -0000


>To be honest, I'm not completely sure how to solve this issue, though I
>know it should at least do something similar to this:
>
>- The device driver should have a seperate routine (d_revoke) to wake
>  up any blocked threads, to make sure they leave the device driver
>  properly.

It's already there, it's called d_purge().


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG  Sat Mar 15 21:06:00 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4320D106566C
	for <arch@freebsd.org>; Sat, 15 Mar 2008 21:06:00 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197])
	by mx1.freebsd.org (Postfix) with ESMTP id B845E8FC1A
	for <arch@freebsd.org>; Sat, 15 Mar 2008 21:05:59 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua)
	by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256)
	(Exim 4.67) (envelope-from <kostikbel@gmail.com>) id 1JacxM-0004Q2-W0
	for arch@freebsd.org; Sat, 15 Mar 2008 22:26:19 +0200
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m2FJmMWB091804
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 15 Mar 2008 21:48:22 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id
	m2FJm9oL062454; Sat, 15 Mar 2008 21:48:09 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m2FJm9C1062453; 
	Sat, 15 Mar 2008 21:48:09 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sat, 15 Mar 2008 21:48:09 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: Ed Schouten <ed@80386.nl>
Message-ID: <20080315194809.GN10374@deviant.kiev.zoral.com.ua>
References: <20080315124008.GF80576@hoeg.nl>
	<20080316015903.N39516@delplex.bde.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="h/ohfBjN02kAJu/T"
Content-Disposition: inline
In-Reply-To: <20080316015903.N39516@delplex.bde.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: ClamAV version 0.91.2,
	clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.4
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on
	skuns.kiev.zoral.com.ua
X-Scanner-Signature: 46978630fe90eedb12f62e56ce2f8684
X-DrWeb-checked: yes
X-SpamTest-Envelope-From: kostikbel@gmail.com
X-SpamTest-Group-ID: 00000000
X-SpamTest-Info: Profiles 2421 [Mar 14 2008]
X-SpamTest-Info: helo_type=3
X-SpamTest-Method: none
X-SpamTest-Rate: 0
X-SpamTest-Status: Not detected
X-SpamTest-Status-Extended: not_detected
X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0278], KAS30/Release
Cc: FreeBSD Arch <arch@freebsd.org>
Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Mar 2008 21:06:00 -0000


--h/ohfBjN02kAJu/T
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote:
> On Sat, 15 Mar 2008, Ed Schouten wrote:
>
> >The last couple of days I'm seeing some strange things in my mpsafetty
> >branch related to terminal revocation.
> >
> >In my current TTY design, I hold a count (t_ldisccnt) of the number of
> >threads that are sleeping in the line discipline. I need to store such a
> >count, because it's not possible to change line disciplines while some
> >threads are still blocked inside the discipline. This means that when
> >d_close() is called on a TTY, t_ldisccnt should always be 0. There
> >cannot be any threads stuck inside the line discipline when there aren't
> >any descriptors referencing it.
> >
> >Unfortunately, this isn't entirely true with the current VFS/devfs
> >design. When vgone() is called, a VOP_CLOSE() is performed, which means
> >there could be a dozen threads still stuck inside a device driver, but
> >the close routine is already called to clean up stuff. There are a
> >*real* lot of drivers that blindly clean up their stuff in the d_close()
> >routine, expecting that the device is completely unused. This can
> >easily be demonstrated by revoking a bpf device, while running tcpdump.
>
> Yes, most drivers are broken here, but the problem is rarely noticed
> because revoke() isn't normally applied to any devices except ttys.
> Even ordinary close() can cause problems when a thread is sleeping
> in device open, but this too is only common for ttys (for callin and
> callout devices).
>
> The tty driver is about the only driver that handles this problem
> almost correctly.  It uses a generation count.  All tty drivers are
> supposed to sleep using only ttysleep().  ttysleep() checks the
> generation count and returns ERESTART if the generation is new.  All
> tty drivers should consider this error to be fatal and propagate it
> up to the syscall level where the syscall is restarted.  This tends
> to happen naturally, but in some places (in device close IIRC) the driver
> ignores the error and does more i/o (to finish cleaning up in close
> -- close and open can easily pass each other and clobber each other's
> state when this happens).  More I/O also tends to occur if a revoke()
> happens when a thread is blocked but not sleeping.  Then ttysleep()
> isn't in sight, so the thread has no idea that the generation count
> changed.  Giant locking limits this problem.
>
> >To be honest, I'm not completely sure how to solve this issue, though I
> >know it should at least do something similar to this:
> >
> >- The device driver should have a separate routine (d_revoke) to wake
> > up any blocked threads, to make sure they leave the device driver
> > properly.
>
> Something is needed to signal blocked but non-sleeping threads.  I
> think the wakeup for ttys now normally occurs as a side effect of
> flushing i/o.  revoke() normally calls ttyclose() which calls ttyldclose()
> which normally calls ttylclose() which flushes i/o which wakes up
> threads waiting on the i/o.  I don't see how ttylclose() can work right
> in the usual !FNONBLOCK case.  Maybe revoke() sets FNONBLOCK.  The
> generation count stuff doesn't help here because the flush is done
> before incrementing the generation count.
>
> There is an obvious race here for threads doing i/o instead of waiting
> for it.  These must be blocked (by Giant now for ttys, or by your
> MPSAFE locking).  They will run when revoke() releases the lock and
> find the i/o flushed and maybe the generation count incremented, but
> they normally won't check these states and will just blunder on doing
> more i/o.  It would be painful to check these states every time the
> lock is acquired, but this seems to be necessary.  Magic Giant locking
> makes the places where the lock is acquired hard to see.
>
> >- Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the
> > vnode into deadfs, with the exception of the close() routine. Maybe
> > it's better to add a new function to do this, vrevoke().
> >
> >This means that when a revoke() call is performed, all blocked threads
> >are woken up, will leave the driver, to find out their terminal has been
> >revoked. Further system calls will fail, because the vnode is in deadfs,
> >but when the processes close the descriptor, the device driver can still
> >clean up everything.
>
> I think vfs already moves the vnode to deadfs.  It doesn't do anything
> to synchronize with threads running in device drivers.  The forced
> last-close() should complete synchronously as part of revoke().  Then
> other threads leave the device driver asynchronously, hopefully not
> much later.  Then if the generation count stuff is working right, the
> syscall is restarted, but now file descriptors point to deadfs so the
> syscall normally fails.  I think the async completion is OK provided
> it is done right (don't delay it indefinitely, and don't do more
> i/o on completion).  It doesn't seem to be useful to make revoke()
> wait for the completions.
>
> I don't think it would work well to move everything except d_close to
> deadfs.
>
> Other problems near here:
> - neither vfs nor drivers currently know how many threads are in a
>   driver.  vfs uses vp->v_rdev->si_usecount, but this doesn't quite work
This is provided by si_threadcount.
See dev(vn)_refthread() and its usage in the devfs vnops and fops.


>   since it doesn't count threads sleeping in open.  Maybe ones executing
>   last close too -- this would be more of a problem.  revoke() just
>   uses vcount(), which just acquires the device lock and returns
>   si_usecount after releasing the device lock.  (I don't understand
>   this locking -- what stops the count changing after the lock is
>   released, or if it cannot change then why acquire the lock?)  This
>   can result in revoke() not calling device close when it should.
>   Drivers can obviously keep count of their activities using large
>   code.  I can't see any way for vfs to keep count short of asking
>   drivers for their counts.
> - there can be any number of threads in device open and close concurrently,
>   even without the complications for revoke().  The most problematic
>   cases happen when last-close blocks, as is common for ttys waiting
>   for output to drain (since no one cares about their output actually
>   working and ensures draining it using tcdrain() -- normal losing
>   programs finish up with something like printf(); exit(); and depend
>   on the close() in exit(); blocking to drain the output).  Then new
>   opens are allowed, and this is useful for doing ioctls() to unblock
>   blocked closes.  If the new open or fcntl sets non-blocking mode, then
>   the last-close for the new open may pass the blocked last close.  If
>   the new mode is blocking, then the last-close for the new open may block
>   too.  The number of threads in last-close is thus unlimited.  A thundering
>   herd of them tends to stomp on each other when they are all unblocked at
>   the same time.
>
>   The connections of this with revoke() are:
>   - it takes vfs's failure to count all threads in the device driver to
>     allow the useful behaviour of opens while a close is blocked and
>     the necessary behaviour of last-close while another last-close is
>     executing (drivers should be aware of this possibility and merge
>     the closes, but don't).
>   - I think revoke() sets FNONBLOCK somewhere.  Thus it tends to unblock
>     any thread waiting in last-close for output to drain.
>
>   Less problematic cases occur when opens block.  ttyopen() understands
>   this possibility and handles it almost right using its t_wopeners
>   count.  ttyopen() uses various sleeps where it should use ttysleep()
>   or check the generation count itself; this results in it looping
>   internally instead of restarting the syscall, which is only a small
>   error since for open() alone, restarting the syscall would call back
>   to the same non-dead device open except in unusual cases where there
>   was a signal and syscalls are not restarted, or the device name went
>   away.  There is still a problem with the vfs usage counting -- in
>   one case involving callin and callout devices whose details I forget,
>   last-close is not called when it needs to be called to wake up all
>   the threads sleeping in open so that they can enter a new state.

A device driver can already provide the d_purge method, which is
intended to safely drain all threads that are currently in the
driver. See kern_conf.c:destroy_dev() for its usage.
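A userland model of the d_purge pattern may help: the device counts threads inside the driver (an si_threadcount analogue), and destruction first refuses new entries, then keeps calling the purge hook, which wakes sleepers, until the count drains to zero. This is a minimal sketch with pthreads standing in for kernel sleep/wakeup; cdev_model and every model_* function are hypothetical, and the real destroy_dev() in kern_conf.c differs in detail.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical cdev model; not FreeBSD's struct cdev. */
struct cdev_model {
	pthread_mutex_t lock;
	pthread_cond_t  io_cv;     /* threads sleep here waiting for i/o */
	pthread_cond_t  state_cv;  /* broadcast whenever threadcount changes */
	int threadcount;           /* si_threadcount analogue */
	int gone;                  /* destruction has started */
};

static void cdev_model_init(struct cdev_model *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->io_cv, NULL);
	pthread_cond_init(&d->state_cv, NULL);
	d->threadcount = d->gone = 0;
}

/* dev_refthread() analogue: enter the driver unless it is going away. */
static int model_refthread(struct cdev_model *d)
{
	int ok;
	pthread_mutex_lock(&d->lock);
	if ((ok = !d->gone))
		d->threadcount++;
	pthread_cond_broadcast(&d->state_cv);
	pthread_mutex_unlock(&d->lock);
	return ok;
}

static void model_relthread(struct cdev_model *d)
{
	pthread_mutex_lock(&d->lock);
	d->threadcount--;
	pthread_cond_broadcast(&d->state_cv);
	pthread_mutex_unlock(&d->lock);
}

/* A read that never gets data: it sleeps until the device goes away. */
static int model_read(struct cdev_model *d)
{
	if (!model_refthread(d))
		return ENXIO;
	pthread_mutex_lock(&d->lock);
	while (!d->gone)
		pthread_cond_wait(&d->io_cv, &d->lock);
	pthread_mutex_unlock(&d->lock);
	model_relthread(d);
	return ENXIO;
}

/* d_purge analogue (lock held): wake every sleeper so it sees 'gone'. */
static void model_purge(struct cdev_model *d)
{
	pthread_cond_broadcast(&d->io_cv);
}

/* destroy_dev() analogue: refuse new entries, purge until nobody is inside. */
static void model_destroy(struct cdev_model *d)
{
	pthread_mutex_lock(&d->lock);
	d->gone = 1;
	while (d->threadcount > 0) {
		model_purge(d);
		pthread_cond_wait(&d->state_cv, &d->lock);
	}
	pthread_mutex_unlock(&d->lock);
}

static void *reader(void *arg)
{
	return (void *)(intptr_t)model_read(arg);
}

/* Put n (at most 8) readers inside the driver, destroy it, count ENXIOs. */
static int destroy_with_readers(int n)
{
	struct cdev_model d;
	pthread_t th[8];
	void *ret;
	int nxio = 0;
	cdev_model_init(&d);
	for (int i = 0; i < n; i++)
		pthread_create(&th[i], NULL, reader, &d);
	pthread_mutex_lock(&d.lock);
	while (d.threadcount < n)   /* wait until all are counted inside */
		pthread_cond_wait(&d.state_cv, &d.lock);
	pthread_mutex_unlock(&d.lock);
	model_destroy(&d);
	for (int i = 0; i < n; i++) {
		pthread_join(th[i], &ret);
		nxio += ((intptr_t)ret == ENXIO);
	}
	return nxio;
}
```

A thread that holds a model_refthread() reference and then calls model_destroy() would wait forever, since its own count never drops; that is the same self-lock the next paragraph warns about for destroy_dev() called from d_close.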

Also, please note that drivers cannot call destroy_dev() from the
d_close method: the closing thread is itself counted in si_threadcount,
so destroy_dev() would wait for it forever (a selflock). This livelock
is caused by the fix for a problem identical to the one you described,
with the substitution s/ldisc/cdev/. The destroy_dev_sched() function
is provided to execute destroy_dev() from another context.

As an alternative to the proposed vrevoke(), you could use similar
lifecycle management for the ldisc, if suitable.
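Applied to the line discipline, that lifecycle could look like the following single-threaded sketch: threads hold a reference while they are inside the discipline (Ed's t_ldisccnt), a pending switch refuses new entries, and the switch completes only once the count is zero. The names are hypothetical, locking is elided, and returning EBUSY instead of purging and waiting is a simplification.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical line-discipline slot; locking is elided for brevity. */
struct ldisc_slot {
	int id;          /* current discipline */
	int inuse;       /* threads inside it (t_ldisccnt analogue) */
	int switching;   /* a change is pending: refuse new entries */
};

/* Enter the discipline, like dev_refthread(); fails while a switch is pending. */
static int ldisc_ref(struct ldisc_slot *s)
{
	if (s->switching)
		return ENXIO;
	s->inuse++;
	return 0;
}

static void ldisc_rel(struct ldisc_slot *s)
{
	assert(s->inuse > 0);
	s->inuse--;
}

/*
 * Begin a switch; it completes only once the discipline is empty.  A real
 * implementation would purge (wake) the sleepers and wait here instead of
 * returning EBUSY.
 */
static int ldisc_switch(struct ldisc_slot *s, int newid)
{
	s->switching = 1;
	if (s->inuse > 0)
		return EBUSY;
	s->id = newid;
	s->switching = 0;
	return 0;
}
```

As with destroy_dev() from d_close, a thread must not hold its own reference while performing the switch, or the switch can never complete.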

--h/ohfBjN02kAJu/T
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (FreeBSD)

iEYEARECAAYFAkfcJ/gACgkQC3+MBN1Mb4jN+ACfXT7H0LrUGepI7fnS51azFdte
pSYAnR9PIGY9M/yezNxRpxph+od4d5Up
=u9ra
-----END PGP SIGNATURE-----

--h/ohfBjN02kAJu/T--