From owner-freebsd-performance@FreeBSD.ORG  Sun Jun 11 17:45:29 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@FreeBSD.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 832C216A494;
	Sun, 11 Jun 2006 17:45:29 +0000 (UTC)
	(envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3F63D43D49;
	Sun, 11 Jun 2006 17:45:29 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
	by elvis.mu.org (Postfix) with ESMTP id 20CC11A3C2D;
	Sun, 11 Jun 2006 10:45:29 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id 598F65157C; Sun, 11 Jun 2006 13:45:28 -0400 (EDT)
Date: Sun, 11 Jun 2006 13:45:28 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: performance@FreeBSD.org
Message-ID: <20060611174527.GA31119@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="3V7upXqbjpZ4EhLz"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
Cc: scrappy@FreeBSD.org
Subject: Postgresql performance profiling
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 11 Jun 2006 17:45:29 -0000


--3V7upXqbjpZ4EhLz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I set up supersmack against postgresql 8.1 from ports (default config)
on a 12 CPU E4500.  It scales and performs somewhat better than mysql
on this machine (which is heavily limited by contention between
threads in a process), but there are a number of obvious performance
bottlenecks:

* The postgres processes seem to change their proctitle hundreds or
thousands of times per second.  This is currently done via a
Giant-locked sysctl (kern.proc.args) so there is enormous contention
for Giant.  Even when this is fixed (thanks to a patch from csjp@),
each of them requires a syscall and syscalls ain't free.  This is not
a clever thing to be doing from a performance standpoint.

* pgsql uses select() and this seems to be a major choke point.  I bet
you'd see fairly impressive performance gains (especially on SMP) if
it was modified to use kqueue instead of select.

* You really want to avoid using IPv6 for transport (since it's
Giant-locked).  This was an issue at first since I was running against
localhost, which maps to ::1 by default.  We should reconsider the
preference for IPv6 over IPv4 until IPv6 is Giant-free - there are
probably many other situations where IPv6 is being secretly used
"because it is there" and costing performance.

* The sysv IPC code is still Giant-locked.  pgsql makes a lot of
semop() calls which grab Giant, and it also msleep()s on the Giant
lock in the semwait channel.

* When semop() wants to wake up some sleeping processes because
semaphores have been released, it does a wakeup() and wakes them all
up.  This means a thundering herd (I see up to 11 CPUs being woken
here).  Since we know exactly how many resources are available, it
would be better to only wakeup_one() that number of times instead.
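On the client side, the IPv6 point above can be sidestepped without kernel changes by asking the resolver for IPv4 results only, so that "localhost" cannot quietly map to ::1. This is a minimal sketch that assumes a plain TCP client. resolve_ipv4_only() is a hypothetical helper, not part of supersmack or libpq, and 5432 is simply PostgreSQL's default port. Pointing the client at 127.0.0.1 directly has the same effect.

```c
#define _POSIX_C_SOURCE 200112L	/* expose getaddrinfo() in strict C modes */
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: resolve host/port to an IPv4 TCP address only, so a
 * name such as "localhost" cannot resolve to the IPv6 loopback (::1).
 * Returns 0 on success or a getaddrinfo() error code (see gai_strerror()).
 */
int resolve_ipv4_only(const char *host, const char *port,
    struct sockaddr_in *out)
{
	struct addrinfo hints, *res;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;		/* never return AF_INET6 */
	hints.ai_socktype = SOCK_STREAM;	/* TCP transport */

	error = getaddrinfo(host, port, &hints, &res);
	if (error != 0)
		return (error);

	/* With ai_family = AF_INET, every result holds a sockaddr_in. */
	assert(res->ai_addrlen == sizeof(*out));
	memcpy(out, res->ai_addr, sizeof(*out));
	freeaddrinfo(res);
	return (0);
}
```

Called with "localhost", this should yield 127.0.0.1, assuming the hosts file maps localhost to that address.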

Here are what seem to be the relevant heavily contended mutex
acquisitions (ratio = cnt_lock/count, i.e. the number of times this
lock was contended by something else while held by this code line,
per acquisition):

  count   cnt_hold cnt_lock ratio name
 106080     7420    19238   .181 kern/kern_synch.c:222 (lockbuilder mtxpool) <-- vfs
 175435    13952    42365   .241 kern/kern_condvar.c:113 (lockbuilder mtxpool) <-- vfs
1075841   271138   419862   .390 kern/kern_synch.c:220 (Giant) <-- msleep with Giant
 734613   248249   291969   .397 kern/sys_generic.c:1140 (sellck) <-- select
 800332   379020   326324   .407 kern/sys_generic.c:944 (sellck) <-- select
 401751    19731   175305   .436 kern/sys_generic.c:1092 (sellck) <-- select
 400280   198880   176623   .441 kern/sys_generic.c:935 (sellck) <-- select
1361163   695637   624171   .458 sparc64/sparc64/trap.c:586 (Giant) <-- semop
 400190   193112   238578   .596 kern/kern_condvar.c:208 (sellck) <-- select
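As a check on the ratio column, the kern_synch.c:222 row gives 19238 / 106080, which is about 0.181. The sketch below computes the same ratio and keeps only rows above a threshold, which is the kind of 10% cutoff used later in this thread. The struct and function names are hypothetical, and the input is the numbers copied from the table, not live sysctl output. printf's %.3f rounds, while the table appears to truncate, so a row such as kern/sys_generic.c:944 prints as 0.408 rather than .407.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the profiling table above; field names follow its header. */
struct mtx_prof_row {
	unsigned long count;	/* acquisitions from this code line */
	unsigned long cnt_hold;
	unsigned long cnt_lock;	/* contended by others while held here */
	const char *name;
};

/* ratio = cnt_lock / count, as defined in the text. */
double contention_ratio(const struct mtx_prof_row *r)
{
	assert(r->count > 0);
	return ((double)r->cnt_lock / (double)r->count);
}

/* Print rows whose ratio exceeds threshold; return how many were printed. */
int print_contended(const struct mtx_prof_row *rows, int n, double threshold)
{
	int printed = 0;

	for (int i = 0; i < n; i++) {
		double ratio = contention_ratio(&rows[i]);

		if (ratio > threshold) {
			printf("%8lu %8lu %8lu %.3f %s\n", rows[i].count,
			    rows[i].cnt_hold, rows[i].cnt_lock, ratio,
			    rows[i].name);
			printed++;
		}
	}
	return (printed);
}
```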

Kris

--3V7upXqbjpZ4EhLz
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEjFa3Wry0BWjoQKURAnpmAKClAPG4O9VCh82gg30kdE4xVyw6gwCgw1fz
Xr5QpUf1hCBIIXmcZuNdx8U=
=Tu8r
-----END PGP SIGNATURE-----

--3V7upXqbjpZ4EhLz--

From owner-freebsd-performance@FreeBSD.ORG  Sun Jun 11 18:01:10 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@FreeBSD.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4800116A420;
	Sun, 11 Jun 2006 18:01:10 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id ED6E943D49;
	Sun, 11 Jun 2006 18:01:09 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 9277846B94;
	Sun, 11 Jun 2006 14:01:09 -0400 (EDT)
Date: Sun, 11 Jun 2006 19:01:09 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Kris Kennaway <kris@obsecurity.org>
In-Reply-To: <20060611174527.GA31119@xor.obsecurity.org>
Message-ID: <20060611185702.L26634@fledge.watson.org>
References: <20060611174527.GA31119@xor.obsecurity.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 11 Jun 2006 18:01:10 -0000


On Sun, 11 Jun 2006, Kris Kennaway wrote:

> * The postgres processes seem to change their proctitle hundreds or 
> thousands of times per second.  This is currently done via a Giant-locked 
> sysctl (kern.proc.args) so there is enormous contention for Giant.  Even 
> when this is fixed (thanks to a patch from csjp@), each of them requires a 
> syscall and syscalls ain't free.  This is not a clever thing to be doing 
> from a performance standpoint.

You might consider disabling setproctitle() entirely to see what impact that 
has?

> * pgsql uses select() and this seems to be a major choke point.  I bet you'd 
> see fairly impressive performance gains (especially on SMP) if it was 
> modified to use kqueue instead of select.
>
> * You really want to avoid using IPv6 for transport (since it's 
> Giant-locked).  This was an issue at first since I was running against 
> localhost, which maps to ::1 by default.  We should reconsider the 
> preference for IPv6 over IPv4 until IPv6 is Giant-free - there are probably 
> many other situations where IPv6 is being secretly used "because it is 
> there" and costing performance.

FYI, for purely loopback traffic, it's probably safe to mark the IPv6 netisr 
as MPSAFE.  Add NETISR_MPSAFE as a flag to the following line in ip6_input.c:

ip6_input.c:    netisr_register(NETISR_IPV6, ip6_input, &ip6intrq, 0);

If you have non-loopback traffic, you may put yourself at greater risk of a 
panic in the IPv6 multicast and neighbor discovery code, however, so this 
should be done with caution.  It might be an interesting exercise though.
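Concretely, the experiment amounts to changing the final flags argument of that call. Below is a sketch of the one-line edit in sys/netinet6/ip6_input.c. It has not been tested here and is meant only for loopback-only runs, given the caveat above. It is a kernel source fragment, not a standalone program: the second line replaces the first.

```c
/* As shipped: the IPv6 netisr is not marked MPSAFE, so it runs under Giant. */
netisr_register(NETISR_IPV6, ip6_input, &ip6intrq, 0);

/* Experiment: mark it MPSAFE so ip6_input() runs without Giant. */
netisr_register(NETISR_IPV6, ip6_input, &ip6intrq, NETISR_MPSAFE);
```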

> * The sysv IPC code is still giant-locked.  pgsql makes a lot of semop() 
> calls which grab Giant, and it also msleep()s on the Giant lock in the 
> semwait channel.

It is likely quite easy to put subsystem locks around System V IPC subsystems. 
I'm a bit surprised no one has done it already.  sysvshm is a bit more tricky, 
but sysvsem and sysvmsg should be quite straightforward.
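The suggestion amounts to replacing Giant with a lock owned by the semaphore code. The userland sketch below uses one pthread mutex per semaphore set rather than one lock for everything. The names and table sizes are hypothetical. This is not the sysv_sem.c code, which also has to handle blocking, undo records and multi-semaphore operations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NSETS	4	/* hypothetical table sizes for the sketch */
#define NSEMS	8

/*
 * Each semaphore set carries its own mutex, so operations on different
 * sets no longer serialize on one global, Giant-like lock.
 */
struct sem_set_sketch {
	pthread_mutex_t	mtx;		/* protects value[] of this set only */
	int		value[NSEMS];
};

static struct sem_set_sketch semsets[NSETS];

void semsets_init(void)
{
	for (int i = 0; i < NSETS; i++) {
		pthread_mutex_init(&semsets[i].mtx, NULL);
		for (int j = 0; j < NSEMS; j++)
			semsets[i].value[j] = 0;
	}
}

/*
 * Non-blocking, semop()-like update: add delta to one semaphore unless the
 * result would be negative.  Returns true if the change was applied.
 */
bool semset_try_op(int set, int sem, int delta)
{
	struct sem_set_sketch *s;
	bool applied = false;

	assert(set >= 0 && set < NSETS && sem >= 0 && sem < NSEMS);
	s = &semsets[set];
	pthread_mutex_lock(&s->mtx);		/* per-set lock, not Giant */
	if (s->value[sem] + delta >= 0) {
		s->value[sem] += delta;
		applied = true;
	}
	pthread_mutex_unlock(&s->mtx);
	return (applied);
}
```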

> * When semop() wants to wake up some sleeping processes because semaphores 
> have been released, it does a wakeup() and wakes them all up.  This means a 
> thundering herd (I see up to 11 CPUs being woken here).  Since we know 
> exactly how many resources are available, it would be better to only 
> wakeup_one() that number of times instead.

Should be easy to experiment with.
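The difference between the two wakeup strategies can be seen in a userland analogue. In this sketch, pthread_cond_broadcast() plays the role of wakeup() and pthread_cond_signal() plays wakeup_one(). It is a counting semaphore built from a mutex and a condition variable, with hypothetical names, not the kernel's semop() path. Real semop() waiters may need several units or be waiting for a semaphore to reach zero, so a strict one-signal-per-unit rule is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct csem {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		count;		/* available units */
	int		waiters;	/* threads inside csem_acquire() */
};

void csem_init(struct csem *s, int initial)
{
	pthread_mutex_init(&s->mtx, NULL);
	pthread_cond_init(&s->cv, NULL);
	s->count = initial;
	s->waiters = 0;
}

void csem_acquire(struct csem *s)
{
	pthread_mutex_lock(&s->mtx);
	s->waiters++;
	while (s->count == 0)		/* loop: wakeups may be spurious */
		pthread_cond_wait(&s->cv, &s->mtx);
	s->waiters--;
	s->count--;
	pthread_mutex_unlock(&s->mtx);
}

/* wakeup()-style: every waiter wakes; all but n go back to sleep. */
void csem_release_broadcast(struct csem *s, int n)
{
	pthread_mutex_lock(&s->mtx);
	s->count += n;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mtx);
}

/* wakeup_one()-style: signal at most one waiter per released unit. */
int csem_release_n(struct csem *s, int n)
{
	int woken;

	assert(n >= 0);
	pthread_mutex_lock(&s->mtx);
	s->count += n;
	woken = n < s->waiters ? n : s->waiters;
	for (int i = 0; i < woken; i++)
		pthread_cond_signal(&s->cv);
	pthread_mutex_unlock(&s->mtx);
	return (woken);			/* number of signals sent */
}
```

Units that nobody is waiting for simply stay in count, so a later csem_acquire() takes them without sleeping.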

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Sun Jun 11 20:31:58 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@FreeBSD.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0CA3016A418;
	Sun, 11 Jun 2006 20:31:58 +0000 (UTC)
	(envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8717E43D70;
	Sun, 11 Jun 2006 20:31:51 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
	by elvis.mu.org (Postfix) with ESMTP id 0486C1A3C2D;
	Sun, 11 Jun 2006 13:31:51 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id CEA6D521A2; Sun, 11 Jun 2006 16:31:46 -0400 (EDT)
Date: Sun, 11 Jun 2006 16:31:44 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Kris Kennaway <kris@obsecurity.org>
Message-ID: <20060611203144.GA34123@xor.obsecurity.org>
References: <20060611174527.GA31119@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="PNTmBPCT7hxwcZjr"
Content-Disposition: inline
In-Reply-To: <20060611174527.GA31119@xor.obsecurity.org>
User-Agent: Mutt/1.4.2.1i
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 11 Jun 2006 20:31:58 -0000


--PNTmBPCT7hxwcZjr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Jun 11, 2006 at 01:45:28PM -0400, Kris Kennaway wrote:
> I set up supersmack against postgresql 8.1 from ports (default config)
> on a 12 CPU E4500.  It scales and performs somewhat better than mysql
> on this machine (which is heavily limited by contention between
> threads in a process), but there are a number of obvious performance
> bottlenecks:

FYI, on a dual P4 + HTT, mysql significantly outperforms pgsql (by
more than 55% in peak performance, probably more if I were using
libthr, which I cannot use on this machine for technical reasons) on
select-key.smack when configured the same way (i.e. transport over
IPv4 instead of a local socket, which supersmack prefers for mysql).

Contention is still a big issue here (listing only mutexes contended
on more than 10% of acquisitions):

     0          0     142969      0       1996      14458   .101 kern/kern_synch.c:218 (Giant)
     0          0     199028      0      11649      27944   .140 kern/kern_condvar.c:208 (sellck)
     0          0     400103      0     111216      91336   .228 kern/kern_sysctl.c:1317 (Giant)
     0          0     303147      0     108735     131237   .432 i386/i386/trap.c:1005 (Giant)

I turned off process title setting and got an 8% performance boost.

Contention is now a bit better but still serious:

     0          0      22952      0       2067       2521   .109 vm/vm_fault.c:987 (vm object)
     0          0     199153      0      12589      31512   .158 kern/kern_condvar.c:208 (sellck)
     0          0     361305      0     124766     130901   .362 i386/i386/trap.c:1005 (Giant)

i.e. semop() (the Giant-locked syscall) is contending with itself a
lot, and select() is a secondary problem.

Actually rwatson noticed that semop() is marked MPSAFE, so it's not
clear (but nevertheless true) why Giant is acquired here.  OK, pjd
worked out that it's because SYSCALL_MODULE_HELPER() *never* sets the
mpsafe flag, so all such syscalls registered that way (i.e. those
which are part of subsystems that may be loaded from kld) are
Giant-locked regardless of what syscalls.master says.
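The mechanism pjd found can be modelled in a few lines. The dispatcher takes Giant for any entry not flagged MPSAFE, which is presumably what the trap.c (Giant) lines in the profiles reflect. An entry registered through the module helper never carries that flag, whatever syscalls.master says. This is a userland sketch only. sysent_sketch, SKETCH_SYF_MPSAFE and the register_* functions are hypothetical stand-ins, not the kernel's sysent or SYSCALL_MODULE_HELPER(). The workaround described later in the thread corresponds to register_module_helper() OR-ing the flag in instead of dropping it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_SYF_MPSAFE	0x1	/* stands in for the kernel's SYF_MPSAFE */

/* Hypothetical, much-simplified syscall table entry. */
struct sysent_sketch {
	int	(*call)(void);
	int	flags;
};

static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;
static unsigned long giant_acquisitions;

/* Placeholder syscall body, standing in for semop(). */
static int sketch_semop(void)
{
	return (0);
}

/* Dispatcher: entries without the MPSAFE flag run under Giant. */
int dispatch(const struct sysent_sketch *se)
{
	int error;

	assert(se->call != NULL);
	if (se->flags & SKETCH_SYF_MPSAFE)
		return (se->call());
	pthread_mutex_lock(&giant);
	giant_acquisitions++;
	error = se->call();
	pthread_mutex_unlock(&giant);
	return (error);
}

/* Static table path: keeps the flags given (as from syscalls.master). */
struct sysent_sketch register_static(int (*call)(void), int flags)
{
	struct sysent_sketch se = { call, flags };

	return (se);
}

/* Module-helper path as described above: the flags are dropped. */
struct sysent_sketch register_module_helper(int (*call)(void), int flags)
{
	struct sysent_sketch se = { call, 0 };

	(void)flags;	/* the MPSAFE bit never makes it into the entry */
	return (se);
}
```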

I removed the SYSCALL_MODULE_HELPERs from sysv_sem.c but now
postgresql hangs when trying to start; possibly the locking in
sysv_sem.c is just broken since it was never in fact tested.

Kris



--PNTmBPCT7hxwcZjr
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEjH2uWry0BWjoQKURAsonAKCarmABCAfQLdp+3DnJNvN7AuOF3ACfcxkt
a8UTiVQhh/fDu/xeADalNeg=
=DsOF
-----END PGP SIGNATURE-----

--PNTmBPCT7hxwcZjr--

From owner-freebsd-performance@FreeBSD.ORG  Sun Jun 11 21:37:04 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@freebsd.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 59C1116A418;
	Sun, 11 Jun 2006 21:37:04 +0000 (UTC) (envelope-from scrappy@hub.org)
Received: from hub.org (hub.org [200.46.204.220])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E2BB743D46;
	Sun, 11 Jun 2006 21:37:03 +0000 (GMT) (envelope-from scrappy@hub.org)
Received: from localhost (mx1.hub.org [200.46.208.251])
	by hub.org (Postfix) with ESMTP id F3049290C25;
	Sun, 11 Jun 2006 18:36:59 -0300 (ADT)
Received: from hub.org ([200.46.204.220])
	by localhost (mx1.hub.org [200.46.208.251]) (amavisd-new, port 10024)
	with ESMTP id 58527-06; Sun, 11 Jun 2006 18:37:03 -0300 (ADT)
Received: from ganymede.hub.org (blk-7-151-244.eastlink.ca [71.7.151.244])
	by hub.org (Postfix) with ESMTP id 743C2290C20;
	Sun, 11 Jun 2006 18:36:59 -0300 (ADT)
Received: by ganymede.hub.org (Postfix, from userid 1000)
	id DDFAB3EC22; Sun, 11 Jun 2006 18:37:05 -0300 (ADT)
Received: from localhost (localhost [127.0.0.1])
	by ganymede.hub.org (Postfix) with ESMTP id D53553EA1B;
	Sun, 11 Jun 2006 18:37:05 -0300 (ADT)
Date: Sun, 11 Jun 2006 18:37:05 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: Kris Kennaway <kris@obsecurity.org>
In-Reply-To: <20060611174527.GA31119@xor.obsecurity.org>
Message-ID: <20060611183544.D1114@ganymede.hub.org>
References: <20060611174527.GA31119@xor.obsecurity.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 11 Jun 2006 21:37:04 -0000

On Sun, 11 Jun 2006, Kris Kennaway wrote:

> * The postgres processes seem to change their proctitle hundreds or 
> thousands of times per second.  This is currently done via a 
> Giant-locked sysctl (kern.proc.args) so there is enormous contention for 
> Giant.  Even when this is fixed (thanks to a patch from csjp@), each of 
> them requires a syscall and syscalls ain't free.  This is not a clever 
> thing to be doing from a performance standpoint.

to disable it for testing, after you run configure, manually edit 
src/include/pg_config.h and #undef HAVE_SETPROCTITLE ...
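Undefining HAVE_SETPROCTITLE works because the title update is compiled conditionally. The sketch below shows that pattern. report_activity(), ps_status and title_updates are hypothetical names, not PostgreSQL's actual code, and the test build assumes the macro is not defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef HAVE_SETPROCTITLE
#include <sys/types.h>
#include <unistd.h>		/* setproctitle(3) on FreeBSD */
#endif

static char ps_status[128];		/* last activity, kept locally */
static unsigned long title_updates;	/* setproctitle() calls made */

/*
 * Hypothetical status hook called on every state change.  With
 * HAVE_SETPROCTITLE defined, each call also updates the process title,
 * which costs a trip into the kernel; with it #undef'd, the update
 * compiles down to a local string copy.
 */
void report_activity(const char *activity)
{
	assert(activity != NULL);
	snprintf(ps_status, sizeof(ps_status), "%s", activity);
#ifdef HAVE_SETPROCTITLE
	setproctitle("%s", activity);
	title_updates++;
#endif
}
```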

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 00:30:36 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@FreeBSD.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D25C416A41F;
	Mon, 12 Jun 2006 00:30:36 +0000 (UTC)
	(envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8A42A43D45;
	Mon, 12 Jun 2006 00:30:36 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
	by elvis.mu.org (Postfix) with ESMTP id 453E11A3C24;
	Sun, 11 Jun 2006 17:30:36 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id F3E15516F6; Sun, 11 Jun 2006 20:30:34 -0400 (EDT)
Date: Sun, 11 Jun 2006 20:30:34 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Kris Kennaway <kris@obsecurity.org>
Message-ID: <20060612003034.GA37926@xor.obsecurity.org>
References: <20060611174527.GA31119@xor.obsecurity.org>
	<20060611203144.GA34123@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="0F1p//8PRICkK4MW"
Content-Disposition: inline
In-Reply-To: <20060611203144.GA34123@xor.obsecurity.org>
User-Agent: Mutt/1.4.2.1i
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 00:30:37 -0000


--0F1p//8PRICkK4MW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Jun 11, 2006 at 04:31:44PM -0400, Kris Kennaway wrote:
> I removed the SYSCALL_MODULE_HELPERs from sysv_sem.c but now
> postgresql hangs when trying to start; possibly the locking in
> sysv_sem.c is just broken since it was never in fact tested.

That was my mistake: the syscalls weren't getting registered.  I made
SYSCALL_MODULE_HELPER add the SYF_MPSAFE flag to work around it
instead.  The new mutex contention looks like:

     0          0     199118      0      12134      30704   .154 kern/kern_condvar.c:208 (sellck)
     0          0     354890      0     100749     110295   .310 kern/sysv_sem.c:1011 (semid)

i.e. semaphores are still contending with themselves.  It didn't make
any performance difference on this workload, as expected since it was
only contending with itself and still is (but in mixed workloads with
other Giant activity it will help, of course).

Kris

--0F1p//8PRICkK4MW
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEjLWqWry0BWjoQKURArMtAJwMTP19UbohRLWGvMoKU4pFhdrCNQCeIYcO
7mTKb5txb7l6XmZzE+SRQ54=
=6KxN
-----END PGP SIGNATURE-----

--0F1p//8PRICkK4MW--

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 14:21:05 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BB9F416A503
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 14:21:05 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33301.mail.mud.yahoo.com (web33301.mail.mud.yahoo.com
	[68.142.206.116])
	by mx1.FreeBSD.org (Postfix) with SMTP id 4A19343D46
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 14:21:05 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 190 invoked by uid 60001); 12 Jun 2006 14:21:04 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=u9ZgsQ0AAWN5Esw6KkYKduRPhsGET+KQhKc7RXws6PtiCQDhUrtLfyfU1+DnnwVTqAgvNjgomoe6kuymMOAemHD75i2jbTn6hEZMie3iChG4DUpWz8TJzZANaNhkTpUxS0VP9dhdSVoXAc+WOl8mYoAA46uyOz7Zt/Ty5zhzcPE=
	; 
Message-ID: <20060612142104.188.qmail@web33301.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33301.mail.mud.yahoo.com via HTTP;
	Mon, 12 Jun 2006 07:21:04 PDT
Date: Mon, 12 Jun 2006 07:21:04 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: freebsd-performance@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Subject: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 14:21:05 -0000

I'm just setting up to evaluate 6.1 for a
project, and before I tune I hoped to get some
feedback on why some things are the way they are.


First, why is the default for HZ now 1000? It
seems that 900 extra clock interrupts aren't a
performance enhancement.

Is there a reason that ITR isn't a tunable in the
em driver? It seems more generally useful to end
users than the delay settings.

Running a simple test with a traffic generator
(firing udp packets to a blackhole), the system
overhead with a single processor goes up from 10%
to 15% when running a kernel with SMP enabled
(with nothing else changed). I have ITR set to
6000 interrupts per second. That seems like an
awful lot of overhead. Is there some problem
running an SMP-enabled kernel when only 1
processor is present, or is there really 50%
extra overhead on an SMP scheduler? I'll have a
dual core in a few days to test with.

Lastly, is there a utility similar to cpustat in
DragonflyBSD which shows the per-cpu usage stats?

Thanks,

DT


From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 15:00:36 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1AC1B16A41B
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 15:00:36 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6BCA043D45
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 15:00:35 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 3021846C2F;
	Mon, 12 Jun 2006 11:00:31 -0400 (EDT)
Date: Mon, 12 Jun 2006 16:00:30 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Danial Thom <danial_thom@yahoo.com>
In-Reply-To: <20060612142104.188.qmail@web33301.mail.mud.yahoo.com>
Message-ID: <20060612155149.S24745@fledge.watson.org>
References: <20060612142104.188.qmail@web33301.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 15:00:36 -0000

On Mon, 12 Jun 2006, Danial Thom wrote:

> first, why is the default for HZ now 1000? It seems that 900 extra clock 
> interrupts aren't a performance enhancement.

This is a design change that is in the process of being reconsidered.  I 
expect that HZ will not be 1000 in 7.x, but can't tell you whether it will go 
back to 100, or some middle ground.  There are a number of benefits to a 
higher HZ, not least of which is more accurate timing of some network timer events. 
Since I don't have my hands in the timer code, I can't speak to what the 
decision process here is, or when any change might happen, but I do expect to 
see some change.
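
The "900 extra clock interrupts" figure is simply the difference in tick rate, and the benefit mentioned above is finer tick granularity. A minimal Python sketch of that arithmetic, assuming one clock interrupt per tick per CPU (the helper names are illustrative only, not kernel functions):

```python
def tick_period_us(hz: int) -> float:
    """Length of one clock tick in microseconds for a given HZ."""
    return 1_000_000 / hz


def extra_ticks_per_second(new_hz: int, old_hz: int) -> int:
    """Additional clock interrupts per second, per CPU, when raising HZ."""
    return new_hz - old_hz


for hz in (100, 1000):
    print(f"HZ={hz}: {tick_period_us(hz):.0f} us per tick")
print("extra interrupts/s per CPU:", extra_ticks_per_second(1000, 100))
```

In this model HZ=1000 buys 1 ms timer resolution instead of 10 ms, at the cost of ten times as many clock interrupts.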

> Running a simple test with a traffic generator (firing udp packets to a 
> blackhole), the system overhead with a single processor goes up from 10% to 
> 15% when running a kernel with SMP enabled (and nothing else different). I 
> have ITR set to 6000 interrupts per second. That seems like an awful lot of 
> overhead. Is there some problem running an SMP-enabled kernel when only 1 
> processor is present, or is there really 50% extra overhead on an SMP 
> scheduler? I'll have a dual core in a few days to test with.

I don't know about the particular number, but there is a significant overhead 
to building in SMP support currently -- in particular, you pick up a lot of 
atomic instructions, which increase the cost of locking operations even 
without contention.  Some of that overhead reduces as the workload goes up, as 
there's coalescing of work under locked regions, reduced context switch rates 
as work is performed in batches, etc.  There is currently extremely active 
work in the area of reducing the overhead of scheduling and context switching, 
being driven in part by the 32-processor support in Sun4v.  I don't expect to 
see large portions of that merged to RELENG_6, but it will be in RELENG_7. 
Again, not my area of expertise, but there is work going on in this area.
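
The "10% to 15%" and "50% extra overhead" figures in the question are the same measurement expressed two ways: an absolute increase of 5 percentage points of system time, which is a 50% relative increase. A short Python sketch of that arithmetic (function names are hypothetical):

```python
def absolute_increase(before_pct: float, after_pct: float) -> float:
    """Change in system time, in percentage points."""
    return after_pct - before_pct


def relative_increase(before_pct: float, after_pct: float) -> float:
    """Change in system time as a fraction of the original overhead."""
    return (after_pct - before_pct) / before_pct


print(absolute_increase(10, 15))  # percentage points
print(relative_increase(10, 15))  # fraction of the original 10%
```

Put differently, at this load the SMP kernel costs roughly an extra 5% of one CPU; whether that counts as "50% more" depends on which baseline is being quoted.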

Finally, there is a known performance problem involving loopback network 
traffic and preemption, which results in additional context switches.  You may 
want to try disabling preemption and see if/how that impacts your numbers. 
There has been quite a bit of discussion of this problem, and I expect to 
see a solution for it in the near future.  This problem does not manifest for 
remote traffic, only loopback traffic.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 19:58:02 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A23CB16A49E
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 19:58:02 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33306.mail.mud.yahoo.com (web33306.mail.mud.yahoo.com
	[68.142.206.121])
	by mx1.FreeBSD.org (Postfix) with SMTP id 0026B43D4C
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 19:58:01 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 72454 invoked by uid 60001); 12 Jun 2006 19:57:54 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=ys3GSJpJnj3CE+8YNrTvInDIozOoouRPWoBQpcQ1zTRHu1P79Ffw3bff+00nClo9q1ABVNnWM3I1hGbgC9VxJFl+OJ/tT8W54OKjzEbnhaT3SlMSSLokLqtzGjXjf07dn/LYL4m08cPJTqnHFvfGVMDS/gxmxcLihx0HqCFAsjM=
	; 
Message-ID: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33306.mail.mud.yahoo.com via HTTP;
	Mon, 12 Jun 2006 12:57:54 PDT
Date: Mon, 12 Jun 2006 12:57:54 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Robert Watson <rwatson@FreeBSD.org>, freebsd-performance@freebsd.org
In-Reply-To: <20060612155149.S24745@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: 
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 19:58:02 -0000


--- Robert Watson <rwatson@FreeBSD.org> wrote:

> On Mon, 12 Jun 2006, Danial Thom wrote:
> 
> > first, why is the default for HZ now 1000? It seems that 900 extra
> > clock interrupts aren't a performance enhancement.
> 
> This is a design change that is in the process of being reconsidered.
> I expect that HZ will not be 1000 in 7.x, but can't tell you whether
> it will go back to 100, or some middle ground.  There are a number of
> benefits to a higher HZ, not least of which is more accurate timing of
> some network timer events.  Since I don't have my hands in the timer
> code, I can't speak to what the decision process here is, or when any
> change might happen, but I do expect to see some change.

Will anything break if I tweak this downward?

> 
> > Running a simple test with a traffic generator (firing udp packets
> > to a blackhole), the system overhead with a single processor goes up
> > from 10% to 15% when running a kernel with SMP enabled (and nothing
> > else different). I have ITR set to 6000 interrupts per second. That
> > seems like an awful lot of overhead. Is there some problem running an
> > SMP-enabled kernel when only 1 processor is present, or is there
> > really 50% extra overhead on an SMP scheduler? I'll have a dual core
> > in a few days to test with.
> 
> I don't know about the particular number, but there is a significant
> overhead to building in SMP support currently -- in particular, you
> pick up a lot of atomic instructions, which increase the cost of
> locking operations even without contention.  Some of that overhead
> reduces as the workload goes up, as there's coalescing of work under
> locked regions, reduced context switch rates as work is performed in
> batches, etc.  There is currently extremely active work in the area of
> reducing the overhead of scheduling and context switching, being
> driven in part by the 32-processor support in Sun4v.  I don't expect
> to see large portions of that merged to RELENG_6, but it will be in
> RELENG_7.  Again, not my area of expertise, but there is work going on
> in this area.
> 
> Finally, there is a known performance problem involving loopback
> network traffic and preemption, which results in additional context
> switches.  You may want to try disabling preemption and see if/how
> that impacts your numbers.  There has been quite a bit of discussion
> of this problem, and I expect to see a solution for it in the near
> future.  This problem does not manifest for remote traffic, only
> loopback traffic.

I'm sending this traffic from an external device,
receiving on an em controller with blackhole set
to 1. So I assume this loopback issue doesn't
apply to this test?

DT


From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 20:01:46 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0C8DF16A41A;
	Mon, 12 Jun 2006 20:01:46 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2879043D81;
	Mon, 12 Jun 2006 20:01:45 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [10.10.3.185] ([69.15.205.254]) (authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k5CK1YMF060726;
	Mon, 12 Jun 2006 14:01:41 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Message-ID: <448DC818.9070100@samsco.org>
Date: Mon, 12 Jun 2006 14:01:28 -0600
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: danial_thom@yahoo.com
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
In-Reply-To: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-performance@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 20:01:46 -0000

Danial Thom wrote:
> 
> --- Robert Watson <rwatson@FreeBSD.org> wrote:
>
>>On Mon, 12 Jun 2006, Danial Thom wrote:
>>
>>>first, why is the default for HZ now 1000? It seems that 900 extra
>>>clock interrupts aren't a performance enhancement.
>>
>>This is a design change that is in the process of being reconsidered.
>>I expect that HZ will not be 1000 in 7.x, but can't tell you whether
>>it will go back to 100, or some middle ground.  There are a number of
>>benefits to a higher HZ, not least of which is more accurate timing of
>>some network timer events.  Since I don't have my hands in the timer
>>code, I can't speak to what the decision process here is, or when any
>>change might happen, but I do expect to see some change.
>
> Will anything break if I tweak this downward?
>

I run a number of high-load production systems that
do a lot of network and filesystem activity, all
with HZ set to 100.  It has also been shown in the
past that certain things in the network area were
not fixed to deal with a high HZ value, so it's
possible that it's even more stable/reliable with
an HZ value of 100.

My personal opinion is that HZ should go back down
to 100 in 7-CURRENT immediately, and only be incremented
back up when/if it's proven to be the right thing to do.
And, I say that as someone who (errantly) pushed for the
increase to 1000 several years ago.

Scott


From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 20:02:52 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0C9C116A418
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 20:02:52 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B3A4443D46
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 20:02:51 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 36CE346C72;
	Mon, 12 Jun 2006 16:02:51 -0400 (EDT)
Date: Mon, 12 Jun 2006 21:02:51 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Danial Thom <danial_thom@yahoo.com>
In-Reply-To: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
Message-ID: <20060612210054.S26068@fledge.watson.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 20:02:52 -0000


On Mon, 12 Jun 2006, Danial Thom wrote:

>> This is a design change that is in the process of being reconsidered.  I 
>> expect that HZ will not be 1000 in 7.x, but can't tell you whether it will 
>> go back to 100, or some middle ground.  There are a number of benefits to a 
>> higher HZ, not least of which is more accurate timing of some network timer events. 
>> Since I don't have my hands in the timer code, I can't speak to what the 
>> decision process here is, or when any change might happen, but I do expect 
>> to see some change.
>
> Will anything break if I tweak this downward?

No, shouldn't do.  I wouldn't go below 100 though, as things like process 
statistics, involuntary context switches, etc, are all affected.

>> Finally, there is a known performance problem involving loopback network 
>> traffic and preemption, which results in additional context switches.  You 
>> may want to try disabling preemption and see if/how that impacts your 
>> numbers. There has been quite a bit of discussion of this problem, and 
>> I expect to see a solution for it in the near future.  This problem does 
>> not manifest for remote traffic, only loopback traffic.
>
> I'm sending this traffic from an external device, receiving on an em 
> controller with blackhole set to 1. So I assume this loopback issue doesn't 
> apply to this test?

The above comments only refer to traffic being sent over if_loop interfaces or 
certain other deferred-work scenarios.  Basically, deferring work to the 
netisr from a user thread rather than from an interrupt thread results in a 
premature context switch.
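
A toy Python model of that "premature context switch", under loudly stated assumptions: the netisr is modelled as a hypothetical higher-priority worker that preempts the sending thread on every wakeup, and every switch is counted as one unit. Nothing here reflects the actual netisr code; it only shows why per-packet preemption multiplies switches while batching does not:

```python
from collections import deque


def simulate(n_packets: int, preempt: bool) -> tuple[int, int]:
    """Return (context_switches, packets_processed) for a burst of sends."""
    queue: deque[int] = deque()
    switches = 0
    processed = 0
    for pkt in range(n_packets):
        queue.append(pkt)          # user thread defers work to the "netisr"
        if preempt:
            switches += 1          # wakeup immediately preempts the sender
            while queue:
                queue.popleft()
                processed += 1
            switches += 1          # switch back to the user thread
    if queue:                      # no preemption: drain once the sender yields
        switches += 1
        while queue:
            queue.popleft()
            processed += 1
        switches += 1
    return switches, processed
```

In this model, ten packets cost twenty switches with per-wakeup preemption and two without, for the same amount of work done.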

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 20:08:14 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2D5A616A479
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 20:08:14 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5C4E743D66
	for <freebsd-performance@freebsd.org>;
	Mon, 12 Jun 2006 20:08:13 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 8FA9946C58;
	Mon, 12 Jun 2006 16:08:12 -0400 (EDT)
Date: Mon, 12 Jun 2006 21:08:12 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Scott Long <scottl@samsco.org>
In-Reply-To: <448DC818.9070100@samsco.org>
Message-ID: <20060612210723.K26068@fledge.watson.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<448DC818.9070100@samsco.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 20:08:14 -0000

On Mon, 12 Jun 2006, Scott Long wrote:

> I run a number of high-load production systems that do a lot of network and 
> filesystem activity, all with HZ set to 100.  It has also been shown in the 
> past that certain things in the network area were not fixed to deal with a 
> high HZ value, so it's possible that it's even more stable/reliable with an 
> HZ value of 100.
>
> My personal opinion is that HZ should go back down to 100 in 7-CURRENT 
> immediately, and only be incremented back up when/if it's proven to be the 
> right thing to do. And, I say that as someone who (errantly) pushed for the 
> increase to 1000 several years ago.

I think it's probably a good idea to do it sooner rather than later.  It may 
slightly negatively impact some services that rely on frequent timers to do 
things like retransmit timing and the like.  But I haven't done any 
measurements.
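
The retransmit-timer concern comes down to tick granularity: a timer can only fire on a tick boundary, so a requested delay is effectively rounded up to a whole number of ticks. A simplified Python sketch, assuming plain ceiling rounding (the kernel's real conversion may differ, for instance by adding a tick to cover the partially elapsed current one); the function names are hypothetical:

```python
import math


def delay_to_ticks(delay_ms: float, hz: int) -> int:
    """Round a requested delay up to whole clock ticks (simplified)."""
    return math.ceil(delay_ms * hz / 1000)


def effective_delay_ms(delay_ms: float, hz: int) -> float:
    """Delay actually observed when the timer fires on a tick boundary."""
    return delay_to_ticks(delay_ms, hz) * 1000 / hz


for hz in (100, 1000):
    print(f"HZ={hz}: a 5 ms timer fires after {effective_delay_ms(5, hz)} ms")
```

Under this model a 5 ms timeout becomes 10 ms at HZ=100 but stays 5 ms at HZ=1000, which is the kind of effect that could slightly slow timer-driven retransmission.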

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 20:32:50 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0722116A418;
	Mon, 12 Jun 2006 20:32:50 +0000 (UTC)
	(envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B518B43D46;
	Mon, 12 Jun 2006 20:32:49 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
	by elvis.mu.org (Postfix) with ESMTP id 99FFA1A4DA8;
	Mon, 12 Jun 2006 13:32:49 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id 0321A5153E; Mon, 12 Jun 2006 16:32:48 -0400 (EDT)
Date: Mon, 12 Jun 2006 16:32:48 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Robert Watson <rwatson@FreeBSD.org>
Message-ID: <20060612203248.GA72885@xor.obsecurity.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<448DC818.9070100@samsco.org>
	<20060612210723.K26068@fledge.watson.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="mYCpIKhGyMATD0i+"
Content-Disposition: inline
In-Reply-To: <20060612210723.K26068@fledge.watson.org>
User-Agent: Mutt/1.4.2.1i
Cc: Scott Long <scottl@samsco.org>, danial_thom@yahoo.com,
	freebsd-performance@freebsd.org
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 20:32:50 -0000


--mYCpIKhGyMATD0i+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
> On Mon, 12 Jun 2006, Scott Long wrote:
>
> >I run a number of high-load production systems that do a lot of network
> >and filesystem activity, all with HZ set to 100.  It has also been shown
> >in the past that certain things in the network area were not fixed to
> >deal with a high HZ value, so it's possible that it's even more
> >stable/reliable with an HZ value of 100.
> >
> >My personal opinion is that HZ should go back down to 100 in 7-CURRENT
> >immediately, and only be incremented back up when/if it's proven to be the
> >right thing to do. And, I say that as someone who (errantly) pushed for
> >the increase to 1000 several years ago.
>
> I think it's probably a good idea to do it sooner rather than later.  It
> may slightly negatively impact some services that rely on frequent timers
> to do things like retransmit timing and the like.  But I haven't done any
> measurements.

As you know, but for the benefit of the list, restoring HZ=100 is
often an important performance tweak on SMP systems with many CPUs
because of all the sched_lock activity from statclock/hardclock, which
scales with HZ and NCPUS.

Kris

--mYCpIKhGyMATD0i+
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEjc9wWry0BWjoQKURAoX7AKD3jrbSgbmpMEQibSGwucYvLxt9aACg3Y/i
5SbAlN+kIKUkkGdkZ3genJs=
=+GDa
-----END PGP SIGNATURE-----

--mYCpIKhGyMATD0i+--
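
Kris's point above, that sched_lock activity from statclock/hardclock scales with both HZ and the number of CPUs, can be put into rough numbers. A back-of-the-envelope Python sketch, assuming (as a simplification) that every CPU runs hardclock HZ times per second and takes sched_lock a fixed number of times per run; statclock adds further acquisitions at its own rate and is left out. The function name is hypothetical:

```python
def hardclock_lock_rate(hz: int, ncpu: int, acquisitions_per_tick: int = 1) -> int:
    """Approximate sched_lock acquisitions per second from hardclock alone.

    Assumes each CPU runs hardclock `hz` times per second and takes the
    lock `acquisitions_per_tick` times per run (a simplification).
    """
    return hz * ncpu * acquisitions_per_tick


for hz in (100, 1000):
    print(f"HZ={hz}, 32 CPUs: {hardclock_lock_rate(hz, 32)} acquisitions/s")
```

In this model every one of those acquisitions targets the same global lock, so going from HZ=100 to HZ=1000 multiplies contention on a single point by ten, and the absolute cost grows with NCPUS.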

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 23:16:02 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from localhost.my.domain (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id B5F5516A418;
	Mon, 12 Jun 2006 23:16:01 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
From: David Xu <davidxu@freebsd.org>
To: freebsd-performance@freebsd.org
Date: Tue, 13 Jun 2006 07:15:52 +0800
User-Agent: KMail/1.8.2
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
In-Reply-To: <20060612203248.GA72885@xor.obsecurity.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="gb2312"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200606130715.52425.davidxu@freebsd.org>
Cc: danial_thom@yahoo.com, Scott Long <scottl@samsco.org>,
	Robert Watson <rwatson@freebsd.org>, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 23:16:02 -0000

On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
> > On Mon, 12 Jun 2006, Scott Long wrote:
> > >I run a number of high-load production systems that do a lot of network
> > >and filesystem activity, all with HZ set to 100.  It has also been shown
> > >in the past that certain things in the network area were not fixed to
> > >deal with a high HZ value, so it's possible that it's even more
> > >stable/reliable with an HZ value of 100.
> > >
> > >My personal opinion is that HZ should go back down to 100 in 7-CURRENT
> > >immediately, and only be incremented back up when/if it's proven to be
> > > the right thing to do. And, I say that as someone who (errantly) pushed
> > > for the increase to 1000 several years ago.
> >
> > I think it's probably a good idea to do it sooner rather than later.  It
> > may slightly negatively impact some services that rely on frequent timers
> > to do things like retransmit timing and the like.  But I haven't done any
> > measurements.
>
> As you know, but for the benefit of the list, restoring HZ=100 is
> often an important performance tweak on SMP systems with many CPUs
> because of all the sched_lock activity from statclock/hardclock, which
> scales with HZ and NCPUS.
>
> Kris

sched_lock is another big bottleneck: if you have 32 CPUs, in theory
you have 32X the context switch throughput, but today you still have
only 1X.  There is also code abusing sched_lock.  The M:N bits
dynamically insert a thread into the thread list at context switch
time; this is a bug, because it means the thread list in a proc has to
be protected by the scheduler lock.  Delivering a signal to a process
then has to hold the scheduler lock to find a thread, and if the proc
has many threads this introduces long scheduler latency; the fact that
the proc lock is not enough to find a thread is a bug.  There is other
code abusing the scheduler lock which really could use its own lock.
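
The alternative implied here, a thread list that only changes under the per-process lock, so that signal delivery can pick a target without the global scheduler lock, can be sketched in user space. A minimal Python sketch using threading.Lock; the Proc/KThread classes and the "first thread not blocking the signal" rule are hypothetical illustrations, not FreeBSD's actual data structures:

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KThread:
    tid: int
    sigmask: set = field(default_factory=set)  # signals this thread blocks


class Proc:
    """A process whose thread list is guarded by its own lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # per-process lock, not a global one
        self.threads: list = []

    def add_thread(self, td: KThread) -> None:
        # Threads join the list only under the proc lock, never
        # "on the fly" at context switch time.
        with self.lock:
            self.threads.append(td)

    def pick_signal_target(self, sig: int) -> Optional[KThread]:
        """Return the first thread not blocking `sig`, or None."""
        with self.lock:  # no scheduler-wide lock is taken here
            for td in self.threads:
                if sig not in td.sigmask:
                    return td
            return None
```

For example, with thread 1 blocking signal 2 and thread 2 blocking nothing, `pick_signal_target(2)` returns thread 2 while holding only that process's lock, so a process with many threads delays only its own lookups.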

David Xu

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 23:19:59 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9BA5216A41F;
	Mon, 12 Jun 2006 23:19:59 +0000 (UTC) (envelope-from arr@watson.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 08F3D43D45;
	Mon, 12 Jun 2006 23:19:54 +0000 (GMT) (envelope-from arr@watson.org)
Received: from fledge.watson.org (localhost.watson.org [127.0.0.1])
	by fledge.watson.org (8.13.4/8.13.4) with ESMTP id k5CNJqUX043240;
	Mon, 12 Jun 2006 19:19:52 -0400 (EDT) (envelope-from arr@watson.org)
Received: from localhost (arr@localhost)
	by fledge.watson.org (8.13.4/8.13.4/Submit) with ESMTP id
	k5CNJqIb043237; Mon, 12 Jun 2006 19:19:52 -0400 (EDT)
	(envelope-from arr@watson.org)
X-Authentication-Warning: fledge.watson.org: arr owned process doing -bs
Date: Mon, 12 Jun 2006 19:19:52 -0400 (EDT)
From: "Andrew R. Reiter" <arr@watson.org>
To: David Xu <davidxu@freebsd.org>
In-Reply-To: <200606130715.52425.davidxu@freebsd.org>
Message-ID: <20060612191828.A38957@fledge.watson.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: Robert Watson <rwatson@freebsd.org>, freebsd-performance@freebsd.org,
	danial_thom@yahoo.com, Scott Long <scottl@samsco.org>,
	Kris Kennaway <kris@obsecurity.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 23:19:59 -0000

On Tue, 13 Jun 2006, David Xu wrote:

:On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
:> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
:> > On Mon, 12 Jun 2006, Scott Long wrote:
:> > >I run a number of high-load production systems that do a lot of network
:> > >and filesystem activity, all with HZ set to 100.  It has also been shown
:> > >in the past that certain things in the network area were not fixed to
:> > >deal with a high HZ value, so it's possible that it's even more
:> > >stable/reliable with an HZ value of 100.
:> > >
:> > >My personal opinion is that HZ should go back down to 100 in 7-CURRENT
:> > >immediately, and only be incremented back up when/if it's proven to be
:> > > the right thing to do. And, I say that as someone who (errantly) pushed
:> > > for the increase to 1000 several years ago.
:> >
:> > I think it's probably a good idea to do it sooner rather than later.  It
:> > may slightly negatively impact some services that rely on frequent timers
:> > to do things like retransmit timing and the like.  But I haven't done any
:> > measurements.
:>
:> As you know, but for the benefit of the list, restoring HZ=100 is
:> often an important performance tweak on SMP systems with many CPUs
:> because of all the sched_lock activity from statclock/hardclock, which
:> scales with HZ and NCPUS.
:>
:> Kris
:
:sched_lock is another big bottleneck: if you have 32 CPUs, in theory
:you have 32X the context switch throughput, but today you still have
:only 1X.  There is also code abusing sched_lock.  The M:N bits
:dynamically insert a thread into the thread list at context switch
:time; this is a bug, because it means the thread list in a proc has to
:be protected by the scheduler lock.  Delivering a signal to a process
:then has to hold the scheduler lock to find a thread, and if the proc
:has many threads this introduces long scheduler latency; the fact that
:the proc lock is not enough to find a thread is a bug.  There is other
:code abusing the scheduler lock which really could use its own lock.
:
:David Xu

Given that various locking bottlenecks seem to occur on systems with 
different numbers of CPUs, has there been any research into providing 
"best fit" profiles for systems with N CPUs?

Cheers,
Andrew

--
arr@watson.org

From owner-freebsd-performance@FreeBSD.ORG  Mon Jun 12 23:21:09 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9CA2016A41B;
	Mon, 12 Jun 2006 23:21:09 +0000 (UTC) (envelope-from arr@watson.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 31B8543D49;
	Mon, 12 Jun 2006 23:21:09 +0000 (GMT) (envelope-from arr@watson.org)
Received: from fledge.watson.org (localhost.watson.org [127.0.0.1])
	by fledge.watson.org (8.13.4/8.13.4) with ESMTP id k5CNL8GH043290;
	Mon, 12 Jun 2006 19:21:08 -0400 (EDT) (envelope-from arr@watson.org)
Received: from localhost (arr@localhost)
	by fledge.watson.org (8.13.4/8.13.4/Submit) with ESMTP id
	k5CNL8Jh043287; Mon, 12 Jun 2006 19:21:08 -0400 (EDT)
	(envelope-from arr@watson.org)
X-Authentication-Warning: fledge.watson.org: arr owned process doing -bs
Date: Mon, 12 Jun 2006 19:21:08 -0400 (EDT)
From: "Andrew R. Reiter" <arr@watson.org>
To: David Xu <davidxu@freebsd.org>
In-Reply-To: <20060612191828.A38957@fledge.watson.org>
Message-ID: <20060612192015.G38957@fledge.watson.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
	<20060612191828.A38957@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: danial_thom@yahoo.com, freebsd-performance@freebsd.org,
	Robert Watson <rwatson@freebsd.org>, Scott Long <scottl@samsco.org>,
	Kris Kennaway <kris@obsecurity.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jun 2006 23:21:09 -0000


Sorry to reply to myself ...

On Mon, 12 Jun 2006, Andrew R. Reiter wrote:

:On Tue, 13 Jun 2006, David Xu wrote:
:
::On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
::> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
::> > On Mon, 12 Jun 2006, Scott Long wrote:
::> > >I run a number of high-load production systems that do a lot of network
::> > >and filesystem activity, all with HZ set to 100.  It has also been shown
::> > >in the past that certain things in the network area were not fixed to
::> > >deal with a high HZ value, so it's possible that it's even more
::> > >stable/reliable with an HZ value of 100.
::> > >
::> > >My personal opinion is that HZ should go back down to 100 in 7-CURRENT
::> > >immediately, and only be incremented back up when/if it's proven to be
::> > >the right thing to do. And, I say that as someone who (errantly) pushed
::> > >for the increase to 1000 several years ago.
::> >
::> > I think it's probably a good idea to do it sooner rather than later.  It
::> > may slightly negatively impact some services that rely on frequent timers
::> > to do things like retransmit timing and the like.  But I haven't done any
::> > measurements.
::>
::> As you know, but for the benefit of the list, restoring HZ=100 is
::> often an important performance tweak on SMP systems with many CPUs
::> because of all the sched_lock activity from statclock/hardclock, which
::> scales with HZ and NCPUS.
::>
::> Kris
::
::sched_lock is another big bottleneck: if you have 32 CPUs, in theory
::you have 32X context switch speed, but now you still have only 1X speed.
::And there is code abusing sched_lock: the M:N bits dynamically insert
::a thread into the thread list at context switch time. This is a bug,
::because it means the thread list in a proc has to be protected by the
::scheduler lock, and delivering a signal to a process has to hold the
::scheduler lock to find a thread. If the proc has many threads, this
::introduces long scheduler latency; a proc lock is not enough to find a
::thread, which is a bug. There is other code abusing the scheduler lock
::that really could use its own lock.
::
::David Xu
:
:It seems that different locking bottlenecks can occur on systems with
:different numbers of CPUs.  Has there been any research done on providing
:"best fit" profiles for systems with varying numbers of CPUs?

That is, profiles selected at compile time for a given system, making a
reasonable effort at a "best fit" for locking on that system.

:
:Cheers,
:Andrew
:
:--
:arr@watson.org
:_______________________________________________
:freebsd-performance@freebsd.org mailing list
:http://lists.freebsd.org/mailman/listinfo/freebsd-performance
:To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"
:
:

--
arr@watson.org

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 10:01:11 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6AC2A16A41B;
	Tue, 13 Jun 2006 10:01:11 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 03E8F43D46;
	Tue, 13 Jun 2006 10:01:10 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 993A046C43;
	Tue, 13 Jun 2006 06:01:10 -0400 (EDT)
Date: Tue, 13 Jun 2006 11:01:10 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: David Xu <davidxu@freebsd.org>
In-Reply-To: <200606130715.52425.davidxu@freebsd.org>
Message-ID: <20060613105930.N34121@fledge.watson.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, kmacy@FreeBSD.org, danial_thom@yahoo.com,
	Scott Long <scottl@samsco.org>, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 10:01:11 -0000


On Tue, 13 Jun 2006, David Xu wrote:

> On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
>> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
>>> On Mon, 12 Jun 2006, Scott Long wrote:
>>>> I run a number of high-load production systems that do a lot of network
>>>> and filesystem activity, all with HZ set to 100.  It has also been shown
>>>> in the past that certain things in the network area were not fixed to
>>>> deal with a high HZ value, so it's possible that it's even more
>>>> stable/reliable with an HZ value of 100.
>>>>
>>>> My personal opinion is that HZ should go back down to 100 in 7-CURRENT
>>>> immediately, and only be incremented back up when/if it's proven to be
>>>> the right thing to do. And, I say that as someone who (errantly) pushed
>>>> for the increase to 1000 several years ago.
>>>
>>> I think it's probably a good idea to do it sooner rather than later.  It 
>>> may slightly negatively impact some services that rely on frequent timers 
>>> to do things like retransmit timing and the like.  But I haven't done any 
>>> measurements.
>>
>> As you know, but for the benefit of the list, restoring HZ=100 is often an 
>> important performance tweak on SMP systems with many CPUs because of all 
>> the sched_lock activity from statclock/hardclock, which scales with HZ and 
>> NCPUS.
>
> sched_lock is another big bottleneck: if you have 32 CPUs, in theory you
> have 32X context switch speed, but now you still have only 1X speed. And
> there is code abusing sched_lock: the M:N bits dynamically insert a thread
> into the thread list at context switch time. This is a bug, because it means
> the thread list in a proc has to be protected by the scheduler lock, and
> delivering a signal to a process has to hold the scheduler lock to find a
> thread. If the proc has many threads, this introduces long scheduler
> latency; a proc lock is not enough to find a thread, which is a bug. There
> is other code abusing the scheduler lock that really could use its own lock.

I've added Kip Macy to the CC; he is working on a patch for Sun4v that
eliminates sched_lock.  Maybe he can comment further on this thread?
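
As a rough illustration of Kris's point that clock-driven sched_lock activity scales with HZ and NCPUS, here is a back-of-the-envelope sketch. It assumes, as a simplification rather than a measurement, one lock acquisition per clock tick per CPU; the function name clock_lock_rate is hypothetical.

```python
# Back-of-the-envelope model only. Assumption (not a measurement): each CPU
# takes sched_lock once per hardclock tick, so the rate is HZ * NCPUS.
# Real kernels differ; for example, statclock runs at its own stathz rate.


def clock_lock_rate(hz: int, ncpus: int) -> int:
    """Approximate clock-driven sched_lock acquisitions per second."""
    if hz <= 0 or ncpus <= 0:
        raise ValueError("hz and ncpus must be positive")
    return hz * ncpus


if __name__ == "__main__":
    for ncpus in (2, 8, 32):
        fast = clock_lock_rate(1000, ncpus)
        slow = clock_lock_rate(100, ncpus)
        print(f"NCPUS={ncpus:2d}: HZ=1000 -> {fast:6d}/s, HZ=100 -> {slow:5d}/s")
```

Under that assumption, dropping HZ from 1000 to 100 on a 32-CPU machine cuts the rate from about 32000 to about 3200 acquisitions per second. On FreeBSD, HZ can be set at boot with the kern.hz loader tunable (e.g. kern.hz="100" in /boot/loader.conf) or at build time with "options HZ=100" in the kernel configuration file.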

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 15:24:54 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id F0E7616A47F
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 15:24:54 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33306.mail.mud.yahoo.com (web33306.mail.mud.yahoo.com
	[68.142.206.121])
	by mx1.FreeBSD.org (Postfix) with SMTP id 603AB43D81
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 15:24:20 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 87826 invoked by uid 60001); 13 Jun 2006 15:24:10 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=tnbzVcKCD7YcmQwTOqF4Mn8m9RmHiSGMm9+kw8T9xKtLl2CqyGq9sNQ77OSOWqH31PcFR4yANPZexBYdgh3gPAal6t6lDf/h/42Fa3y9oIFchBcL9tAs9mXGWIPXr4S0IazPdGsPBW66QfbFmK90PX3a5RWhc6hN7dOEvTXeFEI=
	; 
Message-ID: <20060613152410.87824.qmail@web33306.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33306.mail.mud.yahoo.com via HTTP;
	Tue, 13 Jun 2006 08:24:10 PDT
Date: Tue, 13 Jun 2006 08:24:10 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Robert Watson <rwatson@FreeBSD.org>, David Xu <davidxu@freebsd.org>
In-Reply-To: <20060613105930.N34121@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 15:24:55 -0000

I'm sorry if I missed it, but I don't believe
anyone answered this question:

> Lastly, is there a utility similar to cpustat in
> DragonflyBSD which shows the per-cpu usage stats?

I need to gauge the efficiency of SMP for a
particular application, and also have some way of
measuring the effects of code changes.

Thanks,

DT


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 15:30:26 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 302A716A478;
	Tue, 13 Jun 2006 15:30:26 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9810A43D77;
	Tue, 13 Jun 2006 15:30:24 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 86B4746B0F;
	Tue, 13 Jun 2006 11:30:13 -0400 (EDT)
Date: Tue, 13 Jun 2006 16:30:13 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Danial Thom <danial_thom@yahoo.com>
In-Reply-To: <20060613152410.87824.qmail@web33306.mail.mud.yahoo.com>
Message-ID: <20060613162933.U88691@fledge.watson.org>
References: <20060613152410.87824.qmail@web33306.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 15:30:26 -0000


On Tue, 13 Jun 2006, Danial Thom wrote:

> I'm sorry if I missed it, but I don't believe anyone answered this question:
>
>> Lastly, is there a utility similar to cpustat in
>> DragonflyBSD which shows the per-cpu usage stats?
>
> I need to gauge the efficiency of SMP for a particular application, and also 
> have some way of measuring the effects of code changes.

I didn't answer it because I don't know what output cpustat provides. What 
output does cpustat provide on DragonflyBSD?

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 16:08:16 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 25C8916A41A
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 16:08:16 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33307.mail.mud.yahoo.com (web33307.mail.mud.yahoo.com
	[68.142.206.122])
	by mx1.FreeBSD.org (Postfix) with SMTP id 1695943D53
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 16:08:15 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 41748 invoked by uid 60001); 13 Jun 2006 16:08:14 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=28FpI0xpG4cnO8SAQ9qoQNfJbW7F0gd4Jq9l8bOYTQe7WYuxFCPOd0f15OQpkla4PLxNN6PKIiQyWf+8rFqsWD2tyZEitJYOInYpcJDKahwMQzgdcn6vjqye+7TThg5Reqv4Atj4XtRlPd78Q1OY2vT7ADxv+opJbw3MDL7p9hY=
	; 
Message-ID: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33307.mail.mud.yahoo.com via HTTP;
	Tue, 13 Jun 2006 09:08:14 PDT
Date: Tue, 13 Jun 2006 09:08:14 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20060613162933.U88691@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 16:08:16 -0000


--- Robert Watson <rwatson@FreeBSD.org> wrote:

> 
> On Tue, 13 Jun 2006, Danial Thom wrote:
> 
> > I'm sorry if I missed it, but I don't believe anyone answered this question:
> >
> >> Lastly, is there a utility similar to cpustat in
> >> DragonflyBSD which shows the per-cpu usage stats?
> >
> > I need to gauge the efficiency of SMP for a particular application, and also
> > have some way of measuring the effects of code changes.
> 
> I didn't answer it because I don't know what output cpustat provides. What
> output does cpustat provide on DragonflyBSD?

It's a simple output such as:

CPU-0 state:   14.00% user,   0.00% nice,   2.00% sys,   6.00% intr, 78.00% idle
CPU-1 state:   4.00% user,   0.00% nice,   17.00% sys,   2.00% intr, 77.00% idle

Of course, hp-ux type output for top would be
ideal:

Load averages: 0.27, 0.28, 0.28
203 processes: 186 sleeping, 17 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.05   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
 1    0.92   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
 2    0.03   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
 3    0.08   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.27   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%

What is the plan for FreeBSD, as I don't see that
top shows any distribution among cpus?

DT



From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 16:31:06 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7D45F16A473;
	Tue, 13 Jun 2006 16:31:06 +0000 (UTC) (envelope-from tbyte@otel.net)
Received: from mail.otel.net (gw3.OTEL.net [212.36.8.151])
	by mx1.FreeBSD.org (Postfix) with ESMTP id ECD1A43D48;
	Tue, 13 Jun 2006 16:31:05 +0000 (GMT) (envelope-from tbyte@otel.net)
Received: from dragon.otel.net ([212.36.8.135])
	by mail.otel.net with esmtp (Exim 4.62 (FreeBSD))
	(envelope-from <tbyte@otel.net>)
	id 1FqBnD-000DTf-BK; Tue, 13 Jun 2006 19:31:03 +0300
From: Iasen Kostov <tbyte@otel.net>
To: danial_thom@yahoo.com
In-Reply-To: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com>
References: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com>
Content-Type: text/plain
Date: Tue, 13 Jun 2006 19:31:02 +0300
Message-Id: <1150216262.81055.0.camel@DraGoN.OTEL.net>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.2 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: freebsd-performance@freebsd.org, Robert Watson <rwatson@FreeBSD.org>,
	David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 16:31:06 -0000

On Tue, 2006-06-13 at 09:08 -0700, Danial Thom wrote:
> 
> --- Robert Watson <rwatson@FreeBSD.org> wrote:
> 
> > 
> > On Tue, 13 Jun 2006, Danial Thom wrote:
> > 
> > > I'm sorry if I missed it, but I don't believe anyone answered this question:
> > >
> > >> Lastly, is there a utility similar to cpustat in
> > >> DragonflyBSD which shows the per-cpu usage stats?
> > >
> > > I need to gauge the efficiency of SMP for a particular application, and also
> > > have some way of measuring the effects of code changes.
> > 
> > I didn't answer it because I don't know what output cpustat provides. What
> > output does cpustat provide on DragonflyBSD?
> 
> It's a simple output such as:
> 
> CPU-0 state:   14.00% user,   0.00% nice,   2.00% sys,   6.00% intr, 78.00% idle
> CPU-1 state:   4.00% user,   0.00% nice,   17.00% sys,   2.00% intr, 77.00% idle
> 
> Of course, hp-ux type output for top would be
> ideal:
> 
> Load averages: 0.27, 0.28, 0.28
> 203 processes: 186 sleeping, 17 running
> Cpu states:
> CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
>  0    0.05   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>  1    0.92   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>  2    0.03   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>  3    0.08   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> ---   ----  -----  -----  -----  -----  -----  -----  -----  -----
> avg   0.27   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> 
> What is the plan for FreeBSD, as I don't see that
> top shows any distribution among cpus?
> 
You probably missed the -S option:

last pid: 37969;  load averages:  1.85,  1.92,  2.20    up 1+02:28:38  19:29:53
336 processes: 9 running, 311 sleeping, 1 zombie, 15 waiting
CPU states: 25.0% user,  1.5% nice, 20.6% system,  1.5% interrupt, 51.5% idle
Mem: 1945M Active, 2793M Inact, 1008M Wired, 307M Cache, 214M Buf, 1690M Free
Swap: 4096M Total, 408K Used, 4095M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   14 root        1 171   52     0K    16K RUN    0  19.2H 62.99% idle: cpu0
   13 root        1 171   52     0K    16K RUN    1 810:43 61.77% idle: cpu1
   11 root        1 171   52     0K    16K RUN    3  17.6H 61.52% idle: cpu3
   12 root        1 171   52     0K    16K RUN    2 931:34 60.99% idle: cpu2


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 16:57:53 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 21F2616A476;
	Tue, 13 Jun 2006 16:57:53 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AB71143D58;
	Tue, 13 Jun 2006 16:57:52 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 4C4AF46BBA;
	Tue, 13 Jun 2006 12:57:52 -0400 (EDT)
Date: Tue, 13 Jun 2006 17:57:52 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Danial Thom <danial_thom@yahoo.com>
In-Reply-To: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com>
Message-ID: <20060613175531.S26068@fledge.watson.org>
References: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 16:57:53 -0000

On Tue, 13 Jun 2006, Danial Thom wrote:

>> I didn't answer it because I don't know what output cpustat provides. What 
>> output does cpustat provide on DragonflyBSD?
>
> It's a simple output such as:
>
> CPU-0 state:   14.00% user,   0.00% nice,   2.00% sys,   6.00% intr, 78.00% idle
> CPU-1 state:   4.00% user,   0.00% nice,   17.00% sys,   2.00% intr, 77.00% idle
>
> Of course, hp-ux type output for top would be
> ideal:
>
> Load averages: 0.27, 0.28, 0.28
> 203 processes: 186 sleeping, 17 running
> Cpu states:
> CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
>  0    0.05   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>  1    0.92   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>  2    0.03   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>  3    0.08   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> ---   ----  -----  -----  -----  -----  -----  -----  -----  -----
> avg   0.27   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
>
> What is the plan for FreeBSD, as I don't see that top shows any distribution 
> among cpus?

top displays some CPU information, especially with -S which shows you the 
level of activity for the idle thread on each CPU.  The above looks useful, 
and should be fairly easy to add.  I've been thinking about adding a few new 
pages to systat output:

- Kernel memory allocator stats, based on memstat/memtop (and similar to what
   vmstat -z and vmstat -m show).
- CPU statistics such as the above.

I think there are some patches floating around already that gather per-cpu 
cp_time measurements, but Kris has commented to me that they reduce 
performance somewhat, so I'll have to investigate some.  That may be a caching 
effect of some sort.
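
A minimal sketch of the per-CPU display discussed here, assuming the kernel exposes per-CPU tick counters as one flat list holding five counters (user, nice, sys, intr, idle) per CPU. The layout and the function names per_cpu_percentages and format_cpustat are hypothetical, and fetching the counters from the kernel is left out.

```python
# Sketch: turn two snapshots of (assumed) per-CPU tick counters into
# cpustat-style lines. Layout assumption: a flat list of CPUSTATES counters
# per CPU, ordered user, nice, sys, intr, idle.

STATES = ("user", "nice", "sys", "intr", "idle")
CPUSTATES = len(STATES)


def per_cpu_percentages(prev, curr):
    """Return one list of per-state percentages for each CPU."""
    if len(prev) != len(curr) or len(curr) % CPUSTATES:
        raise ValueError("snapshots must be equal-length multiples of CPUSTATES")
    result = []
    for base in range(0, len(curr), CPUSTATES):
        deltas = [curr[base + i] - prev[base + i] for i in range(CPUSTATES)]
        total = sum(deltas)
        if total == 0:
            # No ticks were recorded for this CPU during the window.
            result.append([0.0] * CPUSTATES)
        else:
            result.append([100.0 * d / total for d in deltas])
    return result


def format_cpustat(pcts):
    """Render one 'CPU-n state: ...' line per CPU."""
    lines = []
    for cpu, row in enumerate(pcts):
        fields = ", ".join(f"{p:6.2f}% {name}" for p, name in zip(row, STATES))
        lines.append(f"CPU-{cpu} state: {fields}")
    return lines


if __name__ == "__main__":
    # Made-up deltas chosen to reproduce the percentages quoted in this thread.
    before = [0] * 10
    after = [14, 0, 2, 6, 78, 4, 0, 17, 2, 77]
    for line in format_cpustat(per_cpu_percentages(before, after)):
        print(line)
```

The column spacing differs slightly from DragonflyBSD's cpustat; only the per-CPU breakdown is the point.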

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 18:23:42 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 723B416A41B
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 18:23:42 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33304.mail.mud.yahoo.com (web33304.mail.mud.yahoo.com
	[68.142.206.119])
	by mx1.FreeBSD.org (Postfix) with SMTP id 13E3743D46
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 18:23:42 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 76246 invoked by uid 60001); 13 Jun 2006 18:23:28 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=b0o1m8kfJsLzMLLyQFqu5hTpY1SnEYfMrp/2nOTCSwpf+zGNV5DEIvZvfpUMJ6M/kvJ0oAQ72GmGxUDQw6UvkiK8t2H4EGRTag9hVZVb5KNIAIxjHv5MHTV0CALsVXUzc8ek8VZsK4GGTxUepjxVftBvrGIkW4YOyfUkpzELpSs=
	; 
Message-ID: <20060613182328.76244.qmail@web33304.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33304.mail.mud.yahoo.com via HTTP;
	Tue, 13 Jun 2006 11:23:28 PDT
Date: Tue, 13 Jun 2006 11:23:28 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20060613175531.S26068@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 18:23:42 -0000


--- Robert Watson <rwatson@FreeBSD.org> wrote:

> On Tue, 13 Jun 2006, Danial Thom wrote:
> 
> >> I didn't answer it because I don't know what output cpustat provides. What
> >> output does cpustat provide on DragonflyBSD?
> >
> > It's a simple output such as:
> >
> > CPU-0 state:   14.00% user,   0.00% nice,   2.00% sys,   6.00% intr, 78.00% idle
> > CPU-1 state:   4.00% user,   0.00% nice,   17.00% sys,   2.00% intr, 77.00% idle
> >
> > Of course, hp-ux type output for top would be
> > ideal:
> >
> > Load averages: 0.27, 0.28, 0.28
> > 203 processes: 186 sleeping, 17 running
> > Cpu states:
> > CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
> >  0    0.05   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> >  1    0.92   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> >  2    0.03   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> >  3    0.08   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> > ---   ----  -----  -----  -----  -----  -----  -----  -----  -----
> > avg   0.27   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
> >
> > What is the plan for FreeBSD, as I don't see that top shows any distribution
> > among cpus?
> 
> top displays some CPU information, especially with -S which shows you the
> level of activity for the idle thread on each CPU.  The above looks useful,
> and should be fairly easy to add.  I've been thinking about adding a few new
> pages to systat output:
> 
> - Kernel memory allocator stats, based on memstat/memtop (and similar to what
>    vmstat -z and vmstat -m show).
> - CPU statistics such as the above.
> 
> I think there are some patches floating around already that gather per-cpu
> cp_time measurements, but Kris has commented to me that they reduce
> performance somewhat, so I'll have to investigate some.  That may be a caching
> effect of some sort.

Maybe someone can explain this output. The top
line shows 99.6% idle. Is it just showing CPU 0's
stats on the top line?

last pid:   705;  load averages:  0.06,  0.02,  0.00          up 0+00:29:36  14:22:42
69 processes:  3 running, 48 sleeping, 18 waiting
CPU states:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
Mem: 8160K Active, 8108K Inact, 17M Wired, 9712K Buf, 461M Free
Swap: 512M Total, 512M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root        1 171   52     0K     8K RUN    1  28:58 98.97% idle: cpu1
   12 root        1 171   52     0K     8K CPU0   0  27:34 77.64% idle: cpu0
   23 root        1 -68 -187     0K     8K WAIT   0   1:07 17.14% irq21: em1


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 18:36:24 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2EDF616A41A;
	Tue, 13 Jun 2006 18:36:24 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AE1A643D46;
	Tue, 13 Jun 2006 18:36:23 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id C748846BD3;
	Tue, 13 Jun 2006 14:36:21 -0400 (EDT)
Date: Tue, 13 Jun 2006 19:36:21 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Danial Thom <danial_thom@yahoo.com>
In-Reply-To: <20060613182328.76244.qmail@web33304.mail.mud.yahoo.com>
Message-ID: <20060613193040.O26068@fledge.watson.org>
References: <20060613182328.76244.qmail@web33304.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 18:36:24 -0000


On Tue, 13 Jun 2006, Danial Thom wrote:

> Maybe someone can explain this output. The top line shows 99.6%idle. Is it 
> just showing CPU 0s stats on the top line?

Two types of measurements are taken: sampled ticks regarding whether the 
system as a whole is in {user, nice, system, intr, idle}, and then sampling 
for individual processes.  Right now, the system measurements are kept in a 
simple array of tick counters called cp_time.  John Baldwin and others have 
changes that make these tick counters per-CPU.  The lines at the top of 
top(1)'s output are derived from those tick counters.  Ticks are measured on 
each CPU, so those are a summary across all CPUs.  To add cpustat support, we 
need to merge John's patch to make cp_time per-CPU (i.e., different counters 
for different CPUs) and teach the userland tools to retrieve them.  When you 
run top you'll notice that it adjusts the measurements each refresh.  In 
effect, what it's doing is sampling the change in tick counts over the window, 
pulling down the new values and calculating the percentages of ticks in each 
"bucket" in the last window.
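
A minimal sketch of the delta calculation described above, assuming a five-bucket counter array in the order user, nice, system, interrupt, idle (the order of FreeBSD's CP_* indices into cp_time). The two snapshots are made-up numbers; a real tool would read them from the kernel (for example through the kern.cp_time sysctl), which is not shown here.

```python
# Bucket order assumed to match the CP_USER..CP_IDLE indices of cp_time.
BUCKETS = ("user", "nice", "system", "interrupt", "idle")


def cpu_percentages(old, new):
    """Return per-bucket percentages from two cumulative tick snapshots.

    Each snapshot is a sequence of five monotonically increasing tick
    counters, summed over all CPUs (as with a single global cp_time).
    """
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total == 0:
        # No ticks elapsed in the window; report every bucket as zero.
        return {name: 0.0 for name in BUCKETS}
    return {name: 100.0 * d / total for name, d in zip(BUCKETS, deltas)}


# Hypothetical snapshots taken one refresh interval apart.
before = [1000, 0, 300, 50, 20000]
after = [1000, 0, 301, 51, 20498]
pct = cpu_percentages(before, after)
for name in BUCKETS:
    print(f"{pct[name]:5.1f}% {name}")  # idle comes out at 99.6%
```

Because the counters in this sketch are summed across every CPU, one fully busy CPU on a two-CPU machine would show at most 50% non-idle in the summary line.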

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 18:43:40 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1A63216A47B
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 18:43:40 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33314.mail.mud.yahoo.com (web33314.mail.mud.yahoo.com
	[68.142.206.129])
	by mx1.FreeBSD.org (Postfix) with SMTP id 1D05B43D48
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 18:43:37 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 60231 invoked by uid 60001); 13 Jun 2006 18:43:36 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=DyK7tmlOn0MN29p52zFolf4agqne+9Xi5q4ilu2ic25Yg1fFrVAj6uYFNaJ2op1xqiA9w2JUiFZ/RyXGHCEmXqte39QyyzU+Qq0pWfSd/OTPjQ1vCYKZa41g5uWno3Z3rEKQC9XxbeYhN+c/Y5X6EVVYGpMBq61GrBIav7fZXVU=
	; 
Message-ID: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33314.mail.mud.yahoo.com via HTTP;
	Tue, 13 Jun 2006 11:43:36 PDT
Date: Tue, 13 Jun 2006 11:43:36 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20060613193040.O26068@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 18:43:40 -0000


--- Robert Watson <rwatson@FreeBSD.org> wrote:

> 
> On Tue, 13 Jun 2006, Danial Thom wrote:
> 
> > Maybe someone can explain this output. The top line shows 99.6% idle. Is it
> > just showing CPU 0's stats on the top line?
> 
> Two types of measurements are taken: sampled ticks regarding whether the
> system as a whole is in {user, nice, system, intr, idle}, and then sampling
> for individual processes.  Right now, the system measurements are kept in a
> simple array of tick counters called cp_time.  John Baldwin and others have
> changes that make these tick counters per-CPU.  The lines at the top of
> top(1)'s output are derived from those tick counters.  Ticks are measured on
> each CPU, so those are a summary across all CPUs.  To add cpustat support, we
> need to merge John's patch to make cp_time per-CPU (i.e., different counters
> for different CPUs) and teach the userland tools to retrieve them.  When you
> run top you'll notice that it adjusts the measurements each refresh.  In
> effect, what it's doing is sampling the change in tick counts over the window,
> pulling down the new values and calculating the percentages of ticks in each
> "bucket" in the last window.

That doesn't explain why the Top line shows 99.6%
idle, but the cpu idle threads are showing
significant usage. 

I'm getting a constant 6000 Interrupts / Second
on my em controller, yet top jumps all over the
place; sitting at 99% idle for 10 seconds, then
jumping to 50%, then somewhere in between. It
seems completely unreliable. The load I'm
applying is constant.

DT


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 19:01:42 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4B70816A474;
	Tue, 13 Jun 2006 19:01:42 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CF86543D45;
	Tue, 13 Jun 2006 19:01:41 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 638AC46C0F;
	Tue, 13 Jun 2006 15:01:40 -0400 (EDT)
Date: Tue, 13 Jun 2006 20:01:40 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Danial Thom <danial_thom@yahoo.com>
In-Reply-To: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com>
Message-ID: <20060613195113.T26068@fledge.watson.org>
References: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 19:01:42 -0000


On Tue, 13 Jun 2006, Danial Thom wrote:

>> Two types of measurements are taken: sampled ticks regarding whether the 
>> system as a whole is in {user, nice, system, intr, idle}, and then sampling 
>> for individual processes.  Right now, the system measurements are kept in a 
>> simple array of tick counters called cp_time. John Baldwin and others have 
>> changes that make these tick counters per-CPU. The lines at the top of 
>> top(1)'s output are derived from those tick counters.  Ticks are measured 
>> on each CPU, so those are a summary across all CPUs.  To add cpustat 
>> support, we need to merge John's patch to make cp_time per-CPU (i.e., 
>> different counters for different CPUs) and teach the userland tools to 
>> retrieve them.  When you run top you'll notice that it adjusts the 
>> measurements each refresh.  In effect, what it's doing is sampling the 
>> change in tick counts over the window, pulling down the new values and 
>> calculating the percentages of ticks in each "bucket" in the last window.
>
> That doesn't explain why the Top line shows 99.6% idle, but the cpu idle 
> threads are showing significant usage.
>
> I'm getting a constant 6000 Interrupts / Second on my em controller, yet top 
> jumps all over the place; sitting at 99% idle for 10 seconds, then jumping 
> to 50%, then somewhere in between. It seems completely unreliable. The load 
> I'm applying is constant.

I can't speak to the details of the thread/process use sampling model.  Top 
uses something called the "weighted cpu percentage" by default; you can switch 
to "unweighted" using the -C argument.  The top documentation fails to 
document the semantics of the percentages, but I suspect -C will give you more 
of what you expect.  The weighted CPU measurement takes into account process 
history, so it takes a while for a sudden spike in CPU use to be fully 
reflected, and you may see seemingly counter-intuitive results, such as the 
appearance of greater than 100% CPU use.  Try out -C and see if you see 
something that makes more sense?
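
To illustrate why a history-weighted figure lags behind a sudden change, here is a sketch of an exponentially decaying average. This is not top's actual WCPU formula; the decay factor of 0.95 per one-second sample is an assumption, picked to be close to the exp(-1/20) constant the 4BSD scheduler uses for its per-process CPU estimate.

```python
def decayed_average(samples, decay=0.95):
    """Exponentially weighted average of per-interval CPU usage samples.

    decay is a hypothetical weight kept for the old value on each step;
    (1 - decay) is the weight given to the newest sample.
    """
    avg = 0.0
    history = []
    for s in samples:
        avg = decay * avg + (1.0 - decay) * s
        history.append(avg)
    return history


# A thread that jumps from 0% to a steady 50% CPU at t = 0.
raw = [50.0] * 60
weighted = decayed_average(raw)
for t in (0, 9, 19, 59):
    print(f"t={t + 1:2d}s raw={raw[t]:4.1f}% weighted={weighted[t]:4.1f}%")
```

Under these assumptions the weighted value is still only about 20% after ten seconds of a constant 50% load, which is the kind of lag an unweighted (-C) display avoids.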

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 19:04:24 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1503016A41B;
	Tue, 13 Jun 2006 19:04:24 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 677EA43D49;
	Tue, 13 Jun 2006 19:04:20 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [10.10.3.185] ([69.15.205.254]) (authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k5DJ4A1S068353;
	Tue, 13 Jun 2006 13:04:16 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Message-ID: <448F0C20.3090800@samsco.org>
Date: Tue, 13 Jun 2006 13:04:00 -0600
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: danial_thom@yahoo.com
References: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com>
In-Reply-To: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-performance@freebsd.org, Robert Watson <rwatson@freebsd.org>,
	David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 19:04:24 -0000

Danial Thom wrote:
> 
> --- Robert Watson <rwatson@FreeBSD.org> wrote:
> 
>>On Tue, 13 Jun 2006, Danial Thom wrote:
>>
>>>Maybe someone can explain this output. The top line shows 99.6% idle. Is it
>>>just showing CPU 0's stats on the top line?
>>
>>Two types of measurements are taken: sampled ticks regarding whether the
>>system as a whole is in {user, nice, system, intr, idle}, and then sampling
>>for individual processes.  Right now, the system measurements are kept in a
>>simple array of tick counters called cp_time.  John Baldwin and others have
>>changes that make these tick counters per-CPU.  The lines at the top of
>>top(1)'s output are derived from those tick counters.  Ticks are measured on
>>each CPU, so those are a summary across all CPUs.  To add cpustat support, we
>>need to merge John's patch to make cp_time per-CPU (i.e., different counters
>>for different CPUs) and teach the userland tools to retrieve them.  When you
>>run top you'll notice that it adjusts the measurements each refresh.  In
>>effect, what it's doing is sampling the change in tick counts over the window,
>>pulling down the new values and calculating the percentages of ticks in each
>>"bucket" in the last window.
> 
> That doesn't explain why the Top line shows 99.6% idle, but the cpu idle
> threads are showing significant usage.
> 
> I'm getting a constant 6000 Interrupts / Second on my em controller, yet top
> jumps all over the place; sitting at 99% idle for 10 seconds, then jumping to
> 50%, then somewhere in between. It seems completely unreliable. The load I'm
> applying is constant.
> 
> DT

Be aware that there was a significant change made to if_em
in 7-CURRENT in Jan 2006 to improve load performance.  It'll
probably get backported for 6.2, but you might consider
looking at it before you make up your mind on 6.1 performance.

Scott


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 19:48:48 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5F92116A479
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 19:48:48 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33313.mail.mud.yahoo.com (web33313.mail.mud.yahoo.com
	[68.142.206.128])
	by mx1.FreeBSD.org (Postfix) with SMTP id 2E7B643D46
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 19:48:47 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 32771 invoked by uid 60001); 13 Jun 2006 19:48:46 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=TYFfr2lgZcAFxUWRyfXro/KwbhB7GRW2x7qJkiW+cR7/jFUgmtzxw3A2aLEVUG+EHozo/zo9YdC8CbRakS8vqTrPZsGCznQp4w/CDks+iyL6H7a9NY9E4aLYYAZ0dtHOCYc2qEX9vWto/gkH0z+PQEsLC6syhYMcP9MIKWRWDZ8=
	; 
Message-ID: <20060613194846.32769.qmail@web33313.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33313.mail.mud.yahoo.com via HTTP;
	Tue, 13 Jun 2006 12:48:46 PDT
Date: Tue, 13 Jun 2006 12:48:46 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20060613195113.T26068@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 19:48:48 -0000


--- Robert Watson <rwatson@FreeBSD.org> wrote:

> 
> On Tue, 13 Jun 2006, Danial Thom wrote:
> 
> >> Two types of measurements are taken: sampled ticks regarding whether the
> >> system as a whole is in {user, nice, system, intr, idle}, and then sampling
> >> for individual processes.  Right now, the system measurements are kept in a
> >> simple array of tick counters called cp_time. John Baldwin and others have
> >> changes that make these tick counters per-CPU. The lines at the top of
> >> top(1)'s output are derived from those tick counters.  Ticks are measured
> >> on each CPU, so those are a summary across all CPUs.  To add cpustat
> >> support, we need to merge John's patch to make cp_time per-CPU (i.e.,
> >> different counters for different CPUs) and teach the userland tools to
> >> retrieve them.  When you run top you'll notice that it adjusts the
> >> measurements each refresh.  In effect, what it's doing is sampling the
> >> change in tick counts over the window, pulling down the new values and
> >> calculating the percentages of ticks in each "bucket" in the last window.
> >
> > That doesn't explain why the Top line shows 99.6% idle, but the cpu idle
> > threads are showing significant usage.
> >
> > I'm getting a constant 6000 Interrupts / Second on my em controller, yet top
> > jumps all over the place; sitting at 99% idle for 10 seconds, then jumping
> > to 50%, then somewhere in between. It seems completely unreliable. The load
> > I'm applying is constant.
> 
> I can't speak to the details of the thread/process use sampling model.  Top
> uses something called the "weighted cpu percentage" by default; you can switch
> to "unweighted" using the -C argument.  The top documentation fails to
> document the semantics of the percentages, but I suspect -C will give you more
> of what you expect.  The weighted CPU measurement takes into account process
> history, so it takes a while for a sudden spike in CPU use to be fully
> reflected, and you may see seemingly counter-intuitive results, such as the
> appearance of greater than 100% CPU use.  Try out -C and see if you see
> something that makes more sense?
> 

It seems to work just fine with 1 CPU. It's
equally useless with the -C option in SMP mode.

Here's a snip from 'systat -vmstat 1'

Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof Flt        cow   10009 total
             24     18353    1  129 156k    1      17108 wire        6: fdc0
                                                    7908 act         14: ata
 0.4%Sys   0.4%Intr  0.0%User  0.0%Nice 99.2%Idl    7236 inact       20: em0
|    |    |    |    |    |    |    |    |    |           cache  6000 21: em1
                                                  473456 free      5 24: bge

6000 interrupts per second and .4% interrupt
usage. Clearly the tools don't work at all in SMP
mode. I don't see how you can do development
without measurement tools that work. 
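
A quick sanity check on the figures in this thread: if a usage percentage and an interrupt rate are both right, together they imply an average CPU cost per interrupt. The sketch assumes two CPUs (matching the cpu0/cpu1 idle threads in the earlier top output) and treats the summary percentage as a share of both CPUs combined. The 0.4% comes from this systat snapshot and the 17.14% WCPU for irq21: em1 from the earlier top snapshot, so the two were not taken at the same moment, and WCPU is a weighted figure; the arithmetic only sizes the disagreement, it does not say which number is right.

```python
def implied_cost_us(percent, cpus, rate_per_sec):
    """Average CPU time per event, in microseconds, implied by a usage figure.

    percent is a share of the combined capacity of `cpus` processors,
    so the CPU time consumed per second is percent/100 * cpus seconds.
    """
    busy_seconds_per_second = percent / 100.0 * cpus
    return busy_seconds_per_second / rate_per_sec * 1e6


rate = 6000  # interrupts/second reported for em1
# Summary line: 0.4% interrupt time, assumed to be summed over both CPUs.
from_summary = implied_cost_us(0.4, cpus=2, rate_per_sec=rate)
# Per-thread line: irq21: em1 at 17.14% of a single CPU.
from_thread = implied_cost_us(17.14, cpus=1, rate_per_sec=rate)
print(f"summary line implies {from_summary:.2f} us per interrupt")
print(f"irq21 thread implies {from_thread:.2f} us per interrupt")
```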

DT


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 19:57:40 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3CA4816A477
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 19:57:40 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33310.mail.mud.yahoo.com (web33310.mail.mud.yahoo.com
	[68.142.206.125])
	by mx1.FreeBSD.org (Postfix) with SMTP id 246CB43D5A
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 19:57:39 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 64421 invoked by uid 60001); 13 Jun 2006 19:57:38 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=CnDIE4AXYbMBBGrRd1L5jDg/cDGBNb6ecAPY+h27cCc/fi/JUPm9t2gKo9HvA37+GzUUkS/QmGmERFaD+7kKZF23N3jmSZWadlXQnmpZ4KXzimVeHJl1suVcLCEpebQaKP1/gQ2d1dRpNOZKSBvNXGaPkrE5jTrFCStxmKrcszk=
	; 
Message-ID: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33310.mail.mud.yahoo.com via HTTP;
	Tue, 13 Jun 2006 12:57:38 PDT
Date: Tue, 13 Jun 2006 12:57:38 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Scott Long <scottl@samsco.org>
In-Reply-To: <448F0C20.3090800@samsco.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, Robert Watson <rwatson@freebsd.org>,
	David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 19:57:40 -0000


--- Scott Long <scottl@samsco.org> wrote:

> Danial Thom wrote:
> > 
> > --- Robert Watson <rwatson@FreeBSD.org> wrote:
> > 
> >>On Tue, 13 Jun 2006, Danial Thom wrote:
> >>
> >>>Maybe someone can explain this output. The top line shows 99.6% idle. Is it
> >>>just showing CPU 0's stats on the top line?
> >>
> >>Two types of measurements are taken: sampled ticks regarding whether the
> >>system as a whole is in {user, nice, system, intr, idle}, and then sampling
> >>for individual processes.  Right now, the system measurements are kept in a
> >>simple array of tick counters called cp_time.  John Baldwin and others have
> >>changes that make these tick counters per-CPU.  The lines at the top of
> >>top(1)'s output are derived from those tick counters.  Ticks are measured on
> >>each CPU, so those are a summary across all CPUs.  To add cpustat support, we
> >>need to merge John's patch to make cp_time per-CPU (i.e., different counters
> >>for different CPUs) and teach the userland tools to retrieve them.  When you
> >>run top you'll notice that it adjusts the measurements each refresh.  In
> >>effect, what it's doing is sampling the change in tick counts over the window,
> >>pulling down the new values and calculating the percentages of ticks in each
> >>"bucket" in the last window.
> > 
> > That doesn't explain why the Top line shows 99.6% idle, but the cpu idle
> > threads are showing significant usage.
> > 
> > I'm getting a constant 6000 Interrupts / Second on my em controller, yet top
> > jumps all over the place; sitting at 99% idle for 10 seconds, then jumping to
> > 50%, then somewhere in between. It seems completely unreliable. The load I'm
> > applying is constant.
> > 
> > DT
> 
> Be aware that there was a significant change made to if_em
> in 7-CURRENT in Jan 2006 to improve load performance.  It'll
> probably get backported for 6.2, but you might consider
> looking at it before you make up your mind on 6.1 performance.

I can bridge 1 million pps with the em driver in
4.9, and it looks pretty much intact in 6.1, so
I'm not too worried about the em driver being the
problem here. Plus the measurements look just
fine with 1 cpu, and they are completely
impossible in SMP mode. So it's reasonable to
conclude that the measurement tools simply don't
work.

Since everyone agrees that the load measuring
tools aren't all that accurate, what criteria were
used to determine that the changes made in 7 have
the effect that you think they have had?

DT


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 20:02:00 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4236516A41A;
	Tue, 13 Jun 2006 20:02:00 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9637A43D46;
	Tue, 13 Jun 2006 20:01:59 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [10.10.3.185] ([69.15.205.254]) (authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k5DK1kTT068725;
	Tue, 13 Jun 2006 14:01:53 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Message-ID: <448F19A4.8040901@samsco.org>
Date: Tue, 13 Jun 2006 14:01:40 -0600
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: danial_thom@yahoo.com
References: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
In-Reply-To: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-performance@freebsd.org, Robert Watson <rwatson@freebsd.org>,
	David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 20:02:00 -0000

Danial Thom wrote:
> 
> --- Scott Long <scottl@samsco.org> wrote:
> 
>>Danial Thom wrote:
>>
>>>--- Robert Watson <rwatson@FreeBSD.org> wrote:
>>>
>>>>On Tue, 13 Jun 2006, Danial Thom wrote:
>>>>
>>>>>Maybe someone can explain this output. The top line shows 99.6% idle. Is it
>>>>>just showing CPU 0's stats on the top line?
>>>>
>>>>Two types of measurements are taken: sampled ticks regarding whether the
>>>>system as a whole is in {user, nice, system, intr, idle}, and then sampling
>>>>for individual processes.  Right now, the system measurements are kept in a
>>>>simple array of tick counters called cp_time.  John Baldwin and others have
>>>>changes that make these tick counters per-CPU.  The lines at the top of
>>>>top(1)'s output are derived from those tick counters.  Ticks are measured on
>>>>each CPU, so those are a summary across all CPUs.  To add cpustat support, we
>>>>need to merge John's patch to make cp_time per-CPU (i.e., different counters
>>>>for different CPUs) and teach the userland tools to retrieve them.  When you
>>>>run top you'll notice that it adjusts the measurements each refresh.  In
>>>>effect, what it's doing is sampling the change in tick counts over the window,
>>>>pulling down the new values and calculating the percentages of ticks in each
>>>>"bucket" in the last window.
>>>
>>>That doesn't explain why the Top line shows 99.6% idle, but the cpu idle
>>>threads are showing significant usage.
>>>
>>>I'm getting a constant 6000 Interrupts / Second on my em controller, yet top
>>>jumps all over the place; sitting at 99% idle for 10 seconds, then jumping to
>>>50%, then somewhere in between. It seems completely unreliable. The load I'm
>>>applying is constant.
>>>
>>>DT
>>
>>Be aware that there was a significant change made to if_em
>>in 7-CURRENT in Jan 2006 to improve load performance.  It'll
>>probably get backported for 6.2, but you might consider
>>looking at it before you make up your mind on 6.1 performance.
> 
> I can bridge 1 million pps with the em driver in
> 4.9, and it looks pretty much intact in 6.1, so
> I'm not too worried about the em driver being the
> problem here. Plus the measurements look just
> fine with 1 cpu, and they are completely
> impossible in SMP mode. So it's reasonable to
> conclude that the measurement tools simply don't
> work.
> 
> Since everyone agrees that the load measuring
> tools aren't all that accurate, what criteria were
> used to determine that the changes made in 7 have
> the effect that you think they have had?
> 
> DT
> 

It was tested with a Smartbits packet generator.  The
tx rate on the generator was increased in steps until the host
started dropping packets or became otherwise unresponsive.
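
The stepped-load method described above can be sketched as a loop over offered rates. The traffic generator and the host under test are stood in for by a hypothetical `forwarded_pps` model with a made-up 800,000 pps capacity; in a real run each step would be a timed generator trial, and the loss check would compare transmitted and received packet counts.

```python
def forwarded_pps(offered_pps, capacity_pps=800_000):
    """Hypothetical device under test: forwards up to capacity, drops the rest."""
    return min(offered_pps, capacity_pps)


def max_lossless_rate(step_pps=50_000, limit_pps=1_500_000):
    """Raise the offered rate in fixed steps until packets are dropped.

    Returns the highest offered rate that was forwarded without loss,
    or 0 if even the first step lost packets.
    """
    best = 0
    rate = step_pps
    while rate <= limit_pps:
        if forwarded_pps(rate) < rate:
            break  # first step with loss: stop stepping up
        best = rate
        rate += step_pps
    return best


print(max_lossless_rate())
```

The step size bounds the precision of the result; a finer step, or a binary search between the last lossless rate and the first lossy one, narrows it.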

Scott


From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 19:34:54 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 51A6316A41A
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 19:34:54 +0000 (UTC)
	(envelope-from kip.macy@gmail.com)
Received: from nz-out-0102.google.com (nz-out-0102.google.com [64.233.162.207])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 067C543D70
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 19:34:45 +0000 (GMT)
	(envelope-from kip.macy@gmail.com)
Received: by nz-out-0102.google.com with SMTP id 13so1588274nzn
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 12:34:45 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=GSwUzKjwGtXzh1ZjzGJPfvhhku7JwAHM/r/siISETSBROle55LtPml718AKMKjvU2mLekZ392gR8ajhTvsc3wq9QtuJRCVIdFdFX/IffvNK0vPm4X2PZeDUqdfP0mcSQ66drQPH4gYOyjQYx4yoeKYSoIj6GTHrFHtRdVfyIsFw=
Received: by 10.65.59.4 with SMTP id m4mr3359480qbk;
	Tue, 13 Jun 2006 12:34:44 -0700 (PDT)
Received: by 10.65.231.11 with HTTP; Tue, 13 Jun 2006 12:34:44 -0700 (PDT)
Message-ID: <b1fa29170606131234h35631b27md45969b83081d6c5@mail.gmail.com>
Date: Tue, 13 Jun 2006 12:34:44 -0700
From: "Kip Macy" <kip.macy@gmail.com>
To: "Robert Watson" <rwatson@freebsd.org>
In-Reply-To: <20060613105930.N34121@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
	<20060613105930.N34121@fledge.watson.org>
X-Mailman-Approved-At: Tue, 13 Jun 2006 20:51:03 +0000
Cc: Scott Long <scottl@samsco.org>, kmacy@freebsd.org,
	David Xu <davidxu@freebsd.org>, Kris Kennaway <kris@obsecurity.org>,
	freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: kmacy@fsmware.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 19:34:54 -0000

I have a number of issues with our current locking regime and our
propensity for disabling interrupts. I have in mind some ideas for
reducing interrupt disabling and eliminating scheduling contention
except in the case of one cpu stealing a thread from another cpu's
runqueue. I'll try to dash that off early this evening. This should
also greatly reduce the overhead of timer interrupts.

                 -Kip

On 6/13/06, Robert Watson <rwatson@freebsd.org> wrote:
>
> On Tue, 13 Jun 2006, David Xu wrote:
>
> > On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
> >> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
> >>> On Mon, 12 Jun 2006, Scott Long wrote:
> >>>> I run a number of high-load production systems that do a lot of network
> >>>> and filesystem activity, all with HZ set to 100.  It has also been shown
> >>>> in the past that certain things in the network area were not fixed to
> >>>> deal with a high HZ value, so it's possible that it's even more
> >>>> stable/reliable with an HZ value of 100.
> >>>>
> >>>> My personal opinion is that HZ should go back down to 100 in 7-CURRENT
> >>>> immediately, and only be incremented back up when/if it's proven to be
> >>>> the right thing to do. And, I say that as someone who (errantly) pushed
> >>>> for the increase to 1000 several years ago.
> >>>
> >>> I think it's probably a good idea to do it sooner rather than later.  It
> >>> may slightly negatively impact some services that rely on frequent timers
> >>> to do things like retransmit timing and the like.  But I haven't done any
> >>> measurements.
> >>
> >> As you know, but for the benefit of the list, restoring HZ=100 is often an
> >> important performance tweak on SMP systems with many CPUs because of all
> >> the sched_lock activity from statclock/hardclock, which scales with HZ and
> >> NCPUS.
> >
> > sched_lock is another big bottleneck: if you have 32 CPUs, in theory you
> > have 32X context switch speed, but now it still has only 1X speed. There
> > is also code abusing sched_lock: the M:N bits dynamically insert a thread
> > into the thread list at context switch time. This is a bug; it means the
> > thread list in a proc has to be protected by the scheduler lock, and
> > delivering a signal to a process has to hold the scheduler lock to find a
> > thread. If the proc has many threads, this introduces long scheduler
> > latency. That a proc lock is not enough to find a thread is also a bug.
> > Other code abuses the scheduler lock where it could really use its own lock.
>
> I've added Kip Macy to the CC, who is working with a patch for Sun4v that
> eliminates sched_lock.  Maybe he can comment some more on this thread?
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>

From owner-freebsd-performance@FreeBSD.ORG  Tue Jun 13 21:00:30 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C7AD816A4A6;
	Tue, 13 Jun 2006 21:00:30 +0000 (UTC)
	(envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 47BFD43D64;
	Tue, 13 Jun 2006 21:00:24 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
	by elvis.mu.org (Postfix) with ESMTP id 30AEE1A4DD5;
	Tue, 13 Jun 2006 14:00:24 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id 5B26B51566; Tue, 13 Jun 2006 17:00:23 -0400 (EDT)
Date: Tue, 13 Jun 2006 17:00:23 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Danial Thom <danial_thom@yahoo.com>
Message-ID: <20060613210022.GB5267@xor.obsecurity.org>
References: <448F0C20.3090800@samsco.org>
	<20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="6TrnltStXW4iwmi0"
Content-Disposition: inline
In-Reply-To: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
User-Agent: Mutt/1.4.2.1i
Cc: Scott Long <scottl@samsco.org>, Robert Watson <rwatson@freebsd.org>,
	freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jun 2006 21:00:30 -0000


--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jun 13, 2006 at 12:57:38PM -0700, Danial Thom wrote:

> Since everyone agrees that the load measuring
> tools aren't all that accurate, what criteria were
> used to determine that the changes made in 7 have
> the effect that you think they have had?

Not by using top(1).  vmstat seems to do a better job of reporting CPU
usage, but still you want to measure what the system can actually do,
not how accurately it estimates its own performance.

Kris

--6TrnltStXW4iwmi0
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEjydmWry0BWjoQKURArK6AJ90ePQZLwsLX8OCVZtSEK5NVw9gYgCg+z1z
LeLrJqpW5EZxLdm9/UlV17A=
=Uq9m
-----END PGP SIGNATURE-----

--6TrnltStXW4iwmi0--

From owner-freebsd-performance@FreeBSD.ORG  Wed Jun 14 03:15:47 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2BADC16A474
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 03:15:47 +0000 (UTC)
	(envelope-from kip.macy@gmail.com)
Received: from nz-out-0102.google.com (nz-out-0102.google.com [64.233.162.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C4A8D43D5C
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 03:15:42 +0000 (GMT)
	(envelope-from kip.macy@gmail.com)
Received: by nz-out-0102.google.com with SMTP id 13so35914nzn
	for <freebsd-performance@freebsd.org>;
	Tue, 13 Jun 2006 20:15:42 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=cFaYB9oOQOaatvqI0nPbvKvAMvfzZGydwVCVbSL6HzuUk8YkP9ytabE8whpbXI74Xc2K1CDGX+/D8I9YPovZpu6yTpGuHZ1Cl2taIsbSBgseX92okkH8ntJTKpbqWXE3r1tLgXrii9ytBwfgP87XERavlq0m4B7Aw2C9Jtoc9Og=
Received: by 10.65.239.8 with SMTP id q8mr126486qbr;
	Tue, 13 Jun 2006 20:15:42 -0700 (PDT)
Received: by 10.65.231.11 with HTTP; Tue, 13 Jun 2006 20:15:42 -0700 (PDT)
Message-ID: <b1fa29170606132015p654e2877s1ec1da6184ce672e@mail.gmail.com>
Date: Tue, 13 Jun 2006 20:15:42 -0700
From: "Kip Macy" <kip.macy@gmail.com>
To: "Robert Watson" <rwatson@freebsd.org>
In-Reply-To: <20060613105930.N34121@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
	<20060613105930.N34121@fledge.watson.org>
Cc: Scott Long <scottl@samsco.org>, kmacy@freebsd.org, Paul Saab <ps@mu.org>,
	David Xu <davidxu@freebsd.org>, Kris Kennaway <kris@obsecurity.org>,
	freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: kmacy@fsmware.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Jun 2006 03:15:47 -0000

I apologize if this e-mail seems a bit disjointed; I'm quite tired from
hauling stuff around today.

I'm not entirely familiar with the system as a whole - but to give a
brief rundown of what I do know:
Context switches, thread prioritization, process statistics keeping,
and access to a handful of other random variables are all serialized
by sched_lock. Process creation, process exit, process scheduling
(schedcpu() access to the allproc_list) are all serialized through the
allproc_lock.

I've discovered that schedcpu()'s serialization needs don't fit in
well with sched_lock removal in the presence of a global process list
and global runqueue (I'll skip the tedious details for now). In other
words, I have missing prerequisites. My current plan for this week,
once I get back from Tahoe, is in a separate branch to do the
following:
 - replace the global process list with a per-cpu process list hung
off of pcpu, protected by a non-interrupt-disabling spinlock,
pcpu_proclist_lock
 - replace the global run queue with a per-cpu runqueue hung off of
pcpu, protected by a non-interrupt-blocking spinlock, pcpu_runq_lock

Once I have this stable I will integrate it into my branch where I
have replaced sched_lock with per-thread locks and re-do the current
locking I have in choosethread() which I believe causes performance
and stability problems.

At some point it may be desirable to add support for rebalancing the
pcpu process lists to avoid schedcpu/ps/top having to hold the
pcpu_proclist_lock for too long.

Why do I say "non-interrupt blocking"? Currently we have roughly a
half dozen locking primitives. The two that I am familiar with are
blocking and spinning mutexes. The general policy is to use blocking
locks except where a lock is used in interrupts or the scheduler. It
seems to me that in the scheduler, interrupts only actually need to be
blocked across cpu_switch. Spin locks obviously have to be used
because a thread cannot very well context switch while it's in the
middle of context switching - however, provided td_critnest > 0, there
is no reason that interrupts need to be blocked. Currently sched_lock
is acquired in cpu_hardclock and statclock - so it does need to block
interrupts. There is no reason that these two functions couldn't be
run in ast(). In my tree I set td_flags atomically to avoid the need
to acquire locks when setting or clearing flags. All the timer
interrupt really needs to do for the purposes of statistics etc. is set
a flag in td_flags indicating to ast() that the current thread is
returning from a timer interrupt so that cpu_hardclock and statclock
are called.

I have more in mind, but I'd like to keep the discussion simple by
focusing on the next week or two.

                  -Kip

On 6/13/06, Robert Watson <rwatson@freebsd.org> wrote:
>
> On Tue, 13 Jun 2006, David Xu wrote:
>
> > On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
> >> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
> >>> On Mon, 12 Jun 2006, Scott Long wrote:
> >>>> I run a number of high-load production systems that do a lot of network
> >>>> and filesystem activity, all with HZ set to 100.  It has also been shown
> >>>> in the past that certain things in the network area were not fixed to
> >>>> deal with a high HZ value, so it's possible that it's even more
> >>>> stable/reliable with an HZ value of 100.
> >>>>
> >>>> My personal opinion is that HZ should go back down to 100 in 7-CURRENT
> >>>> immediately, and only be incremented back up when/if it's proven to be
> >>>> the right thing to do. And, I say that as someone who (errantly) pushed
> >>>> for the increase to 1000 several years ago.
> >>>
> >>> I think it's probably a good idea to do it sooner rather than later.  It
> >>> may slightly negatively impact some services that rely on frequent timers
> >>> to do things like retransmit timing and the like.  But I haven't done any
> >>> measurements.
> >>
> >> As you know, but for the benefit of the list, restoring HZ=100 is often an
> >> important performance tweak on SMP systems with many CPUs because of all
> >> the sched_lock activity from statclock/hardclock, which scales with HZ and
> >> NCPUS.
> >
> > sched_lock is another big bottleneck: if you have 32 CPUs, in theory you
> > have 32X context switch speed, but now it still has only 1X speed. There
> > is also code abusing sched_lock: the M:N bits dynamically insert a thread
> > into the thread list at context switch time. This is a bug; it means the
> > thread list in a proc has to be protected by the scheduler lock, and
> > delivering a signal to a process has to hold the scheduler lock to find a
> > thread. If the proc has many threads, this introduces long scheduler
> > latency. That a proc lock is not enough to find a thread is also a bug.
> > Other code abuses the scheduler lock where it could really use its own lock.
>
> I've added Kip Macy to the CC, who is working with a patch for Sun4v that
> eliminates sched_lock.  Maybe he can comment some more on this thread?
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>

From owner-freebsd-performance@FreeBSD.ORG  Wed Jun 14 06:25:47 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0FCB016A474;
	Wed, 14 Jun 2006 06:25:47 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3385F43D46;
	Wed, 14 Jun 2006 06:25:45 +0000 (GMT) (envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.2.163])
	by mailout1.pacific.net.au (Postfix) with ESMTP id 2EEB9527FD4;
	Wed, 14 Jun 2006 16:22:55 +1000 (EST)
Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailproxy2.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP
	id k5E6MkkG030715; Wed, 14 Jun 2006 16:22:48 +1000
Date: Wed, 14 Jun 2006 16:22:46 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@epsplex.bde.org
To: kmacy@fsmware.com
In-Reply-To: <b1fa29170606132015p654e2877s1ec1da6184ce672e@mail.gmail.com>
Message-ID: <20060614133024.E1753@epsplex.bde.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
	<20060613105930.N34121@fledge.watson.org>
	<b1fa29170606132015p654e2877s1ec1da6184ce672e@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Scott Long <scottl@samsco.org>, kmacy@freebsd.org, Paul Saab <ps@mu.org>,
	Robert Watson <rwatson@freebsd.org>, David Xu <davidxu@freebsd.org>,
	Kris Kennaway <kris@obsecurity.org>,
	freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Jun 2006 06:25:47 -0000

On Tue, 13 Jun 2006, Kip Macy wrote:

> ...
> Why do I say "non-interrupt blocking"? Currently we have roughly a
> half dozen locking primitives. The two that I am familiar with are
> blocking and spinning mutexes. The general policy is to use blocking
> locks except where a lock is used in interrupts or the scheduler. It
> seems to me that in the scheduler interrupts only actually need to be
> blocked across cpu_switch. Spin locks obviously have to be used
> because a thread cannot very well context switch while it's in the
> middle of context switching - however, provided td_critnest > 0, there
> is no reason that interrupts need to be blocked. Currently sched_lock
> is acquired in cpu_hardclock and statclock - so it does need to block
> interrupts. There is no reason that these two functions couldn't be
> run in ast().

These functions are called from "fast" interrupt handlers, so they
cannot use sleep locks.  They also cannot be run in ast(), since ast()
is only run on return to user mode and uses sleep locks a lot.  Gathering
of some user-mode statistics could be deferred until return to user
mode, but this wouldn't work for kernel-mode statistics, which would
never be gathered for threads that never leave the kernel.  Large
changes would also be required for the user-mode statistics:
- algorithmic changes: various, mainly to keep kernel-mode statistics
   separate.
- locking: ast() uses sched_lock, so without large changes you would
   just move the problem (there would be up to hz + stathz extra calls
   to ast() per second).  The statistics fields are all locked by
   sched_lock, and although this would not be needed for access in
   ast(), some locking would still be needed for the many fields that
   are accessed from elsewhere.

What they (and all fast interrupt handlers or even "fast" interrupt
handlers) can do better is use spin locks != sched_lock (and for fast
interrupt handlers, != mtx_lock_spin(any)).  This is not easy to do
in general, and is especially difficult for clock interrupt handlers,
because all accesses to data accessed by a fast interrupt handler must
be locked by a common lock (especially outside of the handlers) and
clock interrupt handlers access a lot of data.  Currently, clock
interrupt handlers use sched_lock and depend on sched_lock being used
too much so that most of the data accessed by clock interrupt handlers
is locked automatically.  Even then, there are large gaps in the locking.
E.g., hardclock() starts by calling tc_ticktock() which mostly uses
very delicate time-domain locking but sometimes races with syscalls
that use sleep locking, most frequently by calling ntp_update_second().
Most of kern_ntptime.c is documented (in comments) as being required
to run at splclock() or higher, but it is actually all locked only by
Giant, so sched_lock'ing and other spinlocking for it is neither
necessary nor sufficient, and calling it correctly from a "fast" interrupt
handler is impossible.

In my kernel, fast interrupt handlers (and associated non-handler code
that shares data) are actually fast (== low-latency &&
!(very-large-footprint || takes-very-long)).  This requires:
- mtx_lock_spin() to not mask interrupts, since masking interrupts gives
   !low-latency at least in the UP case.
- fast interrupt handlers to not use sched_lock, since sched_lock gives
   very-large-footprint.
- fast interrupt handlers to not use only mtx_lock_spin(), since that no
   longer masks them.  My implementation actually uses simple_locks plus
   explicit per-cpu interrupt disabling (as in RELENG_4).  This also avoids
   having to turn off features like WITNESS and KTR which don't honor the
   rules for fast interrupt handlers.
- fast interrupt handlers to not use normal scheduling (things like
   swi_sched()), since that uses sched_lock and is generally very
   inefficient.  My implementation uses a combination of timeouts
   and a hack to metamorphose into a SWI handler.  The latter is a
   very expensive operation and should be avoided.  swi_sched() encourages
   this inefficiency except in the SWI_DELAY case.  The SWI_DELAY case
   only takes 50-100 times as many instructions as corresponding
   scheduling in RELENG_4.  SWI_DELAY seems to be unused except in
   my drivers.  My implementation enforces non-use of normal scheduling,
   and of some other invalid data accesses (e.g., to curthread), by
   unmapping PCPU data in fast interrupt handlers.
- clock interrupt handlers to not be fast interrupt handlers.  They
   have far too large a footprint to be fast interrupt handlers.  Locking
   them is hard enough when they are only "fast" interrupt handlers.
   I made them normal interrupt handlers and don't support "fast" interrupt
   handlers.

I get very few benefits from this.  Normal interrupt handlers for
clocks are inefficient.  They don't take very long, but switching to
them is inefficient.  I get lower interrupt latency, but this is
not very important now that CPUs are very fast compared with i/o
for all devices that I have.  I get the possibility of simpler
locking in clock interrupt handlers, but haven't simplified or fixed
their locking.  I get enforced smallness and simplicity for fast
interrupt handlers, since large ones would be too complicated and
normal scheduling and locking cannot be used.

Bruce

From owner-freebsd-performance@FreeBSD.ORG  Wed Jun 14 17:38:14 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 67F0016A47D
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 17:38:14 +0000 (UTC)
	(envelope-from danial_thom@yahoo.com)
Received: from web33311.mail.mud.yahoo.com (web33311.mail.mud.yahoo.com
	[68.142.206.126])
	by mx1.FreeBSD.org (Postfix) with SMTP id 541A343D5C
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 17:38:13 +0000 (GMT)
	(envelope-from danial_thom@yahoo.com)
Received: (qmail 60911 invoked by uid 60001); 14 Jun 2006 17:38:12 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=FDsm+Nkpzuw6co7Aule1Sy9EZ1gce/JkswZp7nXGaTzeJwvI5OmFt1MIHFPy0a0jqjDtq34SKUxxlhznm7p7FlOAX80k2zkrlzXXzVTPGY8R50pCu1DNdbv6tr68N2ImOfWNbAB3qY48/WgvbpZuBe2PKh5DZTGg6izpZHPjLm4=
	; 
Message-ID: <20060614173812.60909.qmail@web33311.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33311.mail.mud.yahoo.com via HTTP;
	Wed, 14 Jun 2006 10:38:12 PDT
Date: Wed, 14 Jun 2006 10:38:12 -0700 (PDT)
From: Danial Thom <danial_thom@yahoo.com>
To: Kris Kennaway <kris@obsecurity.org>
In-Reply-To: <20060613210022.GB5267@xor.obsecurity.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: Scott Long <scottl@samsco.org>, Robert Watson <rwatson@freebsd.org>,
	freebsd-performance@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: danial_thom@yahoo.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Jun 2006 17:38:14 -0000


--- Kris Kennaway <kris@obsecurity.org> wrote:

> On Tue, Jun 13, 2006 at 12:57:38PM -0700, Danial Thom wrote:
> 
> > Since everyone agrees that the load measuring
> > tools aren't all that accurate, what criteria were
> > used to determine that the changes made in 7 have
> > the effect that you think they have had?
> 
> Not by using top(1).  vmstat seems to do a better job of reporting CPU
> usage, but still you want to measure what the system can actually do,
> not how accurately it estimates its own performance.
> 
> Kris
> 

Regarding vmstat:

I'm getting the same (obviously wrong) results
from vmstat: it shows no usage. I believe I cut
and pasted a snippet which showed 6000
ints/second on em with 99.x% idle. It works fine
in UP mode, which implies that you aren't
accounting properly in SMP mode. Hopefully you
(folks) can come to terms with the fact that it's
broken; otherwise it will never be of any use.

Regarding testing:

My view is that you are making a big mistake if
you measure everything at the edge of
performance, which is why benchmarks lie and are
generally useless. As the bus becomes saturated,
and queues become unnaturally large, timings
change. You may be measuring how well the system
recovers from events that never happen when you
just try to "see how much you can do". For
example, as the PCI bus becomes saturated, I/Os
take exponentially longer, so you're not really
measuring your code. You end up measuring
properties which may be very different under
normal conditions. And if you try to optimize
your code for conditions which rarely if ever
occur, you may hose it for normal use (I'm a bit
frightened by the 7.0 em changes). 

"Efficiency" is what's important. I want to know
how the machine works under normal loads, not
when it's in constant recovery from overloads. I
want to run a realistic load on a machine when I
test new code, to see what effect it has on
system load. For that I need tools that work.

If your machine can push 500Mb/s at 99% load or
it can do 492Mb/s at 60% load, my view is that
the 492Mb/s system is the better system. In the
long run the more efficient systems are the ones
that perform better generally.

DT


From owner-freebsd-performance@FreeBSD.ORG  Wed Jun 14 20:48:04 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A877416A482
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 20:48:04 +0000 (UTC)
	(envelope-from kip.macy@gmail.com)
Received: from nz-out-0102.google.com (nz-out-0102.google.com [64.233.162.197])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 529F043D49
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 20:48:02 +0000 (GMT)
	(envelope-from kip.macy@gmail.com)
Received: by nz-out-0102.google.com with SMTP id 9so414219nzo
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 13:48:01 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=Wgrm4FfXw2t5USU2Mazk+SwkHdIXCVHqdz921SXp50RmiIroe1Lz77VwZH37PBLl+Me7QF6GwDYD5uoZoMMPA1fF3HfaLGjd0/fPTQZ1GqwlzpDRZomqsFPU1V9EeYccF3DUrFxUE7yrMTWuJOBm8vwCCg/NFHaOfD1NZOdLVoI=
Received: by 10.65.215.4 with SMTP id s4mr994291qbq;
	Wed, 14 Jun 2006 13:48:01 -0700 (PDT)
Received: by 10.65.231.11 with HTTP; Wed, 14 Jun 2006 13:48:01 -0700 (PDT)
Message-ID: <b1fa29170606141348j4ebb3140q7c4960758d5b9784@mail.gmail.com>
Date: Wed, 14 Jun 2006 13:48:01 -0700
From: "Kip Macy" <kip.macy@gmail.com>
To: "Bruce Evans" <bde@zeta.org.au>
In-Reply-To: <20060614133024.E1753@epsplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com>
	<20060612210723.K26068@fledge.watson.org>
	<20060612203248.GA72885@xor.obsecurity.org>
	<200606130715.52425.davidxu@freebsd.org>
	<20060613105930.N34121@fledge.watson.org>
	<b1fa29170606132015p654e2877s1ec1da6184ce672e@mail.gmail.com>
	<20060614133024.E1753@epsplex.bde.org>
Cc: Scott Long <scottl@samsco.org>, kmacy@freebsd.org, Paul Saab <ps@mu.org>,
	Robert Watson <rwatson@freebsd.org>, David Xu <davidxu@freebsd.org>,
	Kris Kennaway <kris@obsecurity.org>,
	freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: kmacy@fsmware.com
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Jun 2006 20:48:04 -0000

Hi Bruce -
Thanks for the lengthy response. I should not have brought up
interrupt handling, as a) it's a tertiary concern for me at the moment,
b) everyone has an opinion on it, and c) I could cut off several fingers
and still count on one hand the number of people who understand why
it's bad that ithreads go through the scheduler in the default case
(having a pcpu_runq only helps affinity).

To make it easier for future respondents to stay on topic, let me
explain my situation. I have ported FreeBSD to Sun's new UltraSPARC
architecture, sun4v. The current implementation, the T1, has 6-8 cores
with 4 threads per core. Unlike HTT on x86, these machines actually
have ample memory bandwidth (~26GB/s), so threading can actually be
useful. On my 32-CPU system, benchmarks like supersmack max out at 9
threads, i.e. one can't get the system below 70% idle. Across the
board, context switches on Solaris/T1 take 2x as long as they do on
Linux/T1. Because of lock contention, FreeBSD in turn takes between
10% and 100% longer than Solaris to context switch.

I would like to be able to tout FreeBSD as a strong competitor on the
sun4v architecture. At the moment I can't. Perhaps this isn't the
right forum for discussing my concerns - a freebsd-scalability list
might be in order.

                            -Kip

On 6/13/06, Bruce Evans <bde@zeta.org.au> wrote:
> On Tue, 13 Jun 2006, Kip Macy wrote:
>
> > ...
> > Why do I say "non-interrupt blocking"? Currently we have roughly a
> > half dozen locking primitives. The two that I am familiar with are
> > blocking and spinning mutexes. The general policy is to use blocking
> > locks except where a lock is used in interrupts or the scheduler. It
> > seems to me that in the scheduler, interrupts only actually need to be
> > blocked across cpu_switch. Spin locks obviously have to be used
> > because a thread cannot very well context switch while it's in the
> > middle of context switching; however, provided td_critnest > 0, there
> > is no reason that interrupts need to be blocked. Currently sched_lock
> > is acquired in cpu_hardclock and statclock, so it does need to block
> > interrupts. There is no reason that these two functions couldn't be
> > run in ast().
>
> These functions are called from "fast" interrupt handlers, so they
> cannot use sleep locks.  They also cannot be run in ast(), since ast()
> is only run on return to user mode and uses sleep locks a lot.  Gathering
> of some user-mode statistics could be deferred until return to user
> mode, but this wouldn't work for kernel-mode statistics, which would
> never be gathered for threads that never leave the kernel, and large
> changes would be required for the user-mode statistics (algorithmic
> changes: various, mainly to keep kernel-mode separate; locking: ast()
> uses sched_lock, so without large changes you would just move the
> problem, since there would be up to hz + stathz extra calls to ast()
> per second; the statistics fields are all locked by sched_lock, and
> although this would not be needed for access in ast(), some locking
> would still be needed for the many fields that are accessed from
> elsewhere).
>
> What they (and all fast interrupt handlers or even "fast" interrupt
> handlers) can do better is use spin locks != sched_lock (and for fast
> interrupt handlers, != mtx_lock_spin(any)).  This is not easy to do
> in general, and is especially difficult for clock interrupt handlers,
> because all accesses to data accessed by a fast interrupt handler must
> be locked by a common lock (especially outside of the handlers) and
> clock interrupt handlers access a lot of data.  Currently, clock
> interrupt handlers use sched_lock and depend on sched_lock being used
> too much so that most of the data accessed by clock interrupt handlers
> is locked automatically.  Even then, there are large gaps in the locking.
> E.g., hardclock() starts by calling tc_ticktock() which mostly uses
> very delicate time-domain locking but sometimes races with syscalls
> that use sleep locking, most frequently by calling ntp_update_second().
> Most of kern_ntptime.c is documented (in comments) as being required
> to run at splclock() or higher, but it is actually all locked only by
> Giant, so sched_lock'ing and other spinlocking for it is neither
> necessary nor sufficient, and calling it correctly from a "fast" interrupt
> handler is impossible.
>
> In my kernel, fast interrupt handlers (and associated non-handler code
> that shares data) are actually fast (== low-latency &&
> !(very-large-footprint || takes-very-long)).  This requires:
> - mtx_lock_spin() to not mask interrupts, since masking interrupts gives
>    !low-latency at least in the UP case.
> - fast interrupt handlers to not use sched_lock, since sched_lock gives
>    very-large-footprint.
> - fast interrupt handlers to not use only mtx_lock_spin(), since that no
>    longer masks them.  My implementation actually uses simple_locks plus
>    explicit per-cpu interrupt disabling (as in RELENG_4).  This also avoids
>    having to turn off features like WITNESS and KTR which don't honor the
>    rules for fast interrupt handlers.
> - fast interrupt handlers to not use normal scheduling (things like
>    swi_sched()), since that uses sched_lock and is generally very
>    inefficient.  My implementation uses a combination of timeouts
>    and a hack to metamorphose into a SWI handler.  The latter is a
>    very expensive operation and should be avoided.  swi_sched() encourages
>    this inefficiency except in the SWI_DELAY case.  The SWI_DELAY case
>    only takes 50-100 times as many instructions as corresponding
>    scheduling in RELENG_4.  SWI_DELAY seems to be unused except in
>    my drivers.  My implementation enforces non-use of normal scheduling
>    and of some other invalid data accesses (e.g., to curthread) by
>    unmapping PCPU data in fast interrupt handlers.
> - clock interrupt handlers to not be fast interrupt handlers.  They
>    have far too large a footprint to be fast interrupt handlers.  Locking
>    them is hard enough when they are only "fast" interrupt handlers.
>    I made them normal interrupt handlers and don't support "fast" interrupt
>    handlers.
>
> I get very few benefits from this.  Normal interrupt handlers for
> clocks are inefficient.  They don't take very long, but switching to
> them is inefficient.  I get lower interrupt latency, but this is
> not very important now that CPUs are very fast compared with i/o
> for all devices that I have.  I get the possibility of simpler
> locking in clock interrupt handlers, but haven't simplified or fixed
> their locking.  I get enforced smallness and simplicity for fast
> interrupt handlers since large ones would be too complicated and
> normal scheduling and locking cannot be used.
>
> Bruce
>

From owner-freebsd-performance@FreeBSD.ORG  Wed Jun 14 21:17:27 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A2B6C16A482
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 21:17:27 +0000 (UTC)
	(envelope-from arne_woerner@yahoo.com)
Received: from web30313.mail.mud.yahoo.com (web30313.mail.mud.yahoo.com
	[68.142.201.231])
	by mx1.FreeBSD.org (Postfix) with SMTP id 0231143D55
	for <freebsd-performance@freebsd.org>;
	Wed, 14 Jun 2006 21:17:26 +0000 (GMT)
	(envelope-from arne_woerner@yahoo.com)
Received: (qmail 58759 invoked by uid 60001); 14 Jun 2006 21:17:26 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:Received:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
	b=587Qpe7S+UFsOFqn+KGTlTzQmJJYdW3VkV0nbIgsgSAHipE8NES75MXyi8bc10X6QbxlwfXbVWh5N8QHbgpMOD3wRaBZN9y7gbgdHcVNhSw3vmHSVV4VQ5ij7pYHOnBseUMQG5mPYD0Q2e3tQNr7X+ihThdkFdhHSuufTW09yZg=
	; 
Message-ID: <20060614211726.58757.qmail@web30313.mail.mud.yahoo.com>
Received: from [213.54.67.226] by web30313.mail.mud.yahoo.com via HTTP;
	Wed, 14 Jun 2006 14:17:26 PDT
Date: Wed, 14 Jun 2006 14:17:26 -0700 (PDT)
From: "R. B. Riddick" <arne_woerner@yahoo.com>
To: kmacy@fsmware.com
In-Reply-To: <b1fa29170606141348j4ebb3140q7c4960758d5b9784@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, kmacy@freebsd.org
Subject: Re: Initial 6.1 questions
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Jun 2006 21:17:27 -0000

Hi boys and girls! *giggle*

I hope the following does not sound too much like the product of a bipolar
disorder of mine...

Some years ago (in or around 1993) I heard that there was a computer program
that was able to derive mathematical theorems from axioms (even some new
ones, I think; but somehow the process became quite slow at some point, so we
still use human mathematicians...).

Is it possible to describe important sequences in a computer (in this case,
those sequences that are performance-relevant, like things that involve
locks, context switches, ...) in a mathematically rigorous way? The answer
should be "yes", if we set aside the philosophical and the pathological
perspectives...

If yes: couldn't we find nicer/faster algorithms by some kind of directed
search in the space of all possible computer programs? I am not sure why I
don't know of such a tool on my box (most likely there is none)... Is the
space just too huge? It feels somewhat astonishing that all relevant
programming languages are like C today, although back in 1992 one of my
professors was already quite excited about his all-new programming language,
which would find its own algorithm once the program had described the
problem, and which mostly existed just in his imagination...

Or are we already using some kind of "optimal kernel generator"?

42?

Bye
Arne



From owner-freebsd-performance@FreeBSD.ORG  Sat Jun 17 12:50:49 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@FreeBSD.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 473E716A482
	for <performance@FreeBSD.org>; Sat, 17 Jun 2006 12:50:49 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F0B6943D45
	for <performance@FreeBSD.org>; Sat, 17 Jun 2006 12:50:48 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 85CDC46BC3
	for <performance@FreeBSD.org>; Sat, 17 Jun 2006 08:50:48 -0400 (EDT)
Date: Sat, 17 Jun 2006 13:50:48 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: performance@FreeBSD.org
Message-ID: <20060617134402.O8526@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: 
Subject: HZ=100: not necessarily better?
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 17 Jun 2006 12:50:49 -0000


Scott asked me if I could take a look at the impact of changing HZ for some 
simple TCP performance tests.  I ran the first couple, and got some results 
that were surprising, so I thought I'd post about them and ask people who are 
interested if they could do some investigation also.  The short of it is that 
we had speculated that the increased CPU overhead of a higher HZ would be 
significant when it came to performance measurement, but in fact, I measure 
improved performance under high HTTP load with a higher HZ.  This was, of 
course, the reason we first looked at increasing HZ: improving timer 
granularity helps improve the performance of network protocols, such as TCP. 
Recent popular opinion has swung in the opposite direction, that higher HZ 
overhead outweighs this benefit, and I think we should be cautious and do a 
lot more investigating before assuming that is true.

Simple performance results below.  Two boxes on a gig-e network with if_em 
ethernet cards, one running a simple web server hosting 100-byte pages, and 
the other downloading them in parallel (netrate/http and netrate/httpd).  The 
performance difference is marginal, but at least in the SMP case, likely more 
than a measurement error or cache alignment fluke.  Results are 
transactions/second sustained over a 30-second test (bigger is better); the 
box is a dual Xeon P4 with HTT; 'vendor.*' are kernels with the default 
7-CURRENT HZ setting (1000) and 'hz.*' are the HZ=100 versions of the same 
kernels.  In any case, reducing HZ from 1000 to 100 produced no obvious 
performance improvement.  Results may vary, use only as directed.

What we might want to explore is using a programmable timer to set up high 
precision timeouts, such as TCP timers, while keeping base statistics 
profiling and context switching at 100 Hz.  I think phk has previously proposed 
doing this with the HPET timer.

I'll run some more diverse tests today, such as raw bandwidth tests, pps on 
UDP, and so on, and see where things sit.  The reduced overhead of HZ=100 
should be measurable in cases where the test is CPU-bound and, unlike with 
TCP, there's no clear benefit from more accurate timing, but it would be good 
to confirm that.

Robert N M Watson
Computer Laboratory
University of Cambridge


peppercorn:~/tmp/netperf/hz> ministat *SMP
x hz.SMP
+ vendor.SMP
+--------------------------------------------------------------------------+
|xx x xx   x       xx  x     +              +   +  +   +    ++ +         ++|
|  |_______A________|                     |_____________A___M________|     |
+--------------------------------------------------------------------------+
     N           Min           Max        Median           Avg        Stddev
x  10         13715         13793         13750       13751.1     29.319883
+  10         13813         13970         13921       13906.5     47.551726
Difference at 95.0% confidence
         155.4 +/- 37.1159
         1.13009% +/- 0.269913%
         (Student's t, pooled s = 39.502)

peppercorn:~/tmp/netperf/hz> ministat *UP
x hz.UP
+ vendor.UP
+--------------------------------------------------------------------------+
|x           x xx   x      xx+   ++x+   ++  * +    +                      +|
|         |_________M_A_______|___|______M_A____________|                  |
+--------------------------------------------------------------------------+
     N           Min           Max        Median           Avg        Stddev
x  10         14067         14178         14116       14121.2     31.279386
+  10         14141         14257         14170       14175.9     33.248058
Difference at 95.0% confidence
         54.7 +/- 30.329
         0.387361% +/- 0.214776%
         (Student's t, pooled s = 32.2787)