Date: Sun, 11 Jun 2006 13:45:28 -0400
From: Kris Kennaway
To: performance@FreeBSD.org
Cc: scrappy@FreeBSD.org
Subject: Postgresql performance profiling

I set up supersmack against postgresql 8.1 from ports (default config)
on a 12 CPU E4500. It scales and performs somewhat better than mysql
on this machine (which is heavily limited by contention between
threads in a process), but there are a number of obvious performance
bottlenecks:

* The postgres processes seem to change their proctitle hundreds or
  thousands of times per second. This is currently done via a
  Giant-locked sysctl (kern.proc.args), so there is enormous contention
  for Giant. Even when this is fixed (thanks to a patch from csjp@),
  each of them requires a syscall, and syscalls ain't free. This is not
  a clever thing to be doing from a performance standpoint.

* pgsql uses select() and this seems to be a major choke point. I bet
  you'd see fairly impressive performance gains (especially on SMP) if
  it was modified to use kqueue instead of select.

* You really want to avoid using IPv6 for transport (since it's
  Giant-locked). This was an issue at first since I was running
  against localhost, which maps to ::1 by default. We should reconsider
  the preference for IPv6 over IPv4 until IPv6 is Giant-free; there are
  probably many other situations where IPv6 is being secretly used
  "because it is there" and costing performance.

* The sysv IPC code is still Giant-locked. pgsql makes a lot of
  semop() calls which grab Giant, and it also msleep()s on the Giant
  lock in the semwait channel.

* When semop() wants to wake up some sleeping processes because
  semaphores have been released, it does a wakeup() and wakes them all
  up. This means a thundering herd (I see up to 11 CPUs being woken
  here). Since we know exactly how many resources are available, it
  would be better to only wakeup_one() that number of times instead.
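A toy model of the last point, for illustration only. It assumes every sleeper needs exactly one semaphore unit (real semop() requests can ask for more), and the function names are hypothetical userland stand-ins, not anything in sysv_sem.c. It counts how many sleepers each wakeup policy makes runnable and how many of those wake for nothing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of semaphore wakeups; userland C, not sysv_sem.c.
 * Assumption: each sleeper needs exactly one unit, and `released`
 * units have just been returned to the semaphore.
 */

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Broadcast policy, like wakeup(): every sleeper becomes runnable.
 * Returns how many were woken; *wasted receives how many of them find
 * no unit left and go straight back to sleep.
 */
int wake_broadcast(int sleepers, int released, int *wasted)
{
    assert(sleepers >= 0 && released >= 0 && wasted != NULL);
    int woken = sleepers;
    *wasted = woken - min_int(released, sleepers);
    return woken;
}

/*
 * Counted policy, like calling wakeup_one() once per released unit:
 * only as many sleepers as can succeed are woken, so none is wasted.
 */
int wake_counted(int sleepers, int released, int *wasted)
{
    assert(sleepers >= 0 && released >= 0 && wasted != NULL);
    *wasted = 0;
    return min_int(released, sleepers);
}
```

With 11 sleepers and one unit released, wake_broadcast(11, 1, &w) returns 11 with w == 10, while wake_counted(11, 1, &w) returns 1 with w == 0. The counted policy is only this simple under the one-unit assumption. With mixed request sizes, the code would have to work out which sleepers can actually proceed.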
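For the IPv6 point, one client-side workaround is to ask the resolver for IPv4 results only, so that "localhost" can never resolve to ::1. Below is a minimal sketch using the standard POSIX getaddrinfo() call. resolve_ipv4() is a hypothetical helper, and port 5432 in the usage note is just PostgreSQL's default. Pointing the client at 127.0.0.1 instead of localhost has the same effect.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>   /* ntohs(), ntohl() for callers inspecting the result */
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Resolve host:port to an IPv4 socket address only. Because ai_family is
 * AF_INET, a name such as "localhost" cannot come back as ::1, so the
 * connection stays on the IPv4 stack.
 * Returns 0 on success, or a getaddrinfo() error code (see gai_strerror()).
 */
int resolve_ipv4(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints;
    struct addrinfo *res;
    int err;

    assert(host != NULL && port != NULL && out != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 results only */
    hints.ai_socktype = SOCK_STREAM;  /* TCP stream socket */
    hints.ai_flags = AI_NUMERICSERV;  /* port is numeric; skip services lookup */

    err = getaddrinfo(host, port, &hints, &res);
    if (err != 0)
        return err;

    /* Take the first result; with AF_INET it is a struct sockaddr_in. */
    if (res->ai_addrlen < sizeof(*out)) {
        freeaddrinfo(res);
        return EAI_FAIL;
    }
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

For example, resolve_ipv4("127.0.0.1", "5432", &sa) fills sa with family AF_INET, port 5432 and address 127.0.0.1. On a typical host whose hosts file maps localhost to 127.0.0.1, resolve_ipv4("localhost", "5432", &sa) gives the same result.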

Here are what seem to be the relevant heavily-contended mutex
acquisitions (ratio = cnt_lock/count measures how many times this lock
was contended by something else while held by this code line):

    count cnt_hold cnt_lock ratio name
   106080     7420    19238  .181 kern/kern_synch.c:222 (lockbuilder mtxpool)   <-- vfs
   175435    13952    42365  .241 kern/kern_condvar.c:113 (lockbuilder mtxpool) <-- vfs
  1075841   271138   419862  .390 kern/kern_synch.c:220 (Giant)                 <-- msleep with Giant
   734613   248249   291969  .397 kern/sys_generic.c:1140 (sellck)              <-- select
   800332   379020   326324  .407 kern/sys_generic.c:944 (sellck)               <-- select
   401751    19731   175305  .436 kern/sys_generic.c:1092 (sellck)              <-- select
   400280   198880   176623  .441 kern/sys_generic.c:935 (sellck)               <-- select
  1361163   695637   624171  .458 sparc64/sparc64/trap.c:586 (Giant)            <-- semop
   400190   193112   238578  .596 kern/kern_condvar.c:208 (sellck)              <-- select

Kris


Date: Sun, 11 Jun 2006 19:01:09 +0100 (BST)
From: Robert Watson
To: Kris Kennaway
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling

On Sun, 11 Jun 2006, Kris Kennaway wrote:

> * The postgres processes seem to change their proctitle hundreds or
> thousands of times per second. This is currently done via a Giant-locked
> sysctl (kern.proc.args) so there is enormous contention for Giant. Even
> when this is fixed (thanks to a patch from csjp@), each of them requires a
> syscall and syscalls ain't free. This is not a clever thing to be doing
> from a performance standpoint.

You might consider disabling setproctitle() entirely to see what impact
that has.

> * pgsql uses select() and this seems to be a major choke point. I bet you'd
> see fairly impressive performance gains (especially on SMP) if it was
> modified to use kqueue instead of select.
>
> * You really want to avoid using IPv6 for transport (since it's
> Giant-locked). This was an issue at first since I was running against
> localhost, which maps to ::1 by default. We should reconsider the
> preference for IPv6 over IPv4 until IPv6 is Giant-free; there are probably
> many other situations where IPv6 is being secretly used "because it is
> there" and costing performance.

FYI, for purely loopback traffic, it's probably safe to mark the IPv6
netisr as MPSAFE.
Add NETISR_MPSAFE as a flag to the following line in ip6_input.c:

  ip6_input.c:  netisr_register(NETISR_IPV6, ip6_input, &ip6intrq, 0);

If you have non-loopback traffic, however, you may put yourself at
greater risk of panic in the IPv6 multicast and neighbor discovery code,
so this should be done with caution. It might be an interesting exercise,
though.

> * The sysv IPC code is still Giant-locked. pgsql makes a lot of semop()
> calls which grab Giant, and it also msleep()s on the Giant lock in the
> semwait channel.

It is likely quite easy to put subsystem locks around the System V IPC
subsystems. I'm a bit surprised no one has done it already. sysvshm is a
bit trickier, but sysvsem and sysvmsg should be quite straightforward.

> * When semop() wants to wake up some sleeping processes because semaphores
> have been released, it does a wakeup() and wakes them all up. This means a
> thundering herd (I see up to 11 CPUs being woken here). Since we know
> exactly how many resources are available, it would be better to only
> wakeup_one() that number of times instead.

Should be easy to experiment with.

Robert N M Watson
Computer Laboratory
University of Cambridge


Date: Sun, 11 Jun 2006 16:31:44 -0400
From: Kris Kennaway
To: Kris Kennaway
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling

On Sun, Jun 11, 2006 at 01:45:28PM -0400, Kris Kennaway wrote:
> I set up supersmack against postgresql 8.1 from ports (default config)
> on a 12 CPU E4500.
> It scales and performs somewhat better than mysql
> on this machine (which is heavily limited by contention between
> threads in a process), but there are a number of obvious performance
> bottlenecks:

FYI, on a dual P4 + HTT, mysql significantly outperforms pgsql (by >55%
peak performance, probably more if I was using libthr, which I cannot
use on this machine for technical reasons) on select-key.smack when
configured the same way (i.e. transport over IPv4 instead of a local
socket, which supersmack prefers for mysql).

Contention is still a big issue here (only listing mutexes contended on
more than 10% of acquisitions):

  0 0 142969 0   1996  14458 .101 kern/kern_synch.c:218 (Giant)
  0 0 199028 0  11649  27944 .140 kern/kern_condvar.c:208 (sellck)
  0 0 400103 0 111216  91336 .228 kern/kern_sysctl.c:1317 (Giant)
  0 0 303147 0 108735 131237 .432 i386/i386/trap.c:1005 (Giant)

I turned off process title setting and got an 8% performance boost.

Contention is now a bit better but still serious:

  0 0  22952 0   2067   2521 .109 vm/vm_fault.c:987 (vm object)
  0 0 199153 0  12589  31512 .158 kern/kern_condvar.c:208 (sellck)
  0 0 361305 0 124766 130901 .362 i386/i386/trap.c:1005 (Giant)

i.e. semop() (the Giant-locked syscall) is contending with itself a lot,
and select() is a secondary problem.

Actually rwatson noticed that semop() is marked MPSAFE, so it's not
clear (but nevertheless true) why Giant is acquired here. OK, pjd
worked out that it's because SYSCALL_MODULE_HELPER() *never* sets the
mpsafe flag, so all syscalls registered that way (i.e. those which are
part of subsystems that may be loaded from a kld) are Giant-locked
regardless of what syscalls.master says.

I removed the SYSCALL_MODULE_HELPERs from sysv_sem.c, but now postgresql
hangs when trying to start; possibly the locking in sysv_sem.c is just
broken, since it was never in fact tested.

Kris

Date: Sun, 11 Jun 2006 18:37:05 -0300 (ADT)
From: "Marc G. Fournier"
To: Kris Kennaway
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling

On Sun, 11 Jun 2006, Kris Kennaway wrote:

> * The postgres processes seem to change their proctitle hundreds or
> thousands of times per second. This is currently done via a
> Giant-locked sysctl (kern.proc.args) so there is enormous contention for
> Giant. Even when this is fixed (thanks to a patch from csjp@), each of
> them requires a syscall and syscalls ain't free. This is not a clever
> thing to be doing from a performance standpoint.

To disable it for testing, after you run configure, manually edit
src/include/pg_config.h and undef HAVE_SETPROCTITLE ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org

Date: Sun, 11 Jun 2006 20:30:34 -0400
From: Kris Kennaway
To: Kris Kennaway
Cc: scrappy@FreeBSD.org, performance@FreeBSD.org
Subject: Re: Postgresql performance profiling

On Sun, Jun 11, 2006 at 04:31:44PM -0400, Kris Kennaway wrote:
> Actually rwatson noticed that semop() is marked MPSAFE, so it's not
> clear (but nevertheless true) why Giant is acquired here. OK, pjd
> worked out that it's because SYSCALL_MODULE_HELPER() *never* sets the
> mpsafe flag, so all such syscalls registered that way (i.e. those
> which are part of subsystems that may be loaded from kld) are
> Giant-locked regardless of what syscalls.master says.
>
> I removed the SYSCALL_MODULE_HELPERs from sysv_sem.c but now
> postgresql hangs when trying to start; possibly the locking in
> sysv_sem.c is just broken since it was never in fact tested.

That was my mistake: the syscalls weren't getting registered. I made
SYSCALL_MODULE_HELPER add the SYF_MPSAFE flag to work around it instead.
The new mutex contention looks like:

  0 0 199118 0  12134  30704 .154 kern/kern_condvar.c:208 (sellck)
  0 0 354890 0 100749 110295 .310 kern/sysv_sem.c:1011 (semid)

i.e. semaphores are still contending with themselves. It didn't make any
performance difference on this workload, as expected, since it was only
contending with itself and still is (but in mixed workloads with other
Giant activity it will help, of course).

Kris


Date: Mon, 12 Jun 2006 07:21:04 -0700 (PDT)
From: Danial Thom
To: freebsd-performance@freebsd.org
Subject: Initial 6.1 questions

I'm just setting up to evaluate 6.1 for a project, and before I tune I
hoped to get some feedback on why some things are the way they are.

first, why is the default for HZ now 1000? It seems that 900 extra clock
interrupts per second aren't a performance enhancement.

Is there a reason that ITR isn't a tunable in the em driver? It seems
more usable generally to end users than the delays.

Running a simple test with a traffic generator (firing udp packets to a
blackhole), the system overhead with a single processor goes up from 10%
to 15% when running a kernel with SMP enabled (and nothing else
different). I have ITR set to 6000 interrupts per second. That seems
like an awful lot of overhead. Is there some problem running an
SMP-enabled kernel when only 1 processor is present, or is there really
50% extra overhead on an SMP scheduler? I'll have a dual core in a few
days to test with.

Lastly, is there a utility similar to cpustat in DragonflyBSD which
shows the per-CPU usage stats?

Thanks,

DT


Date: Mon, 12 Jun 2006 16:00:30 +0100 (BST)
From: Robert Watson
To: Danial Thom
Cc: freebsd-performance@freebsd.org
Subject: Re: Initial 6.1 questions

On Mon, 12 Jun 2006, Danial Thom wrote:

> first, why is the default for HZ now 1000? It seems that 900 extra clock
> interrupts aren't a performance enhancement.

This is a design change that is in the process of being reconsidered. I
expect that HZ will not be 1000 in 7.x, but can't tell you whether it
will go back to 100 or to some middle ground. There are a number of
benefits to a higher HZ, not least of which is more accurate timing of
some network timer events.
Since I don't have my hands in the timer code, I can't speak to what the
decision process here is, or when any change might happen, but I do
expect to see some change.

> Running a simple test with a traffic generator (firing udp packets to a
> blackhole), the system overhead with a single processor goes up from 10% to
> 15% when running a kernel with SMP enabled (and nothing else different). I
> have ITR set to 6000 interrupts per second. That seems like an awful lot of
> overhead. Is there some problem running an SMP-enabled kernel when only 1
> processor is present, or is there really 50% extra overhead on an SMP
> scheduler? I'll have a dual core in a few days to test with.

I don't know about the particular number, but there is currently a
significant overhead to building in SMP support; in particular, you pick
up a lot of atomic instructions, which increase the cost of locking
operations even without contention. Some of that overhead reduces as the
workload goes up, as there's coalescing of work under locked regions,
reduced context switch rates as work is performed in batches, etc. There
is currently extremely active work in the area of reducing the overhead
of scheduling and context switching, driven in part by the 32-processor
support in Sun4v. I don't expect to see large portions of that merged to
RELENG_6, but it will be in RELENG_7. Again, not my area of expertise,
but there is work going on in this area.

Finally, there is a known performance problem involving loopback network
traffic and preemption, which results in additional context switches.
You may want to try disabling preemption and see if/how that impacts
your numbers. There has been quite a bit of discussion of this problem,
and I expect to see a solution for it in the near future. This problem
does not manifest for remote traffic, only loopback traffic.
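The toy spin lock below makes the point about atomic instructions concrete. It uses C11 <stdatomic.h>, it is not FreeBSD's mutex implementation, and the toy_lock names are hypothetical. Every acquire executes an atomic read-modify-write, even when no other CPU ever touches the lock, so that per-operation cost is paid whether or not there is contention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spin lock for illustration only; not FreeBSD's mutex code. */
struct toy_lock {
    atomic_flag held;         /* set while some thread owns the lock */
    unsigned long acquires;   /* successful acquisitions; guarded by held */
};

#define TOY_LOCK_INITIALIZER { ATOMIC_FLAG_INIT, 0 }

/*
 * Try once to take the lock. The test-and-set is an atomic
 * read-modify-write (typically a locked instruction such as XCHG on x86)
 * and it runs on every call, contended or not.
 */
bool toy_lock_try(struct toy_lock *l)
{
    if (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        return false;         /* someone else holds it */
    l->acquires++;
    return true;
}

/* Spin until the lock is ours. */
void toy_lock_acquire(struct toy_lock *l)
{
    while (!toy_lock_try(l))
        ;                     /* busy-wait: only reached under contention */
}

/* Release ordering makes our writes visible to the next owner. */
void toy_lock_release(struct toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Even in a single-threaded run, toy_lock_try() performs the atomic test-and-set on every call. Only the spinning in toy_lock_acquire() depends on contention.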

Robert N M Watson
Computer Laboratory
University of Cambridge


Date: Mon, 12 Jun 2006 12:57:54 -0700 (PDT)
From: Danial Thom
To: Robert Watson, freebsd-performance@freebsd.org
Subject: Re: Initial 6.1 questions

--- Robert Watson wrote:

> On Mon, 12 Jun 2006, Danial Thom wrote:
>
> > first, why is the default for HZ now 1000?
> > It seems that 900 extra clock interrupts aren't a performance
> > enhancement.
>
> This is a design change that is in the process of being reconsidered. I
> expect that HZ will not be 1000 in 7.x, but can't tell you whether it
> will go back to 100 or to some middle ground. There are a number of
> benefits to a higher HZ, not least of which is more accurate timing of
> some network timer events. Since I don't have my hands in the timer
> code, I can't speak to what the decision process here is, or when any
> change might happen, but I do expect to see some change.

Will anything break if I tweak this downward?
> > Finally, there is a known performance problem > involving loopback network > traffic and preemption, which results in > additional context switches. You may > want to try disabling preemption and see if/how > that impacts your numbers. > There has been seen quite a bit of discussion > of this problem, and I expect to > see a solution for it in the near future. This > problem does not manifest for > remote traffic, only loopback traffic. I'm sending this traffic from an external device, receiving on an em controller with blackhole set to 1. So I assume this loopback issue doesn't apply to this test? DT __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 20:01:46 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0C8DF16A41A; Mon, 12 Jun 2006 20:01:46 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2879043D81; Mon, 12 Jun 2006 20:01:45 +0000 (GMT) (envelope-from scottl@samsco.org) Received: from [10.10.3.185] ([69.15.205.254]) (authenticated bits=0) by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k5CK1YMF060726; Mon, 12 Jun 2006 14:01:41 -0600 (MDT) (envelope-from scottl@samsco.org) Message-ID: <448DC818.9070100@samsco.org> Date: Mon, 12 Jun 2006 14:01:28 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206 X-Accept-Language: en-us, en MIME-Version: 1.0 To: danial_thom@yahoo.com References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> In-Reply-To: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> Content-Type: text/plain; charset=us-ascii; format=flowed 
Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed version=3.1.1 X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org Cc: freebsd-performance@freebsd.org, Robert Watson Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 20:01:46 -0000 Danial Thom wrote: > > --- Robert Watson wrote: > > >>On Mon, 12 Jun 2006, Danial Thom wrote: >> >> >>>first, why is the default for HZ now 1000? It >> >>seems that 900 extra clock >> >>>interrupts aren't a performance enhancement. >> >>This is a design change that is in the process >>of being reconsidered. I >>expect that HZ will not be 1000 in 7.x, but >>can't tell you whether it will go >>back to 100, or some middle ground. There are >>a number of benefits to a >>higher HZ, not least is more accurate timing of >>some network timer events. >>Since I don't have my hands in the timer code, >>I can't speak to what the >>decision process here is, or when any change >>might happen, but I do expect to >>see some change. > > > Will anything break if I tweek this downward? > I run a number of high-load production systems that do a lot of network and filesystem activity, all with HZ set to 100. It has also been shown in the past that certain things in the network area were not fixed to deal with a high HZ value, so it's possible that it's even more stable/reliable with an HZ value of 100. My personal opinion is that HZ should go back down to 100 in 7-CURRENT immediately, and only be incremented back up when/if it's proven to be the right thing to do. And, I say that as someone who (errantly) pushed for the increase to 1000 several years ago.
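To put rough numbers on the timer-precision side of this trade-off (the "more accurate timing of some network timer events" quoted above): a callout can only fire on a clock tick, so a timeout is effectively rounded to a whole number of 1/HZ periods. The Python sketch below works under that simplifying assumption only. `ms_to_ticks` and `effective_timeout_ms` are hypothetical helpers, not kernel functions, and the kernel's real conversion rules may differ in detail.

```python
import math


def tick_period_ms(hz: int) -> float:
    """Length of one clock tick in milliseconds."""
    return 1000.0 / hz


def ms_to_ticks(timeout_ms: float, hz: int) -> int:
    """Round a timeout up to a whole number of ticks, never less than one.

    Hypothetical helper for illustration; the kernel has its own rules.
    """
    return max(1, math.ceil(timeout_ms * hz / 1000.0))


def effective_timeout_ms(timeout_ms: float, hz: int) -> float:
    """Delay actually scheduled once the timeout is quantized to ticks."""
    return ms_to_ticks(timeout_ms, hz) * tick_period_ms(hz)


for hz in (100, 1000):
    print(f"HZ={hz}: tick={tick_period_ms(hz):.1f} ms, "
          f"25 ms timer -> {effective_timeout_ms(25, hz):.1f} ms")
```

Under these assumptions, a 25 ms timer is stretched to 30 ms at HZ=100 but stays at 25 ms at HZ=1000. The cost of the finer resolution is ten times as many clock interrupts per CPU.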
Scott From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 20:02:52 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0C9C116A418 for ; Mon, 12 Jun 2006 20:02:52 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id B3A4443D46 for ; Mon, 12 Jun 2006 20:02:51 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 36CE346C72; Mon, 12 Jun 2006 16:02:51 -0400 (EDT) Date: Mon, 12 Jun 2006 21:02:51 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Danial Thom In-Reply-To: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> Message-ID: <20060612210054.S26068@fledge.watson.org> References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 20:02:52 -0000 On Mon, 12 Jun 2006, Danial Thom wrote: >> This is a design change that is in the process of being reconsidered. I >> expect that HZ will not be 1000 in 7.x, but can't tell you whether it will >> go back to 100, or some middle ground. There are a number of benefits to a >> higher HZ, not least is more accurate timing of some network timer events. >> Since I don't have my hands in the timer code, I can't speak to what the >> decision process here is, or when any change might happen, but I do expect >> to see some change. 
> > Will anything break if I tweek this downward? No, shouldn't do. I wouldn't go below 100 though, as things like process statistics, involuntary context switches, etc., are all affected. >> Finally, there is a known performance problem involving loopback network >> traffic and preemption, which results in additional context switches. You >> may want to try disabling preemption and see if/how that impacts your >> numbers. There has been seen quite a bit of discussion of this problem, and >> I expect to see a solution for it in the near future. This problem does >> not manifest for remote traffic, only loopback traffic. > > I'm sending this traffic from an external device, receiving on an em > controller with blackhole set to 1. So I assume this loopback issue doesn't > apply to this test? The above comments refer only to traffic being sent over if_loop interfaces or certain other deferred-work scenarios. Basically, deferring work to the netisr from a user thread, rather than from an interrupt thread, results in a premature context switch.
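The "premature context switch" can be pictured with a deliberately crude, single-CPU toy model. It is purely illustrative and is not FreeBSD's scheduler. A user thread enqueues packets for a higher-priority netisr thread. With preemption, each enqueue switches straight to the netisr and back. Without preemption, the netisr runs once when the user thread yields and drains the whole batch. The function name and its counting rules are assumptions made for this sketch.

```python
from collections import deque


def simulate(n_packets: int, preemption: bool):
    """Toy single-CPU model: returns (context_switches, packets_processed).

    Assumptions made for illustration only:
      * a user thread enqueues n_packets for a higher-priority netisr thread;
      * with preemption, each enqueue switches straight to the netisr, which
        drains the queue and switches back;
      * without preemption, the netisr runs once after the user thread yields.
    """
    queue = deque()
    switches = 0
    processed = 0

    def run_netisr():
        nonlocal switches, processed
        switches += 1  # user thread -> netisr
        while queue:
            queue.popleft()
            processed += 1
        switches += 1  # netisr -> user thread

    for pkt in range(n_packets):
        queue.append(pkt)
        if preemption:
            run_netisr()  # premature switch on every enqueue
    if queue:
        run_netisr()  # user thread yields; one batched drain
    return switches, processed


for n in (1, 10, 100):
    print(n, simulate(n, preemption=True), simulate(n, preemption=False))
```

In this model, the two policies only diverge once more than one packet is queued per yield. On SMP, the netisr may also run on another CPU, which the sketch ignores.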
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 20:08:14 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2D5A616A479 for ; Mon, 12 Jun 2006 20:08:14 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5C4E743D66 for ; Mon, 12 Jun 2006 20:08:13 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 8FA9946C58; Mon, 12 Jun 2006 16:08:12 -0400 (EDT) Date: Mon, 12 Jun 2006 21:08:12 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Scott Long In-Reply-To: <448DC818.9070100@samsco.org> Message-ID: <20060612210723.K26068@fledge.watson.org> References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <448DC818.9070100@samsco.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, danial_thom@yahoo.com Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 20:08:14 -0000 On Mon, 12 Jun 2006, Scott Long wrote: > I run a number of high-load production systems that do a lot of network and > filesystem activity, all with HZ set to 100. It has also been shown in the > past that certain things in the network area where not fixed to deal with a > high HZ value, so it's possible that it's even more stable/reliable with an > HZ value of 100.
> > My personal opinion is that HZ should gop back down to 100 in 7-CURRENT > immediately, and only be incremented back up when/if it's proven to be the > right thing to do. And, I say that as someone who (errantly) pushed for the > increase to 1000 several years ago. I think it's probably a good idea to do it sooner rather than later. It may slightly negatively impact some services that rely on frequent timers to do things like retransmit timing and the like. But I haven't done any measurements. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 20:32:50 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0722116A418; Mon, 12 Jun 2006 20:32:50 +0000 (UTC) (envelope-from kris@obsecurity.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id B518B43D46; Mon, 12 Jun 2006 20:32:49 +0000 (GMT) (envelope-from kris@obsecurity.org) Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196]) by elvis.mu.org (Postfix) with ESMTP id 99FFA1A4DA8; Mon, 12 Jun 2006 13:32:49 -0700 (PDT) Received: by obsecurity.dyndns.org (Postfix, from userid 1000) id 0321A5153E; Mon, 12 Jun 2006 16:32:48 -0400 (EDT) Date: Mon, 12 Jun 2006 16:32:48 -0400 From: Kris Kennaway To: Robert Watson Message-ID: <20060612203248.GA72885@xor.obsecurity.org> References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <448DC818.9070100@samsco.org> <20060612210723.K26068@fledge.watson.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="mYCpIKhGyMATD0i+" Content-Disposition: inline In-Reply-To: <20060612210723.K26068@fledge.watson.org> User-Agent: Mutt/1.4.2.1i Cc: Scott Long , danial_thom@yahoo.com, freebsd-performance@freebsd.org Subject:
Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 20:32:50 -0000 --mYCpIKhGyMATD0i+ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote: > On Mon, 12 Jun 2006, Scott Long wrote: > > >I run a number of high-load production systems that do a lot of network > >and filesystem activity, all with HZ set to 100. It has also been shown > >in the past that certain things in the network area where not fixed to > >deal with a high HZ value, so it's possible that it's even more > >stable/reliable with an HZ value of 100. > > > >My personal opinion is that HZ should gop back down to 100 in 7-CURRENT > >immediately, and only be incremented back up when/if it's proven to be the > >right thing to do. And, I say that as someone who (errantly) pushed for > >the increase to 1000 several years ago. > > I think it's probably a good idea to do it sooner rather than later. It > may slightly negatively impact some services that rely on frequent timers > to do things like retransmit timing and the like. But I haven't done any > measurements. As you know, but for the benefit of the list, restoring HZ=100 is often an important performance tweak on SMP systems with many CPUs because of all the sched_lock activity from statclock/hardclock, which scales with HZ and NCPUS.
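A back-of-the-envelope sketch of why this cost grows with both HZ and NCPUS follows. It rests on a simplifying assumption: every CPU takes HZ clock ticks per second and briefly acquires one global lock (standing in for sched_lock) a fixed number of times per tick. The sketch lumps all tick-driven acquisitions together. The 1 microsecond hold time is an arbitrary illustrative value, not a measurement.

```python
def lock_acquisitions_per_sec(hz: int, ncpus: int, per_tick: int = 1) -> int:
    """Acquisitions per second of one global lock driven by clock ticks.

    Simplifying assumption: each CPU takes `hz` ticks per second and grabs
    the lock `per_tick` times per tick.
    """
    return hz * ncpus * per_tick


def lock_busy_fraction(hz: int, ncpus: int, hold_us: float,
                       per_tick: int = 1) -> float:
    """Fraction of each second the lock is held, given a hold time in microseconds."""
    return lock_acquisitions_per_sec(hz, ncpus, per_tick) * hold_us / 1_000_000


for hz in (100, 1000):
    rate = lock_acquisitions_per_sec(hz, ncpus=32)
    busy = lock_busy_fraction(hz, ncpus=32, hold_us=1.0)
    print(f"HZ={hz}, 32 CPUs: {rate} acquisitions/s, lock busy {busy:.2%}")
```

The product HZ x NCPUS is the point. Going from HZ=100 to HZ=1000 multiplies the tick-driven traffic on the shared lock by ten at any CPU count. Adding CPUs raises that traffic further, which also raises the chance that ticks on different CPUs contend.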
Kris --mYCpIKhGyMATD0i+ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (FreeBSD) iD8DBQFEjc9wWry0BWjoQKURAoX7AKD3jrbSgbmpMEQibSGwucYvLxt9aACg3Y/i 5SbAlN+kIKUkkGdkZ3genJs= =+GDa -----END PGP SIGNATURE----- --mYCpIKhGyMATD0i+-- From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 23:16:02 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from localhost.my.domain (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id B5F5516A418; Mon, 12 Jun 2006 23:16:01 +0000 (UTC) (envelope-from davidxu@freebsd.org) From: David Xu To: freebsd-performance@freebsd.org Date: Tue, 13 Jun 2006 07:15:52 +0800 User-Agent: KMail/1.8.2 References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> In-Reply-To: <20060612203248.GA72885@xor.obsecurity.org> MIME-Version: 1.0 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200606130715.52425.davidxu@freebsd.org> Cc: danial_thom@yahoo.com, Scott Long , Robert Watson , Kris Kennaway Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 23:16:02 -0000 On Tuesday 13 June 2006 04:32, Kris Kennaway wrote: > On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote: > > On Mon, 12 Jun 2006, Scott Long wrote: > > >I run a number of high-load production systems that do a lot of network > > >and filesystem activity, all with HZ set to 100. 
It has also been shown > > >in the past that certain things in the network area where not fixed to > > >deal with a high HZ value, so it's possible that it's even more > > >stable/reliable with an HZ value of 100. > > > > > >My personal opinion is that HZ should gop back down to 100 in 7-CURRENT > > >immediately, and only be incremented back up when/if it's proven to be > > > the right thing to do. And, I say that as someone who (errantly) pushed > > > for the increase to 1000 several years ago. > > > > I think it's probably a good idea to do it sooner rather than later. It > > may slightly negatively impact some services that rely on frequent timers > > to do things like retransmit timing and the like. But I haven't done any > > measurements. > > As you know, but for the benefit of the list, restoring HZ=100 is > often an important performance tweak on SMP systems with many CPUs > because of all the sched_lock activity from statclock/hardclock, which > scales with HZ and NCPUS. > > Kris sched_lock is another big bottleneck: if you have 32 CPUs, in theory you should get 32X the context switch throughput, but today you still only get 1X. There is also code abusing sched_lock. The M:N bits dynamically insert a thread into the thread list at context switch time; this is a bug, because it means a proc's thread list has to be protected by the scheduler lock. Delivering a signal to a process then has to hold the scheduler lock while it finds a thread, and if the proc has many threads this introduces long scheduler latency. That a proc lock is not enough to find a thread is also a bug. There is other code abusing the scheduler lock that really could use its own lock.
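The "32X in theory, 1X in practice" point can be sketched as a toy throughput bound. Assume (hypothetically) that every context switch must hold a lock for a fixed time. With a single global lock, switches are serialized regardless of CPU count. With independent per-CPU locks, each CPU can switch in parallel. A second helper models the signal-delivery case as a linear scan of the thread list under the lock, so hold time grows with the number of threads. All names and numbers here are illustrative assumptions.

```python
def max_switch_rate(ncpus: int, hold_us: float, global_lock: bool) -> float:
    """Upper bound on machine-wide context switches per second.

    Toy model (assumption): every switch holds a lock for hold_us
    microseconds. With one global lock the switches are serialized no
    matter how many CPUs there are; with independent per-CPU locks each
    CPU can switch in parallel.
    """
    per_lock = 1_000_000 / hold_us
    return per_lock if global_lock else per_lock * ncpus


def signal_scan_hold_us(nthreads: int, per_thread_us: float) -> float:
    """Lock hold time for a linear scan of a proc's thread list (toy model)."""
    return nthreads * per_thread_us


print(max_switch_rate(32, 2.0, global_lock=True))   # serialized: "1X"
print(max_switch_rate(32, 2.0, global_lock=False))  # per-CPU locks: "32X"
print(signal_scan_hold_us(1000, 0.5))
```

Real systems fall somewhere between these bounds, because per-CPU locks still need cross-CPU coordination for migration and wakeups.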
David Xu From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 23:19:59 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9BA5216A41F; Mon, 12 Jun 2006 23:19:59 +0000 (UTC) (envelope-from arr@watson.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by mx1.FreeBSD.org (Postfix) with ESMTP id 08F3D43D45; Mon, 12 Jun 2006 23:19:54 +0000 (GMT) (envelope-from arr@watson.org) Received: from fledge.watson.org (localhost.watson.org [127.0.0.1]) by fledge.watson.org (8.13.4/8.13.4) with ESMTP id k5CNJqUX043240; Mon, 12 Jun 2006 19:19:52 -0400 (EDT) (envelope-from arr@watson.org) Received: from localhost (arr@localhost) by fledge.watson.org (8.13.4/8.13.4/Submit) with ESMTP id k5CNJqIb043237; Mon, 12 Jun 2006 19:19:52 -0400 (EDT) (envelope-from arr@watson.org) X-Authentication-Warning: fledge.watson.org: arr owned process doing -bs Date: Mon, 12 Jun 2006 19:19:52 -0400 (EDT) From: "Andrew R. 
Reiter" To: David Xu In-Reply-To: <200606130715.52425.davidxu@freebsd.org> Message-ID: <20060612191828.A38957@fledge.watson.org> References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> <200606130715.52425.davidxu@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: Robert Watson , freebsd-performance@freebsd.org, danial_thom@yahoo.com, Scott Long , Kris Kennaway Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 23:19:59 -0000 On Tue, 13 Jun 2006, David Xu wrote: :On Tuesday 13 June 2006 04:32, Kris Kennaway wrote: :> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote: :> > On Mon, 12 Jun 2006, Scott Long wrote: :> > >I run a number of high-load production systems that do a lot of network :> > >and filesystem activity, all with HZ set to 100. It has also been shown :> > >in the past that certain things in the network area where not fixed to :> > >deal with a high HZ value, so it's possible that it's even more :> > >stable/reliable with an HZ value of 100. :> > > :> > >My personal opinion is that HZ should gop back down to 100 in 7-CURRENT :> > >immediately, and only be incremented back up when/if it's proven to be :> > > the right thing to do. And, I say that as someone who (errantly) pushed :> > > for the increase to 1000 several years ago. :> > :> > I think it's probably a good idea to do it sooner rather than later. It :> > may slightly negatively impact some services that rely on frequent timers :> > to do things like retransmit timing and the like. But I haven't done any :> > measurements. 
:> :> As you know, but for the benefit of the list, restoring HZ=100 is :> often an important performance tweak on SMP systems with many CPUs :> because of all the sched_lock activity from statclock/hardclock, which :> scales with HZ and NCPUS. :> :> Kris : :sched_lock is another big bottleneck, since if you 32 CPUs, in theory :you have 32X context switch speed, but now it still has only 1X speed, :and there are code abusing sched_lock, the M:N bits dynamically inserts :a thread into thread list at context switch time, this is a bug, this :causes thread list in a proc has to be protected by scheduler lock, :and delivering a signal to process has to hold scheduler lock and :find a thread, if the proc has many threads, this will introduce :long scheduler latency, a proc lock is not enough to find a thread, :this is a bug, there are other code abusing scheduler lock which :really can use its own lock. : :David Xu Given that different locking bottlenecks seem to appear on systems with different numbers of CPUs, has there been any research into providing "best fit" profiles for N-CPU systems?
Cheers, Andrew -- arr@watson.org From owner-freebsd-performance@FreeBSD.ORG Mon Jun 12 23:21:09 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9CA2016A41B; Mon, 12 Jun 2006 23:21:09 +0000 (UTC) (envelope-from arr@watson.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by mx1.FreeBSD.org (Postfix) with ESMTP id 31B8543D49; Mon, 12 Jun 2006 23:21:09 +0000 (GMT) (envelope-from arr@watson.org) Received: from fledge.watson.org (localhost.watson.org [127.0.0.1]) by fledge.watson.org (8.13.4/8.13.4) with ESMTP id k5CNL8GH043290; Mon, 12 Jun 2006 19:21:08 -0400 (EDT) (envelope-from arr@watson.org) Received: from localhost (arr@localhost) by fledge.watson.org (8.13.4/8.13.4/Submit) with ESMTP id k5CNL8Jh043287; Mon, 12 Jun 2006 19:21:08 -0400 (EDT) (envelope-from arr@watson.org) X-Authentication-Warning: fledge.watson.org: arr owned process doing -bs Date: Mon, 12 Jun 2006 19:21:08 -0400 (EDT) From: "Andrew R. Reiter" To: David Xu In-Reply-To: <20060612191828.A38957@fledge.watson.org> Message-ID: <20060612192015.G38957@fledge.watson.org> References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> <200606130715.52425.davidxu@freebsd.org> <20060612191828.A38957@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: danial_thom@yahoo.com, freebsd-performance@freebsd.org, Robert Watson , Scott Long , Kris Kennaway Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jun 2006 23:21:09 -0000 Sorry to reply to myself ... On Mon, 12 Jun 2006, Andrew R. 
Reiter wrote: :On Tue, 13 Jun 2006, David Xu wrote: : ::On Tuesday 13 June 2006 04:32, Kris Kennaway wrote: ::> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote: ::> > On Mon, 12 Jun 2006, Scott Long wrote: ::> > >I run a number of high-load production systems that do a lot of network ::> > >and filesystem activity, all with HZ set to 100. It has also been shown ::> > >in the past that certain things in the network area where not fixed to ::> > >deal with a high HZ value, so it's possible that it's even more ::> > >stable/reliable with an HZ value of 100. ::> > > ::> > >My personal opinion is that HZ should gop back down to 100 in 7-CURRENT ::> > >immediately, and only be incremented back up when/if it's proven to be ::> > > the right thing to do. And, I say that as someone who (errantly) pushed ::> > > for the increase to 1000 several years ago. ::> > ::> > I think it's probably a good idea to do it sooner rather than later. It ::> > may slightly negatively impact some services that rely on frequent timers ::> > to do things like retransmit timing and the like. But I haven't done any ::> > measurements. ::> ::> As you know, but for the benefit of the list, restoring HZ=100 is ::> often an important performance tweak on SMP systems with many CPUs ::> because of all the sched_lock activity from statclock/hardclock, which ::> scales with HZ and NCPUS. 
::> ::> Kris :: ::sched_lock is another big bottleneck, since if you 32 CPUs, in theory ::you have 32X context switch speed, but now it still has only 1X speed, ::and there are code abusing sched_lock, the M:N bits dynamically inserts ::a thread into thread list at context switch time, this is a bug, this ::causes thread list in a proc has to be protected by scheduler lock, ::and delivering a signal to process has to hold scheduler lock and ::find a thread, if the proc has many threads, this will introduce ::long scheduler latency, a proc lock is not enough to find a thread, ::this is a bug, there are other code abusing scheduler lock which ::really can use its own lock. :: ::David Xu : :Given that it seems that various scenarios for locking bottlenecks can :occur on various systems with different numbers of CPUs. Has there been :any research done on providing "best fit" profiles for varied N cpu :systems? By that I mean that, at compile time, a profile would be selected for the given system, as a best-effort "best fit" of the locking strategy to that system.
: :Cheers, :Andrew : :-- :arr@watson.org :_______________________________________________ :freebsd-performance@freebsd.org mailing list :http://lists.freebsd.org/mailman/listinfo/freebsd-performance :To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" : : -- arr@watson.org From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 10:01:11 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6AC2A16A41B; Tue, 13 Jun 2006 10:01:11 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id 03E8F43D46; Tue, 13 Jun 2006 10:01:10 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 993A046C43; Tue, 13 Jun 2006 06:01:10 -0400 (EDT) Date: Tue, 13 Jun 2006 11:01:10 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: David Xu In-Reply-To: <200606130715.52425.davidxu@freebsd.org> Message-ID: <20060613105930.N34121@fledge.watson.org> References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> <200606130715.52425.davidxu@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, kmacy@FreeBSD.org, danial_thom@yahoo.com, Scott Long , Kris Kennaway Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 10:01:11 -0000 On Tue, 13 Jun 2006, David Xu wrote: > On Tuesday 13 June 2006 04:32, Kris Kennaway wrote: 
>> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote: >>> On Mon, 12 Jun 2006, Scott Long wrote: >>>> I run a number of high-load production systems that do a lot of network >>>> and filesystem activity, all with HZ set to 100. It has also been shown >>>> in the past that certain things in the network area where not fixed to >>>> deal with a high HZ value, so it's possible that it's even more >>>> stable/reliable with an HZ value of 100. >>>> >>>> My personal opinion is that HZ should gop back down to 100 in 7-CURRENT >>>> immediately, and only be incremented back up when/if it's proven to be >>>> the right thing to do. And, I say that as someone who (errantly) pushed >>>> for the increase to 1000 several years ago. >>> >>> I think it's probably a good idea to do it sooner rather than later. It >>> may slightly negatively impact some services that rely on frequent timers >>> to do things like retransmit timing and the like. But I haven't done any >>> measurements. >> >> As you know, but for the benefit of the list, restoring HZ=100 is often an >> important performance tweak on SMP systems with many CPUs because of all >> the sched_lock activity from statclock/hardclock, which scales with HZ and >> NCPUS. > > sched_lock is another big bottleneck, since if you 32 CPUs, in theory you > have 32X context switch speed, but now it still has only 1X speed, and there > are code abusing sched_lock, the M:N bits dynamically inserts a thread into > thread list at context switch time, this is a bug, this causes thread list > in a proc has to be protected by scheduler lock, and delivering a signal to > process has to hold scheduler lock and find a thread, if the proc has many > threads, this will introduce long scheduler latency, a proc lock is not > enough to find a thread, this is a bug, there are other code abusing > scheduler lock which really can use its own lock. I've added Kip Macy to the CC; he is working on a patch for Sun4v that eliminates sched_lock.
Maybe he can comment some more on this thread? Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 15:24:54 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F0E7616A47F for ; Tue, 13 Jun 2006 15:24:54 +0000 (UTC) (envelope-from danial_thom@yahoo.com) Received: from web33306.mail.mud.yahoo.com (web33306.mail.mud.yahoo.com [68.142.206.121]) by mx1.FreeBSD.org (Postfix) with SMTP id 603AB43D81 for ; Tue, 13 Jun 2006 15:24:20 +0000 (GMT) (envelope-from danial_thom@yahoo.com) Received: (qmail 87826 invoked by uid 60001); 13 Jun 2006 15:24:10 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=tnbzVcKCD7YcmQwTOqF4Mn8m9RmHiSGMm9+kw8T9xKtLl2CqyGq9sNQ77OSOWqH31PcFR4yANPZexBYdgh3gPAal6t6lDf/h/42Fa3y9oIFchBcL9tAs9mXGWIPXr4S0IazPdGsPBW66QfbFmK90PX3a5RWhc6hN7dOEvTXeFEI= ; Message-ID: <20060613152410.87824.qmail@web33306.mail.mud.yahoo.com> Received: from [65.34.182.15] by web33306.mail.mud.yahoo.com via HTTP; Tue, 13 Jun 2006 08:24:10 PDT Date: Tue, 13 Jun 2006 08:24:10 -0700 (PDT) From: Danial Thom To: Robert Watson , David Xu In-Reply-To: <20060613105930.N34121@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-performance@freebsd.org Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: danial_thom@yahoo.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 15:24:55 -0000 I'm sorry if I missed it, but I don't believe anyone answered this question: >Lastly, is
there a utility similar to cpustat in >DragonflyBSD which shows the per-cpu usage >stats? I need to gauge the efficiency of SMP for a particular application, and also have some way of measuring the effects of code changes. Thanks, DT From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 15:30:26 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 302A716A478; Tue, 13 Jun 2006 15:30:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9810A43D77; Tue, 13 Jun 2006 15:30:24 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 86B4746B0F; Tue, 13 Jun 2006 11:30:13 -0400 (EDT) Date: Tue, 13 Jun 2006 16:30:13 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Danial Thom In-Reply-To: <20060613152410.87824.qmail@web33306.mail.mud.yahoo.com> Message-ID: <20060613162933.U88691@fledge.watson.org> References: <20060613152410.87824.qmail@web33306.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 15:30:26 -0000 On Tue, 13 Jun 2006, Danial Thom wrote: > I'm sorry if I missed it, but I don't believe anyone answered this question: > >> Lastly, is there a utility similar to cpustat in > 
>> DragonflyBSD which shows the per-cpu usage stats? > > I need to gauge the efficiency of SMP for a particular application, and also > have some way of measuring the effects of code changes. I didn't answer it because I don't know what output cpustat provides. What output does cpustat provide on DragonflyBSD? Robert N M Watson Computer Laboratory Universty of Cambridge From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 16:08:16 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 25C8916A41A for ; Tue, 13 Jun 2006 16:08:16 +0000 (UTC) (envelope-from danial_thom@yahoo.com) Received: from web33307.mail.mud.yahoo.com (web33307.mail.mud.yahoo.com [68.142.206.122]) by mx1.FreeBSD.org (Postfix) with SMTP id 1695943D53 for ; Tue, 13 Jun 2006 16:08:15 +0000 (GMT) (envelope-from danial_thom@yahoo.com) Received: (qmail 41748 invoked by uid 60001); 13 Jun 2006 16:08:14 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=28FpI0xpG4cnO8SAQ9qoQNfJbW7F0gd4Jq9l8bOYTQe7WYuxFCPOd0f15OQpkla4PLxNN6PKIiQyWf+8rFqsWD2tyZEitJYOInYpcJDKahwMQzgdcn6vjqye+7TThg5Reqv4Atj4XtRlPd78Q1OY2vT7ADxv+opJbw3MDL7p9hY= ; Message-ID: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com> Received: from [65.34.182.15] by web33307.mail.mud.yahoo.com via HTTP; Tue, 13 Jun 2006 09:08:14 PDT Date: Tue, 13 Jun 2006 09:08:14 -0700 (PDT) From: Danial Thom To: Robert Watson In-Reply-To: <20060613162933.U88691@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: 
danial_thom@yahoo.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 16:08:16 -0000 --- Robert Watson wrote: > > On Tue, 13 Jun 2006, Danial Thom wrote: > > > I'm sorry if I missed it, but I don't believe > anyone answered this question: > > > >> Lastly, is there a utility similar to > cpustat in > > > >> DragonflyBSD which shows the per-cpu usage > stats? > > > > I need to gauge the efficiency of SMP for a > particular application, and also > > have some way of measuring the effects of > code changes. > > I didn't answer it because I don't know what > output cpustat provides. What > output does cpustat provide on DragonflyBSD?

It's a simple output such as:

CPU-0 state: 14.00% user, 0.00% nice, 2.00% sys, 6.00% intr, 78.00% idle
CPU-1 state: 4.00% user, 0.00% nice, 17.00% sys, 2.00% intr, 77.00% idle

Of course, HP-UX-style output for top would be ideal:

Load averages: 0.27, 0.28, 0.28
203 processes: 186 sleeping, 17 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
  0   0.05   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
  1   0.92   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
  2   0.03   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
  3   0.08   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.27   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%

What is the plan for FreeBSD, as I don't see that top shows any distribution among CPUs?

DT
From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 16:31:06 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7D45F16A473; Tue, 13 Jun 2006 16:31:06 +0000 (UTC) (envelope-from tbyte@otel.net) Received: from mail.otel.net (gw3.OTEL.net [212.36.8.151]) by mx1.FreeBSD.org (Postfix) with ESMTP id ECD1A43D48; Tue, 13 Jun 2006 16:31:05 +0000 (GMT) (envelope-from tbyte@otel.net) Received: from dragon.otel.net ([212.36.8.135]) by mail.otel.net with esmtp (Exim 4.62 (FreeBSD)) (envelope-from ) id 1FqBnD-000DTf-BK; Tue, 13 Jun 2006 19:31:03 +0300 From: Iasen Kostov To: danial_thom@yahoo.com In-Reply-To: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com> References: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com> Content-Type: text/plain Date: Tue, 13 Jun 2006 19:31:02 +0300 Message-Id: <1150216262.81055.0.camel@DraGoN.OTEL.net> Mime-Version: 1.0 X-Mailer: Evolution 2.6.2 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: freebsd-performance@freebsd.org, Robert Watson , David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 16:31:06 -0000 On Tue, 2006-06-13 at 09:08 -0700, Danial Thom wrote: > > --- Robert Watson wrote: > > > > > On Tue, 13 Jun 2006, Danial Thom wrote: > > > > > I'm sorry if I missed it, but I don't believe > > anyone answered this question: > > > > > >> Lastly, is there a utility similar to > > cpustat in > > > > > >> DragonflyBSD which shows the per-cpu usage > > stats?
> > > > > > I need to gauge the efficiency of SMP for a > > particular application, and also > > > have some way of measuring the effects of > > code changes. > > > > I didn't answer it because I don't know what > > output cpustat provides. What > > output does cpustat provide on DragonflyBSD? > > Its a simple output such as: > > CPU-0 state: 14.00% user, 0.00% nice, 2.00% > sys, 6.00% intr, 78.00% idle > CPU-1 state: 4.00% user, 0.00% nice, 17.00% > sys, 2.00% intr, 77.00% idle > > Of course, hp-ux type output for top would be > ideal: > > Load averages: 0.27, 0.28, 0.28 > 203 processes: 186 sleeping, 17 running > Cpu states: > CPU LOAD USER NICE SYS IDLE BLOCK > SWAIT INTR SSYS > 0 0.05 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > 1 0.92 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > 2 0.03 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > 3 0.08 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > --- ---- ----- ----- ----- ----- ----- > ----- ----- ----- > avg 0.27 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > > What is the plan for FreeBSD, as I don't see that > top shows any distribution among cpus? 
>
You've probably missed the -S option:

last pid: 37969; load averages: 1.85, 1.92, 2.20 up 1+02:28:38 19:29:53
336 processes: 9 running, 311 sleeping, 1 zombie, 15 waiting
CPU states: 25.0% user, 1.5% nice, 20.6% system, 1.5% interrupt, 51.5% idle
Mem: 1945M Active, 2793M Inact, 1008M Wired, 307M Cache, 214M Buf, 1690M Free
Swap: 4096M Total, 408K Used, 4095M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   14 root        1 171   52     0K    16K RUN    0  19.2H 62.99% idle: cpu0
   13 root        1 171   52     0K    16K RUN    1 810:43 61.77% idle: cpu1
   11 root        1 171   52     0K    16K RUN    3  17.6H 61.52% idle: cpu3
   12 root        1 171   52     0K    16K RUN    2 931:34 60.99% idle: cpu2

From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 16:57:53 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 21F2616A476; Tue, 13 Jun 2006 16:57:53 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id AB71143D58; Tue, 13 Jun 2006 16:57:52 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 4C4AF46BBA; Tue, 13 Jun 2006 12:57:52 -0400 (EDT) Date: Tue, 13 Jun 2006 17:57:52 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Danial Thom In-Reply-To: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com> Message-ID: <20060613175531.S26068@fledge.watson.org> References: <20060613160814.41746.qmail@web33307.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 16:57:53 -0000 On Tue, 13 Jun 2006, Danial Thom wrote: >> I didn't answer it because I don't know what output cpustat provides. What >> output does cpustat provide on DragonflyBSD? > > Its a simple output such as: > > CPU-0 state: 14.00% user, 0.00% nice, 2.00% > sys, 6.00% intr, 78.00% idle > CPU-1 state: 4.00% user, 0.00% nice, 17.00% > sys, 2.00% intr, 77.00% idle > > Of course, hp-ux type output for top would be > ideal: > > Load averages: 0.27, 0.28, 0.28 > 203 processes: 186 sleeping, 17 running > Cpu states: > CPU LOAD USER NICE SYS IDLE BLOCK > SWAIT INTR SSYS > 0 0.05 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > 1 0.92 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > 2 0.03 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > 3 0.08 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > --- ---- ----- ----- ----- ----- ----- > ----- ----- ----- > avg 0.27 0.0% 0.0% 0.0% 100.0% 0.0% > 0.0% 0.0% 0.0% > > What is the plan for FreeBSD, as I don't see that top shows any distribution > among cpus? top displays some CPU information, especially with -S which shows you the level of activity for the idle thread on each CPU. The above looks useful, and should be fairly easy to add. I've been thinking about adding a few new pages to systat output: - Kernel memory allocator stats, based on memstat/memtop (and similar to what vmstat -z and vmstat -m show). - CPU statistics such as the above. I think there are some patches floating around already that gather per-cpu cp_time measurements, but Kris has commented to me that they reduce performance somewhat, so I'll have to investigate some. That may be a caching effect of some sort. 
Robert N M Watson Computer Laboratory Universty of Cambridge From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 18:23:42 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 723B416A41B for ; Tue, 13 Jun 2006 18:23:42 +0000 (UTC) (envelope-from danial_thom@yahoo.com) Received: from web33304.mail.mud.yahoo.com (web33304.mail.mud.yahoo.com [68.142.206.119]) by mx1.FreeBSD.org (Postfix) with SMTP id 13E3743D46 for ; Tue, 13 Jun 2006 18:23:42 +0000 (GMT) (envelope-from danial_thom@yahoo.com) Received: (qmail 76246 invoked by uid 60001); 13 Jun 2006 18:23:28 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=b0o1m8kfJsLzMLLyQFqu5hTpY1SnEYfMrp/2nOTCSwpf+zGNV5DEIvZvfpUMJ6M/kvJ0oAQ72GmGxUDQw6UvkiK8t2H4EGRTag9hVZVb5KNIAIxjHv5MHTV0CALsVXUzc8ek8VZsK4GGTxUepjxVftBvrGIkW4YOyfUkpzELpSs= ; Message-ID: <20060613182328.76244.qmail@web33304.mail.mud.yahoo.com> Received: from [65.34.182.15] by web33304.mail.mud.yahoo.com via HTTP; Tue, 13 Jun 2006 11:23:28 PDT Date: Tue, 13 Jun 2006 11:23:28 -0700 (PDT) From: Danial Thom To: Robert Watson In-Reply-To: <20060613175531.S26068@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: danial_thom@yahoo.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 18:23:42 -0000 --- Robert Watson wrote: > On Tue, 13 Jun 2006, Danial Thom wrote: > > >> I didn't answer it because I don't know what > output cpustat 
provides. What > >> output does cpustat provide on DragonflyBSD? > > > > Its a simple output such as: > > > > CPU-0 state: 14.00% user, 0.00% nice, > 2.00% > > sys, 6.00% intr, 78.00% idle > > CPU-1 state: 4.00% user, 0.00% nice, > 17.00% > > sys, 2.00% intr, 77.00% idle > > > > Of course, hp-ux type output for top would be > > ideal: > > > > Load averages: 0.27, 0.28, 0.28 > > 203 processes: 186 sleeping, 17 running > > Cpu states: > > CPU LOAD USER NICE SYS IDLE BLOCK > > SWAIT INTR SSYS > > 0 0.05 0.0% 0.0% 0.0% 100.0% 0.0% > > 0.0% 0.0% 0.0% > > 1 0.92 0.0% 0.0% 0.0% 100.0% 0.0% > > 0.0% 0.0% 0.0% > > 2 0.03 0.0% 0.0% 0.0% 100.0% 0.0% > > 0.0% 0.0% 0.0% > > 3 0.08 0.0% 0.0% 0.0% 100.0% 0.0% > > 0.0% 0.0% 0.0% > > --- ---- ----- ----- ----- ----- ----- > > ----- ----- ----- > > avg 0.27 0.0% 0.0% 0.0% 100.0% 0.0% > > 0.0% 0.0% 0.0% > > > > What is the plan for FreeBSD, as I don't see > that top shows any distribution > > among cpus? > > top displays some CPU information, especially > with -S which shows you the > level of activity for the idle thread on each > CPU. The above looks useful, > and should be fairly easy to add. I've been > thinking about adding a few new > pages to systat output: > > - Kernel memory allocator stats, based on > memstat/memtop (and similar to what > vmstat -z and vmstat -m show). > - CPU statistics such as the above. > > I think there are some patches floating around > already that gather per-cpu > cp_time measurements, but Kris has commented to > me that they reduce > performance somewhat, so I'll have to > investigate some. That may be a caching > effect of some sort. Maybe someone can explain this output. The top line shows 99.6% idle. Is it just showing CPU 0's stats on the top line?
last pid: 705; load averages: 0.06, 0.02, 0.00 up 0+00:29:36 14:22:42
69 processes: 3 running, 48 sleeping, 18 waiting
CPU states: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
Mem: 8160K Active, 8108K Inact, 17M Wired, 9712K Buf, 461M Free
Swap: 512M Total, 512M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root        1 171   52     0K     8K RUN    1  28:58 98.97% idle: cpu1
   12 root        1 171   52     0K     8K CPU0   0  27:34 77.64% idle: cpu0
   23 root        1 -68 -187     0K     8K WAIT   0   1:07 17.14% irq21: em1

From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 18:36:24 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2EDF616A41A; Tue, 13 Jun 2006 18:36:24 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id AE1A643D46; Tue, 13 Jun 2006 18:36:23 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id C748846BD3; Tue, 13 Jun 2006 14:36:21 -0400 (EDT) Date: Tue, 13 Jun 2006 19:36:21 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Danial Thom In-Reply-To: <20060613182328.76244.qmail@web33304.mail.mud.yahoo.com> Message-ID: <20060613193040.O26068@fledge.watson.org> References: <20060613182328.76244.qmail@web33304.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 18:36:24 -0000 On Tue, 13 Jun 2006, Danial Thom wrote: > Maybe someone can explain this output. The top line shows 99.6%idle. Is it > just showing CPU 0s stats on the top line? Two types of measurements are taken: sampled ticks recording whether the system as a whole is in {user, nice, system, intr, idle}, and separate sampling for individual processes. Right now, the system measurements are kept in a simple array of tick counters called cp_time. John Baldwin and others have changes that make these tick counters per-CPU. The lines at the top of top(1)'s output are derived from those tick counters. Ticks are measured on each CPU but accumulated into that one array, so those lines are a summary across all CPUs. To add cpustat support, we need to merge John's patch to make cp_time per-CPU (i.e., different counters for different CPUs) and teach the userland tools to retrieve them. When you run top you'll notice that it adjusts the measurements each refresh. In effect, it samples the change in tick counts over the window: it pulls down the new values and calculates the percentage of ticks that fell into each "bucket" during the last window.
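A minimal sketch of the delta sampling described above, assuming per-CPU tick counters of the kind John's patch would expose. The two-CPU layout, the snapshot values and the helper names `percentages`, `summed` and `cpustat_line` are all hypothetical and are not FreeBSD code. Each refresh subtracts the previous snapshot from the current one and reports each bucket's share of the ticks in that window; summing the per-CPU arrays first gives the single system-wide line that one global cp_time array provides.

```python
# Bucket order follows the {user, nice, system, intr, idle} states above.
STATES = ("user", "nice", "sys", "intr", "idle")


def percentages(before, after):
    """Share of ticks that landed in each bucket between two snapshots."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        # No ticks elapsed in this window; avoid dividing by zero.
        return [0.0] * len(deltas)
    return [100.0 * d / total for d in deltas]


def summed(per_cpu):
    """Collapse per-CPU counters into one system-wide array, bucket by bucket."""
    return [sum(column) for column in zip(*per_cpu)]


def cpustat_line(label, pct):
    """Format one line in the style of the DragonflyBSD cpustat output."""
    fields = ", ".join(f"{p:.2f}% {name}" for p, name in zip(pct, STATES))
    return f"{label} state: {fields}"


# Hypothetical tick-counter snapshots for two CPUs, one refresh apart.
t0 = [[1000, 0, 200, 100, 8700], [500, 0, 300, 50, 9150]]
t1 = [[1014, 0, 202, 106, 8778], [504, 0, 317, 52, 9227]]

for cpu, (a, b) in enumerate(zip(t0, t1)):
    print(cpustat_line(f"CPU-{cpu}", percentages(a, b)))

# What a single shared cp_time would report: one summary for both CPUs.
print(cpustat_line("total", percentages(summed(t0), summed(t1))))
```

With these made-up snapshots the per-CPU lines reproduce the cpustat example quoted earlier in the thread, while the summed line reports 77.5% idle, the average of the two CPUs. That is why a single summary line cannot show how the load is distributed.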
Robert N M Watson Computer Laboratory Universty of Cambridge From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 18:43:40 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1A63216A47B for ; Tue, 13 Jun 2006 18:43:40 +0000 (UTC) (envelope-from danial_thom@yahoo.com) Received: from web33314.mail.mud.yahoo.com (web33314.mail.mud.yahoo.com [68.142.206.129]) by mx1.FreeBSD.org (Postfix) with SMTP id 1D05B43D48 for ; Tue, 13 Jun 2006 18:43:37 +0000 (GMT) (envelope-from danial_thom@yahoo.com) Received: (qmail 60231 invoked by uid 60001); 13 Jun 2006 18:43:36 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=DyK7tmlOn0MN29p52zFolf4agqne+9Xi5q4ilu2ic25Yg1fFrVAj6uYFNaJ2op1xqiA9w2JUiFZ/RyXGHCEmXqte39QyyzU+Qq0pWfSd/OTPjQ1vCYKZa41g5uWno3Z3rEKQC9XxbeYhN+c/Y5X6EVVYGpMBq61GrBIav7fZXVU= ; Message-ID: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com> Received: from [65.34.182.15] by web33314.mail.mud.yahoo.com via HTTP; Tue, 13 Jun 2006 11:43:36 PDT Date: Tue, 13 Jun 2006 11:43:36 -0700 (PDT) From: Danial Thom To: Robert Watson In-Reply-To: <20060613193040.O26068@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: danial_thom@yahoo.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 18:43:40 -0000 --- Robert Watson wrote: > > On Tue, 13 Jun 2006, Danial Thom wrote: > > > Maybe someone can explain this output. 
The > top line shows 99.6%idle. Is it > > just showing CPU 0s stats on the top line? > > Two types of measurements are taken: sampled > ticks regarding whether the > system as a while is in {user, nice, system, > intr, idle}, and then sampling > for individual processes. Right now, the > system measurements are kept in a > simple array of tick counters called cp_time. > John Baldwin and others have > changes that make these tick counters per-CPU. > The lines at the top of > top(1)'s output are derived from those tick > counters. Ticks are measured on > each CPU, so those are a summary across all > CPUs. To add cpustat support, we > need to merge John's patch to make cp_time > per-CPU (ie., different counters > for different CPUs) and teach the userland > tools to retrieve them. When you > run top you'll notice that it adjusts the > measurements each refresh. In > effect, what it's doing is sampling the change > in tick counts over the window, > pulling down the new values and calculating the > percentages of ticks in each > "bucket" in the last window. That doesn't explain why the top line shows 99.6% idle, but the CPU idle threads are showing significant usage. I'm getting a constant 6000 interrupts per second on my em controller, yet top jumps all over the place: it sits at 99% idle for 10 seconds, then jumps to 50%, then lands somewhere in between. It seems completely unreliable. The load I'm applying is constant. DT
From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 19:01:42 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4B70816A474; Tue, 13 Jun 2006 19:01:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id CF86543D45; Tue, 13 Jun 2006 19:01:41 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 638AC46C0F; Tue, 13 Jun 2006 15:01:40 -0400 (EDT) Date: Tue, 13 Jun 2006 20:01:40 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Danial Thom In-Reply-To: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com> Message-ID: <20060613195113.T26068@fledge.watson.org> References: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 19:01:42 -0000 On Tue, 13 Jun 2006, Danial Thom wrote: >> Two types of measurements are taken: sampled ticks regarding whether the >> system as a while is in {user, nice, system, intr, idle}, and then sampling >> for individual processes. Right now, the system measurements are kept in a >> simple array of tick counters called cp_time. John Baldwin and others have >> changes that make these tick counters per-CPU. The lines at the top of >> top(1)'s output are derived from those tick counters.
Ticks are measured >> on each CPU, so those are a summary across all CPUs. To add cpustat >> support, we need to merge John's patch to make cp_time per-CPU (ie., >> different counters for different CPUs) and teach the userland tools to >> retrieve them. When you run top you'll notice that it adjusts the >> measurements each refresh. In effect, what it's doing is sampling the >> change in tick counts over the window, pulling down the new values and >> calculating the percentages of ticks in each "bucket" in the last window. > > That doesn't explain why the Top line shows 99.6% idle, but the cpu idle > threads are showing significant usage. > > I'm getting a constant 6000 Interrupts / Second on my em controller, yet top > jumps all over the place; sitting at 99% idle for 10 seconds, then jumping > to 50%, then somewhere in between. It seems completely unreliable. The load > I'm applying is constant. I can't speak to the details of the thread/process use sampling model. top uses something called the "weighted cpu percentage" by default; you can switch to "unweighted" using the -C argument. The top documentation fails to document the semantics of the percentages, but I suspect -C will give you more of what you expect. The weighted CPU measurement takes into account process history, so it takes a while for a sudden spike in CPU use to be fully reflected, and you may see seemingly counter-intuitive results, such as the appearance of greater than 100% CPU use. Try out -C and see whether the output makes more sense.
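To illustrate why a history-weighted figure lags a sudden change, here is a generic exponentially decaying average. It is a simplified model, not the actual computation in top(1) or the kernel: the per-second decay factor exp(-1/20) is an assumption modeled on the `ccpu` constant used by the 4BSD scheduler, other schedulers and versions may differ, and `decayed_average` is a hypothetical helper.

```python
import math

# Assumed per-second decay factor, modeled on 4BSD's ccpu = exp(-1/20).
DECAY = math.exp(-1 / 20)


def decayed_average(samples, start=0.0):
    """Fold per-second usage fractions (0.0 to 1.0) into a decaying average.

    Each second the old value keeps DECAY of its weight and the newest
    sample contributes the remaining (1 - DECAY). Returns the running
    average after each sample.
    """
    avg = start
    history = []
    for usage in samples:
        avg = avg * DECAY + (1 - DECAY) * usage
        history.append(avg)
    return history


# A thread goes from idle to 100% busy and stays there for 60 seconds.
history = decayed_average([1.0] * 60)
for second in (1, 10, 20, 60):
    print(f"after {second:2d}s: {100 * history[second - 1]:5.1f}%")
```

Under this model the average after t seconds of full load is 1 - exp(-t/20), so it reads about 4.9% after one second, about 63% after 20 seconds and only about 95% after a full minute.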
Robert N M Watson Computer Laboratory Universty of Cambridge From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 19:04:24 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1503016A41B; Tue, 13 Jun 2006 19:04:24 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 677EA43D49; Tue, 13 Jun 2006 19:04:20 +0000 (GMT) (envelope-from scottl@samsco.org) Received: from [10.10.3.185] ([69.15.205.254]) (authenticated bits=0) by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k5DJ4A1S068353; Tue, 13 Jun 2006 13:04:16 -0600 (MDT) (envelope-from scottl@samsco.org) Message-ID: <448F0C20.3090800@samsco.org> Date: Tue, 13 Jun 2006 13:04:00 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206 X-Accept-Language: en-us, en MIME-Version: 1.0 To: danial_thom@yahoo.com References: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com> In-Reply-To: <20060613184336.60229.qmail@web33314.mail.mud.yahoo.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed version=3.1.1 X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org Cc: freebsd-performance@freebsd.org, Robert Watson , David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 19:04:24 -0000 Danial Thom wrote: > > --- Robert Watson wrote: > > >>On Tue, 13 Jun 2006, Danial Thom wrote: >> >> >>>Maybe someone can explain this output. The >> >>top line shows 99.6%idle. 
Is it >> >>>just showing CPU 0s stats on the top line? >> >>Two types of measurements are taken: sampled >>ticks regarding whether the >>system as a while is in {user, nice, system, >>intr, idle}, and then sampling >>for individual processes. Right now, the >>system measurements are kept in a >>simple array of tick counters called cp_time. >>John Baldwin and others have >>changes that make these tick counters per-CPU. >>The lines at the top of >>top(1)'s output are derived from those tick >>counters. Ticks are measured on >>each CPU, so those are a summary across all >>CPUs. To add cpustat support, we >>need to merge John's patch to make cp_time >>per-CPU (ie., different counters >>for different CPUs) and teach the userland >>tools to retrieve them. When you >>run top you'll notice that it adjusts the >>measurements each refresh. In >>effect, what it's doing is sampling the change >>in tick counts over the window, >>pulling down the new values and calculating the >>percentages of ticks in each >>"bucket" in the last window. > > > That doesn't explain why the Top line shows 99.6% > idle, but the cpu idle threads are showing > significant usage. > > I'm getting a constant 6000 Interrupts / Second > on my em controller, yet top jumps all over the > place; sitting at 99% idle for 10 seconds, then > jumping to 50%, then somewhere in between. It > seems completely unreliable. The load I'm > applying is constant. > > DT Be aware that there was a significant change made to if_em in 7-CURRENT in Jan 2006 to improve load performance. It'll probably get backported for 6.2, but you might consider looking at it before you make up your mind on 6.1 performance. 
Sscott From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 19:48:48 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5F92116A479 for ; Tue, 13 Jun 2006 19:48:48 +0000 (UTC) (envelope-from danial_thom@yahoo.com) Received: from web33313.mail.mud.yahoo.com (web33313.mail.mud.yahoo.com [68.142.206.128]) by mx1.FreeBSD.org (Postfix) with SMTP id 2E7B643D46 for ; Tue, 13 Jun 2006 19:48:47 +0000 (GMT) (envelope-from danial_thom@yahoo.com) Received: (qmail 32771 invoked by uid 60001); 13 Jun 2006 19:48:46 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=TYFfr2lgZcAFxUWRyfXro/KwbhB7GRW2x7qJkiW+cR7/jFUgmtzxw3A2aLEVUG+EHozo/zo9YdC8CbRakS8vqTrPZsGCznQp4w/CDks+iyL6H7a9NY9E4aLYYAZ0dtHOCYc2qEX9vWto/gkH0z+PQEsLC6syhYMcP9MIKWRWDZ8= ; Message-ID: <20060613194846.32769.qmail@web33313.mail.mud.yahoo.com> Received: from [65.34.182.15] by web33313.mail.mud.yahoo.com via HTTP; Tue, 13 Jun 2006 12:48:46 PDT Date: Tue, 13 Jun 2006 12:48:46 -0700 (PDT) From: Danial Thom To: Robert Watson In-Reply-To: <20060613195113.T26068@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-performance@freebsd.org, David Xu Subject: Re: Initial 6.1 questions X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: danial_thom@yahoo.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jun 2006 19:48:48 -0000 --- Robert Watson wrote: > > On Tue, 13 Jun 2006, Danial Thom wrote: > > >> Two types of measurements are taken: sampled > ticks regarding whether the > >> system as a while is in {user, nice, 
system, > intr, idle}, and then sampling > >> for individual processes. Right now, the > system measurements are kept in a > >> simple array of tick counters called > cp_time. John Baldwin and others have > >> changes that make these tick counters > per-CPU. The lines at the top of > >> top(1)'s output are derived from those tick > counters. Ticks are measured > >> on each CPU, so those are a summary across > all CPUs. To add cpustat > >> support, we need to merge John's patch to > make cp_time per-CPU (ie., > >> different counters for different CPUs) and > teach the userland tools to > >> retrieve them. When you run top you'll > notice that it adjusts the > >> measurements each refresh. In effect, what > it's doing is sampling the > >> change in tick counts over the window, > pulling down the new values and > >> calculating the percentages of ticks in each > "bucket" in the last window. > > > > That doesn't explain why the Top line shows > 99.6% idle, but the cpu idle > > threads are showing significant usage. > > > > I'm getting a constant 6000 Interrupts / > Second on my em controller, yet top > > jumps all over the place; sitting at 99% idle > for 10 seconds, then jumping > > to 50%, then somewhere in between. It seems > completely unreliable. The load > > I'm applying is constant. > > I can't speak to the details of the > thread/process use sampling model. Top > uses something called the "weighted cpu > percentage" by default; you can switch > to "unweighted" using the -C argument. The top > documentation fails to > document the semantics of the percentages, but > I suspect -C will give you more > of what you expect. The weighted CPU > measurement takes into account process > history, so it takes a while for sudden spike > in CPU use to be fully > reflected, and you may see seemingly > counter-intuitive results, such as the > appearance of greater than 100% CPU use. Try > out -C and see if you see > something that makes more sense? 
> It seems to work just fine with 1 CPU. It's equally useless with the -C option in SMP mode. Here's a snip from 'systat -vmstat 1':

Proc:r p d s w Csw Trp Sys Int Sof Flt cow 10009 total
24 18353 1 129 156k 1 17108 wire 6: fdc0
7908 act 14: ata
0.4%Sys 0.4%Intr 0.0%User 0.0%Nice 99.2%Idl 7236 inact 20: em0
| | | | | | | | | | cache 6000 21: em1
473456 free 5 24: bge

6000 interrupts per second and .4% interrupt usage. Clearly the tools don't work at all in SMP mode. I don't see how you can do development without measurement tools that work. DT
From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 19:57:40 2006 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3CA4816A477 for ; Tue, 13 Jun 2006 19:57:40 +0000 (UTC) (envelope-from danial_thom@yahoo.com) Received: from web33310.mail.mud.yahoo.com (web33310.mail.mud.yahoo.com [68.142.206.125]) by mx1.FreeBSD.org (Postfix) with SMTP id 246CB43D5A for ; Tue, 13 Jun 2006 19:57:39 +0000 (GMT) (envelope-from danial_thom@yahoo.com) Received: (qmail 64421 invoked by uid 60001); 13 Jun 2006 19:57:38 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=CnDIE4AXYbMBBGrRd1L5jDg/cDGBNb6ecAPY+h27cCc/fi/JUPm9t2gKo9HvA37+GzUUkS/QmGmERFaD+7kKZF23N3jmSZWadlXQnmpZ4KXzimVeHJl1suVcLCEpebQaKP1/gQ2d1dRpNOZKSBvNXGaPkrE5jTrFCStxmKrcszk= ; Message-ID: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com> Received: from [65.34.182.15] by web33310.mail.mud.yahoo.com via HTTP; Tue, 13 Jun 2006 12:57:38 PDT Date: Tue, 13 Jun 2006 12:57:38 -0700 (PDT) From: Danial Thom To: Scott Long
In-Reply-To: <448F0C20.3090800@samsco.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, Robert Watson, David Xu
Subject: Re: Initial 6.1 questions
Reply-To: danial_thom@yahoo.com

--- Scott Long wrote:

> Danial Thom wrote:
> > That doesn't explain why the Top line shows 99.6% idle, but the cpu idle
> > threads are showing significant usage.
> >
> > I'm getting a constant 6000 Interrupts / Second on my em controller, yet
> > top jumps all over the place; sitting at 99% idle for 10 seconds, then
> > jumping to 50%, then somewhere in between. It seems completely unreliable.
> > The load I'm applying is constant.
> >
> > DT
>
> Be aware that there was a significant change made to if_em in 7-CURRENT in
> Jan 2006 to improve load performance. It'll probably get backported for 6.2,
> but you might consider looking at it before you make up your mind on 6.1
> performance.

I can bridge 1 million pps with the em driver in 4.9, and it looks pretty much intact in 6.1, so I'm not too worried about the em driver being the problem here. Plus the measurements look just fine with 1 CPU, and they are completely impossible in SMP mode. So it's reasonable to conclude that the measurement tools simply don't work.

Since everyone agrees that the load measuring tools aren't all that accurate, what criteria were used to determine that the changes made in 7 have the effect that you think they have had?

DT
From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 20:02:00 2006
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4236516A41A; Tue, 13 Jun 2006 20:02:00 +0000 (UTC) (envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9637A43D46; Tue, 13 Jun 2006 20:01:59 +0000 (GMT) (envelope-from scottl@samsco.org)
Received: from [10.10.3.185] ([69.15.205.254]) (authenticated bits=0) by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k5DK1kTT068725; Tue, 13 Jun 2006 14:01:53 -0600 (MDT) (envelope-from scottl@samsco.org)
Message-ID: <448F19A4.8040901@samsco.org>
Date: Tue, 13 Jun 2006 14:01:40 -0600
From: Scott Long
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: danial_thom@yahoo.com
References: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
In-Reply-To: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-performance@freebsd.org, Robert Watson, David Xu
Subject: Re: Initial 6.1 questions

Danial Thom wrote:
> Since everyone agrees that the load measuring tools aren't all that
> accurate, what criteria were used to determine that the changes made in 7
> have the effect that you think they have had?
>
> DT

It was tested with a Smartbits packet generator. The tx rate on the generator was increased in steps until the host started dropping packets or became otherwise unresponsive.

Scott

From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 19:34:54 2006
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 51A6316A41A for ; Tue, 13 Jun 2006 19:34:54 +0000 (UTC) (envelope-from kip.macy@gmail.com)
Received: from nz-out-0102.google.com (nz-out-0102.google.com [64.233.162.207]) by mx1.FreeBSD.org (Postfix) with ESMTP id 067C543D70 for ; Tue, 13 Jun 2006 19:34:45 +0000 (GMT) (envelope-from kip.macy@gmail.com)
Received: by nz-out-0102.google.com with SMTP id 13so1588274nzn for ; Tue, 13 Jun 2006 12:34:45 -0700 (PDT)
Received: by 10.65.59.4 with SMTP id m4mr3359480qbk; Tue, 13 Jun 2006 12:34:44 -0700 (PDT)
Received: by 10.65.231.11 with HTTP; Tue, 13 Jun 2006 12:34:44 -0700
 (PDT)
Message-ID:
Date: Tue, 13 Jun 2006 12:34:44 -0700
From: "Kip Macy"
To: "Robert Watson"
In-Reply-To: <20060613105930.N34121@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> <200606130715.52425.davidxu@freebsd.org> <20060613105930.N34121@fledge.watson.org>
X-Mailman-Approved-At: Tue, 13 Jun 2006 20:51:03 +0000
Cc: Scott Long, kmacy@freebsd.org, David Xu, Kris Kennaway, freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
Reply-To: kmacy@fsmware.com

I have a number of issues with our current locking regime and our propensity for disabling interrupts. I have in mind some ideas for reducing interrupt disabling and eliminating scheduling contention except in the case of one CPU stealing a thread from another CPU's runqueue. I'll try to dash that off early this evening. This should also greatly reduce the overhead of timer interrupts.

-Kip

On 6/13/06, Robert Watson wrote:
>
> On Tue, 13 Jun 2006, David Xu wrote:
>
> > On Tuesday 13 June 2006 04:32, Kris Kennaway wrote:
> >> On Mon, Jun 12, 2006 at 09:08:12PM +0100, Robert Watson wrote:
> >>> On Mon, 12 Jun 2006, Scott Long wrote:
> >>>> I run a number of high-load production systems that do a lot of network
> >>>> and filesystem activity, all with HZ set to 100. It has also been shown
> >>>> in the past that certain things in the network area were not fixed to
> >>>> deal with a high HZ value, so it's possible that it's even more
> >>>> stable/reliable with an HZ value of 100.
> >>>>
> >>>> My personal opinion is that HZ should go back down to 100 in 7-CURRENT
> >>>> immediately, and only be incremented back up when/if it's proven to be
> >>>> the right thing to do. And, I say that as someone who (errantly) pushed
> >>>> for the increase to 1000 several years ago.
> >>>
> >>> I think it's probably a good idea to do it sooner rather than later. It
> >>> may slightly negatively impact some services that rely on frequent timers
> >>> to do things like retransmit timing and the like. But I haven't done any
> >>> measurements.
> >>
> >> As you know, but for the benefit of the list, restoring HZ=100 is often an
> >> important performance tweak on SMP systems with many CPUs because of all
> >> the sched_lock activity from statclock/hardclock, which scales with HZ and
> >> NCPUS.
> >
> > sched_lock is another big bottleneck: if you have 32 CPUs, in theory you
> > have 32X context switch speed, but now it still has only 1X speed. And there
> > is code abusing sched_lock: the M:N bits dynamically insert a thread into the
> > thread list at context switch time. This is a bug; it means the thread list
> > in a proc has to be protected by the scheduler lock, and delivering a signal
> > to a process has to hold the scheduler lock to find a thread. If the proc has
> > many threads, this will introduce long scheduler latency; a proc lock is not
> > enough to find a thread, and this is a bug. There is other code abusing the
> > scheduler lock which really could use its own lock.
>
> I've added Kip Macy to the CC, who is working with a patch for Sun4v that
> eliminates sched_lock. Maybe he can comment some more on this thread?
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>

From owner-freebsd-performance@FreeBSD.ORG Tue Jun 13 21:00:30 2006
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C7AD816A4A6; Tue, 13 Jun 2006 21:00:30 +0000 (UTC) (envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 47BFD43D64; Tue, 13 Jun 2006 21:00:24 +0000 (GMT) (envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196]) by elvis.mu.org (Postfix) with ESMTP id 30AEE1A4DD5; Tue, 13 Jun 2006 14:00:24 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000) id 5B26B51566; Tue, 13 Jun 2006 17:00:23 -0400 (EDT)
Date: Tue, 13 Jun 2006 17:00:23 -0400
From: Kris Kennaway
To: Danial Thom
Message-ID: <20060613210022.GB5267@xor.obsecurity.org>
References: <448F0C20.3090800@samsco.org> <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6TrnltStXW4iwmi0"
Content-Disposition: inline
In-Reply-To: <20060613195738.64419.qmail@web33310.mail.mud.yahoo.com>
User-Agent: Mutt/1.4.2.1i
Cc: Scott Long, Robert Watson, freebsd-performance@freebsd.org, David Xu
Subject: Re: Initial 6.1 questions

On Tue, Jun 13, 2006 at 12:57:38PM -0700, Danial Thom wrote:
> Since everyone agrees that the load measuring tools aren't all that
> accurate, what criteria were
> used to determine that the changes made in 7 have the effect that you
> think they have had?

Not by using top(1). vmstat seems to do a better job of reporting CPU usage, but still you want to measure what the system can actually do, not how accurately it estimates its own performance.

Kris

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 14 03:15:47 2006
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2BADC16A474 for ; Wed, 14 Jun 2006 03:15:47 +0000 (UTC) (envelope-from kip.macy@gmail.com)
Received: from nz-out-0102.google.com (nz-out-0102.google.com [64.233.162.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id C4A8D43D5C for ; Wed, 14 Jun 2006 03:15:42 +0000 (GMT) (envelope-from kip.macy@gmail.com)
Received: by nz-out-0102.google.com with SMTP id 13so35914nzn for ; Tue, 13 Jun 2006 20:15:42 -0700 (PDT)
Received: by 10.65.239.8 with SMTP id q8mr126486qbr; Tue, 13 Jun 2006 20:15:42 -0700 (PDT)
Received: by 10.65.231.11 with HTTP; Tue, 13 Jun 2006 20:15:42 -0700 (PDT)
Message-ID:
Date: Tue, 13 Jun 2006 20:15:42 -0700
From: "Kip Macy"
To: "Robert Watson"
In-Reply-To: <20060613105930.N34121@fledge.watson.org>
MIME-Version:
 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> <200606130715.52425.davidxu@freebsd.org> <20060613105930.N34121@fledge.watson.org>
Cc: Scott Long, kmacy@freebsd.org, Paul Saab, David Xu, Kris Kennaway, freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions
Reply-To: kmacy@fsmware.com

I apologize if this e-mail seems a bit disjointed; I'm quite tired from hauling stuff around today.

I'm not entirely familiar with the system as a whole, but to give a brief rundown of what I do know:

Context switches, thread prioritization, process statistics keeping, and access to a handful of other random variables are all serialized by sched_lock.

Process creation, process exit, and process scheduling (schedcpu() access to the allproc_list) are all serialized through the allproc_lock.

I've discovered that schedcpu()'s serialization needs don't fit in well with sched_lock removal in the presence of a global process list and a global runqueue (I'll skip the tedious details for now). In other words, I have missing prerequisites.
My current plan for this week, once I get back from Tahoe, is to do the following in a separate branch:

- replace the global process list with a per-CPU process list hung off of pcpu, protected by a non-interrupt-disabling spinlock, pcpu_proclist_lock
- replace the global run queue with a per-CPU runqueue hung off of pcpu, protected by a non-interrupt-blocking pcpu_runq_lock

Once I have this stable I will integrate it into my branch where I have replaced sched_lock with per-thread locks, and re-do the current locking I have in choosethread(), which I believe causes performance and stability problems.

At some point it may be desirable to add support for rebalancing the pcpu process lists to avoid schedcpu/ps/top having to hold the pcpu_proclist_lock for too long.

Why do I say "non-interrupt blocking"? Currently we have roughly a half dozen locking primitives. The two that I am familiar with are blocking and spinning mutexes. The general policy is to use blocking locks except where a lock is used in interrupts or the scheduler. It seems to me that in the scheduler, interrupts only actually need to be blocked across cpu_switch. Spin locks obviously have to be used because a thread cannot very well context switch while it's in the middle of context switching; however, provided td_critnest > 0, there is no reason that interrupts need to be blocked. Currently sched_lock is acquired in cpu_hardclock and statclock, so it does need to block interrupts. There is no reason that these two functions couldn't be run in ast(). In my tree I set td_flags atomically to avoid the need to acquire locks when setting or clearing flags. All the timer interrupt really needs to do for the purposes of statistics etc. is set a flag in td_flags indicating to ast() that the current thread is returning from a timer interrupt, so that cpu_hardclock and statclock are called.

I have more in mind, but I'd like to keep the discussion simple by focusing on the next week or two.
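Kip's second bullet, together with his earlier remark that contention should remain only when one CPU steals a thread from another CPU's runqueue, can be modelled roughly as below. This is a minimal Python sketch, not kernel code: `PerCPURunQueues`, `enqueue` and `choose` are hypothetical names, `threading.Lock` is a blocking lock standing in for the proposed non-interrupt-blocking spinlock, and stealing from the tail of the victim's queue is just one common policy chosen for illustration.

```python
import threading
from collections import deque


class PerCPURunQueues:
    """Hypothetical model: one run queue and one lock per CPU, instead of a
    single global queue guarded by one lock (the role sched_lock plays)."""

    def __init__(self, ncpus):
        self.queues = [deque() for _ in range(ncpus)]
        self.locks = [threading.Lock() for _ in range(ncpus)]

    def enqueue(self, cpu, thread):
        # Only this CPU's lock is taken, so CPUs do not contend here.
        with self.locks[cpu]:
            self.queues[cpu].append(thread)

    def choose(self, cpu):
        """Pick the next thread for `cpu`, stealing if its own queue is empty."""
        with self.locks[cpu]:
            if self.queues[cpu]:
                return self.queues[cpu].popleft()
        # Local queue is empty: scan the other CPUs. This is the only path
        # that takes another CPU's lock, i.e. the only source of contention.
        for victim in range(len(self.queues)):
            if victim == cpu:
                continue
            with self.locks[victim]:
                if self.queues[victim]:
                    return self.queues[victim].pop()  # steal from the tail
        return None  # nothing runnable; a real CPU would run its idle thread


rq = PerCPURunQueues(2)
rq.enqueue(0, "a")
rq.enqueue(0, "b")
print(rq.choose(0), rq.choose(1), rq.choose(1))
```

The common path touches only the local CPU's lock; other CPUs' locks are taken only when the local queue runs dry, which is where the remaining contention Kip mentions would show up.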
-Kip

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 14 06:25:47 2006
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0FCB016A474; Wed, 14 Jun 2006 06:25:47 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3385F43D46; Wed, 14 Jun 2006 06:25:45 +0000 (GMT) (envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.2.163]) by mailout1.pacific.net.au (Postfix) with ESMTP id 2EEB9527FD4; Wed, 14 Jun 2006 16:22:55 +1000 (EST)
Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailproxy2.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP id k5E6MkkG030715; Wed, 14 Jun 2006 16:22:48 +1000
Date: Wed, 14 Jun 2006 16:22:46 +1000 (EST)
From: Bruce Evans
X-X-Sender: bde@epsplex.bde.org
To: kmacy@fsmware.com
In-Reply-To:
Message-ID:
 <20060614133024.E1753@epsplex.bde.org>
References: <20060612195754.72452.qmail@web33306.mail.mud.yahoo.com> <20060612210723.K26068@fledge.watson.org> <20060612203248.GA72885@xor.obsecurity.org> <200606130715.52425.davidxu@freebsd.org> <20060613105930.N34121@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Scott Long, kmacy@freebsd.org, Paul Saab, Robert Watson, David Xu, Kris Kennaway, freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions

On Tue, 13 Jun 2006, Kip Macy wrote:

> ...
> Why do I say "non-interrupt blocking"? Currently we have roughly a
> half dozen locking primitives. The two that I am familiar with are
> blocking and spinning mutexes. The general policy is to use blocking
> locks except where a lock is used in interrupts or the scheduler. It
> seems to me that in the scheduler interrupts only actually need to be
> blocked across cpu_switch. Spin locks obviously have to be used
> because a thread cannot very well context switch while it's in the
> middle of context switching - however, provided td_critnest > 0, there
> is no reason that interrupts need to be blocked. Currently sched_lock
> is acquired in cpu_hardclock and statclock - so it does need to block
> interrupts. There is no reason that these two functions couldn't be
> run in ast().

These functions are called from "fast" interrupt handlers, so they cannot use sleep locks. They also cannot be run in ast(), since ast() is only run on return to user mode and uses sleep locks a lot.
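For reference, the deferral Kip proposes (the timer interrupt only sets a bit in td_flags, and ast() does the clock work later) looks roughly like the sketch below. It is a Python model with hypothetical names (`TDF_TICK`, `ThreadState`, `timer_interrupt`, `ast`); Python has no public atomic fetch-or, so a small lock stands in for the atomic read-modify-write. The usage at the end shows a side effect of a single flag bit: two ticks that arrive before the next ast() collapse into one. Per Bruce's objection, a thread that never returns to user mode never reaches ast() at all, so its pending work would never run.

```python
import threading

TDF_TICK = 0x1  # hypothetical flag bit, not a real FreeBSD constant


class ThreadState:
    """Minimal stand-in for a thread's td_flags word."""

    def __init__(self):
        self._flags = 0
        self._guard = threading.Lock()  # models an atomic read-modify-write

    def set_flag(self, bit):
        with self._guard:
            self._flags |= bit

    def test_and_clear(self, bit):
        """Clear `bit` and report whether it had been set."""
        with self._guard:
            was_set = bool(self._flags & bit)
            self._flags &= ~bit
            return was_set


def timer_interrupt(td):
    # Cheap path: no scheduler lock, just mark that a tick arrived.
    td.set_flag(TDF_TICK)


def ast(td, clock_work):
    # Deferred path, only reached when the thread returns to user mode.
    if td.test_and_clear(TDF_TICK):
        clock_work(td)


td = ThreadState()
calls = []
timer_interrupt(td)
timer_interrupt(td)    # second tick before ast(): the bit is already set
ast(td, calls.append)
ast(td, calls.append)  # nothing pending any more
print(len(calls))
```

A design that must not lose ticks would need a counter rather than a single bit, which is part of what Bruce means by the algorithmic changes this would require.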
Gathering of some user-mode statistics could be deferred until return to user mode, but this wouldn't work for kernel-mode statistics, which would never be gathered for threads that never leave the kernel, and large changes would be required for the user-mode statistics (algorithmic changes: various, mainly to keep kernel-mode separate; locking: ast() uses sched_lock, so without large changes you would just move the problem (there would be up to hz + stathz extra calls to ast() per second); the statistics fields are all locked by sched_lock, and although this would not be needed for access in ast(), some locking would still be needed for many fields which are accessed from elsewhere).

What they (and all fast interrupt handlers or even "fast" interrupt handlers) can do better is use spin locks != sched_lock (and, for fast interrupt handlers, != mtx_lock_spin(any)). This is not easy to do in general, and is especially difficult for clock interrupt handlers, because all accesses to data accessed by a fast interrupt handler must be locked by a common lock (especially outside of the handlers), and clock interrupt handlers access a lot of data. Currently, clock interrupt handlers use sched_lock and depend on sched_lock being used too much, so that most of the data accessed by clock interrupt handlers is locked automatically. Even then, there are large gaps in the locking. E.g., hardclock() starts by calling tc_ticktock(), which mostly uses very delicate time-domain locking but sometimes races with syscalls that use sleep locking, most frequently by calling ntp_update_second(). Most of kern_ntptime.c is documented (in comments) as being required to run at splclock() or higher, but it is actually all locked only by Giant, so sched_lock'ing and other spinlocking for it is neither necessary nor sufficient, and calling it correctly from a "fast" interrupt handler is impossible.
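As rough arithmetic for the "hz + stathz" figure above, and for Kris's earlier remark that clock-driven sched_lock activity scales with HZ and NCPUS: if every CPU runs hardclock() hz times and statclock() stathz times per second, and each call takes sched_lock once (a simplifying assumption), the lock is taken ncpus * (hz + stathz) times per second. The 12-CPU count matches the E4500 mentioned at the start of this thread; stathz = 128 is an assumed value for illustration only.

```python
def clock_lock_rate(ncpus, hz, stathz):
    """Approximate clock-driven sched_lock acquisitions per second.

    Assumes hardclock() and statclock() each take the lock once per call
    on every CPU; real kernels differ in the details.
    """
    return ncpus * (hz + stathz)


STATHZ = 128  # assumed statclock rate, for illustration only
for hz in (1000, 100):
    print(f"hz={hz}: {clock_lock_rate(12, hz, STATHZ)} acquisitions/s")
```

Under these assumptions, dropping HZ from 1000 to 100 cuts the clock-driven acquisitions on 12 CPUs from 13536 to 2736 per second, roughly an 80% reduction.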
In my kernel, fast interrupt handlers (and associated non-handler code that shares data) are actually fast (== low-latency && !(very-large-footprint || takes-very-long)). This requires:

- mtx_lock_spin() to not mask interrupts, since masking interrupts gives !low-latency, at least in the UP case.
- fast interrupt handlers to not use sched_lock, since sched_lock gives very-large-footprint.
- fast interrupt handlers to not use only mtx_lock_spin(), since that no longer masks them. My implementation actually uses simple_locks plus explicit per-CPU interrupt disabling (as in RELENG_4). This also avoids having to turn off features like WITNESS and KTR which don't honor the rules for fast interrupt handlers.
- fast interrupt handlers to not use normal scheduling (things like swi_sched()), since that uses sched_lock and is generally very inefficient. My implementation uses a combination of timeouts and a hack to metamorphose into a SWI handler. The latter is a very expensive operation and should be avoided. swi_sched() encourages this inefficiency except in the SWI_DELAY case. The SWI_DELAY case only takes 50-100 times as many instructions as corresponding scheduling in RELENG_4. SWI_DELAY seems to be unused except in my drivers. My implementation enforces non-use of normal scheduling and some other invalid data accesses (e.g., to curthread) by unmapping PCPU data in fast interrupt handlers.
- clock interrupt handlers to not be fast interrupt handlers. They have far too large a footprint to be fast interrupt handlers. Locking them is hard enough when they are only "fast" interrupt handlers. I made them normal interrupt handlers and don't support "fast" interrupt handlers. I get very few benefits from this. Normal interrupt handlers for clocks are inefficient. They don't take very long, but switching to them is inefficient. I get lower interrupt latency, but this is not very important now that CPUs are very fast compared with i/o for all devices that I have.
  I get the possibility of simpler locking in clock interrupt handlers, but haven't simplified or fixed their locking. I get enforced smallness and complexity for fast interrupt handlers, since large ones would be too complicated and normal scheduling and locking cannot be used.

Bruce

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 14 17:38:14 2006
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 67F0016A47D for ; Wed, 14 Jun 2006 17:38:14 +0000 (UTC) (envelope-from danial_thom@yahoo.com)
Received: from web33311.mail.mud.yahoo.com (web33311.mail.mud.yahoo.com [68.142.206.126]) by mx1.FreeBSD.org (Postfix) with SMTP id 541A343D5C for ; Wed, 14 Jun 2006 17:38:13 +0000 (GMT) (envelope-from danial_thom@yahoo.com)
Received: (qmail 60911 invoked by uid 60001); 14 Jun 2006 17:38:12 -0000
Message-ID: <20060614173812.60909.qmail@web33311.mail.mud.yahoo.com>
Received: from [65.34.182.15] by web33311.mail.mud.yahoo.com via HTTP; Wed, 14 Jun 2006 10:38:12 PDT
Date: Wed, 14 Jun 2006 10:38:12 -0700 (PDT)
From: Danial Thom
To: Kris Kennaway
In-Reply-To: <20060613210022.GB5267@xor.obsecurity.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: Scott Long, Robert Watson, freebsd-performance@freebsd.org, David Xu
Subject: Re: Initial 6.1 questions
Reply-To: danial_thom@yahoo.com
--- Kris Kennaway wrote:

> On Tue, Jun 13, 2006 at 12:57:38PM -0700, Danial Thom wrote:
>
> > Since everyone agrees that the load measuring tools aren't all that
> > accurate, what criteria were used to determine that the changes made
> > in 7 have the effect that you think they have had?
>
> Not by using top(1). vmstat seems to do a better job of reporting CPU
> usage, but still you want to measure what the system can actually do,
> not how accurately it estimates its own performance.
>
> Kris

Regarding vmstat: I'm getting the same (obviously wrong) results from vmstat, which shows no usage. I believe I cut and pasted a snippet which showed 6000 ints/second on em with 99.x% idle. It works fine in UP mode, which implies that you aren't accounting properly in SMP mode. Hopefully you (folks) can come to terms with the fact that it's broken; otherwise it will never be of any use.

Regarding testing: My view is that you are making a big mistake if you measure everything at the edge of performance, which is why benchmarks lie and are generally useless. As the bus becomes saturated, and queues become unnaturally large, timings change. You may be measuring how well the system recovers from events that never happen when you just try to "see how much you can do". For example, as the PCI bus becomes saturated, I/Os take exponentially longer, so you're not really measuring your code. You end up measuring properties which may be very different under normal conditions. And if you try to optimize your code for conditions which rarely if ever occur, you may hose it for normal use (I'm a bit frightened by the 7.0 em changes). "Efficiency" is what's important. I want to know how the machine works under normal loads, not when it's in constant recovery from overloads.
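To make the saturation point concrete, the sketch below uses the textbook M/M/1 queue (Poisson arrivals, exponential service times). That model is an assumption chosen for illustration, not a model of a PCI bus or of any measurement in this thread: the mean time a request spends in the system is S / (1 - rho), where S is the service time and rho the utilization. Strictly, this grows hyperbolically rather than exponentially, but it shows the same effect: near rho = 1, measured latency is dominated by queueing rather than by the code under test.

```python
def mm1_response_time(service_time, utilization):
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# Assume a 1 microsecond service time per transfer (illustrative only).
for rho in (0.5, 0.6, 0.9, 0.99):
    print(f"utilization {rho:.0%}: {mm1_response_time(1.0, rho):.1f} us")
```

Going from 90% to 99% utilization multiplies the mean response time by about ten in this model, even though the work per request is unchanged.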
I want to run a realistic load on a machine when I test new code, to see what effect it has on system load. For that I need tools that work. If your machine can push either 500 Mb/s at 99% load or 492 Mb/s at 60% load, my view is that the 492 Mb/s system is the better system. In the long run, the more efficient systems are the ones that perform better generally.

DT

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 14 20:48:04 2006
Date: Wed, 14 Jun 2006 13:48:01 -0700
From: "Kip Macy"
To: "Bruce Evans"
Cc: Scott Long, kmacy@freebsd.org, Paul Saab, Robert Watson, David Xu, Kris Kennaway, freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions

Hi Bruce,

Thanks for the lengthy response. I should not have brought up interrupt handling, as (a) it's a tertiary concern for me at the moment, (b) everyone has an opinion on it, and (c) I could cut off several fingers and still count on one hand the number of people who understand why it's bad that ithreads go through the scheduler in the default case (having a pcpu_runq only helps affinity).

To make it easier for future respondents to stay on topic, let me explain my situation. I have ported FreeBSD to Sun's new UltraSPARC architecture, sun4v. The current implementation, the T1, has 6-8 cores with 4 threads per core. Unlike HTT on x86, these machines actually have ample memory bandwidth (~26 GB/s), so threading can actually be useful. On my 32-CPU system, benchmarks like supersmack max out at 9 threads, i.e. one can't get the system below 70% idle. Across the board, context switches on Solaris/T1 take 2x as long as they do on Linux/T1. Because of lock contention, FreeBSD in turn takes between 10% and 100% longer than Solaris to context switch.

I would like to be able to tout FreeBSD as a strong competitor on the sun4v architecture. At the moment I can't.
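Kip's ratios compound: if Solaris/T1 context switches cost about 2x Linux/T1, and FreeBSD costs 10% to 100% more than Solaris (1.1x to 2x), then FreeBSD lands at roughly 2.2x to 4x the Linux cost, assuming the two ratios were measured under comparable conditions. Likewise, 6-8 cores at 4 threads each gives 24-32 hardware threads, consistent with his 32-CPU system. A small C sketch of that arithmetic (function names are illustrative):

```c
#include <assert.h>

/* Relative costs chain multiplicatively: if A costs r_ab times B and
 * B costs r_bc times C, then A costs r_ab * r_bc times C. */
static double chain_ratio(double r_ab, double r_bc)
{
    assert(r_ab > 0.0 && r_bc > 0.0);
    return r_ab * r_bc;
}

/* Hardware threads the OS sees as CPUs on a chip-multithreaded part. */
static int hw_threads(int cores, int threads_per_core)
{
    assert(cores > 0 && threads_per_core > 0);
    return cores * threads_per_core;
}
```

For example, `chain_ratio(1.10, 2.0)` and `chain_ratio(2.0, 2.0)` bracket the FreeBSD-versus-Linux context switch cost at about 2.2x and 4x.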
Perhaps this isn't the right forum for discussing my concerns; a freebsd-scalability list might be in order.

-Kip

On 6/13/06, Bruce Evans wrote:
> On Tue, 13 Jun 2006, Kip Macy wrote:
>
> > ...
> > Why do I say "non-interrupt blocking?". Currently we have roughly a
> > half dozen locking primitives. The two that I am familiar with are
> > blocking and spinning mutexes. The general policy is to use blocking
> > locks except where a lock is used in interrupts or the scheduler. It
> > seems to me that in the scheduler interrupts only actually need to be
> > blocked across cpu_switch. Spin locks obviously have to be used
> > because a thread cannot very well context switch while its in the
> > middle of context switching - however, provided td_critnest > 0, there
> > is no reason that interrupts need to be blocked. Currently sched_lock
> > is acquired in cpu_hardclock and statclock - so it does need to block
> > interrupts. There is no reason that these two functions couldn't be
> > run in ast().
>
> These functions are called from "fast" interrupt handlers, so they
> cannot use sleep locks. They also cannot be run in ast(), since ast()
> is only run on return to user mode and uses sleep locks a lot. Gathering
> of some user-mode statistics could be deferred until return to user
> mode, but this wouldn't work for kernel-mode statistics, which is never
> for threads that never leave the kernel, and large changes would be
> required for the user-mode statistics: algorithmic changes: various,
> mainly to keep kernel-mode separate; locking: ast() uses sched_lock,
> so without large changes you would just move the problem (there would
> be up to hz + stathz extra calls to ast() per second); the statistics
> fields are all locked by sched_lock, and although this would not be
> needed for access in ast() some locking would still be needed for many
> which are accessed from elsewhere).
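Kip's distinction is between what a spin lock must guarantee (the holder is not switched out, i.e. it runs inside a critical section with a nonzero nesting count such as td_critnest) and what stock spin mutexes additionally do (mask interrupts, per Bruce's description of mtx_lock_spin()). The userland C11 sketch below models only the first part: a test-and-set spin lock that bumps a per-thread nesting counter. Userland code cannot mask hardware interrupts, so that half is deliberately absent, and every name here is hypothetical rather than FreeBSD's own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread nesting count playing the role of td_critnest.
 * In a kernel a nonzero count would also disable preemption; in this
 * userland sketch it is only a counter. */
static _Thread_local int critnest;

static void crit_enter(void) { critnest++; }

static void crit_exit(void)
{
    assert(critnest > 0);
    critnest--;
}

/* Test-and-set spin lock; note that it never touches interrupt state. */
struct spinlock {
    atomic_flag held;
};
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; on failure, leave the critical section again. */
static bool spin_trylock(struct spinlock *l)
{
    crit_enter();
    if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        return true;
    crit_exit();
    return false;
}

/* Enter the critical section first, then busy-wait for the flag. */
static void spin_lock(struct spinlock *l)
{
    crit_enter();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin; the model assumes the holder is not preempted (kernel only) */
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    crit_exit();
}
```

Bruce's reply explains why this split is not enough for the clock handlers themselves: they run from "fast" interrupt context, so whatever they share must be protected against the interrupt as well, not just against preemption.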
>
> What they (and all fast interrupt handlers or even "fast" interrupt
> handlers) can do better is use spin locks != sched_lock (and for fast
> interrupt handlers, != mtx_lock_spin(any)). This is not easy to do
> in general, and is especially difficult for clock interrupt handlers,
> because all accesses to data accessed by a fast interrupt handler must
> be locked by a common lock (especially outside of the handlers) and
> clock interrupt handlers access a lot of data. Currently, clock
> interrupt handlers use sched_lock and depend on sched_lock being used
> too much so that most of the data accessed by clock interrupt handlers
> is locked automatically. Even then, there are large gaps in the locking.
> E.g., hardclock() starts by calling tc_ticktock() which mostly uses
> very delicate time-domain locking but sometimes races with syscalls
> that use sleep locking, most frequently by calling ntp_update_second().
> Most of kern_ntptime.c is documented (in comments) as being required
> to run at splclock() or higher, but it is actually all locked only by
> Giant, so sched_lock'ing and other spinlocking for it is neither
> necessary or sufficient, and calling it correctly from a "fast" interrupt
> handler is impossible.
>
> In my kernel, fast interrupt handlers (and associated non-handler code
> that shares data) are actually fast (== low-latency &&
> !(very-large-footprint || takes-very-long)). This requires:
> - mtx_lock_spin() to not mask interrupts, since masking interrupts gives
>   !low-latency at least in the UP case.
> - fast interrupt handlers to not use sched_lock, since sched_lock gives
>   very-large-footprint.
> - fast interrupt handlers to not use only mtx_lock_spin(), since that no
>   longer masks them. My implementation actually uses simple_locks plus
>   explicit per-cpu interrupt disabling (as in RELENG_4). This also avoids
>   having to turn off features like WITNESS and KTR which don't honor the
>   rules for fast interrupt handlers.
> - fast interrupt handlers to not use normal scheduling (things like
>   swi_sched()), since that uses sched_lock and is generally very
>   inefficient. My implementation uses a combination of timeouts
>   and a hack to metamorphose into a SWI handler. The latter is a
>   very expensive operation and should be avoided. swi_sched() encourages
>   this inefficiency except in the SWI_DELAY case. The SWI_DELAY case
>   only takes 50-100 times as many instructions as corresponding
>   scheduling in RELENG_4. SWI_DELAY seems to be unused except in
>   my drivers. My implementation enforces non-use of normal scheduling
>   and some other invalid data accesses (e.g., to curthread) unmapping
>   PCPU data in fast interrupt handlers.
> - clock interrupt handlers to not be fast interrupt handlers. They
>   have far too large a footprint to be fast interrupt handlers. Locking
>   them is hard enough when they are only "fast" interrupt handlers.
>   I made them normal interrupt handlers and don't support "fast" interrupt
>   handlers.
>
> I get very few benefits from this. Normal interrupt handlers for
> clocks are inefficient. They don't take very long, but switching to
> them is inefficient. I get lower interrupt latency, but this is
> not very important now that CPUs are very fast compared with i/o
> for all devices that I have. I get the possibility of simpler
> locking in clock interrupt handlers, but haven't simplified or fixed
> their locking. I get enforced smallness and complexity for fast
> interrupt handlers since large ones would be too complicated and
> normal scheduling and locking cannot be used.
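Bruce's rule that all data shared with a fast interrupt handler must be protected by a common lock (in his kernel, simple locks plus explicit per-CPU interrupt disabling) has a rough userland analog: a signal handler stands in for the interrupt, and blocking the signal with sigprocmask() stands in for disabling interrupts around mainline access. A minimal POSIX C sketch under that analogy; it models a single CPU only (there is no cross-CPU simple lock), and the names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* State shared between the "interrupt handler" and mainline code. */
static volatile sig_atomic_t pending_events;

/* Stand-in for a fast interrupt handler: tiny, takes no locks, and only
 * records that work is pending. */
static void on_event(int sig)
{
    (void)sig;
    pending_events++;
}

static int install_event_handler(int sig)
{
    struct sigaction sa;
    sa.sa_handler = on_event;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Mainline access: block the signal (the analog of disabling interrupts
 * on this CPU) so the read-and-reset cannot interleave with the handler. */
static int drain_events(int sig)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, sig);
    sigprocmask(SIG_BLOCK, &block, &old);
    int n = pending_events;
    pending_events = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return n;
}
```

A signal raised while it is blocked stays pending and is delivered when the mask is restored, much as a masked interrupt is taken once interrupts are re-enabled; the cost Bruce objects to is the added latency while the mask is held.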
>
> Bruce
>

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 14 21:17:27 2006
Date: Wed, 14 Jun 2006 14:17:26 -0700 (PDT)
From: "R. B. Riddick"
To: kmacy@fsmware.com
Cc: freebsd-performance@freebsd.org, kmacy@freebsd.org
Subject: Re: Initial 6.1 questions

Hi boys and girls! *giggle*

I hope the following does not sound too much like the product of a bipolar disorder of mine...
Some years ago (in or around 1993) I heard that there was a computer program that was able to derive mathematical theorems from axioms (even some new ones, I think; but somehow the process became quite slow at some point, so we still use human mathematicians...).

Is it possible to describe the important sequences in a computer (in this case, the performance-relevant ones, like things that involve locks, context switches, ...) in a mathematically correct way? The answer should be "yes", if we leave out the philosophical and the pathological perspective...

If yes: couldn't we find nicer/faster algorithms by some kind of directed search in the space of all possible computer programs? I am not sure why I don't know of such a tool on my box (most likely there is none)... Is the space just too huge?

Somehow it feels astonishing that all relevant computer languages are like C today, although back in 1992 one of my professors was already quite excited about his all-new computer language, which would find its own algorithm once the program described the problem, and which mostly existed just in his fantasy...

Or are we already using some kind of "optimal kernel generator"? 42?

Bye
Arne
From owner-freebsd-performance@FreeBSD.ORG Sat Jun 17 12:50:49 2006
Date: Sat, 17 Jun 2006 13:50:48 +0100 (BST)
From: Robert Watson
To: performance@FreeBSD.org
Subject: HZ=100: not necessarily better?

Scott asked me if I could take a look at the impact of changing HZ on some simple TCP performance tests. I ran the first couple and got some surprising results, so I thought I'd post about them and ask anyone interested to do some investigation as well. The short of it is that we had speculated that the increased CPU overhead of a higher HZ would be significant when it came to performance measurement, but in fact I measure improved performance under high HTTP load with a higher HZ.
This was, of course, the reason we first looked at increasing HZ: improving timer granularity helps improve the performance of network protocols such as TCP. Recent popular opinion has swung in the opposite direction, holding that the overhead of a higher HZ outweighs this benefit, and I think we should be cautious and do a lot more investigating before assuming that is true.

Simple performance results are below. Two boxes on a gig-E network with if_em Ethernet cards: one running a simple web server hosting 100-byte pages, the other downloading them in parallel (netrate/http and netrate/httpd). The performance difference is marginal, but at least in the SMP case it is likely more than measurement error or a cache alignment fluke. Results are transactions/second sustained over a 30-second test (bigger is better); the box is a dual Xeon P4 with HTT; 'vendor.*' are the default 7-CURRENT HZ setting (1000) and 'hz.*' are the HZ=100 versions of the same kernels. Regardless, there was no obvious performance improvement from reducing HZ from 1000 to 100. Results may vary; use only as directed.

What we might want to explore is using a programmable timer to set up high-precision timeouts, such as TCP timers, while keeping base statistics, profiling, and context switching at 100 Hz. I think phk has previously proposed doing this with the HPET timer.

I'll run some more diverse tests today, such as raw bandwidth tests, pps on UDP, and so on, and see where things sit. The reduced overhead should be measurable in cases where the test is CPU-bound and there's no clear benefit from more accurate timing (as there is with TCP), but it would be good to confirm that.
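With HZ ticks per second, one tick lasts 1/HZ seconds (10 ms at HZ=100, 1 ms at HZ=1000), so any timeout scheduled in whole ticks is quantized to that granularity; this is the timer-granularity cost at stake for protocols like TCP. A small C sketch, assuming a simplified round-up conversion (the helpers are hypothetical, not the kernel's own routines):

```c
#include <assert.h>

/* Hypothetical helper: convert a timeout in milliseconds to clock ticks,
 * rounding up so the timeout is never shorter than requested. */
static int ms_to_ticks(int ms, int hz)
{
    assert(ms >= 0 && hz > 0);
    return (int)(((long long)ms * hz + 999) / 1000);
}

/* Length of a number of ticks, in microseconds. */
static long long ticks_to_us(int ticks, int hz)
{
    assert(hz > 0);
    return (long long)ticks * 1000000 / hz;
}
```

For example, a 15 ms timeout becomes 2 ticks (20 ms) at HZ=100 but stays at 15 ticks (15 ms) at HZ=1000, and a 1 ms timeout becomes a full 10 ms at HZ=100. In a tick-based timer the current tick is usually already partly elapsed when a timeout is armed, so the real delay varies by up to one more tick on top of this rounding.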
Robert N M Watson
Computer Laboratory
University of Cambridge

peppercorn:~/tmp/netperf/hz> ministat *SMP
x hz.SMP
+ vendor.SMP
[ministat distribution plot: hz.SMP (x) vs. vendor.SMP (+)]
    N           Min           Max        Median           Avg        Stddev
x  10         13715         13793         13750       13751.1     29.319883
+  10         13813         13970         13921       13906.5     47.551726
Difference at 95.0% confidence
        155.4 +/- 37.1159
        1.13009% +/- 0.269913%
        (Student's t, pooled s = 39.502)

peppercorn:~/tmp/netperf/hz> ministat *UP
x hz.UP
+ vendor.UP
[ministat distribution plot: hz.UP (x) vs. vendor.UP (+)]
    N           Min           Max        Median           Avg        Stddev
x  10         14067         14178         14116       14121.2     31.279386
+  10         14141         14257         14170       14175.9     33.248058
Difference at 95.0% confidence
        54.7 +/- 30.329
        0.387361% +/- 0.214776%
        (Student's t, pooled s = 32.2787)