Date:      Fri, 24 Jan 2003 18:19:07 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Marcel Moolenaar <marcel@xcllnt.net>
Cc:        John Baldwin <jhb@FreeBSD.org>, Nate Lawson <nate@root.org>, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org, Attila Nagy <bra@fsn.hu>
Subject:   Re: cvs commit: src/sys/i386/i386 identcpu.c initcpu.c locore.s 
Message-ID:  <20030125021907.51B482A89E@canning.wemm.org>
In-Reply-To: <20030125013344.GA54764@dhcp01.pn.xcllnt.net> 

Marcel Moolenaar wrote:
> On Fri, Jan 24, 2003 at 05:25:27PM -0800, Peter Wemm wrote:
> > John Baldwin wrote:

> > > Maybe.  Preliminary buildworld tests on 4.x seem to suggest that HTT
> > > is slower than UP, but buildworld is just one application.  HTT will
> > > probably be optional on stable.  On -current we will eventually use
> > > ACPI to enumerate CPU's which means that we will respect BIOS settings
> > > with regards to whether or not HTT is enabled.
> > 
> > Did you remember to set machdep.cpu_idle_hlt to 1?  Failing to set this
> > will really suck because the logical cores will be spinning like crazy and
> > stealing execution resources from functional tasks on the other part of the
> > cpu.
> 
> What about an increase in cache misses due to a degradation of locality
> by having a larger, less coherent/dense working set?

Sure, cache effects don't come for free.  But losing up to every second
pipeline slot to the "idle" spinloop, because we don't ever halt the CPU in
SMP mode, isn't going to help either.

For example, with the default settings:
# tcsh ./time.sh
machdep.cpu_idle_hlt: 0 
62.441u 11.219s 1:10.10 105.0%  1716+2807k 0+644io 0pf+0w
62.507u 11.304s 1:10.48 104.7%  1705+2804k 0+596io 0pf+0w
62.774u 10.689s 1:10.18 104.6%  1705+2798k 0+596io 0pf+0w
62.561u 11.314s 1:10.68 104.5%  1701+2791k 0+597io 0pf+0w

And then after changing the sysctl:
# tcsh ./time.sh
machdep.cpu_idle_hlt: 1
47.184u 8.622s 0:53.79 103.7%   1724+2830k 4+669io 0pf+0w
46.670u 9.065s 0:53.19 104.7%   1724+2814k 0+634io 0pf+0w
47.239u 8.606s 0:53.80 103.7%   1728+2812k 0+625io 0pf+0w
46.955u 8.789s 0:53.87 103.4%   1731+2821k 0+656io 0pf+0w

Personally, I think that avoiding a 32% slowdown (roughly 1:10 vs. 0:54
wall-clock per build) speaks very well for turning the halt instruction on
by default in the idle loop.
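As a quick check on the arithmetic, here is a POSIX sh sketch that averages the elapsed column of the tcsh `time` output for the four serial runs and compares the two settings. The helper names `mean_elapsed` and `pct_slower` are made up for illustration, and the sketch assumes the third field is always the elapsed time in [h:]m:ss.ss form, as it is in these runs.

```shell
#!/bin/sh
# mean_elapsed (hypothetical helper): mean elapsed time, in seconds, of tcsh
# `time` lines read on stdin.  Field 3 is "[h:]m:ss.ss", e.g. "1:10.10" = 70.10 s.
mean_elapsed() {
    awk '{
        n = split($3, t, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + t[i]
        sum += s; runs++
    }
    END { printf "%.2f\n", sum / runs }'
}

# pct_slower (hypothetical helper): percentage by which time $1 exceeds time $2.
pct_slower() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", (slow - fast) / fast * 100 }'
}

# Serial kernel builds with machdep.cpu_idle_hlt=0 (the default)
hlt0=$(printf '%s\n' \
    '62.441u 11.219s 1:10.10 105.0%  1716+2807k 0+644io 0pf+0w' \
    '62.507u 11.304s 1:10.48 104.7%  1705+2804k 0+596io 0pf+0w' \
    '62.774u 10.689s 1:10.18 104.6%  1705+2798k 0+596io 0pf+0w' \
    '62.561u 11.314s 1:10.68 104.5%  1701+2791k 0+597io 0pf+0w' | mean_elapsed)

# The same builds with machdep.cpu_idle_hlt=1
hlt1=$(printf '%s\n' \
    '47.184u 8.622s 0:53.79 103.7%   1724+2830k 4+669io 0pf+0w' \
    '46.670u 9.065s 0:53.19 104.7%   1724+2814k 0+634io 0pf+0w' \
    '47.239u 8.606s 0:53.80 103.7%   1728+2812k 0+625io 0pf+0w' \
    '46.955u 8.789s 0:53.87 103.4%   1731+2821k 0+656io 0pf+0w' | mean_elapsed)

printf 'hlt=0: %ss  hlt=1: %ss  slowdown: %s%%\n' \
    "$hlt0" "$hlt1" "$(pct_slower "$hlt0" "$hlt1")"
```

It prints `hlt=0: 70.36s  hlt=1: 53.66s  slowdown: 31.1%`, so the wall-clock difference is about 31%, close to the quoted figure; the ratio of the user-time columns comes out a little higher.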

This is a plain kernel build, entirely from memory.
#! /bin/tcsh
sysctl machdep.cpu_idle_hlt
make -s clean depend
time make -s
make -s clean depend
time make -s
make -s clean depend
time make -s
make -s clean depend
time make -s

COPTFLAGS is set to "-O -pipe" in /etc/make.conf.  Note that I'm not using
-jN.

I can't test this machine with HTT disabled because it won't boot (except
in UP mode).

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2799.70-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
...
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00050014, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00050014, at 0xfee00000
 cpu2 (AP):  apic id:  6, version: 0x00050014, at 0xfee00000
 cpu3 (AP):  apic id:  7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  3, version: 0x000f0011, at 0xfec01000
 io2 (APIC): apic id:  4, version: 0x000f0011, at 0xfec02000
 io3 (APIC): apic id:  5, version: 0x000f0011, at 0xfec03000

Note that this is an SMP P4 Xeon, not one of the new HTT P4s.
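The dmesg output above can be decoded to show the topology. This is a minimal POSIX sh sketch that rests on two assumptions: (1) HTT is bit 28 of the CPUID leaf 1 EDX value that the kernel prints as `Features=`, and (2) with 2 logical CPUs per package, the lowest APIC ID bit selects the logical CPU, so the package number is the APIC ID shifted right by one. The helpers `has_htt` and `pkg_of` are illustrative names, not FreeBSD functions.

```shell
#!/bin/sh
# has_htt (hypothetical): print bit 28 (HTT) of a CPUID leaf 1 EDX value.
# Assumption: the "Features=" word in dmesg is that EDX value.
has_htt() {
    echo $(( ($1 >> 28) & 1 ))
}

# pkg_of (hypothetical): physical package number for an APIC ID.
# Assumption: 2 logical CPUs per package ("Hyperthreading: 2 logical CPUs"),
# so the low APIC ID bit is the logical CPU and the remaining bits the package.
pkg_of() {
    echo $(( $1 >> 1 ))
}

echo "HTT flag: $(has_htt 0xbfebfbff)"

# cpu:apic-id pairs copied from the "FreeBSD/SMP" lines above
for entry in cpu0:0 cpu1:1 cpu2:6 cpu3:7; do
    cpu=${entry%%:*}
    apic=${entry#*:}
    echo "$cpu (apic id $apic) -> package $(pkg_of "$apic")"
done
```

Under those assumptions, cpu0/cpu1 (APIC IDs 0 and 1) share one package and cpu2/cpu3 (APIC IDs 6 and 7) share the other: two physical Xeons, each exposing two logical CPUs.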

Oh, and in case somebody asks about the -jN case:

cpu_idle_hlt=0, -j4   (default)
81.127u 15.299s 0:33.10 291.2%  1751+2771k 3+528io 0pf+0w
81.046u 15.483s 0:33.14 291.2%  1747+2773k 3+612io 0pf+0w
 
cpu_idle_hlt=1, -j4 
76.891u 13.749s 0:31.28 289.7%  1743+2745k 3+646io 0pf+0w
76.230u 14.105s 0:31.82 283.8%  1750+2755k 3+591io 0pf+0w

Again, it is faster with a true halt rather than a spinloop.

cpu_idle_hlt=0, -j6   (default)
84.083u 15.899s 0:29.54 338.4%  1764+2791k 3+629io 0pf+0w
84.790u 15.030s 0:29.75 335.5%  1759+2782k 3+606io 0pf+0w

cpu_idle_hlt=1, -j6
81.572u 14.802s 0:29.59 325.6%  1754+2762k 3+689io 0pf+0w
82.642u 13.887s 0:29.10 331.6%  1764+2768k 3+625io 0pf+0w

Not quite as significant, but still an improvement.  I didn't try any
larger -jN numbers.
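To put numbers on the -j4 and -j6 results, here is a POSIX sh sketch. `hlt_gain` is a hypothetical helper; each input line is the tcsh `time` output from above, prefixed with the cpu_idle_hlt setting (0 or 1). Because each setting has the same number of runs, comparing sums is equivalent to comparing means.

```shell
#!/bin/sh
# hlt_gain (hypothetical): read "<hlt> <tcsh time output>" lines on stdin and
# print the % reduction in elapsed (wall-clock) and user time for hlt=1
# relative to hlt=0.  Assumes equal run counts for both settings.
hlt_gain() {
    awk '
    # "[h:]m:ss.ss" -> seconds
    function secs(str,    n, t, i, s) {
        n = split(str, t, ":"); s = 0
        for (i = 1; i <= n; i++) s = s * 60 + t[i]
        return s
    }
    {
        user[$1] += $2 + 0      # "81.127u" -> 81.127 (numeric prefix)
        wall[$1] += secs($4)    # field 4 is the elapsed time
    }
    END {
        printf "elapsed -%.1f%%, user -%.1f%%\n",
            (wall[0] - wall[1]) / wall[0] * 100,
            (user[0] - user[1]) / user[0] * 100
    }'
}

j4=$(printf '%s\n' \
    '0 81.127u 15.299s 0:33.10 291.2%  1751+2771k 3+528io 0pf+0w' \
    '0 81.046u 15.483s 0:33.14 291.2%  1747+2773k 3+612io 0pf+0w' \
    '1 76.891u 13.749s 0:31.28 289.7%  1743+2745k 3+646io 0pf+0w' \
    '1 76.230u 14.105s 0:31.82 283.8%  1750+2755k 3+591io 0pf+0w' | hlt_gain)

j6=$(printf '%s\n' \
    '0 84.083u 15.899s 0:29.54 338.4%  1764+2791k 3+629io 0pf+0w' \
    '0 84.790u 15.030s 0:29.75 335.5%  1759+2782k 3+606io 0pf+0w' \
    '1 81.572u 14.802s 0:29.59 325.6%  1754+2762k 3+689io 0pf+0w' \
    '1 82.642u 13.887s 0:29.10 331.6%  1764+2768k 3+625io 0pf+0w' | hlt_gain)

printf '%s: %s\n' -j4 "$j4" -j6 "$j6"
```

With the runs above this prints `-j4: elapsed -4.7%, user -5.6%` and `-j6: elapsed -1.0%, user -2.8%`: the wall-clock benefit shrinks at -j6, but the halting idle loop still reduces user time.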

The last time I tried this on a non-HTT system, enabling the true halt
caused a slight slowdown.  But the machine used a lot less power and
the room was cooler. :-]

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5





