Date:      Mon, 09 Sep 1996 15:07:42 -0600
From:      Steve Passe <smp@csn.net>
To:        Peter Wemm <peter@spinner.dialix.com>
Cc:        rv@groa.uct.ac.za (Russell Vincent), freebsd-smp@freebsd.org
Subject:   Re: Intel XXpress - some SMP benchmarks 
Message-ID:  <199609092107.PAA27929@clem.systemsix.com>
In-Reply-To: Your message of "Mon, 09 Sep 1996 18:43:58 +0800." <199609091043.SAA08111@spinner.DIALix.COM>

Hi,

> Russell Vincent wrote:
> > 'lmbench 1.0' results for:
> 
> Ahem.. Enough said. :-)  But regardless of the accuracy issue, it 
> certainly gives an indication of the various bottlenecks.

Exactly; I wanted a benchmark against which to measure improvement.


> >  o Option (3), although not that good in the benchmarks, certainly
> >    appears faster in interactive use. That could just be my imagination,
> >    though.  :-)
> 
> Several things to consider:
>  - the second cpu is never pre-empted while running.  This is bad 
> (obviously :-) since a process that does a while(1); will run on the cpu 
> forever unless it gets killed or paged.  And on that note, we don't make 

At the very least, something needs to be done along the lines of using the
2nd CPU's internal timer to context-switch it on the time quantum.

Or perhaps we could use the APIC InterProcessorInterrupt facility to let the
1st CPU tell the others when to call cpu_switch(): program the context-switch
timer for quantum/NCPU, then send each timer INT to the next CPU in turn.
This method would tend to keep the context switching done by each CPU
separated in time, thus avoiding mp_lock contention.

(Excuse me if I'm suggesting something stupid here; I have minimal knowledge
of the kernel's internals at this point.)
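
To make the idea concrete, here is the rough shape of what I'm imagining
(all of the names below are hypothetical; this is a sketch, not real
kernel code):

/*
 * Staggered-quantum sketch: CPU 0 takes the clock interrupt and, every
 * quantum/NCPU ticks, kicks the next CPU in round-robin order with a
 * reschedule IPI.  Each CPU then context switches from its IPI handler,
 * so no two CPUs try to grab mp_lock for a switch at the same instant.
 */
static int	next_cpu;

void
staggered_clock(void)			/* hypothetical; runs on CPU 0 */
{
	static int subtick;

	if (++subtick < quantum_ticks / NCPU)
		return;
	subtick = 0;
	next_cpu = (next_cpu + 1) % NCPU;
	if (next_cpu == 0)
		need_resched();			/* CPU 0 switches itself */
	else
		apic_ipi(next_cpu, IPI_RESCHED);	/* hypothetical IPI */
}

void
ipi_resched(void)		/* hypothetical; runs on the target CPU */
{
	get_mplock();		/* kernel is still one big lock */
	cpu_switch();		/* run the next ready process */
	rel_mplock();
}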

> any allowance for the page tables being changed while one cpu is in user 
> mode.  (we flush during the context switch, but that doesn't help if a 
> page is stolen). 

I will elaborate on the differences between options 3 & 4 in a later mailing;
at this point my guess is that either the cache or page tables/page flushing
is the issue.
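
For what it's worth, my (possibly naive) picture of the missing allowance
is something like the following; smp_invltlb(), apic_ipi_all_but_self(),
and IPI_INVLTLB are names I'm making up for the sketch:

/*
 * Hypothetical sketch: when one CPU edits a page table (e.g. the pageout
 * code steals a page), it must tell the other CPU(s) to flush their TLBs
 * right away, instead of waiting for the flush that happens at their
 * next context switch.
 */
void
smp_invltlb(void)
{
	invltlb();				/* flush our own TLB */
	apic_ipi_all_but_self(IPI_INVLTLB);	/* hypothetical IPI */
}

void
ipi_invltlb(void)		/* IPI handler on the other CPU(s) */
{
	invltlb();		/* reload %cr3, flushing the TLB */
}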

> I've been trying to decipher some of the more obscure 
> parts of the apic docs, and it appears that we can sort-of simulate a 
> round-robin approach on certain interrupts without too much reliability, 
> but it's better than nothing I think.  (I have in mind setting all the cpu 
> "priorities" the same, and let the apic's use their internal tie-breaking 
> weighting.  I've not read enough on it yet, but I think it's possible...)

Won't this also require code to enable/manage the I/O APIC?

>  - the smp_idleloop is currently killing the performance when one process 
> is running, because the idleloop is constantly bouncing back and forwards 
> between the two idle procs.  ie: _whichidqs is always true, so it's 
> constantly locking, and unlocking causing extreme congestion on that lock. 

My debug code for tracking the mp_lock shows long periods where the 1st CPU
is running the count through 1,2,3,3,2,2,3,3,2,1,2,3... type progressions.
I think these are INTerrupt periods.

I can't explain the following progressions:

 ...
cpu #1 requests mplock, lock is free
cpu #1 gets mplock,     count: 1
                                        cpu #2 requests mplock, count: 1
cpu #1 enters free,     count: 1
cpu #1 leaves free,     lock is free
cpu #1 requests mplock, count: 1
                                        cpu #2 enters free,     count: 1
                                        cpu #2 leaves free,     lock is free
cpu #1 gets mplock,     count: 1
                                        cpu #2 requests mplock, count: 1
cpu #1 enters free,     count: 1
cpu #1 leaves free,     lock is free
cpu #1 requests mplock, lock is free
                                        cpu #2 gets mplock,     count: 1
                                        cpu #2 enters free,     count: 1
                                        cpu #2 leaves free,     lock is free
cpu #1 gets mplock,     count: 1
 ...

Note that the 2nd CPU goes directly from requesting the lock to entering
rel_mplock(), without a "gets mplock" step in between.  The second time #2
requests the lock, the progression looks valid.  Note that the code that
records these mp_lock changes is itself subject to race conditions, so the
data could be corrupt, but I see this happen too often to believe that is
all I am seeing here.  I will get into more detail about this debug code
in the mailing to follow later...
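
For reference, here is roughly how I understand the lock and its count to
behave; this is a simplified model from my reading of the code, not a
verbatim copy, and atomic_cmpxchg() is a stand-in for the real "lock
cmpxchg" sequence (one version of which is sketched near the end of this
mail):

/*
 * Simplified model of mp_lock as a recursive spin lock.  A CPU that
 * already owns the lock just bumps the count (which is what produces
 * the 1,2,3,3,2,2,... progressions during INTerrupts); a CPU that
 * doesn't own it spins until the lock goes free.
 */
volatile int	mp_lock_cpu = -1;	/* owner, -1 == lock is free */
volatile int	mp_lock_count;		/* recursion depth */

void
get_mplock(int cpu)
{
	if (mp_lock_cpu == cpu) {	/* recursive acquire, e.g. an INT */
		mp_lock_count++;
		return;
	}
	while (atomic_cmpxchg(&mp_lock_cpu, -1, cpu) == 0)
		;			/* spin: the other CPU holds it */
	mp_lock_count = 1;
}

void
rel_mplock(void)
{
	if (--mp_lock_count == 0)
		mp_lock_cpu = -1;	/* free it for the other CPU */
}

If I read it right, the real code packs the owner and count into a single
word; I've split them here for readability.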


>  There has got to be a better way to do the locking (I have ideas).  When 

If you would like me to start coding some test cases, let me know.

> - "less debug code"..  Have you looked very closely at the implications of 
> your chipset bios settings?  Is it possible that some of the speedups are 
> deferring cpu cache writebacks too long and one cpu is getting data from 
> RAM that has just been entered into the other chipset's "write buffer"?  
> (ie: cache thinks it's been written back, but it's not in RAM yet, so the 
> MESI protocol is defeated?  I have no idea if this is possible or not.. 
> just a wild guess.  if "lock cmpxchg" is truly atomic, then the problem 
> you see should not be happening...  I presume you have tried the 
> motherboard on "maximum pessimistic settings"?

We tried a lot of different settings; nothing noticeable.  The Intel BIOS
doesn't give one a lot of choices; it doesn't even let you enable/disable
the cache (at least not that we could find!).  Note that the XXPRESS differs
from many boards in that it has separate cache sections for each CPU.
Again, I'll get into the "less code" issue in the mailing specific to this
test.

--
Steve Passe	| powered by
smp@csn.net	|            FreeBSD



