Date:      Tue, 23 Nov 1999 09:51:29 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Peter Wemm <peter@netplex.com.au>, Tommy Hallgren <thallgren@yahoo.com>, freebsd-smp@FreeBSD.ORG
Subject:   more on... Re: Matt's new unlock optimiazation 
Message-ID:  <199911231751.JAA10135@apollo.backplane.com>
References:  <19991123140128.3A7D41C6D@overcee.netplex.com.au> <199911231703.JAA09896@apollo.backplane.com>

::> 
::> The subject is: spin_unlock optimization(i386)
::
::A bit worrying, to say the least, especially coming from Linus (even more so
::in light of his work at transmeta and what they're doing with/to Intel cpu's).
::
::Cheers,
::-Peter
:
:    hmm.  I was under the impression that the Pentium serialized writes
:    by reserving locations through their caches.  But knowing Intel, Linus 
:    is probably right.
:
:    Sometimes I wish I could just take a gun to the Pentium.
:
:    But this isn't a big deal, we should simply be able to do a locked 
:    write into the per-cpu area to synchronize just before we release
:    the lock.  This is still going to be a whole lot more efficient than
:    trying to lock a write to the shared lock, because we will almost certainly
:    already own that memory location.
:
:    I'll run some tests and commit a solution.  Nobody commit anything.  No
:    matter what, we still get the benefit of the recursion lock optimization
:    which is actually the more important one.

    Ok, there's a problem, but I don't believe you have to use a locked
    instruction to get around it.  All you should need to do is synchronize
    the instruction stream.  I remember from somewhere that 'NOP' (which is
    really just an xchg instruction) does this.  But I am not sure; I am
    going to have to do some more research.

    I did test, and I am correct about the cache line ownership change
    overhead.  On an SMP box with two competing processors, using a locked
    instruction on the *same* physical memory location costs roughly 3x as
    much as the same locked instruction on different memory locations.

    So if I can't find a definitive way to do instruction synchronization,
    we will simply do a dummy locked instruction into the per-cpu area.
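
    The fallback described above can be sketched roughly as follows.  This
    is only an illustration using C11 atomics, not the i386 kernel code
    under discussion.  The names spin_lock_t, percpu_dummy,
    spin_lock_sketch and spin_unlock_sketch are hypothetical, and a
    thread-local word stands in for the per-cpu area.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} spin_lock_t;

/*
 * Assumption: a thread-local word stands in for the per-cpu area.
 * Its cache line is normally already owned by the current CPU.
 */
static _Thread_local atomic_int percpu_dummy;

static inline void spin_lock_sketch(spin_lock_t *l)
{
    int expected = 0;

    /* Locked compare-and-swap on the shared lock word. */
    while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;   /* a failed CAS overwrote it with the current value */
}

static inline void spin_unlock_sketch(spin_lock_t *l)
{
    /* Dummy locked read-modify-write on the private word: the sync point. */
    atomic_exchange_explicit(&percpu_dummy, 0, memory_order_seq_cst);

    /* Release the shared lock with a store, not a locked instruction. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

    On x86, compilers typically emit an xchg (implicitly locked) for the
    exchange and an ordinary mov for the release store, so the only locked
    operation in the unlock path touches memory the CPU already owns.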

    With cmpxchgl

	test3:/home/dillon# ./lock shared
	165 nS/loop
	test3:/home/dillon# ./lock private
	53 nS/loop

    With just xchgl

	test3:/home/dillon# ./lock shared
	160 nS/loop
	test3:/home/dillon# ./lock private
	47 nS/loop
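
    The source of the ./lock test program is not included in this message.
    A comparable measurement might be sketched as below, assuming POSIX
    threads, C11 atomics and a 64-byte cache line.  The names padded_word,
    bench_worker and bench_ns_per_loop are hypothetical, and the figures
    it produces will not match the ones above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define CACHE_LINE 64   /* assumed cache line size */

/* One word per cache line, so the "private" words never share a line. */
struct padded_word {
    _Alignas(CACHE_LINE) atomic_int value;
};

static struct padded_word words[2];

struct bench_arg {
    atomic_int *target;   /* word this thread hammers on */
    long iters;           /* number of locked exchanges to perform */
};

static void *bench_worker(void *p)
{
    struct bench_arg *a = p;

    for (long i = 0; i < a->iters; i++)
        atomic_exchange(a->target, 1);   /* typically xchg on x86 */
    return NULL;
}

/*
 * Run two competing threads and return wall-clock nanoseconds per loop
 * iteration, or -1.0 if a thread could not be created.  shared != 0
 * points both threads at the same word; shared == 0 gives each its own.
 * Thread start-up time is included, so use a large iteration count.
 */
double bench_ns_per_loop(int shared, long iters)
{
    struct bench_arg args[2] = {
        { &words[0].value, iters },
        { &words[shared ? 0 : 1].value, iters },
    };
    pthread_t tid[2];
    struct timespec t0, t1;
    int started = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (started < 2 &&
           pthread_create(&tid[started], NULL, bench_worker,
                          &args[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started < 2)
        return -1.0;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

    Calling bench_ns_per_loop(1, n) corresponds to the "shared" case and
    bench_ns_per_loop(0, n) to the "private" case.  The ratio between the
    two depends heavily on the CPU, on how the threads are scheduled, and
    on whether they actually land on different processors.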

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>






