Date:      Thu, 6 May 2004 09:52:27 -0400 
From:      Don Bowman <don@sandvine.com>
To:        'Bruce Evans' <bde@zeta.org.au>, Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        Gerrit Nagelhout <gnagelhout@sandvine.com>
Subject:   RE: 4.7 vs 5.2.1 SMP/UP bridging performance
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C85337045D8CB5@mail.sandvine.com>

From: Bruce Evans [mailto:bde@zeta.org.au]
> On Wed, 5 May 2004, Andrew Gallatin wrote:
> 
 ...

> >
> > Actually, I think his tests are accurate and bus locked instructions
> > take an eternity on P4.  See
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
> >
> > For example, with your test above, I see 212 cycles for the UP case on
> > a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
> > simple slock = 0; reduces that count to 18 cycles.
> 
> This seems to be right, unfortunately.  I wonder if this has anything to
> do with freebsd.org having no P4 machines.
> 
> > If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > then I think you should do it.  Of course, that still leaves mutexes
> > as very expensive on SMP (253 cycles on the 2.53GHz from above).
> 
> I forgot (again) that there are memory access ordering issues.  A lock
> may be needed to get everything synced.  See the comment before the i386
> versions in i386/include/atomic.h.  A single lock may be enough.  The
> best example I could think of easily is:

On the P4, there are mfence, lfence, and sfence instructions to enforce
memory ordering. These are cheaper than "lock; andl" or "cpuid",
which are the traditional 'sync' instructions.
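
As a rough sketch of what I mean (not a patch against
i386/include/atomic.h), the fences could be wrapped as below for
GCC-style inline asm. The names full_fence, load_fence, store_fence,
release_with_fence and protected_data are made up for illustration;
slock is just the lock word from the test discussed above. Whether a
fence is needed at all before the releasing store on UP or SMP is
exactly what is being debated here, so take the placement of
full_fence() as an assumption, not a conclusion.

```c
/*
 * Hypothetical wrappers; these are NOT the FreeBSD atomic(9) API.
 * mfence/lfence need SSE2 (present on the P4); sfence needs SSE.
 */
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
static inline void full_fence(void)  { __asm__ __volatile__("mfence" ::: "memory"); }
static inline void load_fence(void)  { __asm__ __volatile__("lfence" ::: "memory"); }
static inline void store_fence(void) { __asm__ __volatile__("sfence" ::: "memory"); }
#else
/* Fallback so the sketch still builds elsewhere (GCC/Clang builtins). */
static inline void full_fence(void)  { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void load_fence(void)  { __atomic_thread_fence(__ATOMIC_ACQUIRE); }
static inline void store_fence(void) { __atomic_thread_fence(__ATOMIC_RELEASE); }
#endif

static volatile int slock = 1;   /* 1 = held, 0 = free */
static int protected_data;       /* stand-in for data guarded by slock */

/*
 * Release the lock word with a fence plus a plain store, instead of a
 * bus-locked xchg.  Returns the value of slock after the release.
 */
static inline int release_with_fence(int value)
{
    protected_data = value;  /* write made while "holding" the lock */
    full_fence();            /* assumed placement: order it before the release */
    slock = 0;               /* plain store, no lock prefix */
    return slock;
}
```

Timing release_with_fence() against the xchg-based release on the same
2.53GHz P4 would show whether the fence is really cheaper in this path.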


