Date:      Wed, 24 May 2000 18:52:45 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Chuck Paterson <cp@bsdi.com>
Cc:        arch@FreeBSD.ORG
Subject:   Re: Short summary
Message-ID:  <200005250152.SAA78130@apollo.backplane.com>

:virtually the same speed as the Giant lock BSD/OS kernel in a uniprocessor
:environment. It occurred to me today that in a uniprocessor environment
:the lock prefix to the cmpxchg can be removed.  I ran some
:experiments. The following data is from a very limited sample size. On
:a couple of different systems with different clock rates removing
:the lock prefix reduced execution time of mutex operations to one
:third of their original value. Running the same job with two kernels
:whose only difference was the lock prefix there was a reduction in
:system time of 2.5 percent. This suggested that the total system
:time used for locking with the SMP locks in place is 3.6 percent
:and with the locks trimmed for uniprocessor only operation is
:1.2 percent. (Please excuse rounding errors).

    Chuck, there was extensive debate and testing on both Linux and
    FreeBSD with regard to locked instructions in an SMP environment.
    It was determined that there is an optimization one can make which
    improves lock performance on SMP systems.

    The gist of the optimization is that if you use a lock prefix when
    locking, you do *not* need a lock prefix when unlocking.  Write
    ordering is guaranteed on Intel (586 or above).
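
    A minimal sketch of the idea in C11 atomics, not the FreeBSD code
    itself.  The type and function names (sketch_lock, sketch_trylock,
    sketch_lock_acquire, sketch_unlock) are hypothetical.  The assumption
    is that an x86 compiler typically turns the compare-and-swap into a
    lock cmpxchg and the release store into a plain mov with no lock
    prefix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} sketch_lock;

/*
 * Acquire attempt: an atomic compare-and-swap.  On x86 this is
 * typically emitted as a lock-prefixed cmpxchg.
 */
static inline bool sketch_trylock(sketch_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is obtained. */
static inline void sketch_lock_acquire(sketch_lock *l)
{
    while (!sketch_trylock(l))
        ; /* busy-wait */
}

/*
 * Release: a store with release ordering.  On x86 this is typically
 * a plain mov, relying on the hardware's write ordering rather than
 * a lock prefix.
 */
static inline void sketch_unlock(sketch_lock *l)
{
    /* Sanity check only: the caller must currently hold the lock. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

    A caller would pair sketch_lock_acquire(&l) with sketch_unlock(&l)
    around the critical section; only the acquire side pays for the
    locked instruction.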

    Also, for recursive locks for the case where you ALREADY hold the lock,
    you do not need a lock prefix when incrementing or decrementing the
    count.
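
    The recursive case can be sketched the same way.  This is an
    illustration under assumptions, not the mplock.s layout: the names
    (sketch_rlock, sketch_rlock_try, sketch_rlock_release,
    SKETCH_NO_OWNER) and the split into an owner field plus a separate
    count are hypothetical.  Only the first acquisition uses an atomic
    compare-and-swap; once the caller already owns the lock, the count
    is changed with ordinary, unlocked arithmetic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NO_OWNER (-1)   /* hypothetical "lock is free" marker */

typedef struct {
    atomic_int owner;  /* id of the holding CPU, or SKETCH_NO_OWNER */
    int count;         /* recursion depth; touched only by the holder */
} sketch_rlock;

/* Try to take the lock on behalf of `cpu`; returns true on success. */
static inline bool sketch_rlock_try(sketch_rlock *l, int cpu)
{
    /* Only `cpu` itself ever stores its own id, so this read is safe. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu) {
        l->count++;            /* already held: plain increment */
        return true;
    }

    /* First acquisition: the one place a locked instruction is needed. */
    int expected = SKETCH_NO_OWNER;
    if (atomic_compare_exchange_strong_explicit(
            &l->owner, &expected, cpu,
            memory_order_acquire, memory_order_relaxed)) {
        l->count = 1;
        return true;
    }
    return false;              /* held by another CPU */
}

/* Drop one level of recursion; the final release frees the lock. */
static inline void sketch_rlock_release(sketch_rlock *l, int cpu)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu);
    assert(l->count > 0);
    if (--l->count == 0)       /* plain decrement while still held */
        atomic_store_explicit(&l->owner, SKETCH_NO_OWNER,
                              memory_order_release);
}
```

    Nested sketch_rlock_try calls by the holder cost an ordinary load
    and increment, which is why recursion adds so little once the
    initial lock is held.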

    Take a look at the FreeBSD mp_unlock code in 4.x or 5.x (with a
    reasonably recent CVS update) for an example: the MPrellock_edx
    subroutine in /usr/src/sys/i386/i386/mplock.s.  These changes saved
    over a microsecond in syscall overhead for FreeBSD SMP.

    This optimization radically improves the performance of an unlock at
    the cost of adding a slight delay before contending CPUs see the
    change.  Since there is no lock contention 99.999% of the time, the
    delay is completely absorbed and you realize an increase in performance
    across the board.

    The recursion optimization makes recursive locks practical in an SMP 
    setting.  There is virtually *NO* overhead after you've obtained the
    initial lock.

						-Matt


