Date: Wed, 24 May 2000 18:52:45 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Short summary
Message-ID: <200005250152.SAA78130@apollo.backplane.com>
:virtually the same speed as the Giant lock BSD/OS kernel in a uniprocessor
:environment. It occurred to me today that in a uniprocessor environment
:the lock prefix to the cmpxchg can be removed. I ran some
:experiments. The following data is from a very limited sample size. On
:a couple of different systems with different clock rates, removing
:the lock prefix reduced execution time of mutex operations to one
:third of their original value. Running the same job with two kernels
:whose only difference was the lock prefix, there was a reduction in
:system time of 2.5 percent. This suggested that the total system
:time used for locking with the SMP locks in place is 3.6 percent
:and with the locks trimmed for uniprocessor-only operation is
:1.2 percent. (Please excuse rounding errors).

Chuck, there was extensive debate and testing on both Linux and FreeBSD
with regard to locked instructions in an SMP environment. It was
determined that there is an optimization one can make which improves
lock performance on SMP systems.

The gist of the optimization is that if you use a lock prefix when
locking, you do *not* need a lock prefix when unlocking. Write ordering
is guaranteed on Intel (586 or above), so the plain store that releases
the lock cannot become visible to other CPUs before the writes made
while the lock was held.

Also, for recursive locks, in the case where you ALREADY hold the lock,
you do not need a lock prefix when incrementing or decrementing the
count.

Take a look at the FreeBSD mp_unlock code in 4.x or 5.x (with a
reasonably recent cvs update) for an example:
/usr/src/sys/i386/i386/mplock.s, the MPrellock_edx subroutine. These
changes saved over a microsecond in syscall overhead for FreeBSD SMP.

This optimization radically improves the performance of an unlock at
the cost of adding a slight delay before contending CPUs see the
change. Since there is no lock contention 99.999% of the time, the
delay is completely absorbed and you realize an increase in
performance across the board.

The recursion optimization makes recursive locks practical in an SMP
setting.
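A minimal C11 sketch of both optimizations follows, assuming a hypothetical recursive spin mutex. The names `struct rmutex`, `rmutex_lock`, `rmutex_unlock` and the nonzero integer owner id `self` are illustrative only; this is not the mplock.s code, which is hand-written i386 assembly. Acquisition uses a compare-and-swap, which GCC and Clang compile to a lock-prefixed cmpxchg on x86. The final release is an ordinary store with release ordering, which those compilers emit as a plain mov on x86. The recursion count is only ever touched by the current holder, so it needs no locked instruction at all.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical recursive spin mutex used only to illustrate the idea;
 * it is not FreeBSD's mplock.s implementation. */
struct rmutex {
    atomic_int owner;  /* 0 = free, otherwise the nonzero id of the holder */
    int recurse;       /* extra acquisitions; only the holder touches this */
};

/* 'self' is an illustrative nonzero CPU/thread id (an assumption). */
static void rmutex_lock(struct rmutex *m, int self)
{
    /* Recursive case: we already hold the lock, so no other CPU can be
     * writing 'recurse'; a plain increment (no lock prefix) suffices. */
    if (atomic_load_explicit(&m->owner, memory_order_relaxed) == self) {
        m->recurse++;
        return;
    }

    /* First acquisition: atomically swap 0 -> self. On x86, GCC and
     * Clang emit a lock-prefixed cmpxchg for this. */
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&m->owner, &expected, self,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;  /* a failed CAS overwrites 'expected'; reset it */
    }
}

static void rmutex_unlock(struct rmutex *m, int self)
{
    /* Debug check: only the holder may unlock. */
    assert(atomic_load_explicit(&m->owner, memory_order_relaxed) == self);

    /* Still held recursively: plain decrement, no lock prefix. */
    if (m->recurse > 0) {
        m->recurse--;
        return;
    }

    /* Final release: an ordinary store with release ordering. On x86 this
     * is a plain mov; store ordering keeps the writes made while holding
     * the lock visible before the lock reads as free. */
    atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

For example, calling `rmutex_lock(&m, 1)` twice raises `recurse` to 1 without executing any locked instruction; the matching two `rmutex_unlock(&m, 1)` calls first decrement the count and then publish the release with a plain store. On weakly ordered CPUs the compiler inserts whatever barrier `memory_order_release` requires, so the x86-specific saving shows up only where hardware store ordering makes the barrier unnecessary.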
There is virtually *NO* overhead after you've obtained the initial lock.

-Matt