Date:      Mon, 12 Jul 1999 10:41:26 -0700
From:      Mike Haertel <mike@ducky.net>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Luoqi Chen <luoqi@watermarkgroup.com>, dfr@nlsystems.com, jeremyp@gsmx07.alcatel.com.au, freebsd-current@FreeBSD.ORG, mike@ducky.net, mike@ducky.net
Subject:   Re: "objtrm" problem probably found (was Re: Stuck in "objtrm") 
Message-ID:  <199907121741.KAA17837@ducky.net>
In-Reply-To: Your message of "Mon, 12 Jul 1999 09:47:11 PDT." <199907121647.JAA70249@apollo.backplane.com> 

You might think that, due to MESI state bits in the cache and bus
coherency protocols, locks are "free".

Unfortunately, the lock prefix has a measurable cost on a UP system,
at least on P6 and later processors.  The reason is that the locked
memory operation is an "at-retirement" operation, which means that
it waits for the out-of-order execution of all instructions logically
older than it to complete before it even starts to operate.
(Suppose locks were not at-retirement--then locks on cache lines
could be obtained out-of-order, and this would lead to a possibility
of global deadlocks even if the original code was deadlock-free.)

Locks may in fact have further serializing effects, such as draining
the store queues prior to obtaining the lock; I have forgotten the details.
Hmm, I am almost sure the lock needs to drain the store queues.
Let's assume it does.

This all adds up to "locks are painful".

Some data:
	Loop:	addl $1, foo
		subl $1, %ecx
		jne Loop
requires about 30 seconds for 10M iterations, and with "lock; addl $1, foo"
it requires about 4 minutes and 30 seconds on my 333 MHz P-II.
(This loop has other problems and someone else just posted a much
better lock benchmark than this.  Anyway...)
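
For anyone who wants to repeat the measurement from C, here is a rough
sketch of the same comparison.  It assumes GCC or Clang inline asm on x86
(with a portable fallback elsewhere); the names ADD1_PLAIN, ADD1_LOCKED,
loop_plain, loop_locked and time_loop are made up for illustration, and
the loop overhead differs from the hand-written assembly above, so treat
the numbers as indicative only.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper macros: one memory increment, with and without "lock". */
#if defined(__i386__) || defined(__x86_64__)
#define ADD1_PLAIN(m)  __asm__ __volatile__("addl $1, %0" : "+m"(m) : : "cc")
#define ADD1_LOCKED(m) __asm__ __volatile__("lock; addl $1, %0" : "+m"(m) : : "cc")
#else
/* Non-x86 fallback so the sketch still builds; not the case discussed here. */
#define ADD1_PLAIN(m)  ((void)((m) += 1))
#define ADD1_LOCKED(m) ((void)__atomic_fetch_add(&(m), 1, __ATOMIC_SEQ_CST))
#endif

static volatile uint32_t foo;

/* Increment foo n times without the lock prefix; returns the final value. */
static uint32_t loop_plain(uint32_t n)
{
    foo = 0;
    for (uint32_t i = 0; i < n; i++)
        ADD1_PLAIN(foo);
    return foo;
}

/* Same loop, but every increment carries the lock prefix. */
static uint32_t loop_locked(uint32_t n)
{
    foo = 0;
    for (uint32_t i = 0; i < n; i++)
        ADD1_LOCKED(foo);
    return foo;
}

/* CPU time in seconds consumed by one run of the given loop. */
static double time_loop(uint32_t (*loop)(uint32_t), uint32_t n)
{
    clock_t start = clock();
    uint32_t got = loop(n);
    clock_t end = clock();

    assert(got == n);   /* sanity check: the loop really ran n times */
    (void)got;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(loop_plain, n) and time_loop(loop_locked, n) with the
same n gives the two figures to compare.  Absolute times on a modern CPU
will not match the P-II figures quoted above.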

As future processors become more deeply out-of-order, locks will
become even more painful (although one could imagine at some point
that they might cross the pain threshold that would justify heroic
hardware solutions allowing OOO locks).



Anyway, taking all that into account, I still agree with Dillon
that it is a better software solution to allow the same loadable
drivers to work for both UP and MP systems whenever possible.

One way to do this while not feeling the full pain of locks
would be to make the atomic operations actual function calls
through function pointers.  They could point to the locked or
non-locked versions depending on whether the kernel was SMP.
Although function calls are more expensive than inline code,
they aren't necessarily a lot more so, and function calls to
non-locked RMW operations are certainly much cheaper than
inline locked RMW operations.
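
A minimal sketch of what that indirection might look like, assuming GCC
or Clang on x86.  All names here (atomic_add_fn, atomic_add_up,
atomic_add_mp, atomic_add_32, atomic_select) are hypothetical and are
not the FreeBSD atomic API; the ncpus argument stands in for however
the kernel learns at boot whether it is running SMP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signature shared by the locked and non-locked versions. */
typedef void (*atomic_add_fn)(volatile uint32_t *, uint32_t);

/*
 * UP version: a single RMW instruction without the lock prefix.  On one
 * CPU this is still atomic with respect to interrupts, because interrupts
 * are taken only at instruction boundaries.
 */
static void atomic_add_up(volatile uint32_t *p, uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("addl %1, %0" : "+m"(*p) : "ir"(v) : "cc");
#else
    *p += v;    /* fallback so the sketch builds; may not be one instruction */
#endif
}

/* MP version: the same instruction with the lock prefix. */
static void atomic_add_mp(volatile uint32_t *p, uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; addl %1, %0" : "+m"(*p) : "ir"(v) : "cc");
#else
    (void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Drivers call through this pointer; default to the always-safe version. */
static atomic_add_fn atomic_add_32 = atomic_add_mp;

/* Called once at startup: pick the cheap version when there is one CPU. */
static void atomic_select(int ncpus)
{
    assert(ncpus >= 1);
    atomic_add_32 = (ncpus > 1) ? atomic_add_mp : atomic_add_up;
}
```

A driver would then write atomic_add_32(&counter, 1) and get whichever
version the running kernel selected, so the same binary works on both
UP and MP.  Defaulting the pointer to the locked version means a driver
that runs before atomic_select() is merely slow, not wrong.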





