From owner-freebsd-current  Mon Jul 12 23:33:35 1999
Delivered-To: freebsd-current@freebsd.org
Received: from alcanet.com.au (border.alcanet.com.au [203.62.196.10])
	by hub.freebsd.org (Postfix) with ESMTP id A898615214
	for <freebsd-current@FreeBSD.ORG>; Mon, 12 Jul 1999 23:33:30 -0700 (PDT)
	(envelope-from jeremyp@gsmx07.alcatel.com.au)
Received: by border.alcanet.com.au id <40326>; Tue, 13 Jul 1999 16:15:25 +1000
Date: Tue, 13 Jul 1999 16:33:11 +1000
From: Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
In-reply-to: <199907130528.WAA74299@apollo.backplane.com>
To: dillon@apollo.backplane.com
Cc: freebsd-current@FreeBSD.ORG
Message-Id: <99Jul13.161525est.40326@border.alcanet.com.au>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matthew Dillon <dillon@apollo.backplane.com> wrote:
>:[1] A locked instruction implies a synchronous RMW cycle.  In order
>:    to meet write-ordering guarantees (without which, a locked RMW
>:    cycle would be useless as a semaphore primitive), it implies a
>:    complete write serialization, and probably some level of
>:    instruction serialisation.  Since write-back pipelines will get
>
>    A locked instruction only implies cache coherency across the 
>    instruction.  It does not imply serialization.  Intel blows it
>    big time, but that's intel for you.

Oops, looks like foot-in-mouth time for me :-(.

Maybe I should have said that "without any other cache coherency
protocol, you need serialisation" :-).
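
For concreteness, here is a minimal sketch of a locked RMW used as a
semaphore primitive, written with C11 atomics.  The names (sketch_lock,
sketch_try_acquire and friends) are made up for illustration and are
not FreeBSD's actual primitives.  On x86 the test-and-set would
typically compile to an implicitly locked exchange, but that is up to
the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_flag held;
} sketch_lock;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/*
 * One atomic read-modify-write: set the flag and report whether it
 * was clear beforehand.  Returns true if we now own the lock.
 */
bool sketch_try_acquire(sketch_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Spin until the RMW observes the flag clear. */
void sketch_acquire(sketch_lock *l)
{
    while (!sketch_try_acquire(l))
        ;   /* busy-wait; a real lock would back off or yield */
}

/* Release ordering makes writes done under the lock visible first. */
void sketch_release(sketch_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release orderings only constrain memory operations around
the lock itself; they do not ask for any wider serialisation, which is
the point of the correction above.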

Given this correction, the lock degradation is much less than I
suggested.  I suspect there _will_ be gradual degradation though.

>:    longer and parallel execution units more numerous, the cost of
>:    a serialisation operation will get relatively higher.  Also,
>
>    It is not a large number of execution units that implies a higher
>    cost of serialization but instead data interdependencies.

I was thinking more that a locked instruction is inconsistent with
parallel execution, but that's probably not true either.

>    Modern cache coherency protocols do not have a problem with 
>    a large number of caches in a parallel processing subsystem.
I thought we were talking about Intel :-).

>    So, using the above rules as an example, a locked instruction can cost
>    as little as 0 extra cycles no matter how many cpu's you have running
>    in parallel.  There is no need to serialize or synchronize anything.

Assuming an uncontested access.  If you've got two CPUs fighting
over a lock, then you'll have a bus cycle - and CPU core speeds are
increasing faster than bus speeds.  (486s normally ran at 1 or 2
times the bus speed; a PIII-450 runs at 4.5 times bus speed.)
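
The arithmetic behind those ratios can be sketched in C as below.  The
function names are hypothetical, and the clock figures used with them
(a 100 MHz bus for the PIII-450, a 33 MHz bus for a 66 MHz 486) are
illustrative assumptions consistent with the multipliers quoted above.
The number of bus cycles a contested lock actually costs is not
something I have measured, so it is left as a parameter.

```c
/* Core clock cycles that elapse during one bus clock cycle. */
double core_cycles_per_bus_cycle(double core_mhz, double bus_mhz)
{
    return core_mhz / bus_mhz;
}

/*
 * Core cycles lost if a contested lock costs `bus_cycles` bus cycles.
 * `bus_cycles` is an assumed input, not a measured figure.
 */
double stall_in_core_cycles(double core_mhz, double bus_mhz,
                            double bus_cycles)
{
    return bus_cycles * core_cycles_per_bus_cycle(core_mhz, bus_mhz);
}
```

With those assumed clocks, core_cycles_per_bus_cycle(450.0, 100.0)
gives 4.5 and core_cycles_per_bus_cycle(66.0, 33.0) gives 2.0, so the
same single bus cycle costs more than twice as many core cycles on the
newer part.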

And as you pointed out elsewhere, call/return sequences can't get
too much faster - which suggests that the relative costs should stay
fairly similar.  At least for a well-designed architecture...

Peter

