From owner-freebsd-current Mon Jul 12 22:29:14 1999
Date: Mon, 12 Jul 1999 22:28:02 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199907130528.WAA74299@apollo.backplane.com>
To: Peter Jeremy
Cc: mike@smith.net.au, freebsd-current@FreeBSD.ORG
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
References: <99Jul13.134051est.40360@border.alcanet.com.au>

:Based on general computer architecture principles, I'd say that a lock
:prefix is likely to become more expensive[1], whilst a function call
:will become cheaper[2] over time.
:...
:
:[1] A locked instruction implies a synchronous RMW cycle.  In order
:    to meet write-ordering guarantees (without which, a locked RMW
:    cycle would be useless as a semaphore primitive), it implies a
:    complete write serialization, and probably some level of
:    instruction serialisation.  Since write-back pipelines will get

    A locked instruction only implies cache coherency across the
    instruction.  It does not imply serialization.  Intel blows it big
    time, but that's Intel for you.

:    longer and parallel execution units more numerous, the cost of
:    a serialisation operation will get relatively higher.  Also,
:
:Peter

    It is not a large number of execution units that implies a higher
    cost of serialization, but rather data interdependencies.  A locked
    instruction does not have to imply serialization.  Modern cache
    coherency protocols do not have a problem with a large number of
    caches in a parallel processing subsystem.

    This is how a typical two-level cache coherency protocol works with
    an L1 and L2 cache:

	* The L2 cache is usually a superset of the L1 cache.  That is,
	  all cache lines in the L1 cache also exist in the L2 cache.

	* Both the L1 and L2 caches implement a shared bit and an
	  exclusive bit, usually on a per-cache-line basis.

	* When a processor, A, issues a memory op that is not in its
	  cache, it can request the cache line from main memory either
	  shared (unlocked) or exclusive (locked).

	* All other processors snoop the main memory access.

	* If another processor B's L2 cache contains the cache line
	  being requested by processor A, it can provide the cache line
	  to processor A and no main memory op actually occurs.  In
	  this case, the shared and exclusive bits in processor B's L1
	  and L2 caches are updated appropriately.

	  - If A is requesting shared (unlocked) access, both A and B
	    will set the shared bit and B will clear the exclusive bit.
	    (B will cause A to stall if it is in the middle of
	    operating on the locked cache line.)

	  - If A is requesting exclusive (locked) access, B will
	    invalidate its cache line and clear both the shared and
	    exclusive bits, and A will set its exclusive bit.  A now
	    owns that cache line.

	* If no other processor owns the cache line, A obtains the data
	  from main memory, and other processors have the option to
	  snoop the access in their L2 caches.

    So, using the above rules as an example, a locked instruction can
    cost as little as 0 extra cycles no matter how many CPUs you have
    running in parallel.  There is no need to serialize or synchronize
    anything.  The sketch below walks through the transitions.
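    To make those transitions concrete, here is a toy sketch in C.
    This is not FreeBSD code and not a real coherency implementation:
    the single modeled cache line, the structure fields, and the
    notional cost values (0 = already owned, 1 = snoop hit in another
    L2, 2 = main memory fetch) are all invented purely to illustrate
    the rules above.

/*
 * Toy model of the snoop protocol described above.  One cache line,
 * NCPU caches, shared/exclusive bits per cache.  Illustrative only;
 * the cost numbers are invented for the example.
 */
#include <stdio.h>

#define	NCPU	4

struct cacheline {
	int	valid;		/* line present in this cpu's L2 */
	int	shared;		/* shared bit */
	int	exclusive;	/* exclusive bit */
};

static struct cacheline cache[NCPU];

/*
 * CPU 'a' requests the line; 'locked' selects shared (0) vs
 * exclusive (1) access.  Returns a notional cost: 0 if the line is
 * already owned appropriately, 1 for a snoop hit in another cpu's
 * L2, 2 for a main memory fetch.
 */
static int
request(int a, int locked)
{
	int b;

	if (cache[a].valid && (cache[a].exclusive || !locked))
		return (0);	/* 0 extra cycles: already owned */

	for (b = 0; b < NCPU; ++b) {
		if (b == a || !cache[b].valid)
			continue;
		/* snoop hit: B provides the line, no main memory op */
		if (locked) {
			/* B invalidates; A becomes exclusive owner */
			cache[b].valid = cache[b].shared =
			    cache[b].exclusive = 0;
			cache[a].valid = cache[a].exclusive = 1;
			cache[a].shared = 0;
		} else {
			/* both set shared; B drops exclusive */
			cache[b].shared = 1;
			cache[b].exclusive = 0;
			cache[a].valid = cache[a].shared = 1;
			cache[a].exclusive = 0;
		}
		return (1);
	}

	/* no other owner: fetch from main memory */
	cache[a].valid = 1;
	cache[a].shared = !locked;
	cache[a].exclusive = locked;
	return (2);
}

int
main(void)
{
	printf("cpu0 locked fetch:  cost %d\n", request(0, 1));
	printf("cpu0 locked again:  cost %d\n", request(0, 1));
	printf("cpu1 shared fetch:  cost %d\n", request(1, 0));
	return (0);
}

    Run it and the second locked fetch costs 0, since cpu0 already owns
    the line exclusively, while cpu1's later shared fetch is satisfied
    by a snoop hit rather than a main memory op, which is the point
    made next about the worst case.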
    The worst case is actually not even as bad as a complete cache-miss
    case.  The cost of snooping another CPU's L2 cache is much less
    than the cost of going to main memory.

					-Matt
					Matthew Dillon