From owner-freebsd-hackers Tue Jan 5 16:23:03 1999
Return-Path:
Received: (from majordom@localhost)
	by hub.freebsd.org (8.8.8/8.8.8) id QAA10454
	for freebsd-hackers-outgoing; Tue, 5 Jan 1999 16:23:03 -0800 (PST)
	(envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA10448
	for ; Tue, 5 Jan 1999 16:23:02 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.1/8.9.1) id QAA98857;
	Tue, 5 Jan 1999 16:22:34 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 5 Jan 1999 16:22:34 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901060022.QAA98857@apollo.backplane.com>
To: Alfred Perlstein
Cc: Terry Lambert , wes@softweyr.com, hackers@FreeBSD.ORG
Subject: Re: question about re-entrancy.
References:
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> :
:> :-Alfred
:>
:>     Generally speaking you do not want to use multi-valued locks for SMP
:>     operations, i.e. the concept of a 'shared' lock versus an 'exclusive'
:>     lock does not work well for locks used to manage SMP functions.  The
:>     reason it doesn't work well is because of truly serious starvation
:>     issues -- it is virtually always more efficient to implement
:>     fine-grained SMP locks as *only* exclusive locks and then work on
:>     optimizing the pipelines.
:
:You may have misunderstood me: several threads could enter a 'read' state;
:when a 'write' must occur, no more threads are allowed into the 'read'
:state until the 'write' state is complete.  All the readers have to drain
:out of their code before a write can take place.
:
:Or is this exactly what you are objecting to?

    This is the typical hack that people make, but it isn't a good idea,
    especially for a fine-grained SMP lock.
    The reason it isn't a good idea is multi-fold:

    First, it adds complexity and cpu cycles to the locking mechanism that
    can often equal the complexity of the code being locked, making the
    result less efficient.

    Second, because we are talking about fine-grained SMP locks here, not
    coarse-grained locks, it is more important to maintain the time
    locality of the pipeline than to try to optimize parallel readers.
    Even if you can use read-locks in one section of code you won't be
    able to use them everywhere, so parallelizing one section of code
    gains you nothing when you have to serialize another, other than
    destroying the time locality of the pipeline.  The 'bunching' that
    occurs when multiple cpu's all try for the same lock destroys the
    efficiency of the *hardware* cache coherency mechanisms, further
    slowing things down.  But if the processes are already pipelined, the
    memory location containing the lock tends to be 'free', allowing a
    cpu to grab it more quickly.

    Third, you have a lock starvation issue: even when you block further
    readers, readers can in turn starve unless you also block further
    writers in the face of a held write lock and pending reader requests.
    This can lead to flip-flop situations which devolve into just doing an
    exclusive lock in the first place... but with a less efficient locking
    mechanism.  You have to protect against the inefficient flip-flop
    situation.

    Fourth, when you have multiple cpu's vying for the *same* lock (the
    bunching, again), not only is hardware cache coherency strained but
    cpu's can jump ahead of each other going through the same pipeline.
    This disrupts the pipeline further, creating temporary interlocks and
    more bunching down the line, destroying the time locality of the
    pipeline.

    Usually, for a fine-grained SMP lock, the key issue is to be able to
    get and release the lock as quickly as possible.
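[Editor's note: the exclusive-only lock being advocated above can be sketched in a few lines. This is a hypothetical illustration using C11 atomics, not FreeBSD's actual locking code; all names are invented for the example.]

```c
/* Minimal exclusive spinlock sketch using C11 atomics.  Hypothetical
 * illustration of the "simple lock" under discussion -- note that the
 * entire lock state is a single word, and the uncontended acquire is
 * one atomic read-modify-write. */
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;               /* single word of lock state */
} exlock_t;

static void exlock_init(exlock_t *l) {
    atomic_flag_clear(&l->locked);
}

/* Acquire: one atomic test-and-set on the uncontended path. */
static void exlock_acquire(exlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;  /* spin; a real kernel would pause or back off here */
}

/* Release: a single atomic store. */
static void exlock_release(exlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the callers are pipelined as described, the cache line holding `locked` tends to be free, so the acquire really is just the few cycles of one atomic operation.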
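[Editor's note: for contrast, the reader-drain scheme Alfred describes needs noticeably more state and atomic traffic, which is the complexity the first point refers to. A hedged sketch in C11 atomics, with invented names, not any real implementation:]

```c
/* Sketch of a read/write spinlock where a pending writer blocks new
 * readers and existing readers drain before the write proceeds.
 * Hypothetical illustration only.  Compare with the exclusive lock:
 * two words of state, and several atomics per acquire. */
#include <stdatomic.h>

typedef struct {
    atomic_int readers;               /* count of active readers */
    atomic_int writer;                /* pending/active writer flag */
} rwlock_t;

static void rw_init(rwlock_t *l) {
    atomic_init(&l->readers, 0);
    atomic_init(&l->writer, 0);
}

static void rw_read_acquire(rwlock_t *l) {
    for (;;) {
        while (atomic_load(&l->writer))   /* writer pending: readers wait */
            ;
        atomic_fetch_add(&l->readers, 1);
        if (!atomic_load(&l->writer))
            return;                       /* no writer raced in; we hold it */
        atomic_fetch_sub(&l->readers, 1); /* lost the race: back out, retry */
    }
}

static void rw_read_release(rwlock_t *l) {
    atomic_fetch_sub(&l->readers, 1);
}

static void rw_write_acquire(rwlock_t *l) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->writer, &expected, 1))
        expected = 0;                     /* another writer holds it; retry */
    while (atomic_load(&l->readers))      /* drain existing readers */
        ;
}

static void rw_write_release(rwlock_t *l) {
    atomic_store(&l->writer, 0);
}
```

Even this sketch ignores the flip-flop protection described above; adding it means yet more state and cycles on every acquire.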
    If it takes 4 cpu cycles to get a simple lock and 4 to release it in
    a pipeline, it might take 8 cpu cycles to get a read/write lock (and
    4 to release it).  When vying for the *same* lock, it might take 10
    cycles to get it and 8 cycles to release it due to hardware cache
    coherency collisions (or even memory commit collisions!).  If this is
    a fine-grained SMP lock, there are probably only one or two cpu's
    going through it even in a 128-cpu SMP machine.  You want to optimize
    for the case that is most likely to occur and make it go as fast as
    possible.

					-Matt

:>     coarse-grained locks should typically *not* be SMP locks but instead
:>     be structural locks with real blocking (the 'go to sleep' type of
:>     blocking).  The exception, of course, is the One-Big-Lock model:
:>     the model is really just a big hack, so the normal rules do not
:>     apply to it.
:
:Well yes, ala SunOS 4.x.x :)
:
:-Alfred
:-who's finding -current a better source of knowledge than his short
:college stint :)
:
:thanks for the patience and explanations.

					Matthew Dillon
					Engineering, HiWay Technologies, Inc.
					& BEST Internet Communications
					& God knows what else.
					(Please include original email in any response)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message