From owner-freebsd-hackers Tue Jan 5 16:23:03 1999
Return-Path:
Received: (from majordom@localhost)
	by hub.freebsd.org (8.8.8/8.8.8) id QAA10454
	for freebsd-hackers-outgoing; Tue, 5 Jan 1999 16:23:03 -0800 (PST)
	(envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA10448
	for ; Tue, 5 Jan 1999 16:23:02 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.1/8.9.1) id QAA98857;
	Tue, 5 Jan 1999 16:22:34 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 5 Jan 1999 16:22:34 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901060022.QAA98857@apollo.backplane.com>
To: Alfred Perlstein
Cc: Terry Lambert , wes@softweyr.com, hackers@FreeBSD.ORG
Subject: Re: question about re-entrancy.
References:
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> :
:> :-Alfred
:>
:>     Generally speaking you do not want to use multi-valued locks for SMP
:>     operations, i.e. the concept of a 'shared' lock versus an 'exclusive'
:>     lock does not work well for locks used to manage SMP functions.  The
:>     reason it doesn't work well is because of truly serious starvation
:>     issues -- it is virtually always more efficient to implement
:>     fine-grained SMP locks as *only* exclusive locks and then work on
:>     optimizing the pipelines.
:
:You may have misunderstood me: several threads could enter a 'read' state;
:when a 'write' must occur, no more threads are allowed into the 'read'
:state until the 'write' state is complete.  All the readers have to drain
:out of their code before a write can take place.
:
:Or is this exactly what you are objecting to?

    This is the typical hack that people make, but it isn't a good idea,
    especially for a fine-grained SMP lock.
    The reason it isn't a good idea is multi-fold:

    First, it adds complexity and cpu cycles to the locking mechanism that
    can often equal the complexity of the code being locked, making the
    result less efficient.

    Second, because we are talking about fine-grained SMP locks here, not
    coarse-grained locks, it is more important to maintain the time
    locality of the pipeline than to try to optimize parallel readers.
    Even if you can use read-locks in one section of code you won't be
    able to use them everywhere, so parallelizing one section of code
    gains you nothing when you have to serialize another, other than
    destroying the time locality of the pipeline.  The 'bunching' that
    occurs when multiple cpu's all try for the same lock destroys the
    efficiency of the *hardware* cache coherency mechanisms, further
    slowing things down.  But if the processes are already pipelined, the
    memory location containing the lock tends to be 'free', allowing a
    cpu to grab it more quickly.

    Third, you have a lock starvation issue: even when you block further
    readers, readers can in turn starve unless you also block further
    writers in the face of a held write lock and pending reader requests.
    This can lead to flip-flop situations which devolve into just doing an
    exclusive lock in the first place... but with a less efficient locking
    mechanism.  You have to protect against the inefficient flip-flop
    situation.

    Fourth, when you have multiple cpu's vying for the *same* lock (the
    bunching, again), not only is hardware cache coherency strained but
    cpu's can jump ahead of each other going through the same pipeline.
    This disrupts the pipeline further, creating temporary interlocks and
    more bunching down the line, destroying the time locality of the
    pipeline.

    Usually, for a fine-grained SMP lock, the key issue is to be able to
    get and release the lock as quickly as possible.
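[Editor's note: the exclusive-only lock being advocated above can be sketched in a few lines. This is a hypothetical illustration using C11 atomics, not FreeBSD's actual locking code; all names are invented for the example.]

```c
/* Minimal exclusive spinlock sketch using C11 atomics.  Hypothetical
 * illustration of the "simple lock" under discussion -- note that the
 * entire lock state is a single word, and the uncontended acquire is
 * one atomic read-modify-write. */
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;               /* single word of lock state */
} exlock_t;

static void exlock_init(exlock_t *l) {
    atomic_flag_clear(&l->locked);
}

/* Acquire: one atomic test-and-set on the uncontended path. */
static void exlock_acquire(exlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;  /* spin; a real kernel would pause or back off here */
}

/* Release: a single atomic store. */
static void exlock_release(exlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the callers are pipelined as described, the cache line holding `locked` tends to be free, so the acquire really is just the few cycles of one atomic operation.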
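[Editor's note: for contrast, the reader-drain scheme Alfred describes needs noticeably more state and atomic traffic, which is the complexity the first point refers to. A hedged sketch in C11 atomics, with invented names, not any real implementation:]

```c
/* Sketch of a read/write spinlock where a pending writer blocks new
 * readers and existing readers drain before the write proceeds.
 * Hypothetical illustration only.  Compare with the exclusive lock:
 * two words of state, and several atomics per acquire. */
#include <stdatomic.h>

typedef struct {
    atomic_int readers;               /* count of active readers */
    atomic_int writer;                /* pending/active writer flag */
} rwlock_t;

static void rw_init(rwlock_t *l) {
    atomic_init(&l->readers, 0);
    atomic_init(&l->writer, 0);
}

static void rw_read_acquire(rwlock_t *l) {
    for (;;) {
        while (atomic_load(&l->writer))   /* writer pending: readers wait */
            ;
        atomic_fetch_add(&l->readers, 1);
        if (!atomic_load(&l->writer))
            return;                       /* no writer raced in; we hold it */
        atomic_fetch_sub(&l->readers, 1); /* lost the race: back out, retry */
    }
}

static void rw_read_release(rwlock_t *l) {
    atomic_fetch_sub(&l->readers, 1);
}

static void rw_write_acquire(rwlock_t *l) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->writer, &expected, 1))
        expected = 0;                     /* another writer holds it; retry */
    while (atomic_load(&l->readers))      /* drain existing readers */
        ;
}

static void rw_write_release(rwlock_t *l) {
    atomic_store(&l->writer, 0);
}
```

Even this sketch ignores the flip-flop protection described above; adding it means yet more state and cycles on every acquire.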
    If it takes 4 cpu cycles to get a simple lock and 4 to release it in
    a pipeline, it might take 8 cpu cycles to get a read/write lock (and
    4 to release it).  When vying for the *same* lock, it might take 10
    cycles to get it and 8 cycles to release it due to hardware cache
    coherency collisions (or even memory commit collisions!).  If this is
    a fine-grained SMP lock, there are probably only one or two cpu's
    going through it even in a 128-cpu SMP machine.  You want to optimize
    for the case that is most likely to occur and make it go as fast as
    possible.

					-Matt

:>     coarse-grained locks should typically *not* be SMP locks but instead
:>     be structural locks with real blocking (the 'go to sleep' type of
:>     blocking).  The exception, of course, is the One-Big-Lock model:
:>     the model is really just a big hack, so the normal rules do not
:>     apply to it.
:
:Well yes, ala SunOS 4.x.x :)
:
:-Alfred
:-who's finding -current a better source of knowledge than his short
:college stint :)
:
:thanks for the patience and explanations.

					Matthew Dillon
					Engineering, HiWay Technologies, Inc.
					& BEST Internet Communications
					& God knows what else.
					(Please include original email in any response)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message