From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 21:11:23 2010
From: John Baldwin <jhb@freebsd.org>
To: Andrey Simonenko
Cc: freebsd-hackers@freebsd.org, Matthew Fleming
Date: Fri, 17 Sep 2010 17:11:21 -0400
Subject: Re: Questions about mutex implementation in kern/kern_mutex.c
Message-Id: <201009171711.21307.jhb@freebsd.org>
In-Reply-To: <20100917174244.GA2570@pm513-1.comsys.ntu-kpi.kiev.ua>
On Friday, September 17, 2010 1:42:44 pm Andrey Simonenko wrote:
> On Thu, Sep 16, 2010 at 02:16:05PM -0400, John Baldwin wrote:
> > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote:
> > > The mtx_owned(9) macro uses this property: mtx_owned() does not do
> > > anything special to compare the value of m->mtx_lock (volatile) with
> > > the current thread pointer, while all other functions that update
> > > m->mtx_lock of an unowned mutex use a compare-and-set instruction.
> > > Also, I cannot find anything special in the generated assembler code
> > > for volatile variables (except on ia64, where acquire loads and
> > > release stores are used).
> > 
> > No, mtx_owned() is just not harmed by the races it loses.  You can
> > certainly read a stale value of mtx_lock in mtx_owned() if some other
> > thread owns the lock or has just released the lock.  However, we don't
> > care, because in both of those cases mtx_owned() returns false.  What
> > does matter is that mtx_owned() can only return true if we currently
> > hold the mutex.  This works because 1) the same thread cannot call
> > mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that
> > hold writes in store buffers will snoop their store buffer for local
> > reads on that CPU.  That is, a given CPU will never read a stale value
> > of a memory word that is "older" than a write it has performed to that
> > word.
> 
> Now I understand why mtx_owned() works correctly whether or not mtx_lock
> is present in the CPU cache.  The mtx_lock value can definitely say
> whether the lock is held by the current thread, but it cannot say whether
> the lock is unowned or is owned by another thread.
> 
> Let me ask one more question about memory barriers and thread migration.
> 
> Suppose a thread locked a mutex, modified shared data protected by that
> mutex, and was migrated from CPU1 to CPU2 (the mutex is still locked).
> In this scenario the migrated thread will not see stale data on CPU2,
> either for the mutex itself (the m->mtx_lock value) or for the shared
> data, because when it was migrated from CPU1 there was at least one
> unlock call for some other lock that had release semantics, and the
> appropriate memory barrier instruction was run implicitly or explicitly.
> As a result this "rel" memory barrier made all modifications from CPU1
> visible to the other CPUs.  When CPU2 switched to the migrated thread
> there was at least one lock call for some other lock with acquire
> semantics, so the "rel"/"acq" memory barrier pair works here together.
> (I also considered the case where CPU2 has not worked with that mutex,
> but has worked with its memory before: some thread on CPU2 could have
> allocated some memory, worked with it, and freed it, and later the same
> memory was allocated by a thread on CPU1 for the mutex.)
> 
> Is the above description correct?

Yes.

> > > There are some places in the kernel where a variable is updated in
> > > something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));"
> > > and that variable is not "volatile", but the compiler generates
> > > correct assembler code.  So "volatile" is not a requirement for all
> > > cases.
> > 
> > Hmm, I suspect that many of those places actually do use volatile.  The
> > various lock cookies (mtx_lock, etc.) are declared volatile in the
> > structure.  Otherwise the compiler would be free to conclude that
> > 'v = value;' is a loop invariant and move it out of the loop, which
> > would break the construct.  Given that, the construct you referred to
> > does in fact require 'value' to be volatile.
> 
> I checked assembler code for these functions:
> 
> kern/subr_msgbuf.c:msgbuf_addchar()
> vm/vm_map.c:vmspace_free()

They may happen to accidentally work because atomic_cmpset() clobbers all
of memory, but these should be marked volatile.

Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c	(revision 212801)
+++ vm/vm_map.c	(working copy)
@@ -343,10 +343,7 @@
 	if (vm->vm_refcnt == 0)
 		panic("vmspace_free: attempt to free already freed vmspace");
 
-	do
-		refcnt = vm->vm_refcnt;
-	while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt - 1));
-	if (refcnt == 1)
+	if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1)
 		vmspace_dofree(vm);
 }
Index: vm/vm_map.h
===================================================================
--- vm/vm_map.h	(revision 212801)
+++ vm/vm_map.h	(working copy)
@@ -237,7 +237,7 @@
 	caddr_t vm_taddr;	/* (c) user virtual address of text */
 	caddr_t vm_daddr;	/* (c) user virtual address of data */
 	caddr_t vm_maxsaddr;	/* user VA at max stack growth */
-	int	vm_refcnt;	/* number of references */
+	volatile int vm_refcnt;	/* number of references */
 	/*
 	 * Keep the PMAP last, so that CPU-specific variations of that
 	 * structure on a single architecture don't result in offset
Index: sys/msgbuf.h
===================================================================
--- sys/msgbuf.h	(revision 212801)
+++ sys/msgbuf.h	(working copy)
@@ -38,7 +38,7 @@
 #define	MSG_MAGIC	0x063062
 	u_int	msg_magic;
 	u_int	msg_size;	/* size of buffer area */
-	u_int	msg_wseq;	/* write sequence number */
+	volatile u_int msg_wseq;	/* write sequence number */
 	u_int	msg_rseq;	/* read sequence number */
 	u_int	msg_cksum;	/* checksum of contents */
 	u_int	msg_seqmod;	/* range for sequence numbers */

-- 
John Baldwin