From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 21:11:23 2010
From: John Baldwin <jhb@freebsd.org>
To: Andrey Simonenko
Cc: freebsd-hackers@freebsd.org, Matthew Fleming
Date: Fri, 17 Sep 2010 17:11:21 -0400
Subject: Re: Questions about mutex implementation in kern/kern_mutex.c
Message-Id: <201009171711.21307.jhb@freebsd.org>
In-Reply-To: <20100917174244.GA2570@pm513-1.comsys.ntu-kpi.kiev.ua>
On Friday, September 17, 2010 1:42:44 pm Andrey Simonenko wrote:
> On Thu, Sep 16, 2010 at 02:16:05PM -0400, John Baldwin wrote:
> > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote:
> > > The mtx_owned(9) macro uses this property: mtx_owned() does not do
> > > anything special to compare the value of m->mtx_lock (volatile) with
> > > the current thread pointer, while all other functions that update
> > > m->mtx_lock of an unowned mutex use a compare-and-set instruction.
> > > Also, I cannot find anything special in the generated assembler code
> > > for volatile variables (except on ia64, where acquire loads and
> > > release stores are used).
> > 
> > No, mtx_owned() is just not harmed by the races it loses.  You can
> > certainly read a stale value of mtx_lock in mtx_owned() if some other
> > thread owns the lock or has just released the lock.  However, we don't
> > care, because in both of those cases mtx_owned() returns false.  What
> > does matter is that mtx_owned() can only return true if we currently
> > hold the mutex.  This works because 1) the same thread cannot call
> > mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that
> > hold writes in store buffers will snoop their store buffer for local
> > reads on that CPU.  That is, a given CPU will never read a stale value
> > of a memory word that is "older" than a write it has performed to that
> > word.
> 
> Now I understand why mtx_owned() works correctly whether or not mtx_lock
> is present in the CPU cache.  The mtx_lock value can definitely say
> whether the lock is held by the current thread, but it cannot say whether
> the lock is unowned or is owned by another thread.
> 
> Let me ask one more question about memory barriers and thread migration.
> 
> Suppose a thread locked a mutex, modified shared data protected by that
> mutex, and was migrated from CPU1 to CPU2 (the mutex is still locked).
> In this scenario the migrated thread will not see stale data on CPU2,
> either for the mutex itself (the m->mtx_lock value) or for the shared
> data, because when it was migrated from CPU1 there was at least one
> unlock call for some other lock that had release semantics, and the
> appropriate memory barrier instruction was run implicitly or explicitly.
> As a result this "rel" memory barrier made all modifications from CPU1
> visible to the other CPUs.  When CPU2 switched to the migrated thread
> there was at least one lock call for some other lock with acquire
> semantics, so the "rel"/"acq" memory barrier pair works here together.
> (I also considered the case where CPU2 has not worked with that mutex,
> but has worked with its memory before: some thread on CPU2 could have
> allocated some memory, worked with it, and freed it, and later the same
> memory was allocated by a thread on CPU1 for the mutex.)
> 
> Is the above description correct?

Yes.

> > > There are some places in the kernel where a variable is updated in
> > > something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));"
> > > and that variable is not "volatile", but the compiler generates
> > > correct assembler code.  So "volatile" is not a requirement for all
> > > cases.
> > 
> > Hmm, I suspect that many of those places actually do use volatile.  The
> > various lock cookies (mtx_lock, etc.) are declared volatile in the
> > structure.  Otherwise the compiler would be free to conclude that
> > 'v = value;' is a loop invariant and move it out of the loop, which
> > would break the construct.  Given that, the construct you referred to
> > does in fact require 'value' to be volatile.
> 
> I checked assembler code for these functions:
> 
> kern/subr_msgbuf.c:msgbuf_addchar()
> vm/vm_map.c:vmspace_free()

They may happen to accidentally work because atomic_cmpset() clobbers all
of memory, but these should be marked volatile.

Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c	(revision 212801)
+++ vm/vm_map.c	(working copy)
@@ -343,10 +343,7 @@
 	if (vm->vm_refcnt == 0)
 		panic("vmspace_free: attempt to free already freed vmspace");
 
-	do
-		refcnt = vm->vm_refcnt;
-	while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt - 1));
-	if (refcnt == 1)
+	if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1)
 		vmspace_dofree(vm);
 }
Index: vm/vm_map.h
===================================================================
--- vm/vm_map.h	(revision 212801)
+++ vm/vm_map.h	(working copy)
@@ -237,7 +237,7 @@
 	caddr_t vm_taddr;	/* (c) user virtual address of text */
 	caddr_t vm_daddr;	/* (c) user virtual address of data */
 	caddr_t vm_maxsaddr;	/* user VA at max stack growth */
-	int	vm_refcnt;	/* number of references */
+	volatile int vm_refcnt;	/* number of references */
 	/*
 	 * Keep the PMAP last, so that CPU-specific variations of that
 	 * structure on a single architecture don't result in offset
Index: sys/msgbuf.h
===================================================================
--- sys/msgbuf.h	(revision 212801)
+++ sys/msgbuf.h	(working copy)
@@ -38,7 +38,7 @@
 #define	MSG_MAGIC	0x063062
 	u_int	msg_magic;
 	u_int	msg_size;	/* size of buffer area */
-	u_int	msg_wseq;	/* write sequence number */
+	volatile u_int msg_wseq;	/* write sequence number */
 	u_int	msg_rseq;	/* read sequence number */
 	u_int	msg_cksum;	/* checksum of contents */
 	u_int	msg_seqmod;	/* range for sequence numbers */

-- 
John Baldwin