From owner-svn-src-head@FreeBSD.ORG Thu Apr 21 00:45:57 2011 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4746C1065674; Thu, 21 Apr 2011 00:45:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id D39968FC0A; Thu, 21 Apr 2011 00:45:56 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p3L0jr8e014297 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 Apr 2011 10:45:54 +1000 Date: Thu, 21 Apr 2011 10:45:52 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Pawel Jakub Dawidek In-Reply-To: <20110420161655.GA1907@garage.freebsd.pl> Message-ID: <20110421101020.V1056@besplex.bde.org> References: <20110420054655.GD1826@garage.freebsd.pl> <630616771.329277.1303301372199.JavaMail.root@erie.cs.uoguelph.ca> <20110420161655.GA1907@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@freebsd.org, Rick Macklem , svn-src-all@freebsd.org, Rick Macklem , src-committers@freebsd.org Subject: Re: svn commit: r220877 - head/sys/fs/nfsclient X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Apr 2011 00:45:57 -0000 On Wed, 20 Apr 2011, Pawel Jakub Dawidek wrote: > On Wed, Apr 20, 2011 at 08:09:32AM -0400, Rick Macklem wrote: >>>> + tmp_off = uio->uio_offset + uio->uio_resid; >>>> + mtx_lock(&nmp->nm_mtx); >>>> + if (tmp_off > nmp->nm_maxfilesize || tmp_off < uio->uio_offset) { >>>> + mtx_unlock(&nmp->nm_mtx); >>>> return (EFBIG); >>>> + } >>>> + mtx_unlock(&nmp->nm_mtx); >>> >>> I don't think you need the lock to protect nm_maxfilesize. Can it >>> change >>> from under us? My guess is that it is set on mount time and is not >>> modified afterwards. I said the same in a private reply which was initially about overflow errors in the overflow checking, but expanded to also cover errors in the return values (EFBIG instead of EOF or EOVERFLOW for read()) and this point. >>> >> Good question. >> For NFSv3 - it is only modified by the first fsinfo RPC and that normally >> happens at mount time, as you guessed above. (This is consistent with >> RFC1813, which defines the fsinfo RPC as getting non-volatile information.) >> For NFSv4 - it gets it each time VFS_STATFS() happens. I am not sure that >> this is correct, but I don't know of anywhere in RFC3530 where it states >> that this attribute will not change. In practice, I suspect that servers >> seldom, if ever, change it. >> >> So, it is unlikely to change and I'd be comfortable taking the mutex lock >> off the check for it, if others are? (As you might be aware, I started a >> thread on hackers-freebsd@ where my question was basically "do you need to >> mutex lock when you read a global variable". My main concern there was a >> case that I'm working on w.r.t. forced dismounts. jhb@ suggested that he >> thinks it is good practice to always lock, to play it safe. At least that >> was my interpretation?) > > This is not that easy, I'm afraid. You need to ask yourself a question > what you are trying to protect from. Here, the mutex only guarantees to > have consistent view of the nm_maxfilesize field. For example if this > field modification wouldn't be atomic you would need the mutex to ensure > that the value is correct. The modification isn't necessarily atomic since the field is a typedefed type. The actual type is 64 bits, so the modification happens to be atomic on 64 bit arches but not on 32 bit arches. > Imagine a situation where it is modifed not by simple 'a = b', but by > something like this: > > mtx_lock(&nmp->nm_mtx); > nmp->nm_maxfilesize = some_32bit_array[0]; > nmp->nm_maxfilesize |= some_32bit_array[1] << 32; > mtx_unlock(&nmp->nm_mtx); > > To read that properly you need a mutex to ensure you won't read the > value between those two operations. Modification of 64 bit values on 32 bit arches is essentially the same as a modification of an array consisting of 2 32 bit elements. However, for numeric values, it is very unlikely that the modification gives a fatally invalid in-between value, no matter which order the writes and reads occur. E.g., for some values, the upper bits might never be changed. Only modifications like incrementing from 0xffffffff to 0x100000000 may give fatally different values (this can give a value of 0 from a new value of 0 in the lower bits and an old value of 0 in the upper bits). > If it is not the case - its modification is atomic and reading it is > atomic then you don't need mutex to read the value as it will always be > consistent. The question is what will happen if it changes after you > read it. > > thread0 thread1 > ------- ------- > > mtx_lock(&nmp->nm_mtx); > nmp->nm_maxfilesize = 8192; > mtx_unlock(&nmp->nm_mtx); > > mtx_lock(&nmp->nm_mtx); > if (tmp_off > nmp->nm_maxfilesize) { > mtx_unlock(&nmp->nm_mtx); > return (EFBIG); > } > mtx_unlock(&nmp->nm_mtx); > > mtx_lock(&nmp->nm_mtx); > nmp->nm_maxfilesize = 2048; > mtx_unlock(&nmp->nm_mtx); > > > > Now, if tmp_off is 4096 what will happen if you have a race like the > above? Is it critical? Then you need to protect with > this mutex as well. Locking of the whole transaction is especially impossible for this field. nm_maxfilesize is from the server (except for local resource limits which may require reducing it). If it changes after being initialized, it is probably the server that changed it (since any change would be foot-shooting so you shouldn't do it locally). There is no way to prevent the server changing its limit. The copy of the limit in nm_maxfilesize may have become out of date slightly before it was written there, or between the write there and the read here, or after it is checked here, so the server may reject the write for many slightly different temporal reasons. Any reduction of these race windows or remote checks that they aren't there or remote locking to prevent them would be huge pessimizations. So we shouldn't do any locking here, except whatever is needed to make the read sufficiently atomic (hopefully nothing). I'm interested in lock-free algorithms for this. See my reply about similar excessive locking/atomicity for tsc_freq. "Locking for read" is usually at best a waste of time since the variable may change immediately after you release the lock. Often it is OK to release the lock and thus use a variable that may be out of date. For this, any old value and any new value must work. Then any value between the old and new value[s] and any value not too much smaller or larger than the old and new value[s] will probably work too. Perhaps the writer can update the value so that all read orders give one of these working values. Bruce