Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 21 Apr 2011 10:45:52 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:        svn-src-head@freebsd.org, Rick Macklem <rmacklem@freebsd.org>, svn-src-all@freebsd.org, Rick Macklem <rmacklem@uoguelph.ca>, src-committers@freebsd.org
Subject:   Re: svn commit: r220877 - head/sys/fs/nfsclient
Message-ID:  <20110421101020.V1056@besplex.bde.org>
In-Reply-To: <20110420161655.GA1907@garage.freebsd.pl>
References:  <20110420054655.GD1826@garage.freebsd.pl> <630616771.329277.1303301372199.JavaMail.root@erie.cs.uoguelph.ca> <20110420161655.GA1907@garage.freebsd.pl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 20 Apr 2011, Pawel Jakub Dawidek wrote:

> On Wed, Apr 20, 2011 at 08:09:32AM -0400, Rick Macklem wrote:
>>>> + tmp_off = uio->uio_offset + uio->uio_resid;
>>>> + mtx_lock(&nmp->nm_mtx);
>>>> + if (tmp_off > nmp->nm_maxfilesize || tmp_off < uio->uio_offset) {
>>>> + mtx_unlock(&nmp->nm_mtx);
>>>>  		return (EFBIG);
>>>> + }
>>>> + mtx_unlock(&nmp->nm_mtx);
>>>
>>> I don't think you need the lock to protect nm_maxfilesize. Can it
>>> change
>>> from under us? My guess is that it is set on mount time and is not
>>> modified afterwards.

I said the same in a private reply which was initially about overflow
errors in the overflow checking, but expanded to also cover errors in
the return values (EFBIG instead of EOF or EOVERFLOW for read()) and
this point.

>>>
>> Good question.
>> For NFSv3 - it is only modified by the first fsinfo RPC and that normally
>>     happens at mount time, as you guessed above. (This is consistent with
>>     RFC1813, which defines the fsinfo RPC as getting non-volatile information.)
>> For NFSv4 - it gets it each time VFS_STATFS() happens. I am not sure that
>>     this is correct, but I don't know of anywhere in RFC3530 where it states
>>     that this attribute will not change. In practice, I suspect that servers
>>     seldom, if ever, change it.
>>
>> So, it is unlikely to change and I'd be comfortable taking the mutex lock
>> off the check for it, if others are? (As you might be aware, I started a
>> thread on hackers-freebsd@ where my question was basically "do you need to
>> mutex lock when you read a global variable". My main concern there was a
>> case that I'm working on w.r.t. forced dismounts. jhb@ suggested that he
>> thinks it is good practice to always lock, to play it safe. At least that
>> was my interpretation?)
>
> This is not that easy, I'm afraid. You need to ask yourself a question
> what you are trying to protect from. Here, the mutex only guarantees to
> have consistent view of the nm_maxfilesize field. For example if this
> field modification wouldn't be atomic you would need the mutex to ensure
> that the value is correct.

The modification isn't necessarily atomic since the field is a typedefed
type.  The actual type is 64 bits, so the modification happens to be
atomic on 64 bit arches but not on 32 bit arches.

> Imagine a situation where it is modifed not by simple 'a = b', but by
> something like this:
>
> 	mtx_lock(&nmp->nm_mtx);
> 	nmp->nm_maxfilesize = some_32bit_array[0];
> 	nmp->nm_maxfilesize |= some_32bit_array[1] << 32;
> 	mtx_unlock(&nmp->nm_mtx);
>
> To read that properly you need a mutex to ensure you won't read the
> value between those two operations.

Modification of 64 bit values on 32 bit arches is essentially the same
as a modification of an array consisting of 2 32 bit elements.  However,
for numeric values, it is very unlikely that the modification gives a
fatally invalid in-between value, no matter which order the writes and
reads occur.  E.g., for some values, the upper bits might never be changed.
Only modifications like incrementing from 0xffffffff to 0x100000000 may
give fatally different values (this can give a value of 0 from a new value
of 0 in the lower bits and an old value of 0 in the upper bits).

> If it is not the case - its modification is atomic and reading it is
> atomic then you don't need mutex to read the value as it will always be
> consistent. The question is what will happen if it changes after you
> read it.
>
> thread0					thread1
> -------					-------
>
> 					mtx_lock(&nmp->nm_mtx);
> 					nmp->nm_maxfilesize = 8192;
> 					mtx_unlock(&nmp->nm_mtx);
>
> mtx_lock(&nmp->nm_mtx);
> if (tmp_off > nmp->nm_maxfilesize) {
> 	mtx_unlock(&nmp->nm_mtx);
> 	return (EFBIG);
> }
> mtx_unlock(&nmp->nm_mtx);
>
> 					mtx_lock(&nmp->nm_mtx);
> 					nmp->nm_maxfilesize = 2048;
> 					mtx_unlock(&nmp->nm_mtx);
>
> <some other code>
>
> Now, if tmp_off is 4096 what will happen if you have a race like the
> above? Is it critical? Then you need to protect <some other code> with
> this mutex as well.

Locking of the whole transaction is especially impossible for this
field.  nm_maxfilesize is from the server (except for local resource
limits which may require reducing it).  If it changes after being
initialized, it is probably the server that changed it (since any
change would be foot-shooting so you shouldn't do it locally).  There
is no way to prevent the server changing its limit.  The copy of the
limit in nm_maxfilesize may have become out of date slightly before
it was written there, or between the write there and the read here,
or after it is checked here, so the server may reject the write for
many slightly different temporal reasons.  Any reduction of these
race windows or remote checks that they aren't there or remote locking
to prevent them would be huge pessimizations.  So we shouldn't do any
locking here, except whatever is needed to make the read sufficiently
atomic (hopefully nothing).

I'm interested in lock-free algorithms for this.  See my reply about
similar excessive locking/atomicity for tsc_freq.  "Locking for read"
is usually at best a waste of time since the variable may change
immediately after you release the lock.  Often it is OK to release
the lock and thus use a variable that may be out of date.  For this,
any old value and any new value must work.  Then any value between
the old and new value[s] and any value not too much smaller or larger
than the old and new value[s] will probably work too.  Perhaps the
writer can update the value so that all read orders give one of these
working values.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110421101020.V1056>