Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 14 Nov 2013 13:48:53 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Steven Hartland <smh@FreeBSD.org>, hartzell@alerce.com, freebsd-stable@FreeBSD.org
Subject:   Re: Help with filing a [maybe] ZFS/mmap bug.
Message-ID:  <5284B8A5.8040604@FreeBSD.org>
In-Reply-To: <4EB902F80CE84DD2BF36C85EF4CE8EF8@multiplay.co.uk>
References:  <20967.760.95825.310085@gargle.gargle.HOWL><51E80B30.1090004@FreeBSD.org><20968.10645.880772.30501@gargle.gargle.HOWL><520202E5.30300@FreeBSD.org><20994.55913.93606.436124@gargle.gargle.HOWL><FEE7BDCF7F494EE1BA0BE9424275AA91@multiplay.co.uk> <21111.12085.958991.356982@gargle.gargle.HOWL> <4EB902F80CE84DD2BF36C85EF4CE8EF8@multiplay.co.uk>

next in thread | previous in thread | raw e-mail | index | archive | help
on 12/11/2013 04:32 Steven Hartland said the following:
> 
> ----- Original Message ----- From: "George Hartzell" <hartzell@alerce.com>
>> > > Andriy Gapon writes:
>> > > > on 18/07/2013 20:44 George Hartzell said the following:
>> > > > > Andriy Gapon writes:
>> > > > >  > on 17/07/2013 23:47 George Hartzell said the following:
>> > > > >  > > How should I move forward with this?
>> > > > >  > > > > >  > Could you please try to reproduce this problem using a
>> kernel built with
>> > > > >  > INVARIANTS options?
>> > > > > > > > > I added INVARIANT_SUPPORT and INVARIANTS options to the GENERIC
>> > > > > kernel, rebuilt it, installed it and running through my "test case"
>> > > > > generated a lot of invalid flac files.  I"m not sure what the options
>> > > > > are/were supposed to do though, it looks like they generally lead to
>> > > > > KASSERTS, which lead to abort()'s.  Nothing in /var/log/messages or on
>> > > > > the console.
>> > > > > > > George,
>> > > > > > > do you have anything new on this issue?
>> > > > > Since the message that you quoted I narrowed down my "test case"
>> > > somewhat but I have not yet produced a stand-alone tool that
>> > > reproduces it (you still have to go through picard et al.).
>> > > > > > Could you please try the following patch?
>> > > > http://people.freebsd.org/~avg/zfs-putpages.diff
>> > > > > > > I expect it to not really fix the issue, but it may help to narrow
>> it down.
>> > > > Please keep INVARIANTS.
>> > > > > Absolutely.  Probably not until the weekend, but I'll give it a go.
>> > > > > Thanks for following up.
>> > > Did you manage to make any progress with this?
>> > > We're seeing a problem where rrdcached corrupts rrd files and remembering
>> > this thread and knowning it uses mmap and we're on ZFS I was wondering
>> > it this may be the cause for this issue too.
>> > > I've just recompiled rrdtool without mmap support and am clearing down
>> > all corrupted files but it would be good to know if any progress was
>> > made on this?
>> > >     Regards
>> >     Steve
>>
>> I was able recreate the problem on a 10-BETA-something-or-other
>> recently (I'd only been using 9 up until then).  Andriy's patches
>> didn't make a difference.  I haven't heard anything since reporting
>> back to him.
> 
> I've pretty much confirmed mmap support is causing the corruption when
> running rrdcached as since rebuilding with mmap disabled I've had no
> further corruption.

Well, this is not a _proof_, of course...

> @George when you got corruption what did the files look like? I ask as
> here I see lots of zeros as through the file size was correct but pretty
> much blanked.

Steve, could you please provide a little bit more of description of the
corruption that you got.  Lengths of those zeroed regions, their offsets (modulo
page size).
Anything that could establish a pattern (if any exists).

> @avg what was your thinking behind what may be the issue here?

To be honest I did not have any clear idea of what could be wrong.
I had a suspicion that we did not follow the rules of changing contents in
pages.  Like not taking some required lock(s) or not properly setting page
status, etc.  So I hoped that INVARIANTS would catch that, but no violation was
detected by the existing checks.

> If this is a mmap bug in zfs its a pretty serious one given the amount
> of silent corruption you can get.

I went through the code again and now I am more confident than ever that the
relevant code has a sound logic for interacting with the VM.  That is, in theory
there should not be any corruption under any usage patterns.

HOWEVER.  I think that there is a bug that I introduced in r246293.
Specifically I changed
	vm_page_undirty(pp);
to
	pmap_remove_write(pp);
	vm_page_clear_dirty(pp, off, nbytes);

vm_page_undirty() would be a very serious (and probably obvious) bug, if it were
not a NOP in effect.  The details are explained in the commit message.
But when I used vm_page_clear_dirty I completely missed the fact that *extends*
the range to DEV_BSIZE aligned boundaries[*].  So, given the described behavior
and that pmap_remove_write clears the page modified bit, it is possible that the
data dirty data in the extended areas will be marked as clean.

> @re Although reported incidents appear to be rare as its silent data
> corruption users may be blissfully unaware its happening. Given that
> my gut feeling is this is serious enough that we need to get something
> in place before 10 release, even if this is make ZFS report ENOTSUP
> for mmap calls, would you agree?

I will try to come up with a fix before the release.
I hope that it would be possible to confirm what the actual problem is and if it
is really fixed.

[*]
To be honest the current behavior of vm_page_clear_dirty is surprising to me.
vm_page_set_validclean has a comment that describes the behavior that I actually
expected, but the code that implements that logic is commented out.
So I guess that there must be a strong reason to do what we do now.

-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5284B8A5.8040604>