Date:      Fri, 15 Jun 2012 11:34:48 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, Pavlo <devgs@ukr.net>
Subject:   Re: mmap() incoherency on hi I/O load (FS is zfs)
Message-ID:  <1116727909.1836239.1339774488001.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120614122456.GZ2337@deviant.kiev.zoral.com.ua>

Kostik wrote:
> On Thu, Jun 14, 2012 at 07:32:36AM -0400, Rick Macklem wrote:
> > Pavlo wrote:
> > > There's a case where some parts of files that are mapped and then
> > > modified get corrupted. By corrupted I mean some data is OK (the
> > > part that was written using write()/pwrite()) but some looks like
> > > it never existed. It was in the buffers for some time, when several
> > > processes simultaneously (access was of course synchronised) used
> > > the shared pages and reported its existence. But after some time
> > > passed they (the processes) screamed that it was now lost. Only the
> > > part of the data written with pwrite() was there. Everything that
> > > was written via mmap() is zero.
> > >
> > > As I said, it occurs under high I/O load, when 4+ background
> > > processes are indexing a huge amount of data. I also want to note
> > > that it never occurred in the life of our project while we used
> > > mmap() under the same I/O stress conditions when the mapping was
> > > done for a whole file or just a part (the header) starting from the
> > > beginning of a file. The first time we mapped individual pages,
> > > just to save RAM, this popped up.
> > >
> > > The solution for this problem is to call msync() before any
> > > munmap(). But the man page says:
> > >
> > > The msync() system call is usually not needed since BSD implements
> > > a coherent file system buffer cache. However, it may be used to
> > > associate dirty VM pages with file system buffers and thus cause
> > > them to be flushed to physical media sooner rather than later.
> > >
> > > Any thoughts? Thanks.
> > >
> > With a recent kernel from head, I am seeing dirty mmap'd pages being
> > written quite late for the NFSv4 client. Even after the NFS client
> > VOP_RECLAIM() has been called, it seems. I didn't observe this
> > behaviour in a kernel from head in March. (I don't know enough about
> > the vm/mmap area to know if this is correct behaviour or not.)
> >
> > I thought I'd mention this, since you didn't say how recent a kernel
> > you were running and thought it might be caused by the same change?
> Can you please comment more on this?
> How is this possible at all?
>
> Could you please show at least a backtrace for the moment when a write
> request is made for a page which belongs to an already reclaimed vnode?
After some off-list discussion, it was determined that my problem was
doing nfsrpc_close() before vnode_destroy_object() in the NFSv4 client's
VOP_RECLAIM(). This is an NFSv4-specific bug and wouldn't be related to
the above issue.
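
For reference, the msync()-before-munmap() workaround Pavlo mentions
could look roughly like the sketch below. This is only an illustration
under assumptions: write_page_via_mmap() is a hypothetical helper name
(nothing from the thread or from FreeBSD), it maps exactly one page, and
it grows the file when the page lies past EOF. It does not reproduce or
explain the reported corruption.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes to the start of page number
 * page_index of the file behind fd, using a shared mapping of that
 * single page. msync(MS_SYNC) is called before munmap(), which is the
 * workaround described in the quoted report.
 * Returns 0 on success, -1 on error.
 */
int
write_page_via_mmap(int fd, off_t page_index, const void *data, size_t len)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	struct stat st;
	off_t offset;
	void *p;
	int rc;

	if (pagesize <= 0 || page_index < 0 || len > (size_t)pagesize)
		return (-1);
	offset = page_index * (off_t)pagesize;

	/* Touching a mapped page beyond EOF raises SIGBUS, so grow first. */
	if (fstat(fd, &st) == -1)
		return (-1);
	if (st.st_size < offset + pagesize &&
	    ftruncate(fd, offset + pagesize) == -1)
		return (-1);

	p = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, offset);
	if (p == MAP_FAILED)
		return (-1);

	memcpy(p, data, len);

	/* Push the dirty page to the file before dropping the mapping. */
	rc = msync(p, (size_t)pagesize, MS_SYNC);
	if (munmap(p, (size_t)pagesize) == -1)
		rc = -1;
	return (rc);
}
```

A caller would open the file with O_RDWR and could then check the result
with pread() at offset page_index * pagesize, which is roughly the
comparison Pavlo's processes were doing.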

Sorry about the noise, rick