Date:      Thu, 28 Nov 2013 09:24:35 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Kirk McKusick <mckusick@mckusick.com>, FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: RFC: NFS client patch to reduce sychronous writes
Message-ID:  <20131128072435.GI59496@kib.kiev.ua>
In-Reply-To: <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca>
References:  <201311272320.rARNKEKQ045789@chez.mckusick.com> <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca>

On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote:
> Kirk wrote:
> > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST)
> > > From: Rick Macklem <rmacklem@uoguelph.ca>
> > > To: Konstantin Belousov <kostikbel@gmail.com>
> > > Subject: Re: RFC: NFS client patch to reduce sychronous writes
> > >
> > > Kostik wrote:
> > >> Sorry, I do not understand the question. mmap(2) itself does not
> > >> change file size.  But if mmaped area includes the last page, I
> > >> still think that the situation I described before is possible.
> > >
> > > Yes, I'll need to look at this. If it is a problem, all I can think
> > > of is bzeroing all new pages when they're allocated to the buffer
> > > cache.
> > >
> > > Thanks for looking at it, rick
> > > ps: Btw, jhb@'s patch didn't have the bzeroing in it.
> >
> > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS
> > for this problem and it killed write performance of the filesystem
> > by nearly half. We corrected this by only doing the bzero when the
> > file is mmap'ed which helped things considerably (since most files
> > being written are not also mmap'ed).
> >
> > 	Kirk
> >
> Ok, thanks. I've been trying to reproduce the problem over NFS and
> haven't been able to break my patch. I was using the attached trivial
> test program and would simply make a copy of the source file (529 bytes)
> to test on. I got the same results both locally and over NFS:
> - built without -DWRITEIT, the setting of a value after EOF would be
>   lost, because nothing grew the file from 529 bytes to over 4080 bytes.
> - built with -DWRITEIT, both the 'A' and 'B' are in the result, since
>   my patch bzeros the grown segment in the write(2) syscall.
>
> - If I move the write (code in #ifdef WRITEIT) to after the "*cp"
>   of the mapped page, the 'A' assigned to "*cp" gets lost for
>   both UFS and NFS.
>   Is this correct behaviour?
>
> If it is correct behaviour, I can't see how the patch is broken, but
> if you think it might still be, I'll look at doing what Kirk suggests,
> which is bzeroing all new buffer cache pages when the file is mmap()d.
>
Replying here, since a text description is more informative than the code.

You cannot get the situation I described with a single process.
You need a writer in one thread and a reader going through the mmaped
area in another.  Even then, the race window is thin.

Let me describe the potential issue one more time:

Thread A (writer) issues write(2).  The kernel does two things:
1. zeroes part of the last buffer of the affected file.
2. uiomove()s the write data into the buffer's b_data.

Now, assume that thread B has the same file mmaped somewhere, and
accesses the page of the buffer after [1] but before [2].  Then,
it would see zeroes instead of the valid content.

I said that this breaks write/mmap consistency, since thread B can
see content in the file which was never written there.  The condition
is transient; it repairs itself once thread A passes point 2.
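
A rough sketch of a probe for this window, in case it helps with testing.
It is only an illustration under several assumptions: the names
(mmap_write_probe, probe_writer) and all sizes are made up here, it uses
two processes rather than two threads (which should exercise the same
kernel path, since both share the file's pages), and it treats "a zero
byte below the size reported by fstat(2)" as the symptom, which may or may
not match the exact window of a given patch.  The writer only ever stores
non-zero bytes, so any zero the reader sees below the reported size was
never written to the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All sizes are illustrative assumptions, not taken from the patch. */
#define	PROBE_START	529		/* initial file size in bytes */
#define	PROBE_TOTAL	(64 * 4096)	/* writer stops once past this offset */
#define	PROBE_CHUNK	100		/* bytes appended per pwrite(2) */
#define	PROBE_MAPLEN	(PROBE_TOTAL + PROBE_CHUNK)

/* Writer side: grow the file with small appends of non-zero bytes. */
static int
probe_writer(int fd)
{
	char buf[PROBE_CHUNK];
	off_t off;

	memset(buf, 'B', sizeof(buf));
	for (off = PROBE_START; off < PROBE_TOTAL; off += PROBE_CHUNK)
		if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
			return (1);
	return (0);
}

/*
 * Reader side (hypothetical helper): returns the number of zero bytes
 * seen through the mapping below the size reported by fstat(2), or -1
 * on error.
 */
long
mmap_write_probe(const char *path)
{
	char init[PROBE_START];
	struct stat st;
	volatile const char *map;
	void *p;
	off_t i, lo;
	long zeros = 0;
	pid_t pid, rc;
	int fd, failed = 0, status = 0;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	memset(init, 'A', sizeof(init));
	if (write(fd, init, sizeof(init)) != (ssize_t)sizeof(init)) {
		close(fd);
		return (-1);
	}
	/* Map more than the current size; only bytes below EOF are read. */
	p = mmap(NULL, PROBE_MAPLEN, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	map = p;

	pid = fork();
	if (pid < 0) {
		munmap(p, PROBE_MAPLEN);
		close(fd);
		return (-1);
	}
	if (pid == 0)
		_exit(probe_writer(fd));

	do {
		/* Poll the writer first so one scan runs after it exits. */
		rc = waitpid(pid, &status, WNOHANG);
		if (fstat(fd, &st) != 0) {
			failed = 1;
			continue;
		}
		assert(st.st_size <= (off_t)PROBE_MAPLEN);
		/* Scan the last 4096 bytes below the reported size. */
		lo = st.st_size > 4096 ? st.st_size - 4096 : 0;
		for (i = lo; i < st.st_size; i++)
			if (map[i] == 0)
				zeros++;
	} while (rc == 0);

	munmap(p, PROBE_MAPLEN);
	close(fd);
	if (failed || rc != pid || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return (-1);
	return (zeros);
}
```

Call mmap_write_probe() from a small main() with a path on the filesystem
under test.  A result of zero proves nothing, since the window is thin; a
non-zero result would be worth a closer look.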
