From: Konstantin Belousov
To: Rick Macklem
Cc: Kirk McKusick, FreeBSD FS
Date: Thu, 28 Nov 2013 09:24:35 +0200
Subject: Re: RFC: NFS client patch to reduce sychronous writes
Message-ID: <20131128072435.GI59496@kib.kiev.ua>
References: <201311272320.rARNKEKQ045789@chez.mckusick.com>
 <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca>
In-Reply-To: <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca>

On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote:
> Kirk wrote:
> > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST)
> > > From: Rick Macklem
> > > To: Konstantin Belousov
> > > Subject: Re: RFC: NFS client patch to reduce sychronous writes
> > >
> > > Kostik wrote:
> > >> Sorry, I do not understand the question. mmap(2) itself does not
> > >> change file size. But if mmaped area includes the last page, I
> > >> still think that the situation I described before is possible.
> > >
> > > Yes, I'll need to look at this. If it is a problem, all I can think
> > > of is bzeroing all new pages when they're allocated to the buffer
> > > cache.
> > >
> > > Thanks for looking at it, rick
> > > ps: Btw, jhb@'s patch didn't have the bzeroing in it.
> >
> > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS
> > for this problem and it killed write performance of the filesystem
> > by nearly half. We corrected this by only doing the bzero when the
> > file is mmap'ed which helped things considerably (since most files
> > being written are not also bmap'ed).
> >
> > Kirk
> >
> Ok, thanks. I've been trying to reproduce the problem over NFS and
> haven't been able to break my patch. I was using the attached trivial
> test program and would simply make a copy of the source file (529 bytes)
> to test on. I got the same results both locally and over NFS:
> - built without -DWRITEIT, the setting of a value after EOF would be
>   lost, because nothing grew the file from 529 bytes to over 4080bytes.
> - built with -DWRITEIT, both the 'A' and 'B' are in the result, since
>   my patch bzeros the grown segment in the write(2) syscall.
>
> - If I move the write (code in #ifdef WRITEIT) to after the "*cp"
>   of the mapped page, the 'A' assigned to "*cp" gets lost for
>   both UFS and NFS.
>   Is this correct behaviour?
>
> If it is correct behaviour, I can't see how the patch is broken, but
> if you think it might still be, I'll look at doing what Kirk suggests,
> which is bzeroing all new buffer cache pages when the file is mmap()d.
>

Replying here, since a text description is more informative than the
code.

You cannot get the situation I described with a single process.
You should have a writer in one thread and a reader accessing the
mmaped area in another. Even then, the race window is thin.

Let me describe the issue which could exist one more time:

Thread A (writer) issues write(2). The kernel does two things:
1. It zeroes part of the last buffer of the affected file.
2. It uiomove()s the write data into the buffer's b_data.

Now, assume that thread B has the same file mmaped somewhere, and
accesses the page of the buffer after [1] but before [2]. Then it
would see zeroes instead of the valid content.

I said that this breaks write/mmap consistency, since thread B can
see content in the file which was never written there. The condition
is transient; it self-repairs after thread A passes point 2.
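
The two-party scenario above could be probed with something like the following. This is only a sketch under stated assumptions, not the test program attached earlier in the thread (which is not reproduced here). The names probe_once() and count_anomalies() and the constants are hypothetical; 529 and 4080 are borrowed from the quoted description. A forked child stands in for thread B. The check also assumes that the new file size can become visible to fstat(2) before the data is copied in, which depends on kernel internals; a count of zero therefore does not show that the window is absent.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define OLD_SIZE 529	/* initial file size, as in the quoted test */
#define GROW_OFF 4080	/* offset written past the old EOF, same page */

/*
 * Reader side (stands in for thread B): spin until the file is seen to
 * have grown past GROW_OFF, then read that byte through the mapping.
 * Returns 1 if the byte is still zero, 0 if not, 2 on fstat failure.
 */
static int
reader_check(int fd, const volatile char *map)
{
	struct stat sb;

	(void)map[0];		/* touch the page before the writer runs */
	do {
		if (fstat(fd, &sb) != 0)
			return (2);
	} while (sb.st_size <= GROW_OFF);
	return (map[GROW_OFF] == 0 ? 1 : 0);
}

/* One round: 1 if the reader saw a zero, 0 if not, -1 on setup error. */
static int
probe_once(const char *path)
{
	char fill[OLD_SIZE];
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd, status = 0, result = -1;
	char *map;
	pid_t pid;

	assert(pagesz > GROW_OFF);	/* both offsets must share a page */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	memset(fill, 'x', sizeof(fill));
	if (write(fd, fill, sizeof(fill)) != (ssize_t)sizeof(fill)) {
		close(fd);
		return (-1);
	}
	map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	pid = fork();
	if (pid == 0)
		_exit(reader_check(fd, map));
	if (pid > 0) {
		/* Writer side (thread A): grow the file within its last page. */
		if (pwrite(fd, "B", 1, GROW_OFF) != 1)
			kill(pid, SIGKILL);	/* do not leave the reader spinning */
		if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
		    WEXITSTATUS(status) <= 1)
			result = WEXITSTATUS(status);
	}
	munmap(map, (size_t)pagesz);
	close(fd);
	return (result);
}

/* Repeat the probe; returns the number of zero sightings, or -1 on error. */
static int
count_anomalies(const char *path, int rounds)
{
	int i, r, hits = 0;

	for (i = 0; i < rounds; i++) {
		r = probe_once(path);
		if (r < 0)
			return (-1);
		hits += r;
	}
	return (hits);
}
```

To try it, call count_anomalies() from a small main() with a path on the filesystem under test (for example a file on an NFS mount) and a large round count. Since the window is thin, many rounds may be needed, and the child often starts only after the write has already finished.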