From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 08:52:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0C2CA2AB; Wed, 27 Nov 2013 08:52:52 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9E7F7214B; Wed, 27 Nov 2013 08:52:51 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAR8qjvW027494; Wed, 27 Nov 2013 10:52:45 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAR8qjvW027494 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAR8qjEj027493; Wed, 27 Nov 2013 10:52:45 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 27 Nov 2013 10:52:45 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131127085245.GW59496@kib.kiev.ua> References: <1139579526.21452374.1385509250511.JavaMail.root@uoguelph.ca> <731168702.21452440.1385509273449.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Wmw2osJQsLCWSWY8" Content-Disposition: inline In-Reply-To: <731168702.21452440.1385509273449.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 08:52:52 -0000 --Wmw2osJQsLCWSWY8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Nov 26, 2013 at 06:41:13PM -0500, Rick Macklem wrote: > Hi, >=20 > The current NFS client does a synchronous write > to the server when a non-contiguous write to the > same buffer cache block occurs. This is done because > there is a single dirty byte range recorded in the > buf structure. This results in a lot of synchronous > writes for software builds (I believe it is the loader > that loves to write small non-contiguous chunks to > its output file). Some users disable synchronous > writing on the server to improve performance, but > this puts them at risk of data loss when the server > crashes. >=20 > Long ago jhb@ emailed me a small patch that avoided > the synchronous writes by simply making the dirty byte > range a superset of the bytes written. The problem > with doing this is that for a rare (possibly non-existent) > application that writes non-overlapping byte ranges > to the same file from multiple clients concurrently, > some of these writes might get lost by stale data in > the superset of the byte range being written back to > the server. (Crappy, run on sentence, but hopefully > it makes sense;-) >=20 > I created a patch that maintained a list of dirty byte > ranges. It was complicated and I found that the list > often had to be > 100 entries to avoid the synchronous > writes. >=20 > So, I think his solution is preferable, although I've > added a couple of tweaks: > - The synchronous writes (old/current algorithm) is still > used if there has been file locking done on the file. > (I think any app. that writes a file from multiple clients > will/should use file locking.) > - The synchronous writes (old/current algorithm) is used > if a sysctl is set. This will avoid breakage for any app. > (if there is one) that writes a file from multiple clients > without doing file locking. My feeling is that global sysctl is too coarse granularity for the control. IMO the setting should be per-mount. It is not unreasonable to have shared files on one export, and perform private client operations on another. E.g. mailboxes and /usr/obj; I understand that mailboxes case should be covered by the advlock heuristic. But note that if advlock is acquired after some writes, the heuristic breaks. Also, since the feature might cause very hard to diagnose corruption, I think a facility to detect that dirty range coalescing was done would be helpful. It could be a counter printed by mount -v, or even a warning printed once per mount. >=20 > For testing on my very slow single core hardware, I see about > a 10% improvement in kernel build times, but with fewer I/O > RPCs: > Read RPCs Write RPCs > old/current 50K 122K > patched 39K 40K > --> it reduced the Read RPC count by about 20% and cut the > Write RPC count to 1/3rd. > I think jhb@ saw pretty good performance results with his patch. >=20 > Anyhow, the patch is attached and can also be found here: > http://people.freebsd.org/~rmacklem/noncontig-write.patch >=20 > I'd like to invite folks to comment/review/test this patch, > since I think it is ready for head/current. >=20 > Thanks, rick > ps: Kostik, maybe you could look at it. In particular, I am > wondering if I zero'd out the buffer the correct way, via > vfs_bio_bzero_buf()? Both vfs_bio_bzero_buf() and plain bzero() would work for NFS client, since it does not use unmapped buffers. Use of vfs_bio_bzero_buf() is preferred since it makes one less place to find if converting NFS to unmapped i/o. In fact, I do not understand why the zeroing is needed. Could you, please, comment on it ? More, the zeroing seems to overwrite part of the previously valid content of the buffer, unless I mis-read the patch. This breaks the mmap/file consistency, since buffer pages may be mapped by usermode process, and it could observe the 'out of thin air' zeroes before the writing thread progressed to fill the zeroed part with new content. --Wmw2osJQsLCWSWY8 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSlbLcAAoJEJDCuSvBvK1Bp0QQAKhA3TwLJTVpN4PjBac7ZbOX SEKBXMTdIfaoEwiRHmmOFdfz6wryFk3VKan5a0tzsvdpkGwPiu7Yy0pYhH+9PIea WKqHCjJVUJdf/a1f72Lw/T36/1lJ+8/73D1rbAzEH0i6mys0lFrQiDxmojRpTS3l GCtv+p7KKdRyw235/u0zndn/ChtYHL0ehu6oRE5mU+6Pssbu5NIfAdm4uc6BWXeu iyeYPpzkNUBWokA4OJ2YMpC44EvxEUVjRBMUy9DIjxyi1Ab4ohm6eTKp7Q/eguuy BUn8B2TywWY5ZIFce/ocDuWgm9vuSvG4pPmMr3XcisdXyVwF5FOD7Qogz71nO7AV AnpHCe0ytEHY04Re8fjHTeIAkWg78E767D3vIJOHDJUK3Hspfh0NTchFk7s8+1A0 ShYKawm/fXPr4kAZx6OuprgefxyaUc3VoLhNlpzIkryDnD4dWcVlBFM7N568cGSn H5iubTBjm8ID+aoE27Hqda+/TzYeHWwiFmAnW/qFdg3ywQ/wTcriPpXT5j1DCKPA eCLZJFSXGbGYV+OkYSfSIFjdcnmLdgoRxotMUkG9mMGJOUBfJ6Yw4XOKpbd+nmli RvjwBs4nieWGQJKpaxH/Fue9iDzxSby2H9Cesc6rr+ksNQ7p5xiXdI46wEpuPrJW lrjtp0lyTMTDjNWUui9i =T4xD -----END PGP SIGNATURE----- --Wmw2osJQsLCWSWY8--