From owner-freebsd-fs@FreeBSD.ORG Tue Nov 26 23:41:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 56B22160; Tue, 26 Nov 2013 23:41:15 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id C54242744; Tue, 26 Nov 2013 23:41:14 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAOQwlVKDaFve/2dsb2JhbABRCBaDKVOCergJgT90giwjBFJEGQIEVQYRHYdmDa5LkRIMC44mIhkbB4JrgUgDiUKGb4kTkGODRh4EgWo X-IronPort-AV: E=Sophos;i="4.93,778,1378872000"; d="scan'208";a="72619630" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 26 Nov 2013 18:41:13 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 707B1B3F2B; Tue, 26 Nov 2013 18:41:13 -0500 (EST) Date: Tue, 26 Nov 2013 18:41:13 -0500 (EST) From: Rick Macklem To: FreeBSD FS Message-ID: <731168702.21452440.1385509273449.JavaMail.root@uoguelph.ca> In-Reply-To: <1139579526.21452374.1385509250511.JavaMail.root@uoguelph.ca> Subject: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_21452438_736821265.1385509273446" X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Kostik Belousov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 23:41:15 -0000 ------=_Part_21452438_736821265.1385509273446 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, The current NFS client 
does a synchronous write to the server when a non-contiguous write to the same buffer cache block occurs. This is done because there is only a single dirty byte range recorded in the buf structure. This results in a lot of synchronous writes for software builds (I believe it is the loader that loves to write small non-contiguous chunks to its output file). Some users disable synchronous writing on the server to improve performance, but this puts them at risk of data loss when the server crashes.

Long ago jhb@ emailed me a small patch that avoided the synchronous writes by simply making the dirty byte range a superset of the bytes written. The problem with doing this shows up for a rare (possibly non-existent) application that writes non-overlapping byte ranges to the same file from multiple clients concurrently: the superset of the byte range can contain stale data, and when that is written back to the server it can overwrite, and so lose, some of the other clients' writes.

I created a patch that maintained a list of dirty byte ranges. It was complicated and I found that the list often had to be > 100 entries to avoid the synchronous writes.

So, I think his solution is preferable, although I've added a couple of tweaks:
- The synchronous writes (old/current algorithm) are still used if there has been file locking done on the file. (I think any app. that writes a file from multiple clients will/should use file locking.)
- The synchronous writes (old/current algorithm) are used if a sysctl (vfs.nfs.old_noncontig_writing in the attached patch) is set. This will avoid breakage for any app. (if there is one) that writes a file from multiple clients without doing file locking.

For testing on my very slow single core hardware, I see about a 10% improvement in kernel build times, but with fewer I/O RPCs:

                Read RPCs    Write RPCs
  old/current      50K          122K
  patched          39K           40K

--> it reduced the Read RPC count by about 20% and cut the Write RPC count to 1/3rd.

I think jhb@ saw pretty good performance results with his patch.
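To make the difference between the two algorithms concrete, here is a minimal userland sketch of the dirty-range bookkeeping. It is not the kernel code from the attached patch: struct dirty_range, use_noncontig_write(), needs_sync_write() and record_write() are hypothetical names made up for illustration, and the "flush" is simulated by resetting the range instead of calling bwrite(). Only the comparison logic is meant to mirror the test on b_dirtyoff/b_dirtyend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the single dirty byte range kept per buffer. */
struct dirty_range {
	int off;	/* first dirty byte, like b_dirtyoff */
	int end;	/* one past the last dirty byte, like b_dirtyend; 0 = clean */
};

/*
 * The relaxed (superset) behaviour is only chosen when the file has never
 * been locked and the "old algorithm" sysctl is not set.
 */
static bool
use_noncontig_write(bool has_been_locked, bool old_sysctl_set)
{
	return (!has_been_locked && !old_sysctl_set);
}

/* True when the new write does not touch or overlap the dirty range. */
static bool
needs_sync_write(const struct dirty_range *dr, int on, int n)
{
	return (dr->end > 0 && (on > dr->end || on + n < dr->off));
}

/*
 * Record a write of n bytes at offset on within the block.
 * Returns 1 if the old dirty range had to be flushed first, else 0.
 */
static int
record_write(struct dirty_range *dr, int on, int n, bool noncontig_write)
{
	int flushed = 0;

	assert(on >= 0 && n > 0);
	if (!noncontig_write && needs_sync_write(dr, on, n)) {
		/* Old/current algorithm: write the old range out, start clean. */
		dr->off = dr->end = 0;
		flushed = 1;
	}
	if (dr->end > 0) {
		/* Grow the range to a superset covering old and new bytes. */
		if (on < dr->off)
			dr->off = on;
		if (on + n > dr->end)
			dr->end = on + n;
	} else {
		dr->off = on;
		dr->end = on + n;
	}
	return (flushed);
}
```

With the relaxed setting, writes at offsets 0 and 100 of the same block leave one dirty range that also covers the untouched bytes in between. Those in-between bytes are what could overwrite another client's data on write-back, which is why the old behaviour is kept for files that have been locked.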
Anyhow, the patch is attached and can also be found here: http://people.freebsd.org/~rmacklem/noncontig-write.patch I'd like to invite folks to comment/review/test this patch, since I think it is ready for head/current. Thanks, rick ps: Kostik, maybe you could look at it. In particular, I am wondering if I zero'd out the buffer the correct way, via vfs_bio_bzero_buf()? ------=_Part_21452438_736821265.1385509273446 Content-Type: text/x-patch; name=noncontig-write.patch Content-Disposition: attachment; filename=noncontig-write.patch Content-Transfer-Encoding: base64 LS0tIGZzL25mc2NsaWVudC9uZnNfY2xiaW8uYy5vcmlnCTIwMTMtMDgtMjggMTg6NDU6NDEuMDAw MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NsYmlvLmMJMjAxMy0xMS0yNSAyMTo0 MjoxNi4wMDAwMDAwMDAgLTA1MDAKQEAgLTcyLDYgKzcyLDEyIEBAIGV4dGVybiBpbnQgbmZzX2tl ZXBfZGlydHlfb25fZXJyb3I7CiAKIGludCBuY2xfcGJ1Zl9mcmVlY250ID0gLTE7CS8qIHN0YXJ0 IG91dCB1bmxpbWl0ZWQgKi8KIAorU1lTQ1RMX0RFQ0woX3Zmc19uZnMpOworCitzdGF0aWMgaW50 CW5jbF9vbGRub25jb250aWd3cml0aW5nID0gMDsKK1NZU0NUTF9JTlQoX3Zmc19uZnMsIE9JRF9B VVRPLCBvbGRfbm9uY29udGlnX3dyaXRpbmcsIENUTEZMQUdfUlcsCisJICAgJm5jbF9vbGRub25j b250aWd3cml0aW5nLCAwLCAiTkZTIHVzZSBvbGQgbm9uY29udGlnIHdyaXRpbmcgYWxnIik7CisK IHN0YXRpYyBzdHJ1Y3QgYnVmICpuZnNfZ2V0Y2FjaGVibGsoc3RydWN0IHZub2RlICp2cCwgZGFk ZHJfdCBibiwgaW50IHNpemUsCiAgICAgc3RydWN0IHRocmVhZCAqdGQpOwogc3RhdGljIGludCBu ZnNfZGlyZWN0aW9fd3JpdGUoc3RydWN0IHZub2RlICp2cCwgc3RydWN0IHVpbyAqdWlvcCwKQEAg LTg3NCw3ICs4ODAsNyBAQCBuY2xfd3JpdGUoc3RydWN0IHZvcF93cml0ZV9hcmdzICphcCkKIAlz dHJ1Y3QgdmF0dHIgdmF0dHI7CiAJc3RydWN0IG5mc21vdW50ICpubXAgPSBWRlNUT05GUyh2cC0+ dl9tb3VudCk7CiAJZGFkZHJfdCBsYm47Ci0JaW50IGJjb3VudDsKKwlpbnQgYmNvdW50LCBub25j b250aWdfd3JpdGUsIG9iY291bnQ7CiAJaW50IGJwX2NhY2hlZCwgbiwgb24sIGVycm9yID0gMCwg ZXJyb3IxOwogCXNpemVfdCBvcmlnX3Jlc2lkLCBsb2NhbF9yZXNpZDsKIAlvZmZfdCBvcmlnX3Np emUsIHRtcF9vZmY7CkBAIC0xMDM3LDcgKzEwNDMsMTUgQEAgYWdhaW46CiAJCSAqIHVuYWxpZ25l ZCBidWZmZXIgc2l6ZS4KIAkJICovCiAJCW10eF9sb2NrKCZucC0+bl9tdHgpOwotCQlpZiAodWlv 
LT51aW9fb2Zmc2V0ID09IG5wLT5uX3NpemUgJiYgbikgeworCQlpZiAoKG5wLT5uX2ZsYWcgJiBO SEFTQkVFTkxPQ0tFRCkgPT0gMCAmJgorCQkgICAgbmNsX29sZG5vbmNvbnRpZ3dyaXRpbmcgPT0g MCkKKwkJCW5vbmNvbnRpZ193cml0ZSA9IDE7CisJCWVsc2UKKwkJCW5vbmNvbnRpZ193cml0ZSA9 IDA7CisJCWlmICgodWlvLT51aW9fb2Zmc2V0ID09IG5wLT5uX3NpemUgfHwKKwkJICAgIChub25j b250aWdfd3JpdGUgIT0gMCAmJgorCQkgICAgbGJuID09IChucC0+bl9zaXplIC8gYmlvc2l6ZSkg JiYKKwkJICAgIHVpby0+dWlvX29mZnNldCArIG4gPiBucC0+bl9zaXplKSkgJiYgbikgewogCQkJ bXR4X3VubG9jaygmbnAtPm5fbXR4KTsKIAkJCS8qCiAJCQkgKiBHZXQgdGhlIGJ1ZmZlciAoaW4g aXRzIHByZS1hcHBlbmQgc3RhdGUgdG8gbWFpbnRhaW4KQEAgLTEwNDUsOCArMTA1OSw4IEBAIGFn YWluOgogCQkJICogbmZzbm9kZSBhZnRlciB3ZSBoYXZlIGxvY2tlZCB0aGUgYnVmZmVyIHRvIHBy ZXZlbnQKIAkJCSAqIHJlYWRlcnMgZnJvbSByZWFkaW5nIGdhcmJhZ2UuCiAJCQkgKi8KLQkJCWJj b3VudCA9IG9uOwotCQkJYnAgPSBuZnNfZ2V0Y2FjaGVibGsodnAsIGxibiwgYmNvdW50LCB0ZCk7 CisJCQlvYmNvdW50ID0gbnAtPm5fc2l6ZSAtIChsYm4gKiBiaW9zaXplKTsKKwkJCWJwID0gbmZz X2dldGNhY2hlYmxrKHZwLCBsYm4sIG9iY291bnQsIHRkKTsKIAogCQkJaWYgKGJwICE9IE5VTEwp IHsKIAkJCQlsb25nIHNhdmU7CkBAIC0xMDU4LDkgKzEwNzIsMTIgQEAgYWdhaW46CiAJCQkJbXR4 X3VubG9jaygmbnAtPm5fbXR4KTsKIAogCQkJCXNhdmUgPSBicC0+Yl9mbGFncyAmIEJfQ0FDSEU7 Ci0JCQkJYmNvdW50ICs9IG47CisJCQkJYmNvdW50ID0gb24gKyBuOwogCQkJCWFsbG9jYnVmKGJw LCBiY291bnQpOwogCQkJCWJwLT5iX2ZsYWdzIHw9IHNhdmU7CisJCQkJaWYgKG5vbmNvbnRpZ193 cml0ZSAhPSAwICYmIGJjb3VudCA+IG9iY291bnQpCisJCQkJCXZmc19iaW9fYnplcm9fYnVmKGJw LCBvYmNvdW50LCBiY291bnQgLQorCQkJCQkgICAgb2Jjb3VudCk7CiAJCQl9CiAJCX0gZWxzZSB7 CiAJCQkvKgpAQCAtMTE1OSwxOSArMTE3NiwyMyBAQCBhZ2FpbjoKIAkJICogYXJlYSwganVzdCB1 cGRhdGUgdGhlIGJfZGlydHlvZmYgYW5kIGJfZGlydHllbmQsCiAJCSAqIG90aGVyd2lzZSBmb3Jj ZSBhIHdyaXRlIHJwYyBvZiB0aGUgb2xkIGRpcnR5IGFyZWEuCiAJCSAqCisJCSAqIElmIHRoZXJl IGhhcyBiZWVuIGEgZmlsZSBsb2NrIGFwcGxpZWQgdG8gdGhpcyBmaWxlCisJCSAqIG9yIHZmcy5u ZnMub2xkX25vbmNvbnRpZ193cml0aW5nIGlzIHNldCwgZG8gdGhlIGZvbGxvd2luZzoKIAkJICog V2hpbGUgaXQgaXMgcG9zc2libGUgdG8gbWVyZ2UgZGlzY29udGlndW91cyB3cml0ZXMgZHVlIHRv 
CiAJCSAqIG91ciBoYXZpbmcgYSBCX0NBQ0hFIGJ1ZmZlciAoIGFuZCB0aHVzIHZhbGlkIHJlYWQg ZGF0YQogCQkgKiBmb3IgdGhlIGhvbGUpLCB3ZSBkb24ndCBiZWNhdXNlIGl0IGNvdWxkIGxlYWQg dG8KIAkJICogc2lnbmlmaWNhbnQgY2FjaGUgY29oZXJlbmN5IHByb2JsZW1zIHdpdGggbXVsdGlw bGUgY2xpZW50cywKIAkJICogZXNwZWNpYWxseSBpZiBsb2NraW5nIGlzIGltcGxlbWVudGVkIGxh dGVyIG9uLgogCQkgKgotCQkgKiBBcyBhbiBvcHRpbWl6YXRpb24gd2UgY291bGQgdGhlb3JldGlj YWxseSBtYWludGFpbgotCQkgKiBhIGxpbmtlZCBsaXN0IG9mIGRpc2NvbnRpbnVvdXMgYXJlYXMs IGJ1dCB3ZSB3b3VsZCBzdGlsbAotCQkgKiBoYXZlIHRvIGNvbW1pdCB0aGVtIHNlcGFyYXRlbHkg c28gdGhlcmUgaXNuJ3QgbXVjaAotCQkgKiBhZHZhbnRhZ2UgdG8gaXQgZXhjZXB0IHBlcmhhcHMg YSBiaXQgb2YgYXN5bmNocm9uaXphdGlvbi4KKwkJICogSWYgdmZzLm5mcy5vbGRfbm9uY29udGln X3dyaXRpbmcgaXMgbm90IHNldCBhbmQgdGhlcmUgaGFzCisJCSAqIG5vdCBiZWVuIGZpbGUgbG9j a2luZyBkb25lIG9uIHRoaXMgZmlsZToKKwkJICogUmVsYXggY29oZXJlbmN5IGEgYml0IGZvciB0 aGUgc2FrZSBvZiBwZXJmb3JtYW5jZSBhbmQKKwkJICogZXhwYW5kIHRoZSBjdXJyZW50IGRpcnR5 IHJlZ2lvbiB0byBjb250YWluIHRoZSBuZXcKKwkJICogd3JpdGUgZXZlbiBpZiBpdCBtZWFucyB3 ZSBtYXJrIHNvbWUgbm9uLWRpcnR5IGRhdGEgYXMKKwkJICogZGlydHkuCiAJCSAqLwogCi0JCWlm IChicC0+Yl9kaXJ0eWVuZCA+IDAgJiYKKwkJaWYgKG5vbmNvbnRpZ193cml0ZSA9PSAwICYmIGJw LT5iX2RpcnR5ZW5kID4gMCAmJgogCQkgICAgKG9uID4gYnAtPmJfZGlydHllbmQgfHwgKG9uICsg bikgPCBicC0+Yl9kaXJ0eW9mZikpIHsKIAkJCWlmIChid3JpdGUoYnApID09IEVJTlRSKSB7CiAJ CQkJZXJyb3IgPSBFSU5UUjsKLS0tIGZzL25mc2NsaWVudC9uZnNub2RlLmgub3JpZwkyMDEzLTEx LTE5IDE4OjE3OjM3LjAwMDAwMDAwMCAtMDUwMAorKysgZnMvbmZzY2xpZW50L25mc25vZGUuaAky MDEzLTExLTI1IDIxOjI5OjU4LjAwMDAwMDAwMCAtMDUwMApAQCAtMTU3LDYgKzE1Nyw3IEBAIHN0 cnVjdCBuZnNub2RlIHsKICNkZWZpbmUJTkxPQ0tXQU5UCTB4MDAwMTAwMDAgIC8qIFdhbnQgdGhl IHNsZWVwIGxvY2sgKi8KICNkZWZpbmUJTk5PTEFZT1VUCTB4MDAwMjAwMDAgIC8qIENhbid0IGdl dCBhIGxheW91dCBmb3IgdGhpcyBmaWxlICovCiAjZGVmaW5lCU5XUklURU9QRU5FRAkweDAwMDQw MDAwICAvKiBIYXMgYmVlbiBvcGVuZWQgZm9yIHdyaXRpbmcgKi8KKyNkZWZpbmUJTkhBU0JFRU5M T0NLRUQJMHgwMDA4MDAwMCAgLyogSGFzIGJlZW4gZmlsZSBsb2NrZWQuICovCiAKIC8qCiAgKiBD 
b252ZXJ0IGJldHdlZW4gbmZzbm9kZSBwb2ludGVycyBhbmQgdm5vZGUgcG9pbnRlcnMKLS0tIGZz L25mc2NsaWVudC9uZnNfY2x2bm9wcy5jLm9yaWcJMjAxMy0xMS0xOSAxODoxOTo0Mi4wMDAwMDAw MDAgLTA1MDAKKysrIGZzL25mc2NsaWVudC9uZnNfY2x2bm9wcy5jCTIwMTMtMTEtMjUgMjE6MzI6 NDcuMDAwMDAwMDAwIC0wNTAwCkBAIC0zMDc5LDYgKzMwNzksMTAgQEAgbmZzX2FkdmxvY2soc3Ry dWN0IHZvcF9hZHZsb2NrX2FyZ3MgKmFwKQogCQkJCQlucC0+bl9jaGFuZ2UgPSB2YS52YV9maWxl cmV2OwogCQkJCX0KIAkJCX0KKwkJCS8qIE1hcmsgdGhhdCBhIGZpbGUgbG9jayBoYXMgYmVlbiBh Y3F1aXJlZC4gKi8KKwkJCW10eF9sb2NrKCZucC0+bl9tdHgpOworCQkJbnAtPm5fZmxhZyB8PSBO SEFTQkVFTkxPQ0tFRDsKKwkJCW10eF91bmxvY2soJm5wLT5uX210eCk7CiAJCX0KIAkJTkZTVk9Q VU5MT0NLKHZwLCAwKTsKIAkJcmV0dXJuICgwKTsKQEAgLTMwOTgsNiArMzEwMiwxMiBAQCBuZnNf YWR2bG9jayhzdHJ1Y3Qgdm9wX2FkdmxvY2tfYXJncyAqYXApCiAJCQkJZXJyb3IgPSBFTk9MQ0s7 CiAJCQl9CiAJCX0KKwkJaWYgKGVycm9yID09IDAgJiYgYXAtPmFfb3AgPT0gRl9TRVRMSykgewor CQkJLyogTWFyayB0aGF0IGEgZmlsZSBsb2NrIGhhcyBiZWVuIGFjcXVpcmVkLiAqLworCQkJbXR4 X2xvY2soJm5wLT5uX210eCk7CisJCQlucC0+bl9mbGFnIHw9IE5IQVNCRUVOTE9DS0VEOworCQkJ bXR4X3VubG9jaygmbnAtPm5fbXR4KTsKKwkJfQogCX0KIAlyZXR1cm4gKGVycm9yKTsKIH0K ------=_Part_21452438_736821265.1385509273446--