From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 03:15:59 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 64F0F5C9 for ; Sun, 24 Nov 2013 03:15:59 +0000 (UTC) Received: from mail-qa0-x236.google.com (mail-qa0-x236.google.com [IPv6:2607:f8b0:400d:c00::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 259CD2A24 for ; Sun, 24 Nov 2013 03:15:58 +0000 (UTC) Received: by mail-qa0-f54.google.com with SMTP id f11so6366938qae.20 for ; Sat, 23 Nov 2013 19:15:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=BuszNd3QYXv5O0HnvG+uw3+9JWfYKneA2TZthAx1OjA=; b=Hv45ANryhs/klWr9+7OkZEGteKNCLNeMm7D1eAyCSdzO/ZB7LDIhMwLjZMdOCXaBv0 LgMSDLCBqG0mZDSgajvUMaak0tJyOpHIL0G/1iInVHN+lK5Xg6owIr1CLJKQToAqoHo/ 09SfdxZRkBQ357ue6RyydaSsVI9h4nPVylCBI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=BuszNd3QYXv5O0HnvG+uw3+9JWfYKneA2TZthAx1OjA=; b=MVnZevHvC/XBMWJrywkOTFlya7Wl64bES4oe+keicyUT5igLfCMTMoe1gAjlmlnsrF 3TfLz+nesFE0DKaEWcF1ZKqzdCLPT7vCXTu7mn3/ZaNSEuw6oObC5GYoVgm4BpRtjs+m NeHq1Opm9eqEnBtfbeQP7gapmP0MgIFOqnm65DoVGyndtohEW/fZMr5ZF8u57JvOUWx3 XDoPf1Ps8u+QuEGo8BjZxZQcv60BM5a+ljklk0BbXulPhSFirmWNE4tQvkUH+CrVhsXe UDZQmHLo+0LT4ZAjy4HK0KbWF52cXLSYp+JroY3HnU3BZRoYPeQh7g6lDU2ZwisuTgbB W3aw== X-Gm-Message-State: ALoCoQljjlBfEA9+xyARkchzVRh0OfkbuY38mydYzJZg1e4GwIMXWJ1/Cqo8Yaau6Y+YxRZFYn0d X-Received: by 10.49.35.112 with SMTP id g16mr34558245qej.13.1385262958118; Sat, 23 Nov 2013 19:15:58 -0800 (PST) MIME-Version: 1.0 Received: by 10.96.63.101 with HTTP; Sat, 23 Nov 2013 19:15:28 -0800 (PST) In-Reply-To: <5290E0CF.20704@gibfest.dk> References: <5290E0CF.20704@gibfest.dk> From: Eitan Adler Date: Sat, 23 Nov 2013 22:15:28 -0500 Message-ID: Subject: Re: ZFS (or something) is absurdly slow To: Thomas Steen Rasmussen Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2013 03:15:59 -0000 On Sat, Nov 23, 2013 at 12:07 PM, Thomas Steen Rasmussen wrote: > On 23-11-2013 09:26, Eitan Adler wrote: >> >> Every so often I see absurdly slow tasks stuck on "zio->io_cv". >> >> For example a recent "git checkout file.c" did not complete for many >> minutes >> load: 0.65 cmd: git 74577 [zio->io_cv] 435.58r 0.20u 2.54s 0% 71488k >> >> I have seen "ls ~" take tends of minutes to complete. Even "ls >> /var/empty" can take just as long. This length of time is variable >> but is usually much longer than expected. >> >> Does anyone have any suggestions for helping to figure out what is >> taking a long time? > > Hello, > > If "top -m io -o total" doesn't reveal what is using the disks, Nothing unexpected here though I will pay attention during the slow times. 
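A quick way to pin down where a wedged command is actually sitting, the next
time one hangs in zio->io_cv, is to grab its kernel stack while also watching
per-disk latency. This is only a rough sketch (the PID is the one from the
git example above, and the gstat filter is just a placeholder for whatever
devices back the pool):

# Kernel stack of the stuck process; frames like zio_wait()/cv_wait()
# mean ZFS is waiting on physical I/O rather than spinning on a lock
# somewhere in the filesystem code.
procstat -kk 74577

# Per-provider view of the same moment: a disk pinned near 100% busy with
# large ms/r or ms/w values but only a handful of ops/s points at the
# drive itself rather than at ZFS.
gstat -f 'ada[01]$'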
> I've had good experiences with the following dtrace script,
> you'd need to build dtrace support in your kernel though:
> vfsstat.d https://forums.freebsd.org/showpost.php?p=182070&postcount=6

I can run this script, what output should I be looking for?

> You can also check systat -iostat 1 and check the TPS count for
> the disks. The regular (spinning) hard disk can manage 200-300 iops
> if it is a regular consumer-class disk. Are you "running out" of
> iops for some reason ?

tps stays between 0 and 30 or so. I rarely if ever see numbers above
30 (This is a consumer-grade laptop HDD).

       tty            ada0             ada1              cd0             cpu
 tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t  tps  MB/s  us ni sy in id
  25  1144  6.88   5  0.03  13.76  35  0.46   0.00    0  0.00   1  4  2  0 93

--
Eitan Adler

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 08:03:31 2013
Return-Path:
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9F352558
 for ; Sun, 24 Nov 2013 08:03:31 +0000 (UTC)
Received: from mail.tyknet.dk (mail.tyknet.dk [176.9.9.186])
 (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5C8A824A0
 for ; Sun, 24 Nov 2013 08:03:31 +0000 (UTC)
Received: from [10.255.193.199] (d153234.upc-d.chello.nl [213.46.153.234])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.tyknet.dk (Postfix) with ESMTPSA id E91321CA716;
 Sun, 24 Nov 2013 09:03:28 +0100 (CET)
DKIM-Filter: OpenDKIM Filter v2.8.3 mail.tyknet.dk E91321CA716
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default;
 t=1385280209; bh=mC+WZTp6MCUrIoheYZttSk2fmtgTJqMrcBJKL0D5CY8=;
 h=Date:From:To:CC:Subject:References:In-Reply-To;
 b=Yy/NstuiO7Lfe/uS0oenKf6jNWs2MmR+oLfneTRFy6ibvhvbuzKLQCR5hz1Pb9HJF
 I51H3cbYZ+jFxolErsy4eypXktvwB7cPBYe5U+/TZEEqJL7UNSc4qxQk0PfVWd13Lw
 HhOnM+bfFwroi3j6MhaynUXXs735GkT9SLG0Wd0Q=
Message-ID: <5291B2CC.2040907@gibfest.dk>
Date: Sun, 24 Nov 2013 09:03:24 +0100
From: Thomas Steen Rasmussen
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101
 Thunderbird/24.1.1
MIME-Version: 1.0
To: Eitan Adler
Subject: Re: ZFS (or something) is absurdly slow
References: <5290E0CF.20704@gibfest.dk>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-fs@freebsd.org"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 24 Nov 2013 08:03:31 -0000

On 24-11-2013 04:15, Eitan Adler wrote:
>
>> vfsstat.d https://forums.freebsd.org/showpost.php?p=182070&postcount=6
> I can run this script, what output should I be looking for?

Check the sample output on the page: It shows two lists, "Number
of operations" and "Bytes read or write". The lists are ordered
with the busiest at the bottom and separated by filesystem
location. While things are slow, try running it to see if some
location on the filesystem is being hammered.

One case I had was where /home/pgsql/data was the culprit, due to
the default sync setting in postgres, along with a malfunctioning
webapp that kept making a lot of queries. This activity wasn't
shown with "top -m io -o total" for some reason.
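For what it's worth, this is not the vfsstat.d script itself (its contents
are on the forum post linked above), just a rough one-liner sketch of the
same idea, assuming the kernel has DTrace support built in; the 10 second
window is arbitrary:

# Count read/write/fsync syscalls per process while things are slow; a
# single sync-heavy process (like the postgres case described above)
# should float to the bottom of the aggregation.
dtrace -n '
    syscall::read:entry, syscall::write:entry, syscall::fsync:entry
    { @[execname, probefunc] = count(); }
    tick-10s { exit(0); }'

# Cross-check against what the disks themselves are doing over the same
# window.
systat -iostat 1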
Another thing you should probably do is run a SMART check on the disk to see if something is wrong with it. I had another case with a zfs mirror that performed appalingly, turned out it was because one of the disks was dodgy, not in a way that made zfs show checksum errors, but enough to make it really really slow. Since a ZFS vdev only performs as good as the slowest disk in a vdev, which in turn will slow the whole pool down, replacing the disk made everything much better. What does diskinfo -ct /dev/whatever say about the seek times on the bad disk ? Are the results the same if you boot off of an usb stick and test the disk when it is completely idle and independent of the running OS ? /Thomas From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 14:59:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DAEBC329 for ; Sun, 24 Nov 2013 14:59:13 +0000 (UTC) Received: from mail-pb0-f47.google.com (mail-pb0-f47.google.com [209.85.160.47]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B09C62484 for ; Sun, 24 Nov 2013 14:59:13 +0000 (UTC) Received: by mail-pb0-f47.google.com with SMTP id um1so4009654pbc.20 for ; Sun, 24 Nov 2013 06:59:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=GW7W4j2Ds5nOejOLw0lHI2T3VLXI9GH1MvfxSXiZjdo=; b=gBqdvOHtk1nBhpGXFCKVn3h+fo5GO3TdM3H7g582jRE07zyd1gNH+hWqknorFUNRsj FvZnEHfe7bs5NDDoATXAVqAy8P98NVWudqA2ImtsZN1VXBpKxpDjBjBUW1GRxtOAVGPr XWzAuXVLN4ceAktQgJasmJDzlElhZrvL+RvjrChnpiasYvIY/VZN8h/iC6D91mHON7xL BjiPrwYul8P5hz/zKBnbGqQ7a9WI1UDL4n7pxupN8+aCkC92xgtL1uQ4KitZfdKCSDKW BWx4WTgWcJSKBGFt3mUQCAW/DGrHjIQmKXCFVbYuoNYx+c3BKD4H3MsLeE80yFcLh6IX l8ig== X-Gm-Message-State: ALoCoQm54Ne7XxFcOU634FLA9qW2nBUCefw4a7F2PbkmWvdJTP538L4AXesecCGjpMIhlJllFqwl MIME-Version: 1.0 X-Received: by 10.66.219.233 with SMTP id pr9mr22606953pac.45.1385305146093; Sun, 24 Nov 2013 06:59:06 -0800 (PST) Received: by 10.70.102.133 with HTTP; Sun, 24 Nov 2013 06:59:05 -0800 (PST) In-Reply-To: References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Date: Sun, 24 Nov 2013 07:59:05 -0700 Message-ID: Subject: Re: Performance difference between UFS and ZFS with NFS From: Eric Browning To: aurfalien Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2013 14:59:14 -0000 On a side note I forgot that I had used dd to test the disk performance a while ago when I was using ZFS. ZFS performance: 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) 34.17s real 0.61s user 31.89s sys UFS performance: 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) 11.85s real 0.58s user 11.25s sys Again, even with dd performance is about 3x faster with UFS with the same disks. 
On Thu, Nov 21, 2013 at 4:04 PM, Eric Browning < ericbrowning@skaggscatholiccenter.org> wrote: > Just as a bit of a followup I had 163 kids all logged in at once today and > nfsd usage was only 1-5% > > @Aurf > How are your results with your AE and C4D clients going? > > > On Tue, Nov 19, 2013 at 12:38 PM, aurfalien wrote: > >> Wow, those are great mount options, I use em too :) >> >> Well, this is very interesting on the +3x access/getattrs with ZFS. >> >> I'll report back my findings as I'm going down a similar road, albeit not >> home dirs but rendering using AE and C4D on many clients. >> >> Until then hoping some one chime in on this with some added nuggets. >> >> - aurf >> >> On Nov 19, 2013, at 11:11 AM, Eric Browning wrote: >> >> Locking is set to locallocks, cache folders and similar folders are >> redirected to the local hard drive. All applications run just fine >> including Adobe CS6 and MS 2011 apps. >> >> This is my client NFS conf: >> nfs.client.mount.options = >> noatime,nobrowse,tcp,vers=3,rsize=32768,wsize=32768,readahead=0,acregmax=3600,acdirmax=3600,locallocks,inet,noquota,nfc >> nfs.client.statfs_rate_limit = 5 >> nfs.client.access_for_getattr = 1 >> nfs.client.is_mobile = 0 >> >> I'm sure this is more complex than it needs to be and I can probably get >> rid of most of this now, forcing nfc did cure some unicode issues between >> mac and freebsd. Packets are not being fragmented and there are only one or >> two errors here and there despite traversing vlans through the core router, >> MSS is set at 1460. >> >> One thing Rick M suggested is actually trying these entire setup on a UFS >> system. I tested by copying my home folder to another server with a UFS >> system and ran it for like 45 minutes and compared it to another 45 minute >> jaunt on the main file server and I had about 3x less Access and Getattrs >> on UFS than I had on ZFS. Seeing this prompted me to move one server over >> to a UFS raid and since doing that it's like day and night >> performance-wise. >> >> Server's NFS is set to 256 threads ARC is currently only at 46G of 56G >> total and NFS is 9.9G on the ZFS server and CPU usage is 878%. On the UFS >> server NFS is the same 256 threads and 9.9G but as I look at it with >> currently 52 users logged in NFS is at CPU 0.00% usage. >> >> This is the server NFS configs from rc.conf >> ## NFS Server >> rpcbind_enable="YES" >> nfs_server_enable="YES" >> mountd_flags="-r -l" >> nfsd_enable="YES" >> mountd_enable="YES" >> rpc_lockd_enable="NO" >> rpc_statd_enable="NO" >> nfs_server_flags="-t -n 256" >> nfsv4_server_enable="NO" >> nfsuserd_enable="YES" >> >> UFS Server mem stats: >> Mem: 49M Active, 56G Inact, 3246M Wired, 1434M Cache, 1654M Buf, 1002M >> Free >> ARC: 1884K Total, 149K MFU, 1563K MRU, 16K Anon, 56K Header, 99K Other >> Swap: 4096M Total, 528K Used, 4095M Free >> >> ZFS mem stats: >> Mem: 3180K Active, 114M Inact, 60G Wired, 1655M Buf, 2412M Free >> ARC: 46G Total, 26G MFU, 13G MRU, 3099K Anon, 4394M Header, 4067M Other >> Swap: 4096M Total, 4096M Free >> >> >> >> On Tue, Nov 19, 2013 at 11:25 AM, aurfalien wrote: >> >>> Curious. >>> >>> Do you have NFS locking enabled client side? >>> >>> Most likely you do as Mac Mail will not run w/o locks, nor will Adobe >>> prefs like temp cache. etc... >>> >>> So being this is prolly the case, could it be a mem pressure issue and >>> not enough RAM? >>> >>> So NFS locks take up RAM as does ARC. What are your mem stats and swap >>> stats during the 700% (yikes) experience? 
>>> >>> - aurf >>> >>> On Nov 19, 2013, at 10:19 AM, Eric Browning wrote: >>> >>> Aurf, >>> >>> I ran those two commands and it doesn't seem to have made a difference. >>> Usage is still above 700% and it still takes 30s to list a directory. The >>> time to list is proportional to the number of users logged in. On UFS with >>> all students logged in and hammering away at their files there is no >>> noticeable speed decrease. >>> >>> >>> On Tue, Nov 19, 2013 at 11:12 AM, aurfalien wrote: >>> >>>> >>>> On Nov 19, 2013, at 5:12 AM, Rick Macklem wrote: >>>> >>>> > Eric Browning wrote: >>>> >> Some background: >>>> >> -Two identical servers, dual AMD Athlon 6220's 16 cores total @ 3Ghz, >>>> >> -64GB ram each server >>>> >> -Four Intel DC S3700 800GB SSDs for primary storage, each server. >>>> >> -FreeBSD 9 stable as of 902503 >>>> >> -ZFS v28 and later updated to feature flags (v29?) >>>> >> -LSI 9200-8i controller >>>> >> -Intel I350T4 nic (only one port being used currently) using all four >>>> >> in >>>> >> LACP overtaxed the server's NFS queue from what we found out making >>>> >> the >>>> >> server basically unusable. >>>> >> >>>> >> There is definitely something going on between NFS and ZFS when used >>>> >> as a >>>> >> file server (random workload) for mac home directories. They do not >>>> >> jive >>>> >> well at all and pretty much drag down these beefy servers and cause >>>> >> 20-30 >>>> >> second delays when just attempting to list a directory on Mac 10.7, >>>> >> 10.8 >>>> >> clients although throughput seems fast when copying files. >>>> >> >>>> >> This server's NFS was sitting north of 700% (7+ cores) all day long >>>> >> when >>>> >> using ZFSv28 raidz1. I have also tried stripe, compression on/off, >>>> >> sync >>>> >> enabled/disabled, and no dedup with 56GB of ram dedicated to ARC. >>>> >> I've >>>> >> tried just 100% stock settings in loader.conf and and some >>>> >> recommended >>>> >> tuning from various sources on the freebsd lists and other sites >>>> >> including >>>> >> the freebsd handbook. >>>> >> >>>> >> This is my mountpoint creation: >>>> >> zfs create -o mountpoint=/users -o sharenfs=on -o >>>> >> casesensitivity=insensitive -o aclmode=passthrough -o compression=lz4 >>>> >> -o >>>> >> atime=off -o aclinherit=passthrough tank/users >>>> >> >>>> >> This last weekend I switched one of these servers over to a UFS raid >>>> >> 0 >>>> >> setup and NFS now only eats about 36% of one core during the initial >>>> >> login >>>> >> phase of 150-ish users over about 10 minutes and sits under 1-3% >>>> >> during >>>> >> normal usage and directories all list instantly even when drilling >>>> >> down 10 >>>> >> or so directories on the client's home files. The same NFS config on >>>> >> server >>>> >> and clients are still active. >>>> >> >>>> >> Right now I'm going to have to abandon ZFS until it works with NFS. >>>> >> I >>>> >> don't want to get into a finger pointing game, I'd just like to help >>>> >> get >>>> >> this fixed, I have one old i386 server I can try things out on if >>>> >> that >>>> >> helps and it's already on 9 stable and ZFS v28. >>>> >> >>>> > Btw, in previous discussions with Eric on this, he provided nfsstat >>>> > output that seemed to indicate most of his RPC load from the Macs >>>> > were Access and Getattr RPCs. >>>> > >>>> > I suspect the way ZFS handles VOP_ACCESSX() and VOP_GETATTR() is a >>>> > significant part of this issue. 
I know nothing about ZFS, but I >>>> believe >>>> > it does always have ACLs enabled and presumably needs to check the >>>> > ACL for each VOP_ACCESSX(). >>>> > >>>> > Hopefully someone familiar with how ZFS handles VOP_ACCESSX() and >>>> > VOP_GETATTR() can look at these? >>>> >>>> Indeed. However couldn't one simply disable ACL mode via; >>>> >>>> zfs set aclinherit=discard pool/dataset >>>> zfs set aclmode=discard pool/dataset >>>> >>>> Eric, mind setting these and see? >>>> >>>> Mid/late this week I'll be doing a rather large render farm test >>>> amongst our Mac fleet against ZFS. >>>> >>>> Will reply to this thread with outcome when I'm done. Should be >>>> interesting. >>>> >>>> - aurf >>>> >>>> > >>>> > rick >>>> > >>>> >> Thanks, >>>> >> -- >>>> >> Eric Browning >>>> >> Systems Administrator >>>> >> 801-984-7623 >>>> >> >>>> >> Skaggs Catholic Center >>>> >> Juan Diego Catholic High School >>>> >> Saint John the Baptist Middle >>>> >> Saint John the Baptist Elementary >>>> >> _______________________________________________ >>>> >> freebsd-fs@freebsd.org mailing list >>>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>>> " >>>> >> >>>> > _______________________________________________ >>>> > freebsd-fs@freebsd.org mailing list >>>> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>> >>>> >>> >>> >>> -- >>> Eric Browning >>> Systems Administrator >>> 801-984-7623 >>> >>> Skaggs Catholic Center >>> Juan Diego Catholic High School >>> Saint John the Baptist Middle >>> Saint John the Baptist Elementary >>> >>> >>> >> >> >> -- >> Eric Browning >> Systems Administrator >> 801-984-7623 >> >> Skaggs Catholic Center >> Juan Diego Catholic High School >> Saint John the Baptist Middle >> Saint John the Baptist Elementary >> >> >> > > > -- > Eric Browning > Systems Administrator > 801-984-7623 > > Skaggs Catholic Center > Juan Diego Catholic High School > Saint John the Baptist Middle > Saint John the Baptist Elementary > -- Eric Browning Systems Administrator 801-984-7623 Skaggs Catholic Center Juan Diego Catholic High School Saint John the Baptist Middle Saint John the Baptist Elementary From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 15:16:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DE79A6EF for ; Sun, 24 Nov 2013 15:16:06 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7A6C32543 for ; Sun, 24 Nov 2013 15:16:06 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006813420.msg for ; Sun, 24 Nov 2013 15:15:29 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 24 Nov 2013 15:15:29 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=10402a06b3=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Eric Browning" , "aurfalien" References: 
 <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca>
 <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com>
 <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com>
 <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com>
Subject: Re: Performance difference between UFS and ZFS with NFS
Date: Sun, 24 Nov 2013 15:15:19 -0000
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
 reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Cc: FreeBSD FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 24 Nov 2013 15:16:06 -0000

----- Original Message ----- From: "Eric Browning"

> On a side note I forgot that I had used dd to test the disk performance a
> while ago when I was using ZFS.
>
> ZFS performance:
> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec)
> 34.17s real 0.61s user 31.89s sys
>
> UFS performance:
> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec)
> 11.85s real 0.58s user 11.25s sys
>
> Again, even with dd performance is about 3x faster with UFS with the same
> disks.

Interesting, what was your command exactly?

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the
person or entity to whom it is addressed. In the event of misdirection, the
recipient is prohibited from using, copying, printing or otherwise
disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please
telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.
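The exact dd invocation never made it into the thread, so the following is
only a sketch of how the comparison could be made apples-to-apples; the
dataset name and mountpoint come from the zfs create line quoted elsewhere
in the thread, and the test file name is made up. Two things commonly skew
a naive dd test on ZFS: lz4 compression collapsing an all-zero input stream,
and re-reads being served from the ARC instead of the disks. Running the
same two dd commands against a file on the UFS box then gives the
comparison figure.

# Take compression out of the picture for the duration of the test.
zfs set compression=off tank/users

# Sequential write of roughly 3 GB; with compression disabled the
# zero-filled blocks really are written to the disks.
dd if=/dev/zero of=/users/ddtest bs=1m count=3000

# For the read side, make sure the file is no longer cached in the ARC
# (export/import the pool or reboot first), then:
dd if=/users/ddtest of=/dev/null bs=1m

# Clean up and put back the lz4 setting from the original zfs create line.
rm /users/ddtest
zfs set compression=lz4 tank/users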
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 16:10:02 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9D71366E for ; Sun, 24 Nov 2013 16:10:02 +0000 (UTC) Received: from dss.incore.de (dss.incore.de [195.145.1.138]) by mx1.freebsd.org (Postfix) with ESMTP id 61F83278E for ; Sun, 24 Nov 2013 16:10:01 +0000 (UTC) Received: from inetmail.dmz (unknown [10.3.0.4]) by dss.incore.de (Postfix) with ESMTP id 93BD35C07B; Sun, 24 Nov 2013 17:00:15 +0100 (CET) X-Virus-Scanned: amavisd-new at incore.de Received: from dss.incore.de ([10.3.0.3]) by inetmail.dmz (inetmail.dmz [10.3.0.4]) (amavisd-new, port 10024) with LMTP id WHsbMvPX2DHy; Sun, 24 Nov 2013 17:00:13 +0100 (CET) Received: from mail.incore (fwintern.dmz [10.0.0.253]) by dss.incore.de (Postfix) with ESMTP id 671035C082; Sun, 24 Nov 2013 17:00:12 +0100 (CET) Received: from bsdmhs.longwitz (unknown [192.168.99.6]) by mail.incore (Postfix) with ESMTP id 0C57F50C0A; Sun, 24 Nov 2013 17:00:11 +0100 (CET) Message-ID: <5292228B.8070008@incore.de> Date: Sun, 24 Nov 2013 17:00:11 +0100 From: Andreas Longwitz User-Agent: Thunderbird 2.0.0.19 (X11/20090113) MIME-Version: 1.0 To: Albert Shih Subject: Re: mountd can't delete exports for References: <20131105123028.GA21794@pcjas.obspm.fr> In-Reply-To: <20131105123028.GA21794@pcjas.obspm.fr> Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2013 16:10:02 -0000 Albert Shih schrieb: > Hi, > > I've some very strange behavior since some day on my FreeBSD zfs nfs server. > > This server run FreeBSD 9.0 and everything (= nfs and backup server) works fine. > > But since some days I've got lot of message like > > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-22: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-10: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-09: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-17: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-15: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-06: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Monthly-2013-10-01: Invalid argument > Nov 5 13:12:49 filer1 mountd[76392]: can't delete exports for /filer1/sauvegardes/filer2/.zfs/snapshot/Daily-2013-10-20: Invalid argument > > I've do some google and don't find any solution. > > On this server I've got many zfs partition, only 4 are export through nfs, > and none of this 4 have any snapshots. > > The rest of zfs partition who have snapshots aren't exported. > > Regards. 
> Probably you find an explanation at http://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018008.html -- Andreas Longwitz From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 16:29:37 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C4823CC7 for ; Sun, 24 Nov 2013 16:29:37 +0000 (UTC) Received: from mail-qe0-x22c.google.com (mail-qe0-x22c.google.com [IPv6:2607:f8b0:400d:c02::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6065E2854 for ; Sun, 24 Nov 2013 16:29:37 +0000 (UTC) Received: by mail-qe0-f44.google.com with SMTP id nd7so2285759qeb.31 for ; Sun, 24 Nov 2013 08:29:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=S9AN62/oKePlL3ze3AUb6dBtK/BJYqbm+aZ7jHJK39Y=; b=oBVd+78W9p09iCvyEcszqzh8ZAFFqo64y0/MB3VDqt5mkF9FRxXW9DL0xdj4KsmwS3 ljDqFXH4aEs3oaECjHaUjTYLD5AbWmmsYV7upL0A9sGbj0WbAQDazFD5RWlmlv8bzv2F qmNjXPkZm3u95G2AXPdWwDVtaQLe4Uq+G0Mgs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=S9AN62/oKePlL3ze3AUb6dBtK/BJYqbm+aZ7jHJK39Y=; b=BAnFC3vktgwmJzaZhKG+3P8idkJt3/iGKsv/kFOR1+XF+dClzoxXH+8peInAbQGXzr haKf/fqnIp4lWldeaFmzTBqrI4iP/z4vfSDDaPxdtsHp7OqpmHTDcVGOk3/AXb2Za86s hUmXFW6r5WoJrUtLLAKrjUz6wajaEcs3ZDGDiEHcn1JTuBkl/D5nL2FFb1/4tX2hg2hj QPBd3efubCxelb4pp1fA1RebtN6x9KszaaxJjF+f4Hcl4KVaOAUSP1ySfu/HlYsu0OB/ PsjH6/9ZGf1G15Vudd1sQ1T5UYfmhbh1hyKilOdEM2fSpI6K92P3Q69v9EmZdPLebhin wOYQ== X-Gm-Message-State: ALoCoQkpkwYH/W0hPSeb+fQk44I1VJ7ef3exIDUfnqKd0e2e0JC1BFimLaE/L9X8RAoHjwLS88os X-Received: by 10.224.151.209 with SMTP id d17mr39109868qaw.45.1385310576490; Sun, 24 Nov 2013 08:29:36 -0800 (PST) MIME-Version: 1.0 Received: by 10.96.63.101 with HTTP; Sun, 24 Nov 2013 08:29:06 -0800 (PST) In-Reply-To: <5291B2CC.2040907@gibfest.dk> References: <5290E0CF.20704@gibfest.dk> <5291B2CC.2040907@gibfest.dk> From: Eitan Adler Date: Sun, 24 Nov 2013 11:29:06 -0500 Message-ID: Subject: Re: ZFS (or something) is absurdly slow To: Thomas Steen Rasmussen Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2013 16:29:37 -0000 On Sun, Nov 24, 2013 at 3:03 AM, Thomas Steen Rasmussen wrote: > On 24-11-2013 04:15, Eitan Adler wrote: >> >> >>> vfsstat.d https://forums.freebsd.org/showpost.php?p=182070&postcount=6 >> >> I can run this script, what output should I be looking for? > > > Check the sample output on the page: It shows two lists, "Number > of operations" and "Bytes read or write". The lists are ordered > with the busiest at the bottom and seperated by filesystem > location. While things are slow, try running it to see if some > location on the filesystem is being hammered..... I will do so. > Another thing you should probably do is run a SMART check on the > disk to see if something is wrong with it. See the complete output below. 
The only thing which stands out to me is: 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 19275 > I had another case with > a zfs mirror that performed appalingly, turned out it was because > one of the disks was dodgy, not in a way that made zfs show > checksum errors, but enough to make it really really slow. Since a > ZFS vdev only performs as good as the slowest disk in a vdev, > which in turn will slow the whole pool down, replacing the disk > made everything much better. =============== smartctl 6.2 2013-07-26 r3841 [FreeBSD 11.0-CURRENT amd64] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Momentus SpinPoint M8 (AF) Device Model: ST1000LM024 HN-M101MBB Serial Number: S2U5J9FCB79134 LU WWN Device Id: 5 0004cf 20904e7cf Firmware Version: 2AR10001 User Capacity: 1,000,204,886,016 bytes [1.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS T13/1699-D revision 6 SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s) Local Time is: Sun Nov 24 11:23:21 2013 EST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (12780) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 213) minutes. SCT capabilities: (0x003f) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 25 2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0 3 Spin_Up_Time 0x0023 089 089 025 Pre-fail Always - 3453 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 158 5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0 8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 7285 10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 280 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 180 191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 155 192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0 194 Temperature_Celsius 0x0002 055 047 000 Old_age Always - 45 (Min/Max 18/63) 195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0 196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 19275 223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 280 225 Load_Cycle_Count 0x0032 091 091 000 Old_age Always - 95664 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 0 Note: revision number not 1 implies that no selective self-test has ever been run SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Completed [00% left] (0-65535) 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. ===================== > What does diskinfo -ct /dev/whatever say about the seek times on the > bad disk ? [10005 root@gravity (100%) /home/eitan !2!]#diskinfo -ct /dev/ada1 /dev/ada1 512 # sectorsize 1000204886016 # mediasize in bytes (932G) 1953525168 # mediasize in sectors 4096 # stripesize 0 # stripeoffset 1938021 # Cylinders according to firmware. 16 # Heads according to firmware. 63 # Sectors according to firmware. S2U5J9FCB79134 # Disk ident. 
I/O command overhead: time to read 10MB block 0.099713 sec = 0.005 msec/sector time to read 20480 sectors 1.447996 sec = 0.071 msec/sector calculated command overhead = 0.066 msec/sector Seek times: Full stroke: 250 iter in 8.036950 sec = 32.148 msec Half stroke: 250 iter in 5.463750 sec = 21.855 msec Quarter stroke: 500 iter in 10.542506 sec = 21.085 msec Short forward: 400 iter in 5.707363 sec = 14.268 msec Short backward: 400 iter in 4.645333 sec = 11.613 msec Seq outer: 2048 iter in 0.096977 sec = 0.047 msec Seq inner: 2048 iter in 1.853596 sec = 0.905 msec Transfer rates: outside: 102400 kbytes in 0.949048 sec = 107898 kbytes/sec middle: 102400 kbytes in 1.659245 sec = 61715 kbytes/sec inside: 102400 kbytes in 2.020322 sec = 50685 kbytes/sec > Are the results the same if you boot off of an usb stick and > test the disk when it is completely idle and independent of the running OS ? Good question. I can not check this at the moment. -- Eitan Adler From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 19:31:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B15DD1D5 for ; Sun, 24 Nov 2013 19:31:42 +0000 (UTC) Received: from mail-qa0-x235.google.com (mail-qa0-x235.google.com [IPv6:2607:f8b0:400d:c00::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 70F72217A for ; Sun, 24 Nov 2013 19:31:42 +0000 (UTC) Received: by mail-qa0-f53.google.com with SMTP id j5so5889887qaq.5 for ; Sun, 24 Nov 2013 11:31:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=lMqu7iPphZGbCiAcOKyWCgol4Zfq1QV1gPiJFPm9MTo=; b=l16cuuskRGV3ZN7PWWN57YVprLbB0u7kVdk3TGVaK0PVEj6VqSi1RtgNTp4kTJn/9o k+olr6ldr4S8bXTdUI+MqoQu9pFO9v0BGqOa5+1GWJIQZBLW2Y5ggmmcTpIU5C+UE/d9 mzGMK90T4nQaF6KjzDsD4htw1qSEfzUzkbXvASYX+Td/rudmEjF8QOw0mREqNQcDFJNE 7LZAS3ktbaICapnPcnTaYU/oWESjooYxFbzGvL+3JP8irLkZ9k5qAdmn9kOenHAbtBw+ dDsBlBPtnbYevaEiY5stsr1sy+w5cmwRIk5+0eb0h1ecJp1VpahA+AGQpECU2L+lC/u7 lqyQ== MIME-Version: 1.0 X-Received: by 10.49.28.226 with SMTP id e2mr5929083qeh.80.1385321501607; Sun, 24 Nov 2013 11:31:41 -0800 (PST) Received: by 10.224.36.137 with HTTP; Sun, 24 Nov 2013 11:31:41 -0800 (PST) In-Reply-To: References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Date: Sun, 24 Nov 2013 19:31:41 +0000 Message-ID: Subject: Re: Performance difference between UFS and ZFS with NFS From: krad To: Steven Hartland Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2013 19:31:42 -0000 I was thinking the same, if it was using /dev/zero as an input any compression would skew the results a little. On 24 November 2013 15:15, Steven Hartland wrote: > ----- Original Message ----- From: "Eric Browning" > > > > On a side note I forgot that I had used dd to test the disk performance a >> while ago when I was using ZFS. 
>> >> ZFS performance: >> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) >> 34.17s real 0.61s user 31.89s sys >> >> UFS performance: >> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) >> 11.85s real 0.58s user 11.25s sys >> >> Again, even with dd performance is about 3x faster with UFS with the same >> disks. >> > > Interesting, what was you command exactly? > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of misdirection, > the recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Sun Nov 24 20:29:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DFCF7893 for ; Sun, 24 Nov 2013 20:29:34 +0000 (UTC) Received: from mout.gmx.net (mout.gmx.net [74.208.4.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B01B723FD for ; Sun, 24 Nov 2013 20:29:34 +0000 (UTC) Received: from [192.168.43.111] ([80.187.101.48]) by mail.gmx.com (mrgmxus001) with ESMTPSA (Nemesis) id 0MCtef-1VtAkW0w0x-009j3u for ; Sun, 24 Nov 2013 21:29:28 +0100 Message-ID: <5292619A.8020707@gmx.com> Date: Sun, 24 Nov 2013 21:29:14 +0100 From: Nikos Vassiliadis User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131103 Icedove/17.0.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: trying to grow a zvol panics the kernel on recent head Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Provags-ID: V03:K0:fqNKw8j/hwM+aI0DGs8e3PQvswoeVbRlpHiPKHqKNavP4Vrdr1V uKYmKzpUidLZ556OxY0CARc0m5WGvY2+em9kEJTBx3EUJ22tcIVnQCkLVcTIZGrRY4GEkUa 4lPFRG+sffFA4PpHYGaf1Bm1iQLfQBpLYXxqcCGfIB2mMyc98NNpRqhuPwfZNrlBOKn060v 1xEYxCvOY69Hb/psgoyHQ== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2013 20:29:34 -0000 Hi, While trying to grow a volume I got a kernel panic. 
Steps to reproduce: mdconfig -at swap -s 300M > md0 zpool create test10 /dev/md0 zfs create -V 250M test10/testvol zfs get volsize test10/testvol > NAME PROPERTY VALUE SOURCE > test10/testvol volsize 250M local zfs set volsize=280M test10/testvol > Unread portion of the kernel message buffer: > panic: solaris assert: !rrw_held(&dp->dp_config_rwlock, RW_READER), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c, line: 1055 > cpuid = 1 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00956f8500 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00956f85b0 > vpanic() at vpanic+0x126/frame 0xfffffe00956f85f0 > panic() at panic+0x43/frame 0xfffffe00956f8650 > assfail() at assfail+0x22/frame 0xfffffe00956f8660 > dsl_pool_hold() at dsl_pool_hold+0x69/frame 0xfffffe00956f86a0 > dmu_objset_hold() at dmu_objset_hold+0x21/frame 0xfffffe00956f86e0 > dsl_prop_get_integer() at dsl_prop_get_integer+0x28/frame 0xfffffe00956f8720 > zvol_set_volsize() at zvol_set_volsize+0xca/frame 0xfffffe00956f87b0 > zfs_prop_set_special() at zfs_prop_set_special+0x3c4/frame 0xfffffe00956f8840 > zfs_set_prop_nvlist() at zfs_set_prop_nvlist+0x213/frame 0xfffffe00956f88c0 > zfs_ioc_set_prop() at zfs_ioc_set_prop+0x100/frame 0xfffffe00956f8920 > zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00956f89c0 > devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00956f8a20 > kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00956f8a90 > sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00956f8ae0 > amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00956f8bf0 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00956f8bf0 > --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019d326a, rsp = 0x7fffffffa898, rbp = 0x7fffffffa920 --- > KDB: enter: panic On recent head: r258425 Just reporting, Nikos From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 11:06:48 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 879D5556 for ; Mon, 25 Nov 2013 11:06:48 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6989B2F43 for ; Mon, 25 Nov 2013 11:06:48 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rAPB6mIl089842 for ; Mon, 25 Nov 2013 11:06:48 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id rAPB6m72089840 for freebsd-fs@FreeBSD.org; Mon, 25 Nov 2013 11:06:48 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 25 Nov 2013 11:06:48 GMT Message-Id: <201311251106.rAPB6m72089840@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 11:06:48 -0000 Note: to view an individual PR, use: 
http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/182570 fs [zfs] [patch] ZFS panic in receive o kern/182536 fs [zfs] zfs deadlock o kern/181966 fs [zfs] Kernel panic in ZFS I/O: solaris assert: BP_EQUA o kern/181834 fs [nfs] amd mounting NFS directories can drive a dead-lo o kern/181565 fs [swap] Problem with vnode-backed swap space. o kern/181377 fs [zfs] zfs recv causes an inconsistant pool o kern/181281 fs [msdosfs] stack trace after successfull 'umount /mnt' o kern/181082 fs [fuse] [ntfs] Write to mounted NTFS filesystem using F o kern/180979 fs [netsmb][patch]: Fix large files handling o kern/180876 fs [zfs] [hast] ZFS with trim,bio_flush or bio_delete loc o kern/180678 fs [NFS] succesfully exported filesystems being reported o kern/180438 fs [smbfs] [patch] mount_smbfs fails on arm because of wr p kern/180236 fs [zfs] [nullfs] Leakage free space using ZFS with nullf o kern/178854 fs [ufs] FreeBSD kernel crash in UFS s kern/178467 fs [zfs] [request] Optimized Checksum Code for ZFS o kern/178412 fs [smbfs] Coredump when smbfs mounted o kern/178388 fs [zfs] [patch] allow up to 8MB recordsize o kern/178387 fs [zfs] [patch] sparse files performance improvements o kern/178349 fs [zfs] zfs scrub on deduped data could be much less see o kern/178329 fs [zfs] extended attributes leak o kern/178238 fs [nullfs] nullfs don't release i-nodes on unlink. f kern/178231 fs [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr o kern/177985 fs [zfs] disk usage problem when copying from one zfs dat o kern/177971 fs [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3, o kern/177966 fs [zfs] resilver completes but subsequent scrub reports o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk o kern/177445 fs [hast] HAST panic o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175449 fs [unionfs] unionfs and devfs misbehaviour o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172942 fs [smbfs] Unmounting a smb mount when the server became o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs 
[unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o 
kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server o kern/145750 fs [unionfs] [hang] unionfs locks the machine s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141950 fs [unionfs] [lor] ufs/unionfs/ufs Lock order reversal o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/137588 fs [unionfs] [lor] LOR nfs/ufs/nfs o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126973 fs [unionfs] [hang] System hang with unionfs and init chr o kern/126553 fs [unionfs] unionfs move directory problem 2 (files appe o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o bin/123574 fs [unionfs] df(1) -t option destroys info for unionfs (a o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o 
bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o kern/121385 fs [unionfs] unionfs cross mount -> kernel panic o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. 
o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/67326 fs [msdosfs] crash after attempt to mount write protected o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t o kern/9619 fs [nfs] Restarting mountd kills existing mounts 336 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 18:15:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E05EA14B for ; Mon, 25 Nov 2013 18:15:05 +0000 (UTC) Received: from mail-pb0-f51.google.com (mail-pb0-f51.google.com [209.85.160.51]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B5D7E2D4A for ; Mon, 25 Nov 2013 18:15:05 +0000 (UTC) Received: by mail-pb0-f51.google.com with SMTP id up15so6238824pbc.38 for ; Mon, 25 Nov 2013 10:14:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:cc:content-type; bh=fhk8DiJ/XaV5MIDRP8BXxfW3ccbS3pTfWaqW+beGOHY=; b=DUomUhcJIeW+k2YTmAyW0EJxXUxp2YNC0cB5fTzATmw9g7BszgujK6GTmumuM9RQbs GFeoY4EktAi+gXbjWIoWo5fI0efpQXFp+MwjmxX3s35+aEwIcOLi6ScM0NucUpvIY2Gb Mb23HhOkMqHVfFBx2glAeXMrLdhtXTNtQTQqcnuqkbBLrRTyVjoOeNRqr6EIQ+EpkIaI YBKjyV8Q/kKtnuBuDmxyojYIg3SIivDyV+z+andDs0bcnPShA0udgg94E1RZs37dgzPM y2bHQWudHaJI4tPlfA9PA0ccjlPgw5nDa746dF6tvFArlngk2ZBnDW238NxTGH1kPOhk UW4A== X-Gm-Message-State: ALoCoQlSw59o/CDPolv/Yq4pvBQyrI3ti2GTB2slEm/JNwuuV6kdJCagEva/747ad9ivC+UUEGhZ MIME-Version: 1.0 X-Received: by 10.66.26.17 with SMTP id h17mt2738863pag.181.1385403299182; Mon, 25 Nov 2013 10:14:59 -0800 (PST) Received: by 10.70.102.133 with HTTP; Mon, 25 Nov 2013 10:14:59 -0800 (PST) In-Reply-To: References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Date: Mon, 25 Nov 2013 11:14:59 -0700 Message-ID: Subject: Re: Performance difference between UFS and ZFS with NFS From: Eric Browning Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: 
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 18:15:06 -0000 I am using /dev/zero /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=3000000 On Sun, Nov 24, 2013 at 12:31 PM, krad wrote: > I was thinking the same, if it was using /dev/zero as an input any > compression would skew the results a little. > > > On 24 November 2013 15:15, Steven Hartland wrote: > >> ----- Original Message ----- From: "Eric Browning" >> >> >> >> On a side note I forgot that I had used dd to test the disk performance a >>> while ago when I was using ZFS. >>> >>> ZFS performance: >>> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) >>> 34.17s real 0.61s user 31.89s sys >>> >>> UFS performance: >>> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) >>> 11.85s real 0.58s user 11.25s sys >>> >>> Again, even with dd performance is about 3x faster with UFS with the same >>> disks. >>> >> >> Interesting, what was you command exactly? >> >> Regards >> Steve >> >> ================================================ >> This e.mail is private and confidential between Multiplay (UK) Ltd. and >> the person or entity to whom it is addressed. In the event of misdirection, >> the recipient is prohibited from using, copying, printing or otherwise >> disseminating it or any information contained in it. >> In the event of misdirection, illegible or incomplete transmission please >> telephone +44 845 868 1337 >> or return the E.mail to postmaster@multiplay.co.uk. >> >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > > -- Eric Browning Systems Administrator 801-984-7623 Skaggs Catholic Center Juan Diego Catholic High School Saint John the Baptist Middle Saint John the Baptist Elementary From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 18:18:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E63EC38D for ; Mon, 25 Nov 2013 18:18:31 +0000 (UTC) Received: from mail-oa0-x22f.google.com (mail-oa0-x22f.google.com [IPv6:2607:f8b0:4003:c02::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id ACAE02D8B for ; Mon, 25 Nov 2013 18:18:31 +0000 (UTC) Received: by mail-oa0-f47.google.com with SMTP id k1so4791828oag.34 for ; Mon, 25 Nov 2013 10:18:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=6iI09ssepuyL7KNKZcPVnh5VLoQTgjIWsLghcDdCMeM=; b=KYFsTFN+WlR9RIc0JdrEjkwZaY1v/eOie65GJOPIVZ5EbUa+5G6ZminkUie0EbhpFd V9LCNuWRZ11TFL7jgaduheRMrRWuTVnxh2d38M2jD382kk5BXSw3lAAy+MtEKWMZyk0A oZpvHIg1ok9z6GGfMKmNLcQyOdbBSY26qH5NSDvD0CuPG+5uF7XGmUDF3x+FWFczLNry s8yvFqiLBTGaCDvRfvydNhAId1qnPw4MtUScr4i8aylORmgWKnyBOkbwtnFVAuhSKek9 EI2//Q9DTUVArleKm9vxdCF9kEYf8gp5Eam+XbFKdM+y5nvDARGgClHJS3yv5CwMXmQ0 ztog== MIME-Version: 1.0 X-Received: by 10.60.93.67 with SMTP id cs3mr26037297oeb.12.1385403510984; Mon, 25 Nov 2013 10:18:30 -0800 (PST) Received: by 10.76.132.9 with HTTP; Mon, 25 Nov 2013 10:18:30 -0800 (PST) In-Reply-To: References: 
<2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Date: Mon, 25 Nov 2013 10:18:30 -0800 Message-ID: Subject: Re: Performance difference between UFS and ZFS with NFS From: Freddie Cash To: Eric Browning Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 18:18:32 -0000 On Mon, Nov 25, 2013 at 10:14 AM, Eric Browning < ericbrowning@skaggscatholiccenter.org> wrote: > I am using /dev/zero > > /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=3000000 > You really shouldn't use /dev/zero to "benchmark" ZFS. It doesn't work the way you think it will, especially if dedupe or compression are enabled. Either use a proper filesystem benchmarking tool like iozone or bonnie++ or fio; or create a big file using /dev/random and then dd that file to various places in the filesystem. # dd if=/dev/random of=bigfile.100M bs=1M count=100 Reboot to clear caches. Then use if=bigfile.100M in your actual testing. -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 18:21:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 17E2F557 for ; Mon, 25 Nov 2013 18:21:05 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A67D12DDC for ; Mon, 25 Nov 2013 18:21:04 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006825437.msg for ; Mon, 25 Nov 2013 18:20:55 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 25 Nov 2013 18:20:55 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=10418f1d7e=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Eric Browning" References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca><9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com><5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com><18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Subject: Re: Performance difference between UFS and ZFS with NFS Date: Mon, 25 Nov 2013 18:20:47 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 18:21:05 -0000 And
whats your ZFS pool layout and ashift? Regards Steve ----- Original Message ----- From: "Eric Browning" >I am using /dev/zero > > /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=3000000 > > > On Sun, Nov 24, 2013 at 12:31 PM, krad wrote: > >> I was thinking the same, if it was using /dev/zero as an input any >> compression would skew the results a little. >> >> >> On 24 November 2013 15:15, Steven Hartland wrote: >> >>> ----- Original Message ----- From: "Eric Browning" >>> >>> >>> >>> On a side note I forgot that I had used dd to test the disk performance a >>>> while ago when I was using ZFS. >>>> >>>> ZFS performance: >>>> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) >>>> 34.17s real 0.61s user 31.89s sys >>>> >>>> UFS performance: >>>> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) >>>> 11.85s real 0.58s user 11.25s sys >>>> >>>> Again, even with dd performance is about 3x faster with UFS with the same >>>> disks. >>>> >>> >>> Interesting, what was you command exactly? >>> >>> Regards >>> Steve >>> >>> ================================================ >>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>> the person or entity to whom it is addressed. In the event of misdirection, >>> the recipient is prohibited from using, copying, printing or otherwise >>> disseminating it or any information contained in it. >>> In the event of misdirection, illegible or incomplete transmission please >>> telephone +44 845 868 1337 >>> or return the E.mail to postmaster@multiplay.co.uk. >>> >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> >> >> > > > -- > Eric Browning > Systems Administrator > 801-984-7623 > > Skaggs Catholic Center > Juan Diego Catholic High School > Saint John the Baptist Middle > Saint John the Baptist Elementary > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
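For reference, both halves of that question (pool layout and ashift) can be read straight off the system; a minimal sketch, assuming a hypothetical pool named "tank" on an ada0-style disk:

    # vdev layout of the pool (stripe, mirror, raidz, ...)
    zpool status tank

    # ashift recorded for each vdev in the cached pool configuration
    zdb -C tank | grep ashift        # or simply: zdb | grep ashift

    # partition offsets and sizes, to see whether the partitions are 4K-aligned
    gpart show ada0
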
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 18:35:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3AF7BD9B for ; Mon, 25 Nov 2013 18:35:21 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C925F2ED0 for ; Mon, 25 Nov 2013 18:35:20 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006825578.msg for ; Mon, 25 Nov 2013 18:35:19 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 25 Nov 2013 18:35:19 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=10418f1d7e=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Steven Hartland" , "Eric Browning" References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca><9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com><5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com><18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Subject: Re: Performance difference between UFS and ZFS with NFS Date: Mon, 25 Nov 2013 18:35:10 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 18:35:21 -0000 Just tried this here and I'm seeing CPU saturation from dd, is this your case too? Switching to: /usr/bin/time -h dd if=/dev/zero of=/home/smh/testfile bs=1m count=2930 Iit changed from: 3072000000 bytes transferred in 170.102366 secs (18059714 bytes/sec) 2m50.48s real 1.05s user 2m48.28s sys to: 3072327680 bytes transferred in 14.094856 secs (217975102 bytes/sec) 15.19s real 0.00s user 4.87s sys Regards Steve ----- Original Message ----- From: "Steven Hartland" To: "Eric Browning" Cc: "FreeBSD FS" Sent: Monday, November 25, 2013 6:20 PM Subject: Re: Performance difference between UFS and ZFS with NFS > And whats your ZFS pool layout and ashift? > > Regards > Steve > ----- Original Message ----- > From: "Eric Browning" > > >>I am using /dev/zero >> >> /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=3000000 >> >> >> On Sun, Nov 24, 2013 at 12:31 PM, krad wrote: >> >>> I was thinking the same, if it was using /dev/zero as an input any >>> compression would skew the results a little. >>> >>> >>> On 24 November 2013 15:15, Steven Hartland wrote: >>> >>>> ----- Original Message ----- From: "Eric Browning" >>>> >>>> >>>> >>>> On a side note I forgot that I had used dd to test the disk performance a >>>>> while ago when I was using ZFS. 
>>>>> >>>>> ZFS performance: >>>>> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) >>>>> 34.17s real 0.61s user 31.89s sys >>>>> >>>>> UFS performance: >>>>> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) >>>>> 11.85s real 0.58s user 11.25s sys >>>>> >>>>> Again, even with dd performance is about 3x faster with UFS with the same >>>>> disks. >>>>> >>>> >>>> Interesting, what was you command exactly? >>>> >>>> Regards >>>> Steve >>>> >>>> ================================================ >>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>>> the person or entity to whom it is addressed. In the event of misdirection, >>>> the recipient is prohibited from using, copying, printing or otherwise >>>> disseminating it or any information contained in it. >>>> In the event of misdirection, illegible or incomplete transmission please >>>> telephone +44 845 868 1337 >>>> or return the E.mail to postmaster@multiplay.co.uk. >>>> >>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>> >>> >>> >> >> >> -- >> Eric Browning >> Systems Administrator >> 801-984-7623 >> >> Skaggs Catholic Center >> Juan Diego Catholic High School >> Saint John the Baptist Middle >> Saint John the Baptist Elementary >> > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the > event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any > information contained in it. > In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
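Taking the two observations in this sub-thread together (a 1 KiB block size leaves dd CPU-bound, and /dev/zero is trivially compressible, so it says little about the disks when compression or dedup is enabled), a fairer ad-hoc run along the lines Freddie Cash suggested above might look like the sketch below; the file names and the /tank mountpoint are placeholders only:

    # build an incompressible source file once
    dd if=/dev/random of=/tmp/bigfile.100M bs=1M count=100

    # reboot to clear caches (as suggested above), then time the write with a
    # large block size so dd itself is not the bottleneck
    /usr/bin/time -h dd if=/tmp/bigfile.100M of=/tank/testfile bs=1m

In the time output, user + sys close to real (as in the bs=1024 runs quoted above) means a saturated core; a large gap between them means the time went to waiting on the storage.
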
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 19:24:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A7F6F41C for ; Mon, 25 Nov 2013 19:24:50 +0000 (UTC) Received: from mail-pd0-f169.google.com (mail-pd0-f169.google.com [209.85.192.169]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7C2CD2292 for ; Mon, 25 Nov 2013 19:24:50 +0000 (UTC) Received: by mail-pd0-f169.google.com with SMTP id v10so6165433pde.0 for ; Mon, 25 Nov 2013 11:24:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=XN48ZqSFwqVYByCVeXEd9UTSPyXHjoeGUbr8JXgIbKg=; b=C/2gWoO7zgk87CfN749I6lGXS0eyjNTL3yv7Rw6mhT+EQcVp0+ivPX98HDsgcUehhK UywmuxlqZssEDcTv+Q7NKE39YzeNFrJFFTl05AuGGFJ/dEDlQwTk1pRrfjfpYoEOv49L aT6NqgfhJUAz6zFjlIe9k8LJMEwzcbriqqpCLcIIjqow23hzEnBpbt6I8SVtAjfpYpAe a6lfDFvtHFS1UYV98kcY6tWEiaZnx4QdLXV55v79RW2cuGTmYmy96TYm5xS2cJ7pn1nz DTJKA89mx+qDFTYRCW/CkTLAZKxkffi6uC5Cm2sH2uZZTUQ/JUmQLi8ej9IfsVToHzFw ewPg== X-Gm-Message-State: ALoCoQlfH3F/wUns7dGLLg4I4diQBStw9vD+Bjp1vogYPwSGtvP8zEHbKF3Q61dUiMo81STkggkE MIME-Version: 1.0 X-Received: by 10.68.170.66 with SMTP id ak2mr28911986pbc.5.1385407489698; Mon, 25 Nov 2013 11:24:49 -0800 (PST) Received: by 10.70.102.133 with HTTP; Mon, 25 Nov 2013 11:24:49 -0800 (PST) In-Reply-To: References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Date: Mon, 25 Nov 2013 12:24:49 -0700 Message-ID: Subject: Re: Performance difference between UFS and ZFS with NFS From: Eric Browning To: Steven Hartland Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 19:24:50 -0000 Steven, I've tried to 4K align these SSDs with gnop but they are currently ashift 9. Pool layout is just 4 drives in a zfs stripe. I've also tried raidz1 with no noticeable performance impacts other than a loss of space for parity. On Mon, Nov 25, 2013 at 11:35 AM, Steven Hartland wrote: > Just tried this here and I'm seeing CPU saturation from dd, is this > your case too? > > Switching to: > /usr/bin/time -h dd if=/dev/zero of=/home/smh/testfile bs=1m count=2930 > > Iit changed from: > 3072000000 bytes transferred in 170.102366 secs (18059714 bytes/sec) > 2m50.48s real 1.05s user 2m48.28s sys > > to: > 3072327680 bytes transferred in 14.094856 secs (217975102 bytes/sec) > 15.19s real 0.00s user 4.87s sys > > Regards > Steve > ----- Original Message ----- From: "Steven Hartland" < > killing@multiplay.co.uk> > To: "Eric Browning" > Cc: "FreeBSD FS" > Sent: Monday, November 25, 2013 6:20 PM > Subject: Re: Performance difference between UFS and ZFS with NFS > > > > And whats your ZFS pool layout and ashift? 
>> >> Regards >> Steve >> ----- Original Message ----- From: "Eric Browning" > skaggscatholiccenter.org> >> >> >> I am using /dev/zero >>> >>> /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=3000000 >>> >>> >>> On Sun, Nov 24, 2013 at 12:31 PM, krad wrote: >>> >>> I was thinking the same, if it was using /dev/zero as an input any >>>> compression would skew the results a little. >>>> >>>> >>>> On 24 November 2013 15:15, Steven Hartland >>>> wrote: >>>> >>>> ----- Original Message ----- From: "Eric Browning" >>>>> >>>>> >>>>> >>>>> On a side note I forgot that I had used dd to test the disk >>>>> performance a >>>>> >>>>>> while ago when I was using ZFS. >>>>>> >>>>>> ZFS performance: >>>>>> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) >>>>>> 34.17s real 0.61s user 31.89s sys >>>>>> >>>>>> UFS performance: >>>>>> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) >>>>>> 11.85s real 0.58s user 11.25s sys >>>>>> >>>>>> Again, even with dd performance is about 3x faster with UFS with the >>>>>> same >>>>>> disks. >>>>>> >>>>>> >>>>> Interesting, what was you command exactly? >>>>> >>>>> Regards >>>>> Steve >>>>> >>>>> ================================================ >>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>>>> the person or entity to whom it is addressed. In the event of >>>>> misdirection, >>>>> the recipient is prohibited from using, copying, printing or otherwise >>>>> disseminating it or any information contained in it. >>>>> In the event of misdirection, illegible or incomplete transmission >>>>> please >>>>> telephone +44 845 868 1337 >>>>> or return the E.mail to postmaster@multiplay.co.uk. >>>>> >>>>> >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>> >>>>> >>>> >>>> >>> >>> -- >>> Eric Browning >>> Systems Administrator >>> 801-984-7623 >>> >>> Skaggs Catholic Center >>> Juan Diego Catholic High School >>> Saint John the Baptist Middle >>> Saint John the Baptist Elementary >>> >>> >> ================================================ >> This e.mail is private and confidential between Multiplay (UK) Ltd. and >> the person or entity to whom it is addressed. In the event of misdirection, >> the recipient is prohibited from using, copying, printing or otherwise >> disseminating it or any information contained in it. >> In the event of misdirection, illegible or incomplete transmission please >> telephone +44 845 868 1337 >> or return the E.mail to postmaster@multiplay.co.uk. >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> >> > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of misdirection, > the recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. 
> > -- Eric Browning Systems Administrator 801-984-7623 Skaggs Catholic Center Juan Diego Catholic High School Saint John the Baptist Middle Saint John the Baptist Elementary From owner-freebsd-fs@FreeBSD.ORG Mon Nov 25 21:28:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B5F619A6 for ; Mon, 25 Nov 2013 21:28:05 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4262B2AB8 for ; Mon, 25 Nov 2013 21:28:04 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006826930.msg for ; Mon, 25 Nov 2013 21:28:01 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 25 Nov 2013 21:28:01 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=10418f1d7e=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <50BA5B82E6FE46CFA0757216F13390BA@multiplay.co.uk> From: "Steven Hartland" To: "Eric Browning" References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca><9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com><5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com><18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Subject: Re: Performance difference between UFS and ZFS with NFS Date: Mon, 25 Nov 2013 21:27:52 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Nov 2013 21:28:05 -0000 I'm pretty sure when you retest you'll see dd at 100% cpu, you could even try this for just a few seconds to confirm. Basially ZFS is doing a lot more under the covers than UFS which means that transfering 1k blocks in a serial manor is resulting cpu saturation and not testing throughput to the disks. Regards Steve ----- Original Message ----- From: "Eric Browning" To: "Steven Hartland" Cc: "FreeBSD FS" Sent: Monday, November 25, 2013 7:24 PM Subject: Re: Performance difference between UFS and ZFS with NFS > Steven, > > I've tried to 4K align these SSDs with gnop but they are currently ashift > 9. Pool layout is just 4 drives in a zfs stripe. I've also tried raidz1 > with no noticeable performance impacts other than a loss of space for > parity. > > > On Mon, Nov 25, 2013 at 11:35 AM, Steven Hartland > wrote: > >> Just tried this here and I'm seeing CPU saturation from dd, is this >> your case too? 
>> >> Switching to: >> /usr/bin/time -h dd if=/dev/zero of=/home/smh/testfile bs=1m count=2930 >> >> Iit changed from: >> 3072000000 bytes transferred in 170.102366 secs (18059714 bytes/sec) >> 2m50.48s real 1.05s user 2m48.28s sys >> >> to: >> 3072327680 bytes transferred in 14.094856 secs (217975102 bytes/sec) >> 15.19s real 0.00s user 4.87s sys >> >> Regards >> Steve >> ----- Original Message ----- From: "Steven Hartland" < >> killing@multiplay.co.uk> >> To: "Eric Browning" >> Cc: "FreeBSD FS" >> Sent: Monday, November 25, 2013 6:20 PM >> Subject: Re: Performance difference between UFS and ZFS with NFS >> >> >> >> And whats your ZFS pool layout and ashift? >>> >>> Regards >>> Steve >>> ----- Original Message ----- From: "Eric Browning" >> skaggscatholiccenter.org> >>> >>> >>> I am using /dev/zero >>>> >>>> /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=3000000 >>>> >>>> >>>> On Sun, Nov 24, 2013 at 12:31 PM, krad wrote: >>>> >>>> I was thinking the same, if it was using /dev/zero as an input any >>>>> compression would skew the results a little. >>>>> >>>>> >>>>> On 24 November 2013 15:15, Steven Hartland >>>>> wrote: >>>>> >>>>> ----- Original Message ----- From: "Eric Browning" >>>>>> >>>>>> >>>>>> >>>>>> On a side note I forgot that I had used dd to test the disk >>>>>> performance a >>>>>> >>>>>>> while ago when I was using ZFS. >>>>>>> >>>>>>> ZFS performance: >>>>>>> 3072000000 bytes transferred in 34.167480 secs (89910055 bytes/sec) >>>>>>> 34.17s real 0.61s user 31.89s sys >>>>>>> >>>>>>> UFS performance: >>>>>>> 3072000000 bytes transferred in 11.848883 secs (259264942 bytes/sec) >>>>>>> 11.85s real 0.58s user 11.25s sys >>>>>>> >>>>>>> Again, even with dd performance is about 3x faster with UFS with the >>>>>>> same >>>>>>> disks. >>>>>>> >>>>>>> >>>>>> Interesting, what was you command exactly? >>>>>> >>>>>> Regards >>>>>> Steve >>>>>> >>>>>> ================================================ >>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>>>>> the person or entity to whom it is addressed. In the event of >>>>>> misdirection, >>>>>> the recipient is prohibited from using, copying, printing or otherwise >>>>>> disseminating it or any information contained in it. >>>>>> In the event of misdirection, illegible or incomplete transmission >>>>>> please >>>>>> telephone +44 845 868 1337 >>>>>> or return the E.mail to postmaster@multiplay.co.uk. >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org mailing list >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> -- >>>> Eric Browning >>>> Systems Administrator >>>> 801-984-7623 >>>> >>>> Skaggs Catholic Center >>>> Juan Diego Catholic High School >>>> Saint John the Baptist Middle >>>> Saint John the Baptist Elementary >>>> >>>> >>> ================================================ >>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>> the person or entity to whom it is addressed. In the event of misdirection, >>> the recipient is prohibited from using, copying, printing or otherwise >>> disseminating it or any information contained in it. >>> In the event of misdirection, illegible or incomplete transmission please >>> telephone +44 845 868 1337 >>> or return the E.mail to postmaster@multiplay.co.uk. 
>>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> >>> >> >> ================================================ >> This e.mail is private and confidential between Multiplay (UK) Ltd. and >> the person or entity to whom it is addressed. In the event of misdirection, >> the recipient is prohibited from using, copying, printing or otherwise >> disseminating it or any information contained in it. >> In the event of misdirection, illegible or incomplete transmission please >> telephone +44 845 868 1337 >> or return the E.mail to postmaster@multiplay.co.uk. >> >> > > > -- > Eric Browning > Systems Administrator > 801-984-7623 > > Skaggs Catholic Center > Juan Diego Catholic High School > Saint John the Baptist Middle > Saint John the Baptist Elementary > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Tue Nov 26 01:43:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BC69390D for ; Tue, 26 Nov 2013 01:43:28 +0000 (UTC) Received: from wonkity.com (wonkity.com [67.158.26.137]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 698BA2870 for ; Tue, 26 Nov 2013 01:43:28 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.7/8.14.7) with ESMTP id rAQ1hQ8e049666; Mon, 25 Nov 2013 18:43:26 -0700 (MST) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.7/8.14.7/Submit) with ESMTP id rAQ1hQxL049663; Mon, 25 Nov 2013 18:43:26 -0700 (MST) (envelope-from wblock@wonkity.com) Date: Mon, 25 Nov 2013 18:43:26 -0700 (MST) From: Warren Block To: Eric Browning Subject: Re: Performance difference between UFS and ZFS with NFS In-Reply-To: Message-ID: References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Mon, 25 Nov 2013 18:43:26 -0700 (MST) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 01:43:28 -0000 On Mon, 25 Nov 2013, Eric Browning wrote: > I've tried to 4K align these SSDs with gnop but they are currently ashift > 9. Those are two different things. Alignment is controlled by partition starting block and size. 
Both should be integer multiples of 4K. Using gnop to force ashift=12 just makes sure ZFS is using 4K blocks, it does not force alignment. From owner-freebsd-fs@FreeBSD.ORG Tue Nov 26 01:50:29 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0CA1BC2E for ; Tue, 26 Nov 2013 01:50:29 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9900D28B6 for ; Tue, 26 Nov 2013 01:50:27 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006828948.msg for ; Tue, 26 Nov 2013 01:50:24 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 26 Nov 2013 01:50:24 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1042f5e7dd=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Warren Block" , "Eric Browning" References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> Subject: Re: Performance difference between UFS and ZFS with NFS Date: Tue, 26 Nov 2013 01:50:15 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 01:50:29 -0000 ----- Original Message ----- From: "Warren Block" > On Mon, 25 Nov 2013, Eric Browning wrote: > >> I've tried to 4K align these SSDs with gnop but they are currently ashift >> 9. > > Those are two different things. Alignment is controlled by partition > starting block and size. Both should be integer multiples of 4K. > > Using gnop to force ashift=12 just makes sure ZFS is using 4K blocks, it > does not force alignment. stable/10 and current/11 have dynamic ashift support. It still requires either the disk report 4k sectors or a cam quirk. So if you believe your disk is 4k but its not reporting as such let us know. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
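To make the distinction above concrete, here is a rough sketch of the usual 9.x recipe covering both halves: the partition supplies the 4K alignment, and a temporary 4K gnop layer makes zpool create pick ashift=12. The device, label and pool names (ada0, ssd0, tank) are placeholders, and the disk is assumed to be blank:

    gpart create -s gpt ada0
    gpart add -t freebsd-zfs -a 4k -l ssd0 ada0    # 4K-aligned partition
    gnop create -S 4096 /dev/gpt/ssd0              # present a fake 4K sector size
    zpool create tank /dev/gpt/ssd0.nop            # vdev is created with ashift=12
    zpool export tank
    gnop destroy /dev/gpt/ssd0.nop
    zpool import tank                              # ashift stays at 12

Since ashift is fixed when a vdev is created, an existing ashift=9 pool such as the one described above cannot be converted in place; the vdevs would have to be recreated.
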
From owner-freebsd-fs@FreeBSD.ORG Tue Nov 26 20:51:56 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 25A8C6F2 for ; Tue, 26 Nov 2013 20:51:56 +0000 (UTC) Received: from dss.incore.de (dss.incore.de [195.145.1.138]) by mx1.freebsd.org (Postfix) with ESMTP id DE9532C60 for ; Tue, 26 Nov 2013 20:51:55 +0000 (UTC) Received: from inetmail.dmz (unknown [10.3.0.4]) by dss.incore.de (Postfix) with ESMTP id B2E785C025 for ; Tue, 26 Nov 2013 21:51:47 +0100 (CET) X-Virus-Scanned: amavisd-new at incore.de Received: from dss.incore.de ([10.3.0.3]) by inetmail.dmz (inetmail.dmz [10.3.0.4]) (amavisd-new, port 10024) with LMTP id 90T0O9fxIiV2 for ; Tue, 26 Nov 2013 21:51:46 +0100 (CET) Received: from mail.incore (fwintern.dmz [10.0.0.253]) by dss.incore.de (Postfix) with ESMTP id E3BB05C01D for ; Tue, 26 Nov 2013 21:51:46 +0100 (CET) Received: from bsdmhs.longwitz (unknown [192.168.99.6]) by mail.incore (Postfix) with ESMTP id 989DD508BA for ; Tue, 26 Nov 2013 21:51:46 +0100 (CET) Message-ID: <529509E2.4090903@incore.de> Date: Tue, 26 Nov 2013 21:51:46 +0100 From: Andreas Longwitz User-Agent: Thunderbird 2.0.0.19 (X11/20090113) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: [ffs] ffs_valloc: free inode /home/19 had 128 blocks Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 20:51:56 -0000 I run FreeBSD 8.4-STABLE #0 r256119 and I am confused about the message coming from the code snippet if (DIP(ip, i_blocks) && (fs->fs_flags & FS_UNCLEAN) == 0) { /* XXX */ printf("free inode %s/%lu had %ld blocks\n", fs->fs_fsmnt, (u_long)ino, (long)DIP(ip, i_blocks)); DIP_SET(ip, i_blocks, 0); } in the function ffs_valloc(). I see these kernel messages often when a snapshot is taken on a gjournaled ufs partition with mount -u -o noatime -o async -o snapshot /home/.snap/snaptest /home The inode number (19) is always the inode number of the snapshot file. I would like to know if the 128 blocks are lost forever. If "yes" there is really a problem. 
If "no" then the message is not relevant for gjournaled file systems and fs->fs_flags should be tested in this way: (fs->fs_flags & FS_UNCLEAN|FS_GJOURNAL) == 0 -- Andreas Longwitz From owner-freebsd-fs@FreeBSD.ORG Tue Nov 26 22:44:22 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4E19F75E for ; Tue, 26 Nov 2013 22:44:22 +0000 (UTC) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 0AA4A2412 for ; Tue, 26 Nov 2013 22:44:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 617324C4DB62 for ; Tue, 26 Nov 2013 23:35:45 +0100 (CET) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8JrIMjYWYqgR for ; Tue, 26 Nov 2013 23:35:43 +0100 (CET) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 397654C4DB60 for ; Tue, 26 Nov 2013 23:35:43 +0100 (CET) Message-ID: <5295223E.6030602@internetx.com> Date: Tue, 26 Nov 2013 23:35:42 +0100 From: InterNetX - Juergen Gotteswinter User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Subject: General reliability rating of FreeBSD 9.x newnfs v3 + statdt/lockd X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list Reply-To: jg@internetx.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 22:44:22 -0000 Hi folks, so, i am running several ZFS Based Filers, mostly tracking freebsd 9 current. but there are also release stable setups. hardware is well selected for zfs, lsi hba, no diskcache enabled, server grade ssd, ecc ram (dell servers basicly, with customized configuratoin). clients are mostly linux (centos) with usual mount options, r/wsize 32k ,noatime etc. application running on the clients is different, some vm containers, web content, a well mixed of everything. generally, i get more and more the impression that for example compared to the linux nfs server the freebsd one is not that reliable. every few weeks i run into problems which look like the statd or lockd freezed, no reaction when trying to restart, doesnt take care about a -9 signal. it seems, from client side, that the process is a "slow" one, which looks like its going on for a few hours and at one point the nfs server is done and needs to be rebooted. i am not looking for a solution here, just asking for others impressions. so, tell me your story, i whould love to hear. thanks! 
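No answer to the reliability question here either, but one thing worth capturing the next time lockd or statd wedges is where in the kernel the daemons are stuck. A process sleeping uninterruptibly in the kernel never gets to act on SIGKILL, which matches the ignored kill -9 described above, and the kernel stacks are far more useful in a report than a post-reboot description. A small sketch, assuming the stock daemon names on a 9.x NFS server:

    # kernel stack traces of the (possibly stuck) daemons
    procstat -kk $(pgrep rpc.lockd) $(pgrep rpc.statd) $(pgrep nfsd)

    # process state and wait channel; "D" (uninterruptible sleep) explains the ignored signals
    ps -axlww | egrep 'nfsd|rpc\.lockd|rpc\.statd'

    # new-NFS-server counters, taken while the problem is happening
    nfsstat -e -s
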
From owner-freebsd-fs@FreeBSD.ORG Tue Nov 26 23:41:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 56B22160; Tue, 26 Nov 2013 23:41:15 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id C54242744; Tue, 26 Nov 2013 23:41:14 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAOQwlVKDaFve/2dsb2JhbABRCBaDKVOCergJgT90giwjBFJEGQIEVQYRHYdmDa5LkRIMC44mIhkbB4JrgUgDiUKGb4kTkGODRh4EgWo X-IronPort-AV: E=Sophos;i="4.93,778,1378872000"; d="scan'208";a="72619630" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 26 Nov 2013 18:41:13 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 707B1B3F2B; Tue, 26 Nov 2013 18:41:13 -0500 (EST) Date: Tue, 26 Nov 2013 18:41:13 -0500 (EST) From: Rick Macklem To: FreeBSD FS Message-ID: <731168702.21452440.1385509273449.JavaMail.root@uoguelph.ca> In-Reply-To: <1139579526.21452374.1385509250511.JavaMail.root@uoguelph.ca> Subject: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_21452438_736821265.1385509273446" X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Kostik Belousov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 23:41:15 -0000 ------=_Part_21452438_736821265.1385509273446 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, The current NFS client does a synchronous write to the server when a non-contiguous write to the same buffer cache block occurs. This is done because there is a single dirty byte range recorded in the buf structure. This results in a lot of synchronous writes for software builds (I believe it is the loader that loves to write small non-contiguous chunks to its output file). Some users disable synchronous writing on the server to improve performance, but this puts them at risk of data loss when the server crashes. Long ago jhb@ emailed me a small patch that avoided the synchronous writes by simply making the dirty byte range a superset of the bytes written. The problem with doing this is that for a rare (possibly non-existent) application that writes non-overlapping byte ranges to the same file from multiple clients concurrently, some of these writes might get lost by stale data in the superset of the byte range being written back to the server. (Crappy, run on sentence, but hopefully it makes sense;-) I created a patch that maintained a list of dirty byte ranges. It was complicated and I found that the list often had to be > 100 entries to avoid the synchronous writes. So, I think his solution is preferable, although I've added a couple of tweaks: - The synchronous writes (old/current algorithm) is still used if there has been file locking done on the file. (I think any app. that writes a file from multiple clients will/should use file locking.) - The synchronous writes (old/current algorithm) is used if a sysctl is set. 
This will avoid breakage for any app. (if there is one) that writes a file from multiple clients without doing file locking. For testing on my very slow single core hardware, I see about a 10% improvement in kernel build times, but with fewer I/O RPCs: Read RPCs Write RPCs old/current 50K 122K patched 39K 40K --> it reduced the Read RPC count by about 20% and cut the Write RPC count to 1/3rd. I think jhb@ saw pretty good performance results with his patch. Anyhow, the patch is attached and can also be found here: http://people.freebsd.org/~rmacklem/noncontig-write.patch I'd like to invite folks to comment/review/test this patch, since I think it is ready for head/current. Thanks, rick ps: Kostik, maybe you could look at it. In particular, I am wondering if I zero'd out the buffer the correct way, via vfs_bio_bzero_buf()? ------=_Part_21452438_736821265.1385509273446 Content-Type: text/x-patch; name=noncontig-write.patch Content-Disposition: attachment; filename=noncontig-write.patch Content-Transfer-Encoding: base64 LS0tIGZzL25mc2NsaWVudC9uZnNfY2xiaW8uYy5vcmlnCTIwMTMtMDgtMjggMTg6NDU6NDEuMDAw MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NsYmlvLmMJMjAxMy0xMS0yNSAyMTo0 MjoxNi4wMDAwMDAwMDAgLTA1MDAKQEAgLTcyLDYgKzcyLDEyIEBAIGV4dGVybiBpbnQgbmZzX2tl ZXBfZGlydHlfb25fZXJyb3I7CiAKIGludCBuY2xfcGJ1Zl9mcmVlY250ID0gLTE7CS8qIHN0YXJ0 IG91dCB1bmxpbWl0ZWQgKi8KIAorU1lTQ1RMX0RFQ0woX3Zmc19uZnMpOworCitzdGF0aWMgaW50 CW5jbF9vbGRub25jb250aWd3cml0aW5nID0gMDsKK1NZU0NUTF9JTlQoX3Zmc19uZnMsIE9JRF9B VVRPLCBvbGRfbm9uY29udGlnX3dyaXRpbmcsIENUTEZMQUdfUlcsCisJICAgJm5jbF9vbGRub25j b250aWd3cml0aW5nLCAwLCAiTkZTIHVzZSBvbGQgbm9uY29udGlnIHdyaXRpbmcgYWxnIik7CisK IHN0YXRpYyBzdHJ1Y3QgYnVmICpuZnNfZ2V0Y2FjaGVibGsoc3RydWN0IHZub2RlICp2cCwgZGFk ZHJfdCBibiwgaW50IHNpemUsCiAgICAgc3RydWN0IHRocmVhZCAqdGQpOwogc3RhdGljIGludCBu ZnNfZGlyZWN0aW9fd3JpdGUoc3RydWN0IHZub2RlICp2cCwgc3RydWN0IHVpbyAqdWlvcCwKQEAg LTg3NCw3ICs4ODAsNyBAQCBuY2xfd3JpdGUoc3RydWN0IHZvcF93cml0ZV9hcmdzICphcCkKIAlz dHJ1Y3QgdmF0dHIgdmF0dHI7CiAJc3RydWN0IG5mc21vdW50ICpubXAgPSBWRlNUT05GUyh2cC0+ dl9tb3VudCk7CiAJZGFkZHJfdCBsYm47Ci0JaW50IGJjb3VudDsKKwlpbnQgYmNvdW50LCBub25j b250aWdfd3JpdGUsIG9iY291bnQ7CiAJaW50IGJwX2NhY2hlZCwgbiwgb24sIGVycm9yID0gMCwg ZXJyb3IxOwogCXNpemVfdCBvcmlnX3Jlc2lkLCBsb2NhbF9yZXNpZDsKIAlvZmZfdCBvcmlnX3Np emUsIHRtcF9vZmY7CkBAIC0xMDM3LDcgKzEwNDMsMTUgQEAgYWdhaW46CiAJCSAqIHVuYWxpZ25l ZCBidWZmZXIgc2l6ZS4KIAkJICovCiAJCW10eF9sb2NrKCZucC0+bl9tdHgpOwotCQlpZiAodWlv LT51aW9fb2Zmc2V0ID09IG5wLT5uX3NpemUgJiYgbikgeworCQlpZiAoKG5wLT5uX2ZsYWcgJiBO SEFTQkVFTkxPQ0tFRCkgPT0gMCAmJgorCQkgICAgbmNsX29sZG5vbmNvbnRpZ3dyaXRpbmcgPT0g MCkKKwkJCW5vbmNvbnRpZ193cml0ZSA9IDE7CisJCWVsc2UKKwkJCW5vbmNvbnRpZ193cml0ZSA9 IDA7CisJCWlmICgodWlvLT51aW9fb2Zmc2V0ID09IG5wLT5uX3NpemUgfHwKKwkJICAgIChub25j b250aWdfd3JpdGUgIT0gMCAmJgorCQkgICAgbGJuID09IChucC0+bl9zaXplIC8gYmlvc2l6ZSkg JiYKKwkJICAgIHVpby0+dWlvX29mZnNldCArIG4gPiBucC0+bl9zaXplKSkgJiYgbikgewogCQkJ bXR4X3VubG9jaygmbnAtPm5fbXR4KTsKIAkJCS8qCiAJCQkgKiBHZXQgdGhlIGJ1ZmZlciAoaW4g aXRzIHByZS1hcHBlbmQgc3RhdGUgdG8gbWFpbnRhaW4KQEAgLTEwNDUsOCArMTA1OSw4IEBAIGFn YWluOgogCQkJICogbmZzbm9kZSBhZnRlciB3ZSBoYXZlIGxvY2tlZCB0aGUgYnVmZmVyIHRvIHBy ZXZlbnQKIAkJCSAqIHJlYWRlcnMgZnJvbSByZWFkaW5nIGdhcmJhZ2UuCiAJCQkgKi8KLQkJCWJj b3VudCA9IG9uOwotCQkJYnAgPSBuZnNfZ2V0Y2FjaGVibGsodnAsIGxibiwgYmNvdW50LCB0ZCk7 CisJCQlvYmNvdW50ID0gbnAtPm5fc2l6ZSAtIChsYm4gKiBiaW9zaXplKTsKKwkJCWJwID0gbmZz X2dldGNhY2hlYmxrKHZwLCBsYm4sIG9iY291bnQsIHRkKTsKIAogCQkJaWYgKGJwICE9IE5VTEwp IHsKIAkJCQlsb25nIHNhdmU7CkBAIC0xMDU4LDkgKzEwNzIsMTIgQEAgYWdhaW46CiAJCQkJbXR4 
X3VubG9jaygmbnAtPm5fbXR4KTsKIAogCQkJCXNhdmUgPSBicC0+Yl9mbGFncyAmIEJfQ0FDSEU7 Ci0JCQkJYmNvdW50ICs9IG47CisJCQkJYmNvdW50ID0gb24gKyBuOwogCQkJCWFsbG9jYnVmKGJw LCBiY291bnQpOwogCQkJCWJwLT5iX2ZsYWdzIHw9IHNhdmU7CisJCQkJaWYgKG5vbmNvbnRpZ193 cml0ZSAhPSAwICYmIGJjb3VudCA+IG9iY291bnQpCisJCQkJCXZmc19iaW9fYnplcm9fYnVmKGJw LCBvYmNvdW50LCBiY291bnQgLQorCQkJCQkgICAgb2Jjb3VudCk7CiAJCQl9CiAJCX0gZWxzZSB7 CiAJCQkvKgpAQCAtMTE1OSwxOSArMTE3NiwyMyBAQCBhZ2FpbjoKIAkJICogYXJlYSwganVzdCB1 cGRhdGUgdGhlIGJfZGlydHlvZmYgYW5kIGJfZGlydHllbmQsCiAJCSAqIG90aGVyd2lzZSBmb3Jj ZSBhIHdyaXRlIHJwYyBvZiB0aGUgb2xkIGRpcnR5IGFyZWEuCiAJCSAqCisJCSAqIElmIHRoZXJl IGhhcyBiZWVuIGEgZmlsZSBsb2NrIGFwcGxpZWQgdG8gdGhpcyBmaWxlCisJCSAqIG9yIHZmcy5u ZnMub2xkX25vbmNvbnRpZ193cml0aW5nIGlzIHNldCwgZG8gdGhlIGZvbGxvd2luZzoKIAkJICog V2hpbGUgaXQgaXMgcG9zc2libGUgdG8gbWVyZ2UgZGlzY29udGlndW91cyB3cml0ZXMgZHVlIHRv CiAJCSAqIG91ciBoYXZpbmcgYSBCX0NBQ0hFIGJ1ZmZlciAoIGFuZCB0aHVzIHZhbGlkIHJlYWQg ZGF0YQogCQkgKiBmb3IgdGhlIGhvbGUpLCB3ZSBkb24ndCBiZWNhdXNlIGl0IGNvdWxkIGxlYWQg dG8KIAkJICogc2lnbmlmaWNhbnQgY2FjaGUgY29oZXJlbmN5IHByb2JsZW1zIHdpdGggbXVsdGlw bGUgY2xpZW50cywKIAkJICogZXNwZWNpYWxseSBpZiBsb2NraW5nIGlzIGltcGxlbWVudGVkIGxh dGVyIG9uLgogCQkgKgotCQkgKiBBcyBhbiBvcHRpbWl6YXRpb24gd2UgY291bGQgdGhlb3JldGlj YWxseSBtYWludGFpbgotCQkgKiBhIGxpbmtlZCBsaXN0IG9mIGRpc2NvbnRpbnVvdXMgYXJlYXMs IGJ1dCB3ZSB3b3VsZCBzdGlsbAotCQkgKiBoYXZlIHRvIGNvbW1pdCB0aGVtIHNlcGFyYXRlbHkg c28gdGhlcmUgaXNuJ3QgbXVjaAotCQkgKiBhZHZhbnRhZ2UgdG8gaXQgZXhjZXB0IHBlcmhhcHMg YSBiaXQgb2YgYXN5bmNocm9uaXphdGlvbi4KKwkJICogSWYgdmZzLm5mcy5vbGRfbm9uY29udGln X3dyaXRpbmcgaXMgbm90IHNldCBhbmQgdGhlcmUgaGFzCisJCSAqIG5vdCBiZWVuIGZpbGUgbG9j a2luZyBkb25lIG9uIHRoaXMgZmlsZToKKwkJICogUmVsYXggY29oZXJlbmN5IGEgYml0IGZvciB0 aGUgc2FrZSBvZiBwZXJmb3JtYW5jZSBhbmQKKwkJICogZXhwYW5kIHRoZSBjdXJyZW50IGRpcnR5 IHJlZ2lvbiB0byBjb250YWluIHRoZSBuZXcKKwkJICogd3JpdGUgZXZlbiBpZiBpdCBtZWFucyB3 ZSBtYXJrIHNvbWUgbm9uLWRpcnR5IGRhdGEgYXMKKwkJICogZGlydHkuCiAJCSAqLwogCi0JCWlm IChicC0+Yl9kaXJ0eWVuZCA+IDAgJiYKKwkJaWYgKG5vbmNvbnRpZ193cml0ZSA9PSAwICYmIGJw LT5iX2RpcnR5ZW5kID4gMCAmJgogCQkgICAgKG9uID4gYnAtPmJfZGlydHllbmQgfHwgKG9uICsg bikgPCBicC0+Yl9kaXJ0eW9mZikpIHsKIAkJCWlmIChid3JpdGUoYnApID09IEVJTlRSKSB7CiAJ CQkJZXJyb3IgPSBFSU5UUjsKLS0tIGZzL25mc2NsaWVudC9uZnNub2RlLmgub3JpZwkyMDEzLTEx LTE5IDE4OjE3OjM3LjAwMDAwMDAwMCAtMDUwMAorKysgZnMvbmZzY2xpZW50L25mc25vZGUuaAky MDEzLTExLTI1IDIxOjI5OjU4LjAwMDAwMDAwMCAtMDUwMApAQCAtMTU3LDYgKzE1Nyw3IEBAIHN0 cnVjdCBuZnNub2RlIHsKICNkZWZpbmUJTkxPQ0tXQU5UCTB4MDAwMTAwMDAgIC8qIFdhbnQgdGhl IHNsZWVwIGxvY2sgKi8KICNkZWZpbmUJTk5PTEFZT1VUCTB4MDAwMjAwMDAgIC8qIENhbid0IGdl dCBhIGxheW91dCBmb3IgdGhpcyBmaWxlICovCiAjZGVmaW5lCU5XUklURU9QRU5FRAkweDAwMDQw MDAwICAvKiBIYXMgYmVlbiBvcGVuZWQgZm9yIHdyaXRpbmcgKi8KKyNkZWZpbmUJTkhBU0JFRU5M T0NLRUQJMHgwMDA4MDAwMCAgLyogSGFzIGJlZW4gZmlsZSBsb2NrZWQuICovCiAKIC8qCiAgKiBD b252ZXJ0IGJldHdlZW4gbmZzbm9kZSBwb2ludGVycyBhbmQgdm5vZGUgcG9pbnRlcnMKLS0tIGZz L25mc2NsaWVudC9uZnNfY2x2bm9wcy5jLm9yaWcJMjAxMy0xMS0xOSAxODoxOTo0Mi4wMDAwMDAw MDAgLTA1MDAKKysrIGZzL25mc2NsaWVudC9uZnNfY2x2bm9wcy5jCTIwMTMtMTEtMjUgMjE6MzI6 NDcuMDAwMDAwMDAwIC0wNTAwCkBAIC0zMDc5LDYgKzMwNzksMTAgQEAgbmZzX2FkdmxvY2soc3Ry dWN0IHZvcF9hZHZsb2NrX2FyZ3MgKmFwKQogCQkJCQlucC0+bl9jaGFuZ2UgPSB2YS52YV9maWxl cmV2OwogCQkJCX0KIAkJCX0KKwkJCS8qIE1hcmsgdGhhdCBhIGZpbGUgbG9jayBoYXMgYmVlbiBh Y3F1aXJlZC4gKi8KKwkJCW10eF9sb2NrKCZucC0+bl9tdHgpOworCQkJbnAtPm5fZmxhZyB8PSBO SEFTQkVFTkxPQ0tFRDsKKwkJCW10eF91bmxvY2soJm5wLT5uX210eCk7CiAJCX0KIAkJTkZTVk9Q VU5MT0NLKHZwLCAwKTsKIAkJcmV0dXJuICgwKTsKQEAgLTMwOTgsNiArMzEwMiwxMiBAQCBuZnNf YWR2bG9jayhzdHJ1Y3Qgdm9wX2FkdmxvY2tfYXJncyAqYXApCiAJCQkJZXJyb3IgPSBFTk9MQ0s7 
CiAJCQl9CiAJCX0KKwkJaWYgKGVycm9yID09IDAgJiYgYXAtPmFfb3AgPT0gRl9TRVRMSykgewor CQkJLyogTWFyayB0aGF0IGEgZmlsZSBsb2NrIGhhcyBiZWVuIGFjcXVpcmVkLiAqLworCQkJbXR4 X2xvY2soJm5wLT5uX210eCk7CisJCQlucC0+bl9mbGFnIHw9IE5IQVNCRUVOTE9DS0VEOworCQkJ bXR4X3VubG9jaygmbnAtPm5fbXR4KTsKKwkJfQogCX0KIAlyZXR1cm4gKGVycm9yKTsKIH0K ------=_Part_21452438_736821265.1385509273446-- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 08:51:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 38B3D201 for ; Wed, 27 Nov 2013 08:51:11 +0000 (UTC) Received: from pi.nmdps.net (pi.nmdps.net [IPv6:2a01:be00:10:201:0:80:0:1]) by mx1.freebsd.org (Postfix) with ESMTP id F3537213B for ; Wed, 27 Nov 2013 08:51:10 +0000 (UTC) Received: from pi.nmdps.net (pi.nmdps.net [109.61.102.5]) (Authenticated sender: krichy@cflinux.hu) by pi.nmdps.net (Postfix) with ESMTPSA id BFF6E10AF for ; Wed, 27 Nov 2013 09:51:08 +0100 (CET) Date: Wed, 27 Nov 2013 09:51:06 +0100 (CET) From: Richard Kojedzinszky X-X-Sender: krichy@pi.nmdps.net To: freebsd-fs@freebsd.org Subject: ssd for zfs Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 08:51:11 -0000 Dear fs developers, Probably this is not the best list to report my issue, but please forward it to where it should get. I bought an SSD for my ZFS filesystem to use it as a ZIL. I've tested it under linux, and found that it can handle around 1400 random synchronized write IOPS. Then I placed it into my freebsd 9.2 box, and after attaching it as a ZIL, my zpool only performs 100 (!) write iops. I've attached it to an AHCI controller and to an LSI 1068 controller, on both it behaves the same. So I expect that something in the scsi layer is different, FreeBSD is handling this device slower, but actually it can handle the 1400 iops as tested under linux. Please give some advice where to go, how to debug, and how to improve FreeBSD's performance with this drive. 
The device is: # camcontrol identify ada3 pass4: ATA-8 SATA 2.x device pass4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes) protocol ATA/ATAPI-8 SATA 2.x device model STEC MACH16 M16SD2S-50UI firmware revision 00000299 serial number STM0001680E8 WWN 5000a7203006f8e5 media serial number STEC MACH16 M16SD2S-50UI STM00 cylinders 16383 heads 15 sectors/track 63 sector size logical 512, physical 512, offset 0 LBA supported 97696368 sectors LBA48 supported 97696368 sectors PIO supported PIO4 DMA supported WDMA2 UDMA6 media RPM non-rotating Feature Support Enabled Value Vendor read ahead yes yes write cache yes yes flush cache yes yes overlap no Tagged Command Queuing (TCQ) no no Native Command Queuing (NCQ) yes 32 tags SMART yes yes microcode download yes yes security yes no power management yes yes advanced power management no no automatic acoustic management no no media status notification no no power-up in Standby yes no write-read-verify no no unload no yes free-fall no no Data Set Management (DSM/TRIM) yes DSM - max 512byte blocks yes 8 DSM - deterministic read yes any value Host Protected Area (HPA) yes no 97696368/97696368 HPA - Security no Regards, Kojedzinszky Richard From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 08:52:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0C2CA2AB; Wed, 27 Nov 2013 08:52:52 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9E7F7214B; Wed, 27 Nov 2013 08:52:51 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAR8qjvW027494; Wed, 27 Nov 2013 10:52:45 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAR8qjvW027494 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAR8qjEj027493; Wed, 27 Nov 2013 10:52:45 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 27 Nov 2013 10:52:45 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131127085245.GW59496@kib.kiev.ua> References: <1139579526.21452374.1385509250511.JavaMail.root@uoguelph.ca> <731168702.21452440.1385509273449.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Wmw2osJQsLCWSWY8" Content-Disposition: inline In-Reply-To: <731168702.21452440.1385509273449.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 08:52:52 -0000 --Wmw2osJQsLCWSWY8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Nov 26, 2013 at 06:41:13PM -0500, Rick 
Macklem wrote: > Hi, >=20 > The current NFS client does a synchronous write > to the server when a non-contiguous write to the > same buffer cache block occurs. This is done because > there is a single dirty byte range recorded in the > buf structure. This results in a lot of synchronous > writes for software builds (I believe it is the loader > that loves to write small non-contiguous chunks to > its output file). Some users disable synchronous > writing on the server to improve performance, but > this puts them at risk of data loss when the server > crashes. >=20 > Long ago jhb@ emailed me a small patch that avoided > the synchronous writes by simply making the dirty byte > range a superset of the bytes written. The problem > with doing this is that for a rare (possibly non-existent) > application that writes non-overlapping byte ranges > to the same file from multiple clients concurrently, > some of these writes might get lost by stale data in > the superset of the byte range being written back to > the server. (Crappy, run on sentence, but hopefully > it makes sense;-) >=20 > I created a patch that maintained a list of dirty byte > ranges. It was complicated and I found that the list > often had to be > 100 entries to avoid the synchronous > writes. >=20 > So, I think his solution is preferable, although I've > added a couple of tweaks: > - The synchronous writes (old/current algorithm) is still > used if there has been file locking done on the file. > (I think any app. that writes a file from multiple clients > will/should use file locking.) > - The synchronous writes (old/current algorithm) is used > if a sysctl is set. This will avoid breakage for any app. > (if there is one) that writes a file from multiple clients > without doing file locking. My feeling is that global sysctl is too coarse granularity for the control. IMO the setting should be per-mount. It is not unreasonable to have shared files on one export, and perform private client operations on another. E.g. mailboxes and /usr/obj; I understand that mailboxes case should be covered by the advlock heuristic. But note that if advlock is acquired after some writes, the heuristic breaks. Also, since the feature might cause very hard to diagnose corruption, I think a facility to detect that dirty range coalescing was done would be helpful. It could be a counter printed by mount -v, or even a warning printed once per mount. >=20 > For testing on my very slow single core hardware, I see about > a 10% improvement in kernel build times, but with fewer I/O > RPCs: > Read RPCs Write RPCs > old/current 50K 122K > patched 39K 40K > --> it reduced the Read RPC count by about 20% and cut the > Write RPC count to 1/3rd. > I think jhb@ saw pretty good performance results with his patch. >=20 > Anyhow, the patch is attached and can also be found here: > http://people.freebsd.org/~rmacklem/noncontig-write.patch >=20 > I'd like to invite folks to comment/review/test this patch, > since I think it is ready for head/current. >=20 > Thanks, rick > ps: Kostik, maybe you could look at it. In particular, I am > wondering if I zero'd out the buffer the correct way, via > vfs_bio_bzero_buf()? Both vfs_bio_bzero_buf() and plain bzero() would work for NFS client, since it does not use unmapped buffers. Use of vfs_bio_bzero_buf() is preferred since it makes one less place to find if converting NFS to unmapped i/o. In fact, I do not understand why the zeroing is needed. Could you, please, comment on it ? 
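The dirty byte-range behaviour quoted above can be made concrete with a small standalone sketch. It is an illustration only, not the actual fs/nfsclient/nfs_clbio.c code: the struct and function names are invented, and it only mirrors the idea that each buffer cache block records a single dirty range, that a non-contiguous write to the same block forces the old range to be pushed synchronously, and what coalescing into a superset does instead.

/*
 * Illustration only: one dirty byte range per buffer cache block.
 * "on" is the offset of a new write within the block, "n" its length.
 */
#include <stdio.h>

struct dirty_range {
	int	off;	/* like b_dirtyoff */
	int	end;	/* like b_dirtyend; 0 means nothing is dirty */
};

/* Old/current behaviour: a write that neither overlaps nor abuts the
 * recorded dirty range forces the old range to be written (synchronously)
 * to the server before the new bytes are accepted. */
static int
forces_sync_write(const struct dirty_range *dr, int on, int n)
{
	return (dr->end > 0 && (on > dr->end || on + n < dr->off));
}

/* Coalesced behaviour: grow the single range into a superset of the old
 * and new ranges.  The gap between them is later written back from
 * whatever the cached buffer happens to hold, which is why concurrent
 * writers on other clients that do not use file locking could have their
 * bytes overwritten by stale data. */
static void
coalesce_range(struct dirty_range *dr, int on, int n)
{
	if (dr->end == 0) {
		dr->off = on;
		dr->end = on + n;
		return;
	}
	if (on < dr->off)
		dr->off = on;
	if (on + n > dr->end)
		dr->end = on + n;
}

int
main(void)
{
	struct dirty_range dr = { 100, 200 };

	/* A 50-byte write at offset 1000 is non-contiguous with [100,200). */
	printf("forces sync write: %d\n", forces_sync_write(&dr, 1000, 50));
	coalesce_range(&dr, 1000, 50);
	printf("coalesced range: [%d,%d)\n", dr.off, dr.end);
	return (0);
}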
More, the zeroing seems to overwrite part of the previously valid content of the buffer, unless I mis-read the patch. This breaks the mmap/file consistency, since buffer pages may be mapped by usermode process, and it could observe the 'out of thin air' zeroes before the writing thread progressed to fill the zeroed part with new content. --Wmw2osJQsLCWSWY8 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSlbLcAAoJEJDCuSvBvK1Bp0QQAKhA3TwLJTVpN4PjBac7ZbOX SEKBXMTdIfaoEwiRHmmOFdfz6wryFk3VKan5a0tzsvdpkGwPiu7Yy0pYhH+9PIea WKqHCjJVUJdf/a1f72Lw/T36/1lJ+8/73D1rbAzEH0i6mys0lFrQiDxmojRpTS3l GCtv+p7KKdRyw235/u0zndn/ChtYHL0ehu6oRE5mU+6Pssbu5NIfAdm4uc6BWXeu iyeYPpzkNUBWokA4OJ2YMpC44EvxEUVjRBMUy9DIjxyi1Ab4ohm6eTKp7Q/eguuy BUn8B2TywWY5ZIFce/ocDuWgm9vuSvG4pPmMr3XcisdXyVwF5FOD7Qogz71nO7AV AnpHCe0ytEHY04Re8fjHTeIAkWg78E767D3vIJOHDJUK3Hspfh0NTchFk7s8+1A0 ShYKawm/fXPr4kAZx6OuprgefxyaUc3VoLhNlpzIkryDnD4dWcVlBFM7N568cGSn H5iubTBjm8ID+aoE27Hqda+/TzYeHWwiFmAnW/qFdg3ywQ/wTcriPpXT5j1DCKPA eCLZJFSXGbGYV+OkYSfSIFjdcnmLdgoRxotMUkG9mMGJOUBfJ6Yw4XOKpbd+nmli RvjwBs4nieWGQJKpaxH/Fue9iDzxSby2H9Cesc6rr+ksNQ7p5xiXdI46wEpuPrJW lrjtp0lyTMTDjNWUui9i =T4xD -----END PGP SIGNATURE----- --Wmw2osJQsLCWSWY8-- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 11:58:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 970AE1E8; Wed, 27 Nov 2013 11:58:55 +0000 (UTC) Received: from www2140.sakura.ne.jp (www2140.sakura.ne.jp [IPv6:2403:3a00:101:15:182:48:49:50]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 33BDF2B08; Wed, 27 Nov 2013 11:58:54 +0000 (UTC) Received: from qbear.ru (OrbitaTelecom-Net-91-206-14-Ip-12.kurkino.net.ru [91.206.14.12]) (authenticated bits=0) by www2140.sakura.ne.jp (8.14.5/8.14.5) with ESMTP id rARBwlFt028394; Wed, 27 Nov 2013 20:58:48 +0900 (JST) (envelope-from lino@qbear.ru) Received: from qbear.ru (IOM-40-27 [{#CLIENT_IP}]) by qbear.ru (mailer) with SMTP id 8qZOu71jelqL for ; Wed, 27 Nov 2013 16:00:32 +0400 Date: Wed, 27 Nov 2013 16:00:32 +0400 From: "Alessandra" To: Subject: Messaggio dal forum 188250 Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 11:58:55 -0000 Ciao, freebsd-fs. 
Hai 1 messaggio non letto, si puo vedere in http://next-life.ne.jp/Forma/Conferma.zip?838554323gK4iiIt8FhK2sZKJ0ldGQ From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 12:21:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2E443A2A for ; Wed, 27 Nov 2013 12:21:01 +0000 (UTC) Received: from mail-yh0-x232.google.com (mail-yh0-x232.google.com [IPv6:2607:f8b0:4002:c01::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E95CF2C6A for ; Wed, 27 Nov 2013 12:21:00 +0000 (UTC) Received: by mail-yh0-f50.google.com with SMTP id b6so4851504yha.23 for ; Wed, 27 Nov 2013 04:21:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:subject:date:message-id:user-agent:in-reply-to:references :mime-version:content-transfer-encoding:content-type; bh=NRjOzbCC0nn4SWn7D3FE/ftFRHgCbUHnb3mesOlKdy0=; b=mE+gWqCcY1qrd8j1+NO2ZZK/KwSrlW92MvcMRR/AnrSwTsRWEPsVQb1J6M+kaqTry8 YLWI43fLty9oXSZjWqD75t1k7Euna8Yvia0Sg79aeLhwhdi2xJPEILL5NDTkIUwjeXYR mNpbSjzPnUryy4MryEMe+NvUFi2WUI8+RFW1myutXxD7Ma8Y5gLmpMbO9kjW3kGeClhv he6JEwcRqtRliorcfvVSSrxM77TMG2q2iXCbqtmDxqHmraqhsw81bka6btszeawrix5w jKHkVB3pY2SrOWCRbvRYexC9x+9K8p3eJYqZIuwo9LUjGTwbGwssv8+1mhOozqAvikVO LUrA== X-Received: by 10.236.190.199 with SMTP id e47mr35437660yhn.26.1385554860171; Wed, 27 Nov 2013 04:21:00 -0800 (PST) Received: from blackbeast.local (75-120-65-175.dyn.centurytel.net. [75.120.65.175]) by mx.google.com with ESMTPSA id 48sm89328736yhq.11.2013.11.27.04.20.59 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Nov 2013 04:20:59 -0800 (PST) From: Chuck Burns To: freebsd-fs@freebsd.org Subject: Re: ssd for zfs Date: Wed, 27 Nov 2013 06:20:58 -0600 Message-ID: <1464424.7QQFLu4g0t@blackbeast.local> User-Agent: KMail/4.10.5 (FreeBSD/10.0-BETA3; KDE/4.10.5; amd64; ; ) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 12:21:01 -0000 On Wednesday, November 27, 2013 9:51:06 AM Richard Kojedzinszky wrote: > Dear fs developers, > > Probably this is not the best list to report my issue, but please forward > it to where it should get. > > I bought an SSD for my ZFS filesystem to use it as a ZIL. I've tested it > under linux, and found that it can handle around 1400 random synchronized > write IOPS. Then I placed it into my freebsd 9.2 box, and after attaching > it as a ZIL, my zpool only performs 100 (!) write iops. I've attached it > to an AHCI controller and to an LSI 1068 controller, on both it behaves > the same. So I expect that something in the scsi layer is different, > FreeBSD is handling this device slower, but actually it can handle the > 1400 iops as tested under linux. > > Please give some advice where to go, how to debug, and how to improve > FreeBSD's performance with this drive. 
> > The device is: > # camcontrol identify ada3 > pass4: ATA-8 SATA 2.x device > pass4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes) > > protocol ATA/ATAPI-8 SATA 2.x > device model STEC MACH16 M16SD2S-50UI > firmware revision 00000299 > serial number STM0001680E8 > WWN 5000a7203006f8e5 > media serial number STEC MACH16 M16SD2S-50UI STM00 > cylinders 16383 > heads 15 > sectors/track 63 > sector size logical 512, physical 512, offset 0 > LBA supported 97696368 sectors > LBA48 supported 97696368 sectors > PIO supported PIO4 > DMA supported WDMA2 UDMA6 > media RPM non-rotating > > Feature Support Enabled Value Vendor > read ahead yes yes > write cache yes yes > flush cache yes yes > overlap no > Tagged Command Queuing (TCQ) no no > Native Command Queuing (NCQ) yes 32 tags > SMART yes yes > microcode download yes yes > security yes no > power management yes yes > advanced power management no no > automatic acoustic management no no > media status notification no no > power-up in Standby yes no > write-read-verify no no > unload no yes > free-fall no no > Data Set Management (DSM/TRIM) yes > DSM - max 512byte blocks yes 8 > DSM - deterministic read yes any value > Host Protected Area (HPA) yes no 97696368/97696368 > HPA - Security no > > Regards, > > Kojedzinszky Richard > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" So.. You tested the speed of the ZIL device, and it's fast.. But you're complaining about the speed of the zpool? We can't give you any information, because a ZIL is not the only thing in a zpool. Are the rest of your drives SSDs, or are they mechanical drives? Are they AHCI? Or SAS? Your actual storage devices are likely what is slow here. 
break19 From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 12:32:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D16A6F71 for ; Wed, 27 Nov 2013 12:32:18 +0000 (UTC) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 508D72CF9 for ; Wed, 27 Nov 2013 12:32:17 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id rARCHeCm070607 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 27 Nov 2013 14:17:41 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <5295E2E4.8050506@digsys.bg> Date: Wed, 27 Nov 2013 14:17:40 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Performance difference between UFS and ZFS with NFS References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 12:32:18 -0000 On 25.11.13 21:24, Eric Browning wrote: > Steven, > > I've tried to 4K align these SSDs with gnop but they are currently ashift > 9. Pool layout is just 4 drives in a zfs stripe. I've also tried raidz1 > with no noticeable performance impacts other than a loss of space for > parity. > ashift=9 with most (all?) SSDs is a big no-no! You really should make that pool ashift=12 (at least) and have it 4k aligned (partition). Especially for writes, an properly aligned 4 drive SSD stripe should be way faster. Here is what I get from ashift=12 raidz1 4 SSD drive pool # dd if=/dev/zero of=zero bs=1m count=1k 1024+0 records in 1024+0 records out 1073741824 bytes transferred in 1.817027 secs (590933382 bytes/sec) Unfortunately, your only option is dump / recreate pool / restore. 
(zfs send/receive is an option too -- especially if you have another set of drives) Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 12:37:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 45636195 for ; Wed, 27 Nov 2013 12:37:01 +0000 (UTC) Received: from mail-la0-x234.google.com (mail-la0-x234.google.com [IPv6:2a00:1450:4010:c03::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id CABD22D28 for ; Wed, 27 Nov 2013 12:37:00 +0000 (UTC) Received: by mail-la0-f52.google.com with SMTP id y1so3259367lam.25 for ; Wed, 27 Nov 2013 04:36:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=18JCssRPlfcm/PTZ20B9Pza4Uz5Kmzz9jeHdbCPmhJQ=; b=MeZRdHtq4KfJ6LfFubh3tJCjo3yIqrzXxlnA/0/glNgR+o2sm3Yj8K5qzvwofU2pLp l8ALsj0/bmaB0Z+RQeGBfDRp0XIp7pbJcUfx0cclKQaTYzGSuU5yKe0953srJMMZ4wXE f/dY/qaQo2kfoF7d7TRxd2W6MIkvCfUTfGpFQKDghPJ/97y/iCv3vm4Y0tkDM+n17Ae2 hbMahJGMs9HZzRJvXRFLoutYpa2s8AzByMYDXYHvMm3FrLjW9TaPAhRVaLN3znJpfjDT 68nXUwt8pNukrfQqvJ2+LVBBdrw/gjiHdb5W2gTn0cja+0GknG8taeT/UcK9B78cyjCA EeeA== MIME-Version: 1.0 X-Received: by 10.112.172.137 with SMTP id bc9mr28299988lbc.21.1385555818718; Wed, 27 Nov 2013 04:36:58 -0800 (PST) Received: by 10.112.133.69 with HTTP; Wed, 27 Nov 2013 04:36:58 -0800 (PST) In-Reply-To: References: Date: Wed, 27 Nov 2013 12:36:58 +0000 Message-ID: Subject: Re: ssd for zfs From: Tom Evans To: Richard Kojedzinszky Content-Type: text/plain; charset=UTF-8 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 12:37:01 -0000 On Wed, Nov 27, 2013 at 8:51 AM, Richard Kojedzinszky wrote: > Dear fs developers, > > Probably this is not the best list to report my issue, but please forward it > to where it should get. > > I bought an SSD for my ZFS filesystem to use it as a ZIL. I've tested it > under linux, and found that it can handle around 1400 random synchronized > write IOPS. Then I placed it into my freebsd 9.2 box, and after attaching it > as a ZIL, my zpool only performs 100 (!) write iops. I've attached it to an > AHCI controller and to an LSI 1068 controller, on both it behaves the same. > So I expect that something in the scsi layer is different, FreeBSD is > handling this device slower, but actually it can handle the 1400 iops as > tested under linux. > > Please give some advice where to go, how to debug, and how to improve > FreeBSD's performance with this drive. > The ZIL is only used for synchronous writes. The majority of writes are asynchronous, and the ZIL is not used at all. Plus, a ZIL can only increase iops by bundling writes - if your underlying pool is write saturated already, then a ZIL can't help - any data written to the ZIL has to end up on the pool. Test the SSD by itself under FreeBSD to rule out FreeBSD not working correctly on the SSD (I doubt this though). 
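One way to do the isolated test suggested above is a small O_SYNC write loop against a file on the SSD, which forces every write(2) to reach stable storage much like a ZIL write does. This is only a rough sketch: the target path is a placeholder, the offsets are sequential rather than random, and 4 KB sequential sync writes only give a first-order number to hold against the ~1400 vs. ~100 IOPS figures.

/* syncio.c - crude synchronous-write IOPS check (sketch, placeholder path). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	char buf[4096];
	struct timeval t0, t1;
	double secs;
	int fd, i, n = 2000;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file-on-test-device>\n", argv[0]);
		return (1);
	}
	/* O_SYNC: every write(2) waits until the data is on the media. */
	fd = open(argv[1], O_WRONLY | O_CREAT | O_SYNC, 0600);
	if (fd < 0) {
		perror("open");
		return (1);
	}
	memset(buf, 0xa5, sizeof(buf));
	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++) {
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			return (1);
		}
	}
	gettimeofday(&t1, NULL);
	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%d sync writes in %.2f s = %.0f IOPS\n", n, secs, n / secs);
	close(fd);
	return (0);
}

If this reports numbers in the same ballpark as the Linux test, the drive and the ATA layer are probably fine and the gap is elsewhere; if it also collapses to ~100 IOPS, the problem is below ZFS.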
Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 13:07:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 61F5597C for ; Wed, 27 Nov 2013 13:07:05 +0000 (UTC) Received: from pi.nmdps.net (pi.nmdps.net [IPv6:2a01:be00:10:201:0:80:0:1]) by mx1.freebsd.org (Postfix) with ESMTP id 250FF2EC7 for ; Wed, 27 Nov 2013 13:07:05 +0000 (UTC) Received: from pi.nmdps.net (pi.nmdps.net [109.61.102.5]) (Authenticated sender: krichy@cflinux.hu) by pi.nmdps.net (Postfix) with ESMTPSA id 3BC831255; Wed, 27 Nov 2013 14:07:04 +0100 (CET) Date: Wed, 27 Nov 2013 14:07:01 +0100 (CET) From: Richard Kojedzinszky X-X-Sender: krichy@pi.nmdps.net To: Tom Evans Subject: Re: ssd for zfs In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 13:07:05 -0000 Dear FS devs, After some investigation, it turned out that when I turn write-cache off under linux, the performance drops to 100 on that OS also. But when enabled, 1400 IOPS (synchronous) can be achieved. So I would like to see the same on FreeBSD as well. Using camcontrol shows that the write cache is enabled, but I may assume that something around this is causing the performance degradation. But unfortunately I cannot step forward right now. Regards, Kojedzinszky Richard On Wed, 27 Nov 2013, Tom Evans wrote: > On Wed, Nov 27, 2013 at 8:51 AM, Richard Kojedzinszky wrote: >> Dear fs developers, >> >> Probably this is not the best list to report my issue, but please forward it >> to where it should get. >> >> I bought an SSD for my ZFS filesystem to use it as a ZIL. I've tested it >> under linux, and found that it can handle around 1400 random synchronized >> write IOPS. Then I placed it into my freebsd 9.2 box, and after attaching it >> as a ZIL, my zpool only performs 100 (!) write iops. I've attached it to an >> AHCI controller and to an LSI 1068 controller, on both it behaves the same. >> So I expect that something in the scsi layer is different, FreeBSD is >> handling this device slower, but actually it can handle the 1400 iops as >> tested under linux. >> >> Please give some advice where to go, how to debug, and how to improve >> FreeBSD's performance with this drive. >> > > The ZIL is only used for synchronous writes. The majority of writes > are asynchronous, and the ZIL is not used at all. Plus, a ZIL can only > increase iops by bundling writes - if your underlying pool is write > saturated already, then a ZIL can't help - any data written to the ZIL > has to end up on the pool. > > Test the SSD by itself under FreeBSD to rule out FreeBSD not working > correctly on the SSD (I doubt this though). 
> > Cheers > > Tom > From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 13:09:48 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7C558CF8 for ; Wed, 27 Nov 2013 13:09:48 +0000 (UTC) Received: from mail-pb0-f46.google.com (mail-pb0-f46.google.com [209.85.160.46]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 423342F1B for ; Wed, 27 Nov 2013 13:09:48 +0000 (UTC) Received: by mail-pb0-f46.google.com with SMTP id md12so10421982pbc.33 for ; Wed, 27 Nov 2013 05:09:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=C6ezm3x3xMHBG7GVLPAyki3TcI9zOp6trMV67Bs/9kg=; b=Tv07PeOsNkydpJuyX7K+ZKHbUW9j6MQ4ShTO4Xabo9rdWXA6qLt7ACB2934yVLKvG8 FGoS7nuir8QWInzEBvX0VqNE4rSqx7rNSzlT23xbZQMrWvMtaN/07r1z1nCKoLlESooS mWUeRveHeoQBJhOjZywWAxM9tEeM74+RPBHfXCgOebb34ibgDrgqbKrkY+9RuQI5x2QD zZA/JTtcc8Y+qPbz9V+Mgy8HgfxSIEbKolVE9JNjjA/ie1IVP9u11L+NlAk10oEeIrL5 PlM9LhEyLyJJfpSd2gVNMGhzK1mWCkjyymMcgRHso3PwUbktyzsyU5BBGJNiJv5+WAK0 CyrQ== X-Gm-Message-State: ALoCoQlLmNBL1OLOh5ioAaBIMQhLskwXMy3i5F01DMaOWUErxgZR0YtITL8yfznvc2U501l26ARd MIME-Version: 1.0 X-Received: by 10.68.233.135 with SMTP id tw7mr4988198pbc.112.1385557787275; Wed, 27 Nov 2013 05:09:47 -0800 (PST) Received: by 10.70.102.133 with HTTP; Wed, 27 Nov 2013 05:09:47 -0800 (PST) In-Reply-To: <5295E2E4.8050506@digsys.bg> References: <2103733116.16923158.1384866769683.JavaMail.root@uoguelph.ca> <9F76D61C-EFEB-44B3-9717-D0795789832D@gmail.com> <5969250F-0987-4304-BB95-52C7BAE8D84D@gmail.com> <18391B9C-2FC4-427B-A4B6-1739B3C17498@gmail.com> <5295E2E4.8050506@digsys.bg> Date: Wed, 27 Nov 2013 06:09:47 -0700 Message-ID: Subject: Re: Performance difference between UFS and ZFS with NFS From: Eric Browning To: Daniel Kalchev Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 13:09:48 -0000 Daniel, I did that on my student server and it was still slow. I was using ZFS v28 at the time with freebsd 9.1 what are you currently using to get these results? On Wed, Nov 27, 2013 at 5:17 AM, Daniel Kalchev wrote: > > On 25.11.13 21:24, Eric Browning wrote: > >> Steven, >> >> I've tried to 4K align these SSDs with gnop but they are currently ashift >> 9. Pool layout is just 4 drives in a zfs stripe. I've also tried raidz1 >> with no noticeable performance impacts other than a loss of space for >> parity. >> >> > ashift=9 with most (all?) SSDs is a big no-no! You really should make that > pool ashift=12 (at least) and have it 4k aligned (partition). > > Especially for writes, an properly aligned 4 drive SSD stripe should be > way faster. > > Here is what I get from ashift=12 raidz1 4 SSD drive pool > > # dd if=/dev/zero of=zero bs=1m count=1k > 1024+0 records in > 1024+0 records out > 1073741824 bytes transferred in 1.817027 secs (590933382 bytes/sec) > > Unfortunately, your only option is dump / recreate pool / restore. 
(zfs > send/receive is an option too -- especially if you have another set of > drives) > > > Daniel > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Eric Browning Systems Administrator 801-984-7623 Skaggs Catholic Center Juan Diego Catholic High School Saint John the Baptist Middle Saint John the Baptist Elementary From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 14:14:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D2B7214F for ; Wed, 27 Nov 2013 14:14:24 +0000 (UTC) Received: from pi.nmdps.net (pi.nmdps.net [IPv6:2a01:be00:10:201:0:80:0:1]) by mx1.freebsd.org (Postfix) with ESMTP id 995C222CE for ; Wed, 27 Nov 2013 14:14:24 +0000 (UTC) Received: from pi.nmdps.net (localhost [127.0.0.1]) (Authenticated sender: krichy@cflinux.hu) by pi.nmdps.net (Postfix) with ESMTPSA id 92ACE10E6 for ; Wed, 27 Nov 2013 15:14:16 +0100 (CET) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Wed, 27 Nov 2013 15:14:14 +0100 From: krichy@cflinux.hu To: freebsd-fs@freebsd.org Subject: Fwd: Re: ssd for zfs Message-ID: <074193da2481bdd3ee18cb5de09bd28d@cflinux.hu> X-Sender: krichy@cflinux.hu User-Agent: Roundcube Webmail/0.9.5 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 14:14:24 -0000 -------- Eredeti üzenet -------- Tárgy: Re: ssd for zfs Dátum: 2013-11-27 14:07 Feladó: Richard Kojedzinszky Címzett: Tom Evans Másolat: FreeBSD FS Dear FS devs, After some investigation, it turned out that when I turn write-cache off under linux, the performance drops to 100 on that OS also. But when enabled, 1400 IOPS (synchronous) can be achieved. So I would like to see the same on FreeBSD as well. Using camcontrol shows that the write cache is enabled, but I may assume that something around this is causing the performance degradation. But unfortunately I cannot step forward right now. Regards, Kojedzinszky Richard On Wed, 27 Nov 2013, Tom Evans wrote: > On Wed, Nov 27, 2013 at 8:51 AM, Richard Kojedzinszky > wrote: >> Dear fs developers, >> >> Probably this is not the best list to report my issue, but please >> forward it >> to where it should get. >> >> I bought an SSD for my ZFS filesystem to use it as a ZIL. I've tested >> it >> under linux, and found that it can handle around 1400 random >> synchronized >> write IOPS. Then I placed it into my freebsd 9.2 box, and after >> attaching it >> as a ZIL, my zpool only performs 100 (!) write iops. I've attached it >> to an >> AHCI controller and to an LSI 1068 controller, on both it behaves the >> same. >> So I expect that something in the scsi layer is different, FreeBSD is >> handling this device slower, but actually it can handle the 1400 iops >> as >> tested under linux. >> >> Please give some advice where to go, how to debug, and how to improve >> FreeBSD's performance with this drive. >> > > The ZIL is only used for synchronous writes. The majority of writes > are asynchronous, and the ZIL is not used at all. 
Plus, a ZIL can only > increase iops by bundling writes - if your underlying pool is write > saturated already, then a ZIL can't help - any data written to the ZIL > has to end up on the pool. > > Test the SSD by itself under FreeBSD to rule out FreeBSD not working > correctly on the SSD (I doubt this though). > > Cheers > > Tom > From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 14:28:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 73EC6558; Wed, 27 Nov 2013 14:28:57 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id ED15A2397; Wed, 27 Nov 2013 14:28:56 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAJAAllKDaFve/2dsb2JhbABZFoMpU4J6tz2BM3SCJQEBBAEjBFIFFg4KERkCBFUGLodgBg2vLZEGF45ONAeCa4FIA4lChm+JE5Bjg0cegW4 X-IronPort-AV: E=Sophos;i="4.93,782,1378872000"; d="c'?scan'208";a="73912331" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 27 Nov 2013 09:28:55 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 50E9EB3F7D; Wed, 27 Nov 2013 09:28:55 -0500 (EST) Date: Wed, 27 Nov 2013 09:28:55 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <1694315515.21775878.1385562535320.JavaMail.root@uoguelph.ca> In-Reply-To: <20131127085245.GW59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 14:28:57 -0000 Kostik wrote: > On Tue, Nov 26, 2013 at 06:41:13PM -0500, Rick Macklem wrote: > > Hi, > > > > The current NFS client does a synchronous write > > to the server when a non-contiguous write to the > > same buffer cache block occurs. This is done because > > there is a single dirty byte range recorded in the > > buf structure. This results in a lot of synchronous > > writes for software builds (I believe it is the loader > > that loves to write small non-contiguous chunks to > > its output file). Some users disable synchronous > > writing on the server to improve performance, but > > this puts them at risk of data loss when the server > > crashes. > > > > Long ago jhb@ emailed me a small patch that avoided > > the synchronous writes by simply making the dirty byte > > range a superset of the bytes written. The problem > > with doing this is that for a rare (possibly non-existent) > > application that writes non-overlapping byte ranges > > to the same file from multiple clients concurrently, > > some of these writes might get lost by stale data in > > the superset of the byte range being written back to > > the server. (Crappy, run on sentence, but hopefully > > it makes sense;-) > > > > I created a patch that maintained a list of dirty byte > > ranges. 
It was complicated and I found that the list > > often had to be > 100 entries to avoid the synchronous > > writes. > > > > So, I think his solution is preferable, although I've > > added a couple of tweaks: > > - The synchronous writes (old/current algorithm) is still > > used if there has been file locking done on the file. > > (I think any app. that writes a file from multiple clients > > will/should use file locking.) > > - The synchronous writes (old/current algorithm) is used > > if a sysctl is set. This will avoid breakage for any app. > > (if there is one) that writes a file from multiple clients > > without doing file locking. > My feeling is that global sysctl is too coarse granularity for the > control. IMO the setting should be per-mount. It is not unreasonable > to > have shared files on one export, and perform private client > operations > on another. E.g. mailboxes and /usr/obj; I understand that mailboxes > case should be covered by the advlock heuristic. But note that if > advlock is acquired after some writes, the heuristic breaks. > > Also, since the feature might cause very hard to diagnose corruption, > I > think a facility to detect that dirty range coalescing was done would > be > helpful. It could be a counter printed by mount -v, or even a warning > printed once per mount. > Ok, I can make it a mount option. The only reason I didn't do that is I was trying to avoid "yet another" mount option that many don't know when to use properly. I'll try and come up with a good explanation for the man page. I suppose a mount option makes more sense if it enables the new behaviour, which will avoid any change by default (and no POLA). > > > > For testing on my very slow single core hardware, I see about > > a 10% improvement in kernel build times, but with fewer I/O > > RPCs: > > Read RPCs Write RPCs > > old/current 50K 122K > > patched 39K 40K > > --> it reduced the Read RPC count by about 20% and cut the > > Write RPC count to 1/3rd. > > I think jhb@ saw pretty good performance results with his patch. > > > > Anyhow, the patch is attached and can also be found here: > > http://people.freebsd.org/~rmacklem/noncontig-write.patch > > > > I'd like to invite folks to comment/review/test this patch, > > since I think it is ready for head/current. > > > > Thanks, rick > > ps: Kostik, maybe you could look at it. In particular, I am > > wondering if I zero'd out the buffer the correct way, via > > vfs_bio_bzero_buf()? > Both vfs_bio_bzero_buf() and plain bzero() would work for NFS client, > since it does not use unmapped buffers. Use of vfs_bio_bzero_buf() > is preferred since it makes one less place to find if converting NFS > to unmapped i/o. > > In fact, I do not understand why the zeroing is needed. Could you, > please, > comment on it ? > Well, if an app. writes a file with holes in it, without the bzeroing the hole can end up with garbage in it instead of 0s. See the attached trivial test program I used. > More, the zeroing seems to overwrite part of the previously valid > content of the buffer, unless I mis-read the patch. This breaks > the mmap/file consistency, since buffer pages may be mapped by > usermode > process, and it could observe the 'out of thin air' zeroes before the > writing thread progressed to fill the zeroed part with new content. > Well obcount is set to the offset in the block of the current EOF (np->n_size - lbn * biosize) and the zeroing is from there to the new size of the buffer. 
My intent was to only zero out the chunk that is being "grown" by this write. If that part of the file is already mmap()'d and could have been written by an app. already, I can see a problem, but I don't know how it would be fixed? I'll try and come up with a test case for this. I'll admit I don't know when the file's size (n_size) gets updated when mmap()'d. rick From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 14:36:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 635E9674; Wed, 27 Nov 2013 14:36:44 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 04EC82400; Wed, 27 Nov 2013 14:36:43 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAEUDllKDaFve/2dsb2JhbABZFoMpU4J6tx2BM3SCJQEBBSMEUhsOCgICDRkCWQYuh2YNrymRIIEpjSU0B4JrgUgDiUKQApBjg0cegW4 X-IronPort-AV: E=Sophos;i="4.93,782,1378872000"; d="scan'208";a="72792677" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 27 Nov 2013 09:36:30 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id E5A38B414F; Wed, 27 Nov 2013 09:36:29 -0500 (EST) Date: Wed, 27 Nov 2013 09:36:29 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <1308930474.21788813.1385562989887.JavaMail.root@uoguelph.ca> In-Reply-To: <20131127085245.GW59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 14:36:44 -0000 Kostik wrote: > On Tue, Nov 26, 2013 at 06:41:13PM -0500, Rick Macklem wrote: > > Hi, > > > > The current NFS client does a synchronous write > > to the server when a non-contiguous write to the > > same buffer cache block occurs. This is done because > > there is a single dirty byte range recorded in the > > buf structure. This results in a lot of synchronous > > writes for software builds (I believe it is the loader > > that loves to write small non-contiguous chunks to > > its output file). Some users disable synchronous > > writing on the server to improve performance, but > > this puts them at risk of data loss when the server > > crashes. > > > > Long ago jhb@ emailed me a small patch that avoided > > the synchronous writes by simply making the dirty byte > > range a superset of the bytes written. The problem > > with doing this is that for a rare (possibly non-existent) > > application that writes non-overlapping byte ranges > > to the same file from multiple clients concurrently, > > some of these writes might get lost by stale data in > > the superset of the byte range being written back to > > the server. (Crappy, run on sentence, but hopefully > > it makes sense;-) > > > > I created a patch that maintained a list of dirty byte > > ranges. 
It was complicated and I found that the list > > often had to be > 100 entries to avoid the synchronous > > writes. > > > > So, I think his solution is preferable, although I've > > added a couple of tweaks: > > - The synchronous writes (old/current algorithm) is still > > used if there has been file locking done on the file. > > (I think any app. that writes a file from multiple clients > > will/should use file locking.) > > - The synchronous writes (old/current algorithm) is used > > if a sysctl is set. This will avoid breakage for any app. > > (if there is one) that writes a file from multiple clients > > without doing file locking. > My feeling is that global sysctl is too coarse granularity for the > control. IMO the setting should be per-mount. It is not unreasonable > to > have shared files on one export, and perform private client > operations > on another. E.g. mailboxes and /usr/obj; I understand that mailboxes > case should be covered by the advlock heuristic. But note that if > advlock is acquired after some writes, the heuristic breaks. > > Also, since the feature might cause very hard to diagnose corruption, > I > think a facility to detect that dirty range coalescing was done would > be > helpful. It could be a counter printed by mount -v, or even a warning > printed once per mount. > > > > > For testing on my very slow single core hardware, I see about > > a 10% improvement in kernel build times, but with fewer I/O > > RPCs: > > Read RPCs Write RPCs > > old/current 50K 122K > > patched 39K 40K > > --> it reduced the Read RPC count by about 20% and cut the > > Write RPC count to 1/3rd. > > I think jhb@ saw pretty good performance results with his patch. > > > > Anyhow, the patch is attached and can also be found here: > > http://people.freebsd.org/~rmacklem/noncontig-write.patch > > > > I'd like to invite folks to comment/review/test this patch, > > since I think it is ready for head/current. > > > > Thanks, rick > > ps: Kostik, maybe you could look at it. In particular, I am > > wondering if I zero'd out the buffer the correct way, via > > vfs_bio_bzero_buf()? > Both vfs_bio_bzero_buf() and plain bzero() would work for NFS client, > since it does not use unmapped buffers. Use of vfs_bio_bzero_buf() > is preferred since it makes one less place to find if converting NFS > to unmapped i/o. > > In fact, I do not understand why the zeroing is needed. Could you, > please, > comment on it ? > > More, the zeroing seems to overwrite part of the previously valid > content of the buffer, unless I mis-read the patch. This breaks > the mmap/file consistency, since buffer pages may be mapped by > usermode > process, and it could observe the 'out of thin air' zeroes before the > writing thread progressed to fill the zeroed part with new content. > Oh, I suppose a way to fix the zeroing out of holes would be to zero out pages whenever they are newly allocated to the file, via either the mmap()'d or buffer cache route. This would result in a lot of zeroing to fix a rather obscure (and rare?) case, but I suppose it's just a little CPU overhead. 
rick From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 16:18:57 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F0301CB8; Wed, 27 Nov 2013 16:18:56 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7D1382971; Wed, 27 Nov 2013 16:18:55 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA08413; Wed, 27 Nov 2013 18:18:47 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Vlhp8-0005kS-T4; Wed, 27 Nov 2013 18:18:46 +0200 Message-ID: <52961B2E.1080602@FreeBSD.org> Date: Wed, 27 Nov 2013 18:17:50 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org, FreeBSD Current , freebsd-fs@FreeBSD.org Subject: [HEADSUP!!!] do not upgrade to or past r258632 if you use ZFS + TRIM References: <201311260957.rAQ9vF6d004168@svn.freebsd.org> In-Reply-To: <201311260957.rAQ9vF6d004168@svn.freebsd.org> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 16:18:57 -0000 on 26/11/2013 11:57 Andriy Gapon said the following: > Author: avg > Date: Tue Nov 26 09:57:14 2013 > New Revision: 258632 > URL: http://svnweb.freebsd.org/changeset/base/258632 > > Log: > MFV r255255: 4045 zfs write throttle & i/o scheduler performance work > > illumos/illumos-gate@69962b5647e4a8b9b14998733b765925381b727e > > Please note the following changes: > - zio_ioctl has lost its priority parameter and now TRIM is executed > with 'now' priority > - some knobs are gone and some new knobs are added; not all of them are > exposed as tunables / sysctls yet > > MFC after: 10 days > Sponsored by: HybridCluster [merge] I think that I've introduced a very serious bug when merging this change. Please do not upgrade to this revision if you use ZFS with SSDs and have TRIM support enabled. If you have already upgraded, please disable TRIM support ASAP and roll back to a previous version of kernel and then check integrity of your pools. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 18:31:13 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7EA4E41A; Wed, 27 Nov 2013 18:31:13 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0A8BE200F; Wed, 27 Nov 2013 18:31:12 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rARIV7NP049746; Wed, 27 Nov 2013 20:31:07 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rARIV7NP049746 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rARIV7fZ049745; Wed, 27 Nov 2013 20:31:07 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 27 Nov 2013 20:31:06 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131127183106.GB59496@kib.kiev.ua> References: <20131127085245.GW59496@kib.kiev.ua> <1694315515.21775878.1385562535320.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="sB9dJ6svPyodOVES" Content-Disposition: inline In-Reply-To: <1694315515.21775878.1385562535320.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 18:31:13 -0000 --sB9dJ6svPyodOVES Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Nov 27, 2013 at 09:28:55AM -0500, Rick Macklem wrote: > Kostik wrote: > > On Tue, Nov 26, 2013 at 06:41:13PM -0500, Rick Macklem wrote: > Well, if an app. writes a file with holes in it, without the bzeroing > the hole can end up with garbage in it instead of 0s. See the attached > trivial test program I used. Ok. >=20 > > More, the zeroing seems to overwrite part of the previously valid > > content of the buffer, unless I mis-read the patch. This breaks > > the mmap/file consistency, since buffer pages may be mapped by > > usermode > > process, and it could observe the 'out of thin air' zeroes before the > > writing thread progressed to fill the zeroed part with new content. > >=20 > Well obcount is set to the offset in the block of the current EOF > (np->n_size - lbn * biosize) and the zeroing is from there to the new > size of the buffer. My intent was to only zero out the chunk that is > being "grown" by this write. If that part of the file is already mmap()'d > and could have been written by an app. already, I can see a problem, > but I don't know how it would be fixed? But, if the old size of the file is not biosize-aligned, than lbn*biosize is less than the old EOF ? 
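To put concrete numbers on this exchange (the figures are illustrative only, not taken from the thread): suppose biosize = 32768, the old file size np->n_size = 40000, and the application writes n = 100 bytes at offset 50000, leaving a hole beyond the old EOF. Then:

    lbn     = 50000 / 32768          = 1      (block being written; equals n_size / biosize)
    on      = 50000 - lbn * 32768    = 17232  (write start within the block)
    obcount = n_size - lbn * biosize = 40000 - 32768 = 7232
              (bytes 0..7231 of the block hold old, valid file data)
    bcount  = on + n                 = 17232 + 100 = 17332

The patch zeroes [obcount, bcount) = [7232, 17332) and then copies the new 100 bytes into [17232, 17332), so the hole between the old EOF and the new data reads back as zeroes; the zeroing never starts below obcount, so data before the old EOF in that block is not touched. The point being debated is whether a process that has the last block mmap()'d could transiently observe those freshly zeroed bytes before the new data is copied in.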
>=20 > I'll try and come up with a test case for this. I'll admit I don't > know when the file's size (n_size) gets updated when mmap()'d. Sorry, I do not understand the question. mmap(2) itself does not change file size. But if mmaped area includes the last page, I still think that the situation I described before is possible. --sB9dJ6svPyodOVES Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSljppAAoJEJDCuSvBvK1BtKYP/3Cl/ogeCvuuCoQJl3/Hqott MgxvK+efYn/BjqVNdnZg/QPES6DV/B9GdE4pwwKmU6dU2NErwVqN2CArdFJ1INjB Q3ygLtk+dq+FYwAqC2yKoX0vkqlAGM8bxjPVZWdXYv2MBA1Tm4FXlBs9Mc7yy5lI 738eCjimLwRbgBMuRmmfUcVB60vMTRJuGfn67GaKHcY2ayKQ6ly/xrBexn23bG26 p31ZP4jarYbApwtKZgMga+FKz2e+O1gPEazEiMCNs7Hjosgq07kl5djNfwXvNMbv Y/ACYH5gSa+7RCO8OS3Q9PUnJc8b/0HyBtsx58T9Lk2vgjIs8e3+Ne4ki0sF+sMt iO3d7/5ZEKIRJzgzXYlk+Hw5S/+r+POLx3F3SuJzo12fnA9RdnDGjYhZz1qmpL/E wtC90ZMpQuu5Jrqv3UtT5v3QlUc5wH/kIDbCZbmmAwUsQ9qixLvyBlOlCXdGNFmT dapOmVm3QxDmkKW+I2BAwXYB1sz1fxnfcqHHojrxM3SXZEoTlYBPCUSKzcRJ/Oyu I3p1JNHDDD0vxA3dvRpJS6BTphtLxLOwPDu7S+GcDBNBIK+A4n0PgoEPJtSUaK6v GrfiIm8iP+kG8d5Oypu5cB5tlTpoyiX/6Hg52pjwXezh7pbFYOgPKtlvztvhBWM/ N+YJ1bRTCa2kq5ix1eoe =T025 -----END PGP SIGNATURE----- --sB9dJ6svPyodOVES-- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 22:50:56 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2056193B; Wed, 27 Nov 2013 22:50:56 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id C57C42120; Wed, 27 Nov 2013 22:50:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEADN2llKDaFve/2dsb2JhbABZhBKCerVJgTd0giUBAQUjBFIbDgoCAg0ZAlkGiBSvB5BuF4EpjSU0B4JrgUgDiUKgZYNHHoFu X-IronPort-AV: E=Sophos;i="4.93,785,1378872000"; d="scan'208";a="72997841" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 27 Nov 2013 17:50:48 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9B5ADB4038; Wed, 27 Nov 2013 17:50:48 -0500 (EST) Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <661293549.22251424.1385592648623.JavaMail.root@uoguelph.ca> In-Reply-To: <20131127183106.GB59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 22:50:56 -0000 Kostik wrote: > On Wed, Nov 27, 2013 at 09:28:55AM -0500, Rick Macklem wrote: > > Kostik wrote: > > > On Tue, Nov 26, 2013 at 06:41:13PM -0500, Rick Macklem wrote: > > Well, if an app. writes a file with holes in it, without the > > bzeroing > > the hole can end up with garbage in it instead of 0s. See the > > attached > > trivial test program I used. > Ok. 
> > > > > > More, the zeroing seems to overwrite part of the previously valid > > > content of the buffer, unless I mis-read the patch. This breaks > > > the mmap/file consistency, since buffer pages may be mapped by > > > usermode > > > process, and it could observe the 'out of thin air' zeroes before > > > the > > > writing thread progressed to fill the zeroed part with new > > > content. > > > > > Well obcount is set to the offset in the block of the current EOF > > (np->n_size - lbn * biosize) and the zeroing is from there to the > > new > > size of the buffer. My intent was to only zero out the chunk that > > is > > being "grown" by this write. If that part of the file is already > > mmap()'d > > and could have been written by an app. already, I can see a > > problem, > > but I don't know how it would be fixed? > But, if the old size of the file is not biosize-aligned, than > lbn*biosize > is less than the old EOF ? > Yes, at this point np->n_size is the old size and this calculates the offset within the file's last block of the old EOF and puts it in obcount. bcount is the offset of the new EOF that will result after the write. > > > > I'll try and come up with a test case for this. I'll admit I don't > > know when the file's size (n_size) gets updated when mmap()'d. > Sorry, I do not understand the question. mmap(2) itself does not > change > file size. But if mmaped area includes the last page, I still think > that > the situation I described before is possible. > > Yes, I'll need to look at this. If it is a problem, all I can think of is bzeroing all new pages when they're allocated to the buffer cache. Thanks for looking at it, rick ps: Btw, jhb@'s patch didn't have the bzeroing in it. From owner-freebsd-fs@FreeBSD.ORG Wed Nov 27 23:20:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 602CAEB7 for ; Wed, 27 Nov 2013 23:20:19 +0000 (UTC) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4179231C for ; Wed, 27 Nov 2013 23:20:19 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id rARNKEKQ045789; Wed, 27 Nov 2013 15:20:14 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201311272320.rARNKEKQ045789@chez.mckusick.com> To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes In-reply-to: <661293549.22251424.1385592648623.JavaMail.root@uoguelph.ca> Date: Wed, 27 Nov 2013 15:20:14 -0800 From: Kirk McKusick Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Nov 2013 23:20:19 -0000 > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > From: Rick Macklem > To: Konstantin Belousov > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > Kostik wrote: >> Sorry, I do not understand the question. mmap(2) itself does not change >> file size. But if mmaped area includes the last page, I still think >> that the situation I described before is possible. > > Yes, I'll need to look at this. 
If it is a problem, all I can think of > is bzeroing all new pages when they're allocated to the buffer cache. > > Thanks for looking at it, rick > ps: Btw, jhb@'s patch didn't have the bzeroing in it. The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS for this problem and it killed write performance of the filesystem by nearly half. We corrected this by only doing the bzero when the file is mmap'ed which helped things considerably (since most files being written are not also bmap'ed). Kirk From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 00:19:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A72F8988 for ; Thu, 28 Nov 2013 00:19:55 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 69F83852 for ; Thu, 28 Nov 2013 00:19:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAE2LllKDaFve/2dsb2JhbABZhBKCerVJgTd0giUBAQUjVhsRAwECAQICDRkCIy4IBhOHbwMPrwiIXQ2IAheBKYtIgV00B4JrgUgDiUKMZ45FhTmDRx6Bbg X-IronPort-AV: E=Sophos;i="4.93,786,1378872000"; d="scan'208";a="74149290" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 27 Nov 2013 19:19:47 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C1B49B3F4E; Wed, 27 Nov 2013 19:19:47 -0500 (EST) Date: Wed, 27 Nov 2013 19:19:47 -0500 (EST) From: Rick Macklem To: Kirk McKusick Message-ID: <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca> In-Reply-To: <201311272320.rARNKEKQ045789@chez.mckusick.com> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 00:19:55 -0000 Kirk wrote: > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > From: Rick Macklem > > To: Konstantin Belousov > > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > > > Kostik wrote: > >> Sorry, I do not understand the question. mmap(2) itself does not > >> change > >> file size. But if mmaped area includes the last page, I still > >> think > >> that the situation I described before is possible. > > > > Yes, I'll need to look at this. If it is a problem, all I can think > > of > > is bzeroing all new pages when they're allocated to the buffer > > cache. > > > > Thanks for looking at it, rick > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. > > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS > for this problem and it killed write performance of the filesystem > by nearly half. We corrected this by only doing the bzero when the > file is mmap'ed which helped things considerably (since most files > being written are not also bmap'ed). > > Kirk > Ok, thanks. I've been trying to reproduce the problem over NFS and haven't been able to break my patch. 
I was using the attached trivial test program and would simply make a copy of the source file (529 bytes) to test on. I got the same results both locally and over NFS: - built without -DWRITEIT, the setting of a value after EOF would be lost, because nothing grew the file from 529 bytes to over 4080bytes. - built with -DWRITEIT, both the 'A' and 'B' are in the result, since my patch bzeros the grown segment in the write(2) syscall. - If I move the write (code in #ifdef WRITEIT) to after the "*cp" of the mapped page, the 'A' assigned to "*cp" gets lost for both UFS and NFS. Is this correct behaviour? If it is correct behaviour, I can't see how the patch is broken, but if you think it might still be, I'll look at doing what Kirk suggests, which is bzeroing all new buffer cache pages when the file is mmap()d. Thanks for the help, rick From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 00:21:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5B6459FC for ; Thu, 28 Nov 2013 00:21:26 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1C7FE880 for ; Thu, 28 Nov 2013 00:21:25 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAHKLllKDaFve/2dsb2JhbABZhBKCerVJgTd0giUBAQUjVgwPEQMBAgERGQIEHy4IBhOHbwMPrwiIXQ2IAheMcYFdGQoRBwaCZYFIA4lChm+FeI5FhTmDRx6Bbg X-IronPort-AV: E=Sophos;i="4.93,786,1378872000"; d="c'?scan'208";a="73025463" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 27 Nov 2013 19:21:24 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A27CFB40B4; Wed, 27 Nov 2013 19:21:24 -0500 (EST) Date: Wed, 27 Nov 2013 19:21:24 -0500 (EST) From: Rick Macklem To: Kirk McKusick Message-ID: <66384815.22292870.1385598084659.JavaMail.root@uoguelph.ca> In-Reply-To: <201311272320.rARNKEKQ045789@chez.mckusick.com> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 00:21:26 -0000 Oops, I did my usual and forgot to attach the test program. Here it is, rick ----- Original Message ----- > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > From: Rick Macklem > > To: Konstantin Belousov > > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > > > Kostik wrote: > >> Sorry, I do not understand the question. mmap(2) itself does not > >> change > >> file size. But if mmaped area includes the last page, I still > >> think > >> that the situation I described before is possible. > > > > Yes, I'll need to look at this. If it is a problem, all I can think > > of > > is bzeroing all new pages when they're allocated to the buffer > > cache. > > > > Thanks for looking at it, rick > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. 
> > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS > for this problem and it killed write performance of the filesystem > by nearly half. We corrected this by only doing the bzero when the > file is mmap'ed which helped things considerably (since most files > being written are not also bmap'ed). > > Kirk > From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 00:25:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6B346ACE for ; Thu, 28 Nov 2013 00:25:46 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 2E58F8A8 for ; Thu, 28 Nov 2013 00:25:45 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAGWMllKDaFve/2dsb2JhbABZhBKCerVJgTd0giUBAQUjVgwPEQMBAgECAg0ZAiMuCAYTh28DD68KiFwNiAIXgSmLSIFAAQEbNAcGgmWBSAOJQoxnjkWFOYNHHoE1OQ X-IronPort-AV: E=Sophos;i="4.93,786,1378872000"; d="scan'208";a="74150570" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 27 Nov 2013 19:25:44 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 486B7B3F41; Wed, 27 Nov 2013 19:25:45 -0500 (EST) Date: Wed, 27 Nov 2013 19:25:45 -0500 (EST) From: Rick Macklem To: Kirk McKusick Message-ID: <1525534748.22295409.1385598345290.JavaMail.root@uoguelph.ca> In-Reply-To: <201311272320.rARNKEKQ045789@chez.mckusick.com> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 00:25:46 -0000 Ok, the attchment seemed to get stripped off. Here's the code. Apologies if the mail system I use eats the whitespace, rick #include #include #include #include main(int argc, char *argv[]) { int x, i; char *cp; printf("before open\n"); x = open(argv[1], O_RDWR | O_CREAT, 0666); printf("aft open=%d\n", x); cp = mmap((void *)0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, x, 0); #ifdef WRITEIT lseek(x, 4090, SEEK_SET); write(x, "B", 1); printf("wrote B at 4090\n"); #endif if (cp != NULL) { *(cp + 4080) = 'A'; if (msync(cp, 0, MS_SYNC) < 0) printf("msync failed\n"); } close(x); } ----- Original Message ----- > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > From: Rick Macklem > > To: Konstantin Belousov > > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > > > Kostik wrote: > >> Sorry, I do not understand the question. mmap(2) itself does not > >> change > >> file size. But if mmaped area includes the last page, I still > >> think > >> that the situation I described before is possible. > > > > Yes, I'll need to look at this. If it is a problem, all I can think > > of > > is bzeroing all new pages when they're allocated to the buffer > > cache. > > > > Thanks for looking at it, rick > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. 
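For readability, here is the test program quoted earlier in this message re-indented so it compiles; the mail formatting ate both the whitespace and the #include file names (the original shows four bare includes), so the five headers below are a guess at what the calls need rather than what was actually posted:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(int argc, char *argv[])
    {
            int x, i;
            char *cp;

            printf("before open\n");
            x = open(argv[1], O_RDWR | O_CREAT, 0666);
            printf("aft open=%d\n", x);
            cp = mmap((void *)0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, x, 0);
    #ifdef WRITEIT
            /* Grow the file to 4091 bytes with a plain write(2). */
            lseek(x, 4090, SEEK_SET);
            write(x, "B", 1);
            printf("wrote B at 4090\n");
    #endif
            if (cp != NULL) {
                    /* Store through the mapping at 4080, past the old EOF. */
                    *(cp + 4080) = 'A';
                    if (msync(cp, 0, MS_SYNC) < 0)
                            printf("msync failed\n");
            }
            close(x);
    }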
> > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS > for this problem and it killed write performance of the filesystem > by nearly half. We corrected this by only doing the bzero when the > file is mmap'ed which helped things considerably (since most files > being written are not also bmap'ed). > > Kirk > From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 07:18:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 00CDC858 for ; Thu, 28 Nov 2013 07:18:26 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7F2C519FA for ; Thu, 28 Nov 2013 07:18:26 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAS7IMDt012306; Thu, 28 Nov 2013 09:18:22 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAS7IMDt012306 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAS7ILol012305; Thu, 28 Nov 2013 09:18:21 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 28 Nov 2013 09:18:21 +0200 From: Konstantin Belousov To: Kirk McKusick Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131128071821.GH59496@kib.kiev.ua> References: <661293549.22251424.1385592648623.JavaMail.root@uoguelph.ca> <201311272320.rARNKEKQ045789@chez.mckusick.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GI+6NRUBHXI5NCMC" Content-Disposition: inline In-Reply-To: <201311272320.rARNKEKQ045789@chez.mckusick.com> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 07:18:27 -0000 --GI+6NRUBHXI5NCMC Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Nov 27, 2013 at 03:20:14PM -0800, Kirk McKusick wrote: > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS > for this problem and it killed write performance of the filesystem > by nearly half. We corrected this by only doing the bzero when the > file is mmap'ed which helped things considerably (since most files > being written are not also bmap'ed). I am not sure that I follow. For UFS, leaving any part of the buffer with undefined garbage would cause the garbage to appear on the next mmap(2), since page in is implemented as translation of the file offsets into disk offsets and than reading disk blocks. The read always fetch full page. UFS cannot know if the file would be mapped sometime in future, or after the reboot. In fact, UFS is quite plentiful WRT zeroing buffers on write. It is easy to see almost all places where it is done, by searching for BA_CLRBUF flag for UFS_BALLOC(). 
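For instance, assuming the stock source layout under /usr/src, the call sites are easy to list:

    grep -rn BA_CLRBUF /usr/src/sys/ufs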
UFS does perform the optimization of _trying_ to not clear newly allocated buffer on write if uio covers the whole buffer range. Still, on error it falls back to clearing, which is performed by vfs_bio_clrbuf() call in ffs_write(). --GI+6NRUBHXI5NCMC Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSlu48AAoJEJDCuSvBvK1BGvAP+wdCrnmlXGV5S8Rc6S8aYqP3 +gLf+MmOOkEkbUttXgt6UU56pDZ0gm9NPGwTwdIT5YkjJ6gAvnA3jwFJ1Sqf8NC7 x1cKrXsy5/GIpKfa/mtFrLjVQk+dqYCNWA3tvqPHIQlhEu9sV2G8SpFqu1OcTUpu 3Kzi8/nC75Il2WsjaU3zk7NTiFdyg0iaCg5vm0IlA/P9GF721jEMpue+ccyvgpXQ FZr9OO2Wq8rKNuGE/dzV33Tj67/yRgkLxqBjZz84g7BXslS3dhVazfrA/UnNJRJr J7oZV7L9ot2t5RgAdxWLbDYsLc4OpjtzFQ6WjxyRNVZgrOGfSwIJbIqSDflRlVsm NbJLoaBoDvVUj4Okv/PhMTxcicFFfwDqO8IhLa6ETvMEt+9pqjJDgx9WmBBfTkLX xAfm91wes3J+fOaRozLSh8Gwg2S7s/VOFDibp2f7Zh97AMGAzpB5vXecEAd6S4fu 7Xk3TCVrWrYCGMFhH7ftFUFKY1rEteN584xcNWEjeFoILt+K+DctuVBx6MkQQ7ud zAUcVAWypvzhdM86N7l+Fl+9RsGuxheVdFvbRnhcefXD3hzqojcPWY/0MLKw4h5e NGR7bea73Xr5R5wt2ldTCFZcTpSBMHGj4/n0Qk7ILCd6yi79fJSIakFalgPWWkhY YjilSDQU+SiHxg/6kb/z =D01p -----END PGP SIGNATURE----- --GI+6NRUBHXI5NCMC-- From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 07:24:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1B8CA9E8 for ; Thu, 28 Nov 2013 07:24:45 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 993C31A44 for ; Thu, 28 Nov 2013 07:24:44 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAS7OZXQ013368; Thu, 28 Nov 2013 09:24:35 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAS7OZXQ013368 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAS7OZ1t013367; Thu, 28 Nov 2013 09:24:35 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 28 Nov 2013 09:24:35 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131128072435.GI59496@kib.kiev.ua> References: <201311272320.rARNKEKQ045789@chez.mckusick.com> <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ffNf1iMHjKeYni8r" Content-Disposition: inline In-Reply-To: <1476192898.22291791.1385597987782.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Kirk McKusick , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 07:24:45 -0000 --ffNf1iMHjKeYni8r Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote: > Kirk 
wrote: > > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > > From: Rick Macklem > > > To: Konstantin Belousov > > > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > >=20 > > > Kostik wrote: > > >> Sorry, I do not understand the question. mmap(2) itself does not > > >> change > > >> file size. But if mmaped area includes the last page, I still > > >> think > > >> that the situation I described before is possible. > > >=20 > > > Yes, I'll need to look at this. If it is a problem, all I can think > > > of > > > is bzeroing all new pages when they're allocated to the buffer > > > cache. > > >=20 > > > Thanks for looking at it, rick > > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. > >=20 > > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS > > for this problem and it killed write performance of the filesystem > > by nearly half. We corrected this by only doing the bzero when the > > file is mmap'ed which helped things considerably (since most files > > being written are not also bmap'ed). > >=20 > > Kirk > >=20 > Ok, thanks. I've been trying to reproduce the problem over NFS and > haven't been able to break my patch. I was using the attached trivial > test program and would simply make a copy of the source file (529 bytes) > to test on. I got the same results both locally and over NFS: > - built without -DWRITEIT, the setting of a value after EOF would be > lost, because nothing grew the file from 529 bytes to over 4080bytes. > - built with -DWRITEIT, both the 'A' and 'B' are in the result, since > my patch bzeros the grown segment in the write(2) syscall. >=20 > - If I move the write (code in #ifdef WRITEIT) to after the "*cp" > of the mapped page, the 'A' assigned to "*cp" gets lost for > both UFS and NFS. > Is this correct behaviour? >=20 > If it is correct behaviour, I can't see how the patch is broken, but > if you think it might still be, I'll look at doing what Kirk suggests, > which is bzeroing all new buffer cache pages when the file is mmap()d. >=20 Replying there, since text description is more informative than the code. You cannot get the situation I described, with single process. You should have a writer in one thread, and reader through the mmaped area in another. Even than, the race window is thin. Let me describe the issue which could exist one more time: Thread A (writer) issued write(2). The kernel does two things: 1. zeroes part of the last buffer of the affected file. 2. kernel uiomove()s the write data into the buffer b_data. Now, assume that thread B has the same file mmaped somewhere, and accesses the page of the buffer after the [1] but before [2]. Than, it would see zeroes instead of the valid content. I said that this breaks write/mmap consistency, since thread B can see a content in the file which was never written there. The condition is transient, it self-repairs after thread A passes point 2. 
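To make the interleaving concrete, below is a rough sketch of the kind of two-thread test that could expose such a window on a file over an NFS mount, if the zeroing really did reach back before the old EOF: one thread repeatedly grows the file with write(2) while the other polls, through a shared mapping, a byte that held valid data the whole time. The sizes, loop count and file handling are arbitrary; this is only an illustration of the claimed race, not a test from this thread (build with: cc -pthread -o growrace growrace.c).

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define OLDSZ   600     /* old EOF, deliberately not block-aligned */
    #define NEWSZ   4000    /* EOF after each growing write */

    static volatile int done;
    static char *map;

    /* Thread B: poll a byte that is valid before, during and after each grow. */
    static void *
    watcher(void *arg)
    {
            long bad = 0;

            (void)arg;
            while (!done)
                    if (map[100] != 'X')    /* a transient zero would show up here */
                            bad++;
            printf("bad reads observed: %ld\n", bad);
            return (NULL);
    }

    int
    main(int argc, char *argv[])
    {
            char base[OLDSZ], tail[NEWSZ - OLDSZ];
            pthread_t tid;
            int fd, i;

            if (argc != 2)
                    errx(1, "usage: growrace <file on the NFS mount>");
            memset(base, 'X', sizeof(base));
            memset(tail, 'Y', sizeof(tail));
            fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
            if (fd < 0)
                    err(1, "open");
            write(fd, base, sizeof(base));          /* file now ends at OLDSZ */
            map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
            if (map == MAP_FAILED)
                    err(1, "mmap");
            pthread_create(&tid, NULL, watcher, NULL);
            /* Thread A: shrink back to OLDSZ, then grow again with write(2). */
            for (i = 0; i < 100000; i++) {
                    ftruncate(fd, OLDSZ);
                    pwrite(fd, tail, sizeof(tail), OLDSZ);
            }
            done = 1;
            pthread_join(tid, NULL);
            close(fd);
            return (0);
    }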
--ffNf1iMHjKeYni8r Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSlu+yAAoJEJDCuSvBvK1BXZwP/1satyMCYlVo26vy0f9PjEon t0wTKkCUVAniDsMAOgLQgoM1HmTrXjpHdkojn3CPutXEjpt6aXHOj1qz4B0DbAkM +spqzWLCne7WGfYZ/1ckIkOnApyk8/X/vecyVml1y54FzAl+qHTZAwGq8F2WfO8N WF4sIg4AWXWKoPm1inzhpAARWXysdoPCMtDrNxBXIdBGjkcblvy33oz9gdPioViI MYvw7/wK5xVETHfpWOv5WT9loPvKOz8Not08L6pP0X3NIubbfHwyWFqSPNRo9OXK qb28TUcX303DGJzQm8KpD1S7c/MSS1AM2q3U+7jpB3FrgVMzcdWMHNsMXKL5jZoZ TPRAtGnFzidZ4B4XQOK1HWONqSUjSKR564TzlaRk5SUJlgqiHDx7zcLYMi0J4u2T Af2BXKkz5epeS7qrxLNP/J4zADDaLId5gxcu+y7V0UNHHSSzG+5hbSqsmFHsW6SI X95dL9ZcMC/iDvyDnGMoFqadFuI5GZM8ZoFyhHImcDxC1CL2xrP7yK1mXTy2dmZu bgl8dt1f6wqZDObWfGqVXJKfeZ19eaqAZ43ZlY0A16cUoyHeDbQeKuOew7+qqkIF 0mmJHaVA2kOreZIZWPh3OcV2EMaB5Yzp8zznVcHW702x+406KaPcT+9F+G+lFeKu CTIJhCgTFZtXYR0Ii7TH =DwbK -----END PGP SIGNATURE----- --ffNf1iMHjKeYni8r-- From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 08:53:10 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1B0EAD71; Thu, 28 Nov 2013 08:53:10 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 96B851E81; Thu, 28 Nov 2013 08:53:08 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA19929; Thu, 28 Nov 2013 10:53:06 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1VlxLN-0009d1-Ql; Thu, 28 Nov 2013 10:53:05 +0200 Message-ID: <52970439.3070406@FreeBSD.org> Date: Thu, 28 Nov 2013 10:52:09 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org, FreeBSD Current , freebsd-fs@FreeBSD.org Subject: Re: [HEADSUP!!!] do not upgrade to or past r258632 if you use ZFS + TRIM References: <201311260957.rAQ9vF6d004168@svn.freebsd.org> <52961B2E.1080602@FreeBSD.org> In-Reply-To: <52961B2E.1080602@FreeBSD.org> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 08:53:10 -0000 on 27/11/2013 18:17 Andriy Gapon said the following: > on 26/11/2013 11:57 Andriy Gapon said the following: >> Author: avg >> Date: Tue Nov 26 09:57:14 2013 >> New Revision: 258632 >> URL: http://svnweb.freebsd.org/changeset/base/258632 >> >> Log: >> MFV r255255: 4045 zfs write throttle & i/o scheduler performance work >> >> illumos/illumos-gate@69962b5647e4a8b9b14998733b765925381b727e >> >> Please note the following changes: >> - zio_ioctl has lost its priority parameter and now TRIM is executed >> with 'now' priority >> - some knobs are gone and some new knobs are added; not all of them are >> exposed as tunables / sysctls yet >> >> MFC after: 10 days >> Sponsored by: HybridCluster [merge] > > I think that I've introduced a very serious bug when merging this change. 
> Please do not upgrade to this revision if you use ZFS with SSDs and have TRIM > support enabled. > > If you have already upgraded, please disable TRIM support ASAP and roll back to > a previous version of kernel and then check integrity of your pools. The issue should be fixed in r258704. Yes, the bug was that simple and that serious. My apologies to all who were affected. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 12:51:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D69A7862 for ; Thu, 28 Nov 2013 12:51:49 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 73F141C35 for ; Thu, 28 Nov 2013 12:51:48 +0000 (UTC) Received: from r2d2 ([82.69.179.241]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006859340.msg for ; Thu, 28 Nov 2013 12:51:41 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Nov 2013 12:51:41 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.179.241 X-Return-Path: prvs=1044cd13f0=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <03CB6E4FFD454EBA9E416813A5187C9F@multiplay.co.uk> From: "Steven Hartland" To: "Richard Kojedzinszky" , References: Subject: Re: ssd for zfs Date: Thu, 28 Nov 2013 12:51:36 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 12:51:50 -0000 FreeBSD has TRIM support and some disks have very slow TRIM support. Try disabling TRIM support in /boot/loader.conf vfs.zfs.trim.enabled=0 Also check to see if your seeing ashift 12 as I suspect thats actually a 4k not 512b device. This can be confirmed by looking at the boot output. Regards Steve ----- Original Message ----- From: "Richard Kojedzinszky" To: Sent: Wednesday, November 27, 2013 8:51 AM Subject: ssd for zfs > Dear fs developers, > > Probably this is not the best list to report my issue, but please forward > it to where it should get. > > I bought an SSD for my ZFS filesystem to use it as a ZIL. I've tested it > under linux, and found that it can handle around 1400 random synchronized > write IOPS. Then I placed it into my freebsd 9.2 box, and after attaching > it as a ZIL, my zpool only performs 100 (!) write iops. I've attached it > to an AHCI controller and to an LSI 1068 controller, on both it behaves > the same. So I expect that something in the scsi layer is different, > FreeBSD is handling this device slower, but actually it can handle the > 1400 iops as tested under linux. > > Please give some advice where to go, how to debug, and how to improve > FreeBSD's performance with this drive. 
> > The device is: > # camcontrol identify ada3 > pass4: ATA-8 SATA 2.x device > pass4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes) > > protocol ATA/ATAPI-8 SATA 2.x > device model STEC MACH16 M16SD2S-50UI > firmware revision 00000299 > serial number STM0001680E8 > WWN 5000a7203006f8e5 > media serial number STEC MACH16 M16SD2S-50UI STM00 > cylinders 16383 > heads 15 > sectors/track 63 > sector size logical 512, physical 512, offset 0 > LBA supported 97696368 sectors > LBA48 supported 97696368 sectors > PIO supported PIO4 > DMA supported WDMA2 UDMA6 > media RPM non-rotating > > Feature Support Enabled Value Vendor > read ahead yes yes > write cache yes yes > flush cache yes yes > overlap no > Tagged Command Queuing (TCQ) no no > Native Command Queuing (NCQ) yes 32 tags > SMART yes yes > microcode download yes yes > security yes no > power management yes yes > advanced power management no no > automatic acoustic management no no > media status notification no no > power-up in Standby yes no > write-read-verify no no > unload no yes > free-fall no no > Data Set Management (DSM/TRIM) yes > DSM - max 512byte blocks yes 8 > DSM - deterministic read yes any value > Host Protected Area (HPA) yes no 97696368/97696368 > HPA - Security no > > Regards, > > Kojedzinszky Richard > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
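A quick way to check the two suggestions above on the 9.2 box is roughly the following; "tank" is a placeholder pool name, and this assumes zdb is available and the pool is imported:

    # is TRIM currently enabled?
    sysctl vfs.zfs.trim.enabled

    # what ashift did the pool's vdevs get?
    zdb -C tank | grep ashift

    # watch whether the device spends its time on deletes (TRIM) rather than writes
    gstat -d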
From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 14:15:39 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BBC849E8 for ; Thu, 28 Nov 2013 14:15:39 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7F1AB104E for ; Thu, 28 Nov 2013 14:15:39 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAHhPl1KDaFve/2dsb2JhbABZhBKCerVNgTB0giUBAQUjBFIbDgMDAQIBAgINGQIjLggGE4dvAw+vWIhkDYgCF4Epi06BXTQHgmuBSAOJQoxnjkWFOYNHHoFu X-IronPort-AV: E=Sophos;i="4.93,791,1378872000"; d="scan'208";a="73229240" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 28 Nov 2013 09:15:37 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id D6232B408D; Thu, 28 Nov 2013 09:15:37 -0500 (EST) Date: Thu, 28 Nov 2013 09:15:37 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <820090900.22521963.1385648137865.JavaMail.root@uoguelph.ca> In-Reply-To: <20131128072435.GI59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Kirk McKusick , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 14:15:39 -0000 Kostik wrote: > On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote: > > Kirk wrote: > > > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > > > From: Rick Macklem > > > > To: Konstantin Belousov > > > > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > > > > > > > Kostik wrote: > > > >> Sorry, I do not understand the question. mmap(2) itself does > > > >> not > > > >> change > > > >> file size. But if mmaped area includes the last page, I still > > > >> think > > > >> that the situation I described before is possible. > > > > > > > > Yes, I'll need to look at this. If it is a problem, all I can > > > > think > > > > of > > > > is bzeroing all new pages when they're allocated to the buffer > > > > cache. > > > > > > > > Thanks for looking at it, rick > > > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. > > > > > > The ``fix'' of bzero'ing every buffer cache page was made to > > > UFS/FFS > > > for this problem and it killed write performance of the > > > filesystem > > > by nearly half. We corrected this by only doing the bzero when > > > the > > > file is mmap'ed which helped things considerably (since most > > > files > > > being written are not also bmap'ed). > > > > > > Kirk > > > > > Ok, thanks. I've been trying to reproduce the problem over NFS and > > haven't been able to break my patch. I was using the attached > > trivial > > test program and would simply make a copy of the source file (529 > > bytes) > > to test on. I got the same results both locally and over NFS: > > - built without -DWRITEIT, the setting of a value after EOF would > > be > > lost, because nothing grew the file from 529 bytes to over > > 4080bytes. 
> > - built with -DWRITEIT, both the 'A' and 'B' are in the result, > > since > > my patch bzeros the grown segment in the write(2) syscall. > > > > - If I move the write (code in #ifdef WRITEIT) to after the "*cp" > > of the mapped page, the 'A' assigned to "*cp" gets lost for > > both UFS and NFS. > > Is this correct behaviour? > > > > If it is correct behaviour, I can't see how the patch is broken, > > but > > if you think it might still be, I'll look at doing what Kirk > > suggests, > > which is bzeroing all new buffer cache pages when the file is > > mmap()d. > > > Replying there, since text description is more informative than the > code. > > You cannot get the situation I described, with single process. > You should have a writer in one thread, and reader through the mmaped > area in another. Even than, the race window is thin. > > Let me describe the issue which could exist one more time: > > Thread A (writer) issued write(2). The kernel does two things: > 1. zeroes part of the last buffer of the affected file. > 2. kernel uiomove()s the write data into the buffer b_data. > > Now, assume that thread B has the same file mmaped somewhere, and > accesses the page of the buffer after the [1] but before [2]. Than, > it would see zeroes instead of the valid content. > > I said that this breaks write/mmap consistency, since thread B can > see a content in the file which was never written there. The > condition > is transient, it self-repairs after thread A passes point 2. > Ok, but doesn't that exist now? Without the patch, when the Thread A (writer) appends to the file, the np->n_size is grown and vnode_pager_setsize() is called with the new, larger size, followed by an allocbuf(). { at around line# 1066,1073 of nfs_clbio.c. Same code exists in the old client and was cloned. } Then vn_io_fault_uiomove() is called at line#1195. Without the patch, Thread B would get whatever garbage is in the page(s) if it accesses the latter (just grown) part of the buffer between 1 and 2, would it not? With the patch, it will most likely get 0s instead of garbage, if it accesses the latter (just grown) part of the buffer between 1. and 2. There is actually a short time just after the vnode_pager_setsize() and before the bzeroing, that it would still get garbage. (Note that the patch only zeros out the part of the last page that was "grown" by the write in 1. that is past the previous EOF. If I bzeroing data that was before the previous EOF, I can see the breakage, but the patch doesn't do that. Unless I've written buggy code, of course, but I think it's correct.) So, I don't see any difference between the unpatched and patched version? 
rick From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 19:19:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 13168760 for ; Thu, 28 Nov 2013 19:19:04 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0F8551FF3 for ; Thu, 28 Nov 2013 19:19:02 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rASJIppo069406; Thu, 28 Nov 2013 21:18:51 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rASJIppo069406 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rASJIpw8069405; Thu, 28 Nov 2013 21:18:51 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 28 Nov 2013 21:18:51 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131128191851.GM59496@kib.kiev.ua> References: <20131128072435.GI59496@kib.kiev.ua> <820090900.22521963.1385648137865.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rNM3Wo3yRXFr03ac" Content-Disposition: inline In-Reply-To: <820090900.22521963.1385648137865.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Kirk McKusick , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 19:19:04 -0000 --rNM3Wo3yRXFr03ac Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 28, 2013 at 09:15:37AM -0500, Rick Macklem wrote: > Kostik wrote: > > On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote: > > > Kirk wrote: > > > > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > > > > From: Rick Macklem > > > > > To: Konstantin Belousov > > > > > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > > > >=20 > > > > > Kostik wrote: > > > > >> Sorry, I do not understand the question. mmap(2) itself does > > > > >> not > > > > >> change > > > > >> file size. But if mmaped area includes the last page, I still > > > > >> think > > > > >> that the situation I described before is possible. > > > > >=20 > > > > > Yes, I'll need to look at this. If it is a problem, all I can > > > > > think > > > > > of > > > > > is bzeroing all new pages when they're allocated to the buffer > > > > > cache. > > > > >=20 > > > > > Thanks for looking at it, rick > > > > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. > > > >=20 > > > > The ``fix'' of bzero'ing every buffer cache page was made to > > > > UFS/FFS > > > > for this problem and it killed write performance of the > > > > filesystem > > > > by nearly half. 
We corrected this by only doing the bzero when > > > > the > > > > file is mmap'ed which helped things considerably (since most > > > > files > > > > being written are not also bmap'ed). > > > >=20 > > > > Kirk > > > >=20 > > > Ok, thanks. I've been trying to reproduce the problem over NFS and > > > haven't been able to break my patch. I was using the attached > > > trivial > > > test program and would simply make a copy of the source file (529 > > > bytes) > > > to test on. I got the same results both locally and over NFS: > > > - built without -DWRITEIT, the setting of a value after EOF would > > > be > > > lost, because nothing grew the file from 529 bytes to over > > > 4080bytes. > > > - built with -DWRITEIT, both the 'A' and 'B' are in the result, > > > since > > > my patch bzeros the grown segment in the write(2) syscall. > > >=20 > > > - If I move the write (code in #ifdef WRITEIT) to after the "*cp" > > > of the mapped page, the 'A' assigned to "*cp" gets lost for > > > both UFS and NFS. > > > Is this correct behaviour? > > >=20 > > > If it is correct behaviour, I can't see how the patch is broken, > > > but > > > if you think it might still be, I'll look at doing what Kirk > > > suggests, > > > which is bzeroing all new buffer cache pages when the file is > > > mmap()d. > > >=20 > > Replying there, since text description is more informative than the > > code. > >=20 > > You cannot get the situation I described, with single process. > > You should have a writer in one thread, and reader through the mmaped > > area in another. Even than, the race window is thin. > >=20 > > Let me describe the issue which could exist one more time: > >=20 > > Thread A (writer) issued write(2). The kernel does two things: > > 1. zeroes part of the last buffer of the affected file. > > 2. kernel uiomove()s the write data into the buffer b_data. > >=20 > > Now, assume that thread B has the same file mmaped somewhere, and > > accesses the page of the buffer after the [1] but before [2]. Than, > > it would see zeroes instead of the valid content. > >=20 > > I said that this breaks write/mmap consistency, since thread B can > > see a content in the file which was never written there. The > > condition > > is transient, it self-repairs after thread A passes point 2. > >=20 > Ok, but doesn't that exist now? >=20 > Without the patch, when the Thread A (writer) appends to the file, > the np->n_size is grown and vnode_pager_setsize() is called with > the new, larger size, followed by an allocbuf(). > { at around line# 1066,1073 of nfs_clbio.c. Same code exists in the > old client and was cloned. } > Then vn_io_fault_uiomove() is called at line#1195. >=20 > Without the patch, Thread B would get whatever garbage is in the > page(s) if it accesses the latter (just grown) part of the buffer between > 1 and 2, would it not? >=20 > With the patch, it will most likely get 0s instead of garbage, if > it accesses the latter (just grown) part of the buffer between 1. and 2. > There is actually a short time just after the vnode_pager_setsize() > and before the bzeroing, that it would still get garbage. > (Note that the patch only zeros out the part of the last page that > was "grown" by the write in 1. that is past the previous EOF. > If I bzeroing data that was before the previous EOF, I can see > the breakage, but the patch doesn't do that. Unless I've written > buggy code, of course, but I think it's correct.) >=20 > So, I don't see any difference between the unpatched and patched > version? 
AFAIU, you bzero the whole region in the buffer which is subject to i/o, not the new extent. E.g., if the file size mod biosize is non-zero, but the current write starts on the buffer boundary, than then range from the buffer start to the old EOF is zeroed for some moment, before being overwritten by user data. --rNM3Wo3yRXFr03ac Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSl5caAAoJEJDCuSvBvK1BDxoQAJ3izrlf3y3xpjTglmsFXGJ0 AbLjaiptVr35dLPK0cPMuia/zScjmOPx1gZCjg4AZVuQ6dPMqsadIIhORDu+ekGF r2RaTPW570P0a0IDoAkZwpNVhBf8WnJ9SEvt9N43H6DaEVVoRSGtwqSlV/vbk10o qC4B//QhRlNpQa4drP1AJRd1DPCKOzjTmPnWeSYJiYmp8x2aZUWfb3N6OykG8m/f o9O4Y20YrIg9gAdw207niunH3umzanrV40TDzH+8BGs66mUrH2rmwNZbbdAW8SEE fZkOgG76Ixc8OhHfT1GT6xQbfr925lh9aa+ODAMQ4SDzWvfCRGA0H24iLCXe+LCy GVBeTCO1B7y5O271a3BunxzvuRuBeo0fELh0wPQo365iM2WJ9Nk8C4tCCuxqNNlQ aCBymFehLKekCUTK3plhMoxmy5tnf3y8s9rkZdg9j24ATGUrGKFAMZZ69AEeyGS7 qFV5BDGsAWWyXg3xVmZhgPVcMhs1KyX4kkoAloAbY/nkdtvmFSdE2NpFGZrNdm8Q zBeJRV1P4g6mKuSXHF8FDLsqYMxdtwhsRtXJYOpdvqxU6wDEwEdR4LeIdxMFU9n7 YY+81si4GPZe0cxE5Yz0Az4obyhLYbj+7DQ5nWBWgWjuOLvuuaE3Eq0IZqrmSkEm Hbe6CVwMdQPDoxdGuFiq =EGIt -----END PGP SIGNATURE----- --rNM3Wo3yRXFr03ac-- From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 21:57:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6D96D9D6 for ; Thu, 28 Nov 2013 21:57:31 +0000 (UTC) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) by mx1.freebsd.org (Postfix) with ESMTP id 2A5121732 for ; Thu, 28 Nov 2013 21:57:30 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Vm9dP-000Kg5-Gi for freebsd-fs@freebsd.org; Fri, 29 Nov 2013 02:00:31 +0400 Date: Fri, 29 Nov 2013 02:00:31 +0400 From: Slawa Olhovchenkov To: freebsd-fs@freebsd.org Subject: zfs unneceary massive IO writing Message-ID: <20131128220031.GA77254@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 21:57:31 -0000 I have host with 8-STABLE and ZFS. ZFS setup is very simple: zfs_www 880G 665G 215G 75% 1.00x ONLINE - aacd0s3 880G 665G 215G - No L2ARC, no ZIL, copies=1 After creating small file (or append some bytes to file) I see very large write to disk (400 write ops, total up to 1MB). How I can discover this problem? dtrace on zio_create? How I can diagnose source of IO operation? 
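One way to get at the "dtrace on zio_create?" idea without a full script is an fbt one-liner, assuming the kernel has DTrace support and zio_create has not been inlined away; it counts ZIO creations by the process and kernel stack that triggered them, which usually points at the code path generating the extra writes:

    dtrace -n 'fbt::zio_create:entry { @[execname, stack()] = count(); }'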
From owner-freebsd-fs@FreeBSD.ORG Thu Nov 28 23:50:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 106F5FDD for ; Thu, 28 Nov 2013 23:50:23 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9D2C11BE7 for ; Thu, 28 Nov 2013 23:50:22 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAGnVl1KDaFve/2dsb2JhbABRCIQSgnq1W4ExdIIlAQEEASMEUhsOAwMBAgECAg0ZAiMuCAYTh28DCQavYIhaDYgCF4Epi06BOyI0B4JrgUgDiUKMZ45FhTmDRx6Bbg X-IronPort-AV: E=Sophos;i="4.93,793,1378872000"; d="scan'208";a="74569951" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 28 Nov 2013 18:50:20 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AFBE2B3EB3; Thu, 28 Nov 2013 18:50:20 -0500 (EST) Date: Thu, 28 Nov 2013 18:50:20 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <1355437347.22943634.1385682620696.JavaMail.root@uoguelph.ca> In-Reply-To: <20131128191851.GM59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Kirk McKusick , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Nov 2013 23:50:23 -0000 Kostik wrote: > On Thu, Nov 28, 2013 at 09:15:37AM -0500, Rick Macklem wrote: > > Kostik wrote: > > > On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote: > > > > Kirk wrote: > > > > > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST) > > > > > > From: Rick Macklem > > > > > > To: Konstantin Belousov > > > > > > Subject: Re: RFC: NFS client patch to reduce sychronous > > > > > > writes > > > > > > > > > > > > Kostik wrote: > > > > > >> Sorry, I do not understand the question. mmap(2) itself > > > > > >> does > > > > > >> not > > > > > >> change > > > > > >> file size. But if mmaped area includes the last page, I > > > > > >> still > > > > > >> think > > > > > >> that the situation I described before is possible. > > > > > > > > > > > > Yes, I'll need to look at this. If it is a problem, all I > > > > > > can > > > > > > think > > > > > > of > > > > > > is bzeroing all new pages when they're allocated to the > > > > > > buffer > > > > > > cache. > > > > > > > > > > > > Thanks for looking at it, rick > > > > > > ps: Btw, jhb@'s patch didn't have the bzeroing in it. > > > > > > > > > > The ``fix'' of bzero'ing every buffer cache page was made to > > > > > UFS/FFS > > > > > for this problem and it killed write performance of the > > > > > filesystem > > > > > by nearly half. We corrected this by only doing the bzero > > > > > when > > > > > the > > > > > file is mmap'ed which helped things considerably (since most > > > > > files > > > > > being written are not also bmap'ed). > > > > > > > > > > Kirk > > > > > > > > > Ok, thanks. 
I've been trying to reproduce the problem over NFS > > > > and > > > > haven't been able to break my patch. I was using the attached > > > > trivial > > > > test program and would simply make a copy of the source file > > > > (529 > > > > bytes) > > > > to test on. I got the same results both locally and over NFS: > > > > - built without -DWRITEIT, the setting of a value after EOF > > > > would > > > > be > > > > lost, because nothing grew the file from 529 bytes to over > > > > 4080bytes. > > > > - built with -DWRITEIT, both the 'A' and 'B' are in the result, > > > > since > > > > my patch bzeros the grown segment in the write(2) syscall. > > > > > > > > - If I move the write (code in #ifdef WRITEIT) to after the > > > > "*cp" > > > > of the mapped page, the 'A' assigned to "*cp" gets lost for > > > > both UFS and NFS. > > > > Is this correct behaviour? > > > > > > > > If it is correct behaviour, I can't see how the patch is > > > > broken, > > > > but > > > > if you think it might still be, I'll look at doing what Kirk > > > > suggests, > > > > which is bzeroing all new buffer cache pages when the file is > > > > mmap()d. > > > > > > > Replying there, since text description is more informative than > > > the > > > code. > > > > > > You cannot get the situation I described, with single process. > > > You should have a writer in one thread, and reader through the > > > mmaped > > > area in another. Even than, the race window is thin. > > > > > > Let me describe the issue which could exist one more time: > > > > > > Thread A (writer) issued write(2). The kernel does two things: > > > 1. zeroes part of the last buffer of the affected file. > > > 2. kernel uiomove()s the write data into the buffer b_data. > > > > > > Now, assume that thread B has the same file mmaped somewhere, and > > > accesses the page of the buffer after the [1] but before [2]. > > > Than, > > > it would see zeroes instead of the valid content. > > > > > > I said that this breaks write/mmap consistency, since thread B > > > can > > > see a content in the file which was never written there. The > > > condition > > > is transient, it self-repairs after thread A passes point 2. > > > > > Ok, but doesn't that exist now? > > > > Without the patch, when the Thread A (writer) appends to the file, > > the np->n_size is grown and vnode_pager_setsize() is called with > > the new, larger size, followed by an allocbuf(). > > { at around line# 1066,1073 of nfs_clbio.c. Same code exists in the > > old client and was cloned. } > > Then vn_io_fault_uiomove() is called at line#1195. > > > > Without the patch, Thread B would get whatever garbage is in the > > page(s) if it accesses the latter (just grown) part of the buffer > > between > > 1 and 2, would it not? > > > > With the patch, it will most likely get 0s instead of garbage, if > > it accesses the latter (just grown) part of the buffer between 1. > > and 2. > > There is actually a short time just after the vnode_pager_setsize() > > and before the bzeroing, that it would still get garbage. > > (Note that the patch only zeros out the part of the last page that > > was "grown" by the write in 1. that is past the previous EOF. > > If I bzeroing data that was before the previous EOF, I can see > > the breakage, but the patch doesn't do that. Unless I've written > > buggy code, of course, but I think it's correct.) > > > > So, I don't see any difference between the unpatched and patched > > version? 
> AFAIU, you bzero the whole region in the buffer which is subject to > i/o, > not the new extent. E.g., if the file size mod biosize is non-zero, > but the current write starts on the buffer boundary, than then > range from the buffer start to the old EOF is zeroed for some moment, > before being overwritten by user data. > Why does the fact the write starts on the buffer boundary affect this? obcount is == file size mod biosize and that is where the bzero'ng starts. (Now, it is true that if the current write starts on the buffer boundary, then bzero'ng isn't necessary, but that doesn't make the bzero start at the buffer boundary and not the old EOF. The patch I'm currently testing wouldn't do a bzero at all for the case where "on" (the offset within the buffer isn't after the old EOF), which will be true when the write starts at the buffer boundary, or "on" == 0 if you prefer. But this is an optimization and doesn't change where the bzero starts.) Ok, here's the code snippet with some annotations: if ((uio->uio_offset == np->n_size || (noncontig_write != 0 && lbn == (np->n_size / biosize) && uio->uio_offset + n > np->n_size)) && n) { *** In this code block we know: lbn == np->n_size / biosize (for the case of uio_offset == n_size, it must be) lbn is the last buffer cache block for the file the size of the file is growing (ie. n_size will get larger) mtx_unlock(&np->n_mtx); /* * Get the buffer (in its pre-append state to maintain * B_CACHE if it was previously set). Resize the * nfsnode after we have locked the buffer to prevent * readers from reading garbage. */ obcount = np->n_size - (lbn * biosize); *** This sets obcount to the byte offset within the buffer cache block of EOF before it is grown by this write (or file size mod biosize, if you prefer) bp = nfs_getcacheblk(vp, lbn, obcount, td); if (bp != NULL) { long save; mtx_lock(&np->n_mtx); np->n_size = uio->uio_offset + n; np->n_flag |= NMODIFIED; vnode_pager_setsize(vp, np->n_size); *** n_size is now grown to the new EOF for after the write. mtx_unlock(&np->n_mtx); save = bp->b_flags & B_CACHE; bcount = on + n; allocbuf(bp, bcount); bp->b_flags |= save; if (noncontig_write != 0 && bcount > obcount) vfs_bio_bzero_buf(bp, obcount, bcount - obcount); *** This zeros bytes from "obcount" (the offset of the old EOF) to "bcount" (which is the offset of EOF after the write). Sorry, but I can't see why this would bzero from the start of this write, since this write starts at "on" calculated from uio_offset, whereas "obcount" is calculated from the pre-write value of n_size. Or am I just brain farting when I look at this code? --> For the patch I am now testing, I have changed the above 3 lines to: if (noncontig_write != 0 && on > obcount) vfs_bio_bzero_buf(bp, obcount, on - obcount); I realized that using "bcount" (the new EOF offset) was overkill (although I believe harmless) since the bytes from offset "on" to offset "bcount" will be written soon. By using "on" (the start of the this write) as the end of the bzero'ng range, I minimize the bytes being bzero'd and avoid the call completely when the new write isn't leaving a gap after the old EOF, such as the case you mentioned. 
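To make the offsets in the annotated snippet concrete, here is a small userland sketch that computes the same quantities for one hypothetical append; the values chosen for biosize, n_size, uio_offset and n are made up purely for illustration, and the variable names simply mirror the ones discussed above:

/*
 * Prints the block offsets discussed above for one made-up example;
 * this is only arithmetic, not the kernel code.
 */
#include <stdio.h>

int
main(void)
{
	long biosize = 32768;		/* example NFS buffer size */
	long n_size = 100000;		/* file size before the write (old EOF) */
	long uio_offset = 102000;	/* the write starts past the old EOF */
	long n = 500;			/* number of bytes being written */

	long lbn = n_size / biosize;		/* last (growing) block */
	long obcount = n_size - lbn * biosize;	/* old EOF offset within the block */
	long on = uio_offset - lbn * biosize;	/* write start offset within the block */
	long bcount = on + n;			/* block size after the write */

	printf("lbn=%ld obcount=%ld on=%ld bcount=%ld\n",
	    lbn, obcount, on, bcount);
	if (on > obcount)
		printf("zero [%ld, %ld): the gap between the old EOF and "
		    "the start of the write\n", obcount, on);
	else
		printf("write starts at or before the old EOF, nothing to zero\n");
	return (0);
}

With these numbers the old EOF falls at byte 1696 of the last block, the write starts at byte 3696, and only the 2000-byte gap between them is zeroed; everything from "on" up to "bcount" is about to be overwritten by the uiomove anyway, which is exactly the point of using "on" rather than "bcount" as the end of the range.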
rick From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 06:00:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5486E97A for ; Fri, 29 Nov 2013 06:00:40 +0000 (UTC) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 113C21F87 for ; Fri, 29 Nov 2013 06:00:40 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id rAT60aff046648; Thu, 28 Nov 2013 22:00:36 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201311290600.rAT60aff046648@chez.mckusick.com> To: Konstantin Belousov Subject: Re: RFC: NFS client patch to reduce sychronous writes In-reply-to: <20131128071821.GH59496@kib.kiev.ua> Date: Thu, 28 Nov 2013 22:00:36 -0800 From: Kirk McKusick Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 06:00:40 -0000 > Date: Thu, 28 Nov 2013 09:18:21 +0200 > From: Konstantin Belousov > To: Kirk McKusick > Cc: Rick Macklem , FreeBSD FS > Subject: Re: RFC: NFS client patch to reduce sychronous writes > > On Wed, Nov 27, 2013 at 03:20:14PM -0800, Kirk McKusick wrote: >> The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS >> for this problem and it killed write performance of the filesystem >> by nearly half. We corrected this by only doing the bzero when the >> file is mmap'ed which helped things considerably (since most files >> being written are not also bmap'ed). > > I am not sure that I follow. > > For UFS, leaving any part of the buffer with undefined garbage would > cause the garbage to appear on the next mmap(2), since page in is > implemented as translation of the file offsets into disk offsets and > than reading disk blocks. The read always fetch full page. UFS cannot > know if the file would be mapped sometime in future, or after the > reboot. > > In fact, UFS is quite plentiful WRT zeroing buffers on write. It is easy > to see almost all places where it is done, by searching for BA_CLRBUF > flag for UFS_BALLOC(). UFS does perform the optimization of _trying_ to > not clear newly allocated buffer on write if uio covers the whole buffer > range. Still, on error it falls back to clearing, which is performed by > vfs_bio_clrbuf() call in ffs_write(). You are entirely correct in your analysis. The original "fix" was to always clear every buffer even when it was being completely filled (which is the most common case). I changed the filling completely case to first try the copyin and only zeroing it when the copyin fails. Making that change nearly doubled the the speed of bulk writes. 
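The shape of the optimization being described here can be illustrated in a few lines of userland C: the block is cleared only on the uncommon path, where the incoming data cannot initialize every byte of it, so the common full-block write never pays for the bzero. This is just a sketch of the idea (a copyin fault has no clean userland analogue, so a partial fill stands in for it), not the actual ffs_write()/UFS_BALLOC() code:

/*
 * Illustration only: clear the block just in the fallback path where
 * the incoming data cannot initialize every byte of it.
 */
#include <stdio.h>
#include <string.h>

#define BLKSIZE 4096

static void
fill_block(char *blk, const char *src, size_t srclen)
{
	if (srclen >= BLKSIZE) {
		/* Common case: the copy itself initializes the whole block. */
		memcpy(blk, src, BLKSIZE);
		return;
	}
	/* Fallback: clear first so no stale bytes are exposed, then copy. */
	memset(blk, 0, BLKSIZE);
	memcpy(blk, src, srclen);
}

int
main(void)
{
	static char blk[BLKSIZE];
	char small[] = "short write";

	fill_block(blk, small, sizeof(small));	/* takes the clearing path */
	printf("%s (trailing byte %d)\n", blk, blk[BLKSIZE - 1]);
	return (0);
}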
~Kirk From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 07:55:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 05F2346E for ; Fri, 29 Nov 2013 07:55:25 +0000 (UTC) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id CB17D14BA for ; Fri, 29 Nov 2013 07:55:24 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1VmIv5-0006ii-E7 for freebsd-fs@freebsd.org; Thu, 28 Nov 2013 23:55:23 -0800 Date: Thu, 28 Nov 2013 23:55:23 -0800 (PST) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1385711723411-5864713.post@n5.nabble.com> In-Reply-To: References: <1380880223590-5848720.post@n5.nabble.com> <524EEE40.5060208@gmail.com> <1381060830753-5849397.post@n5.nabble.com> <20131006123350.GJ3287@sludge.elizium.za.net> <1381683172518-5851539.post@n5.nabble.com> Subject: Re: Questions re swap-on-zfs MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 07:55:25 -0000 @Ronald Klop: I decided to give gconcat a go (since the coder is Pawel Dawidek from the ZFS side). Creating the concat device and mounting as swap etc is very easy: # gconcat label -v swap /dev/zvol/tank0/swap /dev/ada0p1 However, I can not find any reference to setting priority or prefer parameters on the devices being used by /dev/concat/swap. Is this possible, or did you mean create 2 concat devices, then set pri for each in /etc/fstab? Regards. ----- FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS -- View this message in context: http://freebsd.1045724.n5.nabble.com/Questions-re-swap-on-zfs-tp5848720p5864713.html Sent from the freebsd-fs mailing list archive at Nabble.com. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 11:06:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 500ACFA0 for ; Fri, 29 Nov 2013 11:06:46 +0000 (UTC) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 33F561EA9 for ; Fri, 29 Nov 2013 11:06:45 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1VmLuG-00056L-5Q for freebsd-fs@freebsd.org; Fri, 29 Nov 2013 03:06:44 -0800 Date: Fri, 29 Nov 2013 03:06:44 -0800 (PST) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1385723204127-5864731.post@n5.nabble.com> Subject: ZFS Trim MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 11:06:46 -0000 Hello. 
I just recently enabled trim for my SSD + zpool: /boot/loader.conf => vfs.zfs.trim_disable=0 The SSD hosts root zpool, plus ZIL of second pool hosted on spindle HDD. Previously I was using the configuration without trim code in loader.conf. sysctl kstat.zfs.misc.zio_trim shows: kstat.zfs.misc.zio_trim.unsupported: 106 kstat.zfs.misc.zio_trim.failed: 0 2 questions: * What does the *trim.unsupported* mean for my setup * Do I need to run code to fix this (like zpool clean zil) in single-user mode? Thank you. ----- FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-Trim-tp5864731.html Sent from the freebsd-fs mailing list archive at Nabble.com. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 12:40:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8E2FBCE5 for ; Fri, 29 Nov 2013 12:40:50 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2BA05146A for ; Fri, 29 Nov 2013 12:40:49 +0000 (UTC) Received: from r2d2 ([82.69.179.241]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006870922.msg for ; Fri, 29 Nov 2013 12:40:47 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 29 Nov 2013 12:40:47 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.179.241 X-Return-Path: prvs=10451ca0e0=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <67B55680E8B7491D9ED6F36ED9E28A49@multiplay.co.uk> From: "Steven Hartland" To: "Beeblebrox" , References: <1385723204127-5864731.post@n5.nabble.com> Subject: Re: ZFS Trim Date: Fri, 29 Nov 2013 12:40:40 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 12:40:50 -0000 Thats an old sysctl so if your using a recent version you'll want vfs.zfs.trim.enabled = 0 Unsupported means the device reported the "trim" request was rejected by the underlying device, this will typically happen for HD's. Regard Steve ----- Original Message ----- From: "Beeblebrox" To: Sent: Friday, November 29, 2013 11:06 AM Subject: ZFS Trim > Hello. > I just recently enabled trim for my SSD + zpool: > /boot/loader.conf => vfs.zfs.trim_disable=0 > The SSD hosts root zpool, plus ZIL of second pool hosted on spindle HDD. > Previously I was using the configuration without trim code in loader.conf. > > sysctl kstat.zfs.misc.zio_trim shows: > kstat.zfs.misc.zio_trim.unsupported: 106 > kstat.zfs.misc.zio_trim.failed: 0 > > 2 questions: > * What does the *trim.unsupported* mean for my setup > * Do I need to run code to fix this (like zpool clean zil) in single-user > mode? > > Thank you. 
> > > > ----- > FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS > -- > View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-Trim-tp5864731.html > Sent from the freebsd-fs mailing list archive at Nabble.com. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 19:32:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 423412F8 for ; Fri, 29 Nov 2013 19:32:59 +0000 (UTC) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 234BF1B47 for ; Fri, 29 Nov 2013 19:32:59 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1VmTo9-0000F5-Hy for freebsd-fs@freebsd.org; Fri, 29 Nov 2013 11:32:57 -0800 Date: Fri, 29 Nov 2013 11:32:57 -0800 (PST) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1385753577419-5864830.post@n5.nabble.com> In-Reply-To: <67B55680E8B7491D9ED6F36ED9E28A49@multiplay.co.uk> References: <1385723204127-5864731.post@n5.nabble.com> <67B55680E8B7491D9ED6F36ED9E28A49@multiplay.co.uk> Subject: Re: ZFS Trim MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 19:32:59 -0000 >> Unsupported means the device reported the "trim" request was rejected >> by the underlying device, this will typically happen for HD's. So no "reset" of the SSD partitions is needed I assume. ----- FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-Trim-tp5864731p5864830.html Sent from the freebsd-fs mailing list archive at Nabble.com. 
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 20:30:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B8F81548 for ; Fri, 29 Nov 2013 20:30:08 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 54DD11DE6 for ; Fri, 29 Nov 2013 20:30:08 +0000 (UTC) Received: from r2d2 ([82.69.179.241]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006875033.msg for ; Fri, 29 Nov 2013 20:29:58 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 29 Nov 2013 20:29:58 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.179.241 X-Return-Path: prvs=10451ca0e0=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <09C05F6302494B61A52C1C5065C719CE@multiplay.co.uk> From: "Steven Hartland" To: "Beeblebrox" , References: <1385723204127-5864731.post@n5.nabble.com> <67B55680E8B7491D9ED6F36ED9E28A49@multiplay.co.uk> <1385753577419-5864830.post@n5.nabble.com> Subject: Re: ZFS Trim Date: Fri, 29 Nov 2013 20:29:54 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 20:30:08 -0000 ----- Original Message ----- From: "Beeblebrox" To: Sent: Friday, November 29, 2013 7:32 PM Subject: Re: ZFS Trim >>> Unsupported means the device reported the "trim" request was rejected >>> by the underlying device, this will typically happen for HD's. > > So no "reset" of the SSD partitions is needed I assume. Sorry I don't know what you mean by that, could you expand on that? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 21:00:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 35F708EE for ; Fri, 29 Nov 2013 21:00:45 +0000 (UTC) Received: from mail-oa0-x22f.google.com (mail-oa0-x22f.google.com [IPv6:2607:f8b0:4003:c02::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id ED5731F67 for ; Fri, 29 Nov 2013 21:00:44 +0000 (UTC) Received: by mail-oa0-f47.google.com with SMTP id k1so10799626oag.20 for ; Fri, 29 Nov 2013 13:00:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=GJxs7JUM2ePejIjs8LC+8WVkZ+QuXb7toZQmm/kkMqg=; b=Vb3fjeU8vlTunaEiEIvKMYZigoDzgvp4s1bTB1yJ9PB40aeiBhkUts4KADyPS/a1D0 ZgW5PZfoIfE0NF4OP5cXT8/lr8LVkDQuBU2vL1OLTcTZS7CakOI7ZQ+MhvejoQLPBuLY iX3J9u5ER2D274G51khlE5ij6InNJUSa8Q3UmfOF/RfkTLfB1YkyBjEwT1X3EY7kkZXR X7LTH4piaWU4cj2avo6mepk1d1vxmQwB9XfaDhJmDyKxtzpVeJpA3yDkVJDHOant2nVn 46V7qcDMYI7ccUv0S+EwYmZySNrTVEiVU/eC5VkYbd0TNesgdwEFvWXkc2aPRWHKys9L 6usA== MIME-Version: 1.0 X-Received: by 10.182.70.5 with SMTP id i5mr44456273obu.8.1385758843013; Fri, 29 Nov 2013 13:00:43 -0800 (PST) Received: by 10.76.132.9 with HTTP; Fri, 29 Nov 2013 13:00:42 -0800 (PST) In-Reply-To: <1385711723411-5864713.post@n5.nabble.com> References: <1380880223590-5848720.post@n5.nabble.com> <524EEE40.5060208@gmail.com> <1381060830753-5849397.post@n5.nabble.com> <20131006123350.GJ3287@sludge.elizium.za.net> <1381683172518-5851539.post@n5.nabble.com> <1385711723411-5864713.post@n5.nabble.com> Date: Fri, 29 Nov 2013 13:00:42 -0800 Message-ID: Subject: Re: Questions re swap-on-zfs From: Freddie Cash To: Beeblebrox Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 21:00:45 -0000 On Thu, Nov 28, 2013 at 11:55 PM, Beeblebrox wrote: > @Ronald Klop: > I decided to give gconcat a go (since the coder is Pawel Dawidek from the > ZFS side). > Creating the concat device and mounting as swap etc is very easy: > # gconcat label -v swap /dev/zvol/tank0/swap /dev/ada0p1 > > However, I can not find any reference to setting priority or prefer > parameters on the devices being used by /dev/concat/swap. =E2=80=8Bgconcat works by filling the first device, and then filling the ne= xt device, and then filling the next device, and so on. Thus, the order the devices are listed in the gconcat command *is* the priority. 
The first device listed is filled first; then the next device listed; etc.=E2=80=8B --=20 Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 21:40:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3E833B4A for ; Fri, 29 Nov 2013 21:40:03 +0000 (UTC) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1A87A10E8 for ; Fri, 29 Nov 2013 21:40:02 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1VmVn2-0000fM-E5 for freebsd-fs@freebsd.org; Fri, 29 Nov 2013 13:39:56 -0800 Date: Fri, 29 Nov 2013 13:39:56 -0800 (PST) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1385761196424-5864850.post@n5.nabble.com> In-Reply-To: References: <1380880223590-5848720.post@n5.nabble.com> <524EEE40.5060208@gmail.com> <1381060830753-5849397.post@n5.nabble.com> <20131006123350.GJ3287@sludge.elizium.za.net> <1381683172518-5851539.post@n5.nabble.com> <1385711723411-5864713.post@n5.nabble.com> Subject: Re: Questions re swap-on-zfs MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 21:40:03 -0000 Thank you. Just to be nit-picking over the details, * A zpool dataset hosting swap may lock-up. * As work-around, we want to enable SSD-based 1G partition as fall-back swap (only last resort) * My system DOES suffer from lockups (when poudriere is compiling), but the spindle-HDD hosted zpool/swap is not why (swap space usage barely maxes at 15%). The lockup is probably due to CPU (unlocked core). * Does a /dev/gconcat device, (zpool first, SSD partition as number 2) function as designed (and I mean does the SSD partition work as fallback), if and should the primary swap on zpool/swap have a lockup for some reason. Difficult to call IMHO. Thanks again. ----- FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS -- View this message in context: http://freebsd.1045724.n5.nabble.com/Questions-re-swap-on-zfs-tp5848720p5864850.html Sent from the freebsd-fs mailing list archive at Nabble.com. 
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 21:49:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 002DFD75 for ; Fri, 29 Nov 2013 21:49:05 +0000 (UTC) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D3A1F114B for ; Fri, 29 Nov 2013 21:49:05 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1VmVvs-00017R-Pl for freebsd-fs@freebsd.org; Fri, 29 Nov 2013 13:49:04 -0800 Date: Fri, 29 Nov 2013 13:49:04 -0800 (PST) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1385761744784-5864851.post@n5.nabble.com> In-Reply-To: <09C05F6302494B61A52C1C5065C719CE@multiplay.co.uk> References: <1385723204127-5864731.post@n5.nabble.com> <67B55680E8B7491D9ED6F36ED9E28A49@multiplay.co.uk> <1385753577419-5864830.post@n5.nabble.com> <09C05F6302494B61A52C1C5065C719CE@multiplay.co.uk> Subject: Re: ZFS Trim MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 21:49:06 -0000 Code in /boot/loader.conf needs to be: vfs.zfs.trim.enabled = *1* "0" disables trim. Very dumb question on my part, and windows related probably but, when searching the issue answers like this come up: https://sort.symantec.com/public/documents/sfha/6.0.1/linux/productguides/html/virtualstore_admin/ch29s05.htm ZFS is probably immune to this. My question is in context of "trim.unsupported": kstat.zfs.misc.zio_trim.bytes: 16388096 kstat.zfs.misc.zio_trim.success: 1668 kstat.zfs.misc.zio_trim.unsupported: 197 kstat.zfs.misc.zio_trim.failed: 0 ----- FreeBSD-11-current_amd64_root-on-zfs_RadeonKMS -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-Trim-tp5864731p5864851.html Sent from the freebsd-fs mailing list archive at Nabble.com. 
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 22:26:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 277DE6F2 for ; Fri, 29 Nov 2013 22:26:19 +0000 (UTC) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B434C12C8 for ; Fri, 29 Nov 2013 22:26:18 +0000 (UTC) Received: from r2d2 ([82.69.179.241]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50006876036.msg for ; Fri, 29 Nov 2013 22:26:16 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 29 Nov 2013 22:26:16 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.179.241 X-Return-Path: prvs=10451ca0e0=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <5C60466ABFE5442690405143D7AA4695@multiplay.co.uk> From: "Steven Hartland" To: "Beeblebrox" , References: <1385723204127-5864731.post@n5.nabble.com> <67B55680E8B7491D9ED6F36ED9E28A49@multiplay.co.uk> <1385753577419-5864830.post@n5.nabble.com> <09C05F6302494B61A52C1C5065C719CE@multiplay.co.uk> <1385761744784-5864851.post@n5.nabble.com> Subject: Re: ZFS Trim Date: Fri, 29 Nov 2013 22:26:13 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 22:26:19 -0000 ----- Original Message ----- From: "Beeblebrox" > Code in /boot/loader.conf needs to be: > vfs.zfs.trim.enabled = *1* > "0" disables trim. You don't need to enable TRIM as thats default. > Very dumb question on my part, and windows related probably but, when > searching the issue answers like this come up: > https://sort.symantec.com/public/documents/sfha/6.0.1/linux/productguides/html/virtualstore_admin/ch29s05.htm > ZFS is probably immune to this. My question is in context of > "trim.unsupported": > > kstat.zfs.misc.zio_trim.bytes: 16388096 > kstat.zfs.misc.zio_trim.success: 1668 > kstat.zfs.misc.zio_trim.unsupported: 197 > kstat.zfs.misc.zio_trim.failed: 0 The unsupported will be due a zpool in your machine which is likely created on a HDD hence doesn't support TRIM. If you enable a TRIM after writing then deleting data from the pool then you can fill the disk with random data and then delete that data to ensure all areas previously containing data is TRIM'ed. If this wasn't a large amount then I wouldnt bother. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 29 23:50:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6E8242A7; Fri, 29 Nov 2013 23:50:07 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id EE6D716FC; Fri, 29 Nov 2013 23:50:06 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAL4nmVKDaFve/2dsb2JhbABRCBaDKVOCerVfgTR0giwjBFIbGgINGQJZBhGIAw2vR5AJDAuBKY0JIgEzB4JrgUgDiUKQApBjg0ceBIFq X-IronPort-AV: E=Sophos;i="4.93,799,1378872000"; d="scan'208";a="74996596" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 29 Nov 2013 18:50:00 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 10FFAB4060; Fri, 29 Nov 2013 18:50:00 -0500 (EST) Date: Fri, 29 Nov 2013 18:50:00 -0500 (EST) From: Rick Macklem To: FreeBSD FS Message-ID: <5797959.23550239.1385769000059.JavaMail.root@uoguelph.ca> In-Reply-To: <20131128191851.GM59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Kostik Belousov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Nov 2013 23:50:07 -0000 Ok, I've put an updated patch here: http://people.freebsd.org/~rmacklem/noncontig-write.patch It now enables the change which allows non-contiguous byte range writes within the same buffer cache block when a mount option I called "noncontigwr" is given to a mount point. (It still disables this if there has been a file lock applied to the file since it was opened by the process(es) that currently have the file open and/or mmap()'d.) If anyone has a suggestion for a better name for the option, please suggest it. I didn't do a count for when the non-contiguous writes happen, since I suspect it will happen on most any mount point that uses the option (and it seemed to be too NFS specific and obscure for "mount -v" imho). So, does this sound reasonable to commit to head? rick --- here is the patch, in case you want to look at it --- --- fs/nfsclient/nfs_clbio.c.orig 2013-08-28 18:45:41.000000000 -0400 +++ fs/nfsclient/nfs_clbio.c 2013-11-28 19:59:30.000000000 -0500 @@ -874,7 +874,7 @@ ncl_write(struct vop_write_args *ap) struct vattr vattr; struct nfsmount *nmp = VFSTONFS(vp->v_mount); daddr_t lbn; - int bcount; + int bcount, noncontig_write, obcount; int bp_cached, n, on, error = 0, error1; size_t orig_resid, local_resid; off_t orig_size, tmp_off; @@ -1037,7 +1037,15 @@ again: * unaligned buffer size. 
*/ mtx_lock(&np->n_mtx); - if (uio->uio_offset == np->n_size && n) { + if ((np->n_flag & NHASBEENLOCKED) == 0 && + (nmp->nm_flag & NFSMNT_NONCONTIGWR) != 0) + noncontig_write = 1; + else + noncontig_write = 0; + if ((uio->uio_offset == np->n_size || + (noncontig_write != 0 && + lbn == (np->n_size / biosize) && + uio->uio_offset + n > np->n_size)) && n) { mtx_unlock(&np->n_mtx); /* * Get the buffer (in its pre-append state to maintain @@ -1045,8 +1053,8 @@ again: * nfsnode after we have locked the buffer to prevent * readers from reading garbage. */ - bcount = on; - bp = nfs_getcacheblk(vp, lbn, bcount, td); + obcount = np->n_size - (lbn * biosize); + bp = nfs_getcacheblk(vp, lbn, obcount, td); if (bp != NULL) { long save; @@ -1058,9 +1066,12 @@ again: mtx_unlock(&np->n_mtx); save = bp->b_flags & B_CACHE; - bcount += n; + bcount = on + n; allocbuf(bp, bcount); bp->b_flags |= save; + if (noncontig_write != 0 && on > obcount) + vfs_bio_bzero_buf(bp, obcount, on - + obcount); } } else { /* @@ -1159,19 +1170,23 @@ again: * area, just update the b_dirtyoff and b_dirtyend, * otherwise force a write rpc of the old dirty area. * + * If there has been a file lock applied to this file + * or vfs.nfs.old_noncontig_writing is set, do the following: * While it is possible to merge discontiguous writes due to * our having a B_CACHE buffer ( and thus valid read data * for the hole), we don't because it could lead to * significant cache coherency problems with multiple clients, * especially if locking is implemented later on. * - * As an optimization we could theoretically maintain - * a linked list of discontinuous areas, but we would still - * have to commit them separately so there isn't much - * advantage to it except perhaps a bit of asynchronization. + * If vfs.nfs.old_noncontig_writing is not set and there has + * not been file locking done on this file: + * Relax coherency a bit for the sake of performance and + * expand the current dirty region to contain the new + * write even if it means we mark some non-dirty data as + * dirty. */ - if (bp->b_dirtyend > 0 && + if (noncontig_write == 0 && bp->b_dirtyend > 0 && (on > bp->b_dirtyend || (on + n) < bp->b_dirtyoff)) { if (bwrite(bp) == EINTR) { error = EINTR; --- fs/nfsclient/nfsnode.h.orig 2013-11-19 18:17:37.000000000 -0500 +++ fs/nfsclient/nfsnode.h 2013-11-25 21:29:58.000000000 -0500 @@ -157,6 +157,7 @@ struct nfsnode { #define NLOCKWANT 0x00010000 /* Want the sleep lock */ #define NNOLAYOUT 0x00020000 /* Can't get a layout for this file */ #define NWRITEOPENED 0x00040000 /* Has been opened for writing */ +#define NHASBEENLOCKED 0x00080000 /* Has been file locked. */ /* * Convert between nfsnode pointers and vnode pointers --- fs/nfsclient/nfs_clvnops.c.orig 2013-11-19 18:19:42.000000000 -0500 +++ fs/nfsclient/nfs_clvnops.c 2013-11-25 21:32:47.000000000 -0500 @@ -3079,6 +3079,10 @@ nfs_advlock(struct vop_advlock_args *ap) np->n_change = va.va_filerev; } } + /* Mark that a file lock has been acquired. */ + mtx_lock(&np->n_mtx); + np->n_flag |= NHASBEENLOCKED; + mtx_unlock(&np->n_mtx); } NFSVOPUNLOCK(vp, 0); return (0); @@ -3098,6 +3102,12 @@ nfs_advlock(struct vop_advlock_args *ap) error = ENOLCK; } } + if (error == 0 && ap->a_op == F_SETLK) { + /* Mark that a file lock has been acquired. 
*/ + mtx_lock(&np->n_mtx); + np->n_flag |= NHASBEENLOCKED; + mtx_unlock(&np->n_mtx); + } } return (error); } --- fs/nfsclient/nfs_clvfsops.c.orig 2013-11-28 20:00:58.000000000 -0500 +++ fs/nfsclient/nfs_clvfsops.c 2013-11-28 20:06:32.000000000 -0500 @@ -719,7 +719,8 @@ static const char *nfs_opts[] = { "from" "retrans", "acregmin", "acregmax", "acdirmin", "acdirmax", "resvport", "readahead", "hostname", "timeout", "addr", "fh", "nfsv3", "sec", "principal", "nfsv4", "gssname", "allgssname", "dirpath", "minorversion", - "nametimeo", "negnametimeo", "nocto", "pnfs", "wcommitsize", + "nametimeo", "negnametimeo", "nocto", "noncontigwr", "pnfs", + "wcommitsize", NULL }; /* @@ -840,6 +841,8 @@ nfs_mount(struct mount *mp) args.flags |= NFSMNT_ALLGSSNAME; if (vfs_getopt(mp->mnt_optnew, "nocto", NULL, NULL) == 0) args.flags |= NFSMNT_NOCTO; + if (vfs_getopt(mp->mnt_optnew, "noncontigwr", NULL, NULL) == 0) + args.flags |= NFSMNT_NONCONTIGWR; if (vfs_getopt(mp->mnt_optnew, "pnfs", NULL, NULL) == 0) args.flags |= NFSMNT_PNFS; if (vfs_getopt(mp->mnt_optnew, "readdirsize", (void **)&opt, NULL) == 0) { @@ -1792,6 +1795,8 @@ void nfscl_retopts(struct nfsmount *nmp, &blen); nfscl_printopt(nmp, (nmp->nm_flag & NFSMNT_NOCTO) != 0, ",nocto", &buf, &blen); + nfscl_printopt(nmp, (nmp->nm_flag & NFSMNT_NONCONTIGWR) != 0, + ",noncontigwr", &buf, &blen); nfscl_printopt(nmp, (nmp->nm_flag & (NFSMNT_NOLOCKD | NFSMNT_NFSV4)) == 0, ",lockd", &buf, &blen); nfscl_printopt(nmp, (nmp->nm_flag & (NFSMNT_NOLOCKD | NFSMNT_NFSV4)) == --- nfsclient/nfsargs.h.orig 2013-11-28 19:53:56.000000000 -0500 +++ nfsclient/nfsargs.h 2013-11-28 19:56:38.000000000 -0500 @@ -99,5 +99,6 @@ struct nfs_args { #define NFSMNT_STRICT3530 0x10000000 /* Adhere strictly to RFC3530 */ #define NFSMNT_NOCTO 0x20000000 /* Don't flush attrcache on open */ #define NFSMNT_PNFS 0x40000000 /* Enable pNFS support */ +#define NFSMNT_NONCONTIGWR 0x80000000 /* Enable non-contiguous writes */ #endif From owner-freebsd-fs@FreeBSD.ORG Sat Nov 30 13:22:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D5FB5CD7 for ; Sat, 30 Nov 2013 13:22:59 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6F2D41341 for ; Sat, 30 Nov 2013 13:22:44 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAUDMYDv018496; Sat, 30 Nov 2013 15:22:34 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAUDMYDv018496 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAUDMX4p018495; Sat, 30 Nov 2013 15:22:33 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 30 Nov 2013 15:22:33 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: NFS client patch to reduce sychronous writes Message-ID: <20131130132233.GZ59496@kib.kiev.ua> References: <20131128191851.GM59496@kib.kiev.ua> <1355437347.22943634.1385682620696.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="4oAqxJayxW1dikgl" Content-Disposition: 
inline In-Reply-To: <1355437347.22943634.1385682620696.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Kirk McKusick , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Nov 2013 13:22:59 -0000 --4oAqxJayxW1dikgl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 28, 2013 at 06:50:20PM -0500, Rick Macklem wrote: > obcount =3D np->n_size - (lbn * biosize); > *** This sets obcount to the byte offset within the buffer cache block of > EOF before it is grown by this write (or file size mod biosize, if yo= u prefer) > bp =3D nfs_getcacheblk(vp, lbn, obcount, td); >=20 > if (bp !=3D NULL) { > long save; >=20 > mtx_lock(&np->n_mtx); > np->n_size =3D uio->uio_offset + n; > np->n_flag |=3D NMODIFIED; > vnode_pager_setsize(vp, np->n_size); > *** n_size is now grown to the new EOF for after the write. > mtx_unlock(&np->n_mtx); >=20 > save =3D bp->b_flags & B_CACHE; > bcount =3D on + n; > allocbuf(bp, bcount); > bp->b_flags |=3D save; > if (noncontig_write !=3D 0 && bcount > obcount) > vfs_bio_bzero_buf(bp, obcount, bcount - > obcount); > *** This zeros bytes from "obcount" (the offset of the old EOF) to "bcoun= t" > (which is the offset of EOF after the write). I believe that I got it now, the patch in the other message looks fine. Thank you for the patience. --4oAqxJayxW1dikgl Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSmeaYAAoJEJDCuSvBvK1BpGIP/1n2j1K6vE0EWb+xDJR0dJVa cr1yrpgYUjwrPLdoCzU8zyERahaIFd4i97jD6JFF2qksKNJyGBNAdfoei5IktIZf OHKXR8LKpMW5vBxNtMZpwy/UMR7sgFsAVna9NJ9GEYPJT1QMPOGG3mi4YO4nkuiu t2ZAMQCrwdQl/e9ehniot1WduSxb1kzcMUrJ3rPJME1GFLHsvEfY14udGvwxS44S fDPXNTmcYdS2IfDnGO+KB85eKFqclBjQxI+pIFQg6yss5ojFaQQvCdZ/+MeFpuVW GwGogqgoZ/QAfw0gBw9eOUPYJh7HIAraF6vDwq+fksKPLdTLevWzwWVEfhijfXRa 00206vE023Y4j5xZwSUcvK39sJMBSvdOoRojCnCSS0/2WCrVKbA0vLnKLVpA5lOP jUbp3vNSKQJFUyUi0GKzWJJ1Hr4KL423SymIKQdW4+IuvAr0iJoMhTj9qgi02fuQ 4V3PPJhuvEQiJVdM3alrhL+S0pObwSiVc4U2mTAS4g+3POJmy+pwv6gVO2fdCKI9 9lS1JAcrkYWAof3q7DtlhriXkdHf3DtjHVel1H25Wo3iYCntcFcqbnKibOpAHZKt Qg+lT/iZJ24pJpYUlVHEIvae8aWFXhOiMH1sVlZ5fdjJ1hm3CgIBu69lBQyKAmLH +tQtN4LEF7fM144gkok3 =QkIU -----END PGP SIGNATURE----- --4oAqxJayxW1dikgl-- From owner-freebsd-fs@FreeBSD.ORG Sat Nov 30 20:04:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 21B268AF for ; Sat, 30 Nov 2013 20:04:14 +0000 (UTC) Received: from mail-ve0-x236.google.com (mail-ve0-x236.google.com [IPv6:2607:f8b0:400c:c01::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D527C16A6 for ; Sat, 30 Nov 2013 20:04:13 +0000 (UTC) Received: by mail-ve0-f182.google.com with SMTP id jy13so7876171veb.13 for ; Sat, 30 Nov 2013 12:04:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; 
h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=8qmKwKztar3x6nmdiEe6r9xzGoF24jcnOiUoEa/jFt0=; b=rVgBMVFFRI1jL68DG/QaarR+vNOowXxMKJSXtcnOBJ8rpewKJmqFxGaijZKSptwuhO RbIW76zRStbjLMQG2R7LzU4w6WS/O+dXZWoIBP7lW/+3AzptYN515PMP+KPicvYpYcuh X3QsCWtCbSL0E3G8acaEulXJHBPVUDe+uyjqeizE8ePz/2dkwk3IBvdACRbAE8gFZNLE z8KRNMPxSJGNYhQC0EzOnOyTsdNote1sTTkk2n4r8vdY8klLZqkkXDggqIR2ErFmfZT7 sLoraBXjhNnG2YgV5dg6EqWxIO2aYuI0qPJHM7MunWvbCBXbIlZHwValFByCV7mSLopd OyCA== X-Received: by 10.58.210.66 with SMTP id ms2mr47193509vec.10.1385841852316; Sat, 30 Nov 2013 12:04:12 -0800 (PST) MIME-Version: 1.0 Received: by 10.58.231.167 with HTTP; Sat, 30 Nov 2013 12:03:52 -0800 (PST) In-Reply-To: References: From: Anton Sayetsky Date: Sat, 30 Nov 2013 22:03:52 +0200 Message-ID: Subject: Re: ZFS and Wired memory, again To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Nov 2013 20:04:14 -0000 2013/11/22 Anton Sayetsky : > Hello, > > I'm planning to deploy a ~150 TiB ZFS pool and when playing with ZFS > noticed that amount of wired memory is MUCH bigger than ARC size (in > absence of other hungry memory consumers, of course). I'm afraid that > this strange behavior may become even worse on a machine with big pool > and some hundreds gibibytes of RAM. > > So let me explain what happened. > > Immediately after booting system top says the following: > ===== > Mem: 14M Active, 13M Inact, 117M Wired, 2947M Free > ARC: 24M Total, 5360K MFU, 18M MRU, 16K Anon, 328K Header, 1096K Other > ===== > Ok, wired mem - arc = 92 MiB > > Then I started to read pool (tar cpf /dev/null /). > Memory usage when ARC size is ~1GiB > ===== > Mem: 16M Active, 15M Inact, 1410M Wired, 1649M Free > ARC: 1114M Total, 29M MFU, 972M MRU, 21K Anon, 18M Header, 95M Other > ===== > 1410-1114=296 MiB > > Memory usage when ARC size reaches it's maximum of 2 GiB > ===== > Mem: 16M Active, 16M Inact, 2523M Wired, 536M Free > ARC: 2067M Total, 3255K MFU, 1821M MRU, 35K Anon, 38M Header, 204M Other > ===== > 2523-2067=456 MiB > > Memory usage a few minutes later > ===== > Mem: 10M Active, 27M Inact, 2721M Wired, 333M Free > ARC: 2002M Total, 22M MFU, 1655M MRU, 21K Anon, 36M Header, 289M Other > ===== > 2721-2002=719 MiB > > So why the wired ram on a machine with only minimal amount of services > has grown from 92 to 719 MiB? Sometimes I can even see about a gig! > I'm using 9.2-RELEASE-p1 amd64. Test machine has a T5450 C2D CPU and 4 > G RAM (actual available amount is 3 G). ZFS pool is configured on a > GPT partition of a single 1 TB HDD. > Disabling/enabling prefetch does't helps. Limiting ARC to 1 gig doesn't helps. > When reading a pool, evict skips can increment very fast and sometimes > arc metadata exceeds limit (2x-5x). > > I've attached logs with system configuration, outputs from top, ps, > zfs-stats and vmstat. 
> conf.log = system configuration, also uploaded to http://pastebin.com/NYBcJPeT > top_ps_zfs-stats_vmstat_afterboot = memory stats immediately after > booting system, http://pastebin.com/mudmEyG5 > top_ps_zfs-stats_vmstat_1g-arc = after ARC grown to 1 gig, > http://pastebin.com/4AC8dn5C > top_ps_zfs-stats_vmstat_fullmem = when ARC reached limit of 2 gigs, > http://pastebin.com/bx7svEP0 > top_ps_zfs-stats_vmstat_fullmem_2 = few minutes later, > http://pastebin.com/qYWFaNeA > > What should I do next? Can anyone help me? From owner-freebsd-fs@FreeBSD.ORG Sat Nov 30 22:57:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AFD9EA40 for ; Sat, 30 Nov 2013 22:57:08 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 741001F82 for ; Sat, 30 Nov 2013 22:57:07 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAAVsmlKDaFve/2dsb2JhbABZhBKCerVjgS90giUBAQUjBFIbDgoCAg0ZAlkGiBSvGo9TF4EpjSs0B4JrgUgDiUKgZYNHHoFu X-IronPort-AV: E=Sophos;i="4.93,804,1378872000"; d="scan'208";a="74165800" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 30 Nov 2013 17:57:01 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 25CD7B3F51; Sat, 30 Nov 2013 17:57:01 -0500 (EST) Date: Sat, 30 Nov 2013 17:57:01 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <1288055532.23818811.1385852221143.JavaMail.root@uoguelph.ca> In-Reply-To: <20131130132233.GZ59496@kib.kiev.ua> Subject: Re: RFC: NFS client patch to reduce sychronous writes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Kirk McKusick , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Nov 2013 22:57:08 -0000 Kostik wrote: > On Thu, Nov 28, 2013 at 06:50:20PM -0500, Rick Macklem wrote: > > obcount = np->n_size - (lbn * biosize); > > *** This sets obcount to the byte offset within the buffer cache > > block of > > EOF before it is grown by this write (or file size mod biosize, > > if you prefer) > > bp = nfs_getcacheblk(vp, lbn, obcount, td); > > > > if (bp != NULL) { > > long save; > > > > mtx_lock(&np->n_mtx); > > np->n_size = uio->uio_offset + n; > > np->n_flag |= NMODIFIED; > > vnode_pager_setsize(vp, np->n_size); > > *** n_size is now grown to the new EOF for after the write. > > mtx_unlock(&np->n_mtx); > > > > save = bp->b_flags & B_CACHE; > > bcount = on + n; > > allocbuf(bp, bcount); > > bp->b_flags |= save; > > if (noncontig_write != 0 && bcount > obcount) > > vfs_bio_bzero_buf(bp, obcount, bcount - > > obcount); > > *** This zeros bytes from "obcount" (the offset of the old EOF) to > > "bcount" > > (which is the offset of EOF after the write). > > I believe that I got it now, the patch in the other message looks > fine. > Thank you for the patience. > Thanks for the review. 
It got me to look more closely at the patch and replace "bcount" with "on", plus other improvements. Have fun, rick
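For reference, the concrete change being referred to here, as it appears earlier in the thread: the earlier test version zeroed up to the end of the write, while the posted patch zeroes only the gap between the old EOF and the start of the write, since the rest is overwritten by the write itself:

	/* earlier test version */
	if (noncontig_write != 0 && bcount > obcount)
		vfs_bio_bzero_buf(bp, obcount, bcount - obcount);

	/* updated patch */
	if (noncontig_write != 0 && on > obcount)
		vfs_bio_bzero_buf(bp, obcount, on - obcount);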