From owner-freebsd-fs@FreeBSD.ORG Tue Aug 10 21:44:24 2010
Date: Tue, 10 Aug 2010 21:44:18 +0000
From: Marco van Tol
To: freebsd-fs@freebsd.org
Message-ID: <20100810214418.GA28288@tolstoy.tols.org>
Subject: zfs arc - just take it all and be good to me

Hi there,

What you will find here is a short description of my attempts to find the optimal settings for ZFS memory, followed by what I ended up with at this point. Then I started wondering what I was missing, because I think what I'm doing might be plain stupid - no one else seems to be doing it this way. I hope it makes a tiny bit of sense; if it doesn't, just pretend you didn't read it. ;) All the way at the bottom of the mail you will find some details about the hardware / software.

Like many others, I have been trying to work out best practice regarding ZFS, kmem_size, arc_max and so on, and had trouble finding that reliable magic post that just summed up the magic formula. So I went ahead and mixed a couple of pieces of advice I had seen during my search, and at some point ended up on a machine with 2 gigabytes of physical memory with the settings:

    vm.kmem_size: 3G
    vfs.zfs.arc_max: 750M

The 3G vm.kmem_size on a machine with 2G of physical memory comes from something I had read about fragmented kernel memory that would warrant this setting.

Now, these settings annoyed me a lot, because it used to be that the page cache (by which I mean active/inactive memory) was auto-tuned: in short, it would take as much available memory as possible and just release it when an application needed it. I'm describing it simply here because I am by no means a filesystem and/or kernel expert.

At some point I stumbled upon a blog post from Kip Macy, and a reply from Richard Elling, at:

http://daemonflux.blogspot.com/2010/01/zfs-arc-page-cache-and-1970s-buffer.html

Somewhere around this time I started to think that an auto-tuned ARC might still be possible, given that ZFS releases memory when there is high demand for it. So I googled for "freebsd zfs memory pressure".
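[Side note, in case anyone wants to reproduce this: both of these are boot-time tunables, so a minimal sketch of the corresponding /boot/loader.conf lines - using only the values already mentioned above, not a recommendation - would be:

    # /boot/loader.conf
    vm.kmem_size="3G"        # kernel virtual memory; deliberately larger than the 2G of RAM
    vfs.zfs.arc_max="750M"   # upper bound on the ZFS ARC

Adjust the values for your own amount of RAM; this only shows the shape of the config.]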
After reading through a couple of hits I felt like just trying it, and ended up with the settings:

    physical memory: 2G
    vm.kmem_size: 3G
    vfs.zfs.arc_max: 1792M

Then I set up a couple of terminals:

- top -I                                # to monitor active/inactive/wired/free
- sysctl kstat.zfs.misc.arcstats.size   # (in a loop) to monitor the ARC size
- a terminal to run some tests while watching those values

# Test 1 - tar the filesystem to grow the ARC from reads
-> tar -cf /dev/null /

Sure enough the ARC grew rapidly to some maximum value. After a little while the values were:

    12M Active, 18M Inact, 1789M Wired, 144K Cache, 156M Free
    kstat.zfs.misc.arcstats.size: 1697754720

Nothing to worry about so far.

# Test 2 - do a bunch of writes to see if that makes things different
-> bash> for i in {1..10} ; do dd if=/dev/zero of=/zfspath/file$i \
         bs=1m count=1024 ; done

And again the ARC would grow and shrink a little bit, leaving the values:

    8308K Active, 22M Inact, 1710M Wired, 136K Cache, 235M Free
    kstat.zfs.misc.arcstats.size: 1596992192

Still nothing to worry about.

# Test 3 - let an application demand some memory and see what happens
-> perl -e '$x="x"x500_000_000'

After perl completed the values were:

    5112K Active, 5884K Inact, 932M Wired, 14M Cache, 1019M Free
    kstat.zfs.misc.arcstats.size: 817991496

No difference in swap usage worth mentioning - I think somewhere around 5 megabytes. top showed pages going into swap very briefly. (Side info: I had run the perl step with a too-large value earlier, which left me with 35MB of swap in use.)

# Test 4 - tests 1, 2 and 3 at the same time, as a (probably lame)
# attempt at a mixed environment. For test 1 (tar) I excluded the
# path where I was writing the files for test 2 (dd). Test 3 ran in
# a sleepless loop with the value 1_000_000_000 instead of the
# 500MB I used in the original test 3.

Ending values:

    21M Active, 7836K Inact, 1672M Wired, 4140K Cache, 272M Free
    kstat.zfs.misc.arcstats.size: 1570260528

Swap usage didn't change in the running top I was watching.

------

All in all this looks like a close approximation of auto-tuned ZFS memory use that still takes maximum advantage of the available memory. The only problem is that nobody else seems to be doing it like this, so it's very likely that this is not the smart thing to do.

What are the first problems the ZFS people can think of with a setup like this?

Thanks in advance!
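[PS: the "in a loop" part above was nothing fancy. A minimal sketch, assuming a Bourne-style shell and an arbitrary one-second interval, would be:

    # print the current ARC size once a second
    while : ; do
        sysctl -n kstat.zfs.misc.arcstats.size
        sleep 1
    done

Watching that next to "top -I" was all the instrumentation these tests used.]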
Marco van Tol

------

Machine details:

zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        tank             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            gpt/tank_s0  ONLINE       0     0     0
            gpt/tank_s1  ONLINE       0     0     0
            gpt/tank_s2  ONLINE       0     0     0
            gpt/tank_s3  ONLINE       0     0     0

errors: No known data errors

hw.machine: amd64
hw.model: Intel(R) Atom(TM) CPU 330 @ 1.60GHz
hw.ncpu: 2
hw.physmem: 2135396352
hw.pagesize: 4096
vm.kmem_size: 3221225472
vfs.zfs.version.spa: 14
vfs.zfs.version.zpl: 3
vfs.zfs.prefetch_disable: 1
vfs.zfs.zil_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.arc_min: 234881024
vfs.zfs.arc_max: 1879048192

gpart show
=>        34  976773101  ada0  GPT  (466G)
          34        128     1  freebsd-boot  (64K)
         162    4194304     2  freebsd-swap  (2.0G)
     4194466  972578669     3  freebsd-zfs  (464G)

=>        34  976773101  ada1  GPT  (466G)
          34        128     1  freebsd-boot  (64K)
         162    4194304     2  freebsd-swap  (2.0G)
     4194466  972578669     3  freebsd-zfs  (464G)

=>        34  976773101  ada2  GPT  (466G)
          34        128     1  freebsd-boot  (64K)
         162    4194304     2  freebsd-swap  (2.0G)
     4194466  972578669     3  freebsd-zfs  (464G)

=>        34  976773101  ada3  GPT  (466G)
          34        128     1  freebsd-boot  (64K)
         162    4194304     2  freebsd-swap  (2.0G)
     4194466  972578669     3  freebsd-zfs  (464G)

swap:
/dev/gpt/swap0  none  swap  sw  0  0
/dev/gpt/swap1  none  swap  sw  0  0
/dev/gpt/swap2  none  swap  sw  0  0
/dev/gpt/swap3  none  swap  sw  0  0

zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank          83.7G  1.24T  28.4K  legacy
tank/dirvish  21.6G   106G  21.6G  /dirvish
tank/home     21.4G  1.24T  21.4G  /home
tank/mm       30.1G   120G  30.1G  /mm
tank/root      745M   279M   745M  legacy
tank/tmp       126K  4.00G   126K  /tmp
tank/usr      2.95G  13.1G  2.95G  /usr
tank/var       115M  3.89G   115M  /var

--
A male gynecologist is like an auto mechanic who never owned a car.
 - Carrie Snow