From: "Chad Leigh Shire.Net LLC" <chad@shire.net>
Date: Thu, 16 Aug 2012 12:11:12 -0600
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject: Re: ZFS stats in "top" -- ZFS performance started being crappy in spurts

On Aug 11, 2012, at 5:33 PM, Chad Leigh - Pengar LLC wrote:

> Hi
>
> I have a FreeBSD 9 system with ZFS root. It is actually a VM under Xen on a beefy piece of HW (4-core Sandy Bridge 3 GHz Xeon, 32GB total HW memory -- the VM has 4 vCPUs and 6GB RAM), with mirrored gpart partitions. I am looking for data integrity more than performance, as long as performance is reasonable (which it has more than been for the last 3 months).
>
> The other "servers" on the same HW -- the other VMs -- don't have this problem, though they are set up the same way. There are 4 other FreeBSD VMs: one runs email for a one-man company and a few of his friends, plus some static web pages and other stuff for him; one runs a few low-use web apps for various customers; and one runs about 30 websites with apache and nginx, mostly just static sites. None are heavily used. There is also one Linux VM running a couple of low-use FrontBase databases -- not high-use databases, low-use ones.
>
> The troublesome VM has been running fine for the 3+ months since I installed it, and its level of use has been pretty much constant. The server runs 4 jails, each dedicated to a different bit of email processing for a small number of users: one is a secondary DNS, one runs clamav and spamassassin, one runs exim for incoming and outgoing mail, and one runs dovecot for imap and pop. There is no web server, database, or anything else running.
>
> Total number of mail users on the system is approximately 50, plus or minus. Total mail traffic is very low compared to "real" mail servers.
> Earlier this week things started "freezing up". It might last a few minutes, or it might last 1/2 hour. Processes become unresponsive. This can last a few minutes or much longer. It eventually resolves itself and things are good for another 10 minutes or 3 hours until it happens again. When it happens, lots of processes are listed in "top" in the
>
> zfs
> zio->i
> zfs
> tx->tx
> db->db
>
> states. These processes only get listed in these states when there are problems. What are these states indicative of?

OK, after much reading of ZFS blog posts, forum postings, email list postings, and trying stuff out, I seem to have gotten things back down to normal and reasonable performance.

In case anyone has similar issues in a similar circumstance, here is what I did. Some of these changes may have had little or no effect, but this is what was changed.

The biggest effect was when I did the following:

vfs.zfs.zfetch.block_cap: from the default 256 down to 64

This was like night and day. The idea to try this came from a post by user "madtrader" in the forum thread http://forums.sagetv.com/forums/showthread.php?t=43830&page=2 . He was recording multiple streams of HD video while trying to play HD video off a stream from the same server/ZFS file system.

Also, setting vfs.zfs.write_limit_override to something other than the default disabled "0" seems to have had a relatively significant effect. Before I worked with the "block_cap" above, I was focusing on this and had tried everything from 64M to 768M. It is currently set to 576M, which is around where I was getting the best results on my system with my amount of RAM (6GB). I tried 512M with good results, and then 768M, which was still good but not quite as good as far as I could tell from testing. So I went with 576M on my last attempt, then added in the block_cap change, and things really are pretty much back to normal.

I turned on vdev caching by raising vfs.zfs.vdev.cache.size from 0 to 10M. I don't know if it helped.

I also lowered vfs.zfs.txg.timeout from 5 to 3. This seems to have had a slightly noticeable effect.

I also adjusted vfs.zfs.arc_max. The default of 0 (meaning the system picks the value itself) seemed to result in an actual value of around 75-80% of RAM, which seemed high. I ended up setting it at 3072M, which for me seems to work well. I don't know what its overall effect on the problem was, though.

Thanks
Chad
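P.S. In case it is useful, here is what all of those settings look like gathered in one place. This is just a sketch of the values mentioned above, not a recommendation -- the right numbers depend on how much RAM you have and on your workload. Also, depending on your FreeBSD version, some of these knobs are boot-time tunables that are read-only at runtime, so they have to go in /boot/loader.conf and only take effect after a reboot; the ones that are writable at runtime can be changed live with sysctl(8) and put in /etc/sysctl.conf to persist.

    # /boot/loader.conf -- boot-time tunables, applied at the next reboot.
    # Size suffixes like "M" are accepted here.
    vfs.zfs.arc_max="3072M"
    vfs.zfs.zfetch.block_cap="64"
    vfs.zfs.vdev.cache.size="10M"

    # /etc/sysctl.conf -- write_limit_override I was able to change on the
    # fly while experimenting; sysctl wants a plain byte count, so 576M is
    # written out in bytes.  If sysctl reports either of these as read-only
    # on your version, move it to /boot/loader.conf instead.
    vfs.zfs.txg.timeout=3
    vfs.zfs.write_limit_override=603979776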
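And for anyone who wants to watch for the same symptoms, the stock FreeBSD tools are enough -- something along these lines (exact output will vary):

    # print the description of one of the knobs, then its current value
    sysctl -d vfs.zfs.zfetch.block_cap
    sysctl vfs.zfs.zfetch.block_cap

    # per-vdev I/O while a stall is happening
    zpool iostat -v 5

    # per-thread view in top; the zio->i / tx->tx / db->db wait channels
    # from my original question show up in the STATE column
    top -SH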