From: "Chad Leigh Shire.Net LLC" <chad@shire.net>
Date: Thu, 16 Aug 2012 12:11:12 -0600
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject: Re: ZFS stats in "top" -- ZFS performance started being crappy in spurts

On Aug 11, 2012, at 5:33 PM, Chad Leigh - Pengar LLC wrote:

> Hi
>
> I have a FreeBSD 9 system with ZFS root. It is actually a VM under Xen on a beefy piece of HW (4-core Sandy Bridge 3 GHz Xeon, 32GB total HW memory -- the VM has 4 vCPUs and 6GB RAM), with mirrored gpart partitions. I am looking for data integrity more than performance, as long as performance is reasonable (which it has more than been for the last 3 months).
>
> The other "servers" on the same HW -- the other VMs -- don't have this problem, though they are set up the same way. There are 4 other FreeBSD VMs: one runs email for a one-man company and a few of his friends, plus some static web pages and other stuff for him; one runs a few low-use web apps for various customers; and one runs about 30 websites with apache and nginx, mostly just static sites. None are heavily used. There is also one Linux VM running a couple of low-use FrontBase databases -- not high-use databases, low-use ones.
>
> The troublesome VM has been running fine for the 3+ months since I installed it, and its level of use has been pretty much constant. The server runs 4 jails, each dedicated to a different bit of email processing for a small number of users: one is a secondary DNS, one runs clamav and spamassassin, one runs exim for incoming and outgoing mail, and one runs dovecot for imap and pop. There is no web server, database, or anything else running.
>
> Total number of mail users on the system is approximately 50, plus or minus. Total mail traffic is very low compared to "real" mail servers.
> Earlier this week things started "freezing up". It might last a few minutes, or it might last 1/2 hour. Processes become unresponsive. This can last a few minutes or much longer. It eventually resolves itself and things are good for another 10 minutes or 3 hours until it happens again. When it happens, lots of processes are listed in "top" in the
>
> zfs
> zio->i
> zfs
> tx->tx
> db->db
>
> states. These processes only get listed in these states when there are problems. What are these states indicative of?

OK, after much reading of ZFS blog posts, forum postings, email list postings, and trying stuff out, I seem to have gotten things back down to normal and reasonable performance.

In case anyone has similar issues in a similar circumstance, here is what I did. Some of these changes may have had little or no effect, but this is what was changed.

The biggest effect was when I did the following:

vfs.zfs.zfetch.block_cap: from the default 256 down to 64

This was like night and day. The idea to try this came from a post by user "madtrader" in the forum thread http://forums.sagetv.com/forums/showthread.php?t=43830&page=2 . He was recording multiple streams of HD video while trying to play HD video off a stream from the same server/ZFS file system.

Also, setting vfs.zfs.write_limit_override to something other than the default disabled "0" seems to have had a relatively significant effect. Before I worked with the "block_cap" above, I was focusing on this and had tried everything from 64M to 768M. It is currently set to 576M, which is around where I was getting the best results on my system with my amount of RAM (6GB). I tried 512M with good results, and then 768M, which was still good but not quite as good as far as I could tell from testing. So I went with 576M on my last attempt, then added in the block_cap change, and things really are pretty much back to normal.

I turned on vdev caching by raising vfs.zfs.vdev.cache.size from 0 to 10M. I don't know if it helped.

I also lowered vfs.zfs.txg.timeout from 5 to 3. This seems to have had a slightly noticeable effect.

I also adjusted vfs.zfs.arc_max. The default of 0 (meaning the system picks the value itself) seemed to result in an actual value of around 75-80% of RAM, which seemed high. I ended up setting it at 3072M, which for me seems to work well. I don't know what its overall effect on the problem was, though.

Thanks
Chad
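P.S. In case it is useful, here is what all of those settings look like gathered in one place. This is just a sketch of the values mentioned above, not a recommendation -- the right numbers depend on how much RAM you have and on your workload. Also, depending on your FreeBSD version, some of these knobs are boot-time tunables that are read-only at runtime, so they have to go in /boot/loader.conf and only take effect after a reboot; the ones that are writable at runtime can be changed live with sysctl(8) and put in /etc/sysctl.conf to persist.

    # /boot/loader.conf -- boot-time tunables, applied at the next reboot.
    # Size suffixes like "M" are accepted here.
    vfs.zfs.arc_max="3072M"
    vfs.zfs.zfetch.block_cap="64"
    vfs.zfs.vdev.cache.size="10M"

    # /etc/sysctl.conf -- write_limit_override I was able to change on the
    # fly while experimenting; sysctl wants a plain byte count, so 576M is
    # written out in bytes.  If sysctl reports either of these as read-only
    # on your version, move it to /boot/loader.conf instead.
    vfs.zfs.txg.timeout=3
    vfs.zfs.write_limit_override=603979776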
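And for anyone who wants to watch for the same symptoms, the stock FreeBSD tools are enough -- something along these lines (exact output will vary):

    # print the description of one of the knobs, then its current value
    sysctl -d vfs.zfs.zfetch.block_cap
    sysctl vfs.zfs.zfetch.block_cap

    # per-vdev I/O while a stall is happening
    zpool iostat -v 5

    # per-thread view in top; the zio->i / tx->tx / db->db wait channels
    # from my original question show up in the STATE column
    top -SH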