Date:      Thu, 16 Aug 2012 12:11:12 -0600
From:      "Chad Leigh Shire.Net LLC" <chad@shire.net>
To:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Cc:        Chad Leigh <chad@shire.net>
Subject:   Re: ZFS stats in "top" -- ZFS performance started being crappy in spurts
Message-ID:  <9B93809A-302E-4DA4-A6B4-6AA2D44E4BD4@shire.net>
In-Reply-To: <8847D91D-169C-43C5-882C-81695210B0B3@pengar.com>
References:  <8847D91D-169C-43C5-882C-81695210B0B3@pengar.com>



On Aug 11, 2012, at 5:33 PM, Chad Leigh - Pengar LLC wrote:

> Hi
>=20
> I have a FreeBSD 9 system with ZFS root.  It is actually a VM under =
Xen on a beefy piece of HW (4 core Sandy Bridge 3ghz Xeon, total HW =
memory 32GB -- VM has 4vcpus and 6GB RAM).  Mirrored gpart partitions.  =
I am looking for data integrity more than performance as long as =
performance is reasonable (which it has more than been the last 3 =
months).
>=20
> The other "servers" on the same HW, the other VMs on the same, don't =
have this problem but are set up the same way.  There are 4 other =
FreeBSD VMs, one running email for a one man company and a few of his =
friends, as well as some static web pages and stuff for him, one runs a =
few low use web apps for various customers, and one runs about 30 =
websites with apache and nginx, mostly just static sites.  None are =
heavily used.  There is also one VM with linux running a couple low use =
FrontBase databases.   Not high use database -- low use ones.
>=20
> The troubleseome VM  has been running fine for over 3 months since I =
installed it.    Level of use has been pretty much constant.   The =
server runs 4 jails on it, each dedicated to a different bit of email =
processing for a small number of users.   One is a secondary DNS.  One =
runs clamav and spamassassin.  One runs exim for incoming and outgoing =
mail.  One runs dovecot for imap and pop.   There is no web server or =
database or anything else running.
>=20
> Total number of mail users on the system is approximately 50, plus or =
minus.  Total mail traffic is very low compared to "real" mail servers.
>=20
> Earlier this week things started "freezing up".  It might last a few =
minutes, or it might last 1/2 hour.   Processes become unresponsive.  =
This can last a few minutes or much longer.  It eventually resolves =
itself and things are good for another 10 minutes or 3 hours until it =
happens again.  When it happens,  lots of processes are listed in "top" =
as=20
>=20
> zfs
> zio->i
> zfs
> tx->tx
> db->db
>=20
> state.   These processes only get listed in these states when there =
are problems.   What are these states indicative of?
>=20

Ok, after much reading of ZFS blog posts, forum posts, and mailing list
threads, and after trying things out, I seem to have gotten things back
to normal, reasonable performance.

In case anyone has similar issues in similar circumstances, here is what
I did.  Some of these changes may have had little or no effect, but this
is everything that was changed.

The biggest effect came from the following change:

vfs.zfs.zfetch.block_cap: lowered from the default of 256 to 64

This was like night and day.  The idea to try this came from a post by
user "madtrader" in the forum thread
http://forums.sagetv.com/forums/showthread.php?t=43830&page=2 .  He was
recording multiple streams of HD video while trying to play HD video
back from the same server/ZFS file system.

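In case it is useful, a minimal sketch of how this kind of setting can be
made persistent.  As far as I can tell, vfs.zfs.zfetch.block_cap is a
boot-time tunable on FreeBSD 9, so it goes in /boot/loader.conf rather
than /etc/sysctl.conf; check "sysctl -d vfs.zfs.zfetch.block_cap" and
whether it is writable at runtime on your own version first.

    # /boot/loader.conf -- takes effect at the next reboot
    vfs.zfs.zfetch.block_cap="64"

    # verify the running value after the reboot
    sysctl vfs.zfs.zfetch.block_cap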

Also, setting

vfs.zfs.write_limit_override   to something other than the default of
"0" (disabled) seems to have had a relatively significant effect.
Before I worked with the "block_cap" setting above, I was focusing on
this and had tried everything from 64M to 768M.  It is currently set to
576M, which is around the area where I got the best results on my
system with my amount of RAM (6GB).  I tried 512M and had good results,
then 768M, which was still good but not quite as good as far as I could
tell from testing.  So I went with 576M on my last attempt, then added
in the block_cap change, and things really are pretty much back to
normal.

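A rough sketch of how this one can be experimented with.  I believe the
sysctl is writable at runtime (treat that as an assumption and check on
your version); the value is in bytes, so 576M is 576 * 1024 * 1024 =
603979776.

    # try a new write limit on the fly (576M in bytes)
    sysctl vfs.zfs.write_limit_override=603979776

    # keep it across reboots by adding the same line to /etc/sysctl.conf
    vfs.zfs.write_limit_override=603979776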

I turned on vdev caching:

vfs.zfs.vdev.cache.size: from 0 to 10M.  I don't know if it helped.
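
If you want to try the same thing, a sketch assuming this is also a
boot-time tunable on your version (it appears to be on FreeBSD 9):

    # /boot/loader.conf -- enable the per-vdev cache, sized at 10M
    vfs.zfs.vdev.cache.size="10M"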

I also lowered

vfs.zfs.txg.timeout   from 5 to 3.  This seems to have had a slightly
noticeable effect.

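For completeness, a sketch of pinning the txg timeout (the value is in
seconds; whether it is settable at runtime or only at boot seems to
depend on the version, so check "sysctl -d vfs.zfs.txg.timeout"):

    # at runtime, if the sysctl is writable on your version
    sysctl vfs.zfs.txg.timeout=3

    # or persistently via /boot/loader.conf
    vfs.zfs.txg.timeout="3"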

I also adjusted


vfs.zfs.arc_max

The default of 0 (meaning the system sets it itself) seemed to result in
an actual value of around 75-80% of RAM, which seemed high.  I ended up
setting it to 3072M, which seems to work well for me.  I don't know what
the overall effect on the problem was, though.

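Since vfs.zfs.arc_max is, as far as I know, a boot-time tunable, a
sketch of the loader.conf line plus a way to watch what the ARC actually
uses afterwards:

    # /boot/loader.conf -- cap the ARC at 3GB on this 6GB VM
    vfs.zfs.arc_max="3072M"

    # after reboot: configured cap and current ARC size, both in bytes
    sysctl vfs.zfs.arc_max
    sysctl kstat.zfs.misc.arcstats.size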

Thanks
Chad





