Date:      Thu, 16 Aug 2012 12:11:12 -0600
From:      "Chad Leigh Shire.Net LLC" <>
To:        FreeBSD Mailing List <>
Cc:        Chad Leigh <>
Subject:   Re: ZFS stats in "top" -- ZFS performance started being crappy in spurts
Message-ID:  <>
In-Reply-To: <>
References:  <>



On Aug 11, 2012, at 5:33 PM, Chad Leigh - Pengar LLC wrote:

> Hi
>
> I have a FreeBSD 9 system with ZFS root.  It is actually a VM under Xen on a beefy piece of HW (4-core Sandy Bridge 3 GHz Xeon, 32 GB total HW memory -- the VM has 4 vcpus and 6 GB RAM).  Mirrored gpart partitions.  I am looking for data integrity more than performance, as long as performance is reasonable (which it has more than been the last 3
>
> The other "servers" on the same HW, the other VMs on the same box, don't have this problem but are set up the same way.  There are 4 other FreeBSD VMs: one runs email for a one-man company and a few of his friends, as well as some static web pages and such for him; one runs a few low-use web apps for various customers; and one runs about 30 websites with apache and nginx, mostly just static sites.  None are heavily used.  There is also one VM running linux with a couple of low-use FrontBase databases.  Not high-use databases -- low-use ones.
>
> The troublesome VM has been running fine for over 3 months since I installed it.  The level of use has been pretty much constant.  The server runs 4 jails, each dedicated to a different bit of email processing for a small number of users.  One is a secondary DNS.  One runs clamav and spamassassin.  One runs exim for incoming and outgoing mail.  One runs dovecot for imap and pop.  There is no web server or database or anything else running.
>
> The total number of mail users on the system is approximately 50, plus or minus.  Total mail traffic is very low compared to "real" mail servers.
>
> Earlier this week things started "freezing up".  Processes become unresponsive.  This can last a few minutes, or half an hour, or much longer.  It eventually resolves itself and things are good for another 10 minutes or 3 hours until it happens again.  When it happens, lots of processes are listed in "top" in the
>
> zfs
> zio->i
> zfs
> tx->tx
> db->db
>
> states.  These processes only get listed in these states when there are problems.  What are these states indicative of?

OK, after much reading of ZFS blog posts, forum postings, and mailing list postings, and after trying things out, I seem to have gotten things back down to normal and reasonable performance.

In case anyone has similar issues in similar circumstances, here is what I did.  Some of these changes may have had little or no effect, but this is what was changed.

The biggest effect came when I made the following change:

vfs.zfs.zfetch.block_cap: lowered from the default of 256 down to 64

This was like night and day.  The idea to try this came from a post by user "madtrader" in the forum.  He was recording multiple streams of HD video while trying to play HD video from a stream on the same server/ZFS file system.
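For reference, a minimal sketch of trying this change live -- this assumes the knob is runtime-writable on your FreeBSD build; some vfs.zfs.* knobs are boot-time tunables only and have to go in /boot/loader.conf instead:

```
# Check the current value, then lower it (run as root on the FreeBSD guest):
#   sysctl vfs.zfs.zfetch.block_cap
#   sysctl vfs.zfs.zfetch.block_cap=64
# If the sysctl is read-only at runtime, persist it in /boot/loader.conf:
vfs.zfs.zfetch.block_cap=64
```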

Also, setting

vfs.zfs.write_limit_override to something other than the default, disabled "0", seems to have had a relatively significant effect.  Before I worked with the "block_cap" above, I was focusing on this and had tried everything from 64M to 768M.  It is currently set to 576M, which is around the area where I was having the best results on my system with my amount of RAM (6GB).  I tried 512M and had good results, then 768M, which was still good but not quite as good as far as I could tell from testing.  So I went with 576M on my last attempt, then added in the block_cap change, and things really are pretty much back to normal.
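A note on units: if you set this at runtime with sysctl(8), the value is a plain byte count (loader.conf tunables do accept M/G suffixes).  A quick sketch of converting the 576M figure above:

```shell
# Convert the 576M write limit used above into the byte value sysctl expects,
# i.e. for: sysctl vfs.zfs.write_limit_override=<bytes>
mb=576
bytes=$((mb * 1024 * 1024))
echo "$bytes"
```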

I turned on vdev caching:

vfs.zfs.vdev.cache.size: raised from 0 to 10M.  I don't know if it helped.

I also lowered

vfs.zfs.txg.timeout from 5 to 3.  This seems to have had a slightly noticeable effect.

I also adjusted

The default of 0 (meaning the system sets the value itself) seemed to result in an actual value of around 75-80% of RAM, which seemed high.  I ended up setting it at 3072M, which seems to work well for me.  I don't know what the overall effect on the problem was, though.
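Pulling the changes together, the persisted form might look like the fragment below.  Two caveats: the tunable name in the last step is missing from my message above -- vfs.zfs.arc_max matches the description (self-set near all of RAM by default, capped here at 3072M), but that name is my assumption -- and which file each knob belongs in (boot-time /boot/loader.conf vs. runtime /etc/sysctl.conf) can vary by FreeBSD version, so treat the split as a sketch:

```
# /boot/loader.conf -- boot-time tunables (M/G suffixes accepted here)
vfs.zfs.zfetch.block_cap=64
vfs.zfs.vdev.cache.size="10M"
vfs.zfs.arc_max="3072M"          # assumed name; my message above omits it

# /etc/sysctl.conf -- runtime-settable sysctls (plain byte counts)
vfs.zfs.write_limit_override=603979776   # 576M
vfs.zfs.txg.timeout=3
```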


