Date:      Thu, 5 May 2011 03:13:41 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Robert Schulze <rs@bytecamp.net>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zfs l2arc issue
Message-ID:  <20110505101341.GA10618@icarus.home.lan>
In-Reply-To: <4DC25DA6.3060009@bytecamp.net>
References:  <4DC25DA6.3060009@bytecamp.net>

On Thu, May 05, 2011 at 10:19:50AM +0200, Robert Schulze wrote:
> we are running an NFS server with the following pool setup:
> 
> home        ONLINE       0     0     0
> 	  raidz2    ONLINE       0     0     0
> 	    da1     ONLINE       0     0     0
> 	    da2     ONLINE       0     0     0
> 	    da3     ONLINE       0     0     0
> 	    da4     ONLINE       0     0     0
> 	    da5     ONLINE       0     0     0
> 	  raidz2    ONLINE       0     0     0
> 	    da6     ONLINE       0     0     0
> 	    da7     ONLINE       0     0     0
> 	    da8     ONLINE       0     0     0
> 	    da9     ONLINE       0     0     0
> 	    da10    ONLINE       0     0     0
> 	logs        ONLINE       0     0     0
> 	  mirror    ONLINE       0     0     0
> 	    da12    ONLINE       0     0     0
> 	    da13    ONLINE       0     0     0
> 	cache
> 	  ad4       ONLINE       0     0     0
> 	  ad8       ONLINE       0     0     0
> 
> 
> All drives except the caching SSDs are attached to an LSI 9690SA-8I.
> The system is equipped with 32 GB RAM and runs with a load of <1.
> Please note: we are still running 8.0, since there was one issue with
> ZFS which blocked the upgrade to 8-STABLE.
> 
> After about 100d uptime, we had a sudden large increase in load of
> about 5-7, nfsd had 100-400% WCPU. Also an rsync downloading files
> from that machine was very slow.
> 
> We didn't really narrow down the problem; we had to reboot the
> machine because performance was nearly completely absent. After the
> reboot, system performance became normal.
> 
> Could this problem be related to the caching SSDs being full? The
> cache consists of two 76 GB SSDs; after warming up, only 8 MB are
> free on each disk.
> Is ZFS supposed to fill arbitrarily large caches? I am thinking of
> doubling the cache and then ending up with fully filled SSDs again.
> If so, could l2arc be limited somehow, so that SSDs don't get written
> full?
> 
> Could this behaviour also appear in 8-STABLE?

To readers: make sure you note this user is running either 8.0-RELEASE
or 8.0-STABLE.  ZFS at that time was very different; **many** of its
internals and tuning knobs have changed since.

- It would help if we could match disk types (SSDs, etc.) to a device
string. "camcontrol devlist -v" would be useful on this machine.

- nfsd taking up 100-400% CPU doesn't tell us much.  (The reporting of
figures above 100% has been addressed in a later release, by the way;
it will show 100% total for all 4 cores, and I believe "top -C" changes
the behaviour.)  What was nfsd actually *doing* during that time?  Could
you run "procstat -kk PID" against it?  Did you try using
"ktrace -i -t+ -p PID" to see what syscalls it was making?

- Have you done any system tuning on this machine for ZFS?  It's very
important that you provide the following:

  - uname -a (you can hide/XXX-out the machine name).  This will
    provide both the exact build date (which hopefully matches
    when your kernel sources were synced) and whether the machine
    is i386 or amd64
  - Contents of /etc/sysctl.conf
  - Contents of /boot/loader.conf
  - Contents of /etc/rc.conf (you can XXX out machine names, IPs, etc.)
  - Output from dmesg (after a fresh reboot is fine)
  - Output from "sysctl -a vfs.zfs"
  - Output from "sysctl -a kstat.zfs"
  - Output from "top" when the issue is occurring; interested mainly
    in the high-CPU-usage processes as well as all the system/memory
    statistics
  - Output from "zpool iostat -v 1" when the issue is occurring.
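
For convenience, here is a rough sketch of how the items above could be
gathered into one directory.  It is only an illustration: OUTDIR, the
capture() helper, the output file names, and the sample counts passed to
top and zpool iostat are my own assumptions, not anything required by
this thread.  The FreeBSD-specific commands are skipped on other
systems.

```sh
#!/bin/sh
# Sketch only: OUTDIR, capture() and the output file names are
# illustrative choices, not something taken from this thread.
OUTDIR="${OUTDIR:-$(mktemp -d "${TMPDIR:-/tmp}/zfs-diag.XXXXXX")}"
mkdir -p "$OUTDIR"

# capture NAME CMD [ARGS...]
# Run CMD and save stdout+stderr to $OUTDIR/NAME.txt.  A missing command
# or a non-zero exit is recorded in the file instead of aborting.
capture() {
    name="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUTDIR/$name.txt" 2>&1 ||
            echo "exit status: $?" >>"$OUTDIR/$name.txt"
    else
        echo "command not found: $1" >"$OUTDIR/$name.txt"
    fi
}

# Static information; fine to collect after a fresh reboot.
capture uname uname -a
for f in /etc/sysctl.conf /boot/loader.conf /etc/rc.conf; do
    if [ -r "$f" ]; then
        cp "$f" "$OUTDIR/$(basename "$f")"
    fi
done

# The remaining commands are FreeBSD-specific, so skip them elsewhere.
if [ "$(uname -s)" = "FreeBSD" ]; then
    capture dmesg dmesg
    capture sysctl-vfs-zfs sysctl vfs.zfs
    capture sysctl-kstat-zfs sysctl kstat.zfs
    # Run these two while the problem is actually occurring.
    capture top top -b -d 1                    # batch mode, one display
    capture zpool-iostat zpool iostat -v 1 10  # 1 s interval, 10 samples
fi

echo "diagnostics written to $OUTDIR"
```

Remember to XXX out machine names, IPs, etc. in the copied files before
posting them.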

I should warn you in advance: you're asking for assistance with
something that's "fairly old", and as I stated in the "To readers"
section, ZFS on 8.0 is very different from 8.2.  All sorts of
tunings/adjustments are required on 8.0 that are not needed on 8.2.

I think most of us would like to know what single ZFS issue is keeping
you from upgrading the machine to RELENG_8 / 8.2-STABLE.  Overall, it
might make the most sense to address or fix that problem for you and
then have you try 8.2-STABLE to see if the above issue persists.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



