Date:      Wed, 19 Aug 2015 16:08:47 -0700
From:      Wim Lewis <wiml@omnigroup.com>
To:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   ZFS L2ARC statistics interpretation
Message-ID:  <0CEC2752-7787-4C6D-99E2-E7D7BF238449@omnigroup.com>

I'm trying to understand some problems we've been having with our ZFS
systems, in particular their L2ARC performance. Before I make too many
guesses about what's going on, I'm hoping someone can clarify what some
of the ZFS statistics actually mean, or point me to documentation if any
exists.

In particular, I'm hoping someone can tell me the interpretation of:

Errors:
   kstat.zfs.misc.arcstats.l2_cksum_bad
   kstat.zfs.misc.arcstats.l2_io_error

Other than problems with the underlying disk (or controller or cable
or...), are there reasons for these counters to be nonzero? On some of
our systems, they increase fairly rapidly (20000/day). Is this
considered normal, or does it indicate a problem? If a problem, what
should I be looking at?
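
A per-day rate like the one quoted above can be derived from two samples of the counters taken some time apart. The following is a minimal Python sketch of that arithmetic, not a statement of how anyone measured it. The function name and the dict layout (counter name mapped to integer value) are assumptions made for illustration.

```python
SECONDS_PER_DAY = 86400

# The two error counters asked about above.
ERROR_COUNTERS = ("l2_cksum_bad", "l2_io_error")


def error_rate_per_day(first, second, elapsed_seconds):
    """Return the per-day growth of each error counter between two samples.

    `first` and `second` are dicts mapping counter names to integer values
    (a hypothetical layout chosen for this sketch); `elapsed_seconds` is the
    time between the two samples.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for name in ERROR_COUNTERS:
        if name in first and name in second:
            delta = second[name] - first[name]
            # A negative delta would suggest the counters were reset between
            # samples (for example by a reboot), so the rate is not meaningful.
            rates[name] = delta * SECONDS_PER_DAY / elapsed_seconds
    return rates
```

Sampling once an hour and passing `elapsed_seconds=3600` scales the hourly increase up to a daily figure.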

Size:
   kstat.zfs.misc.arcstats.l2_size
   kstat.zfs.misc.arcstats.l2_asize

What do l2_size and l2_asize measure? Compressed or uncompressed size?
l2_size sometimes tops out at roughly the size of my L2ARC device, and
sometimes just continually grows (e.g., one of my systems has an l2_size
of about 1.3T but a 190G L2ARC; I doubt I'm getting nearly 7:1
compression on my dataset! But maybe I am? How can I tell?)
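
The comparison above (l2_size against the cache device size, and l2_size against l2_asize) can be sketched in Python as follows. This assumes the `name: value` line format that `sysctl kstat.zfs.misc.arcstats` prints. The function names are hypothetical, and the sketch takes no position on which counter is the compressed size; it only computes the ratios.

```python
PREFIX = "kstat.zfs.misc.arcstats."


def parse_arcstats(text):
    """Parse sysctl-style 'name: value' lines into a dict of integer counters.

    Only names under kstat.zfs.misc.arcstats are kept, with the prefix removed.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name.startswith(PREFIX):
            continue
        try:
            stats[name[len(PREFIX):]] = int(value.strip())
        except ValueError:
            pass  # skip anything that is not a plain integer
    return stats


def l2_size_ratios(stats, device_bytes):
    """Return (l2_size / device size, l2_size / l2_asize).

    `device_bytes` is the capacity of the L2ARC device, supplied by the caller.
    """
    l2_size = stats["l2_size"]
    l2_asize = stats["l2_asize"]
    vs_device = l2_size / device_bytes
    vs_asize = l2_size / l2_asize if l2_asize else float("inf")
    return vs_device, vs_asize
```

Feeding it the saved output of `sysctl kstat.zfs.misc.arcstats` together with the device capacity in bytes gives both ratios. A first ratio well above the second would mean the reported size cannot be explained by the counters' own implied compression.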

There are reports over the last few years [1,2,3,4] that suggest that
there's a ZFS bug that attempts to use space past the end of the L2ARC,
resulting both in l2_size being larger than is possible and also in
io_errors and bad cksums (when the nonexistent sectors are read back).
But given that this behavior has been reported off and on for several
years now, and many of the threads devolve into supposition and
folklore, I'm hoping to get an informed answer about what these
statistics mean and whether the numbers I'm seeing indicate a problem,
so that I can judge whether a given fix in FreeBSD might solve it.

FWIW, I'm seeing these problems on FreeBSD 10.0 and 10.1; I'm not seeing
them on 9.2.


[1] https://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html
[2] https://forums.freebsd.org/threads/l2arc-degraded.47540/
[3] https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020256.html
[4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198242


Thanks

Wim Lewis / wiml@omnigroup.com