Date:      Wed, 11 May 2011 18:04:33 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Jason Hellenthal <jhell@DataIX.net>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: How to enable cache and logs.
Message-ID:  <20110512010433.GA48863@icarus.home.lan>
In-Reply-To: <20110511223849.GA65193@DataIX.net>
References:  <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net>

On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> 
> Jeremy,
> 
> On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > >should also keep that in mind when putting an SSD into use in this
> > > >fashion.
> > >
> > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > would handle that write load without TRIM and without any performance
> > > degradation.
> > >
> > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > NAND should be smaller (my wild guess, current practice may differ)
> > > and the need for rewriting will be small. If you don't need to
> > > rewrite already written data, TRIM does not help. Also, as far as I
> > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > twice or more the advertised size and always write to fresh cells,
> > > scheduling a background erase of the 'overwritten' cell.
> > 
> > AFAIK, drive manufacturers do not disclose just how much reallocation
> > space they keep available on an SSD.  I'd rather not speculate as to how
> > much, as I'm certain it varies per vendor.
> > 
> 
> Let's not forget here: The size of the separate log device may be quite 
> small. A rule of thumb is that you should size the separate log to be able 
> to handle 10 seconds of your expected synchronous write workload. It would 
> be rare to need more than 100 MB in a separate log device, but the 
> separate log must be at least 64 MB.
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> 
> So, in other words, how effective is TRIM really, given the above?
> 
> Even with a high database write load on the disks at the full capacity 
> of the incoming link, I would find it hard to believe that anyone could 
> get the ZIL to even come close to 512MB.
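
For a rough sense of scale, here is a back-of-the-envelope sketch of that
rule of thumb.  The 10 MB/s figure below is just a made-up example of a
sustained synchronous write load, not something measured in this thread:

    # Rule of thumb quoted above: cover ~10 seconds of the expected
    # synchronous write workload, with a 64 MB floor.
    sync_write_mb_per_sec = 10   # hypothetical sustained sync write load
    seconds_to_cover = 10        # per the rule of thumb
    min_slog_mb = 64             # minimum separate log size

    slog_mb = max(sync_write_mb_per_sec * seconds_to_cover, min_slog_mb)
    print("suggested separate log size: %d MB" % slog_mb)   # -> 100 MB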

In the case of an SSD being used as a log device (ZIL), I imagine it
would only start to matter the longer the drive is kept in use.  I do not
use log devices anywhere with ZFS, so I can't really comment.

In the case of an SSD being used as a cache device (L2ARC), I imagine it
would matter much more.

In the case of an SSD being used as a pool device, it matters greatly.

Why it matters: there are two methods of "reclaiming" previously used
blocks: the SSD's internal "garbage collection" and TRIM.  For a NAND
block to be reclaimed, it has to be erased -- SSDs erase flash in whole
pages (erase blocks) rather than individual LBAs.  With TRIM, you submit
the ATA DATA SET MANAGEMENT command with a list of LBAs you wish to
inform the drive are no longer used.  The drive aggregates the LBA
ranges, determines whether an entire flash page can be erased, and erases
it if so.  If it can't, it makes some sort of mental note that the
individual LBA (in some particular page) shouldn't be used.
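
Here's a tiny toy model of that aggregation step, just to illustrate the
idea (the 256-LBA "page" is a made-up size, and real firmware is far more
involved than this):

    PAGE_LBAS = 256    # hypothetical flash page: 256 x 512-byte LBAs
    trimmed = set()    # LBAs the OS has said are no longer used

    def trim(start_lba, count):
        """Record one LBA range from a DATA SET MANAGEMENT (TRIM) request."""
        trimmed.update(range(start_lba, start_lba + count))

    def erasable_pages():
        """Pages whose LBAs are all unused can be erased outright."""
        per_page = {}
        for lba in trimmed:
            per_page.setdefault(lba // PAGE_LBAS, set()).add(lba)
        return [p for p, lbas in per_page.items() if len(lbas) == PAGE_LBAS]

    trim(0, 256)    # covers page 0 completely -> erasable
    trim(300, 10)   # partial page 1 -> only a "mental note", not erasable
    print(erasable_pages())      # -> [0]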

The "garbage collection" works when the SSD is idle.  I have no idea
what "idle" actually means operationally, because again, vendors don't
disclose what the idle intervals are.  5 minutes?  24 hours?  It
matters, but they don't tell us.  (What confuses me about the "idle GC"
method is how it determines what it can erase -- if the OS didn't tell
it what it's using, how does it know it can erase the page?)
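
My guess -- and it is only a guess, since vendors don't document it -- is
that the drive's internal LBA-to-flash mapping is what tells it: an old
flash copy only turns into garbage when the same LBA is overwritten, so
without TRIM the GC can reclaim space freed by overwrites, but never
space freed by a plain file deletion.  A toy model of that idea (all the
names and sizes are made up for illustration):

    mapping = {}     # logical LBA -> physical flash page (toy FTL)
    valid = set()    # physical pages currently holding live data
    next_page = 0    # next fresh page to write into

    def write(lba):
        """Writes go to a fresh page; an overwrite invalidates the old copy."""
        global next_page
        old = mapping.get(lba)
        if old is not None:
            valid.discard(old)   # old copy is garbage now; GC may reclaim it
        mapping[lba] = next_page
        valid.add(next_page)
        next_page += 1

    write(10); write(11); write(10)     # LBA 10 rewritten once
    print("reclaimable pages: %d" % (next_page - len(valid)))   # -> 1
    # A file the OS merely deletes never triggers write(), so without TRIM
    # its pages stay "valid" forever and the GC can't touch them.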

Anyway, how all this manifests itself performance-wise is intriguing.
It's not speculation: there is hard evidence that, on some SSDs, not
using TRIM results in performance that, bluntly put, sucks badly.

There's this mentality that wear levelling completely solves all of the
**performance** concerns -- that isn't the case at all.  In fact, I'm
under the impression it probably hurts performance, but it depends on
how it's implemented within the drive firmware.

bit-tech did an experiment using Windows 7 -- which supports and uses
TRIM assuming the device advertises the capability -- with different
models of SSDs.  The testing procedure is documented here, but I'll
summarize it as well:

http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4

Again, remember, this is done on a Windows 7 system which does support
TRIM if the device supports it.  The testing steps, in this order:

1) SSD without TRIM support -- all LBAs are zeroed.
2) Took read/write benchmark readings.
3) SSD without TRIM support -- partitioned and formatted as NTFS
   (cluster size unknown), copied 100GB of data to the drive, deleted all
   the data, and repeated this fill/delete cycle 10 times (see the sketch
   after this list).
4) Step #2 repeated.
5) Upgraded SSD firmware to a version that supports TRIM.
6) SSD with TRIM support -- step #1 repeated.
7) Step #2 repeated.
8) SSD with TRIM support -- step #3 repeated.
9) Step #2 repeated.
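
For what it's worth, the fill/delete cycle in step 3 amounts to roughly
the sketch below.  The mount point, file sizes and pass count are
placeholders; bit-tech used their own tooling on Windows, not this:

    import os

    TARGET_DIR = "/mnt/ssdtest"   # hypothetical mount point on the SSD
    FILE_MB = 1024                # 1 GB per file...
    FILES_PER_PASS = 100          # ...x 100 files = ~100 GB per pass
    PASSES = 10

    chunk = b"\xa5" * (1024 * 1024)       # 1 MB of non-zero data

    for p in range(PASSES):
        names = []
        for i in range(FILES_PER_PASS):
            name = os.path.join(TARGET_DIR, "fill-%02d-%03d.bin" % (p, i))
            with open(name, "wb") as f:
                for _ in range(FILE_MB):
                    f.write(chunk)
            names.append(name)
        for name in names:
            os.remove(name)       # delete it all; without TRIM the drive is
                                  # never told these LBAs are free again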

Without TRIM, some drives drop their read performance by more than 50%,
and write performance by almost 70%.  I'm focusing on Intel SSDs here,
by the way.  I do not care for OCZ or Corsair products.

So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
on FreeBSD will mimic (to some degree).

Therefore, simply put, users should be concerned when using ZFS on
FreeBSD with SSDs.  It doesn't matter to me if you're only using
64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
means degraded performance over time.

Can you refute any of this evidence?

> Given most SSDs come at a size greater than 32GB, I hope this comes as 
> an early reminder that the ZIL you are buying that disk for is only 
> going to be using a small percentage of that disk, and I hope you can 
> justify the cost against its actual use.  If you do happen to justify 
> creating a ZIL for your pool, then I hope that you partition it wisely 
> to make use of the rest of the space that is untouched.
> 
> For all other cases, if you still want to have a ZIL, I would recommend 
> that you consider some sort of PCI->SD card or USB stick, with 
> mirroring.

Others have pointed out this isn't effective (re: USB sticks).  The read
and write speeds are too slow, and limit the overall performance of ZFS
in a very bad way.  I can absolutely confirm this claim (I've tested it
myself, using a high-end USB flash drive as a cache device (L2ARC)).

Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
*does* improve performance on older systems which have slower disk I/O
(e.g. ICH5-based systems).

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



