Date:      Thu, 12 May 2011 11:11:58 +0200
From:      Alexander Leidinger <Alexander@Leidinger.net>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: How to enable cache and logs.
Message-ID:  <20110512111158.16451mu57sv0f8f4@webmail.leidinger.net>
In-Reply-To: <20110512083429.GA58841@icarus.home.lan>
References:  <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan>

Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Thu, 12 May  
2011 01:34:29 -0700):

> On Thu, May 12, 2011 at 09:33:06AM +0300, Daniel Kalchev wrote:
>> On 12.05.11 06:36, Jeremy Chadwick wrote:
>> >On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> >>On Thu, 12 May 2011, Danny Carroll wrote:
>> >>>Replying to myself in order to summarise the recommendations (when using
>> >>>v28):
>> >>>- Don't use SSD for the Log device.  Write speed tends to be a problem.
>> >>DO use an SSD for the log device.  The log device is only used for
>> >>synchronous writes.  Except for certain usages (e.g. database and
>> >>NFS servers), most writes will be asynchronous and never be written
>> >>to the log.  Huge synchronous writes will also bypass the SSD log
>> >>device.  The log device is for reducing latency on small
>> >>synchronous writes.
>> >Bob, please correct me if I'm wrong, but as I understand it a log device
>> >(ZIL) effectively limits the overall write speed of the pool itself.
>> >
>> Perhaps I misstated it in my first post, but there is nothing wrong
>> with using an SSD for the SLOG.
>>
>> You can of course construct a usage/benchmark scenario where a
>> (cheap) SSD-based SLOG will be worse than a (fast) HDD-based SLOG,
>> especially if you are not concerned about latency. The SLOG resolves
>> two issues: it increases the throughput of the pool (the primary
>> storage) by removing small synchronous writes that would otherwise
>> introduce unnecessary head movement and extra IOPS, and it provides
>> low latency for those small synchronous writes.
>
> I've been reading about this in detail here:
>
> http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained
>
> I had no idea the primary point of a SLOG was to deal with applications
> that make use of O_SYNC.  I thought it was supposed to improve write
> performance for both asynchronous and synchronous writes.  Obviously I'm
> wrong here.
>
> The author's description (at that URL) of an example scenario makes
> little sense to me; he tells a story about a bank and a US$699
> financial transaction that got cached in RAM before the system lost
> power -- and how the intent log on a filesystem would be replayed
> during reboot.
>
> What guarantee is there that the intent log -- which is written to the
> disk -- actually got written to the disk in the middle of a power
> failure?  There's a lot of focus there on the idea that "the intent log
> will fix everything, but may lose writes", but what guarantee do I have
> that the intent log isn't corrupt or botched during a power failure?

The request comes in and the data is written to stable storage (the
sync-write goes to the SLOG). Because it is a sync-write, the
application knows the data has hit stable storage when the write call
returns, and only then does the app ACK to the other party.

Without a SLOG you get the same guarantee, just at a lower speed (if
done correctly). So the SLOG is not about the guarantee itself (you
should already have that with a normal pool and disks which tell the
truth regarding a cache flush); the SLOG is about a higher rate of
transactions with that guarantee.
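
To make the ordering concrete, the application-side pattern looks
roughly like this (an untested C sketch; the socket and the file
descriptor are assumed to be set up elsewhere, and only the
write/fsync-before-ACK ordering is the point):

  #include <fcntl.h>
  #include <unistd.h>

  /* Commit the data to stable storage, then ACK to the peer.
   * Sketch only: error reporting is simplified. */
  int commit_then_ack(int sock, int fd, const char *buf, size_t len)
  {
      if (write(fd, buf, len) != (ssize_t)len)
          return -1;
      if (fsync(fd) != 0)     /* or open fd with O_SYNC; either way */
          return -1;          /* the data is durable after this     */
      return (write(sock, "ACK", 3) == 3) ? 0 : -1;
  }

With a SLOG the fsync() returns quickly because ZFS commits the
record to the log device; without one it returns only after the main
pool disks have flushed, which is the speed difference above.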

>> The latter is only valid if the SSD is sufficiently write-optimized.
>> Most consumer SSDs end up saturated by writes. Sequential write
>> IOPS is what matters here.
>
> Oh, I absolutely agree on this point.  So basically consumer-level
> SSDs that don't provide extreme write-speed benefits (compared to a
> classic mechanical HDD) -- not discussing seek times here, we all
> know SSDs win there -- probably aren't good candidates for SLOGs.
>
> What's interesting about the focus on IOPS is that Intel SSDs, in
> the consumer class, still trump their competitors.  But given that
> your above statement focuses on sequential writes, and the site I
> provided is quite clear about what happens to sequential writes on
> an Intel SSD that doesn't have TRIM... Yeah, you get where I'm going
> with this.  :-)

TRIM for the SLOG is IMO more important than TRIM for the cache. For
the SLOG the write latency matters; for the cache it normally does
not matter _that much_. Remember, when something is moved from RAM to
L2ARC, the data being moved is not needed at the moment. Data is
moved from RAM to L2ARC because the OS decides either that the ARC is
at some kind of high-watermark (predicting the future and making sure
there is some free space for future data, a kind of garbage
collection), or that it really needs some free RAM _now_ (either free
space in the ARC, or because an application needs memory). In the
first case the write latency does not matter much; in the second case
it does (but then you can evaluate whether adding more RAM is an
option).
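
As a simplified illustration of the two cases (this is not the actual
ARC code, just the decision logic as a C sketch; all names and fields
are made up):

  #include <stdbool.h>
  #include <stddef.h>

  struct arc_state {
      size_t size;        /* bytes currently cached in RAM         */
      size_t high_water;  /* above this: feed L2ARC in background  */
      size_t limit;       /* hard cap: must evict right now        */
  };

  /* Case 1: background feed; L2ARC write latency is uncritical. */
  static bool should_feed_l2arc(const struct arc_state *a)
  {
      return a->size > a->high_water;
  }

  /* Case 2: RAM is needed _now_; the write latency of the cache
   * device is suddenly on the critical path. */
  static bool must_evict_now(const struct arc_state *a, size_t need)
  {
      return a->size + need > a->limit;
  }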

>> About TRIM. As was already mentioned, you will use only a small
>> portion of a (for example) 32 GB SSD for the SLOG. If you do not
>> allocate the entire SSD, then wear leveling will be able to do its
>> job, and it is very likely you will not suffer any performance
>> degradation.
>
> That sounds ideal, though I'm not sure about the "won't suffer ANY
> performance degradation" part.  I think degradation is just less likely
> to be witnessed.

IMO TRIM support for ZFS can improve performance. The most bang for
the buck would be to add TRIM support first (if it cannot be added to
everything at the same time) for SLOGs, then for the pool, and then
for the cache.

My rationale here is that if you use a SLOG, you have very high
requirements for sync-writes, and consumer SSDs could give you a lot
of ROI if the SSD is not completely allocated and TRIM is used. I do
not expect TRIM for the cache to give a lot of ROI (less than TRIM
support for the pool).

FYI: it also depends on how TRIM is implemented. If you TRIM one LBA
after another, this adds a huge amount of latency just for the TRIM.
I do not know whether TRIMming a range of LBAs is a lot cheaper, but
I would expect it to be. TRIMming in FreeBSD (in UFS) is AFAIK done
one LBA after another.
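
For illustration, on FreeBSD a delete for a whole byte range can be
sent to a disk device with the DIOCGDELETE ioctl; the difference to a
per-block loop is the number of round trips (a rough C sketch, with
error handling and the block size simplified):

  #include <sys/types.h>
  #include <sys/disk.h>
  #include <sys/ioctl.h>

  /* One BIO_DELETE for the whole range: one round trip. */
  int trim_range(int fd, off_t off, off_t len)
  {
      off_t arg[2] = { off, len };
      return ioctl(fd, DIOCGDELETE, arg);
  }

  /* One BIO_DELETE per block: the per-LBA latency adds up. */
  int trim_per_block(int fd, off_t off, off_t len, off_t blksz)
  {
      for (off_t o = off; o < off + len; o += blksz) {
          off_t arg[2] = { o, blksz };
          if (ioctl(fd, DIOCGDELETE, arg) != 0)
              return -1;
      }
      return 0;
  }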

Bye,
Alexander.

-- 
"Sonny, what is it?"
"They shot the old man. Don't worry, he's not dead."
		-- Sandra and Santino Corleone, "Chapter 2", page 83

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137


