Date:      Tue, 9 Feb 2016 18:28:22 +0100
From:      Jan Bramkamp <crest@rlwinm.de>
To:        freebsd-stable@freebsd.org
Subject:   Re: Best practices for ZFS setup for a strictly SSD based system?
Message-ID:  <56BA21B6.3070308@rlwinm.de>
In-Reply-To: <2D296837-3B06-4E72-B8B0-A33AE6CE48AE@punkt.de>
References:  <2D296837-3B06-4E72-B8B0-A33AE6CE48AE@punkt.de>



On 09/02/16 16:54, Patrick M. Hausen wrote:
> Hi, all,
>
> while there is quite a bit of documentation on how to improve ZFS performance
> by using a combination of rotating disks and SSDs, I have not found much about
> an SSD only setup.
>
> We are planning to try a hosting server with 8 SATA SSDs with ZFS. Things I am
> not at all sure about:
>
> *	Does the recommended limit of 6 disks for a RAIDZ2 still
> 	hold? 2x 4 disks is quite a bit of overhead, could I use all 8
> 	in one vdev and get away with it?
> 	(The maximum of 6 recommendation is in some old Sun doc)

There are multiple reasons to limit the number of disks per RAID-Z VDEV.

  * Resilver time: ZFS has to process all objects ordered by transaction
id to resilver a RAID-Z. Resilvering is a torture test for the remaining
disks of your degraded RAID-Z, and given the ratio of bandwidth to
capacity of current hard disks, resilvering takes too long. This isn't
an issue for SSDs.

  * For performance estimates, think of the RAID-Z as one huge disk
with larger blocks but the same IOPS as the slowest disk in the RAID-Z.
Databases perform disk I/O in small blocks, limiting your RAID-Z to the
performance of about one of its member disks.

  * A ZFS pool can only grow by adding whole VDEVs or by replacing all
disks in a VDEV one at a time. Using mirrors allows the pool to grow in
smaller increments.
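
As a rough illustration of the performance rule of thumb above, here is
a minimal Python sketch. It assumes that each RAID-Z VDEV delivers about
the small-block IOPS of its slowest disk and that the pool's VDEVs add
up. The per-disk figure and the function names are made up for the
example, not measurements.

```python
# Back-of-the-envelope estimate only; DISK_IOPS is a hypothetical figure.
DISK_IOPS = 50_000


def raidz_vdev_iops(disk_iops):
    """Small-block IOPS of one RAID-Z VDEV: about that of its slowest disk."""
    return min(disk_iops)


def pool_iops(vdevs):
    """Assume the pool spreads I/O over its VDEVs, so the estimates add up."""
    return sum(raidz_vdev_iops(vdev) for vdev in vdevs)


def raidz2_data_disks(vdevs):
    """RAID-Z2 spends two disks' worth of capacity per VDEV on parity."""
    return sum(len(vdev) - 2 for vdev in vdevs)


layouts = {
    "1 x 8-disk RAID-Z2": [[DISK_IOPS] * 8],
    "2 x 4-disk RAID-Z2": [[DISK_IOPS] * 4, [DISK_IOPS] * 4],
}

for name, vdevs in layouts.items():
    print(f"{name}: ~{pool_iops(vdevs)} IOPS, "
          f"{raidz2_data_disks(vdevs)} disks of usable capacity")
```

Under these assumptions the single wide VDEV keeps six disks of capacity
but has the small-block IOPS of one disk, while two narrow VDEVs double
the IOPS estimate at the cost of two more parity disks.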

> *	Will e.g. MySQL still profit from residing on a mirror
> 	instead of a RAIDZ2, even if all disks are SSDs?

Yes. OpenZFS schedules reads on mirrors to the disk with the shortest
queue, so a mirror offers about the sum of its member disks in read
performance (IOPS and bandwidth) and the minimum of its member disks in 
write performance (IOPS and bandwidth). A pool with as many mirrored 
VDEVs as possible will offer the optimal performance for a given number 
of disks. For write-heavy workloads the quality of the SSDs matters a
lot as well. Cheap consumer SSDs can't sustain high write rates for any 
length of time. Even medium quality SSDs have a lot of jitter and suffer 
from throughput degradation under sustained write loads. Optimized 
server SSDs can sustain random write workloads with little jitter and 
bounded latency.
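
The mirror estimate can be sketched the same way. This is a minimal
Python example, assuming reads scale with the sum of a mirror's members,
writes with the slowest member, and that the mirrored VDEVs of a pool
add up. The per-disk IOPS figures and names are hypothetical.

```python
# Each disk is a (read_iops, write_iops) pair; the numbers are hypothetical.
SSD = (90_000, 30_000)


def mirror_read_iops(disks):
    """Reads can be served by any member: about the sum of the members."""
    return sum(read for read, _ in disks)


def mirror_write_iops(disks):
    """Every member must take each write: limited by the slowest member."""
    return min(write for _, write in disks)


def pool_estimate(mirrors):
    """Assume the pool spreads I/O over its mirrored VDEVs."""
    return {
        "read": sum(mirror_read_iops(m) for m in mirrors),
        "write": sum(mirror_write_iops(m) for m in mirrors),
    }


# Eight SSDs arranged as four two-way mirrors.
four_mirrors = [[SSD, SSD] for _ in range(4)]
print(pool_estimate(four_mirrors))
```

With these made-up numbers, read capacity scales with all eight disks
while write capacity scales with the four VDEVs.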

An NVMe SSD can offer an additional order of magnitude performance
increase over SATA SSDs but at a significant increase in price. With
multiple NVMe SSDs you will run into the current scalability limits of 
ZFS and GEOM.

> *	Does a separate ZIL and/or ARC cache device still
> 	make sense?

Most likely not.



Another optimization is splitting the log and table space and creating
a dedicated ZFS dataset for each. Create the dataset containing the
table space with the fixed record size of your MySQL backend. ZFS also
offers a lot more consistency and atomicity guarantees than required
by a minimal POSIX file system. This allows you to further reduce the
syncing overhead by tuning MySQL to take advantage of ZFS guarantees.
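
A minimal sketch of the dataset split, written as a Python helper that
only prints the zfs(8) commands instead of running them. The pool name
"tank", the dataset names, and the 16K record size (the default InnoDB
page size) are assumptions for illustration; use the page size of your
own storage engine.

```python
import shlex


def zfs_create(dataset, **props):
    """Build the argv for `zfs create`, with one -o option per property."""
    argv = ["zfs", "create"]
    for name, value in props.items():
        argv += ["-o", f"{name}={value}"]
    argv.append(dataset)
    return argv


def mysql_layout(pool="tank", recordsize="16K"):
    """Hypothetical layout: one dataset for table space, one for logs."""
    base = f"{pool}/mysql"
    return [
        zfs_create(base),
        # Table space: record size matched to the database page size.
        zfs_create(f"{base}/data", recordsize=recordsize),
        # Logs: left at the default record size in this sketch.
        zfs_create(f"{base}/log"),
    ]


# Dry run: print the commands for review rather than executing them.
for argv in mysql_layout():
    print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to try on any
machine; the commands can be reviewed and run by hand afterwards.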


