Date:      Wed, 11 Oct 2017 12:08:01 -0600
From:      markham breitbach <markham_breitbach@ssimicro.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: FreeBSD ZFS file server with SSD HDD
Message-ID:  <d0c4a978-5fab-ef66-89c0-7ee956ff5b24@ssimicro.com>
In-Reply-To: <e99b1b0c-7d8a-90b4-d49b-24a9d8428864@holgerdanske.com>
References:  <20171011130512.GE24374@apple.rat.burntout.org> <e99b1b0c-7d8a-90b4-d49b-24a9d8428864@holgerdanske.com>

I ran into some problems with disks choking on heavy IO under VMware.  It
turned out to be an issue with firmware on the SSDs and backplane in a
Dell server.
It's probably worth making sure those are all up to date.
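
For what it's worth, a quick way to eyeball the drive firmware revisions
on the FreeBSD side (device names here are only examples):

  camcontrol devlist
  smartctl -i /dev/da0    # smartctl is in sysutils/smartmontools

The backplane and controller firmware on a Dell box would be checked from
the iDRAC or with Dell's own update tooling.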

-M

On 2017-10-11 11:30 AM, David Christensen wrote:
> On 10/11/17 06:05, Kate Dawson wrote:
>> Currently running a FreeBSD NFS server with a zpool comprising
>> 12 x 1TB hard disk drives arranged as pairs of mirrors in a stripe
>> set (RAID 10).
>
> That should do 6+ Gb/s.
>
>
> bonnie++ should be able to measure that.  (It's been a while, but I
> seem to recall that bonnie++ expects raw drives and nukes your data. 
> So, it could take some effort to use it.)
>
> https://www.coker.com.au/bonnie++/
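>
> If memory serves, recent versions run against a directory rather than
> a raw device, so something like the following might do (the path and
> size are only examples; the size should be well above RAM so the ARC
> cannot hide the disks):
>
>    bonnie++ -d /tank/bench -s 131072 -u root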
>
>
>> An additional 2x 960GB SSDs were added. These two SSDs are
>> partitioned, with a small partition being used for a ZIL log and a
>> larger partition arranged as L2ARC cache.
>
> Assuming the ZIL is mirrored, that should do 5+ Gb/s.
>
>
> Assuming the L2ARC is striped, that should do 10+ Gb/s.
>
>
> I don't know how to test ZIL and L2ARC in isolation, but dbench should
> be able to test what ZFS exposes, both locally and over NFS:
>
> https://dbench.samba.org/
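>
> Something along these lines, perhaps (directory and client count are
> only examples; run it once on the FreeBSD box against the dataset and
> once on a Xen host against the NFS mount, then compare):
>
>    dbench -D /tank/test -t 60 16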
>
>
>> Additionally, the host has 64GB RAM and 16 CPU cores (AMD Opteron, 2GHz).
>
> That should do 20+ Gb/s.
>
>
> Memtest86+ should be able to measure that:
>
> http://www.memtest.org/
>
>
>> A dataset from the pool is exported via NFS to a number of Debian
>> GNU/Linux hosts running a Xen hypervisor. These run several
>> disk-image-based virtual machines.
>>
>> In general use, the FreeBSD NFS host sees very little read IO, which
>> is to be expected, as the RAM cache and L2ARC are designed to
>> minimise the amount of read load on the disks.
>>
>> However, we're starting to see high load (mostly IO WAIT) on the
>> Linux virtualisation hosts and virtual machines, with kernel timeouts
>> occurring and resulting in crashes and instability.
>>
>> I believe this may be due to the limited number of random write IOPS
>> available on the zpool NFS export.
>>
>> I can get sequential writes and reads to and from the NFS server at
>> speeds that approach the maximum the network provides (currently
>> 1Gb/s + jumbo frames, and I could increase this by bonding multiple
>> interfaces together).
>>
>> However, day-to-day usage does not show network utilisation anywhere
>> near this maximum.
>>
>> If I look at the output of `zpool iostat -v tank 1` I see that every
>> five seconds or so, the number of write operations goes to > 2k.
>>
>> I think this shows that I'm hitting the limit that the spinning disks
>> can provide in this workload.
>>
>> As a cost-effective way to improve this (rather than replacing the
>> whole chassis), I was considering replacing the 1TB HDDs with 1TB
>> SSDs for the improved IOPS.
>>
>> I wonder if there are any opinions within the community here on:
>>
>> 1. What metrics can I gather to confirm disk write IO as the bottleneck?
>>
>> 2. Will the proposed solution have the required effect?  That is, a
>> decrease in the IOWAIT on the GNU/Linux virtualization hosts.
>
>
> I infer your network to be:
>
> - 1 host running FreeBSD (freebsd-version? uname -a?) and an NFS
> server (version?).
>
> - N (how many?) Debian GNU/Linux hosts (/etc/debian_version?  uname
> -a?), each running a Xen hypervisor (version?) and an NFS client.
>
> - The VM's are configured to see their drives as local devices (i.e.
> the VM's are not running NFS clients connected to the FreeBSD NFS
> server).
>
> - Gigabit switch (make? model?).
>
> - 1 Gigabit connection between switch and each host.
>
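> To gather most of those details, something along these lines on each
> box might help (these commands are assumptions; adjust to taste):
>
>    # on the FreeBSD NFS server
>    freebsd-version -ku ; uname -a ; nfsstat
>
>    # on each Debian Xen host
>    cat /etc/debian_version ; uname -a ; xl info | grep xen_version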
>
> As you have correctly stated, you need visibility on the relevant
> performance metrics to make informed decisions.  In addition to the
> above tools:
>
> - For networking, I'd try netstat:
>
> http://netstat.net/
>
> - For drive I/O, I use nmon on Debian:
>
> https://en.wikipedia.org/wiki/Nmon
>
> - I believe iostat is available on both:
>
> https://en.wikipedia.org/wiki/Iostat
>
> - For CPU's, RAM, and swap, I use top.
>
> https://en.wikipedia.org/wiki/Top_(software)
>
> - You seem to have found at least one ZFS tool.
>
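> A few example invocations of the above, for what they are worth
> (intervals and options are arbitrary):
>
>    netstat -i     # check the interface error and drop counters
>    iostat -x 1    # per-device load; await/%util on Linux, %b on FreeBSD
>    top            # watch the iowait ("wa") figure on Linux
>
> FreeBSD's top also has an I/O mode (top -m io) that shows per-process
> disk activity.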
>
> As others have stated, you will want to ensure that all the pieces are
> reasonably in tune -- VM, NFS client, Xen, Debian networking, switch,
> FreeBSD networking, NFS server, ZFS, etc..  I'd start by looking for
> errors and/or warnings in the usual places (dmesg, /var/log, etc.).  I
> typically leave the settings at the installer defaults, unless I have
> some compelling reason to make a change (at least one reader made a
> suggestion).  Be sure to keep good notes if you're going to muck with
> the settings.
>
>
> As for 'zpool iostat -v tank 1', I suspect ZFS is telling you that it
> is flushing writes to the HDD's every five seconds.  If flushes always
> complete before the next scheduled flush, replacing the HDD's with
> SSD's probably will not help with the VM IO WAIT and kernel timeout
> problems. But, if the flushes are overrunning each other during peak
> usage, you may have found the bottleneck.
>
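> (As an aside, that five-second cadence matches ZFS's default
> transaction group sync interval, which on FreeBSD shows up as the
> vfs.zfs.txg.timeout sysctl.  Watching the individual disks with
> something like
>
>    gstat -p
>
> while those bursts happen should show whether the HDD's are pegged at
> or near 100% busy.)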
>
> That said, I suspect that the root cause of the VM IO WAIT and kernel
> timeout problems is that the virtual machines need a low latency
> connection to their system drives, temporary file systems, and/or swap
> devices, and they aren't getting it.  I would not bet on NFS to
> provide this, even with SSD's instead of HDD's.  I would bet on local
> resources.  I suggest:
>
> 1.  Put 2 mirrored SSD's in each Xen server.
>
> 2.  Put VM system drives on the local SSD mirror.
>
> 3.  Put VM /tmp file systems on the local SSD mirror, or on RAM:
>
> https://en.wikipedia.org/wiki/Tmpfs
>
> 4.  Put VM swap devices on the local SSD mirror, or on RAM:
>
> https://en.wikipedia.org/wiki/Zram
>
> 5.  Put VM data drives on NFS.
>
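> As a rough illustration of items 3 and 4 inside a Debian guest (the
> size is made up and would need tuning):
>
>    # /etc/fstab -- keep /tmp in RAM
>    tmpfs  /tmp  tmpfs  defaults,size=1g  0  0
>
> and for swap on zram, I believe there are packaged helpers (e.g.
> zram-tools) or it can be set up by hand via the zram kernel module.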
>
> I am unsure if it is better to do the "on RAM" and "on NFS" ideas at
> the Xen level or within each VM.  Performance is one consideration. 
> Other considerations are security and accountability -- e.g. do
> customers have root on the VM's?
>
>
> To improve NFS performance:
>
> 1.  Enlarge the pipe between the NFS server and the switch --
> bonding (your idea; a lagg(4) sketch follows after this list),
> upgrading to 10 Gb/s, etc.
>
> 2.  Enlarge the pipes between the Xen hosts and the switch.
>
> 3.  Add NIC's to the NFS server, add switches, and divide up the Xen
> hosts across the switches.
>
> 4.  Add NIC's to the NFS server, one per Xen host, and make direct
> connections between the NFS server and each Xen host.
>
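> For item 1, the FreeBSD side of bonding is lagg(4).  A minimal
> rc.conf sketch, assuming two igb(4) ports, LACP configured on the
> switch, and example interface names and address:
>
>    ifconfig_igb0="up mtu 9000"
>    ifconfig_igb1="up mtu 9000"
>    cloned_interfaces="lagg0"
>    ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 192.168.10.2/24 mtu 9000"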
>
> Please let us know how it goes.  :-)
>
>
> David



