Subject: Re: FreeBSD ZFS file server with SSD HDD
To: freebsd-questions@freebsd.org
From: markham breitbach <markham@ssimicro.com>
Date: Wed, 11 Oct 2017 12:08:01 -0600

I ran into some problems with disks choking on heavy IO under VMware.
It turned out to be an issue with the firmware on the SSDs and the
backplane in a Dell server.  It's probably worth making sure those are
all up to date.

-M

On 2017-10-11 11:30 AM, David Christensen wrote:
> On 10/11/17 06:05, Kate Dawson wrote:
>> Currently running a FreeBSD NFS server with a zpool comprising
>> 12 x 1TB hard disk drives, arranged as pairs of mirrors in a stripe
>> set (RAID 10).
>
> That should do 6+ Gb/s.
>
> bonnie++ should be able to measure that.  (It's been a while, but I
> seem to recall that bonnie++ expects raw drives and nukes your data.
> So, it could take some effort to use it.)
>
> https://www.coker.com.au/bonnie++/
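>
> If it is pointed at a scratch dataset on the pool, something like the
> following might be a reasonable starting point (the dataset name and
> size here are placeholders; -s of roughly twice RAM keeps the ARC
> from hiding the disks):
>
>     # scratch dataset so the test files are easy to clean up afterwards
>     zfs create tank/bench
>     # -d test directory, -s total file size, -u user to run as when root
>     bonnie++ -d /tank/bench -s 128g -u root
>     zfs destroy tank/bench
>
> Running the same thing against the NFS mount from one of the Debian
> hosts would also show roughly what the network path costs.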
>
>> An additional 2x 960GB SSD added.  These two SSDs are partitioned
>> with a small partition being used for a ZIL log, and a larger
>> partition arranged for L2ARC cache.
>
> Assuming the ZIL is mirrored, that should do 5+ Gb/s.
>
> Assuming the L2ARC is striped, that should do 10+ Gb/s.
>
> I don't know how to test the ZIL and L2ARC in isolation, but dbench
> should be able to test what ZFS exposes, both locally and over NFS
> (a sample invocation is sketched after the tool list below):
>
> https://dbench.samba.org/
>
>> Additionally the host has 64GB RAM and 16 CPU cores (AMD Opteron,
>> 2 GHz).
>
> That should do 20+ Gb/s.
>
> Memtest86+ should be able to measure that:
>
> http://www.memtest.org/
>
>> A dataset from the pool is exported via NFS to a number of Debian
>> GNU/Linux hosts running a Xen hypervisor.  These run several
>> disk-image-based virtual machines.
>>
>> In general use, the FreeBSD NFS host sees very little read IO, which
>> is to be expected, as the RAM cache and L2ARC are designed to
>> minimise the amount of read load on the disks.
>>
>> However, we're starting to see high load (mostly IO WAIT) on the
>> Linux virtualisation hosts and virtual machines, with kernel
>> timeouts occurring that result in crashes and instability.
>>
>> I believe this may be due to the limited number of random write IOPS
>> available on the zpool NFS export.
>>
>> I can get sequential writes and reads to and from the NFS server at
>> speeds that approach the maximum the network provides (currently
>> 1 Gb/s + jumbo frames, and I could increase this by bonding multiple
>> interfaces together).
>>
>> However, day-to-day usage does not show network utilisation anywhere
>> near this maximum.
>>
>> If I look at the output of `zpool iostat -v tank 1` I see that every
>> five seconds or so, the number of write operations goes to > 2k.
>>
>> I think this shows that I'm hitting the limit that the spinning
>> disks can provide in this workload.
>>
>> As a cost-effective way to improve this (rather than replacing the
>> whole chassis), I was considering replacing the 1TB HDDs with 1TB
>> SSDs, for the improved IOPS.
>>
>> I wonder if there are any opinions within the community here on:
>>
>> 1. What metrics can I gather to confirm the disk write IO as the
>> bottleneck?
>>
>> 2. Whether the proposed solution will have the required effect?
>> That is, a decrease in the IO WAIT on the GNU/Linux virtualisation
>> hosts.
>
> I infer your network to be:
>
> - 1 host running FreeBSD (freebsd-version? uname -a?) and an NFS
> server (version?).
>
> - N (how many?) Debian GNU/Linux hosts (/etc/debian_version?  uname
> -a?), each running a Xen hypervisor (version?) and an NFS client.
>
> - The VMs are configured to see their drives as local devices (e.g.
> the VMs are not running NFS clients connected to the FreeBSD NFS
> server).
>
> - Gigabit switch (make? model?).
>
> - 1 Gigabit connection between the switch and each host.
>
> As you have correctly stated, you need visibility on the relevant
> performance metrics to make informed decisions.  In addition to the
> above tools:
>
> - For networking, I'd try netstat:
>
> http://netstat.net/
>
> - For drive I/O, I use nmon on Debian:
>
> https://en.wikipedia.org/wiki/Nmon
>
> - I believe iostat is available on both:
>
> https://en.wikipedia.org/wiki/Iostat
>
> - For CPUs, RAM, and swap, I use top.
>
> https://en.wikipedia.org/wiki/Top_(software)
>
> - You seem to have found at least one ZFS tool.
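>
> On question 1, watching the pool and the underlying disks side by
> side during one of the bad periods should confirm it.  Something
> like this (the pool name is from your example; the rest are stock
> tools):
>
>     # FreeBSD NFS server: per-vdev write ops and per-disk busy time
>     zpool iostat -v tank 1
>     gstat -p
>
>     # Debian Xen hosts (iostat is in the sysstat package)
>     iostat -x 1
>
> If the HDDs sit near 100% busy with long service times while the SSD
> log devices stay mostly idle, that points at the spinning disks as
> the write bottleneck.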
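>
> For the dbench runs mentioned above, something along these lines
> might do (the directory and client count are placeholders; the
> scratch dataset above would do).  Run it once in a directory on the
> pool and again over the NFS mount from a Debian host, then compare
> throughput and latency:
>
>     # 16 simulated clients for 60 seconds against a test directory
>     dbench -D /tank/bench -t 60 16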
>
> As others have stated, you will want to ensure that all the pieces
> are reasonably in tune -- VM, NFS client, Xen, Debian networking,
> switch, FreeBSD networking, NFS server, ZFS, etc.  I'd start by
> looking for errors and/or warnings in the usual places (dmesg,
> /var/log, etc.).  I typically leave the settings at the installer
> defaults, unless I have some compelling reason to make a change (at
> least one reader made a suggestion).  Be sure to keep good notes if
> you're going to muck with the settings.
>
> As for 'zpool iostat -v tank 1', I suspect ZFS is telling you that it
> is flushing each transaction group of writes to the HDDs every five
> seconds (the default txg sync interval).  If the flushes always
> complete before the next scheduled flush, replacing the HDDs with
> SSDs probably will not help with the VM IO WAIT and kernel timeout
> problems.  But if the flushes are overrunning each other during peak
> usage, you may have found the bottleneck.
>
> That said, I suspect that the root cause of the VM IO WAIT and kernel
> timeout problems is that the virtual machines need a low-latency
> connection to their system drives, temporary file systems, and/or
> swap devices, and they aren't getting it.  I would not bet on NFS to
> provide this, even with SSDs instead of HDDs.  I would bet on local
> resources.  I suggest:
>
> 1.  Put 2 mirrored SSDs in each Xen server.
>
> 2.  Put VM system drives on the local SSD mirror.
>
> 3.  Put VM /tmp file systems on the local SSD mirror, or on RAM:
>
> https://en.wikipedia.org/wiki/Tmpfs
>
> 4.  Put VM swap devices on the local SSD mirror, or on RAM:
>
> https://en.wikipedia.org/wiki/Zram
>
> 5.  Put VM data drives on NFS.
>
> I am unsure whether it is better to do the "on RAM" and "on NFS"
> ideas at the Xen level or within each VM.  Performance is one
> consideration.  Other considerations are security and accountability
> -- e.g. do customers have root on the VMs?
>
> To improve NFS performance:
>
> 1.  Enlarge the pipe between the NFS server and the switch --
> bonding (your idea), an upgrade to 10 Gb/s, etc.
>
> 2.  Enlarge the pipes between the Xen hosts and the switch.
>
> 3.  Add NICs to the NFS server, add switches, and divide up the Xen
> hosts across the switches.
>
> 4.  Add NICs to the NFS server, one per Xen host, and make direct
> connections between the NFS server and each Xen host.
>
> Please let us know how it goes.  :-)
>
> David
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"