Subject: Re: bhyve: zvols for guest disk - yes or no?
From: Jan Bramkamp
To: freebsd-emulation@freebsd.org
Date: Thu, 17 Nov 2016 13:32:04 +0100

On 17/11/2016 12:42, Volodymyr Kostyrko wrote:
> Jan Bramkamp wrote:
>> An other thing I learned the hard way is that ZVOL are set in stone at
>> the ZVOL creation. You have to (cam)dd everything to change the block
>> size.
>> The default ZVOL block size is 8K which isn't wrong but your
>> guests need to align their file systems (and swap) correctly or you'll
>> suffer from write amplification. And ZFS RAID-Z really sucks for such
>> small block sizes. Use mirrored VDEVs in your pools or you will suffer
>> from massive metadata overhead and disappointing IOPS.
>
> This pole has two ends though.
>
> When you are working with file system default 8k block size is too
> small. Setting it up to 64k will save a lot writes for host. This is due
> to most current filesystems do work correctly with extents/big blocks:
>
> * Linux: ext4, xfs - full support;
> * BSD: ufs - disabled, need to be enabled on format, zfs - full support;
> * NTFS: no support, but you can use 64k blocks with Win XP - Win 10
> (though Win XP can't boot off such partition).
>
> Setting block size bigger makes fragmentation less common (check your
> `zpool list`) and saves writes. When guest writes one 64k chunk ZFS
> writes 8 separate 8k blocks (+ metadata and stuff) and this is not good
> for the speed.
>
> On the opposite when you have some database inside VM you need to
> prepare disk for it accordingly. Guides for using PostgreSQL and MySQL
> apply to VM's too. MSSQL on the other hand uses 64k extents to work with
> database internally so raising block size to 64k would be good for it.

IIRC MySQL with InnoDB uses 16KiB blocks and Postgres uses 8KiB blocks.
UFS2 in FreeBSD defaults to 32KiB blocks with 4KiB fragments. You can't
raise the block size above 64KiB, and the documentation recommends
keeping the block-to-fragment size ratio at 8:1, but I don't know why.

This gets even more complicated if you want to use GELI, because GELI
can't reliably deal with blocks larger than the physical page size
(4KiB on i386/amd64).
The defaults work out perfectly in this case, and changing them can
break your system in interesting ways, like GELI trying to allocate a
second page's worth of memory after processing the first page and
failing...

If your main use case for the ZVOLs is MySQL inside the guests, you're
probably best off with a 16KiB block size (volblocksize) for the ZVOLs,
aligning your partitions to multiples of 16KiB, and checking your guest
file system parameters as well. Keep in mind that I didn't test this
myself.

-- 
Jan Bramkamp
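To make the alignment advice in this thread concrete, here is a rough Python model of what a single guest write costs the host. It is a simplification, not how ZFS actually accounts I/O. It assumes every volblock a guest write touches is rewritten in full (copy-on-write), and that a partly covered volblock must be read back first (read-modify-write). It ignores compression, metadata, RAID-Z parity/padding and the ZIL. The function `write_cost` and the scenario labels are hypothetical, made up for illustration.

```python
# Rough model of the host-side cost of one guest write to a ZVOL.
# Simplifying assumptions (NOT how ZFS accounts I/O internally):
#  * every volblock a write touches is rewritten in full (copy-on-write),
#  * a partially covered volblock must be read back first (read-modify-write),
#  * compression, metadata, RAID-Z parity/padding and the ZIL are ignored.

KIB = 1024
MIB = 1024 * KIB


def write_cost(offset: int, length: int, volblocksize: int) -> dict:
    """Estimate the cost of a guest write of `length` bytes at byte `offset`."""
    end = offset + length  # exclusive end of the guest write
    first = offset // volblocksize
    last = (end - 1) // volblocksize
    blocks = last - first + 1
    partial = 0
    for b in range(first, last + 1):
        block_start = b * volblocksize
        block_end = block_start + volblocksize
        # The volblock is only partly overwritten unless the write spans it end to end.
        if offset > block_start or end < block_end:
            partial += 1
    written = blocks * volblocksize
    return {
        "volblocks": blocks,
        "partial_volblocks": partial,  # each one needs a read before the rewrite
        "bytes_written": written,
        "amplification": written / length,
    }


if __name__ == "__main__":
    page = 16 * KIB  # InnoDB's default page size
    scenarios = {
        # 16KiB pages on a 16KiB volblocksize, partition starting at 1MiB: aligned.
        "16K page, 16K volblock, 1MiB start": write_cost(1 * MIB + 10 * page, page, 16 * KIB),
        # Same, but the partition starts at legacy sector 63 (63 * 512 bytes).
        "16K page, 16K volblock, sector 63": write_cost(63 * 512 + 10 * page, page, 16 * KIB),
        # One 64KiB guest write on the default 8KiB volblocksize.
        "64K write, 8K volblock, 1MiB start": write_cost(1 * MIB, 64 * KIB, 8 * KIB),
        # A 4KiB guest FS block on a 16KiB volblocksize: aligned, but still partial.
        "4K write, 16K volblock, 1MiB start": write_cost(1 * MIB, 4 * KIB, 16 * KIB),
    }
    for name, cost in scenarios.items():
        print(f"{name}: {cost}")
```

Under this model the sector-63 partition turns every 16KiB page write into two partial 16KiB volblock rewrites (2x the bytes, plus two reads), while a partition starting at 1MiB (a multiple of 8, 16 and 64KiB) maps each page onto exactly one volblock. The 64KiB-on-8KiB case shows no byte amplification but eight separate blocks, each with its own metadata, which is the overhead described above. The 4KiB case shows why the guest file system parameters matter even when partitions are aligned.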