From owner-freebsd-fs@FreeBSD.ORG Wed Oct 12 16:50:26 2011
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 23931106566B for ; Wed, 12 Oct 2011 16:50:26 +0000 (UTC) (envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id B496F8FC0C for ; Wed, 12 Oct 2011 16:50:25 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p9CGoFY4005781 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 12 Oct 2011 19:50:20 +0300 (EEST) (envelope-from daniel@digsys.bg)
Message-ID: <4E95C546.70904@digsys.bg>
Date: Wed, 12 Oct 2011 19:50:14 +0300
From: Daniel Kalchev
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111007 Thunderbird/7.0.1
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4E95AE08.7030105@lerctr.org> <20111012155938.GA24649@icarus.home.lan>
In-Reply-To: <20111012155938.GA24649@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: AF (4096 byte sector) drives: Can you mix/match in a ZFS pool?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Wed, 12 Oct 2011 16:50:26 -0000

On 12.10.11 18:59, Jeremy Chadwick wrote:
> On Wed, Oct 12, 2011 at 10:11:04AM -0500, Larry Rosenman wrote:
>> I have a root on ZFS box with 6 drives, all 400G (except one 500G)
>> in a pool.
>>
>> I want to upgrade to 2T or 3T drives, but was wondering if you can
>> mix/match while doing the drive by drive replacement.
>>
>> This is on 9.0-BETA3 if that matters.
>
> This is a very good question, and opens a large can of worms.
> My gut feeling tells me this discussion is going to be very long.
>
> I'm going to say that no, mixing 512-byte and 4096-byte sector drives in
> a single vdev is a bad idea. Here's why:

This was not the original question. The original question is whether replacing the 512-byte sector drives in a 512-byte sector aligned zpool with 4096-byte sector drives is possible. It is possible, of course, as most 4096-byte drives today emulate 512-byte drives, and some even pretend to be 512-byte sector drives. Performance might degrade depending on the workload; in some cases it can be very poor.

> The procedure I've read for doing this is as follows:
>
> ada0 = 512-byte sector disk
> ada1 = 4096-byte sector disk
> ada2 = 512-byte sector disk
>
> gnop create -S 4096 ada1
> zpool create mypool raidz ada0 ada1.nop ada2
> zdb | grep ashift
> <should report ashift: 12; ashift: 9 would indicate 512-byte alignment>
> zpool export mypool
> gnop destroy ada1.nop
> zpool import mypool

It does not matter which of the underlying drives is gnop-ed; you may as well gnop all of them. The point is that ZFS uses the largest sector size of any of the underlying devices to determine the ashift value. That is the "minimum write" value, or the smallest unit of data ZFS will write in an I/O.

> Circling back to the procedure I stated above: this would result in an
> ashift=12 alignment for all I/O to all underlying disks. How do you
> think your 512-byte sector drives are going to perform when doing reads
> and writes? (Answer: badly)

The gnop trick is used not because you would ask a 512-byte sector drive to write 8 sectors in one I/O, but because you might ask a 4096-byte sector drive to write only 512 bytes -- which for the drive means it has to read 4096 bytes, modify 512 of those bytes, and write back 4096 bytes.

> So my advice is do not mix-match 512-byte and 4096-byte sector disks in a
> vdev that consists of multiple disks.
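To put that read-modify-write penalty in rough numbers, here is a minimal sketch (illustrative arithmetic only; it assumes the drive has no write cache and cannot coalesce adjacent writes, so a real drive will often do better):

```shell
# Illustrative only: cost of one 512-byte logical write landing on a
# 4096-byte physical sector, when the firmware must read the whole
# sector, patch 512 bytes of it, and write the whole sector back.
logical=512
sector=4096
moved=$((sector + sector))   # one full-sector read + one full-sector write
echo "bytes moved: $moved"                    # bytes moved: 8192
echo "amplification: $((moved / logical))x"   # amplification: 16x
```

A 512-byte-aligned write that happens to cover a whole 4096-byte sector avoids the read entirely, which is exactly what an ashift=12 pool guarantees.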
The proper way to handle this is to create your zpool with 4096-byte alignment from the start -- for the time being, by using the gnop 'hack' above. That way you are sure to have no performance implications no matter which drives (512- or 4096-byte sector) you use in the vdev.

There should be no implications to having one vdev with 512-byte alignment and another with 4096-byte alignment: ZFS is smart enough to issue writes as small as 512 bytes to the former and 4096 bytes to the latter, thus not creating any bottleneck.

Daniel

PS: I didn't say you are wrong. ;)
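For the original drive-by-drive replacement question, note that the gnop trick only helps when the pool is created; an existing pool keeps its ashift for the life of each vdev. A sketch of the replacement itself (the pool name mypool and device names are hypothetical, and these commands need root on a live FreeBSD system, so treat this as an untested outline):

```shell
# Hypothetical names: pool "mypool", old drive ada0, new drive ada3.
# A 4096-byte AF drive joins an existing ashift=9 vdev via its
# 512-byte emulation -- it works, with the RMW penalty noted above.
zpool replace mypool ada0 ada3
zpool status mypool    # wait for the resilver to finish before the next swap
# Repeat for each remaining drive. Once every member is larger, the
# vdev can grow to the new size:
zpool set autoexpand=on mypool
```

Replacing one drive at a time and waiting for each resilver keeps the vdev's redundancy intact during the whole upgrade.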