From: Paul Mather <paul@gromit.dlib.vt.edu>
Date: Mon, 7 Dec 2020 17:23:18 -0500
Subject: Re: effect of differing spindle speeds on prospective zfs vdevs
To: freebsd-questions@freebsd.org
Cc: tech-lists@zyxst.net

On Sat, 5 Dec 2020 19:16:33 +0000, tech-lists wrote:

> Hi,
>
> On Sat, Dec 05, 2020 at 08:51:08AM -0500, Paul Mather wrote:
>> IIRC, ZFS pools have a single ashift for the entire pool, so you should
>> set it to accommodate the 4096/4096 devices to avoid performance
>> degradation.  I believe it defaults to that now, and should auto-detect
>> anyway.  But, in a mixed setup of vdevs like you have, you should be
>> using ashift=12.
>>
>> I believe having ashift=9 on your mixed-drive setup would have the
>> biggest impact in terms of reducing performance.
>
> Part of my confusion about the ashift thing is I thought ashift=9 was
> for 512/512 logical/physical. Is this still the case?
>
> On a different machine, which has been running since FreeBSD 12 was
> -CURRENT, one of the disks in the array went bang. zdb shows ashift=9
> (as was the default when it was created). The only available
> replacement was an otherwise identical disk but 512 logical/4096
> physical. zpool status mildly warns about performance degradation
> like this:
>
>   ada2    ONLINE   0   0   0   block size: 512B configured, 4096B native
>
>  state: ONLINE
> status: One or more devices are configured to use a non-native block
>         size.  Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly
>         configured pool.
>
> The other part of my confusion is that I understood zfs to set its own
> blocksize on the fly.

You're correct in that ZFS has its own notion of a block size (the
"recordsize" property), but that is not the same as the block size
ashift is concerned with.  When "zpool" complains about a "non-native
block size" it is talking about the physical block size of the
underlying vdev.  That is the smallest unit of data that is read from
or written to the device.  (It also affects where partition boundaries
can be placed.)
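
(As an aside, in case it is useful: the commands below should show what
ashift a pool's vdevs actually ended up with and what sector sizes a
drive reports on FreeBSD.  The pool name "tank" and the device "ada2"
are only placeholders for illustration.)

  # Show the ashift actually in use by each vdev:
  zdb -C tank | grep ashift

  # Show the logical sector size ("sectorsize") and the physical sector
  # size ("stripesize") the drive advertises; a 512e drive typically
  # reports sectorsize 512 with stripesize 4096:
  diskinfo -v /dev/ada2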
When hard drives became larger, the number of bits used to address
logical blocks (LBAs) became insufficient to reference all the blocks
on the device.  One way around this, and to let devices store more data
overall, was to make the addressed blocks larger.  (Larger block sizes
are also good in that they require relatively less space for ECC data.)
Hence, the 4K "Advanced Format" drives arrived.  Before that, block
(a.k.a. sector) sizes for hard drives had typically been 512 bytes;
afterwards, they became 4096 bytes.

Some drives actually use 4096-byte sectors internally but advertise a
512-byte sector size to the outside world.  From a read standpoint this
doesn't create a problem.  It is when writing that you can incur
performance issues, because updating a 512-byte sector within a
4096-byte physical sector involves a read-modify-write operation: the
original 4096-byte contents must be read, the 512-byte subset updated,
and the new 4096-byte whole rewritten back to disk.  That is more work
than simply writing a 512-byte block as-is to a 512-byte sector.  (In
similar fashion, partitions not aligned on a 4K boundary can incur a
performance penalty on drives with 4096-byte physical sectors that
advertise as 512-byte.)

> (I guess there must be some performance degradation but it's not
> yet enough for me to notice. Or it might only be noticeable if low
> on space.)

ZFS does a lot of caching, and the ZIL "batches" writes, all of which
can ameliorate the effects of misaligned block sizes and partition
boundaries.  (Large sequential writes are best for performance,
especially on spinning disks, which pay penalties for head movement and
rotational delay.)  But if you have a write-intensive pool, you are
needlessly taking a performance hit by not using the correct ashift
and/or partition alignment.

BTW, low space mainly affects performance due to fragmentation.  That
is a different issue from mismatched block size (ashift).

When I replaced my ashift=9 512-byte drives I eventually recreated the
pool with ashift=12.  Using ashift=12 on pools with 512-byte-sector
drives does not incur any performance penalty, which is why ashift
defaults to 12 nowadays.  (I wouldn't be surprised if the default
changes to ashift=13 due to the prevalence of SSDs these days.)
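
(If you do end up rebuilding, or even just repartitioning a replacement
disk, the rough sketch below is one way to get both the partition
alignment and the ashift right on FreeBSD.  The device "ada3", the GPT
labels, and the pool name "tank" are placeholders; the
"zpool create -o ashift=12" form needs the newer OpenZFS-based ZFS,
while on the ZFS shipped with FreeBSD 12 and earlier the
min_auto_ashift sysctl on its own achieves the same thing for newly
created vdevs.)

  # Make sure ZFS never creates a new top-level vdev with ashift < 12:
  sysctl vfs.zfs.min_auto_ashift=12

  # Partition the disk with 1 MiB alignment (a multiple of 4K), so the
  # freebsd-zfs partition starts on a 4K boundary:
  gpart create -s gpt ada3
  gpart add -t freebsd-zfs -a 1m -l newdisk ada3

  # For a brand-new pool the ashift can also be forced explicitly at
  # creation time (OpenZFS syntax):
  zpool create -o ashift=12 tank mirror gpt/newdisk gpt/otherdisk

  # Verify what the vdevs actually got:
  zdb -C tank | grep ashift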
Cheers,

Paul.