Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 24 Jan 2013 11:19:52 -0000
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Borja Marcos" <borjam@sarenet.es>, <freebsd-scsi@freebsd.org>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Problem adding SCSI quirks for a SSD, 4K sector and ZFS
Message-ID:  <A71C9D2FA3984F6A84CED6DE3E21DDCF@multiplay.co.uk>
References:  <492280E6-E3EE-4540-92CE-C535C8943CCF@sarenet.es>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.

------=_NextPart_000_0014_01CDFA24.BAC45C90
Content-Type: text/plain;
	format=flowed;
	charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit


----- Original Message ----- 
From: "Borja Marcos" <borjam@sarenet.es>
To: <freebsd-scsi@freebsd.org>
Cc: "FreeBSD Filesystems" <freebsd-fs@freebsd.org>
Sent: Thursday, January 24, 2013 9:46 AM
Subject: Problem adding SCSI quirks for a SSD, 4K sector and ZFS


>
> Hello,
>
> Crossposting to FreeBSD-fs, as I am wondering if I have had a problem with ZFS and sector size detection as well.
>
> I am doing tests with an OCZ Vertex 4 connected to a SAS backplane.
>
> < OCZ-VERTEX4 1.5>                 at scbus6 target 22 lun 0 (pass19,da15)
>
> (The blank before "OCZ" really appears there)
>
> pass19: < OCZ-VERTEX4 1.5> Fixed Direct Access SCSI-5 device
> pass19: Serial Number OCZ-1SVG6KZ2YRMSS8E1
> pass19: 3.300MB/s transfers
>
> I am bypassing an "aac" RAID card so that the disks are directly attached to the da driver, instead of relying on the so-called 
> JBOD feature.
>
> I have had a weird problem, with the disk being unresponsive to the REQUEST CAPACITY(16) command. Weird, seems it timeouts.
>
> So, just to complete the tests, I have added a quirk to scsi_da.c. Anyway, I also need the disk to be recognized as a 4K sector 
> drive.
>
> I created a new quirk, called it DA_Q_NO_RC16, and added an  entry to the quirk table, so that these drives are recognized as 4K 
> drives  and the driver doesn't try to send a RC(16) command.
>
> diff scsi_da.c.orig scsi_da.c
> 93c93,94
> < DA_Q_4K = 0x08
> ---
>> DA_Q_4K = 0x08,
>> DA_Q_NO_RC16 = 0x10
> 811a813,817
>> /* OCZ Vertex 4 firmware 1.5 */
>> { T_DIRECT, SIP_MEDIA_FIXED, "", "OCZ-VERTEX4", "*" },
>> /*quirks*/DA_Q_NO_RC16 | DA_Q_4K
>> },
>> {
> 1635,1636c1641,1646
> < /* Predict whether device may support READ CAPACITY(16). */
> < if (SID_ANSI_REV(&cgd->inq_data) >= SCSI_REV_SPC3) {
> ---
>> /*
>> * Predict whether device may support READ CAPACITY(16).
>> * BUT Some disks don't support RC(16) even though they should.
>> */
>> if ((SID_ANSI_REV(&cgd->inq_data) >= SCSI_REV_SPC3)
>> && !(softc->quirks & DA_Q_NO_RC16) ) {
>
>
>
> I think it's working. I haven't seen any more RC(16) errors, and the disk is working fine. Anyway I am not sure I've done it 
> right. After adding the 4K quirk and rebooting, GEOM_PART complained that the partitions weren't aligned to 4K
>
> /var/log/messages.0:Jan 23 16:01:30  kernel: GEOM_PART: partition 1 is not aligned on 4096 bytes
> /var/log/messages.0:Jan 23 16:01:30  kernel: GEOM_PART: partition 2 is not aligned on 4096 bytes
>
> So it seems it works. However, when using the disk for ZFS, it still detects a 512 byte sector size, which is odd.
>
> Jan 23 16:01:30 rasputin kernel: GEOM: new disk da15
> Jan 23 16:01:30 rasputin kernel: da15 at aacp0 bus 0 scbus6 target 22 lun 0
> Jan 23 16:01:30 rasputin kernel: da15: < OCZ-VERTEX4 1.5> Fixed Direct Access SCSI-5 device
> Jan 23 16:01:30 rasputin kernel: da15: Serial Number OCZ-1SVG6KZ2YRMSS8E1
> Jan 23 16:01:30 rasputin kernel: da15: 3.300MB/s transfers
> Jan 23 16:01:30 rasputin kernel: da15: 488386MB (1000215216 512 byte sectors: 255H 63S/T 62260C)
>
>
> diskinfo is returning a sector size of 512 bytes, and a stripesize of 4096. Is this correct? ZFS is still detecting it as a 512 
> byte sector disk.
>
> /dev/da15
> 512         # sectorsize
> 512110190592 # mediasize in bytes (477G)
> 1000215216  # mediasize in sectors
> 4096        # stripesize
> 0           # stripeoffset
> 62260       # Cylinders according to firmware.
> 255         # Heads according to firmware.
> 63          # Sectors according to firmware.
> OCZ-1SVG6KZ2YRMSS8E1 # Disk ident.
>
>
>
> So, to summarize:
>
> If the quirk was working, should diskinfo return a sector size of 512 bytes, or is it correct  to show a "stripesize" of 4096?
>
> Do we have a bug either on ZFS  or the disk drivers? The same experiment on another system (both are 9.1-RELEASE) and a similar 
> drive attached to a SATA controller, also adding a 4K sector quirk for it, defines a stripe size instead of a sector size.

Simple answer is ZFS doesn't understand quirks. The attached patch does
what you're looking for along with a few other things, see notes at
the top for details.

Its not a final version as there's still some discussion about
implementation details but it should do what your looking for.

    Regards
    Steve 


================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.
------=_NextPart_000_0014_01CDFA24.BAC45C90
Content-Type: application/octet-stream;
	name="zzz-zfs-ashift-fix.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="zzz-zfs-ashift-fix.patch"

Changes zfs zpool initial / desired ashift to be based off stripesize=0A=
instead of sectorsize making it compatible with drives marked with=0A=
the 4k sector size quirk.=0A=
=0A=
Without the correct min block size BIO_DELETE requests passed to=0A=
a large number of current SSD's via TRIM don't actually perform=0A=
any LBA TRIM so its vital for the correct operation of TRIM to get=0A=
the correct min block size.=0A=
=0A=
To do this we added the additional dashift (desired ashift) to=0A=
vdev_open_func_t calls. This was needed as just updating ashift to=0A=
be based off stripesize would mean that a devices reported minimum=0A=
transfer size (ashift) could increase and that in turn would cause=0A=
member devices to be unusable and hence break pools with error=0A=
ZFS-8000-5E.=0A=
=0A=
The global minimum ashift used for new zpools can now also be=0A=
tuned using the vfs.zfs.min_create_ashift sysctl. This defaults=0A=
to 12 (4096 byte blocks) in order to optimise for newer disks which=0A=
are migrating from 512 to 4096 byte sectors.=0A=
=0A=
The value of vfs.zfs.min_create_ashift is limited to min of=0A=
SPA_MINBLOCKSHIFT (9) and a max of SPA_MAXBLOCKSHIFT (17).=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c.orig	=
2011-06-06 09:36:46.000000000 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c	=
2012-11-02 14:47:55.293668071 +0000=0A=
@@ -32,6 +32,8 @@=0A=
 #include <sys/sunldi.h>=0A=
 #include <sys/fm/fs/zfs.h>=0A=
 =0A=
+extern int zfs_min_ashift;=0A=
+=0A=
 /*=0A=
  * Virtual device vector for disks.=0A=
  */=0A=
@@ -103,7 +105,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	spa_t *spa =3D vd->vdev_spa;=0A=
 	vdev_disk_t *dvd;=0A=
@@ -284,7 +286,7 @@=0A=
 	}=0A=
 =0A=
 	/*=0A=
-	 * Determine the device's minimum transfer size.=0A=
+	 * Determine the device's minimum and desired transfer size.=0A=
 	 * If the ioctl isn't supported, assume DEV_BSIZE.=0A=
 	 */=0A=
 	if (ldi_ioctl(dvd->vd_lh, DKIOCGMEDIAINFOEXT, (intptr_t)&dkmext,=0A=
@@ -292,6 +294,7 @@=0A=
 		dkmext.dki_pbsize =3D DEV_BSIZE;=0A=
 =0A=
 	*ashift =3D highbit(MAX(dkmext.dki_pbsize, SPA_MINBLOCKSIZE)) - 1;=0A=
+	*dashift =3D highbit(MAX(dkmext.dki_pbsize, (1ULL << zfs_min_ashift))) =
- 1;=0A=
 =0A=
 	/*=0A=
 	 * Clear the nowritecache bit, so that on a vdev_reopen() we will=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c.orig	=
2012-01-05 22:31:25.000000000 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c	=
2012-11-02 14:47:38.252107541 +0000=0A=
@@ -30,6 +30,8 @@=0A=
 #include <sys/fs/zfs.h>=0A=
 #include <sys/fm/fs/zfs.h>=0A=
 =0A=
+extern int zfs_min_ashift;=0A=
+=0A=
 /*=0A=
  * Virtual device vector for files.=0A=
  */=0A=
@@ -47,7 +49,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	vdev_file_t *vf;=0A=
 	vnode_t *vp;=0A=
@@ -127,6 +129,7 @@=0A=
 =0A=
 	*psize =3D vattr.va_size;=0A=
 	*ashift =3D SPA_MINBLOCKSHIFT;=0A=
+	*dashift =3D zfs_min_ashift;=0A=
 =0A=
 	return (0);=0A=
 }=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c.orig	=
2012-11-02 12:20:15.918986181 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	=
2012-11-02 14:47:48.135273692 +0000=0A=
@@ -36,6 +36,8 @@=0A=
 #include <geom/geom.h>=0A=
 #include <geom/geom_int.h>=0A=
 =0A=
+extern int zfs_min_ashift;=0A=
+=0A=
 /*=0A=
  * Virtual device vector for GEOM.=0A=
  */=0A=
@@ -408,7 +410,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	struct g_provider *pp;=0A=
 	struct g_consumer *cp;=0A=
@@ -494,9 +496,10 @@=0A=
 	*psize =3D pp->mediasize;=0A=
 =0A=
 	/*=0A=
-	 * Determine the device's minimum transfer size.=0A=
+	 * Determine the device's minimum and desired transfer size.=0A=
 	 */=0A=
 	*ashift =3D highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;=0A=
+	*dashift =3D highbit(MAX(pp->stripesize, (1ULL << zfs_min_ashift))) - =
1;=0A=
 =0A=
 	/*=0A=
 	 * Clear the nowritecache settings, so that on a vdev_reopen()=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c.orig	=
2012-07-03 11:49:22.342245151 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c	=
2012-07-03 11:58:02.161948585 +0000=0A=
@@ -127,7 +127,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A=
+vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, =
uint64_t *dashift)=0A=
 {=0A=
 	int numerrors =3D 0;=0A=
 	int lasterror =3D 0;=0A=
@@ -150,6 +150,7 @@=0A=
 =0A=
 		*asize =3D MIN(*asize - 1, cvd->vdev_asize - 1) + 1;=0A=
 		*ashift =3D MAX(*ashift, cvd->vdev_ashift);=0A=
+		*dashift =3D MAX(*dashift, cvd->vdev_dashift);=0A=
 	}=0A=
 =0A=
 	if (numerrors =3D=3D vd->vdev_children) {=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c.orig	=
2012-07-03 11:49:10.545275865 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c	=
2012-07-03 11:58:07.670470640 +0000=0A=
@@ -40,7 +40,7 @@=0A=
 =0A=
 /* ARGSUSED */=0A=
 static int=0A=
-vdev_missing_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_missing_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, =
uint64_t *dashift)=0A=
 {=0A=
 	/*=0A=
 	 * Really this should just fail.  But then the root vdev will be in the=0A=
@@ -50,6 +50,7 @@=0A=
 	 */=0A=
 	*psize =3D 0;=0A=
 	*ashift =3D 0;=0A=
+	*dashift =3D 0;=0A=
 	return (0);=0A=
 }=0A=
 =0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c.orig	=
2012-07-03 11:49:03.675875505 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c	=
2012-07-03 11:58:15.334806334 +0000=0A=
@@ -1447,7 +1447,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_raidz_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A=
+vdev_raidz_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	vdev_t *cvd;=0A=
 	uint64_t nparity =3D vd->vdev_nparity;=0A=
@@ -1476,6 +1476,7 @@=0A=
 =0A=
 		*asize =3D MIN(*asize - 1, cvd->vdev_asize - 1) + 1;=0A=
 		*ashift =3D MAX(*ashift, cvd->vdev_ashift);=0A=
+		*dashift =3D MAX(*dashift, cvd->vdev_dashift);=0A=
 	}=0A=
 =0A=
 	*asize *=3D vd->vdev_children;=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c.orig	=
2012-07-03 11:49:27.901760380 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c	=
2012-07-03 11:58:19.704427068 +0000=0A=
@@ -50,7 +50,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_root_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A=
+vdev_root_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	int lasterror =3D 0;=0A=
 	int numerrors =3D 0;=0A=
@@ -78,6 +78,7 @@=0A=
 =0A=
 	*asize =3D 0;=0A=
 	*ashift =3D 0;=0A=
+	*dashift =3D 0;=0A=
 =0A=
 	return (0);=0A=
 }=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c.orig	=
2012-10-22 20:41:50.234005351 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	2012-10-22 =
20:42:16.355805894 +0000=0A=
@@ -1125,6 +1125,7 @@=0A=
 	uint64_t osize =3D 0;=0A=
 	uint64_t asize, psize;=0A=
 	uint64_t ashift =3D 0;=0A=
+	uint64_t dashift =3D 0;=0A=
 =0A=
 	ASSERT(vd->vdev_open_thread =3D=3D curthread ||=0A=
 	    spa_config_held(spa, SCL_STATE_ALL, RW_WRITER) =3D=3D =
SCL_STATE_ALL);=0A=
@@ -1154,7 +1155,7 @@=0A=
 		return (ENXIO);=0A=
 	}=0A=
 =0A=
-	error =3D vd->vdev_ops->vdev_op_open(vd, &osize, &ashift);=0A=
+	error =3D vd->vdev_ops->vdev_op_open(vd, &osize, &ashift, &dashift);=0A=
 =0A=
 	/*=0A=
 	 * Reset the vdev_reopening flag so that we actually close=0A=
@@ -1255,14 +1256,16 @@=0A=
 		 */=0A=
 		vd->vdev_asize =3D asize;=0A=
 		vd->vdev_ashift =3D MAX(ashift, vd->vdev_ashift);=0A=
+		vd->vdev_dashift =3D MAX(dashift, vd->vdev_dashift);=0A=
 	} else {=0A=
 		/*=0A=
 		 * Make sure the alignment requirement hasn't increased.=0A=
 		 */=0A=
 		if (ashift > vd->vdev_top->vdev_ashift) {=0A=
+			printf("ZFS ashift open failure of %s (%ld > %ld)\n", vd->vdev_path, =
ashift, vd->vdev_top->vdev_ashift);=0A=
 			vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,=0A=
 			    VDEV_AUX_BAD_LABEL);=0A=
 			return (EINVAL);=0A=
 		}=0A=
 	}=0A=
 =0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c.orig	=
2012-11-05 15:27:52.092194343 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c	=
2012-11-05 15:53:26.449021023 +0000=0A=
@@ -145,9 +145,12 @@=0A=
 #include <sys/fs/zfs.h>=0A=
 =0A=
 static boolean_t vdev_trim_on_init =3D B_TRUE;=0A=
+static boolean_t vdev_dashift_enable =3D B_TRUE;=0A=
 SYSCTL_DECL(_vfs_zfs_vdev);=0A=
 SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, trim_on_init, CTLFLAG_RW,=0A=
     &vdev_trim_on_init, 0, "Enable/disable full vdev trim on =
initialisation");=0A=
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, optimal_ashift, CTLFLAG_RW,=0A=
+    &vdev_dashift_enable, 0, "Enable/disable optimal ashift usage on =
initialisation");=0A=
 =0A=
 /*=0A=
  * Basic routines to read and write from a vdev label.=0A=
@@ -282,6 +285,16 @@=0A=
 		    vd->vdev_ms_array) =3D=3D 0);=0A=
 		VERIFY(nvlist_add_uint64(nv, ZPOOL_CONFIG_METASLAB_SHIFT,=0A=
 		    vd->vdev_ms_shift) =3D=3D 0);=0A=
+		/*=0A=
+		 * We use the max of ashift and dashift (the desired/optimal=0A=
+		 * ashift), which is typically the stripesize of a device, to=0A=
+		 * ensure we get the best performance from underlying devices.=0A=
+		 * =0A=
+		 * Its done here as it should only ever have an effect on new=0A=
+		 * zpool creation.=0A=
+		 */=0A=
+		if (vdev_dashift_enable)=0A=
+			vd->vdev_ashift =3D MAX(vd->vdev_ashift, vd->vdev_dashift);=0A=
 		VERIFY(nvlist_add_uint64(nv, ZPOOL_CONFIG_ASHIFT,=0A=
 		    vd->vdev_ashift) =3D=3D 0);=0A=
 		VERIFY(nvlist_add_uint64(nv, ZPOOL_CONFIG_ASIZE,=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h.orig	=
2012-10-22 20:40:08.361577293 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h	=
2012-10-22 21:02:52.447781800 +0000=0A=
@@ -55,7 +55,7 @@=0A=
 /*=0A=
  * Virtual device operations=0A=
  */=0A=
-typedef int	vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t =
*ashift);=0A=
+typedef int	vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t =
*ashift, uint64_t *dashift);=0A=
 typedef void	vdev_close_func_t(vdev_t *vd);=0A=
 typedef uint64_t vdev_asize_func_t(vdev_t *vd, uint64_t psize);=0A=
 typedef int	vdev_io_start_func_t(zio_t *zio);=0A=
@@ -119,6 +119,7 @@=0A=
 	uint64_t	vdev_asize;	/* allocatable device capacity	*/=0A=
 	uint64_t	vdev_min_asize;	/* min acceptable asize		*/=0A=
 	uint64_t	vdev_ashift;	/* block alignment shift	*/=0A=
+	uint64_t	vdev_dashift;	/* desired blk alignment shift	*/=0A=
 	uint64_t	vdev_state;	/* see VDEV_STATE_* #defines	*/=0A=
 	uint64_t	vdev_prevstate;	/* used when reopening a vdev	*/=0A=
 	vdev_ops_t	*vdev_ops;	/* vdev operations		*/=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c.orig	=
2012-11-02 14:56:29.474248887 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	2012-11-03 =
01:27:28.066912403 +0000=0A=
@@ -41,6 +41,30 @@=0A=
 #include <sys/spa_impl.h>=0A=
 #include <sys/dsl_deadlist.h>=0A=
 =0A=
+#define ZFS_MIN_ASHIFT SPA_MINBLOCKSHIFT=0A=
+/*=0A=
+ * Max ashift - limited by how labels are accessed by zio_read_phys =
using offsets=0A=
+ * within vdev_label_t=0A=
+ *=0A=
+ * If label access is fixed to work with ashift properly then the max =
should be=0A=
+ * set to SPA_MAXBLOCKSHIFT=0A=
+ */=0A=
+#define ZFS_MAX_ASHIFT 13=0A=
+/*=0A=
+ * Optimum ashift - defaults to 12 which results in a min block size of =
4096 as=0A=
+ * this is the optimum value for newer disks which are migrating from =
512 to 4096=0A=
+ * byte sectors=0A=
+ */=0A=
+#define ZFS_OPTIMUM_ASHIFT 12 =0A=
+=0A=
+/*=0A=
+ * Minimum ashift used when creating new pools=0A=
+ *=0A=
+ * This can be tuned using the sysctl vfs.zfs.min_create_ashift but is =
limited=0A=
+ * to a min of ZFS_MIN_ASHIFT and a max of ZFS_MAX_ASHIFT=0A=
+ * =0A=
+ */=0A=
+int zfs_min_ashift =3D MAX(SPA_MINBLOCKSHIFT, ZFS_OPTIMUM_ASHIFT);=0A=
 int zfs_no_write_throttle =3D 0;=0A=
 int zfs_write_limit_shift =3D 3;			/* 1/8th of physical memory */=0A=
 int zfs_txg_synctime_ms =3D 1000;		/* target millisecs to sync a txg */=0A=
@@ -54,6 +78,9 @@=0A=
 =0A=
 static pgcnt_t old_physmem =3D 0;=0A=
 =0A=
+#ifdef _KERNEL=0A=
+static int min_ashift_sysctl(SYSCTL_HANDLER_ARGS);=0A=
+=0A=
 SYSCTL_DECL(_vfs_zfs);=0A=
 TUNABLE_INT("vfs.zfs.no_write_throttle", &zfs_no_write_throttle);=0A=
 SYSCTL_INT(_vfs_zfs, OID_AUTO, no_write_throttle, CTLFLAG_RDTUN,=0A=
@@ -78,6 +105,32 @@=0A=
 TUNABLE_QUAD("vfs.zfs.write_limit_override", &zfs_write_limit_override);=0A=
 SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RDTUN,=0A=
     &zfs_write_limit_override, 0, "");=0A=
+SYSCTL_PROC(_vfs_zfs, OID_AUTO, min_create_ashift, CTLTYPE_INT | =
CTLFLAG_RW,=0A=
+    &zfs_min_ashift, 0, min_ashift_sysctl, "I",=0A=
+    "Minimum ashift used when creating new pools");=0A=
+=0A=
+static int=0A=
+min_ashift_sysctl(SYSCTL_HANDLER_ARGS)=0A=
+{=0A=
+	int error, value;=0A=
+=0A=
+	value =3D *(int *)arg1;=0A=
+=0A=
+	error =3D sysctl_handle_int(oidp, &value, 0, req);=0A=
+=0A=
+	if ((error !=3D 0) || (req->newptr =3D=3D NULL))=0A=
+		return (error);=0A=
+=0A=
+	if (value < ZFS_MIN_ASHIFT)=0A=
+		value =3D ZFS_MIN_ASHIFT;=0A=
+	else if (value > ZFS_MAX_ASHIFT)=0A=
+		value =3D ZFS_MAX_ASHIFT;=0A=
+=0A=
+	*(int *)arg1 =3D value;=0A=
+=0A=
+	return (0);=0A=
+}=0A=
+#endif=0A=
 =0A=
 int=0A=
 dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t =
**ddp)=0A=

------=_NextPart_000_0014_01CDFA24.BAC45C90--




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?A71C9D2FA3984F6A84CED6DE3E21DDCF>