Date:      Thu, 23 May 2013 10:33:11 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Ajit Jain" <ajit.jain@cloudbyte.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: seeing data corruption with zfs trim functionality
Message-ID:  <35ABA7AAEB7F4D86A1ED54C4C47FEB49@multiplay.co.uk>
References:  <CAA71u6Y5dKZ9O0rqxCpx-9t7DYgTnPZSoNy-iHOnmzrOUYp%2Bvw@mail.gmail.com> <60316751643743738AB83DABC6A5934B@multiplay.co.uk> <20130429105143.GA1492@icarus.home.lan> <3AD1AB31003D49B2BF2EA7DD411B38A2@multiplay.co.uk> <C6AA4D0A7C49469ABB3C7440B1BCC108@multiplay.co.uk> <CAA71u6Zh7BbbdC=utqfR2MD1Nn=9euUDXHKqqu9NyBG-Jx%2B=Ow@mail.gmail.com> <9681E07546D348168052D4FC5365B4CD@multiplay.co.uk> <CAA71u6ZuO9CF0ECFS4z07-E5qPea-6SfNwkvhr_g6pFT5MV5yQ@mail.gmail.com> <CAA71u6YKGHDRVg6W_xnCNaA68bJvAZ2Lkp-UisiPqb1vKjJhfA@mail.gmail.com> <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk> <CAA71u6YZAKrmfTLU32f8UmYecmydwiqRT-OrR1ukZ9V6PGsU%2Bw@mail.gmail.com> <A05ACD84EB974E80B7142CE9982E479C@multiplay.co.uk> <93D0677B373A452BAF58C8EA6823783D@multiplay.co.uk> <CAA71u6bZ_4fb9FxYSwcrHBBApkZog30iQJGyTERi-xFMksud1g@mail.gmail.com>

This is a multi-part message in MIME format.

------=_NextPart_000_0730_01CE57A0.ECC7BD20
Content-Type: text/plain;
	format=flowed;
	charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit

I've attached the two patch sets I'm looking to MFC to stable-9: one
adds the BIO_DELETE CAM changes and the other adds ZFS TRIM support.

They should both apply cleanly to stable-9. If you could test with
those on your machine and let me know how you get on, that would be great.

    Regards
    Steve

----- Original Message ----- 
From: "Ajit Jain" <ajit.jain@cloudbyte.com>


> Hi Steven,
> 
> FW version on the setup is P15.
> I will upgrade the FW to P16, but I think my
> best bet will be to update the code base to 9-stable since, unlike you,
> I was seeing corruption with all three delete methods.
> 
> thanks
> ajit
> 
> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
> 
>> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>>
>>
>>> After initially seeing no issues, our overnight monitoring started moaning
>>> big time on the test box. So we checked and there was zpool corruption as well
>>> as a missing boot loader and a corrupt GPT, so I believe we have reproduced
>>> your issue.
>>>
>>> After recovering the machine I created 3 pools on 3 different disks each
>>> running a different delete_method.
>>>
>>> We then re-ran the tests which resulted in the pool running with delete_method
>>> WS16 being so broken it had suspended IO. A reboot resulted in it once again
>>> reporting no partition table via gpart.
>>>
>>> A third test run again produced a corrupt pool for WS16.
>>>
>>> I've conducted a preliminary review of the CAM WS16 code path along with the
>>> SBC-3 spec, which didn't identify any obvious issues.
>>>
>>> Given we're both using LSI 2008 based controllers it could be a FW issue
>>> specific to WS16 but that's just speculation atm, so I'll continue to investigate.
>>>
>>> If you could re-test your end without using WS16 to see if you can reproduce
>>> the problem with either UNMAP or ATA_TRIM, that would be a very useful data
>>> point.
>>>
>>
>> After much playing I narrowed down a test case of one delete which was causing
>> disc corruption for us (it deleted the partition table instead of data in
>> the middle of the disk).
>>
>> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the data on
>> your SATA disks if you use WS16, due to the following bug:
>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't support
>> SCT write same may write wrong region.
>>
>> After updating here to P16, which we would generally be running (the test box
>> was new and hadn't been updated yet), the corruption issue is no longer
>> reproducible.
>>
>> So Ajit please check your FW version, I'm hoping to hear you're on something
>> below P13, P12 possibly?
>>
>> If so then this is your issue, to fix simply update to P16 and the problem
>> should be gone.
>>
>>
>>    Regards
>>    Steve
>>
>>
>> ================================================
>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>> the person or entity to whom it is addressed. In the event of misdirection,
>> the recipient is prohibited from using, copying, printing or otherwise
>> disseminating it or any information contained in it.
>> In the event of misdirection, illegible or incomplete transmission please
>> telephone +44 845 868 1337
>> or return the E.mail to postmaster@multiplay.co.uk.
>>
>>
>

------=_NextPart_000_0730_01CE57A0.ECC7BD20
Content-Type: application/octet-stream;
	name="mfc-bio_delete-bioqsort-stable-9.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="mfc-bio_delete-bioqsort-stable-9.patch"

Enhanced BIO_DELETE support for CAM SCSI which adds ATA_TRIM support.=0A=
=0A=
Add the ability to enable / disable the sorting of BIO requests queue in =
CAM,=0A=
which is now disabled by default for non-rotating medium.=0A=
=0A=
MFC r246146 Format CDB output as 2 digit hex correcting the length=0A=
MFC r248922 Adds the ability to enable / disable sorting of BIO requests=0A=
MFC r248992 Added ATA Pass-Through support to CAM=0A=
MFC r249929 Removed unneeded tests in dadeletemethodset=0A=
MFC r249930 Added a sysctl to control the maximum size of a delete =
request=0A=
MFC r249931 Added Dataset Management defines to be used by TRIM=0A=
MFC r249933 Added the ability to send ATA identify and TRIM commands via =
SCSI=0A=
MFC r249934 Updated TRIM calculations in cam/ata to be based off =
ATA_DSM_* defines=0A=
MFC r249937 Refactored scsi_xpt use of device_has_vpd=0A=
MFC r249939 Added available delete methods discovery during device probe=0A=
MFC r249941 Automatically disable BIO queue sorting for non-rotating =
media=0A=
MFC r250033 Correct comment typo's=0A=
MFC r250179 Update probe flow so that devices with lbp can also disable =
disksort=0A=
MFC r250180 Fix probe in progress check in dareprobe=0A=
MFC r250181 Check for ATA Information VPD before querying for ATA=0A=
MFC r250183 Enable CAM SCSI to choice ATA TRIM during autodetection=0A=
Index: sys=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys	(revision 250577)=0A=
+++ sys	(working copy)=0A=
=0A=
Property changes on: sys=0A=
___________________________________________________________________=0A=
Modified: svn:mergeinfo=0A=
   Merged =
/head/sys:r246146,248922,248992,249930-249931,249933-249934,249937,249939=
,249941,250033,250179-250181,250183=0A=
Index: sys/cam/cam.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/cam.c	(revision 250577)=0A=
+++ sys/cam/cam.c	(working copy)=0A=
@@ -110,8 +110,17 @@=0A=
 =0A=
 #ifdef _KERNEL=0A=
 SYSCTL_NODE(_kern, OID_AUTO, cam, CTLFLAG_RD, 0, "CAM Subsystem");=0A=
+=0A=
+#ifndef CAM_DEFAULT_SORT_IO_QUEUES=0A=
+#define CAM_DEFAULT_SORT_IO_QUEUES 1=0A=
 #endif=0A=
 =0A=
+int cam_sort_io_queues =3D CAM_DEFAULT_SORT_IO_QUEUES;=0A=
+TUNABLE_INT("kern.cam.sort_io_queues", &cam_sort_io_queues);=0A=
+SYSCTL_INT(_kern_cam, OID_AUTO, sort_io_queues, CTLFLAG_RWTUN,=0A=
+    &cam_sort_io_queues, 0, "Sort IO queues to try and optimise disk =
access patterns");=0A=
+#endif=0A=
+=0A=
 void=0A=
 cam_strvis(u_int8_t *dst, const u_int8_t *src, int srclen, int dstlen)=0A=
 {=0A=
Index: sys/cam/cam.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/cam.h	(revision 250577)=0A=
+++ sys/cam/cam.h	(working copy)=0A=
@@ -228,6 +228,9 @@=0A=
 =0A=
 extern const struct cam_status_entry cam_status_table[];=0A=
 extern const int num_cam_status_entries;=0A=
+#ifdef _KERNEL=0A=
+extern int cam_sort_io_queues;=0A=
+#endif=0A=
 union ccb;=0A=
 =0A=
 #ifdef SYSCTL_DECL	/* from sysctl.h */=0A=
Index: sys/cam/scsi/scsi_all.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/scsi/scsi_all.c	(revision 250577)=0A=
+++ sys/cam/scsi/scsi_all.c	(working copy)=0A=
@@ -40,6 +40,9 @@=0A=
 #include <sys/systm.h>=0A=
 #include <sys/libkern.h>=0A=
 #include <sys/kernel.h>=0A=
+#include <sys/lock.h>=0A=
+#include <sys/malloc.h>=0A=
+#include <sys/mutex.h>=0A=
 #include <sys/sysctl.h>=0A=
 #else=0A=
 #include <errno.h>=0A=
@@ -53,8 +56,15 @@=0A=
 #include <cam/cam_queue.h>=0A=
 #include <cam/cam_xpt.h>=0A=
 #include <cam/scsi/scsi_all.h>=0A=
+#include <sys/ata.h>=0A=
 #include <sys/sbuf.h>=0A=
-#ifndef _KERNEL=0A=
+=0A=
+#ifdef _KERNEL=0A=
+#include <cam/cam_periph.h>=0A=
+#include <cam/cam_xpt_sim.h>=0A=
+#include <cam/cam_xpt_periph.h>=0A=
+#include <cam/cam_xpt_internal.h>=0A=
+#else=0A=
 #include <camlib.h>=0A=
 #include <stddef.h>=0A=
 =0A=
@@ -3136,7 +3146,7 @@=0A=
 	*cdb_string =3D '\0';=0A=
 	for (i =3D 0; i < cdb_len; i++)=0A=
 		snprintf(cdb_string + strlen(cdb_string),=0A=
-			 len - strlen(cdb_string), "%x ", cdb_ptr[i]);=0A=
+			 len - strlen(cdb_string), "%02hhx ", cdb_ptr[i]);=0A=
 =0A=
 	return(cdb_string);=0A=
 }=0A=
@@ -5847,6 +5857,101 @@=0A=
 }=0A=
 =0A=
 void=0A=
+scsi_ata_identify(struct ccb_scsiio *csio, u_int32_t retries,=0A=
+		  void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
+		  u_int8_t tag_action, u_int8_t *data_ptr,=0A=
+		  u_int16_t dxfer_len, u_int8_t sense_len,=0A=
+		  u_int32_t timeout)=0A=
+{=0A=
+	scsi_ata_pass_16(csio,=0A=
+			 retries,=0A=
+			 cbfcnp,=0A=
+			 /*flags*/CAM_DIR_IN,=0A=
+			 tag_action,=0A=
+			 /*protocol*/AP_PROTO_PIO_IN,=0A=
+			 /*ata_flags*/AP_FLAG_TDIR_FROM_DEV|=0A=
+				AP_FLAG_BYT_BLOK_BYTES|AP_FLAG_TLEN_SECT_CNT,=0A=
+			 /*features*/0,=0A=
+			 /*sector_count*/dxfer_len,=0A=
+			 /*lba*/0,=0A=
+			 /*command*/ATA_ATA_IDENTIFY,=0A=
+			 /*control*/0,=0A=
+			 data_ptr,=0A=
+			 dxfer_len,=0A=
+			 sense_len,=0A=
+			 timeout);=0A=
+}=0A=
+=0A=
+void=0A=
+scsi_ata_trim(struct ccb_scsiio *csio, u_int32_t retries,=0A=
+	      void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
+	      u_int8_t tag_action, u_int16_t block_count,=0A=
+	      u_int8_t *data_ptr, u_int16_t dxfer_len, u_int8_t sense_len,=0A=
+	      u_int32_t timeout)=0A=
+{=0A=
+	scsi_ata_pass_16(csio,=0A=
+			 retries,=0A=
+			 cbfcnp,=0A=
+			 /*flags*/CAM_DIR_OUT,=0A=
+			 tag_action,=0A=
+			 /*protocol*/AP_EXTEND|AP_PROTO_DMA,=0A=
+			 /*ata_flags*/AP_FLAG_TLEN_SECT_CNT|AP_FLAG_BYT_BLOK_BLOCKS,=0A=
+			 /*features*/ATA_DSM_TRIM,=0A=
+			 /*sector_count*/block_count,=0A=
+			 /*lba*/0,=0A=
+			 /*command*/ATA_DATA_SET_MANAGEMENT,=0A=
+			 /*control*/0,=0A=
+			 data_ptr,=0A=
+			 dxfer_len,=0A=
+			 sense_len,=0A=
+			 timeout);=0A=
+}=0A=
+=0A=
+void=0A=
+scsi_ata_pass_16(struct ccb_scsiio *csio, u_int32_t retries,=0A=
+		 void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
+		 u_int32_t flags, u_int8_t tag_action,=0A=
+		 u_int8_t protocol, u_int8_t ata_flags, u_int16_t features,=0A=
+		 u_int16_t sector_count, uint64_t lba, u_int8_t command,=0A=
+		 u_int8_t control, u_int8_t *data_ptr, u_int16_t dxfer_len,=0A=
+		 u_int8_t sense_len, u_int32_t timeout)=0A=
+{=0A=
+	struct ata_pass_16 *ata_cmd;=0A=
+=0A=
+	ata_cmd =3D (struct ata_pass_16 *)&csio->cdb_io.cdb_bytes;=0A=
+	ata_cmd->opcode =3D ATA_PASS_16;=0A=
+	ata_cmd->protocol =3D protocol;=0A=
+	ata_cmd->flags =3D ata_flags;=0A=
+	ata_cmd->features_ext =3D features >> 8;=0A=
+	ata_cmd->features =3D features;=0A=
+	ata_cmd->sector_count_ext =3D sector_count >> 8;=0A=
+	ata_cmd->sector_count =3D sector_count;=0A=
+	ata_cmd->lba_low =3D lba;=0A=
+	ata_cmd->lba_mid =3D lba >> 8;=0A=
+	ata_cmd->lba_high =3D lba >> 16;=0A=
+	ata_cmd->device =3D ATA_DEV_LBA;=0A=
+	if (protocol & AP_EXTEND) {=0A=
+		ata_cmd->lba_low_ext =3D lba >> 24;=0A=
+		ata_cmd->lba_mid_ext =3D lba >> 32;=0A=
+		ata_cmd->lba_high_ext =3D lba >> 40;=0A=
+	} else=0A=
+		ata_cmd->device |=3D (lba >> 24) & 0x0f;=0A=
+	ata_cmd->command =3D command;=0A=
+	ata_cmd->control =3D control;=0A=
+=0A=
+	cam_fill_csio(csio,=0A=
+		      retries,=0A=
+		      cbfcnp,=0A=
+		      flags,=0A=
+		      tag_action,=0A=
+		      data_ptr,=0A=
+		      dxfer_len,=0A=
+		      sense_len,=0A=
+		      sizeof(*ata_cmd),=0A=
+		      timeout);=0A=
+}=0A=
+=0A=
+void=0A=
 scsi_unmap(struct ccb_scsiio *csio, u_int32_t retries,=0A=
 	   void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
 	   u_int8_t tag_action, u_int8_t byte2,=0A=
@@ -6156,6 +6261,28 @@=0A=
 }=0A=
 =0A=
 #ifdef _KERNEL=0A=
+int=0A=
+scsi_vpd_supported_page(struct cam_periph *periph, uint8_t page_id)=0A=
+{=0A=
+	struct cam_ed *device;=0A=
+	struct scsi_vpd_supported_pages *vpds;=0A=
+	int i, num_pages;=0A=
+=0A=
+	device =3D periph->path->device;=0A=
+	vpds =3D (struct scsi_vpd_supported_pages *)device->supported_vpds;=0A=
+=0A=
+	if (vpds !=3D NULL) {=0A=
+		num_pages =3D device->supported_vpds_len -=0A=
+		    SVPD_SUPPORTED_PAGES_HDR_LEN;=0A=
+		for (i =3D 0; i < num_pages; i++) {=0A=
+			if (vpds->page_list[i] =3D=3D page_id)=0A=
+				return (1);=0A=
+		}=0A=
+	}=0A=
+=0A=
+	return (0);=0A=
+}=0A=
+=0A=
 static void=0A=
 init_scsi_delay(void)=0A=
 {=0A=
Index: sys/cam/scsi/scsi_all.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/scsi/scsi_all.h	(revision 250577)=0A=
+++ sys/cam/scsi/scsi_all.h	(working copy)=0A=
@@ -908,6 +908,19 @@=0A=
 struct ata_pass_12 {=0A=
 	u_int8_t opcode;=0A=
 	u_int8_t protocol;=0A=
+#define	AP_PROTO_HARD_RESET	(0x00 << 1)=0A=
+#define	AP_PROTO_SRST		(0x01 << 1)=0A=
+#define	AP_PROTO_NON_DATA	(0x03 << 1)=0A=
+#define	AP_PROTO_PIO_IN		(0x04 << 1)=0A=
+#define	AP_PROTO_PIO_OUT	(0x05 << 1)=0A=
+#define	AP_PROTO_DMA		(0x06 << 1)=0A=
+#define	AP_PROTO_DMA_QUEUED	(0x07 << 1)=0A=
+#define	AP_PROTO_DEVICE_DIAG	(0x08 << 1)=0A=
+#define	AP_PROTO_DEVICE_RESET	(0x09 << 1)=0A=
+#define	AP_PROTO_UDMA_IN	(0x10 << 1)=0A=
+#define	AP_PROTO_UDMA_OUT	(0x11 << 1)=0A=
+#define	AP_PROTO_FPDMA		(0x12 << 1)=0A=
+#define	AP_PROTO_RESP_INFO	(0x15 << 1)=0A=
 #define	AP_MULTI	0xe0=0A=
 	u_int8_t flags;=0A=
 #define	AP_T_LEN	0x03=0A=
@@ -943,6 +956,15 @@=0A=
 	u_int8_t protocol;=0A=
 #define	AP_EXTEND	0x01=0A=
 	u_int8_t flags;=0A=
+#define	AP_FLAG_TLEN_NO_DATA	(0 << 0)=0A=
+#define	AP_FLAG_TLEN_FEAT	(1 << 0)=0A=
+#define	AP_FLAG_TLEN_SECT_CNT	(2 << 0)=0A=
+#define	AP_FLAG_TLEN_STPSIU	(3 << 0)=0A=
+#define	AP_FLAG_BYT_BLOK_BYTES	(0 << 2)  =0A=
+#define	AP_FLAG_BYT_BLOK_BLOCKS	(1 << 2)  =0A=
+#define	AP_FLAG_TDIR_TO_DEV	(0 << 3)  =0A=
+#define	AP_FLAG_TDIR_FROM_DEV	(1 << 3)  =0A=
+#define	AP_FLAG_CHK_COND	(1 << 5)  =0A=
 	u_int8_t features_ext;=0A=
 	u_int8_t features;=0A=
 	u_int8_t sector_count_ext;=0A=
@@ -1064,7 +1086,7 @@=0A=
 =0A=
 /*=0A=
  * This length is the initial inquiry length used by the probe code, as =
   =0A=
- * well as the legnth necessary for scsi_print_inquiry() to function =0A=
+ * well as the length necessary for scsi_print_inquiry() to function =0A=
  * correctly.  If either use requires a different length in the future, =0A=
  * the two values should be de-coupled.=0A=
  */=0A=
@@ -1407,6 +1429,91 @@=0A=
 	uint8_t params[0];=0A=
 };=0A=
 =0A=
+/*=0A=
+ * ATA Information VPD Page based on=0A=
+ * T10/2126-D Revision 04=0A=
+ */=0A=
+#define SVPD_ATA_INFORMATION		0x89=0A=
+=0A=
+/*=0A=
+ * Block Device Characteristics VPD Page based on=0A=
+ * T10/1799-D Revision 31=0A=
+ */=0A=
+struct scsi_vpd_block_characteristics=0A=
+{=0A=
+	u_int8_t device;=0A=
+	u_int8_t page_code;=0A=
+#define SVPD_BDC			0xB1=0A=
+	u_int8_t page_length[2];=0A=
+	u_int8_t medium_rotation_rate[2];=0A=
+#define SVPD_BDC_RATE_NOT_REPORTED	0x00=0A=
+#define SVPD_BDC_RATE_NONE_ROTATING	0x01=0A=
+	u_int8_t reserved1;=0A=
+	u_int8_t nominal_form_factor;=0A=
+#define SVPD_BDC_FORM_NOT_REPORTED	0x00=0A=
+#define SVPD_BDC_FORM_5_25INCH		0x01=0A=
+#define SVPD_BDC_FORM_3_5INCH		0x02=0A=
+#define SVPD_BDC_FORM_2_5INCH		0x03=0A=
+#define SVPD_BDC_FORM_1_5INCH		0x04=0A=
+#define SVPD_BDC_FORM_LESSTHAN_1_5INCH	0x05=0A=
+	u_int8_t reserved2[56];=0A=
+};=0A=
+=0A=
+/*=0A=
+ * Logical Block Provisioning VPD Page based on=0A=
+ * T10/1799-D Revision 31=0A=
+ */=0A=
+struct scsi_vpd_logical_block_prov=0A=
+{=0A=
+	u_int8_t device;=0A=
+	u_int8_t page_code;=0A=
+#define	SVPD_LBP		0xB2=0A=
+	u_int8_t page_length[2];=0A=
+#define SVPD_LBP_PL_BASIC	0x04=0A=
+	u_int8_t threshold_exponent;=0A=
+	u_int8_t flags;=0A=
+#define SVPD_LBP_UNMAP		0x80=0A=
+#define SVPD_LBP_WS16		0x40=0A=
+#define SVPD_LBP_WS10		0x20=0A=
+#define SVPD_LBP_RZ		0x04=0A=
+#define SVPD_LBP_ANC_SUP	0x02=0A=
+#define SVPD_LBP_DP		0x01=0A=
+	u_int8_t prov_type;=0A=
+#define SVPD_LBP_RESOURCE	0x01=0A=
+#define SVPD_LBP_THIN		0x02=0A=
+	u_int8_t reserved;=0A=
+	/*=0A=
+	 * Provisioning Group Descriptor can be here if SVPD_LBP_DP is set=0A=
+	 * Its size can be determined from page_length - 4=0A=
+	 */=0A=
+};=0A=
+=0A=
+/*=0A=
+ * Block Limits VDP Page based on=0A=
+ * T10/1799-D Revision 31=0A=
+ */=0A=
+struct scsi_vpd_block_limits=0A=
+{=0A=
+	u_int8_t device;=0A=
+	u_int8_t page_code;=0A=
+#define	SVPD_BLOCK_LIMITS	0xB0=0A=
+	u_int8_t page_length[2];=0A=
+#define SVPD_BL_PL_BASIC	0x10=0A=
+#define SVPD_BL_PL_TP		0x3C=0A=
+	u_int8_t reserved1;=0A=
+	u_int8_t max_cmp_write_len;=0A=
+	u_int8_t opt_txfer_len_grain[2];=0A=
+	u_int8_t max_txfer_len[4];=0A=
+	u_int8_t opt_txfer_len[4];=0A=
+	u_int8_t max_prefetch[4];=0A=
+	u_int8_t max_unmap_lba_cnt[4];=0A=
+	u_int8_t max_unmap_blk_cnt[4];=0A=
+	u_int8_t opt_unmap_grain[4];=0A=
+	u_int8_t unmap_grain_align[4];=0A=
+	u_int8_t max_write_same_length[8];=0A=
+	u_int8_t reserved2[20];=0A=
+};=0A=
+=0A=
 struct scsi_read_capacity=0A=
 {=0A=
 	u_int8_t opcode;=0A=
@@ -2180,6 +2287,8 @@=0A=
 char *		scsi_sense_string(struct ccb_scsiio *csio,=0A=
 				  char *str, int str_len);=0A=
 void		scsi_sense_print(struct ccb_scsiio *csio);=0A=
+int 		scsi_vpd_supported_page(struct cam_periph *periph,=0A=
+					uint8_t page_id);=0A=
 #else /* _KERNEL */=0A=
 int		scsi_command_string(struct cam_device *device,=0A=
 				    struct ccb_scsiio *csio, struct sbuf *sb);=0A=
@@ -2370,6 +2479,26 @@=0A=
 		     u_int32_t dxfer_len, u_int8_t sense_len,=0A=
 		     u_int32_t timeout);=0A=
 =0A=
+void scsi_ata_identify(struct ccb_scsiio *csio, u_int32_t retries,=0A=
+		       void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
+		       u_int8_t tag_action, u_int8_t *data_ptr,=0A=
+		       u_int16_t dxfer_len, u_int8_t sense_len,=0A=
+		       u_int32_t timeout);=0A=
+=0A=
+void scsi_ata_trim(struct ccb_scsiio *csio, u_int32_t retries,=0A=
+	           void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
+	           u_int8_t tag_action, u_int16_t block_count,=0A=
+	           u_int8_t *data_ptr, u_int16_t dxfer_len,=0A=
+	           u_int8_t sense_len, u_int32_t timeout);=0A=
+=0A=
+void scsi_ata_pass_16(struct ccb_scsiio *csio, u_int32_t retries,=0A=
+		      void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
+		      u_int32_t flags, u_int8_t tag_action,=0A=
+		      u_int8_t protocol, u_int8_t ata_flags, u_int16_t features,=0A=
+		      u_int16_t sector_count, uint64_t lba, u_int8_t command,=0A=
+		      u_int8_t control, u_int8_t *data_ptr, u_int16_t dxfer_len,=0A=
+		      u_int8_t sense_len, u_int32_t timeout);=0A=
+=0A=
 void scsi_unmap(struct ccb_scsiio *csio, u_int32_t retries,=0A=
 		void (*cbfcnp)(struct cam_periph *, union ccb *),=0A=
 		u_int8_t tag_action, u_int8_t byte2,=0A=
Index: sys/cam/scsi/scsi_xpt.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/scsi/scsi_xpt.c	(revision 250577)=0A=
+++ sys/cam/scsi/scsi_xpt.c	(working copy)=0A=
@@ -556,7 +556,6 @@=0A=
 static cam_status	proberegister(struct cam_periph *periph,=0A=
 				      void *arg);=0A=
 static void	 probeschedule(struct cam_periph *probe_periph);=0A=
-static int	 device_has_vpd(struct cam_ed *device, uint8_t page_id);=0A=
 static void	 probestart(struct cam_periph *periph, union ccb =
*start_ccb);=0A=
 static void	 proberequestdefaultnegotiation(struct cam_periph *periph);=0A=
 static int       proberequestbackoff(struct cam_periph *periph,=0A=
@@ -708,21 +707,6 @@=0A=
 	xpt_schedule(periph, CAM_PRIORITY_XPT);=0A=
 }=0A=
 =0A=
-static int=0A=
-device_has_vpd(struct cam_ed *device, uint8_t page_id)=0A=
-{=0A=
-	int i, num_pages;=0A=
-	struct scsi_vpd_supported_pages *vpds;=0A=
-=0A=
-	vpds =3D (struct scsi_vpd_supported_pages *)device->supported_vpds;=0A=
-	num_pages =3D device->supported_vpds_len - =
SVPD_SUPPORTED_PAGES_HDR_LEN;=0A=
-	for (i =3D 0;i < num_pages;i++)=0A=
-		if (vpds->page_list[i] =3D=3D page_id)=0A=
-			return 1;=0A=
-=0A=
-	return 0;=0A=
-}=0A=
-=0A=
 static void=0A=
 probestart(struct cam_periph *periph, union ccb *start_ccb)=0A=
 {=0A=
@@ -910,11 +894,9 @@=0A=
 	case PROBE_DEVICE_ID:=0A=
 	{=0A=
 		struct scsi_vpd_device_id *devid;=0A=
-		struct cam_ed *device;=0A=
 =0A=
 		devid =3D NULL;=0A=
-		device =3D periph->path->device;=0A=
-		if (device_has_vpd(device, SVPD_DEVICE_ID))=0A=
+		if (scsi_vpd_supported_page(periph, SVPD_DEVICE_ID))=0A=
 			devid =3D malloc(SVPD_DEVICE_ID_MAX_SIZE, M_CAMXPT,=0A=
 			    M_NOWAIT | M_ZERO);=0A=
 =0A=
@@ -952,7 +934,7 @@=0A=
 			device->serial_num_len =3D 0;=0A=
 		}=0A=
 =0A=
-		if (device_has_vpd(device, SVPD_UNIT_SERIAL_NUMBER))=0A=
+		if (scsi_vpd_supported_page(periph, SVPD_UNIT_SERIAL_NUMBER))=0A=
 			serial_buf =3D (struct scsi_vpd_unit_serial_number *)=0A=
 				malloc(sizeof(*serial_buf), M_CAMXPT,=0A=
 				    M_NOWAIT|M_ZERO);=0A=
Index: sys/cam/scsi/scsi_da.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/scsi/scsi_da.c	(revision 250577)=0A=
+++ sys/cam/scsi/scsi_da.c	(working copy)=0A=
@@ -44,6 +44,7 @@=0A=
 #include <sys/eventhandler.h>=0A=
 #include <sys/malloc.h>=0A=
 #include <sys/cons.h>=0A=
+#include <sys/endian.h>=0A=
 #include <geom/geom.h>=0A=
 #include <geom/geom_disk.h>=0A=
 #endif /* _KERNEL */=0A=
@@ -67,8 +68,12 @@=0A=
 =0A=
 #ifdef _KERNEL=0A=
 typedef enum {=0A=
-	DA_STATE_PROBE,=0A=
-	DA_STATE_PROBE2,=0A=
+	DA_STATE_PROBE_RC,=0A=
+	DA_STATE_PROBE_RC16,=0A=
+	DA_STATE_PROBE_LBP,=0A=
+	DA_STATE_PROBE_BLK_LIMITS,=0A=
+	DA_STATE_PROBE_BDC,=0A=
+	DA_STATE_PROBE_ATA,=0A=
 	DA_STATE_NORMAL=0A=
 } da_state;=0A=
 =0A=
@@ -96,29 +101,47 @@=0A=
 } da_quirks;=0A=
 =0A=
 typedef enum {=0A=
-	DA_CCB_PROBE		=3D 0x01,=0A=
-	DA_CCB_PROBE2		=3D 0x02,=0A=
-	DA_CCB_BUFFER_IO	=3D 0x03,=0A=
-	DA_CCB_WAITING		=3D 0x04,=0A=
-	DA_CCB_DUMP		=3D 0x05,=0A=
-	DA_CCB_DELETE		=3D 0x06,=0A=
-	DA_CCB_TUR		=3D 0x07,=0A=
+	DA_CCB_PROBE_RC		=3D 0x01,=0A=
+	DA_CCB_PROBE_RC16	=3D 0x02,=0A=
+	DA_CCB_PROBE_LBP	=3D 0x03,=0A=
+	DA_CCB_PROBE_BLK_LIMITS	=3D 0x04,=0A=
+	DA_CCB_PROBE_BDC	=3D 0x05,=0A=
+	DA_CCB_PROBE_ATA	=3D 0x06,=0A=
+	DA_CCB_BUFFER_IO	=3D 0x07,=0A=
+	DA_CCB_WAITING		=3D 0x08,=0A=
+	DA_CCB_DUMP		=3D 0x0A,=0A=
+	DA_CCB_DELETE		=3D 0x0B,=0A=
+ 	DA_CCB_TUR		=3D 0x0C,=0A=
 	DA_CCB_TYPE_MASK	=3D 0x0F,=0A=
 	DA_CCB_RETRY_UA		=3D 0x10=0A=
 } da_ccb_state;=0A=
 =0A=
+/*=0A=
+ * Order here is important for method choice=0A=
+ *=0A=
+ * We prefer ATA_TRIM as tests run against a Sandforce 2281 SSD =
attached to=0A=
+ * LSI 2008 (mps) controller (FW: v12, Drv: v14) resulted 20% quicker =
deletes=0A=
+ * using ATA_TRIM than the corresponding UNMAP results for a real world =
mysql=0A=
+ * import taking 5mins.=0A=
+ *=0A=
+ */=0A=
 typedef enum {=0A=
 	DA_DELETE_NONE,=0A=
 	DA_DELETE_DISABLE,=0A=
+	DA_DELETE_ATA_TRIM,=0A=
+	DA_DELETE_UNMAP,=0A=
+	DA_DELETE_WS16,=0A=
+	DA_DELETE_WS10,=0A=
 	DA_DELETE_ZERO,=0A=
-	DA_DELETE_WS10,=0A=
-	DA_DELETE_WS16,=0A=
-	DA_DELETE_UNMAP,=0A=
-	DA_DELETE_MAX =3D DA_DELETE_UNMAP=0A=
+	DA_DELETE_MIN =3D DA_DELETE_ATA_TRIM,=0A=
+	DA_DELETE_MAX =3D DA_DELETE_ZERO=0A=
 } da_delete_methods;=0A=
 =0A=
 static const char *da_delete_method_names[] =3D=0A=
-    { "NONE", "DISABLE", "ZERO", "WS10", "WS16", "UNMAP" };=0A=
+    { "NONE", "DISABLE", "ATA_TRIM", "UNMAP", "WS16", "WS10", "ZERO" };=0A=
+static const char *da_delete_method_desc[] =3D=0A=
+    { "NONE", "DISABLED", "ATA TRIM", "UNMAP", "WRITE SAME(16) with =
UNMAP",=0A=
+      "WRITE SAME(10) with UNMAP", "ZERO" };=0A=
 =0A=
 /* Offsets into our private area for storing information */=0A=
 #define ccb_state	ppriv_field0=0A=
@@ -134,8 +157,18 @@=0A=
 	u_int     stripeoffset;=0A=
 };=0A=
 =0A=
-#define UNMAP_MAX_RANGES	512=0A=
+#define UNMAP_RANGE_MAX		0xffffffff=0A=
+#define UNMAP_HEAD_SIZE		8=0A=
+#define UNMAP_RANGE_SIZE	16=0A=
+#define UNMAP_MAX_RANGES	2048 /* Protocol Max is 4095 */=0A=
+#define UNMAP_BUF_SIZE		((UNMAP_MAX_RANGES * UNMAP_RANGE_SIZE) + \=0A=
+				UNMAP_HEAD_SIZE)=0A=
 =0A=
+#define WS10_MAX_BLKS		0xffff=0A=
+#define WS16_MAX_BLKS		0xffffffff=0A=
+#define ATA_TRIM_MAX_RANGES	((UNMAP_BUF_SIZE / \=0A=
+	(ATA_DSM_RANGE_SIZE * ATA_DSM_BLK_SIZE)) * ATA_DSM_BLK_SIZE)=0A=
+=0A=
 struct da_softc {=0A=
 	struct	 bio_queue_head bio_queue;=0A=
 	struct	 bio_queue_head delete_queue;=0A=
@@ -145,15 +178,19 @@=0A=
 	da_state state;=0A=
 	da_flags flags;	=0A=
 	da_quirks quirks;=0A=
+	int	 sort_io_queue;=0A=
 	int	 minimum_cmd_size;=0A=
 	int	 error_inject;=0A=
 	int	 ordered_tag_count;=0A=
 	int	 outstanding_cmds;=0A=
-	int	 unmap_max_ranges;=0A=
-	int	 unmap_max_lba;=0A=
+	int	 trim_max_ranges;=0A=
 	int	 delete_running;=0A=
 	int	 tur;=0A=
-	da_delete_methods	 delete_method;=0A=
+	int	 delete_available;	/* Delete methods possibly available */=0A=
+	uint32_t		unmap_max_ranges;=0A=
+	uint32_t		unmap_max_lba;=0A=
+	uint64_t		ws_max_blks;=0A=
+	da_delete_methods	delete_method;=0A=
 	struct	 disk_params params;=0A=
 	struct	 disk *disk;=0A=
 	union	 ccb saved_ccb;=0A=
@@ -162,11 +199,18 @@=0A=
 	struct sysctl_oid	*sysctl_tree;=0A=
 	struct callout		sendordered_c;=0A=
 	uint64_t wwpn;=0A=
-	uint8_t	 unmap_buf[UNMAP_MAX_RANGES * 16 + 8];=0A=
+	uint8_t	 unmap_buf[UNMAP_BUF_SIZE];=0A=
 	struct scsi_read_capacity_data_long rcaplong;=0A=
 	struct callout		mediapoll_c;=0A=
 };=0A=
 =0A=
+#define dadeleteflag(softc, delete_method, enable)			\=0A=
+	if (enable) {							\=0A=
+		softc->delete_available |=3D (1 << delete_method);	\=0A=
+	} else {							\=0A=
+		softc->delete_available &=3D ~(1 << delete_method);	\=0A=
+	}=0A=
+=0A=
 struct da_quirk_entry {=0A=
 	struct scsi_inquiry_pattern inq_pat;=0A=
 	da_quirks quirks;=0A=
@@ -869,6 +913,10 @@=0A=
 static	int		dadeletemethodsysctl(SYSCTL_HANDLER_ARGS);=0A=
 static	int		dadeletemethodset(struct da_softc *softc,=0A=
 					  da_delete_methods delete_method);=0A=
+static	void		dadeletemethodchoose(struct da_softc *softc,=0A=
+					     da_delete_methods default_method);=0A=
+static	void		daprobedone(struct cam_periph *periph, union ccb *ccb);=0A=
+=0A=
 static	periph_ctor_t	daregister;=0A=
 static	periph_dtor_t	dacleanup;=0A=
 static	periph_start_t	dastart;=0A=
@@ -903,6 +951,8 @@=0A=
 #define	DA_DEFAULT_SEND_ORDERED	1=0A=
 #endif=0A=
 =0A=
+#define DA_SIO (softc->sort_io_queue >=3D 0 ? \=0A=
+    softc->sort_io_queue : cam_sort_io_queues)=0A=
 =0A=
 static int da_poll_period =3D DA_DEFAULT_POLL_PERIOD;=0A=
 static int da_retry_count =3D DA_DEFAULT_RETRY;=0A=
@@ -1129,10 +1179,15 @@=0A=
 	if (bp->bio_cmd =3D=3D BIO_DELETE) {=0A=
 		if (bp->bio_bcount =3D=3D 0)=0A=
 			biodone(bp);=0A=
+		else if (DA_SIO)=0A=
+			bioq_disksort(&softc->delete_queue, bp);=0A=
 		else=0A=
-			bioq_disksort(&softc->delete_queue, bp);=0A=
-	} else=0A=
+			bioq_insert_tail(&softc->delete_queue, bp);=0A=
+	} else if (DA_SIO) {=0A=
 		bioq_disksort(&softc->bio_queue, bp);=0A=
+	} else {=0A=
+		bioq_insert_tail(&softc->bio_queue, bp);=0A=
+	}=0A=
 =0A=
 	/*=0A=
 	 * Schedule ourselves for performing the work.=0A=
@@ -1487,6 +1542,9 @@=0A=
 		OID_AUTO, "minimum_cmd_size", CTLTYPE_INT | CTLFLAG_RW,=0A=
 		&softc->minimum_cmd_size, 0, dacmdsizesysctl, "I",=0A=
 		"Minimum CDB size");=0A=
+	SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree),=0A=
+		OID_AUTO, "sort_io_queue", CTLFLAG_RW, &softc->sort_io_queue, 0,=0A=
+		"Sort IO queue to try and optimise disk access patterns");=0A=
 =0A=
 	SYSCTL_ADD_INT(&softc->sysctl_ctx,=0A=
 		       SYSCTL_CHILDREN(softc->sysctl_tree),=0A=
@@ -1574,6 +1632,85 @@=0A=
 	return (0);=0A=
 }=0A=
 =0A=
+static void=0A=
+daprobedone(struct cam_periph *periph, union ccb *ccb)=0A=
+{=0A=
+	struct da_softc *softc;=0A=
+=0A=
+	softc =3D (struct da_softc *)periph->softc;=0A=
+=0A=
+	dadeletemethodchoose(softc, DA_DELETE_NONE);=0A=
+=0A=
+	if (bootverbose && (softc->flags & DA_FLAG_PROBED) =3D=3D 0) {=0A=
+		char buf[80];=0A=
+		int i, sep;=0A=
+=0A=
+		snprintf(buf, sizeof(buf), "Delete methods: <");=0A=
+		sep =3D 0;=0A=
+		for (i =3D DA_DELETE_MIN; i <=3D DA_DELETE_MAX; i++) {=0A=
+			if (softc->delete_available & (1 << i)) {=0A=
+				if (sep) {=0A=
+					strlcat(buf, ",", sizeof(buf));=0A=
+				} else {=0A=
+				    sep =3D 1;=0A=
+				}=0A=
+				strlcat(buf, da_delete_method_names[i],=0A=
+				    sizeof(buf));=0A=
+				if (i =3D=3D softc->delete_method) {=0A=
+					strlcat(buf, "(*)", sizeof(buf));=0A=
+				}=0A=
+			}=0A=
+		}=0A=
+		if (sep =3D=3D 0) {=0A=
+			if (softc->delete_method =3D=3D DA_DELETE_NONE) =0A=
+				strlcat(buf, "NONE(*)", sizeof(buf));=0A=
+			else=0A=
+				strlcat(buf, "DISABLED(*)", sizeof(buf));=0A=
+		}=0A=
+		strlcat(buf, ">", sizeof(buf));=0A=
+		printf("%s%d: %s\n", periph->periph_name,=0A=
+		    periph->unit_number, buf);=0A=
+	}=0A=
+=0A=
+	/*=0A=
+	 * Since our peripheral may be invalidated by an error=0A=
+	 * above or an external event, we must release our CCB=0A=
+	 * before releasing the probe lock on the peripheral.=0A=
+	 * The peripheral will only go away once the last lock=0A=
+	 * is removed, and we need it around for the CCB release=0A=
+	 * operation.=0A=
+	 */=0A=
+	xpt_release_ccb(ccb);=0A=
+	softc->state =3D DA_STATE_NORMAL;=0A=
+	daschedule(periph);=0A=
+	wakeup(&softc->disk->d_mediasize);=0A=
+	if ((softc->flags & DA_FLAG_PROBED) =3D=3D 0) {=0A=
+		softc->flags |=3D DA_FLAG_PROBED;=0A=
+		cam_periph_unhold(periph);=0A=
+	} else=0A=
+		cam_periph_release_locked(periph);=0A=
+}=0A=
+=0A=
+static void=0A=
+dadeletemethodchoose(struct da_softc *softc, da_delete_methods =
default_method)=0A=
+{=0A=
+	int i, delete_method;=0A=
+=0A=
+	delete_method =3D default_method;=0A=
+=0A=
+	/*=0A=
+	 * Use the pre-defined order to choose the best=0A=
+	 * performing delete.=0A=
+	 */=0A=
+	for (i =3D DA_DELETE_MIN; i <=3D DA_DELETE_MAX; i++) {=0A=
+		if (softc->delete_available & (1 << i)) {=0A=
+			dadeletemethodset(softc, i);=0A=
+			return;=0A=
+		}=0A=
+	}=0A=
+	dadeletemethodset(softc, delete_method);=0A=
+}=0A=
+=0A=
 static int=0A=
 dadeletemethodsysctl(SYSCTL_HANDLER_ARGS)=0A=
 {=0A=
@@ -1626,14 +1763,17 @@=0A=
 	}=0A=
 =0A=
 	LIST_INIT(&softc->pending_ccbs);=0A=
-	softc->state =3D DA_STATE_PROBE;=0A=
+	softc->state =3D DA_STATE_PROBE_RC;=0A=
 	bioq_init(&softc->bio_queue);=0A=
 	bioq_init(&softc->delete_queue);=0A=
 	bioq_init(&softc->delete_run_queue);=0A=
 	if (SID_IS_REMOVABLE(&cgd->inq_data))=0A=
 		softc->flags |=3D DA_FLAG_PACK_REMOVABLE;=0A=
 	softc->unmap_max_ranges =3D UNMAP_MAX_RANGES;=0A=
-	softc->unmap_max_lba =3D 1024*1024*2;=0A=
+	softc->unmap_max_lba =3D UNMAP_RANGE_MAX;=0A=
+	softc->ws_max_blks =3D WS16_MAX_BLKS;=0A=
+	softc->trim_max_ranges =3D ATA_TRIM_MAX_RANGES;=0A=
+	softc->sort_io_queue =3D -1;=0A=
 =0A=
 	periph->softc =3D softc;=0A=
 =0A=
@@ -1709,7 +1849,7 @@=0A=
 	/* Predict whether device may support READ CAPACITY(16). */=0A=
 	if (SID_ANSI_REV(&cgd->inq_data) >=3D SCSI_REV_SPC3) {=0A=
 		softc->flags |=3D DA_FLAG_CAN_RC16;=0A=
-		softc->state =3D DA_STATE_PROBE2;=0A=
+		softc->state =3D DA_STATE_PROBE_RC16;=0A=
 	}=0A=
 =0A=
 	/*=0A=
@@ -1809,6 +1949,7 @@=0A=
 =0A=
 	CAM_DEBUG(periph->path, CAM_DEBUG_TRACE, ("dastart\n"));=0A=
 =0A=
+skipstate:=0A=
 	switch (softc->state) {=0A=
 	case DA_STATE_NORMAL:=0A=
 	{=0A=
@@ -1833,14 +1974,37 @@=0A=
 		if (!softc->delete_running &&=0A=
 		    (bp =3D bioq_first(&softc->delete_queue)) !=3D NULL) {=0A=
 		    uint64_t lba;=0A=
-		    u_int count;=0A=
+		    uint64_t count; /* forward compat with WS32 */=0A=
 =0A=
+		    /*=0A=
+		     * In each of the methods below, while it's the caller's=0A=
+		     * responsibility to ensure the request will fit into a=0A=
+		     * single device request, we might have changed the delete=0A=
+		     * method due to the device incorrectly advertising either=0A=
+		     * its supported methods or limits.=0A=
+		     *=0A=
+		     * To prevent this causing further issues we validate the=0A=
+		     * request against the method's limits and warn; this would=0A=
+		     * otherwise be unnecessary.=0A=
+		     */=0A=
+=0A=
 		    if (softc->delete_method =3D=3D DA_DELETE_UNMAP) {=0A=
 			uint8_t *buf =3D softc->unmap_buf;=0A=
 			uint64_t lastlba =3D (uint64_t)-1;=0A=
-			uint32_t lastcount =3D 0;=0A=
-			int blocks =3D 0, off, ranges =3D 0;=0A=
+			uint32_t lastcount =3D 0, c;=0A=
+			uint64_t totalcount =3D 0;=0A=
+			uint32_t off, ranges =3D 0;=0A=
 =0A=
+			/*=0A=
+			 * Currently this doesn't take the UNMAP=0A=
+			 * Granularity and Granularity Alignment=0A=
+			 * fields into account.=0A=
+			 *=0A=
+			 * This could result in both suboptimal unmap=0A=
+			 * requests as well as UNMAP calls unmapping=0A=
+			 * fewer LBAs than requested.=0A=
+			 */=0A=
+=0A=
 			softc->delete_running =3D 1;=0A=
 			bzero(softc->unmap_buf, sizeof(softc->unmap_buf));=0A=
 			bp1 =3D bp;=0A=
@@ -1853,22 +2017,44 @@=0A=
 =0A=
 				/* Try to extend the previous range. */=0A=
 				if (lba =3D=3D lastlba) {=0A=
-					lastcount +=3D count;=0A=
-					off =3D (ranges - 1) * 16 + 8;=0A=
+					c =3D min(count, softc->unmap_max_lba -=0A=
+						lastcount);=0A=
+					lastcount +=3D c;=0A=
+					off =3D ((ranges - 1) * UNMAP_RANGE_SIZE) +=0A=
+					      UNMAP_HEAD_SIZE;=0A=
 					scsi_ulto4b(lastcount, &buf[off + 8]);=0A=
-				} else if (count > 0) {=0A=
-					off =3D ranges * 16 + 8;=0A=
+					count -=3D c;=0A=
+					lba +=3D c;=0A=
+					totalcount +=3D c;=0A=
+				}=0A=
+=0A=
+				while (count > 0) {=0A=
+					c =3D min(count, softc->unmap_max_lba);=0A=
+					if (totalcount + c > softc->unmap_max_lba ||=0A=
+					    ranges >=3D softc->unmap_max_ranges) {=0A=
+						xpt_print(periph->path,=0A=
+						  "%s issuing short delete %ld > %ld "=0A=
+						  "|| %d >=3D %d\n",=0A=
+						  da_delete_method_desc[softc->delete_method],=0A=
+						  totalcount + c, softc->unmap_max_lba,=0A=
+						  ranges, softc->unmap_max_ranges);=0A=
+						break;=0A=
+					}=0A=
+					off =3D (ranges * UNMAP_RANGE_SIZE) +=0A=
+					      UNMAP_HEAD_SIZE;=0A=
 					scsi_u64to8b(lba, &buf[off + 0]);=0A=
-					scsi_ulto4b(count, &buf[off + 8]);=0A=
-					lastcount =3D count;=0A=
+					scsi_ulto4b(c, &buf[off + 8]);=0A=
+					lba +=3D c;=0A=
+					totalcount +=3D c;=0A=
 					ranges++;=0A=
+					count -=3D c;=0A=
+					lastcount =3D c;=0A=
 				}=0A=
-				blocks +=3D count;=0A=
-				lastlba =3D lba + count;=0A=
+				lastlba =3D lba;=0A=
 				bp1 =3D bioq_first(&softc->delete_queue);=0A=
 				if (bp1 =3D=3D NULL ||=0A=
 				    ranges >=3D softc->unmap_max_ranges ||=0A=
-				    blocks + bp1->bio_bcount /=0A=
+				    totalcount + bp1->bio_bcount /=0A=
 				     softc->params.secsize > softc->unmap_max_lba)=0A=
 					break;=0A=
 			} while (1);=0A=
@@ -1886,9 +2072,87 @@=0A=
 					da_default_timeout * 1000);=0A=
 			start_ccb->ccb_h.ccb_state =3D DA_CCB_DELETE;=0A=
 			goto out;=0A=
+		    } else if (softc->delete_method =3D=3D DA_DELETE_ATA_TRIM) {=0A=
+				uint8_t *buf =3D softc->unmap_buf;=0A=
+				uint64_t lastlba =3D (uint64_t)-1;=0A=
+				uint32_t lastcount =3D 0, c, requestcount;=0A=
+				int ranges =3D 0, off, block_count;=0A=
+=0A=
+				softc->delete_running =3D 1;=0A=
+				bzero(softc->unmap_buf, sizeof(softc->unmap_buf));=0A=
+				bp1 =3D bp;=0A=
+				do {=0A=
+					bioq_remove(&softc->delete_queue, bp1);=0A=
+					if (bp1 !=3D bp)=0A=
+						bioq_insert_tail(&softc->delete_run_queue, bp1);=0A=
+					lba =3D bp1->bio_pblkno;=0A=
+					count =3D bp1->bio_bcount / softc->params.secsize;=0A=
+					requestcount =3D count;=0A=
+=0A=
+					/* Try to extend the previous range. */=0A=
+					if (lba =3D=3D lastlba) {=0A=
+						c =3D min(count, ATA_DSM_RANGE_MAX - lastcount);=0A=
+						lastcount +=3D c;=0A=
+						off =3D (ranges - 1) * 8;=0A=
+						buf[off + 6] =3D lastcount & 0xff;=0A=
+						buf[off + 7] =3D (lastcount >> 8) & 0xff;=0A=
+						count -=3D c;=0A=
+						lba +=3D c;=0A=
+					}=0A=
+=0A=
+					while (count > 0) {=0A=
+						c =3D min(count, ATA_DSM_RANGE_MAX);=0A=
+						off =3D ranges * 8;=0A=
+=0A=
+						buf[off + 0] =3D lba & 0xff;=0A=
+						buf[off + 1] =3D (lba >> 8) & 0xff;=0A=
+						buf[off + 2] =3D (lba >> 16) & 0xff;=0A=
+						buf[off + 3] =3D (lba >> 24) & 0xff;=0A=
+						buf[off + 4] =3D (lba >> 32) & 0xff;=0A=
+						buf[off + 5] =3D (lba >> 40) & 0xff;=0A=
+						buf[off + 6] =3D c & 0xff;=0A=
+						buf[off + 7] =3D (c >> 8) & 0xff;=0A=
+						lba +=3D c;=0A=
+						ranges++;=0A=
+						count -=3D c;=0A=
+						lastcount =3D c;=0A=
+						if (count !=3D 0 && ranges =3D=3D softc->trim_max_ranges) {=0A=
+							xpt_print(periph->path,=0A=
+							  "%s issuing short delete %ld > %ld\n",=0A=
+							  da_delete_method_desc[softc->delete_method],=0A=
+							  requestcount,=0A=
+							  (softc->trim_max_ranges - ranges) *=0A=
+							  ATA_DSM_RANGE_MAX);=0A=
+							break;=0A=
+						}=0A=
+					}=0A=
+					lastlba =3D lba;=0A=
+					bp1 =3D bioq_first(&softc->delete_queue);=0A=
+					if (bp1 =3D=3D NULL ||=0A=
+					    bp1->bio_bcount / softc->params.secsize >=0A=
+					    (softc->trim_max_ranges - ranges) *=0A=
+						    ATA_DSM_RANGE_MAX)=0A=
+						break;=0A=
+				} while (1);=0A=
+=0A=
+				block_count =3D (ranges + ATA_DSM_BLK_RANGES - 1) /=0A=
+					      ATA_DSM_BLK_RANGES;=0A=
+				scsi_ata_trim(&start_ccb->csio,=0A=
+						/*retries*/da_retry_count,=0A=
+						/*cbfcnp*/dadone,=0A=
+						/*tag_action*/MSG_SIMPLE_Q_TAG,=0A=
+						block_count,=0A=
+						/*data_ptr*/buf,=0A=
+						/*dxfer_len*/block_count * ATA_DSM_BLK_SIZE,=0A=
+						/*sense_len*/SSD_FULL_SIZE,=0A=
+						da_default_timeout * 1000);=0A=
+				start_ccb->ccb_h.ccb_state =3D DA_CCB_DELETE;=0A=
+				goto out;=0A=
 		    } else if (softc->delete_method =3D=3D DA_DELETE_ZERO ||=0A=
 			       softc->delete_method =3D=3D DA_DELETE_WS10 ||=0A=
 			       softc->delete_method =3D=3D DA_DELETE_WS16) {=0A=
+			uint64_t ws_max_blks;=0A=
+			ws_max_blks =3D softc->ws_max_blks / softc->params.secsize;=0A=
 			softc->delete_running =3D 1;=0A=
 			lba =3D bp->bio_pblkno;=0A=
 			count =3D 0;=0A=
@@ -1898,11 +2162,19 @@=0A=
 				if (bp1 !=3D bp)=0A=
 					bioq_insert_tail(&softc->delete_run_queue, bp1);=0A=
 				count +=3D bp1->bio_bcount / softc->params.secsize;=0A=
+				if (count > ws_max_blks) {=0A=
+					xpt_print(periph->path,=0A=
+					  "%s issuing short delete %ld > %ld\n",=0A=
+					  da_delete_method_desc[softc->delete_method],=0A=
+					  count, ws_max_blks);=0A=
+					count =3D min(count, ws_max_blks);=0A=
+					break;=0A=
+				}=0A=
 				bp1 =3D bioq_first(&softc->delete_queue);=0A=
 				if (bp1 =3D=3D NULL ||=0A=
 				    lba + count !=3D bp1->bio_pblkno ||=0A=
 				    count + bp1->bio_bcount /=0A=
-				     softc->params.secsize > 0xffff)=0A=
+				     softc->params.secsize > ws_max_blks)=0A=
 					break;=0A=
 			} while (1);=0A=
 =0A=
@@ -2022,7 +2294,7 @@=0A=
 		daschedule(periph);=0A=
 		break;=0A=
 	}=0A=
-	case DA_STATE_PROBE:=0A=
+	case DA_STATE_PROBE_RC:=0A=
 	{=0A=
 		struct scsi_read_capacity_data *rcap;=0A=
 =0A=
@@ -2041,11 +2313,11 @@=0A=
 				   SSD_FULL_SIZE,=0A=
 				   /*timeout*/5000);=0A=
 		start_ccb->ccb_h.ccb_bp =3D NULL;=0A=
-		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE;=0A=
+		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE_RC;=0A=
 		xpt_action(start_ccb);=0A=
 		break;=0A=
 	}=0A=
-	case DA_STATE_PROBE2:=0A=
+	case DA_STATE_PROBE_RC16:=0A=
 	{=0A=
 		struct scsi_read_capacity_data_long *rcaplong;=0A=
 =0A=
@@ -2067,11 +2339,151 @@=0A=
 				      /*sense_len*/ SSD_FULL_SIZE,=0A=
 				      /*timeout*/ da_default_timeout * 1000);=0A=
 		start_ccb->ccb_h.ccb_bp =3D NULL;=0A=
-		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE2;=0A=
-		xpt_action(start_ccb);	=0A=
+		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE_RC16;=0A=
+		xpt_action(start_ccb);=0A=
 		break;=0A=
 	}=0A=
+	case DA_STATE_PROBE_LBP:=0A=
+	{=0A=
+		struct scsi_vpd_logical_block_prov *lbp;=0A=
+=0A=
+		if (!scsi_vpd_supported_page(periph, SVPD_LBP)) {=0A=
+			/*=0A=
+			 * If we get here we don't support any SBC-3 delete=0A=
+			 * methods with UNMAP, as Logical Block Provisioning=0A=
+			 * VPD page support is required for devices which=0A=
+			 * support it according to T10/1799-D Revision 31.=0A=
+			 * However, older revisions of the spec don't mandate=0A=
+			 * this, so we currently don't remove these methods=0A=
+			 * from the available set.=0A=
+			 */=0A=
+			softc->state =3D DA_STATE_PROBE_BLK_LIMITS;=0A=
+			goto skipstate;=0A=
+		}=0A=
+=0A=
+		lbp =3D (struct scsi_vpd_logical_block_prov *)=0A=
+			malloc(sizeof(*lbp), M_SCSIDA, M_NOWAIT|M_ZERO);=0A=
+=0A=
+		if (lbp =3D=3D NULL) {=0A=
+			printf("dastart: Couldn't malloc lbp data\n");=0A=
+			/* da_free_periph??? */=0A=
+			break;=0A=
+		}=0A=
+=0A=
+		scsi_inquiry(&start_ccb->csio,=0A=
+			     /*retries*/da_retry_count,=0A=
+			     /*cbfcnp*/dadone,=0A=
+			     /*tag_action*/MSG_SIMPLE_Q_TAG,=0A=
+			     /*inq_buf*/(u_int8_t *)lbp,=0A=
+			     /*inq_len*/sizeof(*lbp),=0A=
+			     /*evpd*/TRUE,=0A=
+			     /*page_code*/SVPD_LBP,=0A=
+			     /*sense_len*/SSD_MIN_SIZE,=0A=
+			     /*timeout*/da_default_timeout * 1000);=0A=
+		start_ccb->ccb_h.ccb_bp =3D NULL;=0A=
+		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE_LBP;=0A=
+		xpt_action(start_ccb);=0A=
+		break;=0A=
 	}=0A=
+	case DA_STATE_PROBE_BLK_LIMITS:=0A=
+	{=0A=
+		struct scsi_vpd_block_limits *block_limits;=0A=
+=0A=
+		if (!scsi_vpd_supported_page(periph, SVPD_BLOCK_LIMITS)) {=0A=
+			/* Not supported, skip to next probe. */=0A=
+			softc->state =3D DA_STATE_PROBE_BDC;=0A=
+			goto skipstate;=0A=
+		}=0A=
+=0A=
+		block_limits =3D (struct scsi_vpd_block_limits *)=0A=
+			malloc(sizeof(*block_limits), M_SCSIDA, M_NOWAIT|M_ZERO);=0A=
+=0A=
+		if (block_limits =3D=3D NULL) {=0A=
+			printf("dastart: Couldn't malloc block_limits data\n");=0A=
+			/* da_free_periph??? */=0A=
+			break;=0A=
+		}=0A=
+=0A=
+		scsi_inquiry(&start_ccb->csio,=0A=
+			     /*retries*/da_retry_count,=0A=
+			     /*cbfcnp*/dadone,=0A=
+			     /*tag_action*/MSG_SIMPLE_Q_TAG,=0A=
+			     /*inq_buf*/(u_int8_t *)block_limits,=0A=
+			     /*inq_len*/sizeof(*block_limits),=0A=
+			     /*evpd*/TRUE,=0A=
+			     /*page_code*/SVPD_BLOCK_LIMITS,=0A=
+			     /*sense_len*/SSD_MIN_SIZE,=0A=
+			     /*timeout*/da_default_timeout * 1000);=0A=
+		start_ccb->ccb_h.ccb_bp =3D NULL;=0A=
+		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE_BLK_LIMITS;=0A=
+		xpt_action(start_ccb);=0A=
+		break;=0A=
+	}=0A=
+	case DA_STATE_PROBE_BDC:=0A=
+	{=0A=
+		struct scsi_vpd_block_characteristics *bdc;=0A=
+=0A=
+		if (!scsi_vpd_supported_page(periph, SVPD_BDC)) {=0A=
+			softc->state =3D DA_STATE_PROBE_ATA;=0A=
+			goto skipstate;=0A=
+		}=0A=
+=0A=
+		bdc =3D (struct scsi_vpd_block_characteristics *)=0A=
+			malloc(sizeof(*bdc), M_SCSIDA, M_NOWAIT|M_ZERO);=0A=
+=0A=
+		if (bdc =3D=3D NULL) {=0A=
+			printf("dastart: Couldn't malloc bdc data\n");=0A=
+			/* da_free_periph??? */=0A=
+			break;=0A=
+		}=0A=
+=0A=
+		scsi_inquiry(&start_ccb->csio,=0A=
+			     /*retries*/da_retry_count,=0A=
+			     /*cbfcnp*/dadone,=0A=
+			     /*tag_action*/MSG_SIMPLE_Q_TAG,=0A=
+			     /*inq_buf*/(u_int8_t *)bdc,=0A=
+			     /*inq_len*/sizeof(*bdc),=0A=
+			     /*evpd*/TRUE,=0A=
+			     /*page_code*/SVPD_BDC,=0A=
+			     /*sense_len*/SSD_MIN_SIZE,=0A=
+			     /*timeout*/da_default_timeout * 1000);=0A=
+		start_ccb->ccb_h.ccb_bp =3D NULL;=0A=
+		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE_BDC;=0A=
+		xpt_action(start_ccb);=0A=
+		break;=0A=
+	}=0A=
+	case DA_STATE_PROBE_ATA:=0A=
+	{=0A=
+		struct ata_params *ata_params;=0A=
+=0A=
+		if (!scsi_vpd_supported_page(periph, SVPD_ATA_INFORMATION)) {=0A=
+			daprobedone(periph, start_ccb);=0A=
+			break;=0A=
+		}=0A=
+=0A=
+		ata_params =3D (struct ata_params*)=0A=
+			malloc(sizeof(*ata_params), M_SCSIDA, M_NOWAIT|M_ZERO);=0A=
+=0A=
+		if (ata_params =3D=3D NULL) {=0A=
+			printf("dastart: Couldn't malloc ata_params data\n");=0A=
+			/* da_free_periph??? */=0A=
+			break;=0A=
+		}=0A=
+=0A=
+		scsi_ata_identify(&start_ccb->csio,=0A=
+				  /*retries*/da_retry_count,=0A=
+				  /*cbfcnp*/dadone,=0A=
+				  /*tag_action*/MSG_SIMPLE_Q_TAG,=0A=
+				  /*data_ptr*/(u_int8_t *)ata_params,=0A=
+				  /*dxfer_len*/sizeof(*ata_params),=0A=
+				  /*sense_len*/SSD_FULL_SIZE,=0A=
+				  /*timeout*/da_default_timeout * 1000);=0A=
+		start_ccb->ccb_h.ccb_bp =3D NULL;=0A=
+		start_ccb->ccb_h.ccb_state =3D DA_CCB_PROBE_ATA;=0A=
+		xpt_action(start_ccb);=0A=
+		break;=0A=
+	}=0A=
+	}=0A=
 }=0A=
 =0A=
 static int=0A=
@@ -2088,30 +2500,41 @@=0A=
 	softc =3D (struct da_softc *)xpt_path_periph(ccb->ccb_h.path)->softc;=0A=
 =0A=
 	if (ccb->ccb_h.ccb_state =3D=3D DA_CCB_DELETE) {=0A=
-		if (softc->delete_method =3D=3D DA_DELETE_UNMAP) {=0A=
-			xpt_print(ccb->ccb_h.path, "UNMAP is not supported, "=0A=
-			    "switching to WRITE SAME(16) with UNMAP.\n");=0A=
-			dadeletemethodset(softc, DA_DELETE_WS16);=0A=
-		} else if (softc->delete_method =3D=3D DA_DELETE_WS16) {=0A=
+		da_delete_methods old_method =3D softc->delete_method;=0A=
+=0A=
+		/*=0A=
+		 * Typically there are two reasons for failure here=0A=
+		 * 1. Delete method was detected as supported but isn't=0A=
+		 * 2. Delete failed due to invalid params e.g. too big=0A=
+		 *=0A=
+		 * While we will attempt to choose an alternative delete method=0A=
+		 * this may result in short deletes if the existing delete=0A=
+		 * requests from geom are too big for the new method chosen.=0A=
+		 *=0A=
+		 * This method assumes that the error which triggered this=0A=
+		 * will not retry the io, otherwise a panic will occur.=0A=
+		 */=0A=
+		dadeleteflag(softc, old_method, 0);=0A=
+		dadeletemethodchoose(softc, DA_DELETE_DISABLE);=0A=
+		if (softc->delete_method =3D=3D DA_DELETE_DISABLE)=0A=
 			xpt_print(ccb->ccb_h.path,=0A=
-			    "WRITE SAME(16) with UNMAP is not supported, "=0A=
-			    "disabling BIO_DELETE.\n");=0A=
-			dadeletemethodset(softc, DA_DELETE_DISABLE);=0A=
-		} else if (softc->delete_method =3D=3D DA_DELETE_WS10) {=0A=
+				  "%s failed, disabling BIO_DELETE\n",=0A=
+				  da_delete_method_desc[old_method]);=0A=
+		else=0A=
 			xpt_print(ccb->ccb_h.path,=0A=
-			    "WRITE SAME(10) with UNMAP is not supported, "=0A=
-			    "disabling BIO_DELETE.\n");=0A=
-			dadeletemethodset(softc, DA_DELETE_DISABLE);=0A=
-		} else if (softc->delete_method =3D=3D DA_DELETE_ZERO) {=0A=
-			xpt_print(ccb->ccb_h.path,=0A=
-			    "WRITE SAME(10) is not supported, "=0A=
-			    "disabling BIO_DELETE.\n");=0A=
-			dadeletemethodset(softc, DA_DELETE_DISABLE);=0A=
-		} else=0A=
-			dadeletemethodset(softc, DA_DELETE_DISABLE);=0A=
-		while ((bp =3D bioq_takefirst(&softc->delete_run_queue))=0A=
-		    !=3D NULL)=0A=
-			bioq_disksort(&softc->delete_queue, bp);=0A=
+				  "%s failed, switching to %s BIO_DELETE\n",=0A=
+				  da_delete_method_desc[old_method],=0A=
+				  da_delete_method_desc[softc->delete_method]);=0A=
+=0A=
+		if (DA_SIO) {=0A=
+			while ((bp =3D bioq_takefirst(&softc->delete_run_queue))=0A=
+			    !=3D NULL)=0A=
+				bioq_disksort(&softc->delete_queue, bp);=0A=
+		} else {=0A=
+			while ((bp =3D bioq_takefirst(&softc->delete_run_queue))=0A=
+			    !=3D NULL)=0A=
+				bioq_insert_tail(&softc->delete_queue, bp);=0A=
+		}=0A=
 		bioq_insert_tail(&softc->delete_queue,=0A=
 		    (struct bio *)ccb->ccb_h.ccb_bp);=0A=
 		ccb->ccb_h.ccb_bp =3D NULL;=0A=
@@ -2185,7 +2608,7 @@=0A=
 			error =3D daerror(done_ccb, CAM_RETRY_SELTO, sf);=0A=
 			if (error =3D=3D ERESTART) {=0A=
 				/*=0A=
-				 * A retry was scheuled, so=0A=
+				 * A retry was scheduled, so=0A=
 				 * just return.=0A=
 				 */=0A=
 				return;=0A=
@@ -2281,16 +2704,18 @@=0A=
 			biodone(bp);=0A=
 		break;=0A=
 	}=0A=
-	case DA_CCB_PROBE:=0A=
-	case DA_CCB_PROBE2:=0A=
+	case DA_CCB_PROBE_RC:=0A=
+	case DA_CCB_PROBE_RC16:=0A=
 	{=0A=
 		struct	   scsi_read_capacity_data *rdcap;=0A=
 		struct     scsi_read_capacity_data_long *rcaplong;=0A=
 		char	   announce_buf[80];=0A=
+		int	   lbp;=0A=
 =0A=
+		lbp =3D 0;=0A=
 		rdcap =3D NULL;=0A=
 		rcaplong =3D NULL;=0A=
-		if (state =3D=3D DA_CCB_PROBE)=0A=
+		if (state =3D=3D DA_CCB_PROBE_RC)=0A=
 			rdcap =3D(struct scsi_read_capacity_data *)csio->data_ptr;=0A=
 		else=0A=
 			rcaplong =3D (struct scsi_read_capacity_data_long *)=0A=
@@ -2303,7 +2728,7 @@=0A=
 			u_int lbppbe;	/* LB per physical block exponent. */=0A=
 			u_int lalba;	/* Lowest aligned LBA. */=0A=
 =0A=
-			if (state =3D=3D DA_CCB_PROBE) {=0A=
+			if (state =3D=3D DA_CCB_PROBE_RC) {=0A=
 				block_size =3D scsi_4btoul(rdcap->length);=0A=
 				maxsector =3D scsi_4btoul(rdcap->addr);=0A=
 				lbppbe =3D 0;=0A=
@@ -2318,9 +2743,9 @@=0A=
 				 * with the short version of the command.=0A=
 				 */=0A=
 				if (maxsector =3D=3D 0xffffffff) {=0A=
-					softc->state =3D DA_STATE_PROBE2;=0A=
 					free(rdcap, M_SCSIDA);=0A=
 					xpt_release_ccb(done_ccb);=0A=
+					softc->state =3D DA_STATE_PROBE_RC16;=0A=
 					xpt_schedule(periph, priority);=0A=
 					return;=0A=
 				}=0A=
@@ -2353,9 +2778,7 @@=0A=
 				 */=0A=
 				dasetgeom(periph, block_size, maxsector,=0A=
 					  rcaplong, sizeof(*rcaplong));=0A=
-				if ((lalba & SRC16_LBPME_A)=0A=
-				 && softc->delete_method =3D=3D DA_DELETE_NONE)=0A=
-					dadeletemethodset(softc, DA_DELETE_UNMAP);=0A=
+				lbp =3D (lalba & SRC16_LBPME_A);=0A=
 				dp =3D &softc->params;=0A=
 				snprintf(announce_buf, sizeof(announce_buf),=0A=
 				        "%juMB (%ju %u byte sectors: %dH %dS/T "=0A=
@@ -2416,7 +2839,7 @@=0A=
 				 * If we tried READ CAPACITY(16) and failed,=0A=
 				 * fallback to READ CAPACITY(10).=0A=
 				 */=0A=
-				if ((state =3D=3D DA_CCB_PROBE2) &&=0A=
+				if ((state =3D=3D DA_CCB_PROBE_RC16) &&=0A=
 				    (softc->flags & DA_FLAG_CAN_RC16) &&=0A=
 				    (((csio->ccb_h.status & CAM_STATUS_MASK) =3D=3D=0A=
 					CAM_REQ_INVALID) ||=0A=
@@ -2424,9 +2847,9 @@=0A=
 				      (error_code =3D=3D SSD_CURRENT_ERROR) &&=0A=
 				      (sense_key =3D=3D SSD_KEY_ILLEGAL_REQUEST)))) {=0A=
 					softc->flags &=3D ~DA_FLAG_CAN_RC16;=0A=
-					softc->state =3D DA_STATE_PROBE;=0A=
 					free(rdcap, M_SCSIDA);=0A=
 					xpt_release_ccb(done_ccb);=0A=
+					softc->state =3D DA_STATE_PROBE_RC;=0A=
 					xpt_schedule(periph, priority);=0A=
 					return;=0A=
 				} else=0A=
@@ -2482,30 +2905,250 @@=0A=
 				taskqueue_enqueue(taskqueue_thread,=0A=
 						  &softc->sysctl_task);=0A=
 				xpt_announce_periph(periph, announce_buf);=0A=
+=0A=
 			} else {=0A=
 				xpt_print(periph->path, "fatal error, "=0A=
 				    "could not acquire reference count\n");=0A=
 			}=0A=
 		}=0A=
-		/*=0A=
-		 * Since our peripheral may be invalidated by an error=0A=
-		 * above or an external event, we must release our CCB=0A=
-		 * before releasing the probe lock on the peripheral.=0A=
-		 * The peripheral will only go away once the last lock=0A=
-		 * is removed, and we need it around for the CCB release=0A=
-		 * operation.=0A=
-		 */=0A=
+=0A=
+		/* Ensure re-probe doesn't see old delete. */=0A=
+		softc->delete_available =3D 0;=0A=
+		if (lbp) {=0A=
+			/*=0A=
+			 * Based on older SBC-3 spec revisions=0A=
+			 * any of the UNMAP methods "may" be=0A=
+			 * available via LBP given this flag so=0A=
+			 * we flag all of them as available and=0A=
+			 * then remove those which further=0A=
+			 * probes confirm aren't available=0A=
+			 * later.=0A=
+			 *=0A=
+			 * We could also check readcap(16) p_type=0A=
+			 * flag to exclude one or more invalid=0A=
+			 * write same (X) types here=0A=
+			 */=0A=
+			dadeleteflag(softc, DA_DELETE_WS16, 1);=0A=
+			dadeleteflag(softc, DA_DELETE_WS10, 1);=0A=
+			dadeleteflag(softc, DA_DELETE_ZERO, 1);=0A=
+			dadeleteflag(softc, DA_DELETE_UNMAP, 1);=0A=
+=0A=
+			xpt_release_ccb(done_ccb);=0A=
+			softc->state =3D DA_STATE_PROBE_LBP;=0A=
+			xpt_schedule(periph, priority);=0A=
+			return;=0A=
+		}=0A=
+=0A=
 		xpt_release_ccb(done_ccb);=0A=
-		softc->state =3D DA_STATE_NORMAL;=0A=
-		daschedule(periph);=0A=
-		wakeup(&softc->disk->d_mediasize);=0A=
-		if ((softc->flags & DA_FLAG_PROBED) =3D=3D 0) {=0A=
-			softc->flags |=3D DA_FLAG_PROBED;=0A=
-			cam_periph_unhold(periph);=0A=
-		} else=0A=
-			cam_periph_release_locked(periph);=0A=
+		softc->state =3D DA_STATE_PROBE_BDC;=0A=
+		xpt_schedule(periph, priority);=0A=
 		return;=0A=
 	}=0A=
+	case DA_CCB_PROBE_LBP:=0A=
+	{=0A=
+		struct scsi_vpd_logical_block_prov *lbp;=0A=
+=0A=
+		lbp =3D (struct scsi_vpd_logical_block_prov *)csio->data_ptr;=0A=
+=0A=
+		if ((csio->ccb_h.status & CAM_STATUS_MASK) =3D=3D CAM_REQ_CMP) {=0A=
+			/*=0A=
+			 * T10/1799-D Revision 31 states at least one of these=0A=
+			 * must be supported but we don't currently enforce this.=0A=
+			 */=0A=
+			dadeleteflag(softc, DA_DELETE_WS16,=0A=
+				     (lbp->flags & SVPD_LBP_WS16));=0A=
+			dadeleteflag(softc, DA_DELETE_WS10,=0A=
+				     (lbp->flags & SVPD_LBP_WS10));=0A=
+			dadeleteflag(softc, DA_DELETE_ZERO,=0A=
+				     (lbp->flags & SVPD_LBP_WS10));=0A=
+			dadeleteflag(softc, DA_DELETE_UNMAP,=0A=
+				     (lbp->flags & SVPD_LBP_UNMAP));=0A=
+=0A=
+			if (lbp->flags & SVPD_LBP_UNMAP) {=0A=
+				free(lbp, M_SCSIDA);=0A=
+				xpt_release_ccb(done_ccb);=0A=
+				softc->state =3D DA_STATE_PROBE_BLK_LIMITS;=0A=
+				xpt_schedule(periph, priority);=0A=
+				return;=0A=
+			}=0A=
+		} else {=0A=
+			int error;=0A=
+			error =3D daerror(done_ccb, CAM_RETRY_SELTO,=0A=
+					SF_RETRY_UA|SF_NO_PRINT);=0A=
+			if (error =3D=3D ERESTART)=0A=
+				return;=0A=
+			else if (error !=3D 0) {=0A=
+				if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) !=3D 0) {=0A=
+					/* Don't wedge this device's queue */=0A=
+					cam_release_devq(done_ccb->ccb_h.path,=0A=
+							 /*relsim_flags*/0,=0A=
+							 /*reduction*/0,=0A=
+							 /*timeout*/0,=0A=
+							 /*getcount_only*/0);=0A=
+				}=0A=
+=0A=
+				/*=0A=
+				 * Failure indicates we don't support any SBC-3=0A=
+				 * delete methods with UNMAP=0A=
+				 */=0A=
+			}=0A=
+		}=0A=
+=0A=
+		free(lbp, M_SCSIDA);=0A=
+		xpt_release_ccb(done_ccb);=0A=
+		softc->state =3D DA_STATE_PROBE_BDC;=0A=
+		xpt_schedule(periph, priority);=0A=
+		return;=0A=
+	}=0A=
+	case DA_CCB_PROBE_BLK_LIMITS:=0A=
+	{=0A=
+		struct scsi_vpd_block_limits *block_limits;=0A=
+=0A=
+		block_limits =3D (struct scsi_vpd_block_limits *)csio->data_ptr;=0A=
+=0A=
+		if ((csio->ccb_h.status & CAM_STATUS_MASK) =3D=3D CAM_REQ_CMP) {=0A=
+			uint32_t max_unmap_lba_cnt =3D scsi_4btoul(=0A=
+				block_limits->max_unmap_lba_cnt);=0A=
+			uint32_t max_unmap_blk_cnt =3D scsi_4btoul(=0A=
+				block_limits->max_unmap_blk_cnt);=0A=
+			uint64_t ws_max_blks =3D scsi_8btou64(=0A=
+				block_limits->max_write_same_length);=0A=
+			/*=0A=
+			 * We should already support UNMAP but we check lba=0A=
+			 * and block count to be sure=0A=
+			 */=0A=
+			if (max_unmap_lba_cnt !=3D 0x00L &&=0A=
+			    max_unmap_blk_cnt !=3D 0x00L) {=0A=
+				softc->unmap_max_lba =3D max_unmap_lba_cnt;=0A=
+				softc->unmap_max_ranges =3D min(max_unmap_blk_cnt,=0A=
+					UNMAP_MAX_RANGES);=0A=
+			} else {=0A=
+				/*=0A=
+				 * Unexpected UNMAP limits which means the=0A=
+				 * device doesn't actually support UNMAP=0A=
+				 */=0A=
+				dadeleteflag(softc, DA_DELETE_UNMAP, 0);=0A=
+			}=0A=
+=0A=
+			if (ws_max_blks !=3D 0x00L)=0A=
+				softc->ws_max_blks =3D ws_max_blks;=0A=
+		} else {=0A=
+			int error;=0A=
+			error =3D daerror(done_ccb, CAM_RETRY_SELTO,=0A=
+					SF_RETRY_UA|SF_NO_PRINT);=0A=
+			if (error =3D=3D ERESTART)=0A=
+				return;=0A=
+			else if (error !=3D 0) {=0A=
+				if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) !=3D 0) {=0A=
+					/* Don't wedge this device's queue */=0A=
+					cam_release_devq(done_ccb->ccb_h.path,=0A=
+							 /*relsim_flags*/0,=0A=
+							 /*reduction*/0,=0A=
+							 /*timeout*/0,=0A=
+							 /*getcount_only*/0);=0A=
+				}=0A=
+=0A=
+				/*=0A=
+				 * Failure here doesn't mean UNMAP is not=0A=
+				 * supported as this is an optional page.=0A=
+				 */=0A=
+				softc->unmap_max_lba =3D 1;=0A=
+				softc->unmap_max_ranges =3D 1;=0A=
+			}=0A=
+		}=0A=
+=0A=
+		free(block_limits, M_SCSIDA);=0A=
+		xpt_release_ccb(done_ccb);=0A=
+		softc->state =3D DA_STATE_PROBE_BDC;=0A=
+		xpt_schedule(periph, priority);=0A=
+		return;=0A=
+	}=0A=
+	case DA_CCB_PROBE_BDC:=0A=
+	{=0A=
+		struct scsi_vpd_block_characteristics *bdc;=0A=
+=0A=
+		bdc =3D (struct scsi_vpd_block_characteristics *)csio->data_ptr;=0A=
+=0A=
+		if ((csio->ccb_h.status & CAM_STATUS_MASK) =3D=3D CAM_REQ_CMP) {=0A=
+			/*=0A=
+			 * Disable queue sorting for non-rotational media=0A=
+			 * by default.=0A=
+			 */=0A=
+			if (scsi_2btoul(bdc->medium_rotation_rate) =3D=3D=0A=
+			    SVPD_BDC_RATE_NONE_ROTATING)=0A=
+				softc->sort_io_queue =3D 0;=0A=
+		} else {=0A=
+			int error;=0A=
+			error =3D daerror(done_ccb, CAM_RETRY_SELTO,=0A=
+					SF_RETRY_UA|SF_NO_PRINT);=0A=
+			if (error =3D=3D ERESTART)=0A=
+				return;=0A=
+			else if (error !=3D 0) {=0A=
+				if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) !=3D 0) {=0A=
+					/* Don't wedge this device's queue */=0A=
+					cam_release_devq(done_ccb->ccb_h.path,=0A=
+							 /*relsim_flags*/0,=0A=
+							 /*reduction*/0,=0A=
+							 /*timeout*/0,=0A=
+							 /*getcount_only*/0);=0A=
+				}=0A=
+			}=0A=
+		}=0A=
+=0A=
+		free(bdc, M_SCSIDA);=0A=
+		xpt_release_ccb(done_ccb);=0A=
+		softc->state =3D DA_STATE_PROBE_ATA;=0A=
+		xpt_schedule(periph, priority);=0A=
+		return;=0A=
+	}=0A=
+	case DA_CCB_PROBE_ATA:=0A=
+	{=0A=
+		int i;=0A=
+		struct ata_params *ata_params;=0A=
+		uint16_t *ptr;=0A=
+=0A=
+		ata_params =3D (struct ata_params *)csio->data_ptr;=0A=
+		ptr =3D (uint16_t *)ata_params;=0A=
+=0A=
+		if ((csio->ccb_h.status & CAM_STATUS_MASK) =3D=3D CAM_REQ_CMP) {=0A=
+			for (i =3D 0; i < sizeof(*ata_params) / 2; i++)=0A=
+				ptr[i] =3D le16toh(ptr[i]);=0A=
+			if (ata_params->support_dsm & ATA_SUPPORT_DSM_TRIM) {=0A=
+				dadeleteflag(softc, DA_DELETE_ATA_TRIM, 1);=0A=
+				if (ata_params->max_dsm_blocks !=3D 0)=0A=
+					softc->trim_max_ranges =3D min(=0A=
+					  softc->trim_max_ranges,=0A=
+					  ata_params->max_dsm_blocks *=0A=
+					  ATA_DSM_BLK_RANGES);=0A=
+			}=0A=
+			/*=0A=
+			 * Disable queue sorting for non-rotational media=0A=
+			 * by default.=0A=
+			 */=0A=
+			if (ata_params->media_rotation_rate =3D=3D 1)=0A=
+				softc->sort_io_queue =3D 0;=0A=
+		} else {=0A=
+			int error;=0A=
+			error =3D daerror(done_ccb, CAM_RETRY_SELTO,=0A=
+					SF_RETRY_UA|SF_NO_PRINT);=0A=
+			if (error =3D=3D ERESTART)=0A=
+				return;=0A=
+			else if (error !=3D 0) {=0A=
+				if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) !=3D 0) {=0A=
+					/* Don't wedge this device's queue */=0A=
+					cam_release_devq(done_ccb->ccb_h.path,=0A=
+							 /*relsim_flags*/0,=0A=
+							 /*reduction*/0,=0A=
+							 /*timeout*/0,=0A=
+							 /*getcount_only*/0);=0A=
+				}=0A=
+			}=0A=
+		}=0A=
+=0A=
+		free(ata_params, M_SCSIDA);=0A=
+		daprobedone(periph, done_ccb);=0A=
+		return;=0A=
+	}=0A=
 	case DA_CCB_WAITING:=0A=
 	{=0A=
 		/* Caller will release the CCB */=0A=
@@ -2549,7 +3192,7 @@=0A=
 	softc =3D (struct da_softc *)periph->softc;=0A=
 =0A=
 	/* Probe in progress; don't interfere. */=0A=
-	if ((softc->flags & DA_FLAG_PROBED) =3D=3D 0)=0A=
+	if (softc->state !=3D DA_STATE_NORMAL)=0A=
 		return;=0A=
 =0A=
 	status =3D cam_periph_acquire(periph);=0A=
@@ -2557,9 +3200,9 @@=0A=
 	    ("dareprobe: cam_periph_acquire failed"));=0A=
 =0A=
 	if (softc->flags & DA_FLAG_CAN_RC16)=0A=
-		softc->state =3D DA_STATE_PROBE2;=0A=
+		softc->state =3D DA_STATE_PROBE_RC16;=0A=
 	else=0A=
-		softc->state =3D DA_STATE_PROBE;=0A=
+		softc->state =3D DA_STATE_PROBE_RC;=0A=
 =0A=
 	xpt_schedule(periph, CAM_PRIORITY_DEV);=0A=
 }=0A=
@@ -2781,10 +3424,6 @@=0A=
 	softc->disk->d_fwheads =3D softc->params.heads;=0A=
 	softc->disk->d_devstat->block_size =3D softc->params.secsize;=0A=
 	softc->disk->d_devstat->flags &=3D ~DEVSTAT_BS_UNAVAILABLE;=0A=
-	if (softc->delete_method > DA_DELETE_DISABLE)=0A=
-		softc->disk->d_flags |=3D DISKFLAG_CANDELETE;=0A=
-	else=0A=
-		softc->disk->d_flags &=3D ~DISKFLAG_CANDELETE;=0A=
 }=0A=
 =0A=
 static void=0A=
Index: sys/cam/ata/ata_da.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cam/ata/ata_da.c	(revision 250577)=0A=
+++ sys/cam/ata/ata_da.c	(working copy)=0A=
@@ -117,10 +117,10 @@=0A=
 };=0A=
 =0A=
 #define TRIM_MAX_BLOCKS	8=0A=
-#define TRIM_MAX_RANGES	(TRIM_MAX_BLOCKS * 64)=0A=
+#define TRIM_MAX_RANGES	(TRIM_MAX_BLOCKS * ATA_DSM_BLK_RANGES)=0A=
 #define TRIM_MAX_BIOS	(TRIM_MAX_RANGES * 4)=0A=
 struct trim_request {=0A=
-	uint8_t		data[TRIM_MAX_RANGES * 8];=0A=
+	uint8_t		data[TRIM_MAX_RANGES * ATA_DSM_RANGE_SIZE];=0A=
 	struct bio	*bps[TRIM_MAX_BIOS];=0A=
 };=0A=
 =0A=
@@ -130,6 +130,7 @@=0A=
 	ada_state state;=0A=
 	ada_flags flags;	=0A=
 	ada_quirks quirks;=0A=
+	int	 sort_io_queue;=0A=
 	int	 ordered_tag_count;=0A=
 	int	 outstanding_cmds;=0A=
 	int	 trim_max_ranges;=0A=
@@ -449,6 +450,8 @@=0A=
 		 softc->read_ahead : ada_read_ahead)=0A=
 #define	ADA_WC	(softc->write_cache >=3D 0 ? \=0A=
 		 softc->write_cache : ada_write_cache)=0A=
+#define	ADA_SIO	(softc->sort_io_queue >=3D 0 ? \=0A=
+		 softc->sort_io_queue : cam_sort_io_queues)=0A=
 =0A=
 /*=0A=
  * Most platforms map firmware geometry to actual, but some don't.  If=0A=
@@ -659,10 +662,17 @@=0A=
 	 * Place it in the queue of disk activities for this disk=0A=
 	 */=0A=
 	if (bp->bio_cmd =3D=3D BIO_DELETE &&=0A=
-	    (softc->flags & ADA_FLAG_CAN_TRIM))=0A=
-		bioq_disksort(&softc->trim_queue, bp);=0A=
-	else=0A=
-		bioq_disksort(&softc->bio_queue, bp);=0A=
+	    (softc->flags & ADA_FLAG_CAN_TRIM)) {=0A=
+		if (ADA_SIO)=0A=
+			bioq_disksort(&softc->trim_queue, bp);=0A=
+		else=0A=
+			bioq_insert_tail(&softc->trim_queue, bp);=0A=
+	} else {=0A=
+		if (ADA_SIO)=0A=
+			bioq_disksort(&softc->bio_queue, bp);=0A=
+		else=0A=
+			bioq_insert_tail(&softc->bio_queue, bp);=0A=
+	}=0A=
 =0A=
 	/*=0A=
 	 * Schedule ourselves for performing the work.=0A=
@@ -1000,6 +1010,10 @@=0A=
 	SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree),=0A=
 		OID_AUTO, "write_cache", CTLFLAG_RW | CTLFLAG_MPSAFE,=0A=
 		&softc->write_cache, 0, "Enable disk write cache.");=0A=
+	SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree),=0A=
+		OID_AUTO, "sort_io_queue", CTLFLAG_RW | CTLFLAG_MPSAFE,=0A=
+		&softc->sort_io_queue, 0,=0A=
+		"Sort IO queue to try and optimise disk access patterns");=0A=
 #ifdef ADA_TEST_FAILURE=0A=
 	/*=0A=
 	 * Add a 'door bell' sysctl which allows one to set it from userland=0A=
@@ -1086,8 +1100,8 @@=0A=
 		softc->trim_max_ranges =3D TRIM_MAX_RANGES;=0A=
 		if (cgd->ident_data.max_dsm_blocks !=3D 0) {=0A=
 			softc->trim_max_ranges =3D=0A=
-			    min(cgd->ident_data.max_dsm_blocks * 64,=0A=
-				softc->trim_max_ranges);=0A=
+			    min(cgd->ident_data.max_dsm_blocks *=0A=
+				ATA_DSM_BLK_RANGES, softc->trim_max_ranges);=0A=
 		}=0A=
 	}=0A=
 	if (cgd->ident_data.support.command2 & ATA_SUPPORT_CFA)=0A=
@@ -1132,6 +1146,11 @@=0A=
 	snprintf(announce_buf, sizeof(announce_buf),=0A=
 	    "kern.cam.ada.%d.write_cache", periph->unit_number);=0A=
 	TUNABLE_INT_FETCH(announce_buf, &softc->write_cache);=0A=
+	/* Disable queue sorting for non-rotational media by default. */=0A=
+	if (cgd->ident_data.media_rotation_rate =3D=3D 1)=0A=
+		softc->sort_io_queue =3D 0;=0A=
+	else=0A=
+		softc->sort_io_queue =3D -1;=0A=
 	adagetparams(periph, cgd);=0A=
 	softc->disk =3D disk_alloc();=0A=
 	softc->disk->d_devstat =3D devstat_new_entry(periph->periph_name,=0A=
@@ -1162,10 +1181,12 @@=0A=
 	softc->disk->d_flags =3D 0;=0A=
 	if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE)=0A=
 		softc->disk->d_flags |=3D DISKFLAG_CANFLUSHCACHE;=0A=
-	if ((softc->flags & ADA_FLAG_CAN_TRIM) ||=0A=
-	    ((softc->flags & ADA_FLAG_CAN_CFA) &&=0A=
-	    !(softc->flags & ADA_FLAG_CAN_48BIT)))=0A=
+	if (softc->flags & ADA_FLAG_CAN_TRIM) {=0A=
 		softc->disk->d_flags |=3D DISKFLAG_CANDELETE;=0A=
+	} else if ((softc->flags & ADA_FLAG_CAN_CFA) &&=0A=
+	    !(softc->flags & ADA_FLAG_CAN_48BIT)) {=0A=
+		softc->disk->d_flags |=3D DISKFLAG_CANDELETE;=0A=
+	}=0A=
 	strlcpy(softc->disk->d_descr, cgd->ident_data.model,=0A=
 	    MIN(sizeof(softc->disk->d_descr), sizeof(cgd->ident_data.model)));=0A=
 	strlcpy(softc->disk->d_ident, cgd->ident_data.serial,=0A=
@@ -1332,9 +1353,9 @@=0A=
 =0A=
 				/* Try to extend the previous range. */=0A=
 				if (lba =3D=3D lastlba) {=0A=
-					c =3D min(count, 0xffff - lastcount);=0A=
+					c =3D min(count, ATA_DSM_RANGE_MAX - lastcount);=0A=
 					lastcount +=3D c;=0A=
-					off =3D (ranges - 1) * 8;=0A=
+					off =3D (ranges - 1) * ATA_DSM_RANGE_SIZE;=0A=
 					req->data[off + 6] =3D lastcount & 0xff;=0A=
 					req->data[off + 7] =3D=0A=
 					    (lastcount >> 8) & 0xff;=0A=
@@ -1343,8 +1364,8 @@=0A=
 				}=0A=
 =0A=
 				while (count > 0) {=0A=
-					c =3D min(count, 0xffff);=0A=
-					off =3D ranges * 8;=0A=
+					c =3D min(count, ATA_DSM_RANGE_MAX);=0A=
+					off =3D ranges * ATA_DSM_RANGE_SIZE;=0A=
 					req->data[off + 0] =3D lba & 0xff;=0A=
 					req->data[off + 1] =3D (lba >> 8) & 0xff;=0A=
 					req->data[off + 2] =3D (lba >> 16) & 0xff;=0A=
@@ -1357,6 +1378,11 @@=0A=
 					count -=3D c;=0A=
 					lastcount =3D c;=0A=
 					ranges++;=0A=
+					/*=0A=
+					 * It's the caller's responsibility to ensure the=0A=
+					 * request will fit, so we don't need to check for=0A=
+					 * overrun here.=0A=
+					 */=0A=
 				}=0A=
 				lastlba =3D lba;=0A=
 				req->bps[bps++] =3D bp1;=0A=
@@ -1364,7 +1390,8 @@=0A=
 				if (bps >=3D TRIM_MAX_BIOS ||=0A=
 				    bp1 =3D=3D NULL ||=0A=
 				    bp1->bio_bcount / softc->params.secsize >=0A=
-				    (softc->trim_max_ranges - ranges) * 0xffff)=0A=
+				    (softc->trim_max_ranges - ranges) *=0A=
+				    ATA_DSM_RANGE_MAX)=0A=
 					break;=0A=
 			} while (1);=0A=
 			cam_fill_ataio(ataio,=0A=
@@ -1373,10 +1400,12 @@=0A=
 			    CAM_DIR_OUT,=0A=
 			    0,=0A=
 			    req->data,=0A=
-			    ((ranges + 63) / 64) * 512,=0A=
+			    ((ranges + ATA_DSM_BLK_RANGES - 1) /=0A=
+			        ATA_DSM_BLK_RANGES) * ATA_DSM_BLK_SIZE,=0A=
 			    ada_default_timeout * 1000);=0A=
 			ata_48bit_cmd(ataio, ATA_DATA_SET_MANAGEMENT,=0A=
-			    ATA_DSM_TRIM, 0, (ranges + 63) / 64);=0A=
+			    ATA_DSM_TRIM, 0, (ranges + ATA_DSM_BLK_RANGES -=0A=
+			    1) / ATA_DSM_BLK_RANGES);=0A=
 			start_ccb->ccb_h.ccb_state =3D ADA_CCB_TRIM;=0A=
 			goto out;=0A=
 		}=0A=
Index: sys/sys=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/sys	(revision 250577)=0A=
+++ sys/sys	(working copy)=0A=
=0A=
Property changes on: sys/sys=0A=
___________________________________________________________________=0A=
Modified: svn:mergeinfo=0A=
   Merged /head/sys/sys:r249931=0A=
Index: sys/sys/ata.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/sys/ata.h	(revision 250577)=0A=
+++ sys/sys/ata.h	(working copy)=0A=
@@ -261,6 +261,12 @@=0A=
 /*255*/ u_int16_t       integrity;=0A=
 } __packed;=0A=
 =0A=
+/* ATA Dataset Management */=0A=
+#define ATA_DSM_BLK_SIZE	512=0A=
+#define ATA_DSM_BLK_RANGES	64=0A=
+#define ATA_DSM_RANGE_SIZE	8=0A=
+#define ATA_DSM_RANGE_MAX	65535=0A=
+=0A=
 /*=0A=
  * ATA Device Register=0A=
  *=0A=
Index: sys/geom/geom_dev.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/geom/geom_dev.c	(revision 250577)=0A=
+++ sys/geom/geom_dev.c	(working copy)=0A=
@@ -52,6 +52,7 @@=0A=
 #include <sys/disk.h>=0A=
 #include <sys/fcntl.h>=0A=
 #include <sys/limits.h>=0A=
+#include <sys/sysctl.h>=0A=
 #include <geom/geom.h>=0A=
 #include <geom/geom_int.h>=0A=
 #include <machine/stdarg.h>=0A=
@@ -93,6 +94,19 @@=0A=
 	.attrchanged =3D g_dev_attrchanged=0A=
 };=0A=
 =0A=
+/*=0A=
+ * We target 262144 (8 x 32768) sectors by default as this significantly=0A=
+ * increases the throughput on commonly used SSDs with a marginal=0A=
+ * increase in non-interruptible request latency.=0A=
+ */=0A=
+static uint64_t g_dev_del_max_sectors =3D 262144;=0A=
+SYSCTL_DECL(_kern_geom);=0A=
+SYSCTL_NODE(_kern_geom, OID_AUTO, dev, CTLFLAG_RW, 0, "GEOM_DEV stuff");=0A=
+SYSCTL_QUAD(_kern_geom_dev, OID_AUTO, delete_max_sectors, CTLFLAG_RW,=0A=
+    &g_dev_del_max_sectors, 0, "Maximum number of sectors in a single "=0A=
+    "delete request sent to the provider. Larger requests are chunked "=0A=
+    "so they can be interrupted. (0 =3D disable chunking)");=0A=
+=0A=
 static void=0A=
 g_dev_destroy(void *arg, int flags __unused)=0A=
 {=0A=
@@ -412,17 +426,20 @@=0A=
 		}=0A=
 		while (length > 0) { =0A=
 			chunk =3D length;=0A=
-			if (chunk > 65536 * cp->provider->sectorsize)=0A=
-				chunk =3D 65536 * cp->provider->sectorsize;=0A=
+			if (g_dev_del_max_sectors !=3D 0 && chunk >=0A=
+			    g_dev_del_max_sectors * cp->provider->sectorsize) {=0A=
+				chunk =3D g_dev_del_max_sectors *=0A=
+				    cp->provider->sectorsize;=0A=
+			}=0A=
 			error =3D g_delete_data(cp, offset, chunk);=0A=
 			length -=3D chunk;=0A=
 			offset +=3D chunk;=0A=
 			if (error)=0A=
 				break;=0A=
 			/*=0A=
-			 * Since the request size is unbounded, the service=0A=
-			 * time is likewise.  We make this ioctl interruptible=0A=
-			 * by checking for signals for each bio.=0A=
+			 * Since the request size can be large, the service=0A=
+			 * time can be likewise.  We make this ioctl=0A=
+			 * interruptible by checking for signals for each bio.=0A=
 			 */=0A=
 			if (SIGPENDING(td))=0A=
 				break;=0A=

------=_NextPart_000_0730_01CE57A0.ECC7BD20
Content-Type: application/octet-stream;
	name="mfc-zfs-trim-stable-9.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="mfc-zfs-trim-stable-9.patch"

MFC r240868: Add TRIM support=0A=
MFC r244155: Renamed zfs trim stats=0A=
MFC r244187: Upgrade TRIM free request sizes optimisation =0A=
MFC r244188: Added vfs.zfs.vdev.trim_on_init sysctl=0A=
MFC r248572: Add TRIM support for L2ARC=0A=
MFC r248574: Improve TXG handling in the TRIM module=0A=
MFC r248575: TRIM cache devices based on time instead of TXGs=0A=
MFC r248576: Names the ZFS TRIM thread=0A=
MFC r248577: Optimisation of TRIM processing=0A=
MFC r248602: Fix for building libzpool under i386=0A=
MFC r249921: Enabled ZFS TRIM by default=0A=
Index: cddl/lib=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- cddl/lib	(revision 250526)=0A=
+++ cddl/lib	(working copy)=0A=
=0A=
Property changes on: cddl/lib=0A=
___________________________________________________________________=0A=
Modified: svn:mergeinfo=0A=
   Merged /head/cddl/lib:r240868=0A=
Index: cddl/lib/libzpool/Makefile=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- cddl/lib/libzpool/Makefile	(revision 250526)=0A=
+++ cddl/lib/libzpool/Makefile	(working copy)=0A=
@@ -26,7 +26,7 @@=0A=
 =0A=
 LIB=3D		zpool=0A=
 =0A=
-ZFS_COMMON_SRCS=3D ${ZFS_COMMON_OBJS:C/.o$/.c/} vdev_file.c=0A=
+ZFS_COMMON_SRCS=3D ${ZFS_COMMON_OBJS:C/.o$/.c/} vdev_file.c trim_map.c=0A=
 ZFS_SHARED_SRCS=3D ${ZFS_SHARED_OBJS:C/.o$/.c/}=0A=
 KERNEL_SRCS=3D	kernel.c taskq.c util.c=0A=
 LIST_SRCS=3D	list.c=0A=
Index: UPDATING=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- UPDATING	(revision 250526)=0A=
+++ UPDATING	(working copy)=0A=
@@ -11,6 +11,22 @@=0A=
 Items affecting the ports and packages system can be found in=0A=
 /usr/ports/UPDATING.  Please read that file before running portupgrade.=0A=
 =0A=
+20130512:=0A=
+	Added ZFS TRIM support which is enabled by default. To disable=0A=
+	ZFS TRIM support set vfs.zfs.trim.enabled=3D0 in loader.conf.=0A=
+=0A=
+	Creating new ZFS pools and adding new devices to existing pools=0A=
+	first performs a full device level TRIM, which can take a significant=0A=
+	amount of time. Set the sysctl vfs.zfs.vdev.trim_on_init to 0 to=0A=
+	disable this behaviour.=0A=
+=0A=
+	ZFS TRIM requires the underlying device support BIO_DELETE which=0A=
+	is currently provided by methods such as ATA TRIM and SCSI UNMAP=0A=
+	via CAM, which are typically supported by SSDs.=0A=
+=0A=
+	Stats for ZFS TRIM can be monitored by looking at the sysctls=0A=
+	under kstat.zfs.misc.zio_trim.=0A=
+=0A=
 20130430:=0A=
 	The mergemaster command now uses the default MAKEOBJDIRPREFIX=0A=
 	rather than creating it's own in the temporary directory in=0A=
Index: sys=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys	(revision 250526)=0A=
+++ sys	(working copy)=0A=
=0A=
Property changes on: sys=0A=
___________________________________________________________________=0A=
Modified: svn:mergeinfo=0A=
   Merged =
/head/sys:r240868,244155,244187-244188,248572,248574-248577,248602,249921=0A=
Index: sys/modules=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/modules	(revision 250526)=0A=
+++ sys/modules	(working copy)=0A=
=0A=
Property changes on: sys/modules=0A=
___________________________________________________________________=0A=
Modified: svn:mergeinfo=0A=
   Merged /head/sys/modules:r240868=0A=
Index: sys/modules/zfs/Makefile=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/modules/zfs/Makefile	(revision 250526)=0A=
+++ sys/modules/zfs/Makefile	(working copy)=0A=
@@ -72,6 +72,7 @@=0A=
 ZFS_SRCS=3D	${ZFS_OBJS:C/.o$/.c/}=0A=
 SRCS+=3D	${ZFS_SRCS}=0A=
 SRCS+=3D	vdev_geom.c=0A=
+SRCS+=3D	trim_map.c=0A=
 =0A=
 # Use FreeBSD's namecache.=0A=
 CFLAGS+=3D-DFREEBSD_NAMECACHE=0A=
Index: sys/cddl/contrib/opensolaris=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris	(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris	(working copy)=0A=
=0A=
Property changes on: sys/cddl/contrib/opensolaris=0A=
___________________________________________________________________=0A=
Modified: svn:mergeinfo=0A=
   Merged =
/head/sys/cddl/contrib/opensolaris:r240868,244155,244187-244188,248572,24=
8574-248577,248602,249921=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c	=
(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c	=
(working copy)=0A=
@@ -293,10 +293,11 @@=0A=
 		c =3D vdev_mirror_child_select(zio);=0A=
 		children =3D (c >=3D 0);=0A=
 	} else {=0A=
-		ASSERT(zio->io_type =3D=3D ZIO_TYPE_WRITE);=0A=
+		ASSERT(zio->io_type =3D=3D ZIO_TYPE_WRITE ||=0A=
+		    zio->io_type =3D=3D ZIO_TYPE_FREE);=0A=
 =0A=
 		/*=0A=
-		 * Writes go to all children.=0A=
+		 * Writes and frees go to all children.=0A=
 		 */=0A=
 		c =3D 0;=0A=
 		children =3D mm->mm_children;=0A=
@@ -377,6 +378,8 @@=0A=
 				zio->io_error =3D vdev_mirror_worst_error(mm);=0A=
 		}=0A=
 		return;=0A=
+	} else if (zio->io_type =3D=3D ZIO_TYPE_FREE) {=0A=
+		return;=0A=
 	}=0A=
 =0A=
 	ASSERT(zio->io_type =3D=3D ZIO_TYPE_READ);=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c	(working copy)=0A=
@@ -83,6 +83,11 @@=0A=
 TUNABLE_INT("vfs.zfs.cache_flush_disable", &zfs_nocacheflush);=0A=
 SYSCTL_INT(_vfs_zfs, OID_AUTO, cache_flush_disable, CTLFLAG_RDTUN,=0A=
     &zfs_nocacheflush, 0, "Disable cache flush");=0A=
+boolean_t zfs_trim_enabled =3D B_TRUE;=0A=
+SYSCTL_DECL(_vfs_zfs_trim);=0A=
+TUNABLE_INT("vfs.zfs.trim.enabled", &zfs_trim_enabled);=0A=
+SYSCTL_INT(_vfs_zfs_trim, OID_AUTO, enabled, CTLFLAG_RDTUN, =
&zfs_trim_enabled, 0,=0A=
+    "Enable ZFS TRIM");=0A=
 =0A=
 static kmem_cache_t *zil_lwb_cache;=0A=
 =0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c	(working =
copy)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c	(working =
copy)=0A=
@@ -27,7 +27,31 @@=0A=
 #include <sys/spa_impl.h>=0A=
 #include <sys/vdev_impl.h>=0A=
 #include <sys/trim_map.h>=0A=
+#include <sys/time.h>=0A=
 =0A=
+/*=0A=
+ * Calculate the zio end, upgrading based on ashift which would be=0A=
+ * done by zio_vdev_io_start.=0A=
+ *=0A=
+ * This makes free range consolidation much more effective=0A=
+ * than it would otherwise be as well as ensuring that entire=0A=
+ * blocks are invalidated by writes.=0A=
+ */=0A=
+#define	TRIM_ZIO_END(vd, offset, size)	(offset +		\=0A=
+ 	P2ROUNDUP(size, 1ULL << vd->vdev_top->vdev_ashift))=0A=
+=0A=
+#define TRIM_MAP_SINC(tm, size)					\=0A=
+	atomic_add_64(&(tm)->tm_bytes, (size))=0A=
+=0A=
+#define TRIM_MAP_SDEC(tm, size)					\=0A=
+	atomic_add_64(&(tm)->tm_bytes, -(size))=0A=
+=0A=
+#define TRIM_MAP_QINC(tm)					\=0A=
+	atomic_inc_64(&(tm)->tm_pending)=0A=
+=0A=
+#define TRIM_MAP_QDEC(tm)					\=0A=
+	atomic_dec_64(&(tm)->tm_pending)=0A=
+=0A=
 typedef struct trim_map {=0A=
 	list_t		tm_head;		/* List of segments sorted by txg. */=0A=
 	avl_tree_t	tm_queued_frees;	/* AVL tree of segments waiting for TRIM. =
*/=0A=
@@ -35,6 +59,8 @@=0A=
 	avl_tree_t	tm_inflight_writes;	/* AVL tree of in-flight writes. */=0A=
 	list_t		tm_pending_writes;	/* Writes blocked on in-flight frees. */=0A=
 	kmutex_t	tm_lock;=0A=
+	uint64_t	tm_pending;		/* Count of pending TRIMs. */=0A=
+	uint64_t	tm_bytes;		/* Total size in bytes of queued TRIMs. */=0A=
 } trim_map_t;=0A=
 =0A=
 typedef struct trim_seg {=0A=
@@ -43,17 +69,47 @@=0A=
 	uint64_t	ts_start;	/* Starting offset of this segment. */=0A=
 	uint64_t	ts_end;		/* Ending offset (non-inclusive). */=0A=
 	uint64_t	ts_txg;		/* Segment creation txg. */=0A=
+	hrtime_t	ts_time;	/* Segment creation time. */=0A=
 } trim_seg_t;=0A=
 =0A=
-extern boolean_t zfs_notrim;=0A=
+extern boolean_t zfs_trim_enabled;=0A=
 =0A=
+static u_int trim_txg_delay =3D 32;=0A=
+static u_int trim_timeout =3D 30;=0A=
+static u_int trim_max_interval =3D 1;=0A=
+/* Limit outstanding TRIMs to 2G (max size for a single TRIM request) */=0A=
+static uint64_t trim_vdev_max_bytes =3D 2147483648;=0A=
+/* Limit outstanding TRIMs to 64 (max ranges for a single TRIM request) =
*/	=0A=
+static u_int trim_vdev_max_pending =3D 64;=0A=
+=0A=
 SYSCTL_DECL(_vfs_zfs);=0A=
-/* Delay TRIMs by that many TXGs. */=0A=
-static int trim_txg_limit =3D 64;=0A=
-TUNABLE_INT("vfs.zfs.trim_txg_limit", &trim_txg_limit);=0A=
-SYSCTL_INT(_vfs_zfs, OID_AUTO, trim_txg_limit, CTLFLAG_RW, =
&trim_txg_limit, 0,=0A=
-    "Delay TRIMs by that many TXGs.");=0A=
+SYSCTL_NODE(_vfs_zfs, OID_AUTO, trim, CTLFLAG_RD, 0, "ZFS TRIM");=0A=
 =0A=
+TUNABLE_INT("vfs.zfs.trim.txg_delay", &trim_txg_delay);=0A=
+SYSCTL_UINT(_vfs_zfs_trim, OID_AUTO, txg_delay, CTLFLAG_RWTUN, =
&trim_txg_delay,=0A=
+    0, "Delay TRIMs by up to this many TXGs");=0A=
+=0A=
+TUNABLE_INT("vfs.zfs.trim.timeout", &trim_timeout);=0A=
+SYSCTL_UINT(_vfs_zfs_trim, OID_AUTO, timeout, CTLFLAG_RWTUN, =
&trim_timeout, 0,=0A=
+    "Delay TRIMs by up to this many seconds");=0A=
+=0A=
+TUNABLE_INT("vfs.zfs.trim.max_interval", &trim_max_interval);=0A=
+SYSCTL_UINT(_vfs_zfs_trim, OID_AUTO, max_interval, CTLFLAG_RWTUN,=0A=
+    &trim_max_interval, 0,=0A=
+    "Maximum interval between TRIM queue processing (seconds)");=0A=
+=0A=
+SYSCTL_DECL(_vfs_zfs_vdev);=0A=
+TUNABLE_QUAD("vfs.zfs.vdev.trim_max_bytes", &trim_vdev_max_bytes);=0A=
+SYSCTL_QUAD(_vfs_zfs_vdev, OID_AUTO, trim_max_bytes, CTLFLAG_RWTUN,=0A=
+    &trim_vdev_max_bytes, 0,=0A=
+    "Maximum pending TRIM bytes for a vdev");=0A=
+=0A=
+TUNABLE_INT("vfs.zfs.vdev.trim_max_pending", &trim_vdev_max_pending);=0A=
+SYSCTL_UINT(_vfs_zfs_vdev, OID_AUTO, trim_max_pending, CTLFLAG_RWTUN,=0A=
+    &trim_vdev_max_pending, 0,=0A=
+    "Maximum pending TRIM segments for a vdev");=0A=
+=0A=
+=0A=
 static void trim_map_vdev_commit_done(spa_t *spa, vdev_t *vd);=0A=
 =0A=
 static int=0A=
@@ -101,7 +157,7 @@=0A=
 =0A=
 	ASSERT(vd->vdev_ops->vdev_op_leaf);=0A=
 =0A=
-	if (zfs_notrim)=0A=
+	if (!zfs_trim_enabled)=0A=
 		return;=0A=
 =0A=
 	tm =3D kmem_zalloc(sizeof (*tm), KM_SLEEP);=0A=
@@ -127,7 +183,7 @@=0A=
 =0A=
 	ASSERT(vd->vdev_ops->vdev_op_leaf);=0A=
 =0A=
-	if (zfs_notrim)=0A=
+	if (!zfs_trim_enabled)=0A=
 		return;=0A=
 =0A=
 	tm =3D vd->vdev_trimmap;=0A=
@@ -146,6 +202,8 @@=0A=
 		avl_remove(&tm->tm_queued_frees, ts);=0A=
 		list_remove(&tm->tm_head, ts);=0A=
+		TRIM_MAP_SDEC(tm, ts->ts_end - ts->ts_start);=0A=
+		TRIM_MAP_QDEC(tm);=0A=
 		kmem_free(ts, sizeof (*ts));=0A=
 	}=0A=
 	mutex_exit(&tm->tm_lock);=0A=
 =0A=
@@ -165,10 +223,12 @@=0A=
 	avl_index_t where;=0A=
 	trim_seg_t tsearch, *ts_before, *ts_after, *ts;=0A=
 	boolean_t merge_before, merge_after;=0A=
+	hrtime_t time;=0A=
 =0A=
 	ASSERT(MUTEX_HELD(&tm->tm_lock));=0A=
 	VERIFY(start < end);=0A=
 =0A=
+	time =3D gethrtime();=0A=
 	tsearch.ts_start =3D start;=0A=
 	tsearch.ts_end =3D end;=0A=
 =0A=
@@ -184,25 +244,36 @@=0A=
 	ts_before =3D avl_nearest(&tm->tm_queued_frees, where, AVL_BEFORE);=0A=
 	ts_after =3D avl_nearest(&tm->tm_queued_frees, where, AVL_AFTER);=0A=
 =0A=
-	merge_before =3D (ts_before !=3D NULL && ts_before->ts_end =3D=3D =
start &&=0A=
-	    ts_before->ts_txg =3D=3D txg);=0A=
-	merge_after =3D (ts_after !=3D NULL && ts_after->ts_start =3D=3D end &&=0A=
-	    ts_after->ts_txg =3D=3D txg);=0A=
+	merge_before =3D (ts_before !=3D NULL && ts_before->ts_end =3D=3D =
start);=0A=
+	merge_after =3D (ts_after !=3D NULL && ts_after->ts_start =3D=3D end);=0A=
 =0A=
 	if (merge_before && merge_after) {=0A=
+		TRIM_MAP_SINC(tm, ts_after->ts_start - ts_before->ts_end);=0A=
+		TRIM_MAP_QDEC(tm);=0A=
 		avl_remove(&tm->tm_queued_frees, ts_before);=0A=
 		list_remove(&tm->tm_head, ts_before);=0A=
 		ts_after->ts_start =3D ts_before->ts_start;=0A=
+		ts_after->ts_txg =3D txg;=0A=
+		ts_after->ts_time =3D time;=0A=
 		kmem_free(ts_before, sizeof (*ts_before));=0A=
 	} else if (merge_before) {=0A=
+		TRIM_MAP_SINC(tm, end - ts_before->ts_end);=0A=
 		ts_before->ts_end =3D end;=0A=
+		ts_before->ts_txg =3D txg;=0A=
+		ts_before->ts_time =3D time;=0A=
 	} else if (merge_after) {=0A=
+		TRIM_MAP_SINC(tm, ts_after->ts_start - start);=0A=
 		ts_after->ts_start =3D start;=0A=
+		ts_after->ts_txg =3D txg;=0A=
+		ts_after->ts_time =3D time;=0A=
 	} else {=0A=
+		TRIM_MAP_SINC(tm, end - start);=0A=
+		TRIM_MAP_QINC(tm);=0A=
 		ts =3D kmem_alloc(sizeof (*ts), KM_SLEEP);=0A=
 		ts->ts_start =3D start;=0A=
 		ts->ts_end =3D end;=0A=
 		ts->ts_txg =3D txg;=0A=
+		ts->ts_time =3D time;=0A=
 		avl_insert(&tm->tm_queued_frees, ts, where);=0A=
 		list_insert_tail(&tm->tm_head, ts);=0A=
 	}=0A=
@@ -220,14 +291,17 @@=0A=
 	left_over =3D (ts->ts_start < start);=0A=
 	right_over =3D (ts->ts_end > end);=0A=
 =0A=
+	TRIM_MAP_SDEC(tm, end - start);=0A=
 	if (left_over && right_over) {=0A=
 		nts =3D kmem_alloc(sizeof (*nts), KM_SLEEP);=0A=
 		nts->ts_start =3D end;=0A=
 		nts->ts_end =3D ts->ts_end;=0A=
 		nts->ts_txg =3D ts->ts_txg;=0A=
+		nts->ts_time =3D ts->ts_time;=0A=
 		ts->ts_end =3D start;=0A=
 		avl_insert_here(&tm->tm_queued_frees, nts, ts, AVL_AFTER);=0A=
 		list_insert_after(&tm->tm_head, ts, nts);=0A=
+		TRIM_MAP_QINC(tm);=0A=
 	} else if (left_over) {=0A=
 		ts->ts_end =3D start;=0A=
 	} else if (right_over) {=0A=
@@ -235,6 +309,7 @@=0A=
 	} else {=0A=
 		avl_remove(&tm->tm_queued_frees, ts);=0A=
 		list_remove(&tm->tm_head, ts);=0A=
+		TRIM_MAP_QDEC(tm);=0A=
 		kmem_free(ts, sizeof (*ts));=0A=
 	}=0A=
 }=0A=
@@ -261,17 +336,15 @@=0A=
 }=0A=
 =0A=
 void=0A=
-trim_map_free(zio_t *zio)=0A=
+trim_map_free(vdev_t *vd, uint64_t offset, uint64_t size, uint64_t txg)=0A=
 {=0A=
-	vdev_t *vd =3D zio->io_vd;=0A=
 	trim_map_t *tm =3D vd->vdev_trimmap;=0A=
 =0A=
-	if (zfs_notrim || vd->vdev_notrim || tm =3D=3D NULL)=0A=
+	if (!zfs_trim_enabled || vd->vdev_notrim || tm =3D=3D NULL)=0A=
 		return;=0A=
 =0A=
 	mutex_enter(&tm->tm_lock);=0A=
-	trim_map_free_locked(tm, zio->io_offset, zio->io_offset + zio->io_size,=0A=
-	    vd->vdev_spa->spa_syncing_txg);=0A=
+	trim_map_free_locked(tm, offset, TRIM_ZIO_END(vd, offset, size), txg);=0A=
 	mutex_exit(&tm->tm_lock);=0A=
 }=0A=
 =0A=
@@ -284,11 +357,11 @@=0A=
 	boolean_t left_over, right_over;=0A=
 	uint64_t start, end;=0A=
 =0A=
-	if (zfs_notrim || vd->vdev_notrim || tm =3D=3D NULL)=0A=
+	if (!zfs_trim_enabled || vd->vdev_notrim || tm =3D=3D NULL)=0A=
 		return (B_TRUE);=0A=
 =0A=
 	start =3D zio->io_offset;=0A=
-	end =3D start + zio->io_size;=0A=
+	end =3D TRIM_ZIO_END(zio->io_vd, start, zio->io_size);=0A=
 	tsearch.ts_start =3D start;=0A=
 	tsearch.ts_end =3D end;=0A=
 =0A=
@@ -331,7 +404,7 @@=0A=
 	 * Don't check for vdev_notrim, since the write could have=0A=
 	 * started before vdev_notrim was set.=0A=
 	 */=0A=
-	if (zfs_notrim || tm =3D=3D NULL)=0A=
+	if (!zfs_trim_enabled || tm =3D=3D NULL)=0A=
 		return;=0A=
 =0A=
 	mutex_enter(&tm->tm_lock);=0A=
@@ -348,19 +421,25 @@=0A=
 }=0A=
 =0A=
 /*=0A=
- * Return the oldest segment (the one with the lowest txg) or false if=0A=
- * the list is empty or the first element's txg is greater than txg =
given=0A=
- * as function argument.=0A=
+ * Return the oldest segment (the one with the lowest txg / time) or =
NULL if:=0A=
+ * 1. The list is empty=0A=
+ * 2. The first element's txg is greater than txgsafe=0A=
+ * 3. The first element's txg and time both exceed the txg and time=0A=
+ *    arguments and neither vdev TRIM limit has been exceeded=0A=
  */=0A=
 static trim_seg_t *=0A=
-trim_map_first(trim_map_t *tm, uint64_t txg)=0A=
+trim_map_first(trim_map_t *tm, uint64_t txg, uint64_t txgsafe, hrtime_t =
time)=0A=
 {=0A=
 	trim_seg_t *ts;=0A=
 =0A=
 	ASSERT(MUTEX_HELD(&tm->tm_lock));=0A=
+	VERIFY(txgsafe >=3D txg);=0A=
 =0A=
 	ts =3D list_head(&tm->tm_head);=0A=
-	if (ts !=3D NULL && ts->ts_txg <=3D txg)=0A=
+	if (ts !=3D NULL && ts->ts_txg <=3D txgsafe &&=0A=
+	    (ts->ts_txg <=3D txg || ts->ts_time <=3D time ||=0A=
+	    tm->tm_bytes > trim_vdev_max_bytes ||=0A=
+	    tm->tm_pending > trim_vdev_max_pending))=0A=
 		return (ts);=0A=
 	return (NULL);=0A=
 }=0A=
@@ -370,26 +449,37 @@=0A=
 {=0A=
 	trim_map_t *tm =3D vd->vdev_trimmap;=0A=
 	trim_seg_t *ts;=0A=
-	uint64_t start, size, txglimit;=0A=
+	uint64_t size, txgtarget, txgsafe;=0A=
+	hrtime_t timelimit;=0A=
 =0A=
 	ASSERT(vd->vdev_ops->vdev_op_leaf);=0A=
 =0A=
 	if (tm =3D=3D NULL)=0A=
 		return;=0A=
 =0A=
-	txglimit =3D MIN(spa->spa_syncing_txg, spa_freeze_txg(spa)) -=0A=
-	    trim_txg_limit;=0A=
+	timelimit =3D gethrtime() - (hrtime_t)trim_timeout * NANOSEC;=0A=
+	if (vd->vdev_isl2cache) {=0A=
+		txgsafe =3D UINT64_MAX;=0A=
+		txgtarget =3D UINT64_MAX;=0A=
+	} else {=0A=
+		txgsafe =3D MIN(spa_last_synced_txg(spa), spa_freeze_txg(spa));=0A=
+		if (txgsafe > trim_txg_delay)=0A=
+			txgtarget =3D txgsafe - trim_txg_delay;=0A=
+		else=0A=
+			txgtarget =3D 0;=0A=
+	}=0A=
 =0A=
 	mutex_enter(&tm->tm_lock);=0A=
-	/*=0A=
-	 * Loop until we send all frees up to the txglimit.=0A=
-	 */=0A=
-	while ((ts =3D trim_map_first(tm, txglimit)) !=3D NULL) {=0A=
+	/* Loop until we have sent all outstanding frees. */=0A=
+	while ((ts =3D trim_map_first(tm, txgtarget, txgsafe, timelimit))=0A=
+	    !=3D NULL) {=0A=
 		list_remove(&tm->tm_head, ts);=0A=
 		avl_remove(&tm->tm_queued_frees, ts);=0A=
 		avl_add(&tm->tm_inflight_frees, ts);=0A=
-		zio_nowait(zio_trim(zio, spa, vd, ts->ts_start,=0A=
-		    ts->ts_end - ts->ts_start));=0A=
+		size =3D ts->ts_end - ts->ts_start;=0A=
+		zio_nowait(zio_trim(zio, spa, vd, ts->ts_start, size));=0A=
+		TRIM_MAP_SDEC(tm, size);=0A=
+		TRIM_MAP_QDEC(tm);=0A=
 	}=0A=
 	mutex_exit(&tm->tm_lock);=0A=
 }=0A=
@@ -434,7 +524,7 @@=0A=
 {=0A=
 	int c;=0A=
 =0A=
-	if (vd =3D=3D NULL || spa->spa_syncing_txg <=3D trim_txg_limit)=0A=
+	if (vd =3D=3D NULL)=0A=
 		return;=0A=
 =0A=
 	if (vd->vdev_ops->vdev_op_leaf) {=0A=
@@ -467,6 +557,11 @@=0A=
 	spa_t *spa =3D arg;=0A=
 	zio_t *zio;=0A=
 =0A=
+#ifdef _KERNEL=0A=
+	(void) snprintf(curthread->td_name, sizeof(curthread->td_name),=0A=
+	    "trim %s", spa_name(spa));=0A=
+#endif=0A=
+=0A=
 	for (;;) {=0A=
 		mutex_enter(&spa->spa_trim_lock);=0A=
 		if (spa->spa_trim_thread =3D=3D NULL) {=0A=
@@ -475,7 +570,9 @@=0A=
 			mutex_exit(&spa->spa_trim_lock);=0A=
 			thread_exit();=0A=
 		}=0A=
-		cv_wait(&spa->spa_trim_cv, &spa->spa_trim_lock);=0A=
+=0A=
+		(void) cv_timedwait(&spa->spa_trim_cv, &spa->spa_trim_lock,=0A=
+		    hz * trim_max_interval);=0A=
 		mutex_exit(&spa->spa_trim_lock);=0A=
 =0A=
 		zio =3D zio_root(spa, NULL, NULL, ZIO_FLAG_CANFAIL);=0A=
@@ -492,7 +589,7 @@=0A=
 trim_thread_create(spa_t *spa)=0A=
 {=0A=
 =0A=
-	if (zfs_notrim)=0A=
+	if (!zfs_trim_enabled)=0A=
 		return;=0A=
 =0A=
 	mutex_init(&spa->spa_trim_lock, NULL, MUTEX_DEFAULT, NULL);=0A=
@@ -507,7 +604,7 @@=0A=
 trim_thread_destroy(spa_t *spa)=0A=
 {=0A=
 =0A=
-	if (zfs_notrim)=0A=
+	if (!zfs_trim_enabled)=0A=
 		return;=0A=
 	if (spa->spa_trim_thread =3D=3D NULL)=0A=
 		return;=0A=
@@ -530,7 +627,7 @@=0A=
 trim_thread_wakeup(spa_t *spa)=0A=
 {=0A=
 =0A=
-	if (zfs_notrim)=0A=
+	if (!zfs_trim_enabled)=0A=
 		return;=0A=
 	if (spa->spa_trim_thread =3D=3D NULL)=0A=
 		return;=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c	(working =
copy)=0A=
@@ -397,7 +397,8 @@=0A=
 dsl_free_sync(zio_t *pio, dsl_pool_t *dp, uint64_t txg, const blkptr_t =
*bpp)=0A=
 {=0A=
 	ASSERT(dsl_pool_sync_context(dp));=0A=
-	zio_nowait(zio_free_sync(pio, dp->dp_spa, txg, bpp, pio->io_flags));=0A=
+	zio_nowait(zio_free_sync(pio, dp->dp_spa, txg, bpp, BP_GET_PSIZE(bpp),=0A=
+	    pio->io_flags));=0A=
 }=0A=
 =0A=
 static uint64_t=0A=
@@ -1364,7 +1365,7 @@=0A=
 	}=0A=
 =0A=
 	zio_nowait(zio_free_sync(scn->scn_zio_root, scn->scn_dp->dp_spa,=0A=
-	    dmu_tx_get_txg(tx), bp, 0));=0A=
+	    dmu_tx_get_txg(tx), bp, BP_GET_PSIZE(bp), 0));=0A=
 	dsl_dir_diduse_space(tx->tx_pool->dp_free_dir, DD_USED_HEAD,=0A=
 	    -bp_get_dsize_sync(scn->scn_dp->dp_spa, bp),=0A=
 	    -BP_GET_PSIZE(bp), -BP_GET_UCSIZE(bp), tx);=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c	=
(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c	(working =
copy)=0A=
@@ -259,7 +259,9 @@=0A=
 	size_t size;=0A=
 =0A=
 	for (c =3D 0; c < rm->rm_firstdatacol; c++) {=0A=
-		zio_buf_free(rm->rm_col[c].rc_data, rm->rm_col[c].rc_size);=0A=
+		if (rm->rm_col[c].rc_data !=3D NULL)=0A=
+			zio_buf_free(rm->rm_col[c].rc_data,=0A=
+			    rm->rm_col[c].rc_size);=0A=
 =0A=
 		if (rm->rm_col[c].rc_gdata !=3D NULL)=0A=
 			zio_buf_free(rm->rm_col[c].rc_gdata,=0A=
@@ -504,14 +506,20 @@=0A=
 	ASSERT3U(rm->rm_asize - asize, =3D=3D, rm->rm_nskip << unit_shift);=0A=
 	ASSERT3U(rm->rm_nskip, <=3D, nparity);=0A=
 =0A=
-	for (c =3D 0; c < rm->rm_firstdatacol; c++)=0A=
-		rm->rm_col[c].rc_data =3D zio_buf_alloc(rm->rm_col[c].rc_size);=0A=
+	if (zio->io_type !=3D ZIO_TYPE_FREE) {=0A=
+		for (c =3D 0; c < rm->rm_firstdatacol; c++) {=0A=
+			rm->rm_col[c].rc_data =3D=0A=
+			    zio_buf_alloc(rm->rm_col[c].rc_size);=0A=
+		}=0A=
 =0A=
-	rm->rm_col[c].rc_data =3D zio->io_data;=0A=
+		rm->rm_col[c].rc_data =3D zio->io_data;=0A=
 =0A=
-	for (c =3D c + 1; c < acols; c++)=0A=
-		rm->rm_col[c].rc_data =3D (char *)rm->rm_col[c - 1].rc_data +=0A=
-		    rm->rm_col[c - 1].rc_size;=0A=
+		for (c =3D c + 1; c < acols; c++) {=0A=
+			rm->rm_col[c].rc_data =3D=0A=
+			    (char *)rm->rm_col[c - 1].rc_data +=0A=
+			    rm->rm_col[c - 1].rc_size;=0A=
+		}=0A=
+	}=0A=
 =0A=
 	/*=0A=
 	 * If all data stored spans all columns, there's a danger that parity=0A=
@@ -1536,6 +1544,18 @@=0A=
 =0A=
 	ASSERT3U(rm->rm_asize, =3D=3D, vdev_psize_to_asize(vd, zio->io_size));=0A=
 =0A=
+	if (zio->io_type =3D=3D ZIO_TYPE_FREE) {=0A=
+		for (c =3D 0; c < rm->rm_cols; c++) {=0A=
+			rc =3D &rm->rm_col[c];=0A=
+			cvd =3D vd->vdev_child[rc->rc_devidx];=0A=
+			zio_nowait(zio_vdev_child_io(zio, NULL, cvd,=0A=
+			    rc->rc_offset, rc->rc_data, rc->rc_size,=0A=
+			    zio->io_type, zio->io_priority, 0,=0A=
+			    vdev_raidz_child_done, rc));=0A=
+		}=0A=
+		return (ZIO_PIPELINE_CONTINUE);=0A=
+	}=0A=
+=0A=
 	if (zio->io_type =3D=3D ZIO_TYPE_WRITE) {=0A=
 		vdev_raidz_generate_parity(rm);=0A=
 =0A=
@@ -1918,6 +1938,8 @@=0A=
 			zio->io_error =3D vdev_raidz_worst_error(rm);=0A=
 =0A=
 		return;=0A=
+	} else if (zio->io_type =3D=3D ZIO_TYPE_FREE) {=0A=
+		return;=0A=
 	}=0A=
 =0A=
 	ASSERT(zio->io_type =3D=3D ZIO_TYPE_READ);=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	(working copy)=0A=
@@ -43,6 +43,7 @@=0A=
 #include <sys/arc.h>=0A=
 #include <sys/zil.h>=0A=
 #include <sys/dsl_scan.h>=0A=
+#include <sys/trim_map.h>=0A=
 =0A=
 SYSCTL_DECL(_vfs_zfs);=0A=
 SYSCTL_NODE(_vfs_zfs, OID_AUTO, vdev, CTLFLAG_RW, 0, "ZFS VDEV");=0A=
@@ -1196,6 +1197,11 @@=0A=
 	if (vd->vdev_ishole || vd->vdev_ops =3D=3D &vdev_missing_ops)=0A=
 		return (0);=0A=
 =0A=
+	if (vd->vdev_ops->vdev_op_leaf) {=0A=
+		vd->vdev_notrim =3D B_FALSE;=0A=
+		trim_map_create(vd);=0A=
+	}=0A=
+=0A=
 	for (int c =3D 0; c < vd->vdev_children; c++) {=0A=
 		if (vd->vdev_child[c]->vdev_state !=3D VDEV_STATE_HEALTHY) {=0A=
 			vdev_set_state(vd, B_TRUE, VDEV_STATE_DEGRADED,=0A=
@@ -1441,6 +1447,9 @@=0A=
 =0A=
 	vdev_cache_purge(vd);=0A=
 =0A=
+	if (vd->vdev_ops->vdev_op_leaf)=0A=
+		trim_map_destroy(vd);=0A=
+=0A=
 	/*=0A=
 	 * We record the previous state before we close it, so that if we are=0A=
 	 * doing a reopen(), we don't generate FMA ereports if we notice that=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/trim_map.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/trim_map.h	=
(working copy)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/trim_map.h	=
(working copy)=0A=
@@ -36,7 +36,7 @@=0A=
 =0A=
 extern void trim_map_create(vdev_t *vd);=0A=
 extern void trim_map_destroy(vdev_t *vd);=0A=
-extern void trim_map_free(zio_t *zio);=0A=
+extern void trim_map_free(vdev_t *vd, uint64_t offset, uint64_t size, =
uint64_t txg);=0A=
 extern boolean_t trim_map_write_start(zio_t *zio);=0A=
 extern void trim_map_write_done(zio_t *zio);=0A=
 =0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h	(working =
copy)=0A=
@@ -46,6 +46,7 @@=0A=
 } vdev_dtl_type_t;=0A=
 =0A=
 extern boolean_t zfs_nocacheflush;=0A=
+extern boolean_t zfs_trim_enabled;=0A=
 =0A=
 extern int vdev_open(vdev_t *);=0A=
 extern void vdev_open_children(vdev_t *);=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h	=
(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h	=
(working copy)=0A=
@@ -221,6 +221,9 @@=0A=
 	spa_proc_state_t spa_proc_state;	/* see definition */=0A=
 	struct proc	*spa_proc;		/* "zpool-poolname" process */=0A=
 	uint64_t	spa_did;		/* if procp !=3D p0, did of t1 */=0A=
+	kthread_t	*spa_trim_thread;	/* thread sending TRIM I/Os */=0A=
+	kmutex_t	spa_trim_lock;		/* protects spa_trim_cv */=0A=
+	kcondvar_t	spa_trim_cv;		/* used to notify TRIM thread */=0A=
 	boolean_t	spa_autoreplace;	/* autoreplace set in open */=0A=
 	int		spa_vdev_locks;		/* locks grabbed */=0A=
 	uint64_t	spa_creation_version;	/* version at pool creation */=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h	=
(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h	=
(working copy)=0A=
@@ -130,9 +130,9 @@=0A=
 =0A=
 	ZIO_STAGE_READY			=3D 1 << 16,	/* RWFCI */=0A=
 =0A=
-	ZIO_STAGE_VDEV_IO_START		=3D 1 << 17,	/* RW--I */=0A=
-	ZIO_STAGE_VDEV_IO_DONE		=3D 1 << 18,	/* RW--- */=0A=
-	ZIO_STAGE_VDEV_IO_ASSESS	=3D 1 << 19,	/* RW--I */=0A=
+	ZIO_STAGE_VDEV_IO_START		=3D 1 << 17,	/* RWF-I */=0A=
+	ZIO_STAGE_VDEV_IO_DONE		=3D 1 << 18,	/* RWF-- */=0A=
+	ZIO_STAGE_VDEV_IO_ASSESS	=3D 1 << 19,	/* RWF-I */=0A=
 =0A=
 	ZIO_STAGE_CHECKSUM_VERIFY	=3D 1 << 20,	/* R---- */=0A=
 =0A=
@@ -214,7 +214,9 @@=0A=
 	(ZIO_INTERLOCK_STAGES |			\=0A=
 	ZIO_STAGE_FREE_BP_INIT |		\=0A=
 	ZIO_STAGE_ISSUE_ASYNC |			\=0A=
-	ZIO_STAGE_DVA_FREE)=0A=
+	ZIO_STAGE_DVA_FREE |			\=0A=
+	ZIO_STAGE_VDEV_IO_START |		\=0A=
+	ZIO_STAGE_VDEV_IO_ASSESS)=0A=
 =0A=
 #define	ZIO_DDT_FREE_PIPELINE			\=0A=
 	(ZIO_INTERLOCK_STAGES |			\=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h	=
(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h	=
(working copy)=0A=
@@ -183,6 +183,7 @@=0A=
 	uint64_t	vdev_unspare;	/* unspare when resilvering done */=0A=
 	hrtime_t	vdev_last_try;	/* last reopen time		*/=0A=
 	boolean_t	vdev_nowritecache; /* true if flushwritecache failed */=0A=
+	boolean_t	vdev_notrim;	/* true if trim failed */=0A=
 	boolean_t	vdev_checkremove; /* temporary online test	*/=0A=
 	boolean_t	vdev_forcefault; /* force online fault		*/=0A=
 	boolean_t	vdev_splitting;	/* split or repair in progress  */=0A=
@@ -198,6 +199,7 @@=0A=
 	spa_aux_vdev_t	*vdev_aux;	/* for l2cache vdevs		*/=0A=
 	zio_t		*vdev_probe_zio; /* root of current probe	*/=0A=
 	vdev_aux_t	vdev_label_aux;	/* on-disk aux state		*/=0A=
+	struct trim_map	*vdev_trimmap;=0A=
 =0A=
 	/*=0A=
 	 * For DTrace to work in userland (libzpool) context, these fields must=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	(working =
copy)=0A=
@@ -32,6 +32,7 @@=0A=
 #include <sys/spa.h>=0A=
 #include <sys/txg.h>=0A=
 #include <sys/avl.h>=0A=
+#include <sys/kstat.h>=0A=
 #include <sys/fs/zfs.h>=0A=
 #include <sys/zio_impl.h>=0A=
 =0A=
@@ -137,7 +138,8 @@=0A=
 #define	ZIO_PRIORITY_RESILVER		(zio_priority_table[9])=0A=
 #define	ZIO_PRIORITY_SCRUB		(zio_priority_table[10])=0A=
 #define	ZIO_PRIORITY_DDT_PREFETCH	(zio_priority_table[11])=0A=
-#define	ZIO_PRIORITY_TABLE_SIZE		12=0A=
+#define	ZIO_PRIORITY_TRIM		(zio_priority_table[12])=0A=
+#define	ZIO_PRIORITY_TABLE_SIZE		13=0A=
 =0A=
 #define	ZIO_PIPELINE_CONTINUE		0x100=0A=
 #define	ZIO_PIPELINE_STOP		0x101=0A=
@@ -367,6 +369,39 @@=0A=
 	list_node_t	zl_child_node;=0A=
 } zio_link_t;=0A=
 =0A=
+/*=0A=
+ * Used for TRIM kstat.=0A=
+ */=0A=
+typedef struct zio_trim_stats {=0A=
+	/*=0A=
+	 * Number of bytes successfully TRIMmed.=0A=
+	 */=0A=
+	kstat_named_t bytes;=0A=
+=0A=
+	/*=0A=
+	 * Number of successful TRIM requests.=0A=
+	 */=0A=
+	kstat_named_t success;=0A=
+=0A=
+	/*=0A=
+	 * Number of TRIM requests that failed because TRIM is not=0A=
+	 * supported.=0A=
+	 */=0A=
+	kstat_named_t unsupported;=0A=
+=0A=
+	/*=0A=
+	 * Number of TRIM requests that failed for other reasons.=0A=
+	 */=0A=
+	kstat_named_t failed;=0A=
+} zio_trim_stats_t;=0A=
+=0A=
+extern zio_trim_stats_t zio_trim_stats;=0A=
+=0A=
+#define ZIO_TRIM_STAT_INCR(stat, val) \=0A=
+	atomic_add_64(&zio_trim_stats.stat.value.ui64, (val));=0A=
+#define ZIO_TRIM_STAT_BUMP(stat) \=0A=
+	ZIO_TRIM_STAT_INCR(stat, 1);=0A=
+=0A=
 struct zio {=0A=
 	/* Core information about this I/O */=0A=
 	zbookmark_t	io_bookmark;=0A=
@@ -441,6 +476,8 @@=0A=
 	/* FreeBSD only. */=0A=
 	struct ostask	io_task;=0A=
 #endif=0A=
+	avl_node_t	io_trim_node;=0A=
+	list_node_t	io_trim_link;=0A=
 };=0A=
 =0A=
 extern zio_t *zio_null(zio_t *pio, spa_t *spa, vdev_t *vd,=0A=
@@ -472,8 +509,8 @@=0A=
     zio_done_func_t *done, void *priv, enum zio_flag flags);=0A=
 =0A=
 extern zio_t *zio_ioctl(zio_t *pio, spa_t *spa, vdev_t *vd, int cmd,=0A=
-    zio_done_func_t *done, void *priv, int priority,=0A=
-    enum zio_flag flags);=0A=
+    uint64_t offset, uint64_t size, zio_done_func_t *done, void *priv,=0A=
+    int priority, enum zio_flag flags);=0A=
 =0A=
 extern zio_t *zio_read_phys(zio_t *pio, vdev_t *vd, uint64_t offset,=0A=
     uint64_t size, void *data, int checksum,=0A=
@@ -486,12 +523,14 @@=0A=
     boolean_t labels);=0A=
 =0A=
 extern zio_t *zio_free_sync(zio_t *pio, spa_t *spa, uint64_t txg,=0A=
-    const blkptr_t *bp, enum zio_flag flags);=0A=
+    const blkptr_t *bp, uint64_t size, enum zio_flag flags);=0A=
 =0A=
 extern int zio_alloc_zil(spa_t *spa, uint64_t txg, blkptr_t *new_bp,=0A=
     blkptr_t *old_bp, uint64_t size, boolean_t use_slog);=0A=
 extern void zio_free_zil(spa_t *spa, uint64_t txg, blkptr_t *bp);=0A=
 extern void zio_flush(zio_t *zio, vdev_t *vd);=0A=
+extern zio_t *zio_trim(zio_t *zio, spa_t *spa, vdev_t *vd, uint64_t =
offset,=0A=
+    uint64_t size);=0A=
 extern void zio_shrink(zio_t *zio, uint64_t size);=0A=
 =0A=
 extern int zio_wait(zio_t *zio);=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(working =
copy)=0A=
@@ -49,14 +49,17 @@=0A=
 =0A=
 DECLARE_GEOM_CLASS(zfs_vdev_class, zfs_vdev);=0A=
 =0A=
-/*=0A=
- * Don't send BIO_FLUSH.=0A=
- */=0A=
+SYSCTL_DECL(_vfs_zfs_vdev);=0A=
+/* Don't send BIO_FLUSH. */=0A=
 static int vdev_geom_bio_flush_disable =3D 0;=0A=
 TUNABLE_INT("vfs.zfs.vdev.bio_flush_disable", =
&vdev_geom_bio_flush_disable);=0A=
-SYSCTL_DECL(_vfs_zfs_vdev);=0A=
 SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, bio_flush_disable, CTLFLAG_RW,=0A=
     &vdev_geom_bio_flush_disable, 0, "Disable BIO_FLUSH");=0A=
+/* Don't send BIO_DELETE. */=0A=
+static int vdev_geom_bio_delete_disable =3D 0;=0A=
+TUNABLE_INT("vfs.zfs.vdev.bio_delete_disable", =
&vdev_geom_bio_delete_disable);=0A=
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, bio_delete_disable, CTLFLAG_RW,=0A=
+    &vdev_geom_bio_delete_disable, 0, "Disable BIO_DELETE");=0A=
 =0A=
 static void=0A=
 vdev_geom_orphan(struct g_consumer *cp)=0A=
@@ -663,8 +666,8 @@=0A=
 	*ashift =3D highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;=0A=
 =0A=
 	/*=0A=
-	 * Clear the nowritecache bit, so that on a vdev_reopen() we will=0A=
-	 * try again.=0A=
+	 * Clear the nowritecache settings, so that on a vdev_reopen()=0A=
+	 * we will try again.=0A=
 	 */=0A=
 	vd->vdev_nowritecache =3D B_FALSE;=0A=
 =0A=
@@ -710,6 +713,15 @@=0A=
 		 */=0A=
 		vd->vdev_nowritecache =3D B_TRUE;=0A=
 	}=0A=
+	if (bp->bio_cmd =3D=3D BIO_DELETE && bp->bio_error =3D=3D ENOTSUP) {=0A=
+		/*=0A=
+		 * If we get ENOTSUP, we know that no future=0A=
+		 * attempts will ever succeed.  In this case we=0A=
+		 * set a persistent bit so that we don't bother=0A=
+		 * with the ioctl in the future.=0A=
+		 */=0A=
+		vd->vdev_notrim =3D B_TRUE;=0A=
+	}=0A=
 	if (zio->io_error =3D=3D EIO && !vd->vdev_remove_wanted) {=0A=
 		/*=0A=
 		 * If provider's error is set we assume it is being=0A=
@@ -752,18 +764,22 @@=0A=
 		}=0A=
 =0A=
 		switch (zio->io_cmd) {=0A=
-=0A=
 		case DKIOCFLUSHWRITECACHE:=0A=
-=0A=
 			if (zfs_nocacheflush || vdev_geom_bio_flush_disable)=0A=
 				break;=0A=
-=0A=
 			if (vd->vdev_nowritecache) {=0A=
 				zio->io_error =3D ENOTSUP;=0A=
 				break;=0A=
 			}=0A=
-=0A=
 			goto sendreq;=0A=
+		case DKIOCTRIM:=0A=
+			if (vdev_geom_bio_delete_disable)=0A=
+				break;=0A=
+			if (vd->vdev_notrim) {=0A=
+				zio->io_error =3D ENOTSUP;=0A=
+				break;=0A=
+			}=0A=
+			goto sendreq;=0A=
 		default:=0A=
 			zio->io_error =3D ENOTSUP;=0A=
 		}=0A=
@@ -787,11 +803,21 @@=0A=
 		bp->bio_length =3D zio->io_size;=0A=
 		break;=0A=
 	case ZIO_TYPE_IOCTL:=0A=
-		bp->bio_cmd =3D BIO_FLUSH;=0A=
-		bp->bio_flags |=3D BIO_ORDERED;=0A=
-		bp->bio_data =3D NULL;=0A=
-		bp->bio_offset =3D cp->provider->mediasize;=0A=
-		bp->bio_length =3D 0;=0A=
+		switch (zio->io_cmd) {=0A=
+		case DKIOCFLUSHWRITECACHE:=0A=
+			bp->bio_cmd =3D BIO_FLUSH;=0A=
+			bp->bio_flags |=3D BIO_ORDERED;=0A=
+			bp->bio_data =3D NULL;=0A=
+			bp->bio_offset =3D cp->provider->mediasize;=0A=
+			bp->bio_length =3D 0;=0A=
+			break;=0A=
+		case DKIOCTRIM:=0A=
+			bp->bio_cmd =3D BIO_DELETE;=0A=
+			bp->bio_data =3D NULL;=0A=
+			bp->bio_offset =3D zio->io_offset;=0A=
+			bp->bio_length =3D zio->io_size;=0A=
+			break;=0A=
+		}=0A=
 		break;=0A=
 	}=0A=
 	bp->bio_done =3D vdev_geom_io_intr;=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	(working copy)=0A=
@@ -67,6 +67,7 @@=0A=
 #include <sys/dsl_userhold.h>=0A=
 #include <sys/zfeature.h>=0A=
 #include <sys/zvol.h>=0A=
+#include <sys/trim_map.h>=0A=
 =0A=
 #ifdef	_KERNEL=0A=
 #include <sys/callb.h>=0A=
@@ -1001,6 +1002,11 @@=0A=
 		spa_create_zio_taskqs(spa);=0A=
 	}=0A=
 =0A=
+	/*=0A=
+	 * Start TRIM thread.=0A=
+	 */=0A=
+	trim_thread_create(spa);=0A=
+=0A=
 	list_create(&spa->spa_config_dirty_list, sizeof (vdev_t),=0A=
 	    offsetof(vdev_t, vdev_config_dirty_node));=0A=
 	list_create(&spa->spa_state_dirty_list, sizeof (vdev_t),=0A=
@@ -1029,6 +1035,12 @@=0A=
 	ASSERT(spa->spa_async_zio_root =3D=3D NULL);=0A=
 	ASSERT(spa->spa_state !=3D POOL_STATE_UNINITIALIZED);=0A=
 =0A=
+	/*=0A=
+	 * Stop TRIM thread in case spa_unload() wasn't called directly=0A=
+	 * before spa_deactivate().=0A=
+	 */=0A=
+	trim_thread_destroy(spa);=0A=
+=0A=
 	txg_list_destroy(&spa->spa_vdev_txg_list);=0A=
 =0A=
 	list_destroy(&spa->spa_config_dirty_list);=0A=
@@ -1145,6 +1157,11 @@=0A=
 	ASSERT(MUTEX_HELD(&spa_namespace_lock));=0A=
 =0A=
 	/*=0A=
+	 * Stop TRIM thread.=0A=
+	 */=0A=
+	trim_thread_destroy(spa);=0A=
+=0A=
+	/*=0A=
 	 * Stop async tasks.=0A=
 	 */=0A=
 	spa_async_suspend(spa);=0A=
@@ -5875,7 +5892,7 @@=0A=
 	zio_t *zio =3D arg;=0A=
 =0A=
 	zio_nowait(zio_free_sync(zio, zio->io_spa, dmu_tx_get_txg(tx), bp,=0A=
-	    zio->io_flags));=0A=
+	    BP_GET_PSIZE(bp), zio->io_flags));=0A=
 	return (0);=0A=
 }=0A=
 =0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(working copy)=0A=
@@ -130,6 +130,7 @@=0A=
 #endif=0A=
 #include <sys/callb.h>=0A=
 #include <sys/kstat.h>=0A=
+#include <sys/trim_map.h>=0A=
 #include <zfs_fletcher.h>=0A=
 #include <sys/sdt.h>=0A=
 =0A=
@@ -1691,6 +1692,8 @@=0A=
 		}=0A=
 =0A=
 		if (l2hdr !=3D NULL) {=0A=
+			trim_map_free(l2hdr->b_dev->l2ad_vdev, l2hdr->b_daddr,=0A=
+			    hdr->b_size, 0);=0A=
 			list_remove(l2hdr->b_dev->l2ad_buflist, hdr);=0A=
 			ARCSTAT_INCR(arcstat_l2_size, -hdr->b_size);=0A=
 			kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));=0A=
@@ -3528,6 +3531,8 @@=0A=
 	buf->b_private =3D NULL;=0A=
 =0A=
 	if (l2hdr) {=0A=
+		trim_map_free(l2hdr->b_dev->l2ad_vdev, l2hdr->b_daddr,=0A=
+		    hdr->b_size, 0);=0A=
 		list_remove(l2hdr->b_dev->l2ad_buflist, hdr);=0A=
 		kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));=0A=
 		ARCSTAT_INCR(arcstat_l2_size, -buf_size);=0A=
@@ -4442,6 +4447,8 @@=0A=
 			list_remove(buflist, ab);=0A=
 			abl2 =3D ab->b_l2hdr;=0A=
 			ab->b_l2hdr =3D NULL;=0A=
+			trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,=0A=
+			    ab->b_size, 0);=0A=
 			kmem_free(abl2, sizeof (l2arc_buf_hdr_t));=0A=
 			ARCSTAT_INCR(arcstat_l2_size, -ab->b_size);=0A=
 		}=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	(revision =
250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	(working copy)=0A=
@@ -35,6 +35,7 @@=0A=
 #include <sys/dmu_objset.h>=0A=
 #include <sys/arc.h>=0A=
 #include <sys/ddt.h>=0A=
+#include <sys/trim_map.h>=0A=
 =0A=
 SYSCTL_DECL(_vfs_zfs);=0A=
 SYSCTL_NODE(_vfs_zfs, OID_AUTO, zio, CTLFLAG_RW, 0, "ZFS ZIO");=0A=
@@ -43,6 +44,19 @@=0A=
 SYSCTL_INT(_vfs_zfs_zio, OID_AUTO, use_uma, CTLFLAG_RDTUN, =
&zio_use_uma, 0,=0A=
     "Use uma(9) for ZIO allocations");=0A=
 =0A=
+zio_trim_stats_t zio_trim_stats =3D {=0A=
+	{ "bytes",		KSTAT_DATA_UINT64,=0A=
+	  "Number of bytes successfully TRIMmed" },=0A=
+	{ "success",		KSTAT_DATA_UINT64,=0A=
+	  "Number of successful TRIM requests" },=0A=
+	{ "unsupported",	KSTAT_DATA_UINT64,=0A=
+	  "Number of TRIM requests that failed because TRIM is not supported" =
},=0A=
+	{ "failed",		KSTAT_DATA_UINT64,=0A=
+	  "Number of TRIM requests that failed for reasons other than not =
supported" },=0A=
+};=0A=
+=0A=
+static kstat_t *zio_trim_ksp;=0A=
+=0A=
 /*=0A=
  * =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
  * I/O priority table=0A=
@@ -61,6 +75,7 @@=0A=
 	10,	/* ZIO_PRIORITY_RESILVER	*/=0A=
 	20,	/* ZIO_PRIORITY_SCRUB		*/=0A=
 	2,	/* ZIO_PRIORITY_DDT_PREFETCH	*/=0A=
+	30,	/* ZIO_PRIORITY_TRIM		*/=0A=
 };=0A=
 =0A=
 /*=0A=
@@ -209,6 +224,16 @@=0A=
 		zfs_mg_alloc_failures =3D 8;=0A=
 =0A=
 	zio_inject_init();=0A=
+=0A=
+	zio_trim_ksp =3D kstat_create("zfs", 0, "zio_trim", "misc",=0A=
+	    KSTAT_TYPE_NAMED,=0A=
+	    sizeof(zio_trim_stats) / sizeof(kstat_named_t),=0A=
+	    KSTAT_FLAG_VIRTUAL);=0A=
+=0A=
+	if (zio_trim_ksp !=3D NULL) {=0A=
+		zio_trim_ksp->ks_data =3D &zio_trim_stats;=0A=
+		kstat_install(zio_trim_ksp);=0A=
+	}=0A=
 }=0A=
 =0A=
 void=0A=
@@ -236,6 +261,11 @@=0A=
 	kmem_cache_destroy(zio_cache);=0A=
 =0A=
 	zio_inject_fini();=0A=
+=0A=
+	if (zio_trim_ksp !=3D NULL) {=0A=
+		kstat_delete(zio_trim_ksp);=0A=
+		zio_trim_ksp =3D NULL;=0A=
+	}=0A=
 }=0A=
 =0A=
 /*=0A=
@@ -543,7 +573,7 @@=0A=
 {=0A=
 	zio_t *zio;=0A=
 =0A=
-	ASSERT3U(size, <=3D, SPA_MAXBLOCKSIZE);=0A=
+	ASSERT3U(type =3D=3D ZIO_TYPE_FREE || size, <=3D, SPA_MAXBLOCKSIZE);=0A=
 	ASSERT(P2PHASE(size, SPA_MINBLOCKSIZE) =3D=3D 0);=0A=
 	ASSERT(P2PHASE(offset, SPA_MINBLOCKSIZE) =3D=3D 0);=0A=
 =0A=
@@ -730,7 +760,7 @@=0A=
 =0A=
 zio_t *=0A=
 zio_free_sync(zio_t *pio, spa_t *spa, uint64_t txg, const blkptr_t *bp,=0A=
-    enum zio_flag flags)=0A=
+    uint64_t size, enum zio_flag flags)=0A=
 {=0A=
 	zio_t *zio;=0A=
 =0A=
@@ -743,7 +773,7 @@=0A=
 =0A=
 	metaslab_check_free(spa, bp);=0A=
 =0A=
-	zio =3D zio_create(pio, spa, txg, bp, NULL, BP_GET_PSIZE(bp),=0A=
+	zio =3D zio_create(pio, spa, txg, bp, NULL, size,=0A=
 	    NULL, NULL, ZIO_TYPE_FREE, ZIO_PRIORITY_FREE, flags,=0A=
 	    NULL, 0, NULL, ZIO_STAGE_OPEN, ZIO_FREE_PIPELINE);=0A=
 =0A=
@@ -780,15 +810,16 @@=0A=
 }=0A=
 =0A=
 zio_t *=0A=
-zio_ioctl(zio_t *pio, spa_t *spa, vdev_t *vd, int cmd,=0A=
-    zio_done_func_t *done, void *private, int priority, enum zio_flag =
flags)=0A=
+zio_ioctl(zio_t *pio, spa_t *spa, vdev_t *vd, int cmd, uint64_t offset,=0A=
+    uint64_t size, zio_done_func_t *done, void *private, int priority,=0A=
+    enum zio_flag flags)=0A=
 {=0A=
 	zio_t *zio;=0A=
 	int c;=0A=
 =0A=
 	if (vd->vdev_children =3D=3D 0) {=0A=
-		zio =3D zio_create(pio, spa, 0, NULL, NULL, 0, done, private,=0A=
-		    ZIO_TYPE_IOCTL, priority, flags, vd, 0, NULL,=0A=
+		zio =3D zio_create(pio, spa, 0, NULL, NULL, size, done, private,=0A=
+		    ZIO_TYPE_IOCTL, priority, flags, vd, offset, NULL,=0A=
 		    ZIO_STAGE_OPEN, ZIO_IOCTL_PIPELINE);=0A=
 =0A=
 		zio->io_cmd =3D cmd;=0A=
@@ -797,7 +828,7 @@=0A=
 =0A=
 		for (c =3D 0; c < vd->vdev_children; c++)=0A=
 			zio_nowait(zio_ioctl(zio, spa, vd->vdev_child[c], cmd,=0A=
-			    done, private, priority, flags));=0A=
+			    offset, size, done, private, priority, flags));=0A=
 	}=0A=
 =0A=
 	return (zio);=0A=
@@ -922,11 +953,22 @@=0A=
 void=0A=
 zio_flush(zio_t *zio, vdev_t *vd)=0A=
 {=0A=
-	zio_nowait(zio_ioctl(zio, zio->io_spa, vd, DKIOCFLUSHWRITECACHE,=0A=
+	zio_nowait(zio_ioctl(zio, zio->io_spa, vd, DKIOCFLUSHWRITECACHE, 0, 0,=0A=
 	    NULL, NULL, ZIO_PRIORITY_NOW,=0A=
 	    ZIO_FLAG_CANFAIL | ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY));=0A=
 }=0A=
 =0A=
+zio_t *=0A=
+zio_trim(zio_t *zio, spa_t *spa, vdev_t *vd, uint64_t offset, uint64_t =
size)=0A=
+{=0A=
+=0A=
+	ASSERT(vd->vdev_ops->vdev_op_leaf);=0A=
+=0A=
+	return zio_ioctl(zio, spa, vd, DKIOCTRIM, offset, size,=0A=
+	    NULL, NULL, ZIO_PRIORITY_TRIM,=0A=
+	    ZIO_FLAG_CANFAIL | ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY);=0A=
+}=0A=
+=0A=
 void=0A=
 zio_shrink(zio_t *zio, uint64_t size)=0A=
 {=0A=
@@ -1549,6 +1591,7 @@=0A=
 zio_free_gang(zio_t *pio, blkptr_t *bp, zio_gang_node_t *gn, void *data)=0A=
 {=0A=
 	return (zio_free_sync(pio, pio->io_spa, pio->io_txg, bp,=0A=
+	    BP_IS_GANG(bp) ? SPA_GANGBLOCKSIZE : BP_GET_PSIZE(bp),=0A=
 	    ZIO_GANG_CHILD_FLAGS(pio)));=0A=
 }=0A=
 =0A=
@@ -1681,7 +1724,7 @@=0A=
 		}=0A=
 	}=0A=
 =0A=
-	if (gn =3D=3D gio->io_gang_tree)=0A=
+	if (gn =3D=3D gio->io_gang_tree && gio->io_data !=3D NULL)=0A=
 		ASSERT3P((char *)gio->io_data + gio->io_size, =3D=3D, data);=0A=
 =0A=
 	if (zio !=3D pio)=0A=
@@ -2403,7 +2446,7 @@=0A=
 =0A=
 /*=0A=
  * =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
- * Read and write to physical devices=0A=
+ * Read, write and delete to physical devices=0A=
  * =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
  */=0A=
 static int=0A=
@@ -2426,6 +2469,11 @@=0A=
 		return (vdev_mirror_ops.vdev_op_io_start(zio));=0A=
 	}=0A=
 =0A=
+	if (vd->vdev_ops->vdev_op_leaf && zio->io_type =3D=3D ZIO_TYPE_FREE) {=0A=
+		trim_map_free(vd, zio->io_offset, zio->io_size, zio->io_txg);=0A=
+		return (ZIO_PIPELINE_CONTINUE);=0A=
+	}=0A=
+=0A=
 	/*=0A=
 	 * We keep track of time-sensitive I/Os so that the scan thread=0A=
 	 * can quickly react to certain workloads.  In particular, we care=0A=
@@ -2450,18 +2498,22 @@=0A=
 =0A=
 	if (P2PHASE(zio->io_size, align) !=3D 0) {=0A=
 		uint64_t asize =3D P2ROUNDUP(zio->io_size, align);=0A=
-		char *abuf =3D zio_buf_alloc(asize);=0A=
+		char *abuf =3D NULL;=0A=
+		if (zio->io_type =3D=3D ZIO_TYPE_READ ||=0A=
+		    zio->io_type =3D=3D ZIO_TYPE_WRITE)=0A=
+			abuf =3D zio_buf_alloc(asize);=0A=
 		ASSERT(vd =3D=3D vd->vdev_top);=0A=
 		if (zio->io_type =3D=3D ZIO_TYPE_WRITE) {=0A=
 			bcopy(zio->io_data, abuf, zio->io_size);=0A=
 			bzero(abuf + zio->io_size, asize - zio->io_size);=0A=
 		}=0A=
-		zio_push_transform(zio, abuf, asize, asize, zio_subblock);=0A=
+		zio_push_transform(zio, abuf, asize, abuf ? asize : 0,=0A=
+		    zio_subblock);=0A=
 	}=0A=
 =0A=
 	ASSERT(P2PHASE(zio->io_offset, align) =3D=3D 0);=0A=
 	ASSERT(P2PHASE(zio->io_size, align) =3D=3D 0);=0A=
-	VERIFY(zio->io_type !=3D ZIO_TYPE_WRITE || spa_writeable(spa));=0A=
+	VERIFY(zio->io_type =3D=3D ZIO_TYPE_READ || spa_writeable(spa));=0A=
 =0A=
 	/*=0A=
 	 * If this is a repair I/O, and there's no self-healing involved --=0A=
@@ -2501,6 +2553,11 @@=0A=
 		}=0A=
 	}=0A=
 =0A=
+	if (vd->vdev_ops->vdev_op_leaf && zio->io_type =3D=3D ZIO_TYPE_WRITE) {=0A=
+		if (!trim_map_write_start(zio))=0A=
+			return (ZIO_PIPELINE_STOP);=0A=
+	}=0A=
+=0A=
 	return (vd->vdev_ops->vdev_op_io_start(zio));=0A=
 }=0A=
 =0A=
@@ -2514,10 +2571,17 @@=0A=
 	if (zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE))=0A=
 		return (ZIO_PIPELINE_STOP);=0A=
 =0A=
-	ASSERT(zio->io_type =3D=3D ZIO_TYPE_READ || zio->io_type =3D=3D =
ZIO_TYPE_WRITE);=0A=
+	ASSERT(zio->io_type =3D=3D ZIO_TYPE_READ ||=0A=
+	    zio->io_type =3D=3D ZIO_TYPE_WRITE || zio->io_type =3D=3D =
ZIO_TYPE_FREE);=0A=
 =0A=
-	if (vd !=3D NULL && vd->vdev_ops->vdev_op_leaf) {=0A=
+	if (vd !=3D NULL && vd->vdev_ops->vdev_op_leaf &&=0A=
+	    zio->io_type =3D=3D ZIO_TYPE_WRITE) {=0A=
+		trim_map_write_done(zio);=0A=
+	}=0A=
 =0A=
+	if (vd !=3D NULL && vd->vdev_ops->vdev_op_leaf &&=0A=
+	    (zio->io_type =3D=3D ZIO_TYPE_READ || zio->io_type =3D=3D =
ZIO_TYPE_WRITE)) {=0A=
+=0A=
 		vdev_queue_io_done(zio);=0A=
 =0A=
 		if (zio->io_type =3D=3D ZIO_TYPE_WRITE)=0A=
@@ -2592,6 +2656,20 @@=0A=
 	if (zio_injection_enabled && zio->io_error =3D=3D 0)=0A=
 		zio->io_error =3D zio_handle_fault_injection(zio, EIO);=0A=
 =0A=
+	if (zio->io_type =3D=3D ZIO_TYPE_IOCTL && zio->io_cmd =3D=3D DKIOCTRIM)=0A=
+		switch (zio->io_error) {=0A=
+		case 0:=0A=
+			ZIO_TRIM_STAT_INCR(bytes, zio->io_size);=0A=
+			ZIO_TRIM_STAT_BUMP(success);=0A=
+			break;=0A=
+		case EOPNOTSUPP:=0A=
+			ZIO_TRIM_STAT_BUMP(unsupported);=0A=
+			break;=0A=
+		default:=0A=
+			ZIO_TRIM_STAT_BUMP(failed);=0A=
+			break;=0A=
+		}=0A=
+=0A=
 	/*=0A=
 	 * If the I/O failed, determine whether we should attempt to retry it.=0A=
 	 *=0A=
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c	=
(revision 250526)=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c	(working =
copy)=0A=
@@ -145,8 +145,14 @@=0A=
 #include <sys/metaslab.h>=0A=
 #include <sys/zio.h>=0A=
 #include <sys/dsl_scan.h>=0A=
+#include <sys/trim_map.h>=0A=
 #include <sys/fs/zfs.h>=0A=
 =0A=
+static boolean_t vdev_trim_on_init =3D B_TRUE;=0A=
+SYSCTL_DECL(_vfs_zfs_vdev);=0A=
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, trim_on_init, CTLFLAG_RW,=0A=
+    &vdev_trim_on_init, 0, "Enable/disable full vdev trim on =
initialisation");=0A=
+=0A=
 /*=0A=
  * Basic routines to read and write from a vdev label.=0A=
  * Used throughout the rest of this file.=0A=
@@ -718,6 +724,16 @@=0A=
 	}=0A=
 =0A=
 	/*=0A=
+	 * TRIM the whole thing so that we start with a clean slate.=0A=
+	 * It's just an optimization, so we don't care if it fails.=0A=
+	 * Don't TRIM if removing so that we don't interfere with zpool=0A=
+	 * disaster recovery.=0A=
+	 */=0A=
+	if (zfs_trim_enabled && vdev_trim_on_init && (reason =3D=3D =
VDEV_LABEL_CREATE ||=0A=
+	    reason =3D=3D VDEV_LABEL_SPARE || reason =3D=3D =
VDEV_LABEL_L2CACHE))=0A=
+		zio_wait(zio_trim(NULL, spa, vd, 0, vd->vdev_psize));=0A=
+=0A=
+	/*=0A=
 	 * Initialize its label.=0A=
 	 */=0A=
 	vp =3D zio_buf_alloc(sizeof (vdev_phys_t));=0A=
@@ -1282,5 +1298,10 @@=0A=
 	 * to disk to ensure that all odd-label updates are committed to=0A=
 	 * stable storage before the next transaction group begins.=0A=
 	 */=0A=
-	return (vdev_label_sync_list(spa, 1, txg, flags));=0A=
+	if ((error =3D vdev_label_sync_list(spa, 1, txg, flags)) !=3D 0)=0A=
+		return (error);=0A=
+=0A=
+	trim_thread_wakeup(spa);=0A=
+=0A=
+	return (0);=0A=
 }=0A=
Index: sys/cddl/compat/opensolaris/kern/opensolaris_kstat.c=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/compat/opensolaris/kern/opensolaris_kstat.c	(revision =
250526)=0A=
+++ sys/cddl/compat/opensolaris/kern/opensolaris_kstat.c	(working copy)=0A=
@@ -118,7 +118,7 @@=0A=
 		SYSCTL_ADD_PROC(&ksp->ks_sysctl_ctx,=0A=
 		    SYSCTL_CHILDREN(ksp->ks_sysctl_root), OID_AUTO, ksent->name,=0A=
 		    CTLTYPE_U64 | CTLFLAG_RD, ksent, sizeof(*ksent),=0A=
-		    kstat_sysctl, "QU", "");=0A=
+		    kstat_sysctl, "QU", ksent->desc);=0A=
 	}=0A=
 }=0A=
 =0A=
Index: sys/cddl/compat/opensolaris/sys/dkio.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/compat/opensolaris/sys/dkio.h	(revision 250526)=0A=
+++ sys/cddl/compat/opensolaris/sys/dkio.h	(working copy)=0A=
@@ -75,6 +75,8 @@=0A=
  */=0A=
 #define	DKIOCFLUSHWRITECACHE	(DKIOC|34)	/* flush cache to phys medium */=0A=
 =0A=
+#define	DKIOCTRIM		(DKIOC|35)	/* TRIM a block */=0A=
+=0A=
 struct dk_callback {=0A=
 	void (*dkc_callback)(void *dkc_cookie, int error);=0A=
 	void *dkc_cookie;=0A=
Index: sys/cddl/compat/opensolaris/sys/time.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/compat/opensolaris/sys/time.h	(revision 250526)=0A=
+++ sys/cddl/compat/opensolaris/sys/time.h	(working copy)=0A=
@@ -35,6 +35,7 @@=0A=
 #define MILLISEC	1000=0A=
 #define MICROSEC	1000000=0A=
 #define NANOSEC		1000000000=0A=
+#define TIME_MAX	LLONG_MAX=0A=
 =0A=
 typedef longlong_t	hrtime_t;=0A=
 =0A=
Index: sys/cddl/compat/opensolaris/sys/kstat.h=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
--- sys/cddl/compat/opensolaris/sys/kstat.h	(revision 250526)=0A=
+++ sys/cddl/compat/opensolaris/sys/kstat.h	(working copy)=0A=
@@ -53,6 +53,8 @@=0A=
 #define	KSTAT_DATA_INT64	3=0A=
 #define	KSTAT_DATA_UINT64	4=0A=
 	uchar_t	data_type;=0A=
+#define	KSTAT_DESCLEN		128=0A=
+	char	desc[KSTAT_DESCLEN];=0A=
 	union {=0A=
 		uint64_t	ui64;=0A=
 	} value;=0A=

------=_NextPart_000_0730_01CE57A0.ECC7BD20--
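
For anyone reviewing the vdev_geom.c and zio.c hunks above, here is a
rough userland sketch of how the DKIOCTRIM gating and the TRIM kstat
accounting appear to fit together. It is not code from the patch: the
sketch_* names and the two structs are hypothetical stand-ins for
vdev_t, vdev_notrim, vfs.zfs.vdev.bio_delete_disable and
zio_trim_stats, and plain increments replace atomic_add_64(). The patch
classifies the "unsupported" case with EOPNOTSUPP; on FreeBSD that has
the same value as ENOTSUP, so the sketch uses ENOTSUP only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vdev_notrim field added to vdev_t. */
struct sketch_vdev {
	bool notrim;	/* set once a BIO_DELETE came back with ENOTSUP */
};

/* Hypothetical stand-in for zio_trim_stats (plain counters, no kstat). */
struct sketch_trim_stats {
	uint64_t bytes;
	uint64_t success;
	uint64_t unsupported;
	uint64_t failed;
};

/* Models the vfs.zfs.vdev.bio_delete_disable tunable. */
static int sketch_bio_delete_disable = 0;

/*
 * Models the DKIOCTRIM case in the ioctl switch: decide whether a
 * BIO_DELETE should be sent.  Returns the error to record on the zio
 * (0 or ENOTSUP) and sets *send when a request should be issued.
 */
static int
sketch_trim_start(const struct sketch_vdev *vd, bool *send)
{
	assert(vd != NULL && send != NULL);

	*send = false;
	if (sketch_bio_delete_disable)
		return (0);		/* disabled by tunable: skip quietly */
	if (vd->notrim)
		return (ENOTSUP);	/* device already refused TRIM */
	*send = true;
	return (0);
}

/*
 * Models the completion side: remember a persistent ENOTSUP on the
 * vdev and classify the result into the four counters.
 */
static void
sketch_trim_done(struct sketch_vdev *vd, struct sketch_trim_stats *st,
    int error, uint64_t size)
{
	if (error == ENOTSUP)
		vd->notrim = true;

	switch (error) {
	case 0:
		st->bytes += size;
		st->success++;
		break;
	case ENOTSUP:
		st->unsupported++;
		break;
	default:
		st->failed++;
		break;
	}
}
```

The point of the persistent bit is that after one ENOTSUP completion
every later sketch_trim_start() call on that vdev returns ENOTSUP
without issuing a request, until the bit is cleared again (in the patch
that happens when the leaf vdev is reopened).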



