Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 Jul 2017 10:25:18 +0000 (UTC)
From:      Alexander Motin <mav@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org
Subject:   svn commit: r321610 - in stable/11: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/c...
Message-ID:  <201707271025.v6RAPIAj015418@repo.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: mav
Date: Thu Jul 27 10:25:18 2017
New Revision: 321610
URL: https://svnweb.freebsd.org/changeset/base/321610

Log:
  MFC r320156, r320185, r320186, r320262, r320452, r321111:
  MFV r318946: 8021 ARC buf data scatter-ization
  
  illumos/illumos-gate@770499e185d15678ccb0be57ebc626ad18d93383
  https://github.com/illumos/illumos-gate/commit/770499e185d15678ccb0be57ebc626ad1
  8d93383
  
  https://www.illumos.org/issues/8021
    The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
    community) changes the way the ARC allocates `b_pdata` memory from using linea
  r
    `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
    improves ZFS's performance by helping to defragment the address space occupied
    by the ARC, in particular for cases where compressed ARC is enabled. It could
    also ease future work to allocate pages directly from `segkpm` for minimal-
    overhead memory allocations, bypassing the `kmem` subsystem.
    This is essentially the same change as the one which recently landed in ZFS on
    Linux, although they made some platform-specific changes while adapting this
    work to their codebase:
    1. Implemented the equivalent of the `segkpm` suggestion for future work
    mentioned above to bypass issues that they've had with the Linux kernel memory
    allocator.
    2. Changed the internal representation of the ABD's scatter/gather list so it
    could be used to pass I/O directly into Linux block device drivers. (This
    feature is not available in the illumos block device interface yet.)
  
  FreeBSD notes:
  - the actual (default) chunk size is 4KB (despite the text above saying 1KB)
  - we can try to reimplement ABDs, so that they are not permanently
    mapped into the KVA unless explicitly requested, especially on
    platforms with scarce KVA
  - we can try to use unmapped I/O and avoid intermediate allocation of a
    linear, virtual memory mapped buffer
  - we can try to avoid extra data copying by referring to chunks / pages
    in the original ABD
  
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: George Wilson <george.wilson@delphix.com>
  Reviewed by: Paul Dagnelie <pcd@delphix.com>
  Reviewed by: John Kennedy <john.kennedy@delphix.com>
  Reviewed by: Prakash Surya <prakash.surya@delphix.com>
  Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
  Reviewed by: Chris Williamson <chris.williamson@delphix.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: Dan Kimmel <dan.kimmel@delphix.com>

Added:
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
     - copied unchanged from r320156, head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/abd.h
     - copied unchanged from r320156, head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/abd.h
Modified:
  stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb.c
  stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c
  stable/11/cddl/contrib/opensolaris/cmd/ztest/ztest.c
  stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
  stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c
  stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/Makefile.files
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/blkptr.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/edonr_zfs.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sha256.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/skein_zfs.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_checksum.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c
  stable/11/sys/conf/files
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb.c
==============================================================================
--- stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Thu Jul 27 10:25:18 2017	(r321610)
@@ -59,6 +59,7 @@
 #include <sys/arc.h>
 #include <sys/ddt.h>
 #include <sys/zfeature.h>
+#include <sys/abd.h>
 #include <zfs_comutil.h>
 #undef verify
 #include <libzfs.h>
@@ -2410,7 +2411,7 @@ zdb_blkptr_done(zio_t *zio)
 	zdb_cb_t *zcb = zio->io_private;
 	zbookmark_phys_t *zb = &zio->io_bookmark;
 
-	zio_data_buf_free(zio->io_data, zio->io_size);
+	abd_free(zio->io_abd);
 
 	mutex_enter(&spa->spa_scrub_lock);
 	spa->spa_scrub_inflight--;
@@ -2477,7 +2478,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr
 	if (!BP_IS_EMBEDDED(bp) &&
 	    (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata))) {
 		size_t size = BP_GET_PSIZE(bp);
-		void *data = zio_data_buf_alloc(size);
+		abd_t *abd = abd_alloc(size, B_FALSE);
 		int flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_SCRUB | ZIO_FLAG_RAW;
 
 		/* If it's an intent log block, failure is expected. */
@@ -2490,7 +2491,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr
 		spa->spa_scrub_inflight++;
 		mutex_exit(&spa->spa_scrub_lock);
 
-		zio_nowait(zio_read(NULL, spa, bp, data, size,
+		zio_nowait(zio_read(NULL, spa, bp, abd, size,
 		    zdb_blkptr_done, zcb, ZIO_PRIORITY_ASYNC_READ, flags, zb));
 	}
 
@@ -3270,6 +3271,13 @@ name:
 	return (NULL);
 }
 
+/* ARGSUSED */
+static int
+random_get_pseudo_bytes_cb(void *buf, size_t len, void *unused)
+{
+	return (random_get_pseudo_bytes(buf, len));
+}
+
 /*
  * Read a block from a pool and print it out.  The syntax of the
  * block descriptor is:
@@ -3301,7 +3309,8 @@ zdb_read_block(char *thing, spa_t *spa)
 	uint64_t offset = 0, size = 0, psize = 0, lsize = 0, blkptr_offset = 0;
 	zio_t *zio;
 	vdev_t *vd;
-	void *pbuf, *lbuf, *buf;
+	abd_t *pabd;
+	void *lbuf, *buf;
 	char *s, *p, *dup, *vdev, *flagstr;
 	int i, error;
 
@@ -3373,7 +3382,7 @@ zdb_read_block(char *thing, spa_t *spa)
 	psize = size;
 	lsize = size;
 
-	pbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
+	pabd = abd_alloc_linear(SPA_MAXBLOCKSIZE, B_FALSE);
 	lbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
 
 	BP_ZERO(bp);
@@ -3401,15 +3410,15 @@ zdb_read_block(char *thing, spa_t *spa)
 		/*
 		 * Treat this as a normal block read.
 		 */
-		zio_nowait(zio_read(zio, spa, bp, pbuf, psize, NULL, NULL,
+		zio_nowait(zio_read(zio, spa, bp, pabd, psize, NULL, NULL,
 		    ZIO_PRIORITY_SYNC_READ,
 		    ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL));
 	} else {
 		/*
 		 * Treat this as a vdev child I/O.
 		 */
-		zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pbuf, psize,
-		    ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
+		zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pabd,
+		    psize, ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
 		    ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_QUEUE |
 		    ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY |
 		    ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL, NULL));
@@ -3432,21 +3441,21 @@ zdb_read_block(char *thing, spa_t *spa)
 		void *pbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
 		void *lbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
 
-		bcopy(pbuf, pbuf2, psize);
+		abd_copy_to_buf(pbuf2, pabd, psize);
 
-		VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf + psize,
-		    SPA_MAXBLOCKSIZE - psize) == 0);
+		VERIFY0(abd_iterate_func(pabd, psize, SPA_MAXBLOCKSIZE - psize,
+		    random_get_pseudo_bytes_cb, NULL));
 
-		VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
-		    SPA_MAXBLOCKSIZE - psize) == 0);
+		VERIFY0(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
+		    SPA_MAXBLOCKSIZE - psize));
 
 		for (lsize = SPA_MAXBLOCKSIZE; lsize > psize;
 		    lsize -= SPA_MINBLOCKSIZE) {
 			for (c = 0; c < ZIO_COMPRESS_FUNCTIONS; c++) {
-				if (zio_decompress_data(c, pbuf, lbuf,
-				    psize, lsize) == 0 &&
-				    zio_decompress_data(c, pbuf2, lbuf2,
-				    psize, lsize) == 0 &&
+				if (zio_decompress_data(c, pabd,
+				    lbuf, psize, lsize) == 0 &&
+				    zio_decompress_data_buf(c, pbuf2,
+				    lbuf2, psize, lsize) == 0 &&
 				    bcmp(lbuf, lbuf2, lsize) == 0)
 					break;
 			}
@@ -3465,7 +3474,7 @@ zdb_read_block(char *thing, spa_t *spa)
 		buf = lbuf;
 		size = lsize;
 	} else {
-		buf = pbuf;
+		buf = abd_to_buf(pabd);
 		size = psize;
 	}
 
@@ -3483,7 +3492,7 @@ zdb_read_block(char *thing, spa_t *spa)
 		zdb_dump_block(thing, buf, size, flags);
 
 out:
-	umem_free(pbuf, SPA_MAXBLOCKSIZE);
+	abd_free(pabd);
 	umem_free(lbuf, SPA_MAXBLOCKSIZE);
 	free(dup);
 }

Modified: stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c
==============================================================================
--- stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c	Thu Jul 27 10:25:18 2017	(r321610)
@@ -24,7 +24,7 @@
  */
 
 /*
- * Copyright (c) 2013, 2014 by Delphix. All rights reserved.
+ * Copyright (c) 2013, 2016 by Delphix. All rights reserved.
  */
 
 /*
@@ -41,6 +41,7 @@
 #include <sys/resource.h>
 #include <sys/zil.h>
 #include <sys/zil_impl.h>
+#include <sys/abd.h>
 
 extern uint8_t dump_opt[256];
 
@@ -117,13 +118,27 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, lr_rena
 }
 
 /* ARGSUSED */
+static int
+zil_prt_rec_write_cb(void *data, size_t len, void *unused)
+{
+	char *cdata = data;
+	for (int i = 0; i < len; i++) {
+		if (isprint(*cdata))
+			(void) printf("%c ", *cdata);
+		else
+			(void) printf("%2X", *cdata);
+		cdata++;
+	}
+	return (0);
+}
+
+/* ARGSUSED */
 static void
 zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr)
 {
-	char *data, *dlimit;
+	abd_t *data;
 	blkptr_t *bp = &lr->lr_blkptr;
 	zbookmark_phys_t zb;
-	char buf[SPA_MAXBLOCKSIZE];
 	int verbose = MAX(dump_opt['d'], dump_opt['i']);
 	int error;
 
@@ -144,7 +159,6 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write
 		if (BP_IS_HOLE(bp)) {
 			(void) printf("\t\t\tLSIZE 0x%llx\n",
 			    (u_longlong_t)BP_GET_LSIZE(bp));
-			bzero(buf, sizeof (buf));
 			(void) printf("%s<hole>\n", prefix);
 			return;
 		}
@@ -157,28 +171,26 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write
 		    lr->lr_foid, ZB_ZIL_LEVEL,
 		    lr->lr_offset / BP_GET_LSIZE(bp));
 
+		data = abd_alloc(BP_GET_LSIZE(bp), B_FALSE);
 		error = zio_wait(zio_read(NULL, zilog->zl_spa,
-		    bp, buf, BP_GET_LSIZE(bp), NULL, NULL,
+		    bp, data, BP_GET_LSIZE(bp), NULL, NULL,
 		    ZIO_PRIORITY_SYNC_READ, ZIO_FLAG_CANFAIL, &zb));
 		if (error)
-			return;
-		data = buf;
+			goto out;
 	} else {
-		data = (char *)(lr + 1);
+		/* data is stored after the end of the lr_write record */
+		data = abd_alloc(lr->lr_length, B_FALSE);
+		abd_copy_from_buf(data, lr + 1, lr->lr_length);
 	}
 
-	dlimit = data + MIN(lr->lr_length,
-	    (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE));
-
 	(void) printf("%s", prefix);
-	while (data < dlimit) {
-		if (isprint(*data))
-			(void) printf("%c ", *data);
-		else
-			(void) printf("%2X", *data);
-		data++;
-	}
+	(void) abd_iterate_func(data,
+	    0, MIN(lr->lr_length, (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE)),
+	    zil_prt_rec_write_cb, NULL);
 	(void) printf("\n");
+
+out:
+	abd_free(data);
 }
 
 /* ARGSUSED */

Modified: stable/11/cddl/contrib/opensolaris/cmd/ztest/ztest.c
==============================================================================
--- stable/11/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Thu Jul 27 10:25:18 2017	(r321610)
@@ -112,6 +112,7 @@
 #include <sys/refcount.h>
 #include <sys/zfeature.h>
 #include <sys/dsl_userhold.h>
+#include <sys/abd.h>
 #include <stdio.h>
 #include <stdio_ext.h>
 #include <stdlib.h>
@@ -190,6 +191,7 @@ extern uint64_t metaslab_df_alloc_threshold;
 extern uint64_t zfs_deadman_synctime_ms;
 extern int metaslab_preload_limit;
 extern boolean_t zfs_compressed_arc_enabled;
+extern boolean_t zfs_abd_scatter_enabled;
 
 static ztest_shared_opts_t *ztest_shared_opts;
 static ztest_shared_opts_t ztest_opts;
@@ -5042,7 +5044,7 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id)
 	enum zio_checksum checksum = spa_dedup_checksum(spa);
 	dmu_buf_t *db;
 	dmu_tx_t *tx;
-	void *buf;
+	abd_t *abd;
 	blkptr_t blk;
 	int copies = 2 * ZIO_DEDUPDITTO_MIN;
 
@@ -5122,14 +5124,14 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id)
 	 * Damage the block.  Dedup-ditto will save us when we read it later.
 	 */
 	psize = BP_GET_PSIZE(&blk);
-	buf = zio_buf_alloc(psize);
-	ztest_pattern_set(buf, psize, ~pattern);
+	abd = abd_alloc_linear(psize, B_TRUE);
+	ztest_pattern_set(abd_to_buf(abd), psize, ~pattern);
 
 	(void) zio_wait(zio_rewrite(NULL, spa, 0, &blk,
-	    buf, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE,
+	    abd, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE,
 	    ZIO_FLAG_CANFAIL | ZIO_FLAG_INDUCE_DAMAGE, NULL));
 
-	zio_buf_free(buf, psize);
+	abd_free(abd);
 
 	(void) rw_unlock(&ztest_name_lock);
 }
@@ -5413,6 +5415,12 @@ ztest_resume_thread(void *arg)
 		 */
 		if (ztest_random(10) == 0)
 			zfs_compressed_arc_enabled = ztest_random(2);
+
+		/*
+		 * Periodically change the zfs_abd_scatter_enabled setting.
+		 */
+		if (ztest_random(10) == 0)
+			zfs_abd_scatter_enabled = ztest_random(2);
 	}
 	return (NULL);
 }

Modified: stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
==============================================================================
--- stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	Thu Jul 27 10:25:18 2017	(r321610)
@@ -199,19 +199,19 @@ dump_record(dmu_replay_record_t *drr, void *payload, i
 {
 	ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
 	    ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
-	fletcher_4_incremental_native(drr,
+	(void) fletcher_4_incremental_native(drr,
 	    offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum), zc);
 	if (drr->drr_type != DRR_BEGIN) {
 		ASSERT(ZIO_CHECKSUM_IS_ZERO(&drr->drr_u.
 		    drr_checksum.drr_checksum));
 		drr->drr_u.drr_checksum.drr_checksum = *zc;
 	}
-	fletcher_4_incremental_native(&drr->drr_u.drr_checksum.drr_checksum,
-	    sizeof (zio_cksum_t), zc);
+	(void) fletcher_4_incremental_native(
+	    &drr->drr_u.drr_checksum.drr_checksum, sizeof (zio_cksum_t), zc);
 	if (write(outfd, drr, sizeof (*drr)) == -1)
 		return (errno);
 	if (payload_len != 0) {
-		fletcher_4_incremental_native(payload, payload_len, zc);
+		(void) fletcher_4_incremental_native(payload, payload_len, zc);
 		if (write(outfd, payload, payload_len) == -1)
 			return (errno);
 	}
@@ -2096,9 +2096,9 @@ recv_read(libzfs_handle_t *hdl, int fd, void *buf, int
 
 	if (zc) {
 		if (byteswap)
-			fletcher_4_incremental_byteswap(buf, ilen, zc);
+			(void) fletcher_4_incremental_byteswap(buf, ilen, zc);
 		else
-			fletcher_4_incremental_native(buf, ilen, zc);
+			(void) fletcher_4_incremental_native(buf, ilen, zc);
 	}
 	return (0);
 }
@@ -3688,7 +3688,8 @@ zfs_receive_impl(libzfs_handle_t *hdl, const char *tos
 		 * recv_read() above; do it again correctly.
 		 */
 		bzero(&zcksum, sizeof (zio_cksum_t));
-		fletcher_4_incremental_byteswap(&drr, sizeof (drr), &zcksum);
+		(void) fletcher_4_incremental_byteswap(&drr,
+		    sizeof (drr), &zcksum);
 		flags->byteswap = B_TRUE;
 
 		drr.drr_type = BSWAP_32(drr.drr_type);

Modified: stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c
==============================================================================
--- stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c	Thu Jul 27 10:25:18 2017	(r321610)
@@ -24,6 +24,7 @@
  */
 /*
  * Copyright 2013 Saso Kiselkov. All rights reserved.
+ * Copyright (c) 2016 by Delphix. All rights reserved.
  */
 
 /*
@@ -133,17 +134,29 @@
 #include <sys/byteorder.h>
 #include <sys/zio.h>
 #include <sys/spa.h>
+#include <zfs_fletcher.h>
 
-/*ARGSUSED*/
 void
-fletcher_2_native(const void *buf, uint64_t size,
-    const void *ctx_template, zio_cksum_t *zcp)
+fletcher_init(zio_cksum_t *zcp)
 {
+	ZIO_SET_CHECKSUM(zcp, 0, 0, 0, 0);
+}
+
+int
+fletcher_2_incremental_native(void *buf, size_t size, void *data)
+{
+	zio_cksum_t *zcp = data;
+
 	const uint64_t *ip = buf;
 	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
 	uint64_t a0, b0, a1, b1;
 
-	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
+	a0 = zcp->zc_word[0];
+	a1 = zcp->zc_word[1];
+	b0 = zcp->zc_word[2];
+	b1 = zcp->zc_word[3];
+
+	for (; ip < ipend; ip += 2) {
 		a0 += ip[0];
 		a1 += ip[1];
 		b0 += a0;
@@ -151,18 +164,33 @@ fletcher_2_native(const void *buf, uint64_t size,
 	}
 
 	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
+	return (0);
 }
 
 /*ARGSUSED*/
 void
-fletcher_2_byteswap(const void *buf, uint64_t size,
+fletcher_2_native(const void *buf, size_t size,
     const void *ctx_template, zio_cksum_t *zcp)
 {
+	fletcher_init(zcp);
+	(void) fletcher_2_incremental_native((void *) buf, size, zcp);
+}
+
+int
+fletcher_2_incremental_byteswap(void *buf, size_t size, void *data)
+{
+	zio_cksum_t *zcp = data;
+
 	const uint64_t *ip = buf;
 	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
 	uint64_t a0, b0, a1, b1;
 
-	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
+	a0 = zcp->zc_word[0];
+	a1 = zcp->zc_word[1];
+	b0 = zcp->zc_word[2];
+	b1 = zcp->zc_word[3];
+
+	for (; ip < ipend; ip += 2) {
 		a0 += BSWAP_64(ip[0]);
 		a1 += BSWAP_64(ip[1]);
 		b0 += a0;
@@ -170,50 +198,23 @@ fletcher_2_byteswap(const void *buf, uint64_t size,
 	}
 
 	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
+	return (0);
 }
 
 /*ARGSUSED*/
 void
-fletcher_4_native(const void *buf, uint64_t size,
+fletcher_2_byteswap(const void *buf, size_t size,
     const void *ctx_template, zio_cksum_t *zcp)
 {
-	const uint32_t *ip = buf;
-	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
-	uint64_t a, b, c, d;
-
-	for (a = b = c = d = 0; ip < ipend; ip++) {
-		a += ip[0];
-		b += a;
-		c += b;
-		d += c;
-	}
-
-	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
+	fletcher_init(zcp);
+	(void) fletcher_2_incremental_byteswap((void *) buf, size, zcp);
 }
 
-/*ARGSUSED*/
-void
-fletcher_4_byteswap(const void *buf, uint64_t size,
-    const void *ctx_template, zio_cksum_t *zcp)
+int
+fletcher_4_incremental_native(void *buf, size_t size, void *data)
 {
-	const uint32_t *ip = buf;
-	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
-	uint64_t a, b, c, d;
+	zio_cksum_t *zcp = data;
 
-	for (a = b = c = d = 0; ip < ipend; ip++) {
-		a += BSWAP_32(ip[0]);
-		b += a;
-		c += b;
-		d += c;
-	}
-
-	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
-}
-
-void
-fletcher_4_incremental_native(const void *buf, uint64_t size,
-    zio_cksum_t *zcp)
-{
 	const uint32_t *ip = buf;
 	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
 	uint64_t a, b, c, d;
@@ -231,12 +232,23 @@ fletcher_4_incremental_native(const void *buf, uint64_
 	}
 
 	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
+	return (0);
 }
 
+/*ARGSUSED*/
 void
-fletcher_4_incremental_byteswap(const void *buf, uint64_t size,
-    zio_cksum_t *zcp)
+fletcher_4_native(const void *buf, size_t size,
+    const void *ctx_template, zio_cksum_t *zcp)
 {
+	fletcher_init(zcp);
+	(void) fletcher_4_incremental_native((void *) buf, size, zcp);
+}
+
+int
+fletcher_4_incremental_byteswap(void *buf, size_t size, void *data)
+{
+	zio_cksum_t *zcp = data;
+
 	const uint32_t *ip = buf;
 	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
 	uint64_t a, b, c, d;
@@ -254,4 +266,14 @@ fletcher_4_incremental_byteswap(const void *buf, uint6
 	}
 
 	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
+	return (0);
+}
+
+/*ARGSUSED*/
+void
+fletcher_4_byteswap(const void *buf, size_t size,
+    const void *ctx_template, zio_cksum_t *zcp)
+{
+	fletcher_init(zcp);
+	(void) fletcher_4_incremental_byteswap((void *) buf, size, zcp);
 }

Modified: stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h
==============================================================================
--- stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h	Thu Jul 27 10:25:18 2017	(r321610)
@@ -24,6 +24,7 @@
  */
 /*
  * Copyright 2013 Saso Kiselkov. All rights reserved.
+ * Copyright (c) 2016 by Delphix. All rights reserved.
  */
 
 #ifndef	_ZFS_FLETCHER_H
@@ -40,12 +41,15 @@ extern "C" {
  * fletcher checksum functions
  */
 
-void fletcher_2_native(const void *, uint64_t, const void *, zio_cksum_t *);
-void fletcher_2_byteswap(const void *, uint64_t, const void *, zio_cksum_t *);
-void fletcher_4_native(const void *, uint64_t, const void *, zio_cksum_t *);
-void fletcher_4_byteswap(const void *, uint64_t, const void *, zio_cksum_t *);
-void fletcher_4_incremental_native(const void *, uint64_t, zio_cksum_t *);
-void fletcher_4_incremental_byteswap(const void *, uint64_t, zio_cksum_t *);
+void fletcher_init(zio_cksum_t *);
+void fletcher_2_native(const void *, size_t, const void *, zio_cksum_t *);
+void fletcher_2_byteswap(const void *, size_t, const void *, zio_cksum_t *);
+int fletcher_2_incremental_native(void *, size_t, void *);
+int fletcher_2_incremental_byteswap(void *, size_t, void *);
+void fletcher_4_native(const void *, size_t, const void *, zio_cksum_t *);
+void fletcher_4_byteswap(const void *, size_t, const void *, zio_cksum_t *);
+int fletcher_4_incremental_native(void *, size_t, void *);
+int fletcher_4_incremental_byteswap(void *, size_t, void *);
 
 #ifdef	__cplusplus
 }

Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/Makefile.files
==============================================================================
--- stable/11/sys/cddl/contrib/opensolaris/uts/common/Makefile.files	Thu Jul 27 10:19:13 2017	(r321609)
+++ stable/11/sys/cddl/contrib/opensolaris/uts/common/Makefile.files	Thu Jul 27 10:25:18 2017	(r321610)
@@ -33,6 +33,7 @@
 # common to all SunOS systems.
 
 ZFS_COMMON_OBJS +=		\
+	abd.o			\
 	arc.o			\
 	bplist.o		\
 	blkptr.o		\

Copied: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c (from r320156, head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c)
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c	Thu Jul 27 10:25:18 2017	(r321610, copy of r320156, head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c)
@@ -0,0 +1,944 @@
+/*
+ * This file and its contents are supplied under the terms of the
+ * Common Development and Distribution License ("CDDL"), version 1.0.
+ * You may only use this file in accordance with the terms of version
+ * 1.0 of the CDDL.
+ *
+ * A full copy of the text of the CDDL should have accompanied this
+ * source.  A copy of the CDDL is also available via the Internet at
+ * http://www.illumos.org/license/CDDL.
+ */
+
+/*
+ * Copyright (c) 2014 by Chunwei Chen. All rights reserved.
+ * Copyright (c) 2016 by Delphix. All rights reserved.
+ */
+
+/*
+ * ARC buffer data (ABD).
+ *
+ * ABDs are an abstract data structure for the ARC which can use two
+ * different ways of storing the underlying data:
+ *
+ * (a) Linear buffer. In this case, all the data in the ABD is stored in one
+ *     contiguous buffer in memory (from a zio_[data_]buf_* kmem cache).
+ *
+ *         +-------------------+
+ *         | ABD (linear)      |
+ *         |   abd_flags = ... |
+ *         |   abd_size = ...  |     +--------------------------------+
+ *         |   abd_buf ------------->| raw buffer of size abd_size    |
+ *         +-------------------+     +--------------------------------+
+ *              no abd_chunks
+ *
+ * (b) Scattered buffer. In this case, the data in the ABD is split into
+ *     equal-sized chunks (from the abd_chunk_cache kmem_cache), with pointers
+ *     to the chunks recorded in an array at the end of the ABD structure.
+ *
+ *         +-------------------+
+ *         | ABD (scattered)   |
+ *         |   abd_flags = ... |
+ *         |   abd_size = ...  |
+ *         |   abd_offset = 0  |                           +-----------+
+ *         |   abd_chunks[0] ----------------------------->| chunk 0   |
+ *         |   abd_chunks[1] ---------------------+        +-----------+
+ *         |   ...             |                  |        +-----------+
+ *         |   abd_chunks[N-1] ---------+         +------->| chunk 1   |
+ *         +-------------------+        |                  +-----------+
+ *                                      |                      ...
+ *                                      |                  +-----------+
+ *                                      +----------------->| chunk N-1 |
+ *                                                         +-----------+
+ *
+ * Using a large proportion of scattered ABDs decreases ARC fragmentation since
+ * when we are at the limit of allocatable space, using equal-size chunks will
+ * allow us to quickly reclaim enough space for a new large allocation (assuming
+ * it is also scattered).
+ *
+ * In addition to directly allocating a linear or scattered ABD, it is also
+ * possible to create an ABD by requesting the "sub-ABD" starting at an offset
+ * within an existing ABD. In linear buffers this is simple (set abd_buf of
+ * the new ABD to the starting point within the original raw buffer), but
+ * scattered ABDs are a little more complex. The new ABD makes a copy of the
+ * relevant abd_chunks pointers (but not the underlying data). However, to
+ * provide arbitrary rather than only chunk-aligned starting offsets, it also
+ * tracks an abd_offset field which represents the starting point of the data
+ * within the first chunk in abd_chunks. For both linear and scattered ABDs,
+ * creating an offset ABD marks the original ABD as the offset's parent, and the
+ * original ABD's abd_children refcount is incremented. This data allows us to
+ * ensure the root ABD isn't deleted before its children.
+ *
+ * Most consumers should never need to know what type of ABD they're using --
+ * the ABD public API ensures that it's possible to transparently switch from
+ * using a linear ABD to a scattered one when doing so would be beneficial.
+ *
+ * If you need to use the data within an ABD directly, if you know it's linear
+ * (because you allocated it) you can use abd_to_buf() to access the underlying
+ * raw buffer. Otherwise, you should use one of the abd_borrow_buf* functions
+ * which will allocate a raw buffer if necessary. Use the abd_return_buf*
+ * functions to return any raw buffers that are no longer necessary when you're
+ * done using them.
+ *
+ * There are a variety of ABD APIs that implement basic buffer operations:
+ * compare, copy, read, write, and fill with zeroes. If you need a custom
+ * function which progressively accesses the whole ABD, use the abd_iterate_*
+ * functions.
+ */
+
+#include <sys/abd.h>
+#include <sys/param.h>
+#include <sys/zio.h>
+#include <sys/zfs_context.h>
+#include <sys/zfs_znode.h>
+
+typedef struct abd_stats {
+	kstat_named_t abdstat_struct_size;
+	kstat_named_t abdstat_scatter_cnt;
+	kstat_named_t abdstat_scatter_data_size;
+	kstat_named_t abdstat_scatter_chunk_waste;
+	kstat_named_t abdstat_linear_cnt;
+	kstat_named_t abdstat_linear_data_size;
+} abd_stats_t;
+
+static abd_stats_t abd_stats = {
+	/* Amount of memory occupied by all of the abd_t struct allocations */
+	{ "struct_size",			KSTAT_DATA_UINT64 },
+	/*
+	 * The number of scatter ABDs which are currently allocated, excluding
+	 * ABDs which don't own their data (for instance the ones which were
+	 * allocated through abd_get_offset()).
+	 */
+	{ "scatter_cnt",			KSTAT_DATA_UINT64 },
+	/* Amount of data stored in all scatter ABDs tracked by scatter_cnt */
+	{ "scatter_data_size",			KSTAT_DATA_UINT64 },
+	/*
+	 * The amount of space wasted at the end of the last chunk across all
+	 * scatter ABDs tracked by scatter_cnt.
+	 */
+	{ "scatter_chunk_waste",		KSTAT_DATA_UINT64 },
+	/*
+	 * The number of linear ABDs which are currently allocated, excluding
+	 * ABDs which don't own their data (for instance the ones which were
+	 * allocated through abd_get_offset() and abd_get_from_buf()). If an
+	 * ABD takes ownership of its buf then it will become tracked.
+	 */
+	{ "linear_cnt",				KSTAT_DATA_UINT64 },
+	/* Amount of data stored in all linear ABDs tracked by linear_cnt */
+	{ "linear_data_size",			KSTAT_DATA_UINT64 },
+};
+
+#define	ABDSTAT(stat)		(abd_stats.stat.value.ui64)
+#define	ABDSTAT_INCR(stat, val) \
+	atomic_add_64(&abd_stats.stat.value.ui64, (val))
+#define	ABDSTAT_BUMP(stat)	ABDSTAT_INCR(stat, 1)
+#define	ABDSTAT_BUMPDOWN(stat)	ABDSTAT_INCR(stat, -1)
+
+/*
+ * It is possible to make all future ABDs be linear by setting this to B_FALSE.
+ * Otherwise, ABDs are allocated scattered by default unless the caller uses
+ * abd_alloc_linear().
+ */
+boolean_t zfs_abd_scatter_enabled = B_TRUE;
+
+/*
+ * The size of the chunks ABD allocates. Because the sizes allocated from the
+ * kmem_cache can't change, this tunable can only be modified at boot. Changing
+ * it at runtime would cause ABD iteration to work incorrectly for ABDs which
+ * were allocated with the old size, so a safeguard has been put in place which
+ * will cause the machine to panic if you change it and try to access the data
+ * within a scattered ABD.
+ */
+size_t zfs_abd_chunk_size = 4096;
+
+#ifdef _KERNEL
+extern vmem_t *zio_alloc_arena;
+#endif
+
+kmem_cache_t *abd_chunk_cache;
+static kstat_t *abd_ksp;
+
+static void *
+abd_alloc_chunk()
+{
+	void *c = kmem_cache_alloc(abd_chunk_cache, KM_PUSHPAGE);
+	ASSERT3P(c, !=, NULL);
+	return (c);
+}
+
+static void
+abd_free_chunk(void *c)
+{
+	kmem_cache_free(abd_chunk_cache, c);
+}
+
+void
+abd_init(void)
+{
+#ifdef illumos
+	vmem_t *data_alloc_arena = NULL;
+
+#ifdef _KERNEL
+	data_alloc_arena = zio_alloc_arena;
+#endif
+
+	/*
+	 * Since ABD chunks do not appear in crash dumps, we pass KMC_NOTOUCH
+	 * so that no allocator metadata is stored with the buffers.
+	 */
+	abd_chunk_cache = kmem_cache_create("abd_chunk", zfs_abd_chunk_size, 0,
+	    NULL, NULL, NULL, NULL, data_alloc_arena, KMC_NOTOUCH);
+#else
+	abd_chunk_cache = kmem_cache_create("abd_chunk", zfs_abd_chunk_size, 0,
+	    NULL, NULL, NULL, NULL, 0, KMC_NOTOUCH | KMC_NODEBUG);
+#endif
+	abd_ksp = kstat_create("zfs", 0, "abdstats", "misc", KSTAT_TYPE_NAMED,
+	    sizeof (abd_stats) / sizeof (kstat_named_t), KSTAT_FLAG_VIRTUAL);
+	if (abd_ksp != NULL) {
+		abd_ksp->ks_data = &abd_stats;
+		kstat_install(abd_ksp);
+	}
+}
+
+void
+abd_fini(void)
+{
+	if (abd_ksp != NULL) {
+		kstat_delete(abd_ksp);
+		abd_ksp = NULL;
+	}
+
+	kmem_cache_destroy(abd_chunk_cache);
+	abd_chunk_cache = NULL;
+}
+
+static inline size_t
+abd_chunkcnt_for_bytes(size_t size)
+{
+	return (P2ROUNDUP(size, zfs_abd_chunk_size) / zfs_abd_chunk_size);
+}
+
+static inline size_t
+abd_scatter_chunkcnt(abd_t *abd)
+{
+	ASSERT(!abd_is_linear(abd));
+	return (abd_chunkcnt_for_bytes(
+	    abd->abd_u.abd_scatter.abd_offset + abd->abd_size));
+}
+
+static inline void
+abd_verify(abd_t *abd)
+{
+	ASSERT3U(abd->abd_size, >, 0);
+	ASSERT3U(abd->abd_size, <=, SPA_MAXBLOCKSIZE);
+	ASSERT3U(abd->abd_flags, ==, abd->abd_flags & (ABD_FLAG_LINEAR |
+	    ABD_FLAG_OWNER | ABD_FLAG_META));
+	IMPLY(abd->abd_parent != NULL, !(abd->abd_flags & ABD_FLAG_OWNER));
+	IMPLY(abd->abd_flags & ABD_FLAG_META, abd->abd_flags & ABD_FLAG_OWNER);
+	if (abd_is_linear(abd)) {
+		ASSERT3P(abd->abd_u.abd_linear.abd_buf, !=, NULL);
+	} else {
+		ASSERT3U(abd->abd_u.abd_scatter.abd_offset, <,
+		    zfs_abd_chunk_size);
+		size_t n = abd_scatter_chunkcnt(abd);
+		for (int i = 0; i < n; i++) {
+			ASSERT3P(
+			    abd->abd_u.abd_scatter.abd_chunks[i], !=, NULL);
+		}
+	}
+}
+
+static inline abd_t *
+abd_alloc_struct(size_t chunkcnt)
+{
+	size_t size = offsetof(abd_t, abd_u.abd_scatter.abd_chunks[chunkcnt]);
+	abd_t *abd = kmem_alloc(size, KM_PUSHPAGE);
+	ASSERT3P(abd, !=, NULL);
+	ABDSTAT_INCR(abdstat_struct_size, size);
+
+	return (abd);
+}
+
+static inline void
+abd_free_struct(abd_t *abd)
+{
+	size_t chunkcnt = abd_is_linear(abd) ? 0 : abd_scatter_chunkcnt(abd);
+	int size = offsetof(abd_t, abd_u.abd_scatter.abd_chunks[chunkcnt]);
+	kmem_free(abd, size);
+	ABDSTAT_INCR(abdstat_struct_size, -size);
+}
+
+/*
+ * Allocate an ABD, along with its own underlying data buffers. Use this if you
+ * don't care whether the ABD is linear or not.
+ */
+abd_t *
+abd_alloc(size_t size, boolean_t is_metadata)
+{
+	if (!zfs_abd_scatter_enabled)
+		return (abd_alloc_linear(size, is_metadata));
+
+	VERIFY3U(size, <=, SPA_MAXBLOCKSIZE);
+
+	size_t n = abd_chunkcnt_for_bytes(size);
+	abd_t *abd = abd_alloc_struct(n);
+
+	abd->abd_flags = ABD_FLAG_OWNER;
+	if (is_metadata) {
+		abd->abd_flags |= ABD_FLAG_META;
+	}
+	abd->abd_size = size;
+	abd->abd_parent = NULL;
+	refcount_create(&abd->abd_children);
+
+	abd->abd_u.abd_scatter.abd_offset = 0;
+	abd->abd_u.abd_scatter.abd_chunk_size = zfs_abd_chunk_size;
+
+	for (int i = 0; i < n; i++) {
+		void *c = abd_alloc_chunk();
+		ASSERT3P(c, !=, NULL);
+		abd->abd_u.abd_scatter.abd_chunks[i] = c;
+	}
+
+	ABDSTAT_BUMP(abdstat_scatter_cnt);
+	ABDSTAT_INCR(abdstat_scatter_data_size, size);
+	ABDSTAT_INCR(abdstat_scatter_chunk_waste,
+	    n * zfs_abd_chunk_size - size);
+
+	return (abd);
+}
+
+static void
+abd_free_scatter(abd_t *abd)
+{
+	size_t n = abd_scatter_chunkcnt(abd);
+	for (int i = 0; i < n; i++) {
+		abd_free_chunk(abd->abd_u.abd_scatter.abd_chunks[i]);
+	}
+
+	refcount_destroy(&abd->abd_children);
+	ABDSTAT_BUMPDOWN(abdstat_scatter_cnt);
+	ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size);
+	ABDSTAT_INCR(abdstat_scatter_chunk_waste,
+	    abd->abd_size - n * zfs_abd_chunk_size);
+
+	abd_free_struct(abd);
+}
+
+/*
+ * Allocate an ABD that must be linear, along with its own underlying data
+ * buffer. Only use this when it would be very annoying to write your ABD
+ * consumer with a scattered ABD.
+ */
+abd_t *
+abd_alloc_linear(size_t size, boolean_t is_metadata)
+{
+	abd_t *abd = abd_alloc_struct(0);
+
+	VERIFY3U(size, <=, SPA_MAXBLOCKSIZE);
+
+	abd->abd_flags = ABD_FLAG_LINEAR | ABD_FLAG_OWNER;
+	if (is_metadata) {
+		abd->abd_flags |= ABD_FLAG_META;
+	}
+	abd->abd_size = size;
+	abd->abd_parent = NULL;
+	refcount_create(&abd->abd_children);
+
+	if (is_metadata) {
+		abd->abd_u.abd_linear.abd_buf = zio_buf_alloc(size);
+	} else {
+		abd->abd_u.abd_linear.abd_buf = zio_data_buf_alloc(size);
+	}
+
+	ABDSTAT_BUMP(abdstat_linear_cnt);
+	ABDSTAT_INCR(abdstat_linear_data_size, size);
+
+	return (abd);
+}
+
+static void
+abd_free_linear(abd_t *abd)
+{
+	if (abd->abd_flags & ABD_FLAG_META) {
+		zio_buf_free(abd->abd_u.abd_linear.abd_buf, abd->abd_size);
+	} else {
+		zio_data_buf_free(abd->abd_u.abd_linear.abd_buf, abd->abd_size);
+	}
+
+	refcount_destroy(&abd->abd_children);
+	ABDSTAT_BUMPDOWN(abdstat_linear_cnt);
+	ABDSTAT_INCR(abdstat_linear_data_size, -(int)abd->abd_size);
+
+	abd_free_struct(abd);
+}
+
+/*
+ * Free an ABD. Only use this on ABDs allocated with abd_alloc() or
+ * abd_alloc_linear().
+ */
+void
+abd_free(abd_t *abd)
+{
+	abd_verify(abd);
+	ASSERT3P(abd->abd_parent, ==, NULL);
+	ASSERT(abd->abd_flags & ABD_FLAG_OWNER);
+	if (abd_is_linear(abd))
+		abd_free_linear(abd);
+	else
+		abd_free_scatter(abd);
+}
+
+/*
+ * Allocate an ABD of the same format (same metadata flag, same scatterize
+ * setting) as another ABD.
+ */
+abd_t *
+abd_alloc_sametype(abd_t *sabd, size_t size)
+{
+	boolean_t is_metadata = (sabd->abd_flags & ABD_FLAG_META) != 0;
+	if (abd_is_linear(sabd)) {
+		return (abd_alloc_linear(size, is_metadata));
+	} else {
+		return (abd_alloc(size, is_metadata));
+	}
+}
+
+/*
+ * If we're going to use this ABD for doing I/O using the block layer, the
+ * consumer of the ABD data doesn't care if it's scattered or not, and we don't
+ * plan to store this ABD in memory for a long period of time, we should
+ * allocate the ABD type that requires the least data copying to do the I/O.
+ *
+ * Currently this is linear ABDs, however if ldi_strategy() can ever issue I/Os
+ * using a scatter/gather list we should switch to that and replace this call
+ * with vanilla abd_alloc().
+ */
+abd_t *
+abd_alloc_for_io(size_t size, boolean_t is_metadata)
+{
+	return (abd_alloc_linear(size, is_metadata));
+}
+
+/*
+ * Allocate a new ABD to point to offset off of sabd. It shares the underlying
+ * buffer data with sabd. Use abd_put() to free. sabd must not be freed while
+ * any derived ABDs exist.
+ */
+abd_t *
+abd_get_offset(abd_t *sabd, size_t off)
+{
+	abd_t *abd;
+
+	abd_verify(sabd);
+	ASSERT3U(off, <=, sabd->abd_size);

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201707271025.v6RAPIAj015418>