Date:      Tue, 20 Jun 2017 16:29:47 -0400
From:      Ken Merry <ken@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...
Message-ID:  <81F84BCA-E973-4D78-B81C-1D398ADFA47E@freebsd.org>
In-Reply-To: <201706201739.v5KHdPhO051256@repo.freebsd.org>
References:  <201706201739.v5KHdPhO051256@repo.freebsd.org>

I don’t know for sure that this commit is the cause, but it and r320153 are
the only ZFS commits between a version of head from June 14th that boots off
a ZFS mirror and one that panics.

Here’s the stack trace:

Fatal trap 12: page fault while in kernel mode
cpuid = 22; 

Fatal trap 12: page fault while in kernel mode
cpuid = 9; apic id = 09
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81e47f21
stack pointer           = 0x28:0xfffffe08b37f8810
frame pointer           = 0x28:0xfffffe08b37f8860
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_free_issue_0_3)
[ thread pid 0 tid 100478 ]
Stopped at      0xffffffff81e47f21 = zio_vdev_io_start+0x1f1:   testb   $0x1,(%rax)
db> bt
Tracing pid 0 tid 100478 td 0xfffff80193156000
zio_vdev_io_start() at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f88b0
zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b37f88e0
vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930
zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f89e0
taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40
taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70
fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b37f8ab0
fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b37f8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>

(kgdb) list *(zio_vdev_io_start+0x1f1)
0xd9f21 is in zio_vdev_io_start (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:350).
345
346             /*
347              * Ensure that anyone expecting this zio to contain a linear ABD isn't
348              * going to get a nasty surprise when they try to access the data.
349              */
350             IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data));
351
352             zt->zt_orig_abd = zio->io_abd;
353             zt->zt_orig_size = zio->io_size;
354             zt->zt_bufsize = bufsize;
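
One possible reading of the trace, offered as a hypothesis and not a confirmed
diagnosis: the fault address is 0x0 and the faulting instruction is
`testb $0x1,(%rax)`, which is what an inlined flag test on a NULL ABD pointer
would look like. The faulting thread is zio_free_issue, and a free zio may
carry no data buffer at all. The sketch below only illustrates an implication
check that tolerates a missing buffer. All mock_* names and the struct layout
(flags as the first member) are assumptions made for illustration; this is not
the code in zio.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ABD flag bit; the value is an assumption. */
enum { MOCK_ABD_FLAG_LINEAR = 1 << 0 };

/* Hypothetical stand-in for abd_t; the layout is an assumption. */
struct mock_abd {
	unsigned int	flags;	/* first member: a NULL abd is read at address 0 */
	size_t		size;
};

/* Unguarded form: dereferences abd unconditionally, so NULL would fault. */
static bool
mock_abd_is_linear(const struct mock_abd *abd)
{
	return ((abd->flags & MOCK_ABD_FLAG_LINEAR) != 0);
}

/*
 * Guarded implication: "if the original buffer is linear, the replacement
 * must be linear too".  A zio without a buffer has nothing to check.
 */
static bool
mock_linear_expectation_holds(const struct mock_abd *orig,
    const struct mock_abd *repl)
{
	if (orig == NULL)
		return (true);
	assert(repl != NULL);
	return (!mock_abd_is_linear(orig) || mock_abd_is_linear(repl));
}
```

With this shape, a zio whose original buffer pointer is NULL skips the check
instead of reading through the NULL pointer.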

I’ll try rebooting to see whether the panic recurs.  If it does, I’ll roll
back the ABD change and test again.

Ken
-- 
Ken Merry
ken@FreeBSD.ORG



> On Jun 20, 2017, at 1:39 PM, Andriy Gapon <avg@freebsd.org> wrote:
>
> Author: avg
> Date: Tue Jun 20 17:39:24 2017
> New Revision: 320156
> URL: https://svnweb.freebsd.org/changeset/base/320156
>
> Log:
>  MFV r318946: 8021 ARC buf data scatter-ization
>
>  illumos/illumos-gate@770499e185d15678ccb0be57ebc626ad18d93383
>  https://github.com/illumos/illumos-gate/commit/770499e185d15678ccb0be57ebc626ad18d93383
>
>  https://www.illumos.org/issues/8021
>    The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
>    community) changes the way the ARC allocates `b_pdata` memory from using linear
>    `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
>    improves ZFS's performance by helping to defragment the address space occupied
>    by the ARC, in particular for cases where compressed ARC is enabled. It could
>    also ease future work to allocate pages directly from `segkpm` for
>    minimal-overhead memory allocations, bypassing the `kmem` subsystem.
>    This is essentially the same change as the one which recently landed in ZFS on
>    Linux, although they made some platform-specific changes while adapting this
>    work to their codebase:
>    1. Implemented the equivalent of the `segkpm` suggestion for future work
>    mentioned above to bypass issues that they've had with the Linux kernel memory
>    allocator.
>    2. Changed the internal representation of the ABD's scatter/gather list so it
>    could be used to pass I/O directly into Linux block device drivers. (This
>    feature is not available in the illumos block device interface yet.)
>
>  FreeBSD notes:
>  - the actual (default) chunk size is 4KB (despite the text above saying 1KB)
>  - we can try to reimplement ABDs, so that they are not permanently
>    mapped into the KVA unless explicitly requested, especially on
>    platforms with scarce KVA
>  - we can try to use unmapped I/O and avoid intermediate allocation of a
>    linear, virtual memory mapped buffer
>  - we can try to avoid extra data copying by referring to chunks / pages
>    in the original ABD
>
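
To make the log text above concrete, here is a minimal user-space sketch of
the idea: a buffer stored as a list of fixed-size chunks, plus an iterator
that hands each contiguous piece to a callback, in the spirit of the
abd_iterate_func() calls in the diff below. Every sketch_* name is
hypothetical, the 4096-byte chunk size is taken from the FreeBSD note, and
none of this is the real abd.c implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define	SKETCH_CHUNK_SIZE	4096	/* assumed chunk size (FreeBSD note) */

/* Callback shape: one contiguous piece at a time; nonzero stops iteration. */
typedef int sketch_iter_func_t(void *buf, size_t len, void *priv);

struct sketch_sgbuf {
	size_t		size;		/* logical size in bytes */
	size_t		nchunks;	/* number of fixed-size chunks */
	unsigned char	**chunks;	/* chunk pointers, not contiguous */
};

static void
sketch_free(struct sketch_sgbuf *sg)
{
	if (sg == NULL)
		return;
	for (size_t i = 0; sg->chunks != NULL && i < sg->nchunks; i++)
		free(sg->chunks[i]);
	free(sg->chunks);
	free(sg);
}

/* Allocate a zero-filled scatter buffer of the given logical size. */
static struct sketch_sgbuf *
sketch_alloc(size_t size)
{
	struct sketch_sgbuf *sg = calloc(1, sizeof (*sg));

	if (sg == NULL)
		return (NULL);
	sg->size = size;
	sg->nchunks = (size + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
	sg->chunks = calloc(sg->nchunks != 0 ? sg->nchunks : 1,
	    sizeof (*sg->chunks));
	if (sg->chunks == NULL) {
		free(sg);
		return (NULL);
	}
	for (size_t i = 0; i < sg->nchunks; i++) {
		sg->chunks[i] = calloc(1, SKETCH_CHUNK_SIZE);
		if (sg->chunks[i] == NULL) {
			sketch_free(sg);
			return (NULL);
		}
	}
	return (sg);
}

/* Call func on each contiguous piece of the range [off, off + len). */
static int
sketch_iterate(struct sketch_sgbuf *sg, size_t off, size_t len,
    sketch_iter_func_t *func, void *priv)
{
	assert(off <= sg->size && len <= sg->size - off);
	while (len > 0) {
		size_t idx = off / SKETCH_CHUNK_SIZE;
		size_t in = off % SKETCH_CHUNK_SIZE;
		size_t n = SKETCH_CHUNK_SIZE - in;
		int ret;

		if (n > len)
			n = len;
		ret = func(sg->chunks[idx] + in, n, priv);
		if (ret != 0)
			return (ret);
		off += n;
		len -= n;
	}
	return (0);
}

/* Example callback: fill the piece with the byte pointed to by priv. */
static int
sketch_fill_cb(void *buf, size_t len, void *priv)
{
	(void) memset(buf, *(unsigned char *)priv, len);
	return (0);
}

/* Example callback: count how many pieces and bytes were visited. */
struct sketch_tally { size_t calls; size_t bytes; };

static int
sketch_tally_cb(void *buf, size_t len, void *priv)
{
	struct sketch_tally *t = priv;

	(void) buf;
	t->calls++;
	t->bytes += len;
	return (0);
}
```

The point of the callback style is that consumers never need one linear
mapping of the whole buffer; code that does need a flat pointer has to ask
for a linear allocation or copy out, which is the distinction the zdb changes
below make between abd_alloc_linear()/abd_to_buf() and abd_copy_to_buf().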
>  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
>  Reviewed by: George Wilson <george.wilson@delphix.com>
>  Reviewed by: Paul Dagnelie <pcd@delphix.com>
>  Reviewed by: John Kennedy <john.kennedy@delphix.com>
>  Reviewed by: Prakash Surya <prakash.surya@delphix.com>
>  Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
>  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
>  Reviewed by: Chris Williamson <chris.williamson@delphix.com>
>  Approved by: Richard Lowe <richlowe@richlowe.net>
>  Author: Dan Kimmel <dan.kimmel@delphix.com>
>
>  MFC after:	3 weeks
>
> Added:
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c
>     - copied, changed from r318946, vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/abd.h
>     - copied, changed from r318946, vendor-sys/illumos/dist/uts/common/fs/zfs/sys/abd.h
> Modified:
>  head/cddl/contrib/opensolaris/cmd/zdb/zdb.c
>  head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c
>  head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
>  head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
>  head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c
>  head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h
>  head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/blkptr.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/edonr_zfs.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sha256.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/skein_zfs.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_checksum.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c
>  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c
>  head/sys/conf/files
> Directory Properties:
>  head/cddl/contrib/opensolaris/   (props changed)
>  head/cddl/contrib/opensolaris/cmd/zdb/   (props changed)
>  head/cddl/contrib/opensolaris/lib/libzfs/   (props changed)
>  head/sys/cddl/contrib/opensolaris/   (props changed)
>
> Modified: head/cddl/contrib/opensolaris/cmd/zdb/zdb.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -59,6 +59,7 @@
> #include <sys/arc.h>
> #include <sys/ddt.h>
> #include <sys/zfeature.h>
> +#include <sys/abd.h>
> #include <zfs_comutil.h>
> #undef verify
> #include <libzfs.h>
> @@ -2410,7 +2411,7 @@ zdb_blkptr_done(zio_t *zio)
> 	zdb_cb_t *zcb = zio->io_private;
> 	zbookmark_phys_t *zb = &zio->io_bookmark;
>
> -	zio_data_buf_free(zio->io_data, zio->io_size);
> +	abd_free(zio->io_abd);
>
> 	mutex_enter(&spa->spa_scrub_lock);
> 	spa->spa_scrub_inflight--;
> @@ -2477,7 +2478,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr
> 	if (!BP_IS_EMBEDDED(bp) &&
> 	    (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata))) {
> 		size_t size = BP_GET_PSIZE(bp);
> -		void *data = zio_data_buf_alloc(size);
> +		abd_t *abd = abd_alloc(size, B_FALSE);
> 		int flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_SCRUB | ZIO_FLAG_RAW;
>
> 		/* If it's an intent log block, failure is expected. */
> @@ -2490,7 +2491,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr
> 		spa->spa_scrub_inflight++;
> 		mutex_exit(&spa->spa_scrub_lock);
>
> -		zio_nowait(zio_read(NULL, spa, bp, data, size,
> +		zio_nowait(zio_read(NULL, spa, bp, abd, size,
> 		    zdb_blkptr_done, zcb, ZIO_PRIORITY_ASYNC_READ, flags, zb));
> 	}
>
> @@ -3270,6 +3271,13 @@ name:
> 	return (NULL);
> }
>
> +/* ARGSUSED */
> +static int
> +random_get_pseudo_bytes_cb(void *buf, size_t len, void *unused)
> +{
> +	return (random_get_pseudo_bytes(buf, len));
> +}
> +
> /*
>  * Read a block from a pool and print it out.  The syntax of the
>  * block descriptor is:
> @@ -3301,7 +3309,8 @@ zdb_read_block(char *thing, spa_t *spa)
> 	uint64_t offset = 0, size = 0, psize = 0, lsize = 0, blkptr_offset = 0;
> 	zio_t *zio;
> 	vdev_t *vd;
> -	void *pbuf, *lbuf, *buf;
> +	abd_t *pabd;
> +	void *lbuf, *buf;
> 	char *s, *p, *dup, *vdev, *flagstr;
> 	int i, error;
>
> @@ -3373,7 +3382,7 @@ zdb_read_block(char *thing, spa_t *spa)
> 	psize = size;
> 	lsize = size;
>
> -	pbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
> +	pabd = abd_alloc_linear(SPA_MAXBLOCKSIZE, B_FALSE);
> 	lbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
>
> 	BP_ZERO(bp);
> @@ -3401,15 +3410,15 @@ zdb_read_block(char *thing, spa_t *spa)
> 		/*
> 		 * Treat this as a normal block read.
> 		 */
> -		zio_nowait(zio_read(zio, spa, bp, pbuf, psize, NULL, NULL,
> +		zio_nowait(zio_read(zio, spa, bp, pabd, psize, NULL, NULL,
> 		    ZIO_PRIORITY_SYNC_READ,
> 		    ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL));
> 	} else {
> 		/*
> 		 * Treat this as a vdev child I/O.
> 		 */
> -		zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pbuf, psize,
> -		    ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
> +		zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pabd,
> +		    psize, ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
> 		    ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_QUEUE |
> 		    ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY |
> 		    ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL, NULL));
> @@ -3432,21 +3441,21 @@ zdb_read_block(char *thing, spa_t *spa)
> 		void *pbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
> 		void *lbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
>
> -		bcopy(pbuf, pbuf2, psize);
> +		abd_copy_to_buf(pbuf2, pabd, psize);
>
> -		VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf + psize,
> -		    SPA_MAXBLOCKSIZE - psize) == 0);
> +		VERIFY0(abd_iterate_func(pabd, psize, SPA_MAXBLOCKSIZE - psize,
> +		    random_get_pseudo_bytes_cb, NULL));
>
> -		VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
> -		    SPA_MAXBLOCKSIZE - psize) == 0);
> +		VERIFY0(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
> +		    SPA_MAXBLOCKSIZE - psize));
>
> 		for (lsize = SPA_MAXBLOCKSIZE; lsize > psize;
> 		    lsize -= SPA_MINBLOCKSIZE) {
> 			for (c = 0; c < ZIO_COMPRESS_FUNCTIONS; c++) {
> -				if (zio_decompress_data(c, pbuf, lbuf,
> -				    psize, lsize) == 0 &&
> -				    zio_decompress_data(c, pbuf2, lbuf2,
> -				    psize, lsize) == 0 &&
> +				if (zio_decompress_data(c, pabd,
> +				    lbuf, psize, lsize) == 0 &&
> +				    zio_decompress_data_buf(c, pbuf2,
> +				    lbuf2, psize, lsize) == 0 &&
> 				    bcmp(lbuf, lbuf2, lsize) == 0)
> 					break;
> 			}
> @@ -3465,7 +3474,7 @@ zdb_read_block(char *thing, spa_t *spa)
> 		buf = lbuf;
> 		size = lsize;
> 	} else {
> -		buf = pbuf;
> +		buf = abd_to_buf(pabd);
> 		size = psize;
> 	}
>
> @@ -3483,7 +3492,7 @@ zdb_read_block(char *thing, spa_t *spa)
> 		zdb_dump_block(thing, buf, size, flags);
>
> out:
> -	umem_free(pbuf, SPA_MAXBLOCKSIZE);
> +	abd_free(pabd);
> 	umem_free(lbuf, SPA_MAXBLOCKSIZE);
> 	free(dup);
> }
>
> Modified: head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -24,7 +24,7 @@
>  */
>
> /*
> - * Copyright (c) 2013, 2014 by Delphix. All rights reserved.
> + * Copyright (c) 2013, 2016 by Delphix. All rights reserved.
>  */
>
> /*
> @@ -41,6 +41,7 @@
> #include <sys/resource.h>
> #include <sys/zil.h>
> #include <sys/zil_impl.h>
> +#include <sys/abd.h>
>
> extern uint8_t dump_opt[256];
>
> @@ -117,13 +118,27 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, lr_rena
> }
>
> /* ARGSUSED */
> +static int
> +zil_prt_rec_write_cb(void *data, size_t len, void *unused)
> +{
> +	char *cdata = data;
> +	for (int i = 0; i < len; i++) {
> +		if (isprint(*cdata))
> +			(void) printf("%c ", *cdata);
> +		else
> +			(void) printf("%2X", *cdata);
> +		cdata++;
> +	}
> +	return (0);
> +}
> +
> +/* ARGSUSED */
> static void
> zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr)
> {
> -	char *data, *dlimit;
> +	abd_t *data;
> 	blkptr_t *bp = &lr->lr_blkptr;
> 	zbookmark_phys_t zb;
> -	char buf[SPA_MAXBLOCKSIZE];
> 	int verbose = MAX(dump_opt['d'], dump_opt['i']);
> 	int error;
>
> @@ -144,7 +159,6 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write
> 		if (BP_IS_HOLE(bp)) {
> 			(void) printf("\t\t\tLSIZE 0x%llx\n",
> 			    (u_longlong_t)BP_GET_LSIZE(bp));
> -			bzero(buf, sizeof (buf));
> 			(void) printf("%s<hole>\n", prefix);
> 			return;
> 		}
> @@ -157,28 +171,26 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write
> 		    lr->lr_foid, ZB_ZIL_LEVEL,
> 		    lr->lr_offset / BP_GET_LSIZE(bp));
>
> +		data = abd_alloc(BP_GET_LSIZE(bp), B_FALSE);
> 		error = zio_wait(zio_read(NULL, zilog->zl_spa,
> -		    bp, buf, BP_GET_LSIZE(bp), NULL, NULL,
> +		    bp, data, BP_GET_LSIZE(bp), NULL, NULL,
> 		    ZIO_PRIORITY_SYNC_READ, ZIO_FLAG_CANFAIL, &zb));
> 		if (error)
> -			return;
> -		data = buf;
> +			goto out;
> 	} else {
> -		data = (char *)(lr + 1);
> +		/* data is stored after the end of the lr_write record */
> +		data = abd_alloc(lr->lr_length, B_FALSE);
> +		abd_copy_from_buf(data, lr + 1, lr->lr_length);
> 	}
>
> -	dlimit = data + MIN(lr->lr_length,
> -	    (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE));
> -
> 	(void) printf("%s", prefix);
> -	while (data < dlimit) {
> -		if (isprint(*data))
> -			(void) printf("%c ", *data);
> -		else
> -			(void) printf("%2X", *data);
> -		data++;
> -	}
> +	(void) abd_iterate_func(data,
> +	    0, MIN(lr->lr_length, (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE)),
> +	    zil_prt_rec_write_cb, NULL);
> 	(void) printf("\n");
> +
> +out:
> +	abd_free(data);
> }
>
> /* ARGSUSED */
>
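
A side note on the per-chunk print callback in the hunk above, as a sketch
and not a proposed patch: it walks the data through a plain `char *`, and
where `char` is signed a byte at or above 0x80 becomes a negative value,
which is outside what isprint() is defined for and prints sign-extended
under %X. The hypothetical sketch_format_bytes() below shows the same
"printable, else hex" formatting over `unsigned char`, writing into a
caller-supplied buffer so the result can be inspected. The `%02X` zero
padding is a choice made in this sketch; the diff uses `%2X`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Format len bytes of data into out (capacity outsz, always
 * NUL-terminated): printable bytes as "c ", others as two hex digits.
 * Returns the number of characters written, not counting the NUL.
 * Hypothetical helper; not part of zdb.
 */
static size_t
sketch_format_bytes(const void *data, size_t len, char *out, size_t outsz)
{
	const unsigned char *p = data;	/* unsigned: safe for isprint() */
	size_t used = 0;

	assert(out != NULL && outsz > 0);
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		int n;

		if (isprint(p[i]))
			n = snprintf(out + used, outsz - used, "%c ", p[i]);
		else
			n = snprintf(out + used, outsz - used, "%02X", p[i]);
		if (n < 0 || (size_t)n >= outsz - used) {
			out[used] = '\0';	/* drop a truncated piece */
			break;
		}
		used += (size_t)n;
	}
	return (used);
}
```

In the default "C" locale, the three bytes 'A', 0x80, 0x07 format as
"A 8007" with this helper.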
> Modified: head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -112,6 +112,7 @@
> #include <sys/refcount.h>
> #include <sys/zfeature.h>
> #include <sys/dsl_userhold.h>
> +#include <sys/abd.h>
> #include <stdio.h>
> #include <stdio_ext.h>
> #include <stdlib.h>
> @@ -190,6 +191,7 @@ extern uint64_t metaslab_df_alloc_threshold;
> extern uint64_t zfs_deadman_synctime_ms;
> extern int metaslab_preload_limit;
> extern boolean_t zfs_compressed_arc_enabled;
> +extern boolean_t zfs_abd_scatter_enabled;
>
> static ztest_shared_opts_t *ztest_shared_opts;
> static ztest_shared_opts_t ztest_opts;
> @@ -5042,7 +5044,7 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id)
> 	enum zio_checksum checksum = spa_dedup_checksum(spa);
> 	dmu_buf_t *db;
> 	dmu_tx_t *tx;
> -	void *buf;
> +	abd_t *abd;
> 	blkptr_t blk;
> 	int copies = 2 * ZIO_DEDUPDITTO_MIN;
>
> @@ -5122,14 +5124,14 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id)
> 	 * Damage the block.  Dedup-ditto will save us when we read it later.
> 	 */
> 	psize = BP_GET_PSIZE(&blk);
> -	buf = zio_buf_alloc(psize);
> -	ztest_pattern_set(buf, psize, ~pattern);
> +	abd = abd_alloc_linear(psize, B_TRUE);
> +	ztest_pattern_set(abd_to_buf(abd), psize, ~pattern);
>
> 	(void) zio_wait(zio_rewrite(NULL, spa, 0, &blk,
> -	    buf, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE,
> +	    abd, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE,
> 	    ZIO_FLAG_CANFAIL | ZIO_FLAG_INDUCE_DAMAGE, NULL));
>
> -	zio_buf_free(buf, psize);
> +	abd_free(abd);
>
> 	(void) rw_unlock(&ztest_name_lock);
> }
> @@ -5413,6 +5415,12 @@ ztest_resume_thread(void *arg)
> 		 */
> 		if (ztest_random(10) == 0)
> 			zfs_compressed_arc_enabled = ztest_random(2);
> +
> +		/*
> +		 * Periodically change the zfs_abd_scatter_enabled setting.
> +		 */
> +		if (ztest_random(10) == 0)
> +			zfs_abd_scatter_enabled = ztest_random(2);
> 	}
> 	return (NULL);
> }
>
> Modified: head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
> ==============================================================================
> --- head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -199,19 +199,19 @@ dump_record(dmu_replay_record_t *drr, void *payload, i
> {
> 	ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
> 	    ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
> -	fletcher_4_incremental_native(drr,
> +	(void) fletcher_4_incremental_native(drr,
> 	    offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum), zc);
> 	if (drr->drr_type != DRR_BEGIN) {
> 		ASSERT(ZIO_CHECKSUM_IS_ZERO(&drr->drr_u.
> 		    drr_checksum.drr_checksum));
> 		drr->drr_u.drr_checksum.drr_checksum = *zc;
> 	}
> -	fletcher_4_incremental_native(&drr->drr_u.drr_checksum.drr_checksum,
> -	    sizeof (zio_cksum_t), zc);
> +	(void) fletcher_4_incremental_native(
> +	    &drr->drr_u.drr_checksum.drr_checksum, sizeof (zio_cksum_t), zc);
> 	if (write(outfd, drr, sizeof (*drr)) == -1)
> 		return (errno);
> 	if (payload_len != 0) {
> -		fletcher_4_incremental_native(payload, payload_len, zc);
> +		(void) fletcher_4_incremental_native(payload, payload_len, zc);
> 		if (write(outfd, payload, payload_len) == -1)
> 			return (errno);
> 	}
> @@ -2096,9 +2096,9 @@ recv_read(libzfs_handle_t *hdl, int fd, void *buf, int
>
> 	if (zc) {
> 		if (byteswap)
> -			fletcher_4_incremental_byteswap(buf, ilen, zc);
> +			(void) fletcher_4_incremental_byteswap(buf, ilen, zc);
> 		else
> -			fletcher_4_incremental_native(buf, ilen, zc);
> +			(void) fletcher_4_incremental_native(buf, ilen, zc);
> 	}
> 	return (0);
> }
> @@ -3688,7 +3688,8 @@ zfs_receive_impl(libzfs_handle_t *hdl, const char *tos
> 		 * recv_read() above; do it again correctly.
> 		 */
> 		bzero(&zcksum, sizeof (zio_cksum_t));
> -		fletcher_4_incremental_byteswap(&drr, sizeof (drr), &zcksum);
> +		(void) fletcher_4_incremental_byteswap(&drr,
> +		    sizeof (drr), &zcksum);
> 		flags->byteswap = B_TRUE;
>
> 		drr.drr_type = BSWAP_32(drr.drr_type);
>
> Modified: head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -24,6 +24,7 @@
>  */
> /*
>  * Copyright 2013 Saso Kiselkov. All rights reserved.
> + * Copyright (c) 2016 by Delphix. All rights reserved.
>  */
>
> /*
> @@ -133,17 +134,29 @@
> #include <sys/byteorder.h>
> #include <sys/zio.h>
> #include <sys/spa.h>
> +#include <zfs_fletcher.h>
>
> -/*ARGSUSED*/
> void
> -fletcher_2_native(const void *buf, uint64_t size,
> -    const void *ctx_template, zio_cksum_t *zcp)
> +fletcher_init(zio_cksum_t *zcp)
> {
> +	ZIO_SET_CHECKSUM(zcp, 0, 0, 0, 0);
> +}
> +
> +int
> +fletcher_2_incremental_native(void *buf, size_t size, void *data)
> +{
> +	zio_cksum_t *zcp = data;
> +
> 	const uint64_t *ip = buf;
> 	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
> 	uint64_t a0, b0, a1, b1;
>
> -	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
> +	a0 = zcp->zc_word[0];
> +	a1 = zcp->zc_word[1];
> +	b0 = zcp->zc_word[2];
> +	b1 = zcp->zc_word[3];
> +
> +	for (; ip < ipend; ip += 2) {
> 		a0 += ip[0];
> 		a1 += ip[1];
> 		b0 += a0;
> @@ -151,18 +164,33 @@ fletcher_2_native(const void *buf, uint64_t size,
> 	}
>
> 	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
> +	return (0);
> }
>
> /*ARGSUSED*/
> void
> -fletcher_2_byteswap(const void *buf, uint64_t size,
> +fletcher_2_native(const void *buf, size_t size,
>     const void *ctx_template, zio_cksum_t *zcp)
> {
> +	fletcher_init(zcp);
> +	(void) fletcher_2_incremental_native((void *) buf, size, zcp);
> +}
> +
> +int
> +fletcher_2_incremental_byteswap(void *buf, size_t size, void *data)
> +{
> +	zio_cksum_t *zcp = data;
> +
> 	const uint64_t *ip = buf;
> 	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
> 	uint64_t a0, b0, a1, b1;
>
> -	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
> +	a0 = zcp->zc_word[0];
> +	a1 = zcp->zc_word[1];
> +	b0 = zcp->zc_word[2];
> +	b1 = zcp->zc_word[3];
> +
> +	for (; ip < ipend; ip += 2) {
> 		a0 += BSWAP_64(ip[0]);
> 		a1 += BSWAP_64(ip[1]);
> 		b0 += a0;
> @@ -170,50 +198,23 @@ fletcher_2_byteswap(const void *buf, uint64_t size,
> 	}
>
> 	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
> +	return (0);
> }
>
> /*ARGSUSED*/
> void
> -fletcher_4_native(const void *buf, uint64_t size,
> +fletcher_2_byteswap(const void *buf, size_t size,
>     const void *ctx_template, zio_cksum_t *zcp)
> {
> -	const uint32_t *ip = buf;
> -	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> -	uint64_t a, b, c, d;
> -
> -	for (a = b = c = d = 0; ip < ipend; ip++) {
> -		a += ip[0];
> -		b += a;
> -		c += b;
> -		d += c;
> -	}
> -
> -	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> +	fletcher_init(zcp);
> +	(void) fletcher_2_incremental_byteswap((void *) buf, size, zcp);
> }
>
> -/*ARGSUSED*/
> -void
> -fletcher_4_byteswap(const void *buf, uint64_t size,
> -    const void *ctx_template, zio_cksum_t *zcp)
> +int
> +fletcher_4_incremental_native(void *buf, size_t size, void *data)
> {
> -	const uint32_t *ip = buf;
> -	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> -	uint64_t a, b, c, d;
> +	zio_cksum_t *zcp = data;
>
> -	for (a = b = c = d = 0; ip < ipend; ip++) {
> -		a += BSWAP_32(ip[0]);
> -		b += a;
> -		c += b;
> -		d += c;
> -	}
> -
> -	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> -}
> -
> -void
> -fletcher_4_incremental_native(const void *buf, uint64_t size,
> -    zio_cksum_t *zcp)
> -{
> 	const uint32_t *ip = buf;
> 	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> 	uint64_t a, b, c, d;
> @@ -231,12 +232,23 @@ fletcher_4_incremental_native(const void *buf, uint64_
> 	}
>
> 	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> +	return (0);
> }
>
> +/*ARGSUSED*/
> void
> -fletcher_4_incremental_byteswap(const void *buf, uint64_t size,
> -    zio_cksum_t *zcp)
> +fletcher_4_native(const void *buf, size_t size,
> +    const void *ctx_template, zio_cksum_t *zcp)
> {
> +	fletcher_init(zcp);
> +	(void) fletcher_4_incremental_native((void *) buf, size, zcp);
> +}
> +
> +int
> +fletcher_4_incremental_byteswap(void *buf, size_t size, void *data)
> +{
> +	zio_cksum_t *zcp = data;
> +
> 	const uint32_t *ip = buf;
> 	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
> 	uint64_t a, b, c, d;
> @@ -254,4 +266,14 @@ fletcher_4_incremental_byteswap(const void *buf, uint6
> 	}
>
> 	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
> +	return (0);
> +}
> +
> +/*ARGSUSED*/
> +void
> +fletcher_4_byteswap(const void *buf, size_t size,
> +    const void *ctx_template, zio_cksum_t *zcp)
> +{
> +	fletcher_init(zcp);
> +	(void) fletcher_4_incremental_byteswap((void *) buf, size, zcp);
> }
>
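
The restructuring above turns each one-shot fletcher function into
"initialize, then fold in one piece". The int-returning
(void *buf, size_t size, void *data) shape matches the callbacks passed to
abd_iterate_func() elsewhere in this diff, so presumably the checksum can be
fed one chunk at a time. The standalone sketch below mirrors that structure
for the native fletcher-4 loop and checks that checksumming in pieces gives
the same result as one pass. The sketch_* names and the struct are
hypothetical stand-ins for zio_cksum_t and the real functions, and every
piece except the last is assumed to be a multiple of 4 bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zio_cksum_t. */
struct sketch_cksum {
	uint64_t word[4];
};

/* Reset the running checksum, like fletcher_init() in the diff. */
static void
sketch_fletcher_init(struct sketch_cksum *zcp)
{
	zcp->word[0] = zcp->word[1] = zcp->word[2] = zcp->word[3] = 0;
}

/*
 * Fold one piece into the running checksum.  The signature follows the
 * callback shape used in the diff: (buf, size, private data) -> int.
 */
static int
sketch_fletcher_4_incremental(void *buf, size_t size, void *data)
{
	struct sketch_cksum *zcp = data;
	const uint32_t *ip = buf;
	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
	uint64_t a = zcp->word[0];
	uint64_t b = zcp->word[1];
	uint64_t c = zcp->word[2];
	uint64_t d = zcp->word[3];

	for (; ip < ipend; ip++) {
		a += ip[0];
		b += a;
		c += b;
		d += c;
	}

	zcp->word[0] = a;
	zcp->word[1] = b;
	zcp->word[2] = c;
	zcp->word[3] = d;
	return (0);
}

/* One-shot wrapper: init followed by a single incremental call. */
static void
sketch_fletcher_4(const void *buf, size_t size, struct sketch_cksum *zcp)
{
	sketch_fletcher_init(zcp);
	(void) sketch_fletcher_4_incremental((void *)(uintptr_t)buf, size, zcp);
}
```

Because the four accumulators are carried in the checksum itself, the caller
decides where the piece boundaries fall; only the starting state has to be
zeroed once.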
> Modified: head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -24,6 +24,7 @@
>  */
> /*
>  * Copyright 2013 Saso Kiselkov. All rights reserved.
> + * Copyright (c) 2016 by Delphix. All rights reserved.
>  */
>
> #ifndef	_ZFS_FLETCHER_H
> @@ -40,12 +41,15 @@ extern "C" {
>  * fletcher checksum functions
>  */
>
> -void fletcher_2_native(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_2_byteswap(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_4_native(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_4_byteswap(const void *, uint64_t, const void *, zio_cksum_t *);
> -void fletcher_4_incremental_native(const void *, uint64_t, zio_cksum_t *);
> -void fletcher_4_incremental_byteswap(const void *, uint64_t, zio_cksum_t *);
> +void fletcher_init(zio_cksum_t *);
> +void fletcher_2_native(const void *, size_t, const void *, zio_cksum_t *);
> +void fletcher_2_byteswap(const void *, size_t, const void *, zio_cksum_t *);
> +int fletcher_2_incremental_native(void *, size_t, void *);
> +int fletcher_2_incremental_byteswap(void *, size_t, void *);
> +void fletcher_4_native(const void *, size_t, const void *, zio_cksum_t *);
> +void fletcher_4_byteswap(const void *, size_t, const void *, zio_cksum_t *);
> +int fletcher_4_incremental_native(void *, size_t, void *);
> +int fletcher_4_incremental_byteswap(void *, size_t, void *);
>
> #ifdef	__cplusplus
> }
>
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files	Tue Jun 20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -33,6 +33,7 @@
> # common to all SunOS systems.
>
> ZFS_COMMON_OBJS +=		\
> +	abd.o			\
> 	arc.o			\
> 	bplist.o		\
> 	blkptr.o		\
>
> Copied and modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c (from r318946, vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c)
> ==============================================================================
> --- vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c	Fri May 26 12:13:27 2017	(r318946, copy source)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c	Tue Jun 20 17:39:24 2017	(r320156)
> @@ -174,6 +174,7 @@ abd_free_chunk(void *c)
> void
> abd_init(void)
> {
> +#ifdef illumos
> 	vmem_t *data_alloc_arena = NULL;
>
> #ifdef _KERNEL
> @@ -186,7 +187,10 @@ abd_init(void)
> 	 */
> 	abd_chunk_cache = kmem_cache_create("abd_chunk", zfs_abd_chunk_size, 0,
> 	    NULL, NULL, NULL, NULL, data_alloc_arena, KMC_NOTOUCH);
> -
> +#else
> +	abd_chunk_cache = kmem_cache_create("abd_chunk", zfs_abd_chunk_size, 0,
> +	    NULL, NULL, NULL, NULL, 0, KMC_NOTOUCH | KMC_NODEBUG);
> +#endif
> 	abd_ksp = kstat_create("zfs", 0, "abdstats", "misc", KSTAT_TYPE_NAMED,
> 	    sizeof (abd_stats) / sizeof (kstat_named_t), KSTAT_FLAG_VIRTUAL);
> 	if (abd_ksp != NULL) {
>
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D
> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Tue Jun =
20 17:38:25 2017	(r320155)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Tue Jun =
20 17:39:24 2017	(r320156)
> @@ -128,14 +128,14 @@
>  * the arc_buf_hdr_t that will point to the data block in memory. A block can
>  * only be read by a consumer if it has an l1arc_buf_hdr_t. The L1ARC
>  * caches data in two ways -- in a list of ARC buffers (arc_buf_t) and
> - * also in the arc_buf_hdr_t's private physical data block pointer (b_pdata).
> + * also in the arc_buf_hdr_t's private physical data block pointer (b_pabd).
>  *
>  * The L1ARC's data pointer may or may not be uncompressed. The ARC has the
> - * ability to store the physical data (b_pdata) associated with the DVA of the
> - * arc_buf_hdr_t. Since the b_pdata is a copy of the on-disk physical block,
> + * ability to store the physical data (b_pabd) associated with the DVA of the
> + * arc_buf_hdr_t. Since the b_pabd is a copy of the on-disk physical block,
>  * it will match its on-disk compression characteristics. This behavior can be
>  * disabled by setting 'zfs_compressed_arc_enabled' to B_FALSE. When the
> - * compressed ARC functionality is disabled, the b_pdata will point to an
> + * compressed ARC functionality is disabled, the b_pabd will point to an
>  * uncompressed version of the on-disk data.
>  *
>  * Data in the L1ARC is not accessed by consumers of the ARC directly. Each
> @@ -174,7 +174,7 @@
>  *   | l1arc_buf_hdr_t
>  *   |           |              arc_buf_t
>  *   | b_buf     +------------>+-----------+      arc_buf_t
> - *   | b_pdata   +-+           |b_next     +---->+-----------+
> + *   | b_pabd    +-+           |b_next     +---->+-----------+
>  *   +-----------+ |           |-----------|     |b_next     +-->NULL
>  *                 |           |b_comp = T |     +-----------+
>  *                 |           |b_data     +-+   |b_comp = F |
> @@ -191,8 +191,8 @@
>  * When a consumer reads a block, the ARC must first look to see if the
>  * arc_buf_hdr_t is cached. If the hdr is cached then the ARC allocates a new
>  * arc_buf_t and either copies uncompressed data into a new data buffer from an
> - * existing uncompressed arc_buf_t, decompresses the hdr's b_pdata buffer into a
> - * new data buffer, or shares the hdr's b_pdata buffer, depending on whether the
> + * existing uncompressed arc_buf_t, decompresses the hdr's b_pabd buffer into a
> + * new data buffer, or shares the hdr's b_pabd buffer, depending on whether the
>  * hdr is compressed and the desired compression characteristics of the
>  * arc_buf_t consumer. If the arc_buf_t ends up sharing data with the
>  * arc_buf_hdr_t and both of them are uncompressed then the arc_buf_t must be
> @@ -216,7 +216,7 @@
>  *                |           |                 arc_buf_t    (shared)
>  *                |    b_buf  +------------>+---------+      arc_buf_t
>  *                |           |             |b_next   +---->+---------+
> - *                |  b_pdata  +-+           |---------|     |b_next   +-->NULL
> + *                |  b_pabd   +-+           |---------|     |b_next   +-->NULL
>  *                +-----------+ |           |         |     +---------+
>  *                              |           |b_data   +-+   |         |
>  *                              |           +---------+ |   |b_data   +-+
> @@ -230,19 +230,19 @@
>  *                                    |                    +------+     |
>  *                                    +---------------------------------+
>  *
> - * Writing to the ARC requires that the ARC first discard the hdr's b_pdata
> + * Writing to the ARC requires that the ARC first discard the hdr's b_pabd
>  * since the physical block is about to be rewritten. The new data contents
>  * will be contained in the arc_buf_t. As the I/O pipeline performs the write,
>  * it may compress the data before writing it to disk. The ARC will be called
>  * with the transformed data and will bcopy the transformed on-disk block into
> - * a newly allocated b_pdata. Writes are always done into buffers which have
> + * a newly allocated b_pabd. Writes are always done into buffers which have
>  * either been loaned (and hence are new and don't have other readers) or
>  * buffers which have been released (and hence have their own hdr, if there
>  * were originally other readers of the buf's original hdr). This ensures that
>  * the ARC only needs to update a single buf and its hdr after a write occurs.
>  *
> - * When the L2ARC is in use, it will also take advantage of the b_pdata. The
> - * L2ARC will always write the contents of b_pdata to the L2ARC. This means
> + * When the L2ARC is in use, it will also take advantage of the b_pabd. The
> + * L2ARC will always write the contents of b_pabd to the L2ARC. This means
>  * that when compressed ARC is enabled that the L2ARC blocks are identical
>  * to the on-disk block in the main data pool. This provides a significant
>  * advantage since the ARC can leverage the bp's checksum when reading from the
> @@ -263,7 +263,9 @@
> #include <sys/vdev.h>
> #include <sys/vdev_impl.h>
> #include <sys/dsl_pool.h>
> +#include <sys/zio_checksum.h>
> #include <sys/multilist.h>
> +#include <sys/abd.h>
> #ifdef _KERNEL
> #include <sys/dnlc.h>
> #include <sys/racct.h>
> @@ -307,7 +309,7 @@ int zfs_arc_evict_batch_limit = 10;
> /* number of seconds before growing cache again */
> static int		arc_grow_retry = 60;
>
> -/* shift of arc_c for calculating overflow limit in arc_get_data_buf */
> +/* shift of arc_c for calculating overflow limit in arc_get_data_impl */
> int		zfs_arc_overflow_shift = 8;
>
> /* shift of arc_c for calculating both min and max arc_p */
> @@ -543,13 +545,13 @@ typedef struct arc_stats {
> 	kstat_named_t arcstat_c_max;
> 	kstat_named_t arcstat_size;
> 	/*
> -	 * Number of compressed bytes stored in the arc_buf_hdr_t's b_pdata.
> +	 * Number of compressed bytes stored in the arc_buf_hdr_t's b_pabd.
> 	 * Note that the compressed bytes may match the uncompressed bytes
> 	 * if the block is either not compressed or compressed arc is disabled.
> 	 */
> 	kstat_named_t arcstat_compressed_size;
> 	/*
> -	 * Uncompressed size of the data stored in b_pdata. If compressed
> +	 * Uncompressed size of the data stored in b_pabd. If compressed
> 	 * arc is disabled then this value will be identical to the stat
> 	 * above.
> 	 */
> @@ -988,7 +990,7 @@ typedef struct l1arc_buf_hdr {
> 	refcount_t		b_refcnt;
>
> 	arc_callback_t		*b_acb;
> -	void			*b_pdata;
> +	abd_t			*b_pabd;
> } l1arc_buf_hdr_t;
>
> typedef struct l2arc_dev l2arc_dev_t;
> @@ -1341,7 +1343,7 @@ typedef struct l2arc_read_callback {
> 	blkptr_t		l2rcb_bp;		/* original blkptr */
> 	zbookmark_phys_t	l2rcb_zb;		/* original bookmark */
> 	int			l2rcb_flags;		/* original flags */
> -	void			*l2rcb_data;		/* temporary buffer */
> +	void			*l2rcb_abd;		/* temporary buffer */
> } l2arc_read_callback_t;
>
> typedef struct l2arc_write_callback {
> @@ -1351,7 +1353,7 @@ typedef struct l2arc_write_callback {
>
> typedef struct l2arc_data_free {
> 	/* protected by l2arc_free_on_write_mtx */
> -	void		*l2df_data;
> +	abd_t		*l2df_abd;
> 	size_t		l2df_size;
> 	arc_buf_contents_t l2df_type;
> 	list_node_t	l2df_list_node;
> @@ -1361,10 +1363,14 @@ static kmutex_t l2arc_feed_thr_lock;
> static kcondvar_t l2arc_feed_thr_cv;
> static uint8_t l2arc_thread_exit;
>
> +static abd_t *arc_get_data_abd(arc_buf_hdr_t *, uint64_t, void *);
> static void *arc_get_data_buf(arc_buf_hdr_t *, uint64_t, void *);
> +static void arc_get_data_impl(arc_buf_hdr_t *, uint64_t, void *);
> +static void arc_free_data_abd(arc_buf_hdr_t *, abd_t *, uint64_t, void *);
> static void arc_free_data_buf(arc_buf_hdr_t *, void *, uint64_t, void *);
> -static void arc_hdr_free_pdata(arc_buf_hdr_t *hdr);
> -static void arc_hdr_alloc_pdata(arc_buf_hdr_t *);
> +static void arc_free_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag);
> +static void arc_hdr_free_pabd(arc_buf_hdr_t *);
> +static void arc_hdr_alloc_pabd(arc_buf_hdr_t *);
> static void arc_access(arc_buf_hdr_t *, kmutex_t *);
> static boolean_t arc_is_overflowing();
> static void arc_buf_watch(arc_buf_t *);
> @@ -1718,7 +1724,9 @@ static inline boolean_t
> arc_buf_is_shared(arc_buf_t *buf)
> {
> 	boolean_t shared = (buf->b_data != NULL &&
> -	    buf->b_data == buf->b_hdr->b_l1hdr.b_pdata);
> +	    buf->b_hdr->b_l1hdr.b_pabd != NULL &&
> +	    abd_is_linear(buf->b_hdr->b_l1hdr.b_pabd) &&
> +	    buf->b_data == abd_to_buf(buf->b_hdr->b_l1hdr.b_pabd));
> 	IMPLY(shared, HDR_SHARED_DATA(buf->b_hdr));
> 	IMPLY(shared, ARC_BUF_SHARED(buf));
> 	IMPLY(shared, ARC_BUF_COMPRESSED(buf) || ARC_BUF_LAST(buf));
> @@ -1822,7 +1830,8 @@ arc_cksum_is_equal(arc_buf_hdr_t *hdr, zio_t *zio)
> 		uint64_t csize;
>
> 		void *cbuf = zio_buf_alloc(HDR_GET_PSIZE(hdr));
> -		csize = zio_compress_data(compress, zio->io_data, cbuf, lsize);
> +		csize = zio_compress_data(compress, zio->io_abd, cbuf, lsize);
> +
> 		ASSERT3U(csize, <=, HDR_GET_PSIZE(hdr));
> 		if (csize < HDR_GET_PSIZE(hdr)) {
> 			/*
> @@ -1857,7 +1866,7 @@ arc_cksum_is_equal(arc_buf_hdr_t *hdr, zio_t *zio)
> 	 * logical I/O size and not just a gang fragment.
> 	 */
> 	valid_cksum = (zio_checksum_error_impl(zio->io_spa, zio->io_bp,
> -	    BP_GET_CHECKSUM(zio->io_bp), zio->io_data, zio->io_size,
> +	    BP_GET_CHECKSUM(zio->io_bp), zio->io_abd, zio->io_size,
> 	    zio->io_offset, NULL) == 0);
> 	zio_pop_transforms(zio);
> 	return (valid_cksum);
> @@ -2161,7 +2170,7 @@ arc_buf_fill(arc_buf_t *buf, boolean_t compressed)
>
> 	if (hdr_compressed == compressed) {
> 		if (!arc_buf_is_shared(buf)) {
> -			bcopy(hdr->b_l1hdr.b_pdata, buf->b_data,
> +			abd_copy_to_buf(buf->b_data, hdr->b_l1hdr.b_pabd,
> 			    arc_buf_size(buf));
> 		}
> 	} else {
> @@ -2213,7 +2222,7 @@ arc_buf_fill(arc_buf_t *buf, boolean_t compressed)
> 			return (0);
> 		} else {
> 			int error = zio_decompress_data(HDR_GET_COMPRESS(hdr),
> -			    hdr->b_l1hdr.b_pdata, buf->b_data,
> +			    hdr->b_l1hdr.b_pabd, buf->b_data,
> 			    HDR_GET_PSIZE(hdr), HDR_GET_LSIZE(hdr));
>
> 			/*
> @@ -2250,7 +2259,7 @@ arc_decompress(arc_buf_t *buf)
> }
>
> /*
> - * Return the size of the block, b_pdata, that is stored in the arc_buf_hdr_t.
> + * Return the size of the block, b_pabd, that is stored in the arc_buf_hdr_t.
>  */
> static uint64_t
> arc_hdr_size(arc_buf_hdr_t *hdr)
> @@ -2282,14 +2291,14 @@ arc_evictable_space_increment(arc_buf_hdr_t *hdr, arc_
> 	if (GHOST_STATE(state)) {
> 		ASSERT0(hdr->b_l1hdr.b_bufcnt);
> 		ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
> -		ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +		ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 		(void) refcount_add_many(&state->arcs_esize[type],
> 		    HDR_GET_LSIZE(hdr), hdr);
> 		return;
> 	}
>
> 	ASSERT(!GHOST_STATE(state));
> -	if (hdr->b_l1hdr.b_pdata != NULL) {
> +	if (hdr->b_l1hdr.b_pabd != NULL) {
> 		(void) refcount_add_many(&state->arcs_esize[type],
> 		    arc_hdr_size(hdr), hdr);
> 	}
> @@ -2317,14 +2326,14 @@ arc_evictable_space_decrement(arc_buf_hdr_t *hdr, arc_
> 	if (GHOST_STATE(state)) {
> 		ASSERT0(hdr->b_l1hdr.b_bufcnt);
> 		ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
> -		ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +		ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 		(void) refcount_remove_many(&state->arcs_esize[type],
> 		    HDR_GET_LSIZE(hdr), hdr);
> 		return;
> 	}
>
> 	ASSERT(!GHOST_STATE(state));
> -	if (hdr->b_l1hdr.b_pdata != NULL) {
> +	if (hdr->b_l1hdr.b_pabd != NULL) {
> 		(void) refcount_remove_many(&state->arcs_esize[type],
> 		    arc_hdr_size(hdr), hdr);
> 	}
> @@ -2421,7 +2430,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 		old_state = hdr->b_l1hdr.b_state;
> 		refcnt = refcount_count(&hdr->b_l1hdr.b_refcnt);
> 		bufcnt = hdr->b_l1hdr.b_bufcnt;
> -		update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pdata != NULL);
> +		update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pabd != NULL);
> 	} else {
> 		old_state = arc_l2c_only;
> 		refcnt = 0;
> @@ -2491,7 +2500,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 			 */
> 			(void) refcount_add_many(&new_state->arcs_size,
> 			    HDR_GET_LSIZE(hdr), hdr);
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 		} else {
> 			uint32_t buffers = 0;
>
> @@ -2520,7 +2529,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 			}
> 			ASSERT3U(bufcnt, ==, buffers);
>
> -			if (hdr->b_l1hdr.b_pdata != NULL) {
> +			if (hdr->b_l1hdr.b_pabd != NULL) {
> 				(void) refcount_add_many(&new_state->arcs_size,
> 				    arc_hdr_size(hdr), hdr);
> 			} else {
> @@ -2533,7 +2542,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 		ASSERT(HDR_HAS_L1HDR(hdr));
> 		if (GHOST_STATE(old_state)) {
> 			ASSERT0(bufcnt);
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
>
> 			/*
> 			 * When moving a header off of a ghost state,
> @@ -2573,7 +2582,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
> 				    buf);
> 			}
> 			ASSERT3U(bufcnt, ==, buffers);
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> 			(void) refcount_remove_many(
> 			    &old_state->arcs_size, arc_hdr_size(hdr), hdr);
> 		}
> @@ -2655,7 +2664,7 @@ arc_space_return(uint64_t space, arc_space_type_t type
>
> /*
>  * Given a hdr and a buf, returns whether that buf can share its b_data buffer
> - * with the hdr's b_pdata.
> + * with the hdr's b_pabd.
>  */
> static boolean_t
> arc_can_share(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> @@ -2732,20 +2741,23 @@ arc_buf_alloc_impl(arc_buf_hdr_t *hdr, void *tag, bool
> 	/*
> 	 * If the hdr's data can be shared then we share the data buffer and
> 	 * set the appropriate bit in the hdr's b_flags to indicate the hdr is
> -	 * sharing it's b_pdata with the arc_buf_t. Otherwise, we allocate a new
> +	 * sharing it's b_pabd with the arc_buf_t. Otherwise, we allocate a new
> 	 * buffer to store the buf's data.
> 	 *
> -	 * There is one additional restriction here because we're sharing
> -	 * hdr -> buf instead of the usual buf -> hdr: the hdr can't be actively
> -	 * involved in an L2ARC write, because if this buf is used by an
> -	 * arc_write() then the hdr's data buffer will be released when the
> +	 * There are two additional restrictions here because we're sharing
> +	 * hdr -> buf instead of the usual buf -> hdr. First, the hdr can't be
> +	 * actively involved in an L2ARC write, because if this buf is used by
> +	 * an arc_write() then the hdr's data buffer will be released when the
> 	 * write completes, even though the L2ARC write might still be using it.
> +	 * Second, the hdr's ABD must be linear so that the buf's user doesn't
> +	 * need to be ABD-aware.
> 	 */
> -	boolean_t can_share = arc_can_share(hdr, buf) && !HDR_L2_WRITING(hdr);
> +	boolean_t can_share = arc_can_share(hdr, buf) && !HDR_L2_WRITING(hdr) &&
> +	    abd_is_linear(hdr->b_l1hdr.b_pabd);
>
> 	/* Set up b_data and sharing */
> 	if (can_share) {
> -		buf->b_data = hdr->b_l1hdr.b_pdata;
> +		buf->b_data = abd_to_buf(hdr->b_l1hdr.b_pabd);
> 		buf->b_flags |= ARC_BUF_FLAG_SHARED;
> 		arc_hdr_set_flags(hdr, ARC_FLAG_SHARED_DATA);
> 	} else {
> @@ -2841,11 +2853,11 @@ arc_loan_inuse_buf(arc_buf_t *buf, void *tag)
> }
>
> static void
> -l2arc_free_data_on_write(void *data, size_t size, arc_buf_contents_t type)
> +l2arc_free_abd_on_write(abd_t *abd, size_t size, arc_buf_contents_t type)
> {
> 	l2arc_data_free_t *df = kmem_alloc(sizeof (*df), KM_SLEEP);
>
> -	df->l2df_data = data;
> +	df->l2df_abd = abd;
> 	df->l2df_size = size;
> 	df->l2df_type = type;
> 	mutex_enter(&l2arc_free_on_write_mtx);
> @@ -2876,7 +2888,7 @@ arc_hdr_free_on_write(arc_buf_hdr_t *hdr)
> 		arc_space_return(size, ARC_SPACE_DATA);
> 	}
>
> -	l2arc_free_data_on_write(hdr->b_l1hdr.b_pdata, size, type);
> +	l2arc_free_abd_on_write(hdr->b_l1hdr.b_pabd, size, type);
> }
>
> /*
> @@ -2890,7 +2902,7 @@ arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	arc_state_t *state = hdr->b_l1hdr.b_state;
>
> 	ASSERT(arc_can_share(hdr, buf));
> -	ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +	ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
> 	ASSERT(MUTEX_HELD(HDR_LOCK(hdr)) || HDR_EMPTY(hdr));
>
> 	/*
> @@ -2899,7 +2911,9 @@ arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	 * the refcount whenever an arc_buf_t is shared.
> 	 */
> 	refcount_transfer_ownership(&state->arcs_size, buf, hdr);
> -	hdr->b_l1hdr.b_pdata = buf->b_data;
> +	hdr->b_l1hdr.b_pabd = abd_get_from_buf(buf->b_data, arc_buf_size(buf));
> +	abd_take_ownership_of_buf(hdr->b_l1hdr.b_pabd,
> +	    HDR_ISTYPE_METADATA(hdr));
> 	arc_hdr_set_flags(hdr, ARC_FLAG_SHARED_DATA);
> 	buf->b_flags |= ARC_BUF_FLAG_SHARED;
>
> @@ -2919,7 +2933,7 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	arc_state_t *state = hdr->b_l1hdr.b_state;
>
> 	ASSERT(arc_buf_is_shared(buf));
> -	ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> +	ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> 	ASSERT(MUTEX_HELD(HDR_LOCK(hdr)) || HDR_EMPTY(hdr));
>
> 	/*
> @@ -2928,7 +2942,9 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> 	 */
> 	refcount_transfer_ownership(&state->arcs_size, hdr, buf);
> 	arc_hdr_clear_flags(hdr, ARC_FLAG_SHARED_DATA);
> -	hdr->b_l1hdr.b_pdata = NULL;
> +	abd_release_ownership_of_buf(hdr->b_l1hdr.b_pabd);
> +	abd_put(hdr->b_l1hdr.b_pabd);
> +	hdr->b_l1hdr.b_pabd = NULL;
> 	buf->b_flags &= ~ARC_BUF_FLAG_SHARED;
>
> 	/*
> @@ -3025,7 +3041,7 @@ arc_buf_destroy_impl(arc_buf_t *buf)
> 	if (ARC_BUF_SHARED(buf) && !ARC_BUF_COMPRESSED(buf)) {
> 		/*
> 		 * If the current arc_buf_t is sharing its data buffer with the
> -		 * hdr, then reassign the hdr's b_pdata to share it with the new
> +		 * hdr, then reassign the hdr's b_pabd to share it with the new
> 		 * buffer at the end of the list. The shared buffer is always
> 		 * the last one on the hdr's buffer list.
> 		 *
> @@ -3040,8 +3056,8 @@ arc_buf_destroy_impl(arc_buf_t *buf)
> 			/* hdr is uncompressed so can't have compressed buf */
> 			VERIFY(!ARC_BUF_COMPRESSED(lastbuf));
>
> -			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> -			arc_hdr_free_pdata(hdr);
> +			ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> +			arc_hdr_free_pabd(hdr);
>
> 			/*
> 			 * We must setup a new shared block between the
> @@ -3079,26 +3095,26 @@ arc_buf_destroy_impl(arc_buf_t *buf)
> }
>
> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
>
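
For anyone following the quoted arc_buf_is_shared() hunk, below is a minimal stand-alone sketch of the check order it introduces: the ABD pointer is tested for NULL and for being linear before it is converted to a flat buffer. The mock_abd_t type and the mock_* helpers are hypothetical stand-ins made up for illustration; they are not the real sys/abd.h definitions, and this is not a diagnosis of the panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for abd_t; NOT the real sys/abd.h layout. */
typedef struct mock_abd {
	bool	linear;	/* true if backed by one contiguous buffer */
	void	*buf;	/* only meaningful when linear is true */
} mock_abd_t;

/* Stand-in for abd_is_linear(). */
static bool
mock_abd_is_linear(const mock_abd_t *abd)
{
	return (abd->linear);
}

/* Stand-in for abd_to_buf(); only valid on a linear ABD. */
static void *
mock_abd_to_buf(const mock_abd_t *abd)
{
	assert(abd->linear);
	return (abd->buf);
}

/*
 * Mirrors the order of checks in the patched arc_buf_is_shared():
 * NULL checks first, then linearity, and only then the pointer
 * comparison. Short-circuit evaluation keeps mock_abd_to_buf()
 * from ever seeing a NULL or non-linear ABD.
 */
static bool
mock_buf_is_shared(const void *b_data, const mock_abd_t *pabd)
{
	return (b_data != NULL && pabd != NULL &&
	    mock_abd_is_linear(pabd) && b_data == mock_abd_to_buf(pabd));
}
```

With this ordering a NULL or non-linear ABD simply reports "not shared" instead of being dereferenced or converted.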



