From owner-svn-src-vendor@freebsd.org  Mon Apr 11 21:07:19 2016
Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B5785AECFC7;
 Mon, 11 Apr 2016 21:07:19 +0000 (UTC)
 (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9000F1E9B;
 Mon, 11 Apr 2016 21:07:19 +0000 (UTC)
 (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id u3BL7IcT000587;
 Mon, 11 Apr 2016 21:07:18 GMT
 (envelope-from mav@FreeBSD.org)
Received: (from mav@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id u3BL7IFG000582;
 Mon, 11 Apr 2016 21:07:18 GMT
 (envelope-from mav@FreeBSD.org)
Message-Id: <201604112107.u3BL7IFG000582@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to
 mav@FreeBSD.org using -f
From: Alexander Motin
Date: Mon, 11 Apr 2016 21:07:18 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-vendor@freebsd.org
Subject: svn commit: r297831 - in
 vendor-sys/illumos/dist/uts/common/fs/zfs: . sys
X-SVN-Group: vendor-sys
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
X-List-Received-Date: Mon, 11 Apr 2016 21:07:19 -0000

Author: mav
Date: Mon Apr 11 21:07:18 2016
New Revision: 297831
URL: https://svnweb.freebsd.org/changeset/base/297831

Log:
  6322 ZFS indirect block predictive prefetch

  Reviewed by: Matthew Ahrens
  Reviewed by: Paul Dagnelie
  Author: Alexander Motin

  Improve speculative prefetch of indirect blocks.

  Scalability of many operations on a wide ZFS pool can be limited by the
  requirement to prefetch indirect blocks first.  The recently added
  asynchronous indirect block read partially helped, but did not solve the
  problem completely.

  This patch extends the existing prefetcher functionality to explicitly
  work with indirect blocks.  Before this change, the prefetcher issued
  reads for up to 8MB of data in advance.  With this change it also issues
  indirect block reads for up to 64MB of data in advance, so that when it
  is time to actually read that data, it can be done immediately.  A
  similar effect can be achieved by just increasing the maximal data
  prefetch distance, but at a higher memory cost.

  This change also introduces indirect block prefetch for rewrite
  operations, which was never done before.  Previously, ARC misses for
  indirect blocks regularly blocked rewrites, converting perfectly aligned
  asynchronous operations into synchronous read-write pairs and
  significantly reducing maximal rewrite speed.

  While here, this issue was also fixed:
   - prefetch was always done, even if caching for the dataset was
     completely disabled.

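The distance doubling described in the log can be modelled in a few lines of standalone C. This is a minimal sketch of the arithmetic only, not the committed code: stream_model, prefetch_plan, plan_prefetch, min64 and max64 are hypothetical names introduced here, the two limits are passed in as block counts (the patch derives them by shifting zfetch_max_distance and zfetch_max_idistance by the dnode's data block shift), and locking and the actual prefetch I/O are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-stream fields the patch updates. */
struct stream_model {
	int64_t blkid;		/* next expected access (zs_blkid) */
	int64_t pf_blkid;	/* next data block to prefetch (zs_pf_blkid) */
	int64_t ipf_blkid;	/* next L0 block id for indirect prefetch */
};

/* Hypothetical result type: level-0 block ranges chosen for this access. */
struct prefetch_plan {
	int64_t pf_start, pf_nblks;	/* data blocks to prefetch */
	int64_t ipf_start, ipf_nblks;	/* L0 range whose indirects to fetch */
};

static int64_t min64(int64_t a, int64_t b) { return (a < b ? a : b); }
static int64_t max64(int64_t a, int64_t b) { return (a > b ? a : b); }

struct prefetch_plan
plan_prefetch(struct stream_model *zs, int64_t blkid, int64_t nblks,
    int fetch_data, int64_t max_dist_blks, int64_t max_idist_blks)
{
	struct prefetch_plan p;
	int64_t end = blkid + nblks;	/* end_of_access_blkid in the patch */
	int64_t ahead, room;

	assert(nblks > 0);

	/* Data: resume where we stopped, but never behind the reader. */
	p.pf_start = max64(zs->pf_blkid, end);
	if (fetch_data) {
		ahead = zs->pf_blkid - blkid + nblks;
		room = max_dist_blks - (p.pf_start - end);
		p.pf_nblks = min64(ahead, room);
	} else {
		p.pf_nblks = 0;
	}
	zs->pf_blkid = p.pf_start + p.pf_nblks;

	/* Indirects: same doubling, measured ahead of the data prefetch. */
	p.ipf_start = max64(zs->ipf_blkid, zs->pf_blkid);
	ahead = zs->ipf_blkid - blkid + nblks + p.pf_nblks;
	room = max_idist_blks - (p.ipf_start - end);
	p.ipf_nblks = min64(ahead, room);
	zs->ipf_blkid = p.ipf_start + p.ipf_nblks;

	zs->blkid = end;
	return (p);
}
```

Assuming 128KB data blocks, the 8MB and 64MB defaults correspond to limits of 64 and 512 blocks. For a stream created at block 1, the first sequential one-block hit plans 1 data block and indirects covering 2 further blocks; the second hit plans 2 and 6. With fetch_data false, only the indirect range advances.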
  Testing on FreeBSD with a zvol on top of a 6x striped 2x mirrored pool
  of 12 assorted HDDs showed the following performance numbers:

  ------- BEFORE --------
  Write         491363677 bytes/sec
  Read          312430631 bytes/sec
  Rewrite        97680464 bytes/sec
  -------- AFTER --------
  Write         493524146 bytes/sec
  Read          438598079 bytes/sec
  Rewrite       277506044 bytes/sec

  Closes #65
  Closes #80

  openzfs/openzfs@792fd28ac04f78cc5e43ead2d72a96f244ea84e8

Modified:
  vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/dmu.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_zfetch.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_zfetch.h
  vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dnode.h

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c	Mon Apr 11 18:10:20 2016	(r297830)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c	Mon Apr 11 21:07:18 2016	(r297831)
@@ -721,7 +721,7 @@ dbuf_read(dmu_buf_impl_t *db, zio_t *zio
 	if (db->db_state == DB_CACHED) {
 		mutex_exit(&db->db_mtx);
 		if (prefetch)
-			dmu_zfetch(&dn->dn_zfetch, db->db_blkid, 1);
+			dmu_zfetch(&dn->dn_zfetch, db->db_blkid, 1, B_TRUE);
 		if ((flags & DB_RF_HAVESTRUCT) == 0)
 			rw_exit(&dn->dn_struct_rwlock);
 		DB_DNODE_EXIT(db);
@@ -735,7 +735,7 @@ dbuf_read(dmu_buf_impl_t *db, zio_t *zio
 		/* dbuf_read_impl has dropped db_mtx for us */
 
 		if (prefetch)
-			dmu_zfetch(&dn->dn_zfetch, db->db_blkid, 1);
+			dmu_zfetch(&dn->dn_zfetch, db->db_blkid, 1, B_TRUE);
 
 		if ((flags & DB_RF_HAVESTRUCT) == 0)
 			rw_exit(&dn->dn_struct_rwlock);
@@ -754,7 +754,7 @@ dbuf_read(dmu_buf_impl_t *db, zio_t *zio
 		 */
 		mutex_exit(&db->db_mtx);
 		if (prefetch)
-			dmu_zfetch(&dn->dn_zfetch, db->db_blkid, 1);
+			dmu_zfetch(&dn->dn_zfetch, db->db_blkid, 1, B_TRUE);
 		if ((flags & DB_RF_HAVESTRUCT) == 0)
 			rw_exit(&dn->dn_struct_rwlock);
 		DB_DNODE_EXIT(db);

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dmu.c
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/dmu.c	Mon Apr 11 18:10:20 2016	(r297830)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/dmu.c	Mon Apr 11 21:07:18 2016	(r297831)
@@ -441,9 +441,10 @@ dmu_buf_hold_array_by_dnode(dnode_t *dn,
 		dbp[i] = &db->db;
 	}
 
-	if ((flags & DMU_READ_NO_PREFETCH) == 0 && read &&
-	    length <= zfetch_array_rd_sz) {
-		dmu_zfetch(&dn->dn_zfetch, blkid, nblks);
+	if ((flags & DMU_READ_NO_PREFETCH) == 0 &&
+	    DNODE_META_IS_CACHEABLE(dn) && length <= zfetch_array_rd_sz) {
+		dmu_zfetch(&dn->dn_zfetch, blkid, nblks,
+		    read && DNODE_IS_CACHEABLE(dn));
 	}
 	rw_exit(&dn->dn_struct_rwlock);
 

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_zfetch.c
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_zfetch.c	Mon Apr 11 18:10:20 2016	(r297830)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_zfetch.c	Mon Apr 11 21:07:18 2016	(r297831)
@@ -49,6 +49,8 @@ uint32_t	zfetch_max_streams = 8;
 uint32_t	zfetch_min_sec_reap = 2;
 /* max bytes to prefetch per stream (default 8MB) */
 uint32_t	zfetch_max_distance = 8 * 1024 * 1024;
+/* max bytes to prefetch indirects for per stream (default 64MB) */
+uint32_t	zfetch_max_idistance = 64 * 1024 * 1024;
 /* max number of bytes in an array_read in which we allow prefetching (1MB) */
 uint64_t	zfetch_array_rd_sz = 1024 * 1024;
 
@@ -186,6 +188,7 @@ dmu_zfetch_stream_create(zfetch_t *zf, u
 	zstream_t *zs = kmem_zalloc(sizeof (*zs), KM_SLEEP);
 	zs->zs_blkid = blkid;
 	zs->zs_pf_blkid = blkid;
+	zs->zs_ipf_blkid = blkid;
 	zs->zs_atime = gethrtime();
 	mutex_init(&zs->zs_lock, NULL, MUTEX_DEFAULT, NULL);
 
@@ -193,13 +196,21 @@ dmu_zfetch_stream_create(zfetch_t *zf, u
 }
 
 /*
- * This is the prefetch entry point.  It calls all of the other dmu_zfetch
- * routines to create, delete, find, or operate upon prefetch streams.
+ * This is the predictive prefetch entry point.  It associates dnode access
+ * specified with blkid and nblks arguments with prefetch stream, predicts
+ * further accesses based on that stats and initiates speculative prefetch.
+ * fetch_data argument specifies whether actual data blocks should be fetched:
+ *   FALSE -- prefetch only indirect blocks for predicted data blocks;
+ *   TRUE -- prefetch predicted data blocks plus following indirect blocks.
  */
 void
-dmu_zfetch(zfetch_t *zf, uint64_t blkid, uint64_t nblks)
+dmu_zfetch(zfetch_t *zf, uint64_t blkid, uint64_t nblks, boolean_t fetch_data)
 {
 	zstream_t *zs;
+	int64_t pf_start, ipf_start, ipf_istart, ipf_iend;
+	int64_t pf_ahead_blks, max_blks;
+	int epbs, max_dist_blks, pf_nblks, ipf_nblks;
+	uint64_t end_of_access_blkid = blkid + nblks;
 
 	if (zfs_prefetch_disable)
 		return;
@@ -236,7 +247,7 @@ dmu_zfetch(zfetch_t *zf, uint64_t blkid,
 		 */
 		ZFETCHSTAT_BUMP(zfetchstat_misses);
 		if (rw_tryupgrade(&zf->zf_rwlock))
-			dmu_zfetch_stream_create(zf, blkid + nblks);
+			dmu_zfetch_stream_create(zf, end_of_access_blkid);
 		rw_exit(&zf->zf_rwlock);
 		return;
 	}
@@ -248,35 +259,74 @@ dmu_zfetch(zfetch_t *zf, uint64_t blkid,
 	 * Normally, we start prefetching where we stopped
 	 * prefetching last (zs_pf_blkid).  But when we get our first
 	 * hit on this stream, zs_pf_blkid == zs_blkid, we don't
-	 * want to prefetch to block we just accessed.  In this case,
+	 * want to prefetch the block we just accessed.  In this case,
 	 * start just after the block we just accessed.
 	 */
-	int64_t pf_start = MAX(zs->zs_pf_blkid, blkid + nblks);
+	pf_start = MAX(zs->zs_pf_blkid, end_of_access_blkid);
 
 	/*
 	 * Double our amount of prefetched data, but don't let the
 	 * prefetch get further ahead than zfetch_max_distance.
 	 */
-	int pf_nblks =
-	    MIN((int64_t)zs->zs_pf_blkid - zs->zs_blkid + nblks,
-	    zs->zs_blkid + nblks +
-	    (zfetch_max_distance >> zf->zf_dnode->dn_datablkshift) - pf_start);
+	if (fetch_data) {
+		max_dist_blks =
+		    zfetch_max_distance >> zf->zf_dnode->dn_datablkshift;
+		/*
+		 * Previously, we were (zs_pf_blkid - blkid) ahead.  We
+		 * want to now be double that, so read that amount again,
+		 * plus the amount we are catching up by (i.e. the amount
+		 * read just now).
+		 */
+		pf_ahead_blks = zs->zs_pf_blkid - blkid + nblks;
+		max_blks = max_dist_blks - (pf_start - end_of_access_blkid);
+		pf_nblks = MIN(pf_ahead_blks, max_blks);
+	} else {
+		pf_nblks = 0;
+	}
 
 	zs->zs_pf_blkid = pf_start + pf_nblks;
-	zs->zs_atime = gethrtime();
-	zs->zs_blkid = blkid + nblks;
 
 	/*
-	 * dbuf_prefetch() issues the prefetch i/o
-	 * asynchronously, but it may need to wait for an
-	 * indirect block to be read from disk.  Therefore
-	 * we do not want to hold any locks while we call it.
+	 * Do the same for indirects, starting from where we stopped last,
+	 * or where we will stop reading data blocks (and the indirects
+	 * that point to them).
 	 */
+	ipf_start = MAX(zs->zs_ipf_blkid, zs->zs_pf_blkid);
+	max_dist_blks = zfetch_max_idistance >> zf->zf_dnode->dn_datablkshift;
+	/*
+	 * We want to double our distance ahead of the data prefetch
+	 * (or reader, if we are not prefetching data).  Previously, we
+	 * were (zs_ipf_blkid - blkid) ahead.  To double that, we read
+	 * that amount again, plus the amount we are catching up by
+	 * (i.e. the amount read now + the amount of data prefetched now).
+	 */
+	pf_ahead_blks = zs->zs_ipf_blkid - blkid + nblks + pf_nblks;
+	max_blks = max_dist_blks - (ipf_start - end_of_access_blkid);
+	ipf_nblks = MIN(pf_ahead_blks, max_blks);
+	zs->zs_ipf_blkid = ipf_start + ipf_nblks;
+
+	epbs = zf->zf_dnode->dn_indblkshift - SPA_BLKPTRSHIFT;
+	ipf_istart = P2ROUNDUP(ipf_start, 1 << epbs) >> epbs;
+	ipf_iend = P2ROUNDUP(zs->zs_ipf_blkid, 1 << epbs) >> epbs;
+
+	zs->zs_atime = gethrtime();
+	zs->zs_blkid = end_of_access_blkid;
 	mutex_exit(&zs->zs_lock);
 	rw_exit(&zf->zf_rwlock);
+
+	/*
+	 * dbuf_prefetch() is asynchronous (even when it needs to read
+	 * indirect blocks), but we still prefer to drop our locks before
+	 * calling it to reduce the time we hold them.
+	 */
+
 	for (int i = 0; i < pf_nblks; i++) {
 		dbuf_prefetch(zf->zf_dnode, 0, pf_start + i,
 		    ZIO_PRIORITY_ASYNC_READ, ARC_FLAG_PREDICTIVE_PREFETCH);
 	}
+	for (int64_t iblk = ipf_istart; iblk < ipf_iend; iblk++) {
+		dbuf_prefetch(zf->zf_dnode, 1, iblk,
+		    ZIO_PRIORITY_ASYNC_READ, ARC_FLAG_PREDICTIVE_PREFETCH);
+	}
 	ZFETCHSTAT_BUMP(zfetchstat_hits);
 }

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_zfetch.h
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_zfetch.h	Mon Apr 11 18:10:20 2016	(r297830)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_zfetch.h	Mon Apr 11 21:07:18 2016	(r297831)
@@ -43,6 +43,13 @@ struct dnode;				/* so we can reference 
 typedef struct zstream {
 	uint64_t	zs_blkid;	/* expect next access at this blkid */
 	uint64_t	zs_pf_blkid;	/* next block to prefetch */
+
+	/*
+	 * We will next prefetch the L1 indirect block of this level-0
+	 * block id.
+	 */
+	uint64_t	zs_ipf_blkid;
+
 	kmutex_t	zs_lock;	/* protects stream */
 	hrtime_t	zs_atime;	/* time last prefetch issued */
 	list_node_t	zs_node;	/* link for zf_stream */
@@ -59,7 +66,7 @@ void		zfetch_fini(void);
 
 void		dmu_zfetch_init(zfetch_t *, struct dnode *);
 void		dmu_zfetch_fini(zfetch_t *);
-void		dmu_zfetch(zfetch_t *, uint64_t, uint64_t);
+void		dmu_zfetch(zfetch_t *, uint64_t, uint64_t, boolean_t);
 
 
 #ifdef	__cplusplus

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dnode.h
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dnode.h	Mon Apr 11 18:10:20 2016	(r297830)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dnode.h	Mon Apr 11 21:07:18 2016	(r297831)
@@ -305,6 +305,15 @@ int dnode_next_offset(dnode_t *dn, int f
 void dnode_evict_dbufs(dnode_t *dn);
 void dnode_evict_bonus(dnode_t *dn);
 
+#define	DNODE_IS_CACHEABLE(_dn)						\
+	((_dn)->dn_objset->os_primary_cache == ZFS_CACHE_ALL ||		\
+	(DMU_OT_IS_METADATA((_dn)->dn_type) &&				\
+	(_dn)->dn_objset->os_primary_cache == ZFS_CACHE_METADATA))
+
+#define	DNODE_META_IS_CACHEABLE(_dn)					\
+	((_dn)->dn_objset->os_primary_cache == ZFS_CACHE_ALL ||		\
+	(_dn)->dn_objset->os_primary_cache == ZFS_CACHE_METADATA)
+
 #ifdef ZFS_DEBUG
 
 /*

From owner-svn-src-vendor@freebsd.org  Tue Apr 12 22:54:21 2016
Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 50BB4B0D6FB;
 Tue, 12 Apr 2016 22:54:21 +0000 (UTC)
 (envelope-from phil@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1864E17AB;
 Tue, 12 Apr 2016 22:54:21 +0000 (UTC)
 (envelope-from phil@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id u3CMsKK9079421;
 Tue, 12 Apr 2016 22:54:20 GMT
 (envelope-from phil@FreeBSD.org)
Received: (from phil@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id u3CMsKhE079417;
 Tue, 12 Apr 2016 22:54:20 GMT
 (envelope-from phil@FreeBSD.org)
Message-Id: <201604122254.u3CMsKhE079417@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: phil set sender to
 phil@FreeBSD.org using -f
From: Phil Shafer
Date: Tue, 12 Apr 2016 22:54:20 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-vendor@freebsd.org
Subject: svn commit: r297885 - in vendor/Juniper/libxo/dist: . doc libxo m4
X-SVN-Group: vendor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
X-List-Received-Date: Tue, 12 Apr 2016 22:54:21 -0000

Author: phil
Date: Tue Apr 12 22:54:19 2016
New Revision: 297885
URL: https://svnweb.freebsd.org/changeset/base/297885

Log:
  Import libxo 0.4.6

Added:
  vendor/Juniper/libxo/dist/.gitignore
  vendor/Juniper/libxo/dist/.svnignore
  vendor/Juniper/libxo/dist/doc/libxo-manual.html   (contents, props changed)
Deleted:
  vendor/Juniper/libxo/dist/install-sh
  vendor/Juniper/libxo/dist/libxo/add.man
  vendor/Juniper/libxo/dist/libxo/xo_config.h.in
  vendor/Juniper/libxo/dist/m4/libtool.m4
  vendor/Juniper/libxo/dist/m4/ltoptions.m4
  vendor/Juniper/libxo/dist/m4/ltsugar.m4
  vendor/Juniper/libxo/dist/m4/ltversion.m4
  vendor/Juniper/libxo/dist/m4/lt~obsolete.m4
Modified:
  vendor/Juniper/libxo/dist/configure.ac
  vendor/Juniper/libxo/dist/libxo/libxo.c
  vendor/Juniper/libxo/dist/libxo/xo_open_container.3
  vendor/Juniper/libxo/dist/libxo/xo_open_list.3
Directory Properties:
  vendor/Juniper/libxo/dist/   (props changed)
  vendor/Juniper/libxo/dist/doc/   (props changed)
  vendor/Juniper/libxo/dist/encoder/   (props changed)
  vendor/Juniper/libxo/dist/encoder/cbor/   (props changed)
  vendor/Juniper/libxo/dist/encoder/test/   (props changed)
  vendor/Juniper/libxo/dist/libxo/   (props changed)
  vendor/Juniper/libxo/dist/tests/   (props changed)
  vendor/Juniper/libxo/dist/tests/core/   (props changed)
  vendor/Juniper/libxo/dist/tests/gettext/   (props changed)
  vendor/Juniper/libxo/dist/tests/xo/   (props changed)
  vendor/Juniper/libxo/dist/xo/   (props changed)
  vendor/Juniper/libxo/dist/xohtml/   (props changed)
  vendor/Juniper/libxo/dist/xolint/   (props changed)
  vendor/Juniper/libxo/dist/xopo/   (props changed)

Added: vendor/Juniper/libxo/dist/.gitignore
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ vendor/Juniper/libxo/dist/.gitignore	Tue Apr 12 22:54:19 2016	(r297885)
@@ -0,0 +1,46 @@
+# Object files
+*.o
+
+# Libraries
+*.lib
+*.a
+
+# Shared objects (inc. Windows DLLs)
+*.dll
+*.so
+*.so.*
+*.dylib
+
+# Executables
+*.exe
+*.app
+
+*~
+*.orig
+
+aclocal.m4
+ar-lib
+autom4te.cache
+build
+compile
+config.guess
+config.h.in
+config.sub
+depcomp
+install-sh
+ltmain.sh
+missing
+m4
+
+Makefile.in
+configure
+.DS_Store
+
+xoconfig.h.in
+xo_config.h.in
+
+.gdbinit
+.gdbinit.local
+xtest
+xtest.dSYM
+tests/w

Added: vendor/Juniper/libxo/dist/.svnignore
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ vendor/Juniper/libxo/dist/.svnignore	Tue Apr 12 22:54:19 2016	(r297885)
@@ -0,0 +1,18 @@
+Makefile.in
+aclocal.m4
+ar-lib
+autom4te.cache
+bin*
+build*
+compile
+configure
+config.guess
+config.sub
+depcomp
+doc/Makefile.in
+info*
+install-sh
+ltmain.sh
+m4*
+missing
+patches*

Modified: vendor/Juniper/libxo/dist/configure.ac
==============================================================================
--- vendor/Juniper/libxo/dist/configure.ac	Tue Apr 12 22:31:48 2016	(r297884)
+++ vendor/Juniper/libxo/dist/configure.ac	Tue Apr 12 22:54:19 2016	(r297885)
@@ -12,7 +12,7 @@
 #
 
 AC_PREREQ(2.2)
-AC_INIT([libxo], [0.4.5], [phil@juniper.net])
+AC_INIT([libxo], [0.4.6], [phil@juniper.net])
 AM_INIT_AUTOMAKE([-Wall -Werror foreign -Wno-portability])
 
 # Support silent build rules.  Requires at least automake-1.11.

Added: vendor/Juniper/libxo/dist/doc/libxo-manual.html
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ vendor/Juniper/libxo/dist/doc/libxo-manual.html	Tue Apr 12 22:54:19 2016	(r297885)
@@ -0,0 +1,27134 @@
+
+
+
+
+libxo: The Easy Way to Generate text, XML, JSON, and HTML output
+
+
+