From owner-svn-src-all@FreeBSD.ORG Sat Aug 2 04:06:37 2014 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1ACDE6F5; Sat, 2 Aug 2014 04:06:37 +0000 (UTC) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 06B3623D7; Sat, 2 Aug 2014 04:06:37 +0000 (UTC) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.9/8.14.9) with ESMTP id s7246aXO006884; Sat, 2 Aug 2014 04:06:36 GMT (envelope-from delphij@svn.freebsd.org) Received: (from delphij@localhost) by svn.freebsd.org (8.14.9/8.14.9/Submit) id s7246ZsC006874; Sat, 2 Aug 2014 04:06:35 GMT (envelope-from delphij@svn.freebsd.org) Message-Id: <201408020406.s7246ZsC006874@svn.freebsd.org> From: Xin LI Date: Sat, 2 Aug 2014 04:06:35 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r269419 - in stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs: . sys X-SVN-Group: stable-10 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Aug 2014 04:06:37 -0000 Author: delphij Date: Sat Aug 2 04:06:35 2014 New Revision: 269419 URL: http://svnweb.freebsd.org/changeset/base/269419 Log: MFC r268865: MFV r268852: Reduce lock contention on the z_teardown_lock under heavily cached read workload by splitting the single teardown rrw lock into RRM_NUM_LOCKS (17) of them. Read acquisitions are randomly distributed among these locks based on curthread pointer. Write acquisitions are going to all the locks, which for the usage of this type of lock should be rare. Illumos issue: 5008 lock contention (rrw_exit) while running a read only load Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Directory Properties: stable/10/ (props changed) Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c Sat Aug 2 04:06:35 2014 (r269419) @@ -286,3 +286,91 @@ rrw_tsd_destroy(void *arg) (void *)curthread, (void *)rn->rn_rrl); } } + +/* + * A reader-mostly lock implementation, tuning above reader-writer locks + * for hightly parallel read acquisitions, while pessimizing writes. + * + * The idea is to split single busy lock into array of locks, so that + * each reader can lock only one of them for read, depending on result + * of simple hash function. That proportionally reduces lock congestion. + * Writer same time has to sequentially aquire write on all the locks. + * That makes write aquisition proportionally slower, but in places where + * it is used (filesystem unmount) performance is not critical. + * + * All the functions below are direct wrappers around functions above. + */ +void +rrm_init(rrmlock_t *rrl, boolean_t track_all) +{ + int i; + + for (i = 0; i < RRM_NUM_LOCKS; i++) + rrw_init(&rrl->locks[i], track_all); +} + +void +rrm_destroy(rrmlock_t *rrl) +{ + int i; + + for (i = 0; i < RRM_NUM_LOCKS; i++) + rrw_destroy(&rrl->locks[i]); +} + +void +rrm_enter(rrmlock_t *rrl, krw_t rw, void *tag) +{ + if (rw == RW_READER) + rrm_enter_read(rrl, tag); + else + rrm_enter_write(rrl); +} + +/* + * This maps the current thread to a specific lock. Note that the lock + * must be released by the same thread that acquired it. We do this + * mapping by taking the thread pointer mod a prime number. We examine + * only the low 32 bits of the thread pointer, because 32-bit division + * is faster than 64-bit division, and the high 32 bits have little + * entropy anyway. + */ +#define RRM_TD_LOCK() (((uint32_t)(uintptr_t)(curthread)) % RRM_NUM_LOCKS) + +void +rrm_enter_read(rrmlock_t *rrl, void *tag) +{ + rrw_enter_read(&rrl->locks[RRM_TD_LOCK()], tag); +} + +void +rrm_enter_write(rrmlock_t *rrl) +{ + int i; + + for (i = 0; i < RRM_NUM_LOCKS; i++) + rrw_enter_write(&rrl->locks[i]); +} + +void +rrm_exit(rrmlock_t *rrl, void *tag) +{ + int i; + + if (rrl->locks[0].rr_writer == curthread) { + for (i = 0; i < RRM_NUM_LOCKS; i++) + rrw_exit(&rrl->locks[i], tag); + } else { + rrw_exit(&rrl->locks[RRM_TD_LOCK()], tag); + } +} + +boolean_t +rrm_held(rrmlock_t *rrl, krw_t rw) +{ + if (rw == RW_WRITER) { + return (rrw_held(&rrl->locks[0], rw)); + } else { + return (rrw_held(&rrl->locks[RRM_TD_LOCK()], rw)); + } +} Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h Sat Aug 2 04:06:35 2014 (r269419) @@ -79,6 +79,31 @@ void rrw_tsd_destroy(void *arg); #define RRW_LOCK_HELD(x) \ (rrw_held(x, RW_WRITER) || rrw_held(x, RW_READER)) +/* + * A reader-mostly lock implementation, tuning above reader-writer locks + * for hightly parallel read acquisitions, pessimizing write acquisitions. + * + * This should be a prime number. See comment in rrwlock.c near + * RRM_TD_LOCK() for details. + */ +#define RRM_NUM_LOCKS 17 +typedef struct rrmlock { + rrwlock_t locks[RRM_NUM_LOCKS]; +} rrmlock_t; + +void rrm_init(rrmlock_t *rrl, boolean_t track_all); +void rrm_destroy(rrmlock_t *rrl); +void rrm_enter(rrmlock_t *rrl, krw_t rw, void *tag); +void rrm_enter_read(rrmlock_t *rrl, void *tag); +void rrm_enter_write(rrmlock_t *rrl); +void rrm_exit(rrmlock_t *rrl, void *tag); +boolean_t rrm_held(rrmlock_t *rrl, krw_t rw); + +#define RRM_READ_HELD(x) rrm_held(x, RW_READER) +#define RRM_WRITE_HELD(x) rrm_held(x, RW_WRITER) +#define RRM_LOCK_HELD(x) \ + (rrm_held(x, RW_WRITER) || rrm_held(x, RW_READER)) + #ifdef __cplusplus } #endif Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h Sat Aug 2 04:06:35 2014 (r269419) @@ -64,7 +64,7 @@ struct zfsvfs { int z_norm; /* normalization flags */ boolean_t z_atime; /* enable atimes mount option */ boolean_t z_unmounted; /* unmounted */ - rrwlock_t z_teardown_lock; + rrmlock_t z_teardown_lock; krwlock_t z_teardown_inactive_lock; list_t z_all_znodes; /* all vnodes in the fs */ kmutex_t z_znodes_lock; /* lock for z_all_znodes */ Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h Sat Aug 2 04:06:35 2014 (r269419) @@ -256,7 +256,7 @@ VTOZ(vnode_t *vp) /* Called on entry to each ZFS vnode and vfs operation */ #define ZFS_ENTER(zfsvfs) \ { \ - rrw_enter_read(&(zfsvfs)->z_teardown_lock, FTAG); \ + rrm_enter_read(&(zfsvfs)->z_teardown_lock, FTAG); \ if ((zfsvfs)->z_unmounted) { \ ZFS_EXIT(zfsvfs); \ return (EIO); \ @@ -265,10 +265,10 @@ VTOZ(vnode_t *vp) /* Called on entry to each ZFS vnode and vfs operation that can not return EIO */ #define ZFS_ENTER_NOERROR(zfsvfs) \ - rrw_enter(&(zfsvfs)->z_teardown_lock, RW_READER, FTAG) + rrm_enter(&(zfsvfs)->z_teardown_lock, RW_READER, FTAG) /* Must be called before exiting the vop */ -#define ZFS_EXIT(zfsvfs) rrw_exit(&(zfsvfs)->z_teardown_lock, FTAG) +#define ZFS_EXIT(zfsvfs) rrm_exit(&(zfsvfs)->z_teardown_lock, FTAG) /* Verifies the znode is valid */ #define ZFS_VERIFY_ZP(zp) \ Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c Sat Aug 2 04:06:35 2014 (r269419) @@ -1466,7 +1466,7 @@ zfsvfs_hold(const char *name, void *tag, if (getzfsvfs(name, zfvp) != 0) error = zfsvfs_create(name, zfvp); if (error == 0) { - rrw_enter(&(*zfvp)->z_teardown_lock, (writer) ? RW_WRITER : + rrm_enter(&(*zfvp)->z_teardown_lock, (writer) ? RW_WRITER : RW_READER, tag); if ((*zfvp)->z_unmounted) { /* @@ -1474,7 +1474,7 @@ zfsvfs_hold(const char *name, void *tag, * thread should be just about to disassociate the * objset from the zfsvfs. */ - rrw_exit(&(*zfvp)->z_teardown_lock, tag); + rrm_exit(&(*zfvp)->z_teardown_lock, tag); return (SET_ERROR(EBUSY)); } } @@ -1484,7 +1484,7 @@ zfsvfs_hold(const char *name, void *tag, static void zfsvfs_rele(zfsvfs_t *zfsvfs, void *tag) { - rrw_exit(&zfsvfs->z_teardown_lock, tag); + rrm_exit(&zfsvfs->z_teardown_lock, tag); if (zfsvfs->z_vfs) { VFS_RELE(zfsvfs->z_vfs); Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c Sat Aug 2 04:06:35 2014 (r269419) @@ -988,7 +988,7 @@ zfsvfs_create(const char *osname, zfsvfs mutex_init(&zfsvfs->z_lock, NULL, MUTEX_DEFAULT, NULL); list_create(&zfsvfs->z_all_znodes, sizeof (znode_t), offsetof(znode_t, z_link_node)); - rrw_init(&zfsvfs->z_teardown_lock, B_FALSE); + rrm_init(&zfsvfs->z_teardown_lock, B_FALSE); rw_init(&zfsvfs->z_teardown_inactive_lock, NULL, RW_DEFAULT, NULL); rw_init(&zfsvfs->z_fuid_lock, NULL, RW_DEFAULT, NULL); for (i = 0; i != ZFS_OBJ_MTX_SZ; i++) @@ -1104,7 +1104,7 @@ zfsvfs_free(zfsvfs_t *zfsvfs) mutex_destroy(&zfsvfs->z_znodes_lock); mutex_destroy(&zfsvfs->z_lock); list_destroy(&zfsvfs->z_all_znodes); - rrw_destroy(&zfsvfs->z_teardown_lock); + rrm_destroy(&zfsvfs->z_teardown_lock); rw_destroy(&zfsvfs->z_teardown_inactive_lock); rw_destroy(&zfsvfs->z_fuid_lock); for (i = 0; i != ZFS_OBJ_MTX_SZ; i++) @@ -1833,7 +1833,7 @@ zfsvfs_teardown(zfsvfs_t *zfsvfs, boolea { znode_t *zp; - rrw_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); + rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); if (!unmounting) { /* @@ -1866,7 +1866,7 @@ zfsvfs_teardown(zfsvfs_t *zfsvfs, boolea */ if (!unmounting && (zfsvfs->z_unmounted || zfsvfs->z_os == NULL)) { rw_exit(&zfsvfs->z_teardown_inactive_lock); - rrw_exit(&zfsvfs->z_teardown_lock, FTAG); + rrm_exit(&zfsvfs->z_teardown_lock, FTAG); return (SET_ERROR(EIO)); } @@ -1893,7 +1893,7 @@ zfsvfs_teardown(zfsvfs_t *zfsvfs, boolea */ if (unmounting) { zfsvfs->z_unmounted = B_TRUE; - rrw_exit(&zfsvfs->z_teardown_lock, FTAG); + rrm_exit(&zfsvfs->z_teardown_lock, FTAG); rw_exit(&zfsvfs->z_teardown_inactive_lock); } @@ -1970,9 +1970,9 @@ zfs_umount(vfs_t *vfsp, int fflag) * vflush(FORCECLOSE). This way we ensure no future vnops * will be called and risk operating on DOOMED vnodes. */ - rrw_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); + rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); zfsvfs->z_unmounted = B_TRUE; - rrw_exit(&zfsvfs->z_teardown_lock, FTAG); + rrm_exit(&zfsvfs->z_teardown_lock, FTAG); } /* @@ -2240,7 +2240,7 @@ zfs_resume_fs(zfsvfs_t *zfsvfs, const ch znode_t *zp; uint64_t sa_obj = 0; - ASSERT(RRW_WRITE_HELD(&zfsvfs->z_teardown_lock)); + ASSERT(RRM_WRITE_HELD(&zfsvfs->z_teardown_lock)); ASSERT(RW_WRITE_HELD(&zfsvfs->z_teardown_inactive_lock)); /* @@ -2296,7 +2296,7 @@ zfs_resume_fs(zfsvfs_t *zfsvfs, const ch bail: /* release the VOPs */ rw_exit(&zfsvfs->z_teardown_inactive_lock); - rrw_exit(&zfsvfs->z_teardown_lock, FTAG); + rrm_exit(&zfsvfs->z_teardown_lock, FTAG); if (err) { /* Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Sat Aug 2 04:01:44 2014 (r269418) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Sat Aug 2 04:06:35 2014 (r269419) @@ -276,7 +276,7 @@ zfs_znode_move(void *buf, void *newbuf, * can safely ensure that the filesystem is not and will not be * unmounted. The next statement is equivalent to ZFS_ENTER(). */ - rrw_enter(&zfsvfs->z_teardown_lock, RW_READER, FTAG); + rrm_enter(&zfsvfs->z_teardown_lock, RW_READER, FTAG); if (zfsvfs->z_unmounted) { ZFS_EXIT(zfsvfs); rw_exit(&zfsvfs_lock);