From owner-svn-src-stable@freebsd.org Wed Oct 3 02:50:08 2018
Delivered-To: svn-src-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CCF5110B5861; Wed, 3 Oct 2018 02:50:08 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3BEA7885D0; Wed, 3 Oct 2018 02:50:08 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 36E8210E7F; Wed, 3 Oct 2018 02:50:08 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w932o86R061706; Wed, 3 Oct 2018 02:50:08 GMT (envelope-from mav@FreeBSD.org)
Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w932o8Dm061705; Wed, 3 Oct 2018 02:50:08 GMT (envelope-from mav@FreeBSD.org)
Message-Id: <201810030250.w932o8Dm061705@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f
From: Alexander Motin
Date: Wed, 3 Oct 2018 02:50:08 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org
Subject: svn commit: r339116 - stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
X-SVN-Group: stable-11
X-SVN-Commit-Author: mav
X-SVN-Commit-Paths: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
X-SVN-Commit-Revision: 339116
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-stable@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: SVN commit messages for all the -stable branches of the src tree
X-List-Received-Date: Wed, 03 Oct 2018 02:50:09 -0000

Author: mav
Date: Wed Oct 3 02:50:07 2018
New Revision: 339116
URL: https://svnweb.freebsd.org/changeset/base/339116

Log:
  MFC r337030: MFV r337029: 9426 metaslab size can exceed offset addressable by spacemap

  metaslab size can exceed offset addressable by spacemap.
  The vdev can address up to 2^63 * SPA_MAXBLOCKSIZE (512).
  A metaslab can address up to 2^47 * 2^vdev_ashift.
  Therefore we may need to increase the number of metaslabs so that the
  maximum metaslab size is capped at the amount that can be addressed by
  the spacemap. This should happen in vdev_metaslab_set_size().

  illumos/illumos-gate@b4bf0cf0458759c67920a031021a9d96cd683cfe

  Reviewed by:	Paul Dagnelie
  Reviewed by:	Matt Ahrens
  Approved by:	Dan McDonald
  Author:	Don Brady

Modified:
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
==============================================================================
--- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	Wed Oct  3 02:49:24 2018	(r339115)
+++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	Wed Oct  3 02:50:07 2018	(r339116)
@@ -163,24 +163,30 @@ static vdev_ops_t *vdev_ops_table[] = {
 };
 
 
-/* maximum number of metaslabs per top-level vdev */
+/* target number of metaslabs per top-level vdev */
 int vdev_max_ms_count = 200;
 SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, max_ms_count, CTLFLAG_RDTUN,
     &vdev_max_ms_count, 0,
     "Maximum number of metaslabs per top-level vdev");
 
-/* minimum amount of metaslabs per top-level vdev */
+/* minimum number of metaslabs per top-level vdev */
 int vdev_min_ms_count = 16;
 SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, min_ms_count, CTLFLAG_RDTUN,
     &vdev_min_ms_count, 0,
     "Minimum number of metaslabs per top-level vdev");
 
-/* see comment in vdev_metaslab_set_size() */
+/* practical upper limit of total metaslabs per top-level vdev */
+int vdev_ms_count_limit = 1ULL << 17;
+
+/* lower limit for metaslab size (512M) */
 int vdev_default_ms_shift = 29;
 SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, default_ms_shift, CTLFLAG_RDTUN,
     &vdev_default_ms_shift, 0,
     "Shift between vdev size and number of metaslabs");
 
+/* upper limit for metaslab size (256G) */
+int vdev_max_ms_shift = 38;
+
 boolean_t vdev_validate_skip = B_FALSE;
 
 /*
@@ -2167,34 +2173,53 @@ void
 vdev_metaslab_set_size(vdev_t *vd)
 {
 	uint64_t asize = vd->vdev_asize;
-	uint64_t ms_shift = 0;
+	uint64_t ms_count = asize >> vdev_default_ms_shift;
+	uint64_t ms_shift;
 
 	/*
-	 * For vdevs that are bigger than 8G the metaslab size varies in
-	 * a way that the number of metaslabs increases in powers of two,
-	 * linearly in terms of vdev_asize, starting from 16 metaslabs.
-	 * So for vdev_asize of 8G we get 16 metaslabs, for 16G, we get 32,
-	 * and so on, until we hit the maximum metaslab count limit
-	 * [vdev_max_ms_count] from which point the metaslab count stays
-	 * the same.
+	 * There are two dimensions to the metaslab sizing calculation:
+	 * the size of the metaslab and the count of metaslabs per vdev.
+	 * In general, we aim for vdev_max_ms_count (200) metaslabs. The
+	 * range of the dimensions are as follows:
+	 *
+	 *	2^29 <= ms_size  <= 2^38
+	 *	  16 <= ms_count <= 131,072
+	 *
+	 * On the lower end of vdev sizes, we aim for metaslabs sizes of
+	 * at least 512MB (2^29) to minimize fragmentation effects when
+	 * testing with smaller devices.  However, the count constraint
+	 * of at least 16 metaslabs will override this minimum size goal.
+	 *
+	 * On the upper end of vdev sizes, we aim for a maximum metaslab
+	 * size of 256GB.  However, we will cap the total count to 2^17
+	 * metaslabs to keep our memory footprint in check.
+	 *
+	 * The net effect of applying above constrains is summarized below.
+	 *
+	 *	vdev size	metaslab count
+	 *	-------------|-----------------
+	 *	< 8GB		~16
+	 *	8GB - 100GB	one per 512MB
+	 *	100GB - 50TB	~200
+	 *	50TB - 32PB	one per 256GB
+	 *	> 32PB		~131,072
+	 *	-------------------------------
 	 */
-	ms_shift = vdev_default_ms_shift;
 
-	if ((asize >> ms_shift) < vdev_min_ms_count) {
-		/*
-		 * For devices that are less than 8G we want to have
-		 * exactly 16 metaslabs. We don't want less as integer
-		 * division rounds down, so less metaslabs mean more
-		 * wasted space. We don't want more as these vdevs are
-		 * small and in the likely event that we are running
-		 * out of space, the SPA will have a hard time finding
-		 * space due to fragmentation.
-		 */
+	if (ms_count < vdev_min_ms_count)
 		ms_shift = highbit64(asize / vdev_min_ms_count);
-		ms_shift = MAX(ms_shift, SPA_MAXBLOCKSHIFT);
-
-	} else if ((asize >> ms_shift) > vdev_max_ms_count) {
+	else if (ms_count > vdev_max_ms_count)
 		ms_shift = highbit64(asize / vdev_max_ms_count);
+	else
+		ms_shift = vdev_default_ms_shift;
+
+	if (ms_shift < SPA_MAXBLOCKSHIFT) {
+		ms_shift = SPA_MAXBLOCKSHIFT;
+	} else if (ms_shift > vdev_max_ms_shift) {
+		ms_shift = vdev_max_ms_shift;
+		/* cap the total count to constrain memory footprint */
+		if ((asize >> ms_shift) > vdev_ms_count_limit)
+			ms_shift = highbit64(asize / vdev_ms_count_limit);
 	}
 
 	vd->vdev_ms_shift = ms_shift;