From owner-svn-src-projects@FreeBSD.ORG Mon Oct 14 21:50:57 2013
Delivered-To: svn-src-projects@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTP id 956197E1;
	Mon, 14 Oct 2013 21:50:57 +0000 (UTC)
	(envelope-from asomers@FreeBSD.org)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id 742942AAD;
	Mon, 14 Oct 2013 21:50:57 +0000 (UTC)
Received: from svn.freebsd.org ([127.0.1.70])
	by svn.freebsd.org (8.14.7/8.14.7) with ESMTP id r9ELov65082212;
	Mon, 14 Oct 2013 21:50:57 GMT
	(envelope-from asomers@svn.freebsd.org)
Received: (from asomers@localhost)
	by svn.freebsd.org (8.14.7/8.14.5/Submit) id r9ELovC1082211;
	Mon, 14 Oct 2013 21:50:57 GMT
	(envelope-from asomers@svn.freebsd.org)
Message-Id: <201310142150.r9ELovC1082211@svn.freebsd.org>
From: Alan Somers
Date: Mon, 14 Oct 2013 21:50:57 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject: svn commit: r256463 - projects/zfsd/head/cddl/sbin/zfsd
X-SVN-Group: projects
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-projects@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "SVN commit messages for the src " projects" tree"
X-List-Received-Date: Mon, 14 Oct 2013 21:50:57 -0000

Author: asomers
Date: Mon Oct 14 21:50:56 2013
New Revision: 256463
URL: http://svnweb.freebsd.org/changeset/base/256463

Log:
  Fix a bug in zfsd: when a drive is experiencing a rapid storm of IO or
  checksum errors, zfsd will not degrade/fault it until hundreds or
  thousands of errors have occurred.

  cddl/sbin/zfsd/case_file.cc
  	RefreshVdevState() iterates through all the system's zpools,
  	which involves the ioctls ZFS_IOC_POOL_CONFIGS and
  	ZFS_IOC_POOL_STATS. Both of those acquire spa_namespace_lock,
  	which may block for a long time under certain circumstances,
  	including when the system has a storm of IO or checksum errors.
  	This change eliminates the call to RefreshVdevState() whenever a
  	ZFSEvent is received. Instead, RefreshVdevState() will only be
  	called when a CaseFile is closed, if necessary. This way, zfsd
  	won't spend too much time blocking on ioctl()s and miss reading
  	events from devd.

  Submitted by:	alans
  Approved by:	ken (mentor)
  Sponsored by:	Spectra Logic Corporation

Modified:
  projects/zfsd/head/cddl/sbin/zfsd/case_file.cc

Modified: projects/zfsd/head/cddl/sbin/zfsd/case_file.cc
==============================================================================
--- projects/zfsd/head/cddl/sbin/zfsd/case_file.cc	Mon Oct 14 21:41:36 2013	(r256462)
+++ projects/zfsd/head/cddl/sbin/zfsd/case_file.cc	Mon Oct 14 21:50:56 2013	(r256463)
@@ -298,28 +298,6 @@ CaseFile::ReEvaluate(const ZfsEvent &eve
 {
 	bool consumed(false);
 
-	if (!RefreshVdevState()) {
-		/*
-		 * The pool or vdev for this case file is no longer
-		 * part of the configuration. This can happen
-		 * if we process a device arrival notification
-		 * before seeing the ZFS configuration change
-		 * event.
-		 */
-		syslog(LOG_INFO,
-		       "CaseFile::ReEvaluate(%s,%s) Pool/Vdev unconfigured. "
-		       "Closing\n",
-		       PoolGUIDString().c_str(),
-		       VdevGUIDString().c_str());
-		Close();
-
-		/*
-		 * Since this event was not used to close this
-		 * case, do not report it as consumed.
-		 */
-		return (/*consumed*/false);
-	}
-
 	if (event.Value("type") == "misc.fs.zfs.vdev_remove") {
 		/*
 		 * The Vdev we represent has been removed from the
@@ -333,6 +311,28 @@ CaseFile::ReEvaluate(const ZfsEvent &eve
 	if (event.Value("class") == "resource.fs.zfs.removed") {
 		bool spare_activated;
 
+		if (!RefreshVdevState()) {
+			/*
+			 * The pool or vdev for this case file is no longer
+			 * part of the configuration. This can happen
+			 * if we process a device arrival notification
+			 * before seeing the ZFS configuration change
+			 * event.
+			 */
+			syslog(LOG_INFO,
+			       "CaseFile::ReEvaluate(%s,%s) Pool/Vdev "
+			       "unconfigured. Closing\n",
+			       PoolGUIDString().c_str(),
+			       VdevGUIDString().c_str());
+			Close();
+
+			/*
+			 * Since this event was not used to close this
+			 * case, do not report it as consumed.
+			 */
+			return (/*consumed*/false);
+		}
+
 		/*
 		 * Discard any tentative I/O error events for
 		 * this case. They were most likely caused by the