From: Konstantin Belousov
To: Mark Johnston
Cc: freebsd-current@FreeBSD.org
Date: Sat, 4 Jun 2016 12:32:36 +0300
Subject: Re: thread suspension when dumping core
Message-ID: <20160604093236.GA38613@kib.kiev.ua>
In-Reply-To: <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com>

On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> Hi,
>
> I've recently observed a hang in a multi-threaded process that had hit
> an assertion failure and was attempting to dump core. One thread was
> sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> filesystem sets VFCF_SBDRY). SIGABRT caused the recipient thread to
> suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> to interrupt the sleeping thread, resulting in the hang.
>
> My question is, why does the SA_CORE handler not force all threads to
> the user boundary before attempting to dump core? It must do so later
> anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> avoid deadlocks that can occur when stopping a process, but in this
> case we don't stop the process with the intention of resuming it, so it
> seems erroneous to apply this flag.

Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync()?
This cannot work, regardless of the mode of single-threading. TDF_SBDRY
makes such a sleep non-interruptible by any single-threading request, on
the promise that the thread owns some resources (typically vnode locks).
I.e., changing the mode would not help.

I see two reasons to use SINGLE_NO_EXIT for coredumping. First, it
allows the coredump writer to record a more exact state of the process
in the notes. Second, SINGLE_NO_EXIT is generally faster and more
reliable than SINGLE_BOUNDARY: some thread states are already good
enough for SINGLE_NO_EXIT, while they require more work to get into
SINGLE_BOUNDARY. In other words, the core dump write starts earlier.
These might not be very significant reasons.

From what I see in the code, our NFS client has a similar issue of
calling lf_advlock() with TDF_SBDRY set. Below is the patch to fix that.
A similar bug existed in our fifofs; see r277321.

diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
index 2a8afa9..98625ee 100644
--- a/sys/fs/nfsclient/nfs_clvnops.c
+++ b/sys/fs/nfsclient/nfs_clvnops.c
@@ -2992,7 +2992,7 @@ nfs_advlock(struct vop_advlock_args *ap)
 	struct proc *p = (struct proc *)ap->a_id;
 	struct thread *td = curthread;	/* XXX */
 	struct vattr va;
-	int ret, error = EOPNOTSUPP;
+	int sbdry, ret, error = EOPNOTSUPP;
 	u_quad_t size;
 
 	if (NFS_ISV4(vp) && (ap->a_flags & (F_POSIX | F_FLOCK)) != 0) {
@@ -3087,7 +3087,10 @@ nfs_advlock(struct vop_advlock_args *ap)
 		if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) {
 			size = VTONFS(vp)->n_size;
 			NFSVOPUNLOCK(vp, 0);
+			sbdry = sigallowstop();
 			error = lf_advlock(ap, &(vp->v_lockf), size);
+			if (sbdry)
+				sigdeferstop();
 		} else {
 			if (nfs_advlock_p != NULL)
 				error = nfs_advlock_p(ap);
@@ -3114,7 +3117,7 @@ nfs_advlockasync(struct vop_advlockasync_args *ap)
 {
 	struct vnode *vp = ap->a_vp;
 	u_quad_t size;
-	int error;
+	int error, sbdry;
 
 	if (NFS_ISV4(vp))
 		return (EOPNOTSUPP);
@@ -3124,7 +3127,10 @@ nfs_advlockasync(struct vop_advlockasync_args *ap)
 	if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) {
 		size = VTONFS(vp)->n_size;
 		NFSVOPUNLOCK(vp, 0);
+		sbdry = sigallowstop();
 		error = lf_advlockasync(ap, &(vp->v_lockf), size);
+		if (sbdry)
+			sigdeferstop();
 	} else {
 		NFSVOPUNLOCK(vp, 0);
 		error = EOPNOTSUPP;
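
For readers who want to see the save/restore idiom from the patch in
isolation, here is a rough userland model of it. This is only a sketch:
struct model_thread and every model_* function are hypothetical
stand-ins invented for illustration, and the assumption that
sigallowstop() clears the flag and returns its previous value is
inferred from how the patch uses the return value, not copied from the
kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel thread; only the one flag is modeled. */
struct model_thread {
	bool sbdry;	/* models TDF_SBDRY: stops deferred to the user boundary */
};

/* Assumed behavior: clear the flag and report whether it was set before. */
static int
model_sigallowstop(struct model_thread *td)
{
	int prev = td->sbdry;

	td->sbdry = false;
	return (prev);
}

/* Assumed behavior: set the flag again. */
static void
model_sigdeferstop(struct model_thread *td)
{
	td->sbdry = true;
}

/* Stand-in for the lock sleep: interruptible only while the flag is clear. */
static bool
model_sleep_is_interruptible(const struct model_thread *td)
{
	return (!td->sbdry);
}

/*
 * Same shape as the patched call sites: drop the flag around the
 * sleeping call, then restore it only if it was set on entry.
 * Returns whether the modeled sleep could have been interrupted.
 */
static bool
model_advlock(struct model_thread *td)
{
	int sbdry;
	bool interruptible;

	sbdry = model_sigallowstop(td);
	interruptible = model_sleep_is_interruptible(td);
	if (sbdry)
		model_sigdeferstop(td);
	return (interruptible);
}
```

Restoring the flag conditionally, rather than always setting it on the
way out, keeps the wrapper neutral for callers that entered without the
flag set.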