Date:      Mon, 6 Jun 2016 10:13:11 -0700
From:      Mark Johnston <markj@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: thread suspension when dumping core
Message-ID:  <20160606171311.GC10101@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20160604093236.GA38613@kib.kiev.ua>
References:  <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com> <20160604093236.GA38613@kib.kiev.ua>

On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> > Hi,
> > 
> > I've recently observed a hang in a multi-threaded process that had hit
> > an assertion failure and was attempting to dump core. One thread was
> > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> > filesystem sets VFCF_SBDRY). SIGABRT caused the recipient thread to
> > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> > to interrupt the sleeping thread, resulting in the hang.
> > 
> > My question is, why does the SA_CORE handler not force all threads to
> > the user boundary before attempting to dump core? It must do so later
> > anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> > avoid deadlocks that can occur when stopping a process, but in this
> > case we don't stop the process with the intention of resuming it, so it
> > seems erroneous to apply this flag.
> 
> Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync()?

It doesn't. This code belongs to a general framework for distributed FS
locks; in this particular case, the application was using it to acquire
a custom advisory lock.

> This cannot work, regardless of the mode of single-threading.  TDF_SBDRY
> makes such a sleep non-interruptible by any single-threading request, on
> the promise that the thread owns some resources (typically vnode locks).
> That is, changing the mode would not help.

I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
from waking up the thread? The sleepq_abort() call is only elided in the
SINGLE_ALLPROC case, so in other cases, I think we will still interrupt
the sleep. Thus, since thread_suspend_check() is only invoked prior to
going to sleep, the application I referred to must have attempted to
single-thread the process before the thread in question went to sleep.

> 
> I see two reasons to use SINGLE_NO_EXIT for coredumping.  First, it
> allows the coredump writer to record a more exact state of the process
> in the notes.
> 
> The second is that SINGLE_NO_EXIT is generally faster and more
> reliable than SINGLE_BOUNDARY. Some thread states are already good enough
> for SINGLE_NO_EXIT, while they require more work to reach for
> SINGLE_BOUNDARY. In other words, writing the core dump starts earlier.
> 
> These might not be very significant reasons.
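A rough model of the second reason, again a hypothetical userspace sketch rather than kernel code: each thread is reduced to one of three made-up states, and a helper counts how many threads the single-threading caller would still have to wait for. Following the paragraph above, it assumes that SINGLE_NO_EXIT accepts a thread suspended at any kernel suspension point, while SINGLE_BOUNDARY accepts only threads stopped at the user boundary; the state and function names are illustrative only.

```c
/*
 * Hypothetical model of which thread states satisfy each single-threading
 * mode; not kernel code, and the names below are illustrative only.
 */
#include <assert.h>
#include <stddef.h>

enum model_state {
	ST_RUNNING,               /* still executing; must reach a stop point */
	ST_SUSPENDED_IN_KERNEL,   /* stopped at a kernel suspension point */
	ST_SUSPENDED_AT_BOUNDARY  /* stopped at the user/kernel boundary */
};

enum model_mode { MODEL_BOUNDARY, MODEL_NO_EXIT };

/* Is this thread already in an acceptable state for the given mode? */
static int model_state_ok(enum model_mode mode, enum model_state st)
{
	switch (mode) {
	case MODEL_NO_EXIT:
		/* Assumption: any suspension point is good enough here. */
		return st != ST_RUNNING;
	case MODEL_BOUNDARY:
		/* Assumption: only threads parked at the boundary count. */
		return st == ST_SUSPENDED_AT_BOUNDARY;
	}
	return 0;
}

/* Number of threads the single-threading caller must still wait for. */
static size_t model_remaining(enum model_mode mode,
    const enum model_state *threads, size_t n)
{
	size_t remaining = 0;

	assert(threads != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!model_state_ok(mode, threads[i]))
			remaining++;
	return remaining;
}
```

With one running thread, one suspended inside the kernel and one at the boundary, the model leaves one thread outstanding for SINGLE_NO_EXIT and two for SINGLE_BOUNDARY, which is the sense in which the core dump can start earlier.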
> 
> From what I see in the code, our NFS client has a similar issue of calling
> lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
> A similar bug existed in our fifofs; see r277321.

Thanks. It may be that a similar fix is appropriate in our locking code,
but I'll have to spend more time reading it.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160606171311.GC10101>