From: Mark Johnston
Date: Mon, 6 Jun 2016 10:13:11 -0700
To: Konstantin Belousov
Cc: freebsd-current@FreeBSD.org
Subject: Re: thread suspension when dumping core
Message-ID: <20160606171311.GC10101@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20160604093236.GA38613@kib.kiev.ua>
References: <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com> <20160604093236.GA38613@kib.kiev.ua>
List-Id: Discussions about the use of FreeBSD-current

On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> > Hi,
> >
> > I've recently observed a hang in a multi-threaded process that had hit
> > an assertion failure and was attempting to dump core. One thread was
> > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> > filesystem sets VFCF_SBDRY). SIGABRT caused the recipient thread to
> > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> > to interrupt the sleeping thread, resulting in the hang.
> >
> > My question is, why does the SA_CORE handler not force all threads to
> > the user boundary before attempting to dump core? It must do so later
> > anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> > avoid deadlocks that can occur when stopping a process, but in this
> > case we don't stop the process with the intention of resuming it, so it
> > seems erroneous to apply this flag.
>
> Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ?

It doesn't. This code belongs to a general framework for distributed FS
locks; in this particular case, the application was using it to acquire
a custom advisory lock.

> This cannot work, regardless of the mode of single-threading. TDF_SBDRY
> makes such sleep non-interruptible by any single-threading request, on
> the promise that the thread owns some resources (typically vnode locks).
> I.e. changing the mode would not help.

I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
from waking up the thread? The sleepq_abort() call is only elided in
the SINGLE_ALLPROC case, so in other cases, I think we will still
interrupt the sleep. Thus, since thread_suspend_check() is only invoked
prior to going to sleep, the application I referred to must have
attempted to single-thread the process before the thread in question
went to sleep.

>
> I see two reasons to use SINGLE_NO_EXIT for coredumping. It allows
> coredump writer to record more exact state of the process into the notes.
>
> Another one is that SINGLE_NO_EXIT is generally faster and more
> reliable than SINGLE_BOUNDARY. Some states are already good enough for
> SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In
> other words, core dump write starts earlier.
>
> It might be not very significant reasons.
>
> From what I see in the code, our NFS client has similar issue of calling
> lf_advlock() with TDF_SBDRY set. Below is the patch to fix that.
> Similar bug existed in our fifofs, see r277321.

Thanks. It may be that a similar fix is appropriate in our locking code, but I'll have to spend more time reading it.
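
To make my reading of the wakeup path above concrete, here is a small
user-space sketch. It is a toy model of my understanding, not the kernel
code: the enum, the struct, and toy_sleep_is_aborted() are hypothetical
names invented for illustration, and the only rule it encodes is the one
I described (the abort of an interruptible sleep is skipped only for the
ALLPROC mode).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the single-threading modes discussed above. */
enum single_mode { MODE_EXIT, MODE_BOUNDARY, MODE_NO_EXIT, MODE_ALLPROC };

/* Toy thread state; the real struct thread carries far more than this. */
struct toy_thread {
	bool on_sleepq;     /* thread is currently asleep */
	bool interruptible; /* the sleep was requested as interruptible */
};

/*
 * Toy model (an assumption, not kernel behavior): returns true when a
 * single-threading request in the given mode would abort the thread's
 * sleep. Only an interruptible sleeper can be aborted, and under this
 * reading the abort is skipped only for MODE_ALLPROC.
 */
bool
toy_sleep_is_aborted(enum single_mode mode, const struct toy_thread *td)
{
	assert(td != NULL);
	if (!td->on_sleepq || !td->interruptible)
		return (false);
	return (mode != MODE_ALLPROC);
}
```

Under this model, an interruptible sleeper is woken for MODE_NO_EXIT,
which is why I suspect the hang comes from the ordering of the
single-threading request relative to the sleep rather than from the mode.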