Date:      Tue, 09 Jul 2013 21:39:23 +0200
From:      Andreas Longwitz <longwitz@incore.de>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Shutdown hangs on unmount of a gjournaled file system in 8-Stable
Message-ID:  <51DC66EB.40109@incore.de>
In-Reply-To: <20130708054301.GI91021@kib.kiev.ua>
References:  <51D9EB23.4070505@incore.de> <20130708054301.GI91021@kib.kiev.ua>

Konstantin Belousov wrote:
> On Mon, Jul 08, 2013 at 12:26:43AM +0200, Andreas Longwitz wrote:
>> The deadlock can be explained now: pid 1 (init) sleeps on "mount drain"
>> because mp->mnt_lockref was 1. This setting was done by pid 18 (gjournal
>> switcher) by calling vfs_busy(). pid 18 now sleeps on "suspwt" because
>> mp->mnt_writeopcount was 1. This setting was done by pid 1 before going
>> to sleep by calling vn_start_write() in dounmount().
>>
>> I think the reason for this deadlock is the commit r249055 which seems
>> not to be compatible with gjournal.
> Thank you for the analysis. I think 'not compatible' is some
> understatement. The situation clearly causes a deadlock, you are right.
> 
> The vfs_busy(); vfs_write_suspend(); call sequence is somewhat dubious,
> in fact, exactly because unmount could start in between. I think that
> vfs_write_suspend() must avoid setting MNT_SUSPEND if unmount was
> started. Patch below, for HEAD, should fix the problem, by marking the
> callers of vfs_write_suspend(), which are not protected by the covered
> vnode lock, with the VS_SKIP_UNMOUNT flag.

Agree.
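
To make the idea concrete, here is a minimal userland sketch of the check as I understand it. It is not the kernel code: the struct, the MODEL_* constants and their values, and the EBUSY return value are assumptions for illustration only. Just the VS_SKIP_UNMOUNT name and the rule "do not set the suspend flag once unmount has started" come from the description above.

```c
#include <errno.h>

/* Hypothetical model flags; the values are illustrative, not the kernel's. */
#define MODEL_UNMOUNT          0x1  /* unmount of this mount has started */
#define MODEL_SUSPEND          0x2  /* write suspension has been requested */
#define MODEL_VS_SKIP_UNMOUNT  0x1  /* caller is not protected by the covered vnode lock */

/* Hypothetical stand-in for the mount structure. */
struct model_mount {
	int flags;
};

/*
 * Model of the suspend entry point: a caller that passes the skip flag
 * backs off with an error if unmount is already in progress, instead of
 * setting the suspend flag and then waiting for writers to drain.
 */
int
model_write_suspend(struct model_mount *mp, int vs_flags)
{
	if ((vs_flags & MODEL_VS_SKIP_UNMOUNT) != 0 &&
	    (mp->flags & MODEL_UNMOUNT) != 0)
		return (EBUSY);		/* assumed error code */
	mp->flags |= MODEL_SUSPEND;
	return (0);
}
```

In this model a caller such as the gjournal switcher would pass the skip flag and give up on a nonzero return, rather than going to sleep on "suspwt" while unmount waits on "mount drain".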

> I believe that the conflicts on stable/8 should be trivial, if any.

Yes, I have adapted r244795, r244925 and r245286 from head and your
patch for the umount hang to 8-Stable and everything looks fine. All my
reboots worked as expected.

By the way, since the source g_journal.c is involved: could you extend the
panic message for Journal overflow a little bit?

-> diff g_journal.c.orig g_journal.c
342,343c343,344
< panic("Journal overflow (joffset=%jd active=%jd inactive=%jd)",
<                   (intmax_t)sc->sc_journal_offset,
---
> panic("Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)",
>                   sc->sc_id, (intmax_t)sc->sc_journal_offset,

This was helpful for analyzing the still unsolved "suspwt" lock problem
from kern/164252; please look at
 http://lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html
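
For reference, a small userland sketch of how the extended message from the diff above would be formatted. The function name and its parameters are hypothetical, and snprintf() stands in for panic(). The format string is the one from the diff, and the (intmax_t) casts match the %jd conversions.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: builds the extended overflow message in buf.
 * The id, joffset, active and inactive parameters are stand-ins for
 * the corresponding softc fields.
 */
int
format_overflow_msg(char *buf, size_t len, int id, int64_t joffset,
    int64_t active, int64_t inactive)
{
	return (snprintf(buf, len,
	    "Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)",
	    id, (intmax_t)joffset, (intmax_t)active, (intmax_t)inactive));
}
```

With more than one journal on a machine, the id makes it possible to tell from the panic string alone which provider overflowed.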

-- 
Andreas Longwitz