Date:      Mon, 2 Jan 2006 11:43:18 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        Tor.Egge@cvsup.no.freebsd.org
Cc:        gcr+freebsd-stable@tharned.org, freebsd-stable@FreeBSD.org, kris@obsecurity.org
Subject:   Re: Recurring problem: processes block accessing UFS file system
Message-ID:  <200601021943.k02JhI1d005076@gw.catspoiler.org>
In-Reply-To: <20051126.000406.74717773.Tor.Egge@cvsup.no.freebsd.org>

On 26 Nov, Tor Egge wrote:
> 
>> Thanks Kris, these are exactly the clues I needed.  Since the deadlock 
>> during a snapshot is fairly easy to reproduce, I did so and collected this 
>> information below.  "alltrace" didn't work as I expected (didn't produce a 
>> trace), so I traced each pid associated with a locked vnode separately.
> 
> The vnode syncing loop in ffs_sync() has some problems:
> 
>   1. Softupdate processing performed after the loop has started might
>      trigger the need for retrying the loop.  Processing of dirrem work
>      items can cause IN_CHANGE to be set on some inodes, causing
>      deadlock in ufs_inactive() later on while the file system is
>      suspended.

I also don't like how this loop interacts with the vnode list churn done
by vnlru_free().  Maybe vnode recycling should be skipped for a file
system while it is being suspended or unmounted.
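
Something along these lines in the vnode recycling path is what I have
in mind (untested sketch only; the skip_recycle() helper is made up, and
the MNTK_* flags are just my guess at the right condition to test):

static int
skip_recycle(struct mount *mp)
{
        int skip;

        if (mp == NULL)
                return (0);
        MNT_ILOCK(mp);
        /*
         * Leave this file system's vnodes alone while it is being
         * suspended or unmounted, so that the vnode list walked by
         * ffs_sync() stays stable.
         */
        skip = (mp->mnt_kern_flag &
            (MNTK_SUSPEND | MNTK_SUSPENDED | MNTK_UNMOUNT)) != 0;
        MNT_IUNLOCK(mp);
        return (skip);
}

vnlru_free() (or its caller) could check that before picking a vnode off
the free list for this mount point.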

>   2. nvp might no longer be associated with the same mount point after
>      MNT_IUNLOCK(mp) has been called in the loop.  This can cause the
>      vnode list traversal to be incomplete, with stale information in
>      the snapshot.  Further damage can occur when background fsck uses
>      that stale information.

It looks like this is handled in __mnt_vnode_next() by starting over.
Skipping vnode recycling should avoid this problem in the snapshot case.
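
To illustrate what I mean by "starting over" (this is not the actual
__mnt_vnode_next() code, just the shape of the check as I read it):

loop:
        MNT_ILOCK(mp);
        for (vp = TAILQ_FIRST(&mp->mnt_nvnodelist); vp != NULL; vp = nvp) {
                nvp = TAILQ_NEXT(vp, v_nmntvnodes);
                MNT_IUNLOCK(mp);
                /* ... process vp, possibly sleeping ... */
                MNT_ILOCK(mp);
                /*
                 * nvp was sampled before the interlock was dropped;
                 * if it has since been recycled onto another mount,
                 * the rest of the list can no longer be trusted and
                 * the scan has to start over from the head.
                 */
                if (nvp != NULL && nvp->v_mount != mp) {
                        MNT_IUNLOCK(mp);
                        goto loop;
                }
        }
        MNT_IUNLOCK(mp);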

This loop should be bypassed in normal operation, with the individual
vnode syncing left to the syncer.  The only reason the loop isn't
skipped during normal operation is that timestamp updates alone aren't
sufficient to add vnodes to the syncer worklist.
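
For reference, the check I'm talking about is the one at the top of the
loop body; roughly this (paraphrased from memory, not a verbatim copy of
ffs_sync()):

                ip = VTOI(vp);
                /*
                 * A vnode whose only pending change is a timestamp
                 * update (IN_ACCESS/IN_CHANGE/IN_UPDATE set, but no
                 * dirty buffers) never makes it onto the syncer
                 * worklist, since vnodes are only added there when a
                 * dirty buffer gets reassigned to them.  This loop is
                 * the only thing that will ever flush such a vnode,
                 * which is why it can't simply be skipped in normal
                 * operation.
                 */
                if (vp->v_type == VNON ||
                    ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_MODIFIED |
                      IN_UPDATE)) == 0 &&
                     vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
                        VI_UNLOCK(vp);
                        continue;
                }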

> Just a few lines down from that loop is a new problem:
> 
>   3. softdep_flushworklist() might not have processed all dirrem work
>      items associated with the file system even if both error and count
>      are zero. This can cause both background fsck and softupdate
>      processing (after file system has been resumed) to decrement the
>      link count of an inode, causing file system corruption or a panic.

Are you sure this is still true after the changes that were committed to
both HEAD and RELENG_6 before 6.0-RELEASE?

All the pending items that hang around on various lists make me nervous,
though.  I really think the number of each flavor should be tracked per
mount point, and softupdates should complain if the counts are non-zero
at the end of the suspend and unmount tasks.
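
Something like this is what I have in mind; um_softdep_counts[] is a
made-up per-mount array with one counter per work item type (I'm
assuming the D_* constants can index it), bumped and dropped in
add_to_worklist() and WORKITEM_FREE():

static void
softdep_check_empty(struct mount *mp, const char *caller)
{
        struct ufsmount *ump;
        int i;

        ump = VFSTOUFS(mp);
        for (i = 0; i <= D_LAST; i++)
                if (ump->um_softdep_counts[i] != 0)
                        printf("%s: %s still has %d pending type %d "
                            "work items\n", caller,
                            mp->mnt_stat.f_mntonname,
                            ump->um_softdep_counts[i], i);
}

Calling that at the end of the suspend and unmount paths, after
softdep_flushworklist(), would at least make any leftovers visible.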

>      Processing of these work items while the file system is suspended
>      causes a panic.




