Date:      Tue, 18 Jul 2017 13:22:00 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Kirk McKusick <mckusick@mckusick.com>
Cc:        Andreas Longwitz <longwitz@incore.de>, freebsd-fs@freebsd.org
Subject:   Re: ufs snapshot is sometimes corrupt on gjourneled partition
Message-ID:  <20170718102200.GT1935@kib.kiev.ua>
In-Reply-To: <201707180044.v6I0iKvg040471@chez.mckusick.com>
References:  <596C7201.8090700@incore.de> <201707180044.v6I0iKvg040471@chez.mckusick.com>

On Mon, Jul 17, 2017 at 05:44:20PM -0700, Kirk McKusick wrote:
> The sequence of calls when using bread is:
> 
> Function	Line	File
> --------	----	----
> bread		491	sys/buf.h
> breadn_flags	1814	kern/vfs_bio.c
> bstrategy	397	sys/buf.h
> BO_STRATEGY	86	sys/bufobj.h
> bufstrategy	4535	kern/vfs_bio.c
> ufs_strategy	2290	ufs/ufs/ufs_vnops.c
> BO_STRATEGY on filesystem device -> ffs_geom_strategy
> ffs_geom_strategy 2141	ufs/ffs/ffs_vfsops.c
> g_vfs_strategy	161	geom/geom_vfs.c
> g_io_request	470	geom/geom_io.c
> 
> Whereas readblock skips all these steps and calls g_io_request
> directly. In my looking at the gjournal code, I believe that we
> will still enter it with the g_io_request() call as I believe that
> it does not hook itself into any of the VOP_ call structure, but I
> have not done a backtrace to confirm this fact. Assuming that we
> are still getting into g_journal_start(), then it should be possible
> to catch reads that are only in the log and pull out the data as
> needed.
> 
> Another alternative is to send gjournal a request to flush its log
> before starting the removal of a snapshot.
I do not think that the UFS call sequence is relevant here.  It is clearly
a malfunction of the underlying I/O device (gjournal) if it returns a data
block that differs from the most recent successfully written block.  Given
that, whether UFS passes the read request from the buffer cache through the
BO_STRATEGY layers, or creates a bio directly and reads the block, is not
important.

OTOH, I do not think the issue is that gjournal always reads from the
data area and misses the journal.  The failure would be much more
spectacular in that case.  I see some gjournal code which tries to find the
data in a 'cache' on read, whatever that means.  Clearly, it sometimes
does not find the data.  The failure is probably further hidden by the
buffer cache eliminating most reads of recently written data.
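
To make the expected behaviour concrete, here is a toy userland model of a
journaling block device.  It is only a sketch under stated assumptions: the
names (toydev, toy_write, toy_read, toy_read_stale, toy_flush) and the layout
are hypothetical and are not taken from the gjournal sources.  It shows that
a read has to prefer the newest pending journal entry over the data area, and
that a read which misses the journal returns stale data until the journal is
flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLKSZ 8   /* toy block size in bytes (assumption) */
#define NBLK  4   /* number of blocks in the data area (assumption) */
#define NLOG  8   /* journal capacity in entries (assumption) */

/* One pending write that has not yet been copied to the data area. */
struct logent {
	size_t blkno;
	char data[BLKSZ];
};

/* Hypothetical device: a data area plus an in-order journal of writes. */
struct toydev {
	char data[NBLK][BLKSZ];
	struct logent log[NLOG];
	size_t nlog;
};

/* A write lands only in the journal; the data area is left untouched. */
static bool
toy_write(struct toydev *d, size_t blkno, const char *buf)
{
	assert(d != NULL && buf != NULL);
	if (blkno >= NBLK || d->nlog == NLOG)
		return (false);
	d->log[d->nlog].blkno = blkno;
	memcpy(d->log[d->nlog].data, buf, BLKSZ);
	d->nlog++;
	return (true);
}

/* Correct read: the newest journal entry wins, else use the data area. */
static bool
toy_read(const struct toydev *d, size_t blkno, char *buf)
{
	assert(d != NULL && buf != NULL);
	if (blkno >= NBLK)
		return (false);
	for (size_t i = d->nlog; i-- > 0;) {
		if (d->log[i].blkno == blkno) {
			memcpy(buf, d->log[i].data, BLKSZ);
			return (true);
		}
	}
	memcpy(buf, d->data[blkno], BLKSZ);
	return (true);
}

/* Faulty read: consults the data area only and so misses pending writes. */
static bool
toy_read_stale(const struct toydev *d, size_t blkno, char *buf)
{
	assert(d != NULL && buf != NULL);
	if (blkno >= NBLK)
		return (false);
	memcpy(buf, d->data[blkno], BLKSZ);
	return (true);
}

/* Flush: replay the journal into the data area in order, then empty it. */
static void
toy_flush(struct toydev *d)
{
	assert(d != NULL);
	for (size_t i = 0; i < d->nlog; i++)
		memcpy(d->data[d->log[i].blkno], d->log[i].data, BLKSZ);
	d->nlog = 0;
}
```

In this model, writing a block twice and then calling toy_read() yields the
second write, while toy_read_stale() yields the old contents until
toy_flush() has run.  A device that only occasionally takes the stale path
would look like the intermittent corruption discussed in this thread.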

So the way to fix the bug is to read the gjournal code and understand why
it sometimes returns wrong data.  For instance, there were relatively
recent changes to the geom infrastructure allowing direct completion of
bios.  Anyway, I have no interest in gjournal.
