Date:      Wed, 08 Jun 2005 01:12:41 -0600
From:      Scott Long <scottl@samsco.org>
To:        Richard Coleman <rcoleman@criticalmagic.com>
Cc:        Pawel Jakub Dawidek <pjd@FreeBSD.org>, scottl@FreeBSD.org, Ivan Voras <ivoras@fer.hr>, David Malone <dwmalone@maths.tcd.ie>, hackers@FreeBSD.org, phk@FreeBSD.org
Subject:   Re: Google SoC idea
Message-ID:  <42A69A69.2040005@samsco.org>
In-Reply-To: <42A647B8.30709@criticalmagic.com>
References:  <42A475AB.6020808@fer.hr>	<20050607194005.GG837@darkness.comp.waw.pl>	<20050607201642.GA58346@walton.maths.tcd.ie> <42A6091C.40409@samsco.org> <42A647B8.30709@criticalmagic.com>

Richard Coleman wrote:
> Scott Long wrote:
> 
>> /me jumps up and down and waves his hands
>>
>> The problem with journalling at the block layer is that you pretty 
>> much become forced to journal metadata and data, since the block layer 
>> really doesn't know the distinction, and definitely not in a 
>> filesystem-independent way (yes, UFS does evil things to the buffer 
>> cache by representing metadata with negative block numbers, but that 
>> is just UFS).  Full journalling has many drawbacks from the viewpoint 
>> of speed and complexity, of course.  So you really want to be able to 
>> do just metadata journalling.
>>
>> Another hard part of distinguishing between metadata and data is that 
>> filesystems have a habit of migrating disk blocks from holding 
>> metadata to holding data, and vice versa (think indirect pointer 
>> blocks, not inode blocks).  If you are only replaying metadata, you 
>> want to make sure that you don't smash data blocks with old metadata.
>>
>> Coming up with a filesystem independent way to represent all of this 
>> for the block layer is not easy.  Filesystems would have to be able to 
>> be modified to provide proper metadata vs. data hints to the block 
>> layer. And if you're going to do that, then why not just make it a 
>> library in VFS, like what Darwin does?
>>
>> The UFS Journalling work is already well underway, and I expect it to 
>> follow the path of being a VFS library.  Note that I'm saying 
>> 'library' here, not 'layer'.  There really is no way to make 
>> journalling work with an arbitrary filesystem 'for free', whether as a 
>> VFS layer or a GEOM transform, since journalling is 100% dependent on 
>> the filesystem working with the buffer-cache to do sane operations in 
> >> a defined order.
>>
>> An alternate SoC project that would be very useful is block-level 
>> snapshots.  I'm not sure if I'll be able to retain the filesystem 
>> snapshot functionality in UFS with journalling enabled, so moving to 
>> doing the snapshots in the block layer would be a good way to make up 
>> for this.  Beware that while the GEOM transform would be pretty 
> >> straightforward to write, the real trick comes from being able to 
>> make the consumer of a block device (a filesystem, maybe) flush itself 
>> to a consistent state while the snapshot is being taken.  The 
>> infrastructure for this is the part that is very interesting, but also 
>> the most work.
>>
>> Scott
> 
> 
> Scott,
> 
> Have you looked at the journaling layer that Matt has been adding to 
> DragonflyBSD?  What you are talking about appears very similar.  Or am I 
> misunderstanding something?
> 
> Richard Coleman
> rcoleman@criticalmagic.com

Ah, you might have misunderstood my use of the term 'VFS library'.  This
is distinctly different from a 'VFS layer', which is what Matt did.
I've looked extensively at his work, but unfortunately it doesn't solve
the kinds of problems that I'm looking to solve.  After discussing
journalling this evening with the author of BeFS and HFS+J, I'm pretty
happy that I'm taking the approach that I am.

Scott


