Date:      Wed, 08 Jun 2005 06:46:59 -0500
From:      Eric Anderson <anderson@centtech.com>
To:        Scott Long <scottl@samsco.org>
Cc:        Pawel Jakub Dawidek <pjd@freebsd.org>, scottl@freebsd.org, Ivan Voras <ivoras@fer.hr>, David Malone <dwmalone@maths.tcd.ie>, hackers@freebsd.org, phk@freebsd.org, Richard Coleman <rcoleman@criticalmagic.com>
Subject:   Re: Google SoC idea
Message-ID:  <42A6DAB3.4080105@centtech.com>
In-Reply-To: <42A69A69.2040005@samsco.org>
References:  <42A475AB.6020808@fer.hr>	<20050607194005.GG837@darkness.comp.waw.pl>	<20050607201642.GA58346@walton.maths.tcd.ie>	<42A6091C.40409@samsco.org> <42A647B8.30709@criticalmagic.com> <42A69A69.2040005@samsco.org>

Scott Long wrote:
> Richard Coleman wrote:
> 
>> Scott Long wrote:
>>
>>> /me jumps up and down and waves his hands
>>>
>>> The problem with journalling at the block layer is that you pretty 
>>> much become forced to journal metadata and data, since the block 
>>> layer really doesn't know the distinction, and definitely not in a 
>>> filesystem-independent way (yes, UFS does evil things to the buffer 
>>> cache by representing metadata with negative block numbers, but that 
>>> is just UFS).  Full journalling has many drawbacks from the viewpoint 
>>> of speed and complexity, of course.  So you really want to be able to 
>>> do just metadata journalling.
>>>
>>> Another hard part of distinguishing between metadata and data is that 
>>> filesystems have a habit of migrating disk blocks from holding 
>>> metadata to holding data, and vice versa (think indirect pointer 
>>> blocks, not inode blocks).  If you are only replaying metadata, you 
>>> want to make sure that you don't smash data blocks with old metadata.
>>>
>>> Coming up with a filesystem independent way to represent all of this 
>>> for the block layer is not easy.  Filesystems would have to be 
>>> modified to provide proper metadata vs. data hints to the block 
>>> layer. And if you're going to do that, then why not just make it a 
>>> library in VFS, like what Darwin does?
>>>
>>> The UFS Journalling work is already well underway, and I expect it to 
>>> follow the path of being a VFS library.  Note that I'm saying 
>>> 'library' here, not 'layer'.  There really is no way to make 
>>> journalling work with an arbitrary filesystem 'for free', whether as 
>>> a VFS layer or a GEOM transform, since journalling is 100% dependent 
>>> on the filesystem working with the buffer-cache to do sane operations 
>>> in a defined order.
>>>
>>> An alternate SoC project that would be very useful is block-level 
>>> snapshots.  I'm not sure if I'll be able to retain the filesystem 
>>> snapshot functionality in UFS with journalling enabled, so moving to 
>>> doing the snapshots in the block layer would be a good way to make up 
>>> for this.  Beware that while the GEOM transform would be pretty 
>>> straightforward to write, the real trick comes from being able to 
>>> make the consumer of a block device (a filesystem, maybe) flush 
>>> itself to a consistent state while the snapshot is being taken.  The 
>>> infrastructure for this is the part that is very interesting, but 
>>> also the most work.
>>>
>>> Scott
>>
>>
>>
>> Scott,
>>
>> Have you looked at the journaling layer that Matt has been adding to 
>> DragonflyBSD?  What you are talking about appears very similar.  Or am 
>> I misunderstanding something?
>>
>> Richard Coleman
>> rcoleman@criticalmagic.com
> 
> 
> Ah, you might have misunderstood my use of the term 'VFS library'.  This
> is distinctly different from a 'VFS layer', which is what Matt did.
> I've looked extensively at his work, but unfortunately it doesn't solve
> the kinds of problems that I'm looking to solve.  After discussing
> journalling this evening with the author of BeFS and HFS+J, I'm pretty
> happy that I'm taking the approach that I am.

Maybe a good SoC project (but maybe too much work) would be getting the 
clustering UFS stuff going. :)

Eric



-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
A lost ounce of gold may be found, a lost moment of time never.
------------------------------------------------------------------------


