Date: Wed, 08 Jun 2005 06:46:59 -0500
From: Eric Anderson <anderson@centtech.com>
To: Scott Long <scottl@samsco.org>
Cc: Pawel Jakub Dawidek <pjd@freebsd.org>, scottl@freebsd.org, Ivan Voras <ivoras@fer.hr>, David Malone <dwmalone@maths.tcd.ie>, hackers@freebsd.org, phk@freebsd.org, Richard Coleman <rcoleman@criticalmagic.com>
Subject: Re: Google SoC idea
Message-ID: <42A6DAB3.4080105@centtech.com>
In-Reply-To: <42A69A69.2040005@samsco.org>
References: <42A475AB.6020808@fer.hr> <20050607194005.GG837@darkness.comp.waw.pl> <20050607201642.GA58346@walton.maths.tcd.ie> <42A6091C.40409@samsco.org> <42A647B8.30709@criticalmagic.com> <42A69A69.2040005@samsco.org>

Scott Long wrote:
> Richard Coleman wrote:
>
>> Scott Long wrote:
>>
>>> /me jumps up and down and waves his hands
>>>
>>> The problem with journalling at the block layer is that you pretty
>>> much become forced to journal metadata and data, since the block
>>> layer really doesn't know the distinction, and definitely not in a
>>> filesystem-independent way (yes, UFS does evil things to the buffer
>>> cache by representing metadata with negative block numbers, but that
>>> is just UFS).  Full journalling has many drawbacks from the viewpoint
>>> of speed and complexity, of course.  So you really want to be able to
>>> do just metadata journalling.
>>>
>>> Another hard part of distinguishing between metadata and data is that
>>> filesystems have a habit of migrating disk blocks from holding
>>> metadata to holding data, and vice versa (think indirect pointer
>>> blocks, not inode blocks).  If you are only replaying metadata, you
>>> want to make sure that you don't smash data blocks with old metadata.
>>>
>>> Coming up with a filesystem independent way to represent all of this
>>> for the block layer is not easy.  Filesystems would have to be able
>>> to be modified to provide proper metadata vs. data hints to the block
>>> layer.  And if you're going to do that, then why not just make it a
>>> library in VFS, like what Darwin does?
>>>
>>> The UFS Journalling work is already well underway, and I expect it to
>>> follow the path of being a VFS library.  Note that I'm saying
>>> 'library' here, not 'layer'.  There really is no way to make
>>> journalling work with an arbitrary filesystem 'for free', whether as
>>> a VFS layer or a GEOM transform, since journalling is 100% dependent
>>> on the filesystem working with the buffer-cache to do sane operations
>>> in a defined order.
>>>
>>> An alternate SoC project that would be very useful is block-level
>>> snapshots.  I'm not sure if I'll be able to retain the filesystem
>>> snapshot functionality in UFS with journalling enabled, so moving to
>>> doing the snapshots in the block layer would be a good way to make up
>>> for this.  Beware that while the GEOM transform would be pretty
>>> straight-forward to write, the real trick comes from being able to
>>> make the consumer of a block device (a filesystem, maybe) flush
>>> itself to a consistent state while the snapshot is being taken.  The
>>> infrastructure for this is the part that is very interesting, but
>>> also the most work.
>>>
>>> Scott
>>
>>
>>
>> Scott,
>>
>> Have you looked at the journaling layer that Matt has been adding to
>> DragonflyBSD?  What you are talking about appears very similar.  Or am
>> I misunderstanding something?
>>
>> Richard Coleman
>> rcoleman@criticalmagic.com
>
>
> Ah, you might have misunderstood my use of the term 'VFS library'.  This
> is distinctly different from a 'VFS layer', which is what Matt did.
> I've looked extensively at his work, but unfortunately it doesn't solve
> the kinds of problems that I'm looking to solve.  After discussing
> journalling this evening with the author of BeFS and HFS+J, I'm pretty
> happy that I'm taking the approach that I am.

Maybe a good SoC project (but maybe too much work) would be getting the
clustering UFS stuff going.. :)

Eric

--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
A lost ounce of gold may be found, a lost moment of time never.
------------------------------------------------------------------------
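
The block-reuse hazard Scott describes above (a block that once held an indirect-pointer block later being handed out for file data) can be illustrated with a small model of metadata-only journal replay. This is only a sketch under stated assumptions, not the UFS journalling design discussed in the thread: the `Write` and `Revoke` record types, the `replay` function, and the dict standing in for a disk are all hypothetical names made up for illustration. The idea of logging a "revoke" when a block stops being metadata is similar in spirit to the revoke records used by the Linux JBD journal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """Hypothetical journal record: image of a metadata block logged at seq."""
    seq: int
    block: int
    payload: bytes


@dataclass(frozen=True)
class Revoke:
    """Hypothetical journal record: at seq, the block stopped holding metadata."""
    seq: int
    block: int


def replay(journal, disk):
    """Replay logged metadata writes onto disk (a dict of block -> bytes).

    A write is skipped when a revoke with a higher sequence number exists
    for the same block, so stale metadata never lands on a block that has
    since been reused for file data.
    """
    # Pass 1: find the latest revoke sequence number for each block.
    revoked = {}
    for rec in journal:
        if isinstance(rec, Revoke):
            revoked[rec.block] = max(rec.seq, revoked.get(rec.block, -1))

    # Pass 2: apply writes in sequence order unless a later revoke covers them.
    writes = sorted((r for r in journal if isinstance(r, Write)),
                    key=lambda r: r.seq)
    for rec in writes:
        if rec.seq < revoked.get(rec.block, -1):
            continue  # block was freed after this write; leave its contents alone
        disk[rec.block] = rec.payload
    return disk


# Block 7 was an indirect block, then freed and reused for user data.
disk = {7: b"user data"}
journal = [
    Write(seq=1, block=7, payload=b"old indirect block"),
    Revoke(seq=2, block=7),
    Write(seq=3, block=9, payload=b"inode update"),
]
replay(journal, disk)
```

Without the revoke record, replay would copy the old indirect-block image over block 7 and destroy the user data. Note that the filesystem is the only party that knows when to log the revoke, which is the point made above about journalling not coming "for free" at the block layer.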