Date:      Fri, 21 May 1999 22:21:45 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        Dom.Mitchell@palmerharvey.co.uk (Dom Mitchell)
Cc:        naddy@mips.rhein-neckar.de, freebsd-chat@FreeBSD.ORG
Subject:   Re: SGI, XFS and OSS?
Message-ID:  <199905212221.PAA06728@usr07.primenet.com>
In-Reply-To: <E10knhm-000CNE-00@voodoo.pandhm.co.uk> from "Dom Mitchell" at May 21, 99 12:43:10 pm

> > For those of us who don't use Irix systems, much less administrate any,
> > could somebody sum up what's so remarkable about XFS?
> > 
> > Jamie Bowden <ragnar@sysabend.org> wrote:
> > 
> > > XFS is -FAST-
> > 
> > Anything else?
> 
> Basically, it's a transactional logging filesystem (fast recovery, fast
> metadata updates), like LFS was going to be.  It also has Btree based
> directories (as opposed to FFS's linear directories) which can make
> things quicker.
> 
> Many other filesystems also have these attributes.  For example HPFS
> (OS2) and NTFS (WinNT).  However, XFS appears to be well done and
> designed with Unix in mind.

HPFS has btrees, and NTFS has logs.

XFS is more similar to IBM's JFS; it's a journalling filesystem.

The difference between a journalling filesystem and a log-structured
filesystem is that a log-structured filesystem logs transactions,
followed by a log of a validation timestamp after they have been
committed.

A log structured FS moves forward in timestamp increments through
transaction records.

A journalling filesystem journals the intended action, completes the
intended action, and logs a timestamp.

The difference here is whether you merely log the action, or you
journal your intent.

This means that a journalling FS is capable of rolling uncommitted
transactions backward OR forward, whereas an LFS can only roll
transactions backward.  That makes an LFS less useful if you are, for
example, implementing an ATM or doing wire transfers.
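The three journalling steps above (journal the intent, complete the
action, log a timestamp), and the recovery choice they enable, can be
modeled in a few lines of C.  This is a hypothetical in-memory toy, not
the XFS or JFS on-disk format; all names are made up, and storing both
the old and the intended contents in each record is an assumption made
so that recovery can resolve an uncommitted record in either direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: the "disk" is NBLOCKS integers; all names are hypothetical. */
#define NBLOCKS  8
#define NJOURNAL 16

enum jstate { J_PENDING, J_COMMITTED, J_ABORTED };

struct jrec {
    int         block;      /* block the transaction updates */
    int         old_value;  /* prior contents: lets recovery roll backward */
    int         new_value;  /* intended contents: lets recovery roll forward */
    enum jstate state;
};

struct toyfs {
    int         disk[NBLOCKS];
    struct jrec journal[NJOURNAL];
    size_t      njournal;
};

enum policy { ROLL_BACKWARD, ROLL_FORWARD };

/* 1. Journal the intended action before touching the disk. */
static size_t journal_intent(struct toyfs *fs, int block, int new_value)
{
    assert(fs->njournal < NJOURNAL && block >= 0 && block < NBLOCKS);
    struct jrec *r = &fs->journal[fs->njournal];
    r->block = block;
    r->old_value = fs->disk[block];
    r->new_value = new_value;
    r->state = J_PENDING;
    return fs->njournal++;
}

/* 2. Complete the intended action. */
static void apply(struct toyfs *fs, size_t rec)
{
    fs->disk[fs->journal[rec].block] = fs->journal[rec].new_value;
}

/* 3. Log the commit timestamp (modeled here as a state change). */
static void commit(struct toyfs *fs, size_t rec)
{
    fs->journal[rec].state = J_COMMITTED;
}

/* Crash recovery: each pending record can be resolved in either direction,
 * because the journal holds both the old and the intended contents. */
static void recover(struct toyfs *fs, enum policy p)
{
    if (p == ROLL_FORWARD) {
        for (size_t i = 0; i < fs->njournal; i++)      /* redo in log order */
            if (fs->journal[i].state == J_PENDING) {
                fs->disk[fs->journal[i].block] = fs->journal[i].new_value;
                fs->journal[i].state = J_COMMITTED;
            }
    } else {
        for (size_t i = fs->njournal; i-- > 0;)        /* undo newest first */
            if (fs->journal[i].state == J_PENDING) {
                fs->disk[fs->journal[i].block] = fs->journal[i].old_value;
                fs->journal[i].state = J_ABORTED;
            }
    }
}
```

A record that crashed after step 1 but before step 2 is still
recoverable either way, since the intent is already on stable storage.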

The LFS will degrade to fsync() performance, whereas the JFS will
delay the acknowledgement until the time stamp (commit), but will
continue to allow concurrent operations.
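One common way to delay acknowledgements until a commit point without
serializing every operation behind its own fsync() is group commit.
The sketch below is a generic, single-threaded model under that
assumption; it does not claim to describe how any particular JFS
schedules its commits, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXOPS 16

/* Hypothetical group-commit queue: operations are journaled and proceed
 * immediately; their acknowledgements wait for the next commit record. */
struct jqueue {
    unsigned next_seq;      /* sequence number of the next journaled op */
    unsigned committed_seq; /* every op with seq < committed_seq is durable */
    bool     acked[MAXOPS]; /* has the acknowledgement been released? */
    unsigned commits;       /* number of commit records written */
};

/* Journal an operation; the caller may continue without waiting. */
static unsigned submit(struct jqueue *q)
{
    assert(q->next_seq < MAXOPS);
    return q->next_seq++;
}

/* Write one commit record covering everything journaled so far and release
 * all of those acknowledgements together.  Returns how many were released. */
static unsigned commit_point(struct jqueue *q)
{
    unsigned released = q->next_seq - q->committed_seq;
    for (unsigned s = q->committed_seq; s < q->next_seq; s++)
        q->acked[s] = true;
    q->committed_seq = q->next_seq;
    q->commits++;
    return released;
}
```

Three operations submitted between two commit points cost one commit
record here; forcing each one synchronously would cost three.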

Similarly, an LFS is unable to imply state; a JFS, however, can
imply state.  This allows you to create a transaction, then create
subtransactions which are committed, and then abort the enclosing
transaction, decommitting the subtransactions at the same time.
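Nested transactions of this kind are often built from an undo log plus
a stack of savepoints.  The following is a hypothetical in-memory
sketch under that assumption, not code from any JFS: committing a
subtransaction keeps its undo records (they now belong to the parent),
so aborting the parent still decommits the subtransaction.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS  8
#define MAXUNDO  32
#define MAXDEPTH 4

struct undo { int block; int old_value; };

/* Hypothetical transaction state: names are illustrative only. */
struct txstate {
    int         disk[NBLOCKS];
    struct undo undo[MAXUNDO];
    size_t      nundo;
    size_t      mark[MAXDEPTH];  /* undo-stack position at each tx_begin() */
    int         depth;           /* current nesting level (0 = none open) */
};

static void tx_begin(struct txstate *s)
{
    assert(s->depth < MAXDEPTH);
    s->mark[s->depth++] = s->nundo;
}

static void tx_write(struct txstate *s, int block, int value)
{
    assert(s->depth > 0 && s->nundo < MAXUNDO && block >= 0 && block < NBLOCKS);
    s->undo[s->nundo++] = (struct undo){ block, s->disk[block] };
    s->disk[block] = value;
}

/* A subtransaction commit keeps its undo records for the parent; only the
 * outermost commit makes the changes permanent and discards undo state. */
static void tx_commit(struct txstate *s)
{
    assert(s->depth > 0);
    if (--s->depth == 0)
        s->nundo = 0;
}

/* Abort undoes everything since this level began, including the effects
 * of subtransactions that had already committed into it. */
static void tx_abort(struct txstate *s)
{
    assert(s->depth > 0);
    size_t m = s->mark[--s->depth];
    while (s->nundo > m) {
        struct undo u = s->undo[--s->nundo];
        s->disk[u.block] = u.old_value;
    }
}
```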

The LFS in BSD 4.4, and in NTFS, and (as has been described) in
ext3fs, is inferior to a JFS.  Without a JFS, you can't export a
transactioning interface to user space without introducing
synchronization points.

Soft updates can be thought of as a logging mechanism, where the log
is in memory, and the stanchion commits are really implicit in the
metadata ordering.  You take one hit because you have to impose an
order on the operations, potentially pessimizing them, and, if you
perform operations on a tree, you take another from the mismatch
between the dependency graph order and whether your operations
proceed breadth-first or depth-first.
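The ordering cost can be pictured as a dependency graph over dirty
buffers: a buffer may only be written once everything it depends on is
already on disk.  The sketch below computes one legal write order with
a simple repeated-scan topological sort.  It is a simplified model with
hypothetical names, not the FreeBSD softdep code, which tracks
dependencies at a much finer grain; the example dependencies in the
usage (directory entry after the inode it names, inode after the
bitmap that allocates it) are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXBUF 8

struct depgraph {
    int  n;                      /* number of dirty buffers */
    bool after[MAXBUF][MAXBUF];  /* after[a][b]: a may be written only after b */
};

/* Fills order[] with a write sequence that honours every dependency.
 * Returns the number of buffers ordered; less than g->n means a cycle. */
static int flush_order(const struct depgraph *g, int order[MAXBUF])
{
    assert(g->n >= 0 && g->n <= MAXBUF);
    bool written[MAXBUF] = { false };
    int count = 0;
    bool progress = true;

    while (count < g->n && progress) {
        progress = false;
        for (int a = 0; a < g->n; a++) {
            if (written[a])
                continue;
            bool ready = true;
            for (int b = 0; b < g->n; b++)
                if (g->after[a][b] && !written[b]) {
                    ready = false;   /* a prerequisite is still dirty */
                    break;
                }
            if (ready) {
                written[a] = true;
                order[count++] = a;
                progress = true;
            }
        }
    }
    return count;
}
```

With buffer 0 a directory block, 1 an inode block and 2 a bitmap, the
only legal order is 2, 1, 0; each constraint removes freedom the disk
scheduler would otherwise have.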

In practice, soft updates roll back, just like LFS, and they take
the same hierarchy-order hit for not being btree'ed, in depth-first
vs. breadth-first ordering (i.e., the most pessimal case you could
deliberately construct is the deletion of the /usr/ports tree).

Like logging, soft updates *could* expose a user level transaction
interface (by adding a "user transaction" order dependency) by
introducing additional synchronization points, but such an interface
would be far less efficient than the concurrent one a JFS can offer.

Finally, as to the "fsck time" argument: the fsck of a soft updates
volume following a crash can occur in the background, assuming the
crash was not the result of a disk or controller failure, since
the only thing that is incorrect is that the cylinder group bitmaps
indicate allocations that do not, in fact, exist.  This could easily
be taken care of by running a "CG fixup" process (as opposed to a
full fsck) in the background.  The algorithm would be to merely
traverse each cylinder group by locking access to it, correcting
the bitmap, unlocking it, and going on to the next group.
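The background "CG fixup" pass described above could look roughly like
this.  It is a minimal sketch assuming a toy layout of 32 blocks per
cylinder group with one pthread mutex per group.  The in_use callback
is hypothetical: in a real filesystem, deciding whether a block is
actually referenced means walking the inodes, and refmap_in_use below
merely stands in for that with a precomputed bitmap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NCG         4
#define BLKS_PER_CG 32

struct cg {
    pthread_mutex_t lock;
    uint32_t        allocated;  /* bit i set: block i marked allocated */
};

/* Hypothetical oracle: is block blk of group cg really referenced? */
typedef bool (*in_use_fn)(int cg, int blk, void *ctx);

/* Stand-in oracle: ctx points to one "truly referenced" bitmap per group. */
static bool refmap_in_use(int cg, int blk, void *ctx)
{
    const uint32_t *ref = ctx;
    return (ref[cg] & (UINT32_C(1) << blk)) != 0;
}

/* Lock one group, clear bits for allocations that do not exist, unlock,
 * move on.  Other groups stay usable while one is being fixed.
 * Returns the number of blocks returned to the free pool. */
static int cg_fixup(struct cg cgs[], int ncg, in_use_fn in_use, void *ctx)
{
    int freed = 0;
    assert(ncg >= 0);
    for (int c = 0; c < ncg; c++) {
        pthread_mutex_lock(&cgs[c].lock);
        for (int b = 0; b < BLKS_PER_CG; b++) {
            uint32_t bit = UINT32_C(1) << b;
            if ((cgs[c].allocated & bit) && !in_use(c, b, ctx)) {
                cgs[c].allocated &= ~bit;
                freed++;
            }
        }
        pthread_mutex_unlock(&cgs[c].lock);
    }
    return freed;
}
```

The fixup only ever clears bits, which matches the failure mode
described above: after a crash, bitmaps may overstate allocations but
never understate them.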

Thus the "reboot time" argument goes out the window, and we are
left with: (1) additional synchronization points for stanchion
events relative to XFS, (2) the inability to currently support a
user-level transactioning interface, and (3) the inability to roll
completed transactions forward instead of backward, and the resulting
synchronization and/or distributed coherency issues arising therefrom.

XFS would be neat technology to integrate, but there is additional
work that could be done on soft updates as it currently stands (e.g.,
the most obvious, which Kirk McKusick, Matt Day, Mark Muhlestien,
and I independently arrived at, is "soft read-only": if there are
no pending transactions for two update cycles, a flag can be set,
and the FS superblock could have the clean bit set.  Any
dirtying operation thereafter would redirty the superblock, unset
the soft read-only bit in the incore flags, and allow the operation
to complete.  The BSDI implementation has this feature, in fact).
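The "soft read-only" idea reduces to a small state machine.  A minimal
sketch, assuming a hypothetical per-mount structure, a syncer that
calls sync_cycle() once per update cycle, and a counter standing in
for superblock writes; it is not the BSDI implementation, whose
details are not described here.

```c
#include <assert.h>
#include <stdbool.h>

#define IDLE_CYCLES_FOR_CLEAN 2

/* Hypothetical per-mount state; names are illustrative only. */
struct mountstate {
    int  pending;      /* outstanding metadata dependencies */
    int  idle_cycles;  /* consecutive update cycles with nothing pending */
    bool soft_ro;      /* in-core "soft read-only" flag */
    bool sb_clean;     /* clean bit as last written to the superblock */
    int  sb_writes;    /* superblock writes, counted for illustration */
};

static void write_superblock(struct mountstate *m, bool clean)
{
    m->sb_clean = clean;
    m->sb_writes++;
}

/* Called by the syncer once per update cycle. */
static void sync_cycle(struct mountstate *m)
{
    if (m->pending > 0) {
        m->idle_cycles = 0;
        return;
    }
    if (m->soft_ro)
        return;                      /* already clean on disk */
    if (++m->idle_cycles >= IDLE_CYCLES_FOR_CLEAN) {
        m->soft_ro = true;
        write_superblock(m, true);   /* set the clean bit */
    }
}

/* Called before any operation that will dirty metadata. */
static void begin_dirtying_op(struct mountstate *m)
{
    if (m->soft_ro) {
        write_superblock(m, false);  /* redirty the superblock first */
        m->soft_ro = false;
    }
    m->idle_cycles = 0;
    m->pending++;
}

/* Called once the operation's metadata has reached the disk. */
static void end_dirtying_op(struct mountstate *m)
{
    assert(m->pending > 0);
    m->pending--;
}
```

A crash while soft_ro is set finds the clean bit on disk, so no fsck
is needed; the cost is one extra superblock write on the first
dirtying operation after an idle period.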


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

