Date:      Tue, 14 Dec 1999 19:18:51 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        noslenj@swbell.net (Jay Nelson)
Cc:        tlambert@primenet.com, chat@FreeBSD.ORG
Subject:   Re: Log file systems? (Was: Re: dual 400 -> dual 600 worth it?)
Message-ID:  <199912141919.MAA20684@usr02.primenet.com>
In-Reply-To: <Pine.BSF.4.05.9912132046590.782-100000@acp.swbell.net> from "Jay Nelson" at Dec 13, 99 09:37:16 pm

> >They are FAQs, not "in the FAQ".
> 
> I suspect they probably should be in the FAQ. The average admin who
> doesn't follow mailing lists asks questions like this. The more we
> claim (justifiably) stability, the more seriously they evaluate
> FreeBSD against commercial alternatives. This is an area where few of
> us really understand the issues involved.
> 
> >The archives you should be looking at, and the place you should be
> >asking the question are the freebsd-fs list.
> 
> I did look in the fs archives -- although I'm not sure the general
> question belongs there since it seems to have more to do with the
> differences between FreeBSD and the commercial offerings.
> 
> Is it fair to summarize the differences as:
> 
> Soft updates provide little in terms of recovering data, but enhance
> performance during runtime. Recovery being limited to ignoring
> metadata that wasn't written to disk.

No.

Soft updates:

What is lost are uncommitted writes.  Committed writes are
guaranteed to have been ordered.  This means that you can
deterministically recover the disk not just to a stable state,
but to the stable state that it was intended to be in.  The
things that are lost are implied state between files (e.g.
a record file and an index file for a database); this can be
worked around using two stage commits on the data in the
database software.
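As a concrete sketch of that two stage commit (this is my own
hypothetical user-space code, not anything from a real database): the
intent is forced to stable storage before either file is touched, and
deleting the intent log is the commit point.

```python
import json, os

# Hedged sketch -- names and file layout are invented for illustration.
# Goal: keep a record file and an index file mutually consistent on top
# of a filesystem that only guarantees ordering of committed writes.

def two_stage_commit(log_path, record_path, index_path, record, index_entry):
    # Stage 1: log the intent and force it to stable storage first.
    with open(log_path, "w") as log:
        json.dump({"record": record, "index": index_entry}, log)
        log.flush()
        os.fsync(log.fileno())
    # Stage 2: apply both updates; a crash here can be redone from the log.
    for path, data in ((record_path, record), (index_path, index_entry)):
        with open(path, "a") as f:
            f.write(data + "\n")
            f.flush()
            os.fsync(f.fileno())
    # Commit point: deleting the intent log marks the transaction done.
    os.remove(log_path)

def recover(log_path, record_path, index_path):
    # No surviving intent log: the last transaction finished cleanly.
    if not os.path.exists(log_path):
        return "clean"
    # Otherwise redo the logged intent.  In a real system the redo must
    # be idempotent; this sketch simply re-runs the commit.
    with open(log_path) as log:
        intent = json.load(log)
    two_stage_commit(log_path, record_path, index_path,
                     intent["record"], intent["index"])
    return "redone"
```

The point of the ordering is that a crash can leave the intent log
without the data, but never the data without a record of the intent.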

Soft updates is slow to recover because of the need to tell
the difference between a hard failure and a soft failure (a
hard failure is a software or hardware fault; a soft failure
is the loss of power).  If you can tell this, then you don't
need to fsck the drive, only recover over-allocated cylinder
group bitmaps.  This can be done in the background, locking
access to a cylinder group at a time.

Distinguishing the failure type is the biggest problem here,
and requires NVRAM or a technology like soft read-only (first
implemented by a team I was on at Artisoft around 1996 for a
port of the Heidemann framework and soft updates to Windows
95, as far as I can tell).


> Log file systems offer little data recovery in return for faster
> system recovery after a disorderly halt at the cost of a runtime
> penalty.

Log structured FSs:

Zero rotational latency on writes, fast recovery after a hard
or soft failure.

What is lost are uncommitted writes (see above).

LFSs recover quickly because they look for the metadata log
entry with the most recent date, and they are "magically"
recovered to that point.
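A toy model of why that recovery is fast (the data shapes here are my
own, not any real on-disk format): pick the newest log entry that still
checksums as valid, and that is the recovered state; anything later was
an uncommitted write and is discarded.

```python
# Hedged sketch of LFS-style recovery over an in-memory "log".

def recover_lfs(log_entries):
    """log_entries: dicts with a monotonic 'serial' (the date/sequence
    stamp) and 'valid' (True if the entry's checksum is intact)."""
    intact = [e for e in log_entries if e["valid"]]
    if not intact:
        return None  # empty or totally corrupt log
    # The "magic": the newest intact entry *is* the recovered state.
    return max(intact, key=lambda e: e["serial"])
```

No fsck-style scan of the whole disk is needed; the cost is one pass
over the log tail.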

There is still a catch-22 with regard to soft vs. hard failures,
but most hard failures can be safely ignored, since any data
dated from before the hard failure is OK, unless the drive is
going south.  You must therefore differentiate hard failures in
the kernel "panic" messages, so that a human has an opportunity
to see them.

LFSs have an ongoing runtime cost that is effectively the need
to "garbage collect" outdated logs so that their extents can
be reused by new data.
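The cleaner can be sketched like so (again, in-memory stand-ins of my
own invention, not real extents): live blocks in an old segment are
copied forward to the log tail, after which the whole segment's extent
is free for new writes.

```python
# Hedged sketch of an LFS segment cleaner ("garbage collector").

def clean_segment(segment, is_live, log_tail):
    """segment: list of (block_id, data); is_live: block_id -> bool for
    blocks still referenced by current metadata."""
    for block_id, data in segment:
        if is_live(block_id):
            log_tail.append((block_id, data))  # copy the live block forward
    segment.clear()  # the whole extent is now reusable
    return log_tail
```

The runtime cost the text mentions is exactly this copying: the fewer
live blocks per cleaned segment, the cheaper the cleaner runs.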


> Journaled filesystem offer the potential of data recovery at a boot
> time and runtime cost.

JFSs:

A JFS maintains a Journal; this is sometimes called an intention
log.  Because it logs its intent before the fact, it can offer
a transactional interface to user space.  This lets the programmer
skip the more expensive two stage commit process in favor of hooks
into the intention log.

Because transactions done this way can be nested, a completed
but uncommitted transaction can be rolled forward to the extent
that the nesting level has returned to "0" -- in other words,
all nested transaction intents have been logged.

Because transactions can be rolled forward, you will recover to
the state that the JFS would have been in had the failure never
occurred.  This works because writes, etc., are not
acknowledged back to the caller until the intention has been
carried out.  Things like an intent to delete a file, rename a
file, etc. are logged at level 0 (i.e. not in a user-defined
transaction bound), and so can be acknowledged immediately;
writes of actual data need to be delayed, if they are in a
transaction bound.

This lets you treat a JFS as committed stable storage, without
second-guessing the kernel or the drive cache, etc.

A JFS recovery, like an LFS recovery, uses the most recent valid
timestamp in the intention log, and then rolls all completed
transactions forward.
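Replay boils down to this (the record shapes are my own, not any real
journal format): only transactions whose commit record reached the
journal are rolled forward; intents from incomplete transactions are
dropped.

```python
# Hedged sketch of journal replay at recovery time.

def replay_journal(journal, apply):
    """journal: ordered ('intent', txid, op) and ('commit', txid) tuples;
    apply(op) re-executes an operation against the filesystem."""
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "intent" and rec[1] in committed:
            apply(rec[2])  # roll the completed transaction forward
```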

Like LFS, hard errors can be ignored, unless the hard errors
occur during replay of the journal in rolling some completed
transaction forward.  Because of this, care must be taken on
recovery.

JFS recovery can take a while, if there are a lot of completed
intentions in the journal.

Many JFS implementations also use logs in order to write user
data, so that the write acknowledge can be accelerated.


> I know this is disgustingly over simplified, but about all you can get
> through to typical management.
> 
> I also have to admit, I'm a little confused with your usage of the
> word orthogonal. Do you mean that an orthogonal technology projects
> cleanly or uniformly into different dimensions of system space?

Yes.  "Mutually perpendicular" and "Intersecting at only one point".
It's my training in physics seeping through...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





