Date:      Wed, 26 Dec 2001 13:07:46 +1000
From:      "Andrew Hannam" <famzon@bigfoot.com>
To:        "John Hanley" <jh_@yahoo.com>
Cc:        <freebsd-small@freebsd.org>
Subject:   Re: Disk Writes
Message-ID:  <001901c18dba$83dfcbc0$0104010a@famzon.com.au>
References:  <20011224070645.77398.qmail@web10108.mail.yahoo.com>

Thanks for your help...

Just a short note on the power-fail condition: with our equipment a power
failure is more likely to come just after a transaction that requires
writing to the disk. The power failure (if it occurs) is most likely to come
5 to 10 seconds after the transaction and is the result of action by a
serviceman at the machine (about once or twice a week). I therefore believe
it should be possible to achieve a safe write 100% of the time.
This however has not been borne out in practice, with a failure rate of about
0.5% in these conditions. With 1000 machines in the field this would
equate to a failure of about 1 to 2 machines a day. This is not acceptable
in practice so I must find a solution.
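
The estimate above can be checked with a short calculation. This is only a sketch: it assumes the 0.5% figure applies per serviceman power-down and that each machine sees one to two such events a week. The class and method names are hypothetical.

```java
public class FailureEstimate {
    /** Expected number of failed machines per day across the whole fleet. */
    public static double perDay(int machines, double eventsPerWeek, double failureRate) {
        // events per machine per day, times fleet size, times chance of failure per event
        return machines * (eventsPerWeek / 7.0) * failureRate;
    }

    public static void main(String[] args) {
        System.out.println(perDay(1000, 1, 0.005)); // one power-down a week
        System.out.println(perDay(1000, 2, 0.005)); // two power-downs a week
    }
}
```

Under those assumptions the result is roughly 0.7 to 1.4 machines a day, the same order as the 1 to 2 quoted above.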

The hard-links idea is a useful bit - I'll add it to my toolkit for FreeBSD.
I had tried this technique before on a Linux box, but in Linux the rename(2)
call is not atomic when the target file already exists.
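
For the Java part of the application, the same write-then-rename idea might look like the sketch below. It assumes a Java 7 or later JVM; the class name, method name and the ".tmp" suffix are hypothetical. Files.move with ATOMIC_MOVE is typically implemented with rename(2) on POSIX systems, but the Java documentation leaves it implementation specific whether an existing target is replaced, so this needs testing on the actual platform.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicReplace {
    /** Replace target so that either the old or the new contents are visible. */
    public static void replace(Path target, byte[] contents) throws IOException {
        // Hypothetical naming scheme: temp file beside the target, on the same filesystem.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            out.write(contents);
            out.getFD().sync(); // ask the OS to flush the new data before the rename
        }
        // Typically rename(2) on POSIX; replacement of an existing target is
        // implementation specific according to the Java documentation.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

At worst a stray ".tmp" file is left behind after a power failure, which can be deleted at start-up.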

Using fsync() or even sync is not generally an option for me as I am using
Java for a large part of the application. Where C has been used, it is
liberally sprinkled with fsync and sync. Special care has been taken with
the Java to ensure that files are being closed properly. Without the files
being closed after each write operation I found that, even with unbuffered
writes, there was a high probability of file corruption on power-down.
Now that files are being closed after each write, I never seem to lose
information during an fsck auto-repair (a great improvement). Occasionally,
however, fsck is not able to repair the disk at all, effectively leaving the
device inoperable and forcing a return to depot for repair. The return to
depot is expensive and requires special low level data extraction to try and
get the information off the now badly corrupted disk.
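
One possibility on the Java side is FileDescriptor.sync(), which the JDK documents as forcing system buffers to synchronize with the underlying device. The sketch below appends one record, syncs, and closes the file after every write as described above. It assumes a Java 7 or later JVM (for try-with-resources); the class and method names are hypothetical, and a drive with its own write cache may still hold the data briefly.

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class DurableAppend {
    /** Append one record and return only after the OS reports it flushed. */
    public static void append(String path, byte[] record) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path, true)) { // true = append mode
            out.write(record);
            out.getFD().sync(); // JVM counterpart to calling fsync(2) from C
        } // try-with-resources closes the file after every write
    }
}
```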

I have tried both with and without soft updates. The largest problem with
soft updates is the latency: with the standard parameters it can
take up to 30 seconds for data to actually be written out to disk thus
introducing a good probability of losing information on a power down.

The 0, 1 & 2 second delays are much better (using my kernel parameters) but
2 seconds is still a long time in my application. Looking through the code -
0, 1 & 2 second delays appear to be the smallest periods available without
affecting the way soft updates work. With soft updates I seem to be more
likely to lose information but less likely to kill the disk. Given this
compromise I have turned off soft-updates.

Is there an alternative file-system that would be more tolerant to
power-down issues? For example, with the original DOS operating system I can't
remember ever having this sort of problem until Windows started adding write
caching.

Is there some option to make file-system calls totally synchronous (turning
off all write caching) thus significantly reducing the risk? Write
performance is not a critical criterion. Integrity and completeness are far
more important.
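
A per-file analogue of synchronous writes is available from Java through RandomAccessFile opened in "rws" mode, which requires every update to the file's content or metadata to be written synchronously to the underlying storage device. This is a sketch only: it assumes a JVM that supports the "rws" mode (Java 1.4 or later, with Java 7 for try-with-resources), the class and method names are hypothetical, and it does not make the whole file-system synchronous. For that, the sync option documented in mount(8) is the place to look.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class SyncWrite {
    /** Append a record to a file whose writes are all synchronous. */
    public static void appendSync(String path, byte[] record) throws IOException {
        // "rws": read/write, with content and metadata updates written synchronously
        try (RandomAccessFile raf = new RandomAccessFile(path, "rws")) {
            raf.seek(raf.length()); // position at end of file to append
            raf.write(record);
        }
    }
}
```

Every write call then pays the full cost of a disk write, which matches the stated priority of integrity over speed.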

It might be true that voltage sag on write is toasting extra super-block
copies or alternatively that the super-blocks are not being written
synchronously. I have two variants of the motherboard hardware using
different chipsets with two different sized hard-drives (3" and 2") so it
doesn't appear to be a hardware specific problem.

If this is what is happening then having more than one super-block is an
integrity risk rather than an integrity improver because I cannot manually
fsck after a super-block corruption. How then would I turn off the extra
super-block copies? I presume this would be done at file-system creation
time.

An example of redundant information being useless is in the original FAT
file-system. The second copy of the FAT is only ever used to detect that the
two copies of the FAT are out of sync. I have never seen a DOS or Windows
utility that takes any notice of the information written in the second copy
of the FAT. For example, scandisk (equivalent to fsck) detects that they are
different but the only repair option is to write the first copy of the FAT
on top of the second copy of the FAT.

Are the UFS file-system and fsck different in this regard?

----- Original Message -----
From: "John Hanley" <jh_@yahoo.com>
To: "Andrew Hannam" <famzon@bigfoot.com>
Sent: Monday, December 24, 2001 5:06 PM
Subject: Re: Disk Writes


> --- Andrew Hannam <famzon@bigfoot.com> wrote:
> > The application has been
> > written to only append to files or to replace the file by creating a new
> > file and then writing a single byte to redirect which file is in use.
>
> BTW, using hard links might be pretty slick, here.  Watch me atomically
> delete "file":
>
>   $ date > file
>   $ TMP=file.$$
>   $ date > $TMP
>   $ mv $TMP file
>
> At worst, file.${pid} is left around.
> At all times, we have either the old or new contents available.
> The rename(2) does the "delete" and the "make new contents available"
> operations as a single atomic operation, safe in the face of power fails.
>
>
> > My current kernel settings are:
> > sysctl -w vfs.write_behind=0 kern.filedelay=2 kern.dirdelay=1
> > kern.metadelay=0
>
> I'll bet that combination of parameters has received less testing
> than the default parameters.  I feel that the default params with
> soft updates should be working just fine for you.
>
> Is power fail pretty straightforward?  Does the CPU go down before
> the device that /app is on goes down?  Maybe voltage sag at the time
> of a write is toasting one or more superblock replicas?
>
>
>        Cheers,
>        JH

