FreeBSD Mail Archives

Date:      Mon, 14 May 2001 20:44:31 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        Greg Lehey <grog@lemis.com>, Kris Kennaway <kris@obsecurity.org>, Terry Lambert <tlambert@primenet.com>, Kirk McKusick <mckusick@mckusick.com>, Mikhail Teterin <mi@misha.privatelabs.com>, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG, Ruslan Ermilov <ru@FreeBSD.ORG>, fs@FreeBSD.ORG
Subject:   Re: [kris@obsecurity.org: Re: cvs commit: src/etc rc]
Message-ID:  <200105150344.f4F3iVI45699@earth.backplane.com>
References:  <200105132342.QAA21879@beastie.mckusick.com> <200105142334.QAA05923@usr06.primenet.com> <20010515115630.H59553@wantadilla.lemis.com> <20010514193332.A85465@xor.obsecurity.org> <20010515120558.M59553@wantadilla.lemis.com> <20010514202707.B93481@xor.obsecurity.org>

    I have to say, just IMHO, that as much as I like the concept of a
    background fsck, I will never ever in my life use the feature.  I'll
    use the snapshots, definitely.  But not the background fsck.  It
    is plain and simply too dangerous, *especially* on large partitions where
    one has a lot to lose if something goes wrong.  UFS just isn't designed
    to be able to guarantee recovery, even if softupdates can't fail
    theoretically.  We would need a log or journal to reach the safety
    factor that something like XFS or ReiserFS can theoretically achieve.

    I welcome Kirk's addition of the feature, but I have to say that,
    IMHO, the *default* should not be to background fsck.  The default should
    be to remain safe and foreground fsck.

    If I have a huge partition that I intend to store a database in (for
    example), then judicious use of newfs's -c and -i options is sufficient
    to reduce fsck times.
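
As a rough illustration of why the -i option matters, the sketch below estimates how many inodes a filesystem ends up with at a given bytes-per-inode density. It assumes the simple approximation inodes = partition size / density; the real newfs rounds to cylinder group boundaries, and the 100 GiB size and the three densities are made-up figures, not recommendations. Fewer inodes generally means less for fsck to scan, though actual fsck time depends on other factors as well.

```python
GIB = 1024 ** 3


def estimated_inodes(partition_bytes: int, bytes_per_inode: int) -> int:
    """Approximate inode count for a given newfs -i density.

    Assumption: inodes = size / density. The real newfs rounds per
    cylinder group, so treat the result as an order-of-magnitude figure.
    """
    if partition_bytes < 0 or bytes_per_inode <= 0:
        raise ValueError("sizes must be positive")
    return partition_bytes // bytes_per_inode


if __name__ == "__main__":
    partition = 100 * GIB  # hypothetical database partition
    for density in (4096, 65536, 1048576):  # illustrative -i values
        print(f"-i {density:>8}: ~{estimated_inodes(partition, density)} inodes")
```

For a partition that holds a handful of large database files, a coarse density cuts the inode count by orders of magnitude compared with a fine one.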

    Ultimately I believe that as storage systems get larger, the only safe
    solution is going to be replicated, distributed, quorum-based 
    transactional filesystems.  That way if a node goes down, it doesn't
    matter if it takes an hour to validate itself before coming back up.
    RAID-XYZ doesn't hack it -- it's still vulnerable to filesystem corruption
    due to software.  Having written a database that does this sort of
    replication (in a read-write transactional environment), I've become
    a great believer in it and I think it is the only future for storing
    huge amounts of data.  I think it is possible to solve the slow-write
    problem (needing a quorum to commit a write) through the use of a 
    client-side cache, similar to what NFS does (note: I haven't done this
    for my database yet, but I can see how it could be done for a filesystem).
    It gives me great peace of mind to know that I can pull the plug on
    an entire colocation site and have the realtime users of our product
    NOT notice that it happened. 
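
The quorum idea above can be sketched in a few lines. This is a toy model, not the database described in the message: the Replica class, the quorum_write function, and the site names are all hypothetical, and a real system would also need versioning, recovery of nodes that missed writes, and handling of failures in the middle of a commit. It only shows the basic rule that a write commits when a strict majority of replicas is reachable.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical storage node; 'up' models whether the site is reachable."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)


def quorum_write(replicas, key, value) -> bool:
    """Commit a write only if a strict majority of replicas is reachable.

    If there is no quorum, nothing is written anywhere.
    """
    needed = len(replicas) // 2 + 1
    reachable = [r for r in replicas if r.up]
    if len(reachable) < needed:
        return False
    for r in reachable:
        r.store[key] = value
    return True


if __name__ == "__main__":
    sites = [Replica("colo-a"), Replica("colo-b"), Replica("colo-c")]
    sites[0].up = False                       # pull the plug on one site
    print(quorum_write(sites, "k", "v1"))     # two of three still form a quorum
    sites[1].up = False                       # a second site goes down
    print(quorum_write(sites, "k", "v2"))     # one of three does not
```

With three sites, losing one still leaves a majority, which is the property that lets a whole site go dark without users noticing. The cost is that every write waits on a majority, which is the slow-write problem mentioned above.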

						-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200105150344.f4F3iVI45699>
