Date:      Thu, 11 Jan 2007 10:15:46 -0600
From:      Eric Anderson <anderson@centtech.com>
To:        Scott Oertel <freebsd@scottevil.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: skipping fsck with soft-updates enabled
Message-ID:  <45A662B2.9080801@centtech.com>
In-Reply-To: <45A511C0.9000402@scottevil.com>
References:  <45A3C96A.6030307@scottevil.com>		<200701101139.l0ABdJ9K088810@lurza.secnetix.de>	<ac00e00a0701100538m16395e87t2fbf69acfeeb04ed@mail.gmail.com> <45A485C6.2060405@scottevil.com> <45A5024F.10502@centtech.com> <45A511C0.9000402@scottevil.com>

On 01/10/07 10:18, Scott Oertel wrote:
> Eric Anderson wrote:
>> On 01/10/07 00:20, Scott Oertel wrote:
>>> Victor Loureiro Lima wrote:
>>>> From rc.conf man page:
>>>> ---
>>>> background_fsck_delay
>>>>                 (int) The amount of time in seconds to sleep before starting
>>>>                 a background fsck(8).  It defaults to sixty seconds to allow
>>>>                 large applications such as the X server to start before disk
>>>>                 I/O bandwidth is monopolized by fsck(8).
>>>> ---
>>>>
>>>> You can set the delay as long as you want, so it won't have to start
>>>> right away; in fact, it can start as late as a year from now (if that's
>>>> really what you want ;))
>>>>
>>>> att,
>>>> victor loureiro lima
>>>>
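A minimal sketch of the rc.conf settings discussed above. Both variable names come from rc.conf(5); the one-day delay is only an illustrative value, not a recommendation from this thread.

```shell
# /etc/rc.conf fragment (plain sh variable assignments)

# Run fsck in the background after an unclean shutdown (this is the default).
background_fsck="YES"

# Seconds to wait after boot before the background fsck starts.
# The default is 60; 86400 (one day) is an assumed example value.
background_fsck_delay="86400"

# Note: later rc.conf(5) versions describe a negative delay as "postpone
# indefinitely"; check the manual page on your release before relying on it.
```
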
>>>> 2007/1/10, Oliver Fromme <olli@lurza.secnetix.de>:
>>>>> Scott Oertel wrote:
>>>>>  > I am wondering what kind of problems would occur, besides lost space, if
>>>>>  > after a system crash a fsck is skipped. According to the documentation,
>>>>>  > with soft-updates enabled, the file system would be consistent; there
>>>>>  > would just be lost resources to be recovered, which I am assuming can be
>>>>>  > safely done at a later time to avoid long periods of downtime during
>>>>>  > peak hours.
>>>>>
>>>>> I think that's exactly what the background fsck feature
>>>>> does.  If you enable it (which is even the default), the
>>>>> fsck process doesn't start right away, so the system comes
>>>>> up in multi-user mode immediately.  Then a snapshot is
>>>>> created on the file system, and fsck runs on the snapshot,
>>>>> freeing the lost space in the file system.
>>>>>
>>>>> Of course, it only works reliably with soft-updates enabled,
>>>>> _and_ there must not be any unexpected inconsistencies.
>>>>> However, with some common setups (e.g. cheap disks lying
>>>>> about completed write operations) it is difficult to
>>>>> guarantee the consistency.  Soft-updates is rather fragile
>>>>> when the hardware doesn't work exactly as it's supposed to.
>>>>> I've witnessed breakage in the past, and for that reason
>>>>> I always disable the background fsck feature.  And it's the
>>>>> reason I'm looking forward to gjournal becoming stable,
>>>>> because it seems to be less fragile in the presence of
>>>>> imperfect hardware.
>>>>>
>>>>> Best regards
>>>>>    Oliver
>>>>>
>>>>> -- 
>>>>> Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
>>>>> Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
>>>>> Any opinions expressed in this message may be personal to the author
>>>>> and may not necessarily reflect the opinions of secnetix in any way.
>>>>>
>>>>> "C++ is to C as Lung Cancer is to Lung."
>>>>>         -- Thomas Funke
>>>>> _______________________________________________
>>> The problem with background fsck is that on my machines, it doesn't 
>>> work well. These machines have 8x750GB SATA drives and they are under 
>>> extreme stress all the time. When you run fsck in the background, each 
>>> drive takes 10+ minutes to create the snapshot file, during which 
>>> time the machine is completely unresponsive and unstable.
>> What version of FreeBSD are you running?  You might try gjournal, 
>> which I've had great luck with, and Pawel (pjd@) is incredibly 
>> responsive to bug reports, etc.
>>
>>> That is why I am wondering whether it is OK to skip both the background 
>>> and foreground fscks and reschedule them for a later time, 
>>> during off-peak hours.
>> I think most people would be nervous to tell you 'sure, skip it until 
>> later', but I can tell you from experience that I myself have delayed 
>> fscking for weeks on end, to do exactly what you want.
>>
>> Eric
>>
>>
>>
> I'm running on 6.2-RC2. For fun I tried to create a snapshot on one of 
> our newest machines, same drive config as the previous ones, it's just 
> less active than the others. It's running 6.2-RC2 and it just completely 
> locked up. Anyway, thanks for the suggestion about running gjournal, but I'm 
> not sure running non-official patches on the file system code with 
> production machines is such a great idea. Have you had any problems with 
> gjournal? If so, of what nature were they?
> 


Honestly, I haven't had many issues with snapshots since the 6.1-ish days 
and before, when there were lots of deadlocks, livelocks, etc.  I think 
Kris@ has done a bang-up job of finding bugs and getting them fixed.  If 
you still see snapshot issues like this, it would be great if you could 
start sending some info, such as the output of ps -auxl, and, if it's a 
deadlock, drop to the debugger and get a crash dump.
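
A rough sketch of how that info could be collected, for illustration only. PS_CMD, OUT_DIR and collect_ps are made-up names for this example; only the ps -auxl invocation itself comes from the text above. The debugger steps in the comments assume a kernel built with KDB/DDB and a configured dump device.

```shell
#!/bin/sh
# Save a process listing to a timestamped file for a bug report.
# PS_CMD and OUT_DIR are hypothetical knobs for this sketch, not FreeBSD settings.
PS_CMD="${PS_CMD:-ps -auxl}"
OUT_DIR="${OUT_DIR:-/var/tmp}"

collect_ps() {
    out="${OUT_DIR}/ps-$(date +%Y%m%d-%H%M%S).txt"
    # $PS_CMD is deliberately unquoted so "ps -auxl" splits into command and flags.
    $PS_CMD > "$out" 2>&1
    # Print the file name so the caller can attach it to the report.
    echo "$out"
}

# For a suspected deadlock (assumes options KDB and DDB in the kernel):
#   sysctl debug.kdb.enter=1     # drop to the debugger from a root shell
#   db> ps                       # process list as seen by the debugger
#   db> call doadump             # write a crash dump to the dump device
```

Calling collect_ps prints the path of the file it wrote.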

As far as gjournal goes, I now have it running on several systems, all very 
high-usage NFS servers (~1000 high-end machines pounding them very hard, 
24x7).  I've only seen a few little issues on one of my systems that is 
running an older 6-STABLE (it's a little difficult for me to update it 
right now), but all my other systems have been very solid.  PJD has done 
a great job getting it stable and ready for production use.  In my 
experience, I have had no data loss and no file system corruption 
using it.  The worst that's happened is a livelock, followed by a 
reboot.  Since it is indeed journaled, the reboot takes a few minutes, 
and the fsck takes a few *seconds* (on a 10TB volume).  I would say 
that using gjournal is more reliable over time than relying on 
background fscks.  Gjournal is, however, still in beta test mode, 
so you should do your own testing to evaluate it.  You can always 
disable it very easily, without losing your data.
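
For anyone who wants to test it, here is a hedged sketch of a typical setup sequence. It follows the example in the gjournal(8) manual page as later shipped with FreeBSD 7; the 6.x patch set may differ, so treat the exact commands as assumptions. The run and setup_gjournal helpers and the DRY_RUN variable are made up for this example, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; execute only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_gjournal <provider> <mountpoint>
# WARNING: newfs destroys any existing data on the provider.
setup_gjournal() {
    dev="$1"
    mnt="$2"
    run gjournal load                              # load the GEOM journal class
    run gjournal label "$dev"                      # creates /dev/<dev>.journal
    run newfs -J "/dev/${dev}.journal"             # new UFS with the gjournal flag
    run mount -o async "/dev/${dev}.journal" "$mnt"
}
```

For example, setup_gjournal da0 /mnt prints the four commands for a scratch disk da0 without touching it.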

Eric



-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
An undefined problem has an infinite number of solutions.
------------------------------------------------------------------------


