Date:      Tue, 19 Jan 2010 15:23:17 -1000 (HST)
From:      Jeff Roberson <jroberson@jroberson.net>
To:        arch@freebsd.org
Subject:   Softdep journaling
Message-ID:  <alpine.BSF.2.00.1001191510070.1027@desktop>

Hello,

Many of you may have already noticed that I have implemented a journaling 
layer that co-exists with softdep to eliminate fsck after an unclean 
shutdown.  I have written about this here:

http://jeffr-tech.livejournal.com/

And I have a patch against current here:

http://people.freebsd.org/~jeff/suj.diff

I have been working with McKusick and he has been providing review 
feedback.  Tegge and kib have been reviewing my rename changes.  Peter 
Holm has generously provided his time for testing.  I am within a week of 
being able to commit this to CURRENT.  I'm raising this here so people can 
discuss the project and I can answer any questions or concerns before it 
goes in the tree.

Briefly, I have added an intent log to softdep that journals block 
allocation and freeing along with inode link count changes.  After an 
unclean shutdown a special fsck pass reads this journal and frees blocks 
and inodes.  The recovery pass is not like traditional block journaling: 
it actually evaluates the filesystem state to determine how far each 
operation progressed and rolls back intelligently.
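To make the "evaluate the state, then roll back" idea concrete, here is a 
minimal C sketch over a toy in-memory model.  The record types 
(JREC_LINK, JREC_BLK_ALLOC), the toy_fs structure and the repair rules 
are hypothetical illustrations, not the record format or checker in 
suj.diff: a link-count record is resolved by trusting the directory 
entries actually found on disk, and a block-allocation record is undone 
if no inode ended up referencing the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NINODES 8
#define TOY_NBLOCKS 16

/* Hypothetical intent-log record types (not the patch's on-disk format). */
enum jrec_type { JREC_LINK, JREC_BLK_ALLOC };

struct jrec {
	enum jrec_type type;
	uint32_t ino;   /* inode whose link count was changing (JREC_LINK) */
	uint32_t blkno; /* block that was being allocated (JREC_BLK_ALLOC) */
};

/* Toy filesystem state that the recovery pass inspects. */
struct toy_fs {
	uint16_t nlink[TOY_NINODES];         /* link count stored in the inode */
	uint16_t dir_refs[TOY_NINODES];      /* directory entries found naming it */
	uint8_t ino_allocated[TOY_NINODES];  /* inode bitmap */
	uint8_t blk_allocated[TOY_NBLOCKS];  /* block bitmap */
	uint8_t blk_referenced[TOY_NBLOCKS]; /* 1 if some inode points at it */
};

/*
 * Resolve one record by looking at what actually reached the disk.
 * Returns 1 if anything was repaired, 0 if the state was already consistent.
 */
static int
recover_record(struct toy_fs *fs, const struct jrec *r)
{
	int changed = 0;

	switch (r->type) {
	case JREC_LINK:
		assert(r->ino < TOY_NINODES);
		/* Trust the directory entries that exist, not the stored count. */
		if (fs->nlink[r->ino] != fs->dir_refs[r->ino]) {
			fs->nlink[r->ino] = fs->dir_refs[r->ino];
			changed = 1;
		}
		/* An inode with no remaining names is freed. */
		if (fs->nlink[r->ino] == 0 && fs->ino_allocated[r->ino]) {
			fs->ino_allocated[r->ino] = 0;
			changed = 1;
		}
		break;
	case JREC_BLK_ALLOC:
		assert(r->blkno < TOY_NBLOCKS);
		/* Marked allocated but never linked into an inode: free it. */
		if (fs->blk_allocated[r->blkno] && !fs->blk_referenced[r->blkno]) {
			fs->blk_allocated[r->blkno] = 0;
			changed = 1;
		}
		break;
	}
	return changed;
}

/* Walk the whole log; returns how many records needed a repair. */
static size_t
recover_journal(struct toy_fs *fs, const struct jrec *log, size_t n)
{
	size_t repaired = 0;

	for (size_t i = 0; i < n; i++)
		repaired += (size_t)recover_record(fs, &log[i]);
	return repaired;
}
```

Records whose operations completed cleanly cost a check but no repair.  A 
real checker would also release the blocks of a freed inode and handle 
block frees and partial truncation; the sketch omits all of that.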

The worst case journal recovery time I've seen is a couple of minutes; 
however, I'm still generating a few hundred megabytes of text describing 
the operation when I run fsck so that I can quickly resolve any bugs. 
This worst case was produced using pho's stress2 and a completely full 
64MB journal containing nearly 2 million outstanding records.  Recovery 
time for a crash during buildworld, for example, is on the order of 10 
seconds even while producing the text log.  Without the log I expect the 
maximum on any drive to be around 2 minutes.  Presently recovery is 
actually CPU-bound, and I'm using 3-year-old hardware.  Recovery time 
grows with the size of the journal and shrinks as processor speed 
increases.  The size of the filesystem makes little difference.
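As a rough consistency check on these figures, dividing the journal size 
by the record count gives the average space per outstanding record.  The 
message does not say whether "64MB" means 64 x 10^6 or 64 x 2^20 bytes, 
so treat this as an estimate.  It also illustrates why recovery work 
tracks journal size rather than filesystem size: the number of records 
to replay is bounded by journal size divided by record size.  The 
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average bytes of journal consumed per outstanding record. */
static double
avg_record_bytes(uint64_t journal_bytes, uint64_t records)
{
	assert(records > 0);
	return (double)journal_bytes / (double)records;
}

/* Upper bound on records a full journal can hold for a given record size. */
static uint64_t
max_records(uint64_t journal_bytes, uint64_t record_bytes)
{
	assert(record_bytes > 0);
	return journal_bytes / record_bytes;
}
```

With 64 x 2^20 bytes and 2,000,000 records, avg_record_bytes() gives 
about 33.6; with 64 x 10^6 bytes it gives exactly 32.0.  Either way each 
record is a few tens of bytes, and doubling the journal doubles the 
worst-case number of records fsck must process, whatever the size of 
the filesystem.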

The filesystem cannot be mounted read/write until the journal is 
recovered or a full fsck pass is run.  The filesystem will be backwards 
compatible with earlier ffs implementations.  The journal can be enabled 
or disabled with tunefs.  The only requirement is sufficient free space 
for the journal, which is stored in a regular inode.
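The two rules above can be written down as simple predicates.  This is a 
hedged sketch with hypothetical names, not code from the patch; the real 
checks live in the kernel mount path, fsck and tunefs, and the free-space 
test here ignores metadata overhead such as indirect blocks for the 
journal inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* May a journaled filesystem be mounted read/write? */
static bool
may_mount_rw(bool unclean_shutdown, bool journal_recovered, bool full_fsck_done)
{
	if (!unclean_shutdown)
		return true;
	/* After a crash, either the journal was replayed or a full fsck ran. */
	return journal_recovered || full_fsck_done;
}

/* The journal lives in a regular inode, so enabling it needs free space. */
static bool
may_enable_journal(uint64_t free_bytes, uint64_t journal_bytes)
{
	return free_bytes >= journal_bytes;
}
```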

The patch I have presented is mostly complete; it only lacks the 
recovery operation for partial truncation.  I'm still running through 
various scenarios to validate the checker; however, the kernel has been 
very stable lately.

Please raise any comments or concerns here.  I'm going to make another 
call for testers on current@ and want to keep that reserved for bug 
reports.

Thanks,
Jeff


