Date: Tue, 30 Sep 2008 12:00:30 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Jeremy Chadwick <koitsu@FreeBSD.org>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
Message-ID: <200809301900.m8UJ0Ui2047243@apollo.backplane.com>
References: <20080927051413.GA42700@icarus.home.lan>
    <765067435.20080926223557@takeda.tk>
    <20080927064417.GA43638@icarus.home.lan>
    <588787159.20080927003750@takeda.tk>
    <5f67a8c40809282030l7888d942q548d570cd0b33be9@mail.gmail.com>
    <20080929040025.GA97332@icarus.home.lan>
    <48E080C0.9070103@modulus.org>
    <5f67a8c40809290809j58639df8ka65184151161cab6@mail.gmail.com>
    <5f67a8c40809290849m413eebe6sd31a493aea506932@mail.gmail.com>
    <200809291744.m8THiBlR034739@apollo.backplane.com>
    <20080930053619.GA37286@icarus.home.lan>

:The topic of BIO_FLUSH is something I got to thinking about last night
:at work; the only condition where a disk with write caching enabled
:*would not* fully write the data to the platter would in fact be power
:loss. All other conditions (specifically soft reset and panic) should
:not require explicit flushing.
:
:I wonder why this is being done, especially on shutdown of FreeBSD.
:Assuming I understand it correctly, I'm talking about this:
:
:Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
:Waiting (max 60 seconds) for system process `syncer' to stop...
:Syncing disks, vnodes remaining...3 3 3 2 2 0 0 done
:All buffers synced.
:
:--
:| Jeremy Chadwick jdc at parodius.com |

BIO_FLUSH and "Syncing disks, vnodes ..." are two different things,
so I'm not sure of the context, but I will describe issues with both.

--

BIO_FLUSH commands the disk firmware to flush out any dirty buffers
in its drive cache. That is, writes that you have *already* issued
to the drive and which returned completion, but which have not
actually made it to the physical media yet. This is different from
dirty buffers still being maintained by the kernel which have not yet
been sent to the drive. (Just repeating this so the definition is
clear to all the readers).

So, yes, you would want to do a BIO_FLUSH before powering down a
machine (halt -p) to ensure that all the dirty data you sent to the
disk actually gets to the platter.

I think you also want to issue it for a soft reset. It would not
affect a SATA drive, but it certainly would affect a USB drive powered
from the computer. USB ports will be powered down during a soft
reset.

BIO_FLUSH isn't likely to cause problems during a crash, unlike
flushing the buffer cache.

Some people may remember that earlier versions of Windows XP often
powered the machine down before the hard drive managed to write all
of its data to the platter. Sometimes that would even destroy sectors
on the drive.
We know bad things happen if we don't issue the command, so best not
to take chances by making assumptions.

--

The "Syncing disks, vnodes ..." is the kernel flushing out any dirty
data in the buffer cache which has not yet been sent to the disk
driver. This is more problematic.

Filesystems such as HAMMER (and presumably ZFS) absolutely do NOT
want the system to flush dirty buffers unless they explicitly give
permission to do so, because the dirty buffers might represent data
for which the recovery information has not yet been written out, and
thus can corrupt the filesystem on-media if a crash were to occur
right then. In HAMMER's case I enhanced the bioops a bit to allow
HAMMER to veto write-outs initiated by the system. sync_on_panic is
irrelevant; the buffers will not be synced without HAMMER's
permission and it won't give it.

There is also the very real general case where a traditional
filesystem such as UFS must perform multiple buffer cache ops,
dirtying multiple buffer cache buffers, in order to complete an
operation. If a crash were to occur right in the middle of such a
sequence the kernel would wind up writing dirty buffers related to
incomplete operations to the media, resulting in corruption.

In the case of softupdates one is presented with a conundrum. If you
don't write out the buffer cache during a crash you stand to lose a
lot more than 60 seconds worth of changes due to deep dependency
chains. One 'sync' doesn't do the job, and even though it is supposed
to get all the primary data and meta-data onto the disk and just
leave the bitmap updates for background operations, it doesn't always
seem to do that. The softupdates code is very fragile.

On the other hand, if you *DO* try to write out the buffer cache
during a crash you have a good chance of deadlocking the system or
double-panicking, resulting in inconsistencies on the media, and you
risk doing a partial write-out, also resulting in inconsistencies on
the media.
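
The difference between the two layers described above (kernel buffer
cache versus drive write cache) can be sketched as a toy model. This is
only an illustration under loud assumptions: `Drive`, `Kernel`, and the
`may_write` hook are hypothetical names made up for this sketch, not
FreeBSD or DragonFly kernel interfaces. `Drive.flush()` stands in for
BIO_FLUSH, `Kernel.sync()` stands in for the "Syncing disks" step, and
`may_write` loosely mimics the idea of a filesystem vetoing
system-initiated write-outs.

```python
# Toy model only: none of these names exist in any kernel.

class Drive:
    """A disk with a volatile write cache in front of the platter."""

    def __init__(self):
        self.cache = {}    # writes acknowledged to the host, not yet on media
        self.platter = {}  # what survives a power loss

    def write(self, block, data):
        # The drive reports completion once the data is in its cache.
        self.cache[block] = data

    def flush(self):
        # Stand-in for BIO_FLUSH: drain the drive cache to the media.
        self.platter.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Anything still sitting in the drive cache is lost.
        self.cache.clear()


class Kernel:
    """Host side: dirty buffers that have not been sent to the drive."""

    def __init__(self, drive, may_write=None):
        self.drive = drive
        self.dirty = {}
        # Hypothetical veto hook: the filesystem decides per buffer
        # whether a system-initiated write-out is allowed.
        self.may_write = may_write if may_write is not None else (lambda block: True)

    def modify(self, block, data):
        self.dirty[block] = data

    def sync(self):
        # Stand-in for "Syncing disks": send permitted dirty buffers
        # to the drive. This does NOT put them on the platter.
        for block in list(self.dirty):
            if self.may_write(block):
                self.drive.write(block, self.dirty.pop(block))
```

In this model, `sync()` followed directly by `power_loss()` leaves
`platter` empty, while `sync()`, `flush()`, `power_loss()` leaves the
block on the platter. A `may_write` hook that returns False for a block
keeps that block in `Kernel.dirty` no matter how often `sync()` runs.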

Here is an example: How does the crash code deal with dirty but
locked buffer cache buffers? Say you have a softupdates filesystem
and through the course of operations you dirty a dozen buffers, then
a crash occurs while you are in the middle of ANOTHER softupdates
operation which is holding several buffers already dirtied by
previous operations locked.

What happens now if the crash code tries to sync the buffer cache?
Will it sync the previously dirtied buffers that are currently
locked? Will it sync the ones that haven't been locked but skip the
ones that are locked? You lose both ways. There is no way to safely
sync ANYTHING, whether locked or not, without risking unexpected
softupdates inconsistencies on-media.

This alone makes background fsck problematic and risky.

-Matt
Matthew Dillon
<dillon@backplane.com>
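
The locked-buffer scenario above can be sketched as a toy model. This
is a deliberately simplified illustration, not softupdates' real
dependency tracking: the `Buf` class, the two buffer names, the three
policy strings, and the `consistent()` invariant are all assumptions
invented here. Buffer "I" holds inodes and buffer "D" holds directory
entries. An earlier, completed operation created "a" -> inode 5 and
dirtied both buffers; the crash hits while another operation holds "I"
locked with inode 6 only partly initialized.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    data: dict
    dirty: bool = False
    locked: bool = False


def crash_state():
    # In-memory buffer cache at the moment of the crash (hypothetical).
    return {
        "I": Buf({5: "ok", 6: "partial"}, dirty=True, locked=True),
        "D": Buf({"a": 5}, dirty=True, locked=False),
    }


def crash_sync(bufs, policy):
    """Return the on-media image after a crash-time sync.

    policy is "all" (write locked buffers too), "skip_locked",
    or "none" (do not sync at all).
    """
    media = {"I": {}, "D": {}}  # image before the crash: empty, consistent
    for name, buf in bufs.items():
        if not buf.dirty or policy == "none":
            continue
        if policy == "skip_locked" and buf.locked:
            continue
        media[name] = dict(buf.data)
    return media


def consistent(media):
    # Toy invariant: no half-written inode on media, and every
    # directory entry points at a fully initialized inode.
    inodes, entries = media["I"], media["D"]
    no_partial = all(state == "ok" for state in inodes.values())
    no_dangling = all(inodes.get(ino) == "ok" for ino in entries.values())
    return no_partial and no_dangling
```

Under these assumptions, "all" puts the half-initialized inode 6 on
media, "skip_locked" writes the directory entry for "a" without its
inode, and "none" leaves the media consistent but drops the completed
creation of "a" entirely.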