Date:      Fri, 12 Apr 2013 22:36:50 -0600
From:      Will Andrews <will@firepipe.net>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        "freebsd-fs@FreeBSD.org Filesystems" <freebsd-fs@freebsd.org>
Subject:   Re: Does sync(8) really flush everything? Lost writes with journaled SU after sync+power cycle
Message-ID:  <CADBaqmjqVzSt8BjL7YOoPtxhK0SQTbmvzdzvsQCQZ65JYHq8uw@mail.gmail.com>
In-Reply-To: <20130411160253.V1041@besplex.bde.org>
References:  <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com> <20130411160253.V1041@besplex.bde.org>

On Thu, Apr 11, 2013 at 12:30 AM, Bruce Evans <brde@optusnet.com.au> wrote:

> On Wed, 10 Apr 2013, Kevin Day wrote:
>
>> Working with an environment where a system (with journaled soft-updates)
>> is going to be notified that it's going to be losing power shortly, and
>> needs to shut down daemons and flush everything to disk. It doesn't
>> actually shutdown though, because the "power down now" command may get
>> cancelled and we need to bring things back up. My understanding was that we
>> could call sync(8), then just wait for the power to drop.
>>
>> The problem is that we were frequently losing the last 30-60 seconds
>> worth of filesystem changes prior to the shutdown. i.e. newly created
>> directories would disappear or fsck would reclaim them and throw them into
>> lost+found.
>>
>> I confirmed that there is no caching disk controller, and write caching
>> is disabled on the drives themselves, and the problem continued.
>>
>> On a whim, after running sync(8) once and waiting 10 seconds, I did
>> "mount -u -o ro -f /" to force the filesystem into read-only mode. It took
>> about 8 seconds to finish, gstat showed a lot of write activity, and
>> SIGINFO on the mount command showed:
>>
>
> sync(2) only schedules all writing of all modified buffers to disk.  Its
> man page even says this.  It doesn't wait for any of the writes to complete.
> Its man page says that this is a BUG, but it is intentional and sync() has
> always done this.  There is no way for sync() to guarantee that all modified
> buffers have been written to disk when it returns, since even if it waited,
> buffers might be modified while it is returning.  Perhaps even ones that
> would take 8 seconds to complete can be written in the few nanoseconds that
> it takes to return.
>

The behavior of sync(2) is actually filesystem-specific.  sync(8) calls
sync(2), which is implemented by sys_sync, which calls VFS_SYNC on each
mounted filesystem, so the filesystem determines the exact behavior.

In the case of ZFS, its vfs_sync performs a ZIL commit, which means that
all writes up to that point will be committed to disk prior to returning.
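
As a rough illustration of that dispatch, here is a toy Python model.  It is
not kernel code: the class names, the vfs_sync method and the io_complete
helper are all made up for the sketch.  It only shows how the same sync call
can mean "writes queued" on one filesystem and "writes committed" on another.

```python
# Toy model only: not FreeBSD kernel code. All names here are hypothetical.

class ScheduleOnlyFS:
    """Filesystem whose sync hook only queues writes and returns."""

    def __init__(self):
        self.dirty = []    # modified buffers not yet handed to the disk
        self.queued = []   # writes issued but not yet completed
        self.on_disk = []  # writes that have actually completed

    def write(self, data):
        self.dirty.append(data)

    def vfs_sync(self):
        # Hand dirty buffers to the I/O queue and return without waiting.
        self.queued.extend(self.dirty)
        self.dirty.clear()

    def io_complete(self):
        # Stands in for the disk finishing the queued writes some time later.
        self.on_disk.extend(self.queued)
        self.queued.clear()


class CommitOnSyncFS(ScheduleOnlyFS):
    """Filesystem whose sync hook commits before returning."""

    def vfs_sync(self):
        super().vfs_sync()
        self.io_complete()  # wait for completion before returning


def sys_sync(mounts):
    """Stand-in for the sync(2) entry point: ask every mount to sync."""
    for fs in mounts:
        fs.vfs_sync()
```

In this model, a power cut right after sys_sync() returns loses whatever is
still in the first filesystem's queued list, which matches the symptom in the
original report.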

> sync(8) is just a wrapper around sync(2).  One that doesn't even check
> for errors.  Not that it could handle sync() failure.  Its man page
> bogusly first claims that it "forces completion".  This is not
> completely wrong, since it doesn't claim that the completion occurs
> before sync(8) exits.  But then it claims that sync(8) is suitable "to
> ensure that all disk writes have been completed in a way not suitably
> done by reboot(8) or halt(8)".  This wording is poor, unless it is
> intentionally weaselishly worded so that it doesn't actually claim
> full completion.  It only claims more suitable completion than with
> reboot or halt.  Actually, completion is not guaranteed, and what
> sync(8) provides is just less unsuitable than what reboot and halt
> provide.
>

I think sync(2) should be implemented to mean, where possible, the filesystem
equivalent of a CPU memory barrier.  In short, you should be guaranteed
that every write you know you made prior to calling sync has been
committed to disk.  Writes performed in other contexts do not receive any
such guarantee.
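
For writes a process made itself, fsync(2) already comes close to this kind
of barrier, since it is specified to move the modified data for one
descriptor to permanent storage before returning.  A minimal Python sketch
for the newly created directory case from the original report, assuming the
filesystem honors fsync on directory descriptors (make_dir_durably is a
hypothetical helper name, not an existing API):

```python
import os


def make_dir_durably(path):
    """Create a directory, then fsync both it and its parent directory.

    Assumption: the filesystem accepts fsync(2) on a directory descriptor
    and treats it as a request to commit the new directory entry.
    """
    os.mkdir(path)
    parent = os.path.dirname(os.path.abspath(path))
    for target in (path, parent):
        fd = os.open(target, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(fd)  # returns only after the kernel reports the flush done
        finally:
            os.close(fd)
```

This only covers the caller's own writes.  It says nothing about writes made
by other processes.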

> To ensure completion, you have to freeze the file systems of interest
> before rebooting.  I don't know of any ways to do this from userland
> except mount -u -o ro or unmount.
>

This is certainly true, if you want to guarantee that all writes in all
contexts were committed.  But sync(2) could never be useful for that
purpose, for the reasons you mention.

--Will.