Date:      Wed, 3 Jul 2013 00:33:33 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Will Andrews <will@firepipe.net>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Message-ID:  <20130703073333.GA57318@icarus.home.lan>
In-Reply-To: <CADBaqmihCB5JP01hLwXTWHoZiJJ5-jkT-Ro=oDwOcKZT_zvEKA@mail.gmail.com>
References:  <87li5o5tz2.wl%berend@pobox.com> <CA+tpaK1jQuKneQsxkVfxJGzXdPdLZfqBM1QWQ0e19nK5t71t1Q@mail.gmail.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <CADBaqmihCB5JP01hLwXTWHoZiJJ5-jkT-Ro=oDwOcKZT_zvEKA@mail.gmail.com>

On Wed, Jul 03, 2013 at 12:53:13AM -0600, Will Andrews wrote:
> On Wednesday, July 3, 2013, Kevin Day wrote:
>
> > The closest thing we can do in FreeBSD is to unmount the filesystem, take
> > the snapshot, and remount. This has the side effect of closing all open
> > files, so it's not really an alternative.
> >
> > The other option is to not freeze the filesystem before taking the
> > snapshot, but again you risk leaving things in an inconsistent state,
> > and/or the last few writes you think you made didn't actually get committed
> > to disk yet. For automated systems that create then clone filesystems for
> > new VMs, this can be a big problem. At best, you're going to get a warning
> > that the filesystem wasn't cleanly unmounted.
> >
>
> Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause I/O
> running in other contexts, but it does guarantee that any commands you ran
> and completed prior to calling sync will make it to disk in ZFS.
>
> This is because sync in ZFS is implemented as a ZIL commit, so transactions
> that haven't yet made it to disk via the normal syncing context will at
> least be committed via their ZIL blocks. Which can then be replayed when
> the pool is imported later, in this case from the EBS snapshots.
>
> And since the entire tree from the überblock down in ZFS is COW, you can't
> get an inconsistent pool simply by doing a virtual disk snapshot,
> regardless of how that is implemented.

I'm a little confused about this statement, particularly as a result of
this thread (read the entire thing time permitting):

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

UFS is what's being discussed there, but there are some blanket
statements (maybe I'm taking them out of context, not entirely sure)
made by Bruce there that seem to imply that sync(2) may not actually
flush all memory buffers to disk when issued, only that they're
"scheduled" to be flushed.

The part that's confusing to me is this part of your paragraph:

> This is because sync in ZFS is implemented as a ZIL commit, so transactions
> that haven't yet made it to disk via the normal syncing context will at
> least be committed via their ZIL blocks.  ...

What confuses me about this is that it implies these "ZIL block commits"
(I/O writes of a certain type) are somehow being done outside of a
normal I/O write (e.g. "normal syncing context").

To me this indicates ZFS is somehow able to tell the underlying storage
subsystem driver to (speaking ATA here because it's what I'm familiar
with) issue WRITE DMA EXT (0x35) or the NCQ equivalent, followed
immediately by FLUSH CACHE EXT (0xea)?  My understanding of the latter
was that it was accomplished via BIO_FLUSH.

Looking at sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
it seems there are BIO_FLUSH handlers in place (at the GEOM level).

So all this makes me wonder: why exactly does sync(2) result in
different behaviour on UFS than it does on ZFS?  Do both of these
filesystems not use BIO_WRITE and friends?  Does sync(2) not simply
iterate over all the queued BIO_WRITE requests and BIO_FLUSH them all?

Sorry if I'm overthinking this or missing something, but I just don't
understand why sync(2) would flush stuff to disk with one filesystem but
not another.
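
For reference, the userland side of the question can be sketched roughly as
below.  This is only a minimal illustration and not anything taken from ZFS
or UFS: flush_before_snapshot() is a hypothetical helper name, and all it
does is call write(2), fsync(2) and sync(2) in order.  It shows where the
sync(2) call would sit before an EBS snapshot is triggered; whether sync(2)
waits for the data to reach disk or only schedules it is the open question
above.

```c
#include <sys/types.h>

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write len bytes to path,
 * then ask the kernel to flush them before an external block-level
 * snapshot is taken.  Returns 0 on success, -1 on any error.
 */
int
flush_before_snapshot(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return (-1);

	/* write(2) may be partial, so loop until everything is handed over. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n <= 0) {
			close(fd);
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}

	/* fsync(2) is per-file and reports failure through its return value. */
	if (fsync(fd) == -1) {
		close(fd);
		return (-1);
	}
	if (close(fd) == -1)
		return (-1);

	/*
	 * sync(2) covers all mounted filesystems but returns nothing, so
	 * the caller cannot tell from here whether the writes completed
	 * or were merely scheduled.
	 */
	sync();
	return (0);
}
```

A caller would run this (or plain sync(8)) immediately before requesting the
snapshot; the per-file fsync(2) is there because it is the one step in the
sequence that reports an error.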

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |



