Date:      Fri, 16 Dec 2016 17:08:28 -0800
From:      Aleksandr Miroslav <alexmiroslav@gmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: zfs (zxfer) replication -- "holes" in backups?
Message-ID:  <CACcSE1znx_3H=71hT_3TOu-tMhWjkm7_sx-nxLQA83iz4aeR6w@mail.gmail.com>

On Tue, Dec 13, 2016 at 3:44 PM, Aleksandr Miroslav <alexmiroslav@gmail.com>
wrote:
> I'm using zxfer to replicate my ZFS snapshots to another host. Occasionally,
> for whatever reason, zxfer can't replicate a particular snapshot.
>
> What I find is that later when zxfer tries again, it skips that snapshot it
> couldn't replicate and sends a newer one. This leaves the backup server
> with a "hole", i.e. a missing snapshot.

So I think I may have found the reason for this problem I'm seeing...

I'm using a pkg called zfstools to take my snapshots. It takes snapshots
called frequent, hourly, daily, weekly, and monthly.

The frequency of the last 4 you can probably guess. The "frequent"
snapshot I take every 15 minutes, but not at the top of the hour, since
that's taken care of by the hourly snapshot.

My cron looks like this:

    15,30,45 * * * * root zfs-auto-snapshot frequent  4
    0        * * * * root zfs-auto-snapshot hourly    24
    0        0 * * * root zfs-auto-snapshot daily     31
    0        0 * * 7 root zfs-auto-snapshot weekly    4
    0        0 1 * * root zfs-auto-snapshot monthly   48

The number after the name of the snapshot is how many copies I keep
around of that particular snapshot.

You can probably see the problem right away: while I take an hourly
snapshot every hour, and keep 24 copies of it (so that each hourly
snapshot lives for 24 hours), I am only keeping 4 copies of the frequent
snapshots. This means that each frequent snapshot only lives about 75
or 90 minutes max before it is deleted.
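
To double-check the 75/90 minute figure, here is a rough Python sketch of
the schedule above. It assumes the tool prunes the oldest frequent snapshot
at the moment a new one is created, always keeping the newest `keep` copies.
That pruning model and the function name are my assumptions, not something
taken from zfstools.

```python
def frequent_lifetimes(keep=4, minutes_past=(15, 30, 45), hours=6):
    """Map each snapshot's creation time to its lifetime, both in minutes.

    Assumes a snapshot is pruned as soon as `keep` newer snapshots exist.
    """
    # Creation times in minutes since midnight: hh:15, hh:30, hh:45.
    times = [h * 60 + m for h in range(hours) for m in minutes_past]
    lifetimes = {}
    for i, created in enumerate(times):
        # Only snapshots whose pruning falls inside the simulated window.
        if i + keep < len(times):
            lifetimes[created] = times[i + keep] - created
    return lifetimes


if __name__ == "__main__":
    print(sorted(set(frequent_lifetimes().values())))  # [75, 90]
```

Snapshots taken at :15 and :30 last 75 minutes; the one taken at :45 lasts
90 minutes, because the gap across the top of the hour is 30 minutes.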

Since my replication runs about every hour to my primary replica, and
every 4 hours to another replica, and since the replication takes some
time to run, it happens that a particular frequent snapshot could be
marked for transfer at the start of the replication, but deleted before
it can be transferred. (To be fair, I believe I had seen some errors from
cron to this effect.)
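
The race can be illustrated with a small Python sketch. The function name
and the example run times are made up for illustration, and it again assumes
the oldest frequent snapshot is pruned as soon as `keep` newer ones exist.
It says nothing about how zxfer behaves internally.

```python
def pruned_during_run(created, run_start, run_end, keep=4):
    """Return snapshots that exist at run_start but are pruned by run_end.

    `created` holds creation times in minutes; all times share one clock.
    """
    created = sorted(created)
    lost = []
    for i, t in enumerate(created):
        # Skip snapshots taken after the run started, and snapshots
        # that are never pruned within the listed schedule.
        if t > run_start or i + keep >= len(created):
            continue
        pruned_at = created[i + keep]
        if run_start < pruned_at <= run_end:
            lost.append(t)
    return lost


if __name__ == "__main__":
    schedule = [15, 30, 45, 75, 90, 105, 135]
    # Hypothetical replication run from 1:00 (minute 60) to 1:35 (minute 95).
    print(pruned_during_run(schedule, run_start=60, run_end=95))  # [15]
```

In this example the 0:15 snapshot is still present when the run starts at
1:00, but it is pruned at 1:30 when the fifth frequent snapshot is taken,
which is before the run finishes.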

I believe the solution is to increase the number of frequent copies that
are kept, such that each replication run can transfer all the frequent
snapshots that it sees. I will increase this number and see if this
fixes the problem. I will fix the already created holes manually as
well.
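
As a rough way to pick the new number, the following Python sketch looks for
the smallest keep count whose shortest snapshot lifetime covers one
replication interval plus the time a run takes. The run duration of 30
minutes is a guess of mine, the function names are hypothetical, and the
model (prune the oldest once `keep` newer snapshots exist) is an assumption.

```python
def min_lifetime(keep, minutes_past=(15, 30, 45)):
    """Shortest lifetime in minutes of a frequent snapshot for `keep` copies."""
    hours = keep + 2  # enough schedule to reach the pruning time of each slot
    times = [h * 60 + m for h in range(hours) for m in minutes_past]
    return min(times[i + keep] - times[i] for i in range(len(minutes_past)))


def min_keep(interval_min, run_min, minutes_past=(15, 30, 45)):
    """Smallest keep count so a snapshot outlives one interval plus one run."""
    needed = interval_min + run_min
    keep = 1
    while min_lifetime(keep, minutes_past) < needed:
        keep += 1
    return keep


if __name__ == "__main__":
    # Replica synced every 4 hours, assuming a run takes up to 30 minutes.
    print(min_keep(interval_min=240, run_min=30))
```

This is only a lower bound under those assumptions; a run that takes longer
than expected, or a missed cron job, would need more headroom.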


