Date:      Tue, 30 Aug 2005 09:23:55 -0500
From:      Greg Barniskis <nalists@scls.lib.wi.us>
To:        questions@freebsd.org
Cc:        Norberto Meijome <freebsd@meijome.net>
Subject:   Re: rsync and moving files [Re: backup w/ snapshots]
Message-ID:  <43146BFB.4080607@scls.lib.wi.us>
In-Reply-To: <20050830091919.J13913@maren.thelosingend.net>
References:  <20050828234043.H22315@maren.thelosingend.net> <20050829161506.E2522@maren.thelosingend.net> <43131C85.1070100@meijome.net> <20050829170053.M3014@maren.thelosingend.net> <43133BA5.2010608@scls.lib.wi.us> <20050830091919.J13913@maren.thelosingend.net>

Svein Halvor Halvorsen wrote:
> * Greg Barniskis [2005-08-29 11:45 -0500]
> 
>> Eh? Bad assumptions about snapshots, I think. If a snapshot occupied even a
>> tenth of the space of the data that it represented, we would quickly fill all
>> our disks and the snapshot technology would be almost as painful as useful.
>> 
>> A snapshot is essentially only an index of occupied disk space, not a copy of
>> the actual data, and a snapshot is therefore much, much, much, much smaller
>> than the data files that have changed. Read the relevant man pages and
>> handbook sections again, and test your assumptions by measuring the actual
>> change in snapshot size. I don't think your perceived problem really exists.
> 
> 
> 
> Yes, that's correct! But let's say I keep more than one snapshot around. I 
> maybe didn't mention this, but this is the sole purpose of using snapshots: 
> for me to have more full backups lying around.

Ah. That does change things a bit, I guess. A previous post 
indicated file renames and replication followed by taking a new 
snapshot, and I thought it was implied your older snapshots were 
going away.

> Say I change the disk a lot between snapshots, e.g. I rsync moved files 
> (yes, within the same fs). This will result in a lot of file deletion and 
> creation. Next, when I make the snapshot, a new list of occupied disk space 
> will be made, and all of these blocks will be marked "in use", and 
> therefore take up a lot of disk space.
> 
> In reality the information didn't change much at all between the two 
> snapshots, but the effect remains: my disk can no longer store two 
> snapshots (unless the backup disk is twice as large, which it is not).
> 
> 
> The solution: Somehow, I need to mirror all the move ops on the remote 
> system before doing the rsync. This could probably be done by making a 
> hash table of inode/filename pairs (or triplets, etc.) each time I sync. 
> Then the next time, I could compare the old table with the new one to find 
> out which files are the same only with new names, then find those names on 
> the remote system, change them to the new ones, and then rsync. If the 
> inodes are recycled for brand new files between syncs, I don't think that 
> would be a problem. The following rsync job would recognize the diffs and 
> sync them, which it would have done anyway if the file were new.
> 
> 
> What do you think?
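
The rename-mirroring idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything posted in the thread: the function names (build_inode_table, find_renames, apply_renames) are hypothetical, the table is keyed on (device, inode), and apply_renames is assumed to run on the backup host itself, or against a mounted copy of it, before the usual rsync. The extra size and mtime check is an added safeguard against recycled inodes. The proposal itself relies on rsync to repair any mismatch. Hard links (several names per inode) are not handled.

```python
import os
import stat


def build_inode_table(root):
    """Map (device, inode) -> (relative path, size, mtime) for regular files."""
    table = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, ...
            rel = os.path.relpath(full, root)
            table[(st.st_dev, st.st_ino)] = (rel, st.st_size, st.st_mtime_ns)
    return table


def find_renames(old_table, new_table):
    """Return sorted (old_path, new_path) pairs for files that only moved.

    A file counts as moved when the same inode shows up under a different
    path with unchanged size and mtime (assumed guard against inode reuse).
    """
    renames = []
    for key, (new_path, size, mtime) in new_table.items():
        if key not in old_table:
            continue
        old_path, old_size, old_mtime = old_table[key]
        if old_path != new_path and (old_size, old_mtime) == (size, mtime):
            renames.append((old_path, new_path))
    return sorted(renames)


def apply_renames(backup_root, renames):
    """Replay the moves inside the backup tree; rsync fixes anything missed."""
    for old_path, new_path in renames:
        src = os.path.join(backup_root, old_path)
        dst = os.path.join(backup_root, new_path)
        if os.path.exists(src) and not os.path.exists(dst):
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.rename(src, dst)
```

In use, the table from the previous sync would be stored somewhere (for example pickled to disk), a new table built just before the next sync, and the resulting rename list replayed on the backup tree before rsync runs. Anything the sketch misses is simply transferred by rsync as a delete plus a create, as before.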

This is admittedly beyond my ken, at least within the limited number 
of brain cycles I can offer to the problem. Hopefully someone else 
will provide clues for you. Personally, I think you're violating the 
KISS principle unless there's a really compelling need to keep your 
previous file system states accessible online. Dumping older states 
to offline media and reclaiming that space would be my first order 
of business, but that's just me. Or just buy some whopping big disks 
appropriate to the task, since that's generally cheaper than admin 
time to create workarounds (unless you just consider this fun =).

Good luck,

-- 
Greg Barniskis, Computer Systems Integrator
South Central Library System (SCLS)
Library Interchange Network (LINK)
<gregb at scls.lib.wi.us>, (608) 266-6348


