Date:      Wed, 18 Jul 2012 07:57:54 -0700
From:      CH <freebsd-fs@ch.pkts.ca>
To:        Kai Gallasch <gallasch@free.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Can you list internal checksums of a ZFS filesystem?
Message-ID:  <20120718075754.4908266b@kirk.lan>
In-Reply-To: <6D778EEA-5B8F-4F59-B198-E5B098F3AE2C@free.de>
References:  <20120717152629.42e0641e@fedora14-x86-64.shechinah.mi.microbiology.ubc.ca> <6D778EEA-5B8F-4F59-B198-E5B098F3AE2C@free.de>

On Wed, 18 Jul 2012 08:43:57 +0200
Kai Gallasch <gallasch@free.de> wrote:

> Am 18.07.2012 um 00:26 schrieb CH:
> > 
> > Hello list,
> > 
> > I'm moving data to a ZFS filesystem, and it's a ton of big files
> > (more than 3 terabytes).  I don't trust the network copy command
> > completely, and so I'd like to compare checksums.  I'm not looking
> > forward to it, since it's going to be a slow process, especially if
> > I can't run the command on the server. 
> 
> You could use rsync for transferring the data.
> 
> According to its man page, rsync calculates checksums for transferred
> files and on its initial run compares checksums on the sending and
> receiving side for each file:
> 
> http://www.freebsd.org/cgi/man.cgi?query=rsync&apropos=0&sektion=0&manpath=FreeBSD+Ports&arch=default&format=html
 
> <rsync's -c option detailed> 
> 
>   So running rsync first without the -c switch and then a second
> time with -c should be quite sufficient to make sure the data has
> not changed after being transferred. (Unless, of course, the
> underlying filesystem layers lie about this to the application, or
> MD5 is wrongly implemented in rsync :-)
> 
> rsync also makes it possible to transfer the data in several runs,
> at times most convenient to you (or your network). It also supports a
> switch for limiting bandwidth usage.
> 
> Have a nice day,
>  Kai.

Actually, I did do rsync for the initial transfers, and it had to be
restarted a couple of times for reasons that were not its fault (source
computer rebooted, ssh connection lost, etc).  However, after it
finished copying everything (i.e., it exited normally), I ran it again, and
it found more stuff to copy.  This shouldn't have happened since
nothing was added to the source computer, and so now I distrust its
results and want to check it independently.  In particular, I don't
trust its directory-walking algorithm, so some files may have been
missed and may continue to be missed in future runs of rsync, with or
without -c.

The method I was going to use was

    find . -type f -print0 | xargs -0 md5sum > my.big.md5sum.file

on both source and destination, but if I can harvest the ZFS checksums
(file or block) it would cut the CPU workload in half, and save a
tree's worth of energy.
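
For comparing the two sides once the checksums exist, something like the
following Python sketch might do. It is only an illustration of the
find/md5sum approach above, not a way to read ZFS's internal checksums, and
the function names (md5_of_file, build_manifest, compare_manifests) are made
up for this example. It walks the tree itself, so it does not depend on
rsync's directory walk, and it reports files missing on the destination
separately from files whose contents differ.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its MD5 digest.

    Symlinks are skipped, roughly matching 'find . -type f'.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def compare_manifests(source, destination):
    """Return (missing, extra, mismatched) lists of relative paths, each sorted.

    missing:    in source but not in destination
    extra:      in destination but not in source
    mismatched: in both, but with different digests
    """
    src_paths = set(source)
    dst_paths = set(destination)
    missing = sorted(src_paths - dst_paths)
    extra = sorted(dst_paths - src_paths)
    mismatched = sorted(
        path for path in src_paths & dst_paths
        if source[path] != destination[path]
    )
    return missing, extra, mismatched
```

In practice build_manifest would run once on each machine, with the
resulting dictionary saved to a file (for example as JSON) and the two files
brought together for compare_manifests. Every file is still read in full on
both sides, so this does nothing to reduce the CPU cost.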


