Date: Wed, 18 Jul 2012 07:57:54 -0700
From: CH <freebsd-fs@ch.pkts.ca>
To: Kai Gallasch <gallasch@free.de>
Cc: freebsd-fs@freebsd.org
Subject: Re: Can you list internal checksums of a ZFS filesystem?
Message-ID: <20120718075754.4908266b@kirk.lan>
In-Reply-To: <6D778EEA-5B8F-4F59-B198-E5B098F3AE2C@free.de>
References: <20120717152629.42e0641e@fedora14-x86-64.shechinah.mi.microbiology.ubc.ca> <6D778EEA-5B8F-4F59-B198-E5B098F3AE2C@free.de>
On Wed, 18 Jul 2012 08:43:57 +0200
Kai Gallasch <gallasch@free.de> wrote:

> On 18.07.2012 at 00:26, CH wrote:
> >
> > Hello list,
> >
> > I'm moving data to a ZFS filesystem, and it's a ton of big files
> > (more than 3 terabytes). I don't trust the network copy command
> > completely, and so I'd like to compare checksums. I'm not looking
> > forward to it, since it's going to be a slow process, especially if
> > I can't run the command on the server.
>
> You could use rsync for transferring the data.
>
> According to its man page rsync calculates checksums for transferred
> files and on its initial run compares checksums on the sending and
> receiving side for each file:
>
> http://www.freebsd.org/cgi/man.cgi?query=rsync&apropos=0&sektion=0&manpath=FreeBSD+Ports&arch=default&format=html
> <rsync's -c option detailed>
>
> So running rsync without the -c switch on the first run and with -c
> on a second run should be quite sufficient for making sure the data
> has not changed after being transferred. (Except of course if the
> underlying filesystem layers lie about this to the application, or
> MD5 is wrongly implemented in rsync :-)
>
> Also rsync makes it possible to transfer the data in several runs,
> at times most convenient to you (or your network). It also supports a
> switch for limiting bandwidth usage.
>
> Have a nice day,
> Kai.

Actually, I did use rsync for the initial transfers, and it had to be
restarted a couple of times for reasons that were not its fault (source
computer rebooted, ssh connection lost, etc.). However, after it finished
copying everything (i.e., exited normally), I ran it again, and it found
more stuff to copy. This shouldn't have happened, since nothing was added
to the source computer, so now I distrust its results and want to check
them independently. In particular, I don't trust its directory-walking
algorithm: some files may have been missed and may continue to be missed
in future runs of rsync, with or without -c.
The method I was going to use was

    find . -type f -print0 | xargs -0 md5sum > my.big.md5sum.file

on both source and destination, but if I can harvest the ZFS checksums
(file or block), it would cut the CPU workload in half and save a tree's
worth of energy.
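For reference, the independent check described above (hash every regular file on both sides, then compare the two lists) could be sketched in Python roughly as follows. This is only an illustration of the find/md5sum approach, not a way to read ZFS's internal checksums; the function names (file_md5, build_manifest, compare_manifests) are made up for this example. It walks the tree itself rather than relying on rsync's file list, so files that rsync missed show up as "missing".

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of one file, reading in chunks so large files
    never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its MD5.

    Symlinks are skipped, which mirrors `find . -type f`.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest


def compare_manifests(source, destination):
    """Return (missing, extra, changed) lists of relative paths.

    missing: present on the source only
    extra:   present on the destination only
    changed: present on both, but with different checksums
    """
    missing = sorted(set(source) - set(destination))
    extra = sorted(set(destination) - set(source))
    changed = sorted(
        path
        for path in set(source) & set(destination)
        if source[path] != destination[path]
    )
    return missing, extra, changed
```

In practice build_manifest would be run once on each machine and the results saved to a file (for example as JSON), so that only the two small manifests need to travel over the network for the comparison.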