Date: Wed, 28 Apr 2004 00:40:17 -0700 (PDT)
From: Bruce Evans <bde@zeta.org.au>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Message-ID: <200404280740.i3S7eHcG049167@freefall.freebsd.org>
The following reply was made to PR bin/53475; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: "Dorr H. Clark" <dclark@applmath.scu.edu>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Wed, 28 Apr 2004 17:37:25 +1000 (EST)

On Tue, 27 Apr 2004, Dorr H. Clark wrote:

> ...
> -/*
> - * mastercmp --
> - *	The comparison function for the copy order.  The order is to copy
> - *	non-directory files before directory files.  The reason for this
> - *	is because files tend to be in the same cylinder group as their
> - *	parent directory, whereas directories tend not to be.  Copying the
> - *	files first reduces seeking.
> - */

According to cp -pRv, mastercmp() gets this perfectly backwards: cp
actually copies directories first.  It seems to just randomize the order
of regular files; this is presumably because mastercmp() doesn't
distinguish between all pairs of different files and qsort() doesn't
preserve the original order.

> ...
> As quoted above, the comments in cp.c tell us the function
> mastercmp() is an attempt to improve performance based on
> knowing something about physical disks.
>
> This is an old optimization strategy (it's in the original
> version of cp.c).  AFAIK, in the updated BSD filesystem,
> when we copy a file, we don't actually move the
> physical data block of the file but change the information in its
> inode such as the address of its data block and owner.

Copying still involves lots of physical i/o.  The difference in
relatively recent versions of ffs is that it doesn't scatter the files
so much by switching the cylinder group too often.  IIRC, it switched
for every directory.

> The next question is whether deleting mastercmp eliminates
> an optimization.  Our testing shows the exact opposite,
> mastercmp is degrading performance.  We did several experiments
> with cp -R to measure elapsed time on transfers between devices
> of differing file system types (to avoid UFS2 optimizations).
> Our results show removing mastercmp yields a small performance
> gain (note: we had no SCSI devices available, and second note:
> variability in file system performance seems dominated
> by other factors).

It would be interesting to know if mastercmp() works better if it does
what its comment says it does.  I suspect that the backwardsness doesn't
make much difference, but is worse than it used to be because there is
now more competition for space in the same cylinder group.

I think benchmarks that don't descend into subdirs would show that using
mastercmp really is an optimization for that access pattern, but I think
that access pattern is relatively unusual.  Optimizing for the default
fts order seems as good as anything.

> M. K. McKusick has indicated in seminars that modern disk drives
> lie to the driver about their physical layouts.  The use of
> mastercmp in cp.c is a legacy optimization from a different
> era of disk technology.  We recommend removing this call
> from cp.c to address 53475.

Large seeks (especially ones larger than the drive's cache) still matter,
and I think drivers rarely lie about these.  cp's attempted optimization
is more about second-guessing what ffs does.  I agree that it shouldn't
do this.  The file system might not even be ffs.

Bruce
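The ordering issue discussed above (a comparator that only separates directories from non-directories, fed to qsort(), which is not guaranteed to be stable) can be illustrated with a small, self-contained C sketch. It is not the cp.c source: `struct entry`, `files_first()` and `sort_entries()` are hypothetical names, and real fts(3) code would inspect an FTSENT's fts_info (FTS_D for a directory) rather than an is_dir flag. The sketch sorts non-directories before directories, as the mastercmp() comment intends, and adds a name tie-break so the comparator is a total order and the result no longer depends on how qsort() happens to treat equal elements.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; real fts code passes
 * FTSENT pointers and checks fts_info == FTS_D for directories. */
struct entry {
    const char *name;
    int is_dir;             /* nonzero for directories */
};

/* Order non-directories before directories, as the cp.c comment
 * describes.  The strcmp() tie-break makes this a total order, so
 * qsort(), which is not a stable sort, cannot leave entries of the
 * same kind in an arbitrary order. */
static int files_first(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;

    if ((ea->is_dir != 0) != (eb->is_dir != 0))
        return ea->is_dir ? 1 : -1;   /* directory sorts after file */
    return strcmp(ea->name, eb->name);
}

/* Sort an array of n entries in place using files_first(). */
void sort_entries(struct entry *v, size_t n)
{
    qsort(v, n, sizeof(*v), files_first);
}
```

For example, sorting the entries `sub` (directory), `b.txt`, `a.txt` and `dir` (directory) yields `a.txt`, `b.txt`, `dir`, `sub`. Breaking ties on each entry's original position instead of its name would keep the default fts order within each group, which is closer to what the reply above suggests; that variant needs the index stored alongside each entry.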