Date:      Wed, 4 Apr 2007 07:03:45 -0500
From:      Alexander Anderson <a.anderson@utoronto.ca>
To:        freebsd-questions@freebsd.org
Subject:   Re: ISO Image Size Increasing
Message-ID:  <20070404120345.GA95748@upful.org>
In-Reply-To: <1d3ed48c0704031200w27431474h46a3f482f65b9bfe@mail.gmail.com>
References:  <4342.12.170.206.13.1175622392.squirrel@admintool.trueband.net> <1d3ed48c0704031200w27431474h46a3f482f65b9bfe@mail.gmail.com>

>>The image copied from the CD is approximately 234 MB in size, and the
>>image created by mkisofs is 664 MB.
> It sounds like you may be running into a hardlink issue with iso9660.

Yes, the ISO-9660 file system does not assign the same inode to hard links.
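
To check whether hard links explain the growth, one can list the files in
the source tree that share a device and inode number. This is only a
minimal sketch: it assumes the tree sits on an ordinary Unix file system
with real inodes (not on the mounted CD), and hardlink_groups is a made-up
helper name.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns a hash ref mapping "dev:ino" to the list of
# paths sharing that inode, keeping only inodes that have several paths.
sub hardlink_groups {
    my @roots = @_;
    my %by_inode;

    find({
        no_chdir => 1,
        wanted   => sub {
            # lstat (not stat), so symlinks are seen as links and skipped
            my ($dev, $ino) = lstat $_ or return;
            return unless -f _;     # regular files only
            push @{ $by_inode{"$dev:$ino"} }, $_;
        },
    }, @roots);

    my %groups = map  { $_ => $by_inode{$_} }
                 grep { @{ $by_inode{$_} } > 1 } keys %by_inode;
    return \%groups;
}

# Print one line per set of hard-linked paths when run with arguments.
if (@ARGV) {
    my $groups = hardlink_groups(@ARGV);
    for my $key (sort keys %$groups) {
        print join(' = ', sort @{ $groups->{$key} }), "\n";
    }
}
```

Each path printed on the same line is one copy of the data on disk, but
would be written out as a separate file if the image builder does not
preserve the link.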

I had to write a Perl script that finds identical files and links them
(see below).
Hope it helps.

===begin hardlink.pl===
#!/usr/bin/perl
#
# $Id: hardlink.pl,v 1.2 2007/03/29 01:20:53 alex Exp $

use File::Find;
use strict;

die "Usage: $0 file ...\n" unless @ARGV;

my %count;
my %files;

find({ wanted => \&wanted, no_chdir => 1 }, @ARGV);

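sub wanted {
    # 'next' is meant for loops; use 'return' to leave the callback
    return unless -f;
    return if -l;

    print "$_\n";

    # External md5(1): lazier than Digest::MD5.  The list-form pipe open
    # bypasses the shell, so names with spaces or metacharacters are safe.
    open(my $fh, '-|', 'md5', '-q', $_) or die "cannot run md5: $!";
    my $md5 = <$fh>;
    close $fh;
    defined $md5 or die "md5 failed on $_";
    chomp $md5;
    $md5 =~ /^[0-9a-f]{32}$/ or die "md5 failed on $_";

    $count{$md5}++;

    push(@{ $files{$md5} }, $_);
}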

for my $md5 (grep { $count{$_} > 1 } keys %count) {
    my @files = @{ $files{$md5} };

    my $source = shift @files;
    for my $target (@files) {
        # list form avoids the shell, so unusual file names cannot break it
        system('ln', '-fv', $source, $target) == 0 or die "ln failed for $target";
    }
}

__END__

=head1 NAME

hardlink.pl - find copies of files and create hard links instead

=head1 SYNOPSIS

    hardlink.pl file ...

=head1 DESCRIPTION

    Newsgroups: fa.netbsd.tech.kern
    From: Wolfgang Solfrank <w...@tools.de>
    Subject: Re: hard links in mounted cd9660 file system
    Date: Thu, 3 Mar 2005 13:31:42 GMT
    Message-ID: <fa.crmoqrl.l1ik3t@ifi.uio.no>

    Hmm, the problem is that there is no good way to know that two files
    are hardlinks on a 9660 filesystem.  9660 doesn't have a concept of
    inodes as is common in standard unix filesystems.  Instead, the
    information about the file is stored in the directory entry.  This
    means that the two directory entries pointing to the same data blocks
    may in fact describe two different files (e.g. they may have different
    owner or permission, or they may even differ in size!).

    Currently, the inode number shown by 9660 is just the offset of the
    directory entry of the file relative to the disk/partition, with the
    special case for directories, where we use the start of the directory
    itself, i.e. the offset of the '.' entry.  This way, it's quite easy
    to determine the file attributes given the inode number.

=head1 AUTHOR

Alexander Anderson <a.anderson@utoronto.ca>

=cut
===end hardlink.pl===
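
The script above shells out to md5(1), which is a BSD command and may not
exist elsewhere. A rough sketch of the same grouping step with the core
Digest::MD5 module might look like the following; duplicate_groups is a
made-up name, and this version only reports the groups without creating
any links.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: returns a list of array refs, each holding the paths
# of regular files under @roots whose contents have the same MD5 digest.
sub duplicate_groups {
    my @roots = @_;
    my %files;

    find({
        no_chdir => 1,
        wanted   => sub {
            return if -l $_;        # skip symlinks
            return unless -f $_;    # regular files only
            open(my $fh, '<', $_) or die "open $_: $!";
            binmode $fh;
            my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $files{$md5} }, $_;
        },
    }, @roots);

    return grep { @$_ > 1 } values %files;
}

# Print each group of identical files when run with arguments.
if (@ARGV) {
    for my $group (duplicate_groups(@ARGV)) {
        print join("\n  ", sort @$group), "\n";
    }
}
```

An equal digest is strong evidence, not proof, that two files are
identical. A more careful version could compare the candidates byte for
byte (for example with the core File::Compare module) before linking them.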


