Date:      Wed, 10 Jun 2009 15:06:06 -0400 (EDT)
From:      vogelke+unix@pobox.com (Karl Vogel)
To:        freebsd-questions@freebsd.org
Subject:   Re: Need a filesystem with "unlimited" inodes
Message-ID:  <20090610190606.6B89ABEDB@kev.msw.wpafb.af.mil>
In-Reply-To: <200906090945.48548.kirk@strauser.com> (message from Kirk Strauser on Tue, 9 Jun 2009 09:45:48 -0500)

>> On Tue, 9 Jun 2009 03:10:46 am Matthew Seaman wrote:
M> Or store your data in a RDBMS rather than in the filesystem.

>> On Tue, 9 Jun 2009 09:45:48 -0500, Kirk Strauser <kirk@strauser.com> said:
K> Hear, hear.  I'm hard pressed to imagine why you'd need 100M 1KB files.

   DBs are great when you have structured data, but semi-structured text
   (like email) makes for a very poor fit.  To see why, have a look at
   http://www.memoryhole.net/~kyle/databaseemail.html

   If you really need to store 100 million smallish chunks of information,
   consider using zip.  Create 256 folders named 00-ff:

       #!/bin/sh
       # Create 256 directories named 00 through ff.
       hex='0 1 2 3 4 5 6 7 8 9 a b c d e f'
       for x in $hex ; do
           for y in $hex ; do
               mkdir "${x}${y}"
           done
       done
       exit 0

   Use the hash of your choice to map the name of each chunk to one of the
   256 directories and then to one of 256 zipfiles under that directory.
   This gives you 64k (65,536) zipfiles, and if you put 1500 or so chunks
   in each one, you get roughly 98 million chunks, which is pretty close
   to 100 million.
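
   To check the arithmetic above, here is a minimal sketch in sh.  The
   variable names are mine and only illustrative; the numbers are the ones
   from the paragraph.

```shell
#!/bin/sh
# Capacity check for the layout described above.
# Variable names are illustrative, not part of the original scripts.
dirs=256              # directories 00 through ff
zips_per_dir=256      # zipfiles 00.zip through ff.zip in each directory
chunks_per_zip=1500   # "1500 or so" chunks per zipfile

zipfiles=$((dirs * zips_per_dir))
capacity=$((zipfiles * chunks_per_zip))

echo "zipfiles: $zipfiles"
echo "capacity: $capacity"
```

   This prints 65536 zipfiles and a capacity of 98304000 chunks.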

       me% cat mkchunks
       #!/usr/bin/perl
       use strict;
       use warnings;

       # Print the zipfile path chosen for each chunk name on the
       # command line.
       for my $chunk (@ARGV) {
           my ($dir, $zip) = chunk2file($chunk) =~ m/(..)(..)/;
           print "$dir/$zip.zip  $chunk\n";
       }
       exit(0);

       # Return a 4-digit hex hash of the given string.
       sub chunk2file {
           my $str = shift;
           use integer;

           my $sum = 0;
           foreach my $byte (unpack("C*", $str)) {
               # SDBM-style hash; SDBM itself multiplies by 65599
               $sum = $byte + 65587 * $sum;
           }
           $sum &= 0xffff;    # keep lowest 16 bits

           no integer;
           return sprintf("%4.4x", $sum);
       }

       me% ./mkchunks freebsd solaris
       16/f7.zip  freebsd
       ca/1f.zip  solaris
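
   Once you have the zipfile path for a chunk, storing and fetching it
   could look something like the sketch below.  This is not from the
   scripts above: it assumes Info-ZIP's zip and unzip are installed and
   that each chunk starts out as a file named after the chunk.  The
   function names store_chunk and fetch_chunk are made up for
   illustration.

```shell
#!/bin/sh
# Sketch only: assumes Info-ZIP zip/unzip and one file per chunk.
# store_chunk and fetch_chunk are hypothetical helper names.

# store_chunk ZIPFILE CHUNKFILE
# Add (or replace) one chunk in the archive, creating the directory
# if needed.  -q keeps zip quiet; -j stores the bare file name.
store_chunk() {
    mkdir -p "$(dirname "$1")" &&
    zip -q -j "$1" "$2"
}

# fetch_chunk ZIPFILE NAME
# Write one chunk's contents to stdout.  unzip treats NAME as a
# wildcard pattern, so names containing *, ? or [ need escaping.
fetch_chunk() {
    unzip -p "$1" "$2"
}
```

   For example, "store_chunk ca/1f.zip solaris" files the chunk away, and
   "fetch_chunk ca/1f.zip solaris" prints it back.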

   You'll get a better distribution if you use a hash like Digest::SHA1.
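
   As a rough illustration of the SHA-1 idea, here is a sketch in sh
   rather than Perl.  It does not use Digest::SHA1; it assumes either GNU
   sha1sum or the FreeBSD sha1 command is on your PATH, and it takes the
   first four hex digits of the digest as the directory and zipfile name.
   The function names sha1_hex and chunk2zip are hypothetical.

```shell
#!/bin/sh
# Sketch only: map chunk names to zipfiles using SHA-1.
# sha1_hex and chunk2zip are hypothetical helper names.

# Print the 40-digit SHA-1 hex digest of stdin.
# Assumes GNU sha1sum or FreeBSD's sha1 is available.
sha1_hex() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -c1-40
    else
        sha1 -q
    fi
}

# Print "xx/yy.zip", where xxyy are the first four hex digits of the
# SHA-1 digest of the chunk name.
chunk2zip() {
    digest=$(printf '%s' "$1" | sha1_hex)
    dir=$(printf '%s' "$digest" | cut -c1-2)
    zip=$(printf '%s' "$digest" | cut -c3-4)
    printf '%s/%s.zip\n' "$dir" "$zip"
}

# Same output format as mkchunks: zipfile path, two spaces, chunk name.
for chunk in "$@"; do
    printf '%s  %s\n' "$(chunk2zip "$chunk")" "$chunk"
done
```

   The SHA-1 digest of "abc" begins with a999, so a chunk named abc would
   land in a9/99.zip.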

-- 
Karl Vogel                      I don't speak for the USAF or my company

People like you are the reason people like me need medication. --bumper sticker


