Date:      Sun, 15 Jul 2001 19:18:17 -0600 (CST)
From:      Ryan Thompson <ryan@sasknow.com>
To:        Philip Murray <webmaster@open2view.com>
Cc:        freebsd-questions@FreeBSD.ORG
Subject:   Re: Many directories or many files?
Message-ID:  <Pine.BSF.4.21.0107151903060.61534-100000@ren.sasknow.com>
In-Reply-To: <002101c10d27$d69b1d60$0300a8c0@sparlak>

Philip Murray wrote to freebsd-questions@FreeBSD.ORG:

> Hi,
> 
> I have a large library of photos (>100,000, Real Estate) and at the
> moment I'm storing them in a big dump split up by a few directories.
> 
> I'm getting to the point where I have ~8000 files per directory. I was
> wondering whether I should write some kind of hashing function and
> have lots and lots of directories, or whether it's best to have more
> files per directory?

It pays to split up the directories, perhaps into chunks of a few hundred
files each. Depending on your requirements, you might get away with
something as braindead-simple as this old scheme, which buckets each file
by the first two letters of its name:

a/
	a/	b/	c/	...	z/
b/
	a/	b/	c/	...	z/
.
.
z/
	a/	b/	c/	...	z/


Of course, you don't get an even distribution that way, and you end up
creating 26^2 = 676 directories, but this is simple enough that a shell
script can figure it out, and probably "fast enough" for nearly all
applications. If these files are named in English, you'll naturally have
some directories (/t/h/, /z/x/) with a disproportionate number of files,
but, on average, you've got about 150 links per directory.
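For illustration, here's a throwaway sh sketch of that two-letter scheme
(the names hashpath and PHOTOROOT are mine, not anything standard, and I'm
assuming most filenames start with two letters):

```shell
#!/bin/sh
# Sketch of the two-letter bucketing scheme described above.
# PHOTOROOT and hashpath are hypothetical names, not standard tools.
PHOTOROOT=${PHOTOROOT:-/var/photos}

# Map a filename to its a/b/ bucket by its first two letters;
# anything that doesn't start with two letters falls into misc/.
hashpath() {
    lower=$(echo "$1" | tr 'A-Z' 'a-z')
    case "$lower" in
        [a-z][a-z]*) echo "$(echo "$lower" | cut -c1)/$(echo "$lower" | cut -c2)/$1" ;;
        *)           echo "misc/$1" ;;
    esac
}

hashpath "Mainstreet042.jpg"    # prints m/a/Mainstreet042.jpg
```

Pre-creating all 676 buckets up front is just a similar two-level loop over
the alphabet with mkdir -p.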

So, if you're REALLY going for efficiency, you may not want to follow the
above approach (though the alternatives, like proper hash distributions,
get a lot more complex). An approach of this sort WILL, however, reduce
the overall directory search time, and I think that will help your
situation.


> Also, if anyone knows of a Free image/media storage system that I can
> use, that would be wonderful.
> 
> I also looked into storing them in a mysql database, but I'm pretty
> sure it couldn't handle it.

Well, it's not impossible... 100,000 x 20k per image (high estimate?) is
under 2GB. Put the pictures in their own table and use a foreign key to
connect to your property description table (or whatever). Depending on how
you deploy this, though, you could waste more time than you gain if you're
creating a new database connection for every request. Don't forget, either,
about the memory and disk I/O required to do the SELECT, return the data,
have your program read it, and push it through CGI to the web server. With
adequate caching, serving files from disk is almost always faster.
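For what it's worth, the size estimate works out like this in plain sh
arithmetic (the 20k-per-image figure is the guess from above, not a
measured number):

```shell
#!/bin/sh
# Back-of-envelope: 100,000 images at ~20KB apiece.
images=100000
kb_per_image=20
total_kb=$((images * kb_per_image))
# 2,000,000 KB is a shade under 2GB (2GB = 2,097,152 KB).
echo "total: ${total_kb} KB"
```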

> 
> Cheers
> 
> -------------------------------- -  -- -  -   -
> Philip Murray - webmaster@open2view.com
> http://www.open2view.com - Open2View.com
> ------------- -  -- -   -
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
> 

-- 
  Ryan Thompson <ryan@sasknow.com>
  Network Administrator, Accounts

  SaskNow Technologies - http://www.sasknow.com
  #106-380 3120 8th St E - Saskatoon, SK - S7H 0W2

        Tel: 306-664-3600   Fax: 306-664-1161   Saskatoon
  Toll-Free: 877-727-5669     (877-SASKNOW)     North America

