Date: Sun, 15 Jul 2001 19:18:17 -0600 (CST)
From: Ryan Thompson <ryan@sasknow.com>
To: Philip Murray <webmaster@open2view.com>
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: Many directories or many files?
Message-ID: <Pine.BSF.4.21.0107151903060.61534-100000@ren.sasknow.com>
In-Reply-To: <002101c10d27$d69b1d60$0300a8c0@sparlak>
Philip Murray wrote to freebsd-questions@FreeBSD.ORG:

> Hi,
>
> I have a large library of photos (>100,000, Real Estate) and at the
> moment I'm storing them in a big dump split up by a few directories.
>
> I'm getting to the point where I have ~8000 files per directory. I was
> wondering whether I should write some kind of hashing function and
> have lots and lots of directories, or whether it's best to have more
> files per directory?

It pays to split up the directories, perhaps into chunks of a few
hundred. Depending on your requirements, you might get away with
something as braindead simple as this old scheme, which hashes each
file into a two-level tree keyed on the first two letters of its name:

  a/
    a/ b/ c/ ... z/
  b/
    a/ b/ c/ ... z/
  .
  .
  z/
    a/ b/ c/ ... z/

Of course, you don't get an even distribution that way, and you end up
creating 26^2 = 676 directories, but this is simple enough that a shell
script can figure it out, and probably "fast enough" for nearly all
applications. If these files are named in English, you'll naturally
have some directories (t/h/, z/x/) with a disproportionate number of
files but, on average, you've got about 150 links per directory.

So, if you're REALLY going for efficiency, you may not want to follow
the above approach (but then you're getting into something a lot more
complex, with hash distributions and the like). An approach of this
fashion WILL, however, reduce the overall directory search time, and I
think that will help your situation.

> Also, if anyone knows of a Free image/media storage system that I can
> use, that would be wonderful.
>
> I also looked into storing them in a mysql database, but I'm pretty
> sure it couldn't handle it.

Well, it's not impossible... 100,000 x 20k per image (high estimate?)
is under 2GB. Put the pictures in their own table and use a foreign
key to connect them to your property description table (or whatever).
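The two-letter scheme described above can be sketched as a short shell
function. This is only an illustration: the "hashpath" function name and
the "photos" base directory are made up here, and filenames starting with
digits or punctuation would need extra handling since they fall outside
the a-z buckets.

```shell
#!/bin/sh
# Sketch of the two-level, first-two-letters directory scheme.
# "hashpath" and "photos" are illustrative names, not a real tool.

# Print the two-level subdirectory for a filename, keyed on its
# first two characters, lowercased.
hashpath() {
    first=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    second=$(printf '%s' "$1" | cut -c2 | tr 'A-Z' 'a-z')
    printf '%s/%s\n' "$first" "$second"
}

# Pre-create all 26^2 = 676 directories once.
letters='a b c d e f g h i j k l m n o p q r s t u v w x y z'
for i in $letters; do
    for j in $letters; do
        mkdir -p "photos/$i/$j"
    done
done

# A file named "house-123.jpg" would then be stored under photos/h/o/.
hashpath "house-123.jpg"
```

A nightly script could walk the flat dump and mv each file into
"photos/$(hashpath "$f")/$f", which is about as much machinery as this
approach needs.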
Depending on how you deploy this, though, you could waste more time
than you gain if you're creating a new database connection for every
request, and don't forget about the memory/disk IO required to do the
SELECT, return the data, have your program read it, and send the images
through CGI to the web server. With adequate caching, serving files
from disk is almost always faster.

> Cheers
>
> Philip Murray - webmaster@open2view.com
> http://www.open2view.com - Open2View.com
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message

-- 
Ryan Thompson <ryan@sasknow.com>
Network Administrator, Accounts

SaskNow Technologies - http://www.sasknow.com
#106-380 3120 8th St E - Saskatoon, SK - S7H 0W2

Tel: 306-664-3600   Fax: 306-664-1161
Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America